Currently, real-time precision marketing is required in the Internet, education, and gaming industries. User profiling enables user search based on combined criteria. Example:
These use cases have the following characteristics in common:
Roaring bitmaps in GaussDB(DWS) can efficiently generate, compress, and parse bitmap data, and support the most common bitmap aggregation operations (AND, OR, NOT, and XOR). This meets the requirements of real-time precision marketing and quick user selection on large data volumes, with hundreds of millions of users and tens of millions of labels.
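The set semantics of these operations can be illustrated with a minimal sketch. It is not how GaussDB(DWS) stores Roaring bitmaps: a plain Python integer stands in for an uncompressed bitmap, where bit i is set when user ID i belongs to the set. The helper names to_bitmap and to_ids and the sample IDs are hypothetical, and NOT is shown in its bounded form (AND NOT) so that the result stays a finite set.

```python
# Illustration only: a Python int is used as an uncompressed bitmap.
# Bit i set means user ID i is a member of the set.

def to_bitmap(user_ids):
    """Build a bitmap from an iterable of non-negative user IDs."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap

def to_ids(bitmap):
    """Expand a bitmap back into an ascending list of user IDs."""
    ids = []
    uid = 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids

# Hypothetical label sets
males = to_bitmap([1, 2, 5, 8])
anglers = to_bitmap([2, 3, 5, 9])

both = to_ids(males & anglers)          # AND: users carrying both labels
either = to_ids(males | anglers)        # OR: users carrying at least one label
males_only = to_ids(males & ~anglers)   # AND NOT: males who are not anglers
exactly_one = to_ids(males ^ anglers)   # XOR: users carrying exactly one label
```

Each label combination becomes a single bitwise operation over whole sets instead of a row-by-row comparison, which is the property the bitmap type relies on.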
Assume that there is a web page browsing information table userinfo. The fields in the table are as follows:
CREATE TABLE userinfo (
    userid int,
    age int,
    gender text,
    salary int,
    hobby text
) WITH (orientation=column);
The userinfo table grows as user information changes. For example, a user with multiple hobby attributes has multiple records in the userinfo table.
Suppose you want to select males with an income greater than CNY10,000, an age greater than 30, and a hobby of fishing, and then push specific messages to this target group.
The traditional method is to directly query the original table. The statement is as follows:
SELECT DISTINCT userid FROM userinfo WHERE salary > 10000 AND age > 30 AND gender = 'm' AND hobby = 'fishing';
If the userinfo table contains a small amount of data, creating indexes on the salary, age, gender, and hobby columns is enough to meet the query requirements. However, if the userinfo table contains a large amount of data and a large number of labels, the preceding statement cannot meet the requirements. The reasons are as follows:
Roaring bitmaps perform better in this case.
-- Create a table that stores one bitmap of user IDs per label combination.
CREATE TABLE userinfoset (
    age int,
    gender text,
    salary int,
    hobby text,
    userset roaringbitmap,
    PRIMARY KEY(age,gender,salary,hobby)
) WITH (orientation=column);

-- Aggregate the user IDs of the source table into bitmaps, grouped by label combination.
INSERT INTO userinfoset
SELECT age, gender, salary, hobby, rb_build_agg(userid)
FROM userinfo
GROUP BY age, gender, salary, hobby;

-- Merge the bitmaps of all matching label combinations and expand the result into user IDs.
SELECT rb_iterate(rb_or_agg(userset)) FROM userinfoset WHERE salary > 10000 AND age > 30 AND gender = 'm' AND hobby = 'fishing';
After aggregation, the userinfoset table holds far less data than the source table, so scanning the base table is much faster. In addition, Roaring bitmaps make the rb_or_agg and rb_iterate calculations efficient. Compared with the traditional method, query performance is significantly improved.
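The aggregate-then-merge flow can be sketched outside the database as follows. This is an illustration under stated assumptions, not the GaussDB(DWS) implementation: Python sets stand in for the roaringbitmap type, the sample rows are hypothetical, and the three steps only mirror the roles of rb_build_agg, rb_or_agg, and rb_iterate in the statements above.

```python
from collections import defaultdict

# Hypothetical sample rows in the shape of userinfo:
# (userid, age, gender, salary, hobby)
userinfo = [
    (1, 35, "m", 12000, "fishing"),
    (1, 35, "m", 12000, "chess"),    # same user, second hobby record
    (2, 41, "m", 15000, "fishing"),
    (3, 28, "m", 20000, "fishing"),  # excluded: age is not above 30
    (4, 33, "f", 11000, "fishing"),  # excluded: gender does not match
    (5, 35, "m", 12000, "fishing"),  # shares a label combination with user 1
]

# Step 1 (role of rb_build_agg with GROUP BY): one set of user IDs
# per distinct (age, gender, salary, hobby) combination.
userinfoset = defaultdict(set)
for userid, age, gender, salary, hobby in userinfo:
    userinfoset[(age, gender, salary, hobby)].add(userid)

# Step 2 (role of rb_or_agg with the WHERE clause): filter the
# aggregated rows, then union the sets of all matching rows.
matched = set()
for (age, gender, salary, hobby), userset in userinfoset.items():
    if salary > 10000 and age > 30 and gender == "m" and hobby == "fishing":
        matched |= userset

# Step 3 (role of rb_iterate): expand the merged set into individual
# user IDs, sorted here for readability.
target_users = sorted(matched)
```

In this sample, six source rows collapse into five aggregated rows. The filter touches only the aggregated rows, and duplicate user IDs disappear in the union, so no separate DISTINCT step is needed.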