This function is used to return the cumulative distribution, which is equivalent to calculating the proportion of data in the partition that is greater than or equal to, or less than or equal to, the current row.
The restrictions on using window functions are as follows:
cume_dist() over([partition_clause] [orderby_clause])
Parameter |
Mandatory |
Description |
---|---|---|
partition_clause |
No |
Partition. Rows with the same value in partition columns are considered to be in the same window. |
orderby_clause |
No |
How data is sorted in a window |
The return value is of the DOUBLE type.
If the value of a is NULL, NULL is returned.
CREATE EXTERNAL TABLE salary ( dept STRING, -- Department name userid string, -- Employee ID sal INT -- Salary ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile;
Adds the following data:
d1,user1,1000 d1,user2,2000 d1,user3,3000 d2,user4,4000 d2,user5,5000
select dept, userid, sal, cume_dist() over(order by sal) as cume1 from salary; -- Result: d1 user1 1000 0.2 d1 user2 2000 0.4 d1 user3 3000 0.6 d2 user4 4000 0.8 d2 user5 5000 1.0
select dept, userid, sal, cume_dist() over (partition by dept order by sal) as cume2 from salary; -- Result: d1 user1 1000 0.3333333333333333 d1 user2 2000 0.6666666666666666 d1 user3 3000 1.0 d2 user4 4000 0.5 d2 user5 5000 1.0
select dept, userid, sal, cume_dist() over(order by sal desc) as cume3 from salary; -- Result: d2 user5 5000 0.2 d2 user4 4000 0.4 d1 user3 3000 0.6 d1 user2 2000 0.8 d1 user1 1000 1.0 select dept, userid, sal, cume_dist() over(partition by dept order by sal desc) as cume4 from salary; -- Result: d1 user3 3000 0.3333333333333333 d1 user2 2000 0.6666666666666666 d1 user1 1000 1.0 d2 user5 5000 0.5 d2 user4 4000 1.0