Function | Syntax | Description |
approx_distinct | approx_distinct(x) | Returns the approximate number of distinct input values (column x). |
approx_percentile | approx_percentile(x,percentage) | Sorts the values in the x column in ascending order and returns the value approximately at the given `percentage` position. |
| approx_percentile(x,array[percentage01, percentage02...]) | Sorts the values in the x column in ascending order and returns the values approximately at the given `percentage` positions (percentage01, percentage02...). |
approx_distinct
The approx_distinct function returns the approximate number of distinct input values of a field. The standard error of the result is 2.3%.
Syntax: approx_distinct(x)
Parameter | Description |
x | The parameter value can be of any data type. |
Example: use the count function to calculate the PV (page view) value, and use the approx_distinct function to estimate the number of distinct values of the client_ip field as the UV (unique visitor) value.
* | SELECT count(*) AS PV, approx_distinct(client_ip) AS UV
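To make the PV/UV distinction concrete, the sketch below computes the exact values that the query above estimates, using a small made-up list of log records. It is a plain Python illustration, not the service's implementation: the `logs` sample and the `pv_uv` helper are hypothetical, and `len` over a set gives the exact distinct count that approx_distinct only approximates.

```python
# Hypothetical sample of log records; only the client_ip field matters here.
logs = [
    {"client_ip": "10.0.0.1"},
    {"client_ip": "10.0.0.2"},
    {"client_ip": "10.0.0.1"},
    {"client_ip": "10.0.0.3"},
    {"client_ip": "10.0.0.2"},
]


def pv_uv(records, field="client_ip"):
    """Return (PV, UV): total record count and exact distinct count of `field`."""
    pv = len(records)                       # what count(*) returns
    uv = len({r[field] for r in records})   # exact value that approx_distinct(field) estimates
    return pv, uv


pv, uv = pv_uv(logs)
print(pv, uv)
```

On real log volumes an exact distinct count needs memory proportional to the number of distinct values, which is the usual reason to accept an estimate with a bounded error instead.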
approx_percentile
The approx_percentile function sorts the values of the target field in ascending order and returns the value approximately at the percentage position. It uses the T-Digest algorithm for estimation, which has a low deviation and meets most statistical analysis requirements. If needed, you can use
* | select count_if(x<(select approx_percentile(x,percentage))),count(*)
to count exactly how many field values fall below the returned value and how many field values there are in total, and then verify the statistical deviation.
Syntax
To return the value at the percentage position: approx_percentile(x, percentage)
To return the values at multiple percentage positions (percentage01, percentage02...): approx_percentile(x, array[percentage01,percentage02...])
Parameter | Description |
x | Value type: double |
percentage | Value range: [0,1] |
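Example 1: return the value of resTotalTime approximately at the 50% position (the median).
* | select approx_percentile(resTotalTime,0.5)
Example 2: return the values of resTotalTime approximately at the 20%, 40%, and 60% positions.
* | select approx_percentile(resTotalTime, array[0.2,0.4,0.6])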
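The verification step described above (count the values below the returned value, divide by the total, and compare the result with percentage) can be sketched in Python. This is an illustration under stated assumptions, not the service's T-Digest implementation: `nearest_rank_percentile` is a hypothetical exact stand-in for approx_percentile, `fraction_below` is a hypothetical helper, and the sample values are made up.

```python
import math


def nearest_rank_percentile(values, percentage):
    """Exact stand-in for approx_percentile: sort ascending, pick the value at the position."""
    if not 0 <= percentage <= 1:
        raise ValueError("percentage must be in [0, 1]")
    ordered = sorted(values)
    # Nearest-rank index, clamped so that percentage=0 maps to the smallest value.
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


def fraction_below(values, threshold):
    """Mirror count_if(x < threshold) / count(*): share of values strictly below threshold."""
    below = sum(1 for v in values if v < threshold)
    return below / len(values)


# Hypothetical response times (assumed unit: milliseconds).
res_total_time = [12.0, 15.0, 20.0, 22.0, 30.0, 35.0, 41.0, 50.0, 64.0, 90.0]

estimate = nearest_rank_percentile(res_total_time, 0.5)
deviation = abs(fraction_below(res_total_time, estimate) - 0.5)
print(estimate, deviation)

# The array form corresponds to evaluating several positions over the same data.
print([nearest_rank_percentile(res_total_time, p) for p in (0.2, 0.4, 0.6)])
```

With only ten values the check is coarse, since each value accounts for a tenth of the data. On real log volumes the fraction below the returned value should sit much closer to percentage, and a large gap indicates a noticeable estimation deviation.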