ClickHouse supports multiple data types such as integer, floating point, character, date, enumeration, and array.
Type | Name | Type Identifier | Value Range or Description |
---|---|---|---|
Integer | 1-byte integer | Int8 | -128–127 |
2-byte integer | Int16 | -32768–32767 | |
4-byte integer | Int32 | -2147483648–2147483647 | |
8-byte integer | Int64 | -9223372036854775808–9223372036854775807 | |
1-byte unsigned integer | UInt8 | 0–255 | |
2-byte unsigned integer | UInt16 | 0–65535 | |
4-byte unsigned integer | UInt32 | 0–4294967295 | |
8-byte unsigned integer | UInt64 | 0–18446744073709551615 | |
Floating point | Single-precision floating point | Float32 | 6–7 significant digits |
Double-precision floating point | Float64 | 15–16 significant digits | |
Custom-precision floating point | Decimal32(S) | 1–9 significant digits (specified by `S`) | |
Decimal64(S) | 10–18 significant digits (specified by `S`) | ||
Decimal128(S) | 19–38 significant digits (specified by `S`) | ||
Character | Varchar | String | The string length is unlimited |
Char | FixedString(N) | The string length is fixed | |
UUID | UUID | The `UUID` is generated by the built-in function `generateUUIDv4` | |
Time | Date | Date | It stores the year, month, and day in the format of `yyyy-MM-dd` |
Timestamp (second-level) | DateTime(timezone) | Unix timestamp that is accurate down to the second | |
Timestamp (custom precision) | DateTime(precision, timezone) | You can specify the time precision | |
Enumeration | 1-byte enumeration | Enum8 | 256 values (-128–127) are provided |
2-byte enumeration | Enum16 | 65,536 values (-32768–32767) are provided | |
Array | Array | Array(T) | It indicates an array consisting of data in `T` type. You are not recommended to use nested arrays |
The following sample code is used to store the gender information of users in a site:
CREATE TABLE user (uid Int16, name String, gender Enum('male'=1, 'female'=2)) ENGINE=Memory;
INSERT INTO user VALUES (1, 'Gary', 'male'), (2,
'Jaco', 'female');
# Query data
SELECT * FROM user;
┌─uid─┬─name─┬─gender─┐
│ 1 │ Gary │ male │
│ 2 │ Jaco │ female │
└─────┴──────┴────────┘
# Use the `cast` function to query the enumerated integers
SELECT uid, name, CAST(gender, 'Int8') FROM user;
┌─uid─┬─name─┬─CAST(gender, 'Int8')─┐
│ 1 │ Gary │ 1 │
│ 2 │ Jaco │ 2 │
└─────┴──────┴──────────────────────┘
The following sample code is used to record the IDs of users who log in to the site every day so as to analyze active users.
CREATE TABLE userloginlog (logindate Date, uids Array(String)) ENGINE=TinyLog;
INSERT INTO userloginlog VALUES ('2020-01-02', ['Gary', 'Jaco']), ('2020-02-03', ['Jaco', 'Sammie']);
# Query result
SELECT * FROM userloginlog;
┌──logindate─┬─uids──────────────┐
│ 2020-01-02 │ ['Gary','Jaco'] │
│ 2020-02-03 │ ['Jaco','Sammie'] │
└────────────┴───────────────────┘
ClickHouse uses the CREATE
statement to create a database or table.
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
...
) ENGINE = engine
Databases and tables can be created on local disks or in a distributed manner. Distributed creation can be implemented in the following two methods:
CREATE
statement on all servers where clickhouse-server resides.ON CLUSTER
clause to create a database/table with ZooKeeper.If you use clickhouse-client to query a local table of server B on server A, the error "Table xxx doesn't exist.." will be reported. If you want that all servers in the cluster can query a table, you are recommended to use distributed tables.
For more information, please see CREATE Queries.
ClickHouse uses the SELECT
statement to query data.
SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
[HAVING expr]
[ORDER BY expr_list]
[LIMIT [offset_value, ]n BY columns]
[LIMIT [n, ]m]
[UNION ALL ...]
[INTO OUTFILE filename]
[FORMAT format]
For more information, please see SELECT Queries Syntax.
ClickHouse uses the INSERT INTO
statement to write data.
INSERT INTO [db.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...
INSERT INTO [db.]table [(c1, c2, c3)] SELECT ...
For more information, please see INSERT.
ClickHouse uses the DROP
or TRUNCATE
statement to delete data.
Note:
DROP
deletes metadata and data, whileTRUNCATE
deletes only data.
DROP DATABASE [IF EXISTS] db [ON CLUSTER cluster]
DROP [TEMPORARY] TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
TRUNCATE TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
ClickHouse uses the ALTER
statement to modify the table structure.
# Column operations on table
ALTER TABLE [db].name [ON CLUSTER cluster] ADD COLUMN [IF NOT EXISTS] name [type] [default_expr] [codec] [AFTER name_after]
ALTER TABLE [db].name [ON CLUSTER cluster] DROP COLUMN [IF EXISTS] name
ALTER TABLE [db].name [ON CLUSTER cluster] CLEAR COLUMN [IF EXISTS] name IN PARTITION partition_name
ALTER TABLE [db].name [ON CLUSTER cluster] COMMENT COLUMN [IF EXISTS] name 'comment'
ALTER TABLE [db].name [ON CLUSTER cluster] MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [TTL]
# Partition operations on table
ALTER TABLE table_name DETACH PARTITION partition_expr
ALTER TABLE table_name DROP PARTITION partition_expr
ALTER TABLE table_name CLEAR INDEX index_name IN PARTITION partition_expr
# Attribute operations on table
ALTER TABLE table-name MODIFY TTL ttl-expression
For more information, please see ALTER.
SHOW
statementSHOW DATABASES [INTO OUTFILE filename] [FORMAT format]
SHOW PROCESSLIST [INTO OUTFILE filename] [FORMAT format]
SHOW [TEMPORARY] TABLES [{FROM | IN} <db>] [LIKE '<pattern>' | WHERE expr] [LIMIT <N>] [INTO OUTFILE <filename>] [FORMAT <format>]
SHOW DICTIONARIES [FROM <db>] [LIKE '<pattern>'] [LIMIT <N>] [INTO OUTFILE <filename>] [FORMAT <format>]
For more information, please see SHOW Queries.DESCRIBE
statementDESC|DESCRIBE TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]
There are two types of ClickHouse functions: regular functions and aggregate functions. The difference is that a regular function can generate the result for each row, while an aggregate function needs a set of data to generate the result.
In this type of functions, all fields in the table engage in the arithmetic calculation.
Function Name | Purpose | Use Case |
---|---|---|
plus(a, b), a + b | Calculates the sum of two fields | plus(table.field1, table.field2) |
minus(a, b), a - b | Calculates the difference between two fields | minus(table.field1, table.field2) |
multiply(a, b), a * b | Calculates the product of two fields | multiply(table.field1, table.field2) |
divide(a, b), a / b | Calculates the quotient of two fields | divide(table.field1, table.field2) |
modulo(a, b), a % b | Calculates the remainder between two fields | modulo(table.field1, table.field2) |
abs(a) | Calculates absolute value | abs(table.field1) |
negate(a) | Calculates opposite | negate(table.field1) |
Function Name | Purpose | Use Case |
---|---|---|
=, == | Determines whether the values are the same | table.field1 = value |
!=, <> | Determines whether the values are different | table.field1 != value |
> | Determines whether the former value is greater than the latter value | table.field1 > value |
>= | Determines whether the former value is greater than or equal to the latter value | table.field1 >= value |
< | Determines whether the former value is smaller than the latter value | table.field1 < value |
<= | Determines whether the former value is smaller than or equal to the latter value | table.field1 <= value |
Function Name | Purpose | Use Case |
---|---|---|
AND | Returns result if both two conditions are met | - |
OR | Returns result if either condition is met | - |
NOT | Returns result if none conditions are met | - |
Overflow may occur when you use a type conversion function. The data types of overflowed values are the same as those in C.
Function Name | Purpose | Use Case |
---|---|---|
toInt(8|16|32|64) | Converts String value to Int value |
The result of toInt8('128') is -127 |
toUInt(8|16|32|64) | Converts String value to UInt value |
The result of toUInt8('128') is 128 |
toInt(8|16|32|64)OrZero | Converts String value to Int value and returns 0 if failed |
The result of toInt8OrZero('a') is 0 |
toUInt(8|16|32|64)OrZero | Converts String value to UInt value and returns 0 if failed |
The result of toUInt8OrZero('a') is 0 |
toInt(8|16|32|64)OrNull | Converts String value to Int value and returns NULL if failed |
The result of toInt8OrNull('a') is NULL |
toUInt(8|16|32|64)OrNull | Converts String value to UInt value and returns NULL if failed |
The result of toUInt8OrNull('a') is NULL |
Functions similar to those above are also provided for the floating point and date types.
For more information, please see Type Conversion Functions.
For more information, please see Functions for Working with Dates and Times.
For more information, please see Functions for Working with Strings.
For more information, please see Functions for Working with UUID.
For more information, please see Functions for Working with JSON.
Function Name | Purpose | Use Case |
---|---|---|
count | Counts the number of rows or non-NULL values | count(expr), COUNT(DISTINCT expr), count(), count(*) |
any(x) | Returns the first encountered value. The result is indeterminate | any(column) |
anyHeavy(x) | Returns a frequently occurring value using the heavy hitters algorithm. The result is generally indeterminate | anyHeavy(column) |
anyLast(x) | Returns the last encountered value. The result is indeterminate | anyLast(column) |
groupBitAnd | Applies bitwise AND |
groupBitAnd(expr) |
groupBitOr | Applies bitwise OR |
groupBitOr(expr) |
groupBitXor | Applies bitwise XOR |
groupBitXor(expr) |
groupBitmap | Returns cardinality | groupBitmap(expr) |
min(x) | Calculates the minimum | min(column) |
max(x) | Calculates the maximum | max(x) |
argMin(arg, val) | Returns the arg value for a minimal val value |
argMin(c1, c2) |
argMax(arg, val) | Returns the arg value for a maximum val value |
argMax(c1, c2) |
sum(x) | Calculates the sum | sum(x) |
sumWithOverflow(x) | Calculates the sum. If the sum exceeds the maximum value for this data type, the function will return an error | sumWithOverflow(x) |
sumMap(key, value) | Sums the value array of the same key and returns the tuples of value and key arrays: keys in sorted order, and value sum of corresponding keys |
- |
skewPop | Calculates skewness | skewPop(expr) |
skewSamp | Calculates sample skewness | skewSamp(expr) |
kurtPop | Calculates kurtosis | kurtPop(expr) |
kurtSamp | Calculates sample kurtosis | kurtSamp(expr) |
timeSeriesGroupSum(uid, timestamp, value) | Sums the timestamps in time-series grouped by uid . It uses linear interpolation to add missing sample timestamps before summing the values |
- |
timeSeriesGroupRateSum(uid, ts, val) | Calculates the rate of time-series grouped by uid and then sum rates together |
- |
avg(x) | Calculates the average | - |
uniq | Calculates the approximate number of different values | uniq(x[, ...]) |
uniqCombined | Calculates the approximate number of different values. Compared with uniq , it consumes less memory and is more accurate but has slightly lower performance |
uniqCombined(HLL_precision)(x[, ...]), uniqCombined(x[, ...]) |
uniqCombined64 | Functions in the same way as uniqCombined but uses 64-bit values to reduce the probability of result value overflow |
- |
uniqHLL12 | Calculates the approximate number of different values. It is not recommended. Please use uniq and uniqCombined instead |
- |
uniqExact | Calculates the exact number of different values | uniqExact(x[, ...]) |
groupArray(x), groupArray(max_size)(x) | Returns an array of x values. The array size can be specified by max_size |
- |
groupArrayInsertAt(value, position) | Inserts value into array at specified position | - |
groupArrayMovingSum | - | - |
groupArrayMovingAvg | - | - |
groupUniqArray(x), groupUniqArray(max_size)(x) | - | - |
quantile | - | - |
quantileDeterministic | - | - |
quantileExact | - | - |
quantileExactWeighted | - | - |
quantileTiming | - | - |
quantileTimingWeighted | - | - |
quantileTDigest | - | - |
quantileTDigestWeighted | - | - |
median | - | - |
quantiles(level1, level2, …)(x) | - | - |
varSamp(x) | - | - |
varPop(x) | - | - |
stddevSamp(x) | - | - |
stddevPop(x) | - | - |
topK(N)(x) | - | - |
topKWeighted | - | - |
covarSamp(x, y) | - | - |
covarPop(x, y) | - | - |
corr(x, y) | - | - |
categoricalInformationValue | - | - |
simpleLinearRegression | - | - |
stochasticLinearRegression | - | - |
stochasticLogisticRegression | - | - |
groupBitmapAnd | - | - |
groupBitmapOr | - | - |
groupBitmapXor | - | - |
A dictionary is a mapping between a key and attributes and can be used as a function for query, which is simpler and more efficient than the method of combining referencing tables with a JOIN
clause.
There are two types of data dictionaries, namely, internal and external dictionaries.
ClickHouse supports one type of internal dictionaries, i.e., geobase. For more information on the supported functions, please see Functions for Working with Yandex.Metrica Dictionaries.
ClickHouse allows you to add external dictionaries from multiple data sources. For more information on the supported data sources, please see Sources of External Dictionaries.
p](https://clickhouse.tech/docs/en/query_language/agg_functions/reference/#skewpop) | Calculates skewness | skewPop(expr) |
| skewSamp | Calculates sample skewness | skewSamp(expr) |
| kurtPop | Calculates kurtosis | kurtPop(expr) |
| kurtSamp | Calculates sample kurtosis | kurtSamp(expr) |
| timeSeriesGroupSum(uid, timestamp, value) | Sums the timestamps in time-series grouped by uid
. It uses linear interpolation to add missing sample timestamps before summing the values | - |
| timeSeriesGroupRateSum(uid, ts, val) | Calculates the rate of time-series grouped by uid
and then sum rates together | - |
| avg(x) | Calculates the average | - |
| uniq | Calculates the approximate number of different values | uniq(x[, ...]) |
| uniqCombined | Calculates the approximate number of different values. Compared with uniq
, it consumes less memory and is more accurate but has slightly lower performance | uniqCombined(HLL_precision)(x[, ...]), uniqCombined(x[, ...]) |
| uniqCombined64 | Functions in the same way as uniqCombined
but uses 64-bit values to reduce the probability of result value overflow | - |
| uniqHLL12 | Calculates the approximate number of different values. It is not recommended. Please use uniq
and uniqCombined
instead | - |
| uniqExact | Calculates the exact number of different values | uniqExact(x[, ...]) |
| groupArray(x), groupArray(max_size)(x) | Returns an array of x
values. The array size can be specified by max_size
| - |
| groupArrayInsertAt(value, position) | Inserts value into array at specified position | - |
| groupArrayMovingSum | - | - |
| groupArrayMovingAvg | - | - |
| groupUniqArray(x), groupUniqArray(max_size)(x) | - | - |
| quantile | - | - |
| quantileDeterministic | - | - |
| quantileExact | - | - |
| quantileExactWeighted | - | - |
| quantileTiming | - | - |
| quantileTimingWeighted | - | - |
| quantileTDigest | - | - |
| quantileTDigestWeighted | - | - |
| median | - | - |
| quantiles(level1, level2, …)(x) | - | - |
| varSamp(x) | - | - |
| varPop(x) | - | - |
| stddevSamp(x) | - | - |
| stddevPop(x) | - | - |
| topK(N)(x) | - | - |
| topKWeighted | - | - |
| covarSamp(x, y) | - | - |
| covarPop(x, y) | - | - |
| corr(x, y) | - | - |
| categoricalInformationValue | - | - |
| simpleLinearRegression | - | - |
| stochasticLinearRegression | - | - |
| stochasticLogisticRegression | - | - |
| groupBitmapAnd | - | - |
| groupBitmapOr | - | - |
| groupBitmapXor | - | - |
A dictionary is a mapping between a key and attributes and can be used as a function for query, which is simpler and more efficient than the method of combining referencing tables with a JOIN
clause.
There are two types of data dictionaries, namely, internal and external dictionaries.
ClickHouse supports one type of internal dictionaries, i.e., geobase. For more information on the supported functions, please see Functions for Working with Yandex.Metrica Dictionaries.
ClickHouse allows you to add external dictionaries from multiple data sources. For more information on the supported data sources, please see Sources of External Dictionaries.
Was this page helpful?