CREATE CATALOG es PROPERTIES ("type"="es","hosts"="http://127.0.0.1:9200");
default_db
. After switching to the ES Catalog, you will be in the default_db
, so you don't need to execute the USE default_db
command.Parameter | Mandatory | Default Value | Description |
hosts | Yes | - | ES address, can be one or multiple addresses, or the Cloud Load Balancer address of ES. |
user | No | Empty | ES username |
password | No | Empty | Password of the corresponding user |
doc_value_scan | No | true | Whether to obtain value of the target field by ES/Lucene columnar storage |
keyword_sniff | No | true | Whether to sniff the text.fields in ES based on keywords. If this is set to false, the system will perform matching after according to the content after tokenization. |
nodes_discovery | No | true | Whether to enable ES node discovery, set to true by default, set to false in network isolation environments and only connected to the specified nodes |
ssl | No | false | Whether to enable HTTPS access mode for ES, currently follows a "Trust All" method in FE/BE |
mapping_es_id | No | false | Whether to map the _id field in the ES index |
like_push_down | No | true | Whether to convert like into wildcard push down to ES, which will increase ES consumption of CPU. |
/_cluster/state/ and_nodes/http
; If you have not enabled security authentication for the cluster, you dont't need to set the user and password.ES Type | Doris Type | Comment |
null | null | - |
boolean | boolean | - |
byte | tinyint | - |
short | smallint | - |
integer | int | - |
long | bigint | - |
unsigned_long | largeint | - |
float | float | - |
half_float | float | - |
double | double | - |
scaled_float | double | - |
date | date | It only supports default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis format |
keyword | string | - |
text | string | - |
ip | string | - |
nested | string | - |
object | string | - |
other | unsupported | - |
doris
structural annotation can be added to the_metasection of the index mapping. For Elasticsearch 6.x and before versions, please see_meta.doc
containing the following data structure:{"array_int_field": [1, 2, 3, 4],"array_string_field": ["doris", "is", "the", "best"],"id_field": "id-xxx-xxx","timestamp_field": "2022-11-12T12:08:56Z","array_object_field": [{"name": "xxx","age": 18}]}
_meta.doris
property of the target index mapping.# ES 7.x and abovecurl -X PUT "localhost:9200/doc/_mapping?pretty" -H 'Content-Type: application/json' -d '{"_meta": {"doris":{"array_fields":["array_int_field","array_string_field","array_object_field"]}}}'# ES 6.x and beforecurl -X PUT "localhost:9200/doc/_mapping?pretty" -H 'Content-Type: application/json' -d '{"_doc": {"_meta": {"doris":{"array_fields":["array_int_field","array_string_field","array_object_field"]}}}}
array_fields
: used to indicate a field that is an array type.SQL syntax | ES 5.x+ syntax |
= | term query |
in | terms query |
> , < , >= , ⇐ | range query |
and | bool.filter |
or | bool.should |
not | bool.must_not |
not in | bool.must_not + terms query |
is\\_not\\_null | exists query |
is\\_null | bool.must_not + exists query |
esquery | ES-native json QueryDSL |
"enable_docvalue_scan" = "true"
_source
._source
, which is in row storage and JSON format. Compared to columnar storage,_source
is slow in batch read. In particular, when the system only needs to read small number of columns, the performance of docvalue is about a dozen times faster than that of_source
.text
fields in ES. Thus, if you need to obtain fields containingtext
values, you will need to obtain them from_source
.> = 25
), the performances ofdocvalue
andsource
are basically equivalent."enable_keyword_sniff" = "true"
text
andkeyword
types. This is how the multi-field feature of ES works. The mapping is as follows:"k4": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}
k4 = "Doris On ES"
"term" : {"k4": "Doris On ES"}
text
, it will be tokenized by the analyzer set for k4 (or by the standard analyzer if no analyzer has been set for k4) after data import. As a result, it will be tokenized into three terms: doris, on, and ES.. The details are as follows:POST /_analyze{"analyzer": "standard","text": "Doris On ES"}
{"tokens": [{"token": "doris","start_offset": 0,"end_offset": 5,"type": "<ALPHANUM>","position": 0},{"token": "on","start_offset": 6,"end_offset": 8,"type": "<ALPHANUM>","position": 1},{"token": "es","start_offset": 9,"end_offset": 11,"type": "<ALPHANUM>","position": 2}]}
"term" : {"k4": "Doris On ES"}
Doris On ES
, no results will be returned. However, if you have setenable_keyword_sniff: true
, the system will automatically convertk4 = "Doris On ES"
tok4.keyword = "Doris On ES"
to fully match the SQL semantics. The converted ES query DSL will be:"term" : {"k4.keyword": "Doris On ES"}
k4.keyword
is ofkeyword
type, so the data is written into ES as a complete term, allowing for successful matching."nodes_discovery" = "true"
"ssl" = "true"
. A temporary solution is to implement a "Trust All" method in FE/BE. The real user configuration certificate will be used in the future.select * from es_table where k1 > 1000 and k3 ='term' or k4 like 'fu*z_'
esquery(field, QueryDSL)
function can be used to push some queries that cannot be expressed in SQL, such as match_phrase, and geoshape, etc., to ES for filtering. The first parameter (the column name) inesquery
is used to associate withindex
, and the second parameter is the JSON expression of basicQuery DSL
in ES, which is surrounded by {}
. Theroot key
in JSON is unique, which can be match_phrase
, geo_shape
, orbool
, etc.match_phrase
query:select * from es_table where esquery(k4, ' {"match_phrase": {"k4": "doris on es"}}');
geo
related query:select * from es_table where esquery(k4, ' {"geo_shape": {"location": {"shape": {"type": "envelope","coordinates": [[13,53],[14,52]]},"relation": "within"}}}');
bool
query:select * from es_table where esquery(k4, ' {"bool": {"must": [{"terms": {"k1": [11,12]}},{"terms": {"k2": [100]}}]}}');
"dt": {"type": "date","format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"}
date
ordatetime
(orvarchar
) when creating this field in Doris. You can use the following SQL statements to push the filter conditions down to ES:select * from doe where k2 > '2020-06-21';select * from doe where k2 < '2020-06-21 12:00:00';select * from doe where k2 < 1593497011;select * from doe where k2 < now();select * from doe where k2 < date_format(now(), '%Y-%m-%d');
format
is not set for the time field in ES, the default format of time fields is:strict_date_optional_time||epoch_millis
ms
.ms
is the internal processing format in ES; otherwise, errors will occur in ES external tables._id
_id
, will be given a globally unique _id
, which is the primary key. Users can also specify an _id
with unique business meanings to the files during import;varchar
-typed _id
when creating tables:CREATE EXTERNAL TABLE `doe` (`_id` varchar COMME "",`city` varchar COMMENT "") ENGINE=ELASTICSEARCHPROPERTIES ("hosts" = "http://127.0.0.1:8200","user" = "root","password" = "root","index" = "doe"}
"mapping_es_id" = "true"
._id
field only supports=
andin
filtering._id
field must be of varchar
type.+----------------------------------------------+| || Doris +------------------+ || | FE +--------------+-------+| | | Request Shard Location| +--+-------------+-+ | || ^ ^ | || | | | || +-------------------+ +------------------+ | || | | | | | | | || | +----------+----+ | | +--+-----------+ | | || | | BE | | | | BE | | | || | +---------------+ | | +--------------+ | | |+----------------------------------------------+ || | | | | | || | | | | | || HTTP SCROLL | | HTTP SCROLL | |+-----------+---------------------+------------+ || | v | | v | | || | +------+--------+ | | +------+-------+ | | || | | | | | | | | | || | | DataNode | | | | DataNode +<-----------+| | | | | | | | | | || | | +<--------------------------------+| | +---------------+ | | |--------------| | | || +-------------------+ +------------------+ | || Same Physical Node | || | || +-----------------------+ | || | | | || | MasterNode +<-----------------+| ES | | || +-----------------------+ |+----------------------------------------------+
HTTP Scroll
method from_source
ordocvalue
.
Was this page helpful?