Foreword
CTSDB is a distributed, scalable time series database that supports near-real-time data search and analysis and is compatible with common Elasticsearch APIs. Many users are unsure how to import MySQL data into CTSDB, so this document describes a simple and easy way to sync MySQL data to CTSDB.
Overview
go-mysql-elasticsearch is an open-source, high-performance tool written in Go for syncing MySQL data to Elasticsearch. It is easy to compile and use, and it works in a simple way: it first uses mysqldump to fetch the existing MySQL data, then uses the name and position of the current binlog to capture incremental data, and finally constructs RESTful requests based on the binlog events to write the data into Elasticsearch. As CTSDB is developed based on Elasticsearch, it is fully compatible with go-mysql-elasticsearch for MySQL data import.
Syncing MySQL Data to CTSDB
MySQL sample data construction
Since the goal is to import MySQL data into CTSDB, it is assumed that you have already installed MySQL. To make the walkthrough complete, this document starts from the preparation of sample data. Here, a simple tool written in Go is used to generate sample data and import it into MySQL (a sketch of such a generator is shown at the end of this section), using the following table structure:
mysql> desc test_table;
+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| id        | int(11)     | NO   | PRI | NULL    | auto_increment |
| timestamp | bigint(20)  | YES  |     | NULL    |                |
| cpu_usage | float       | YES  |     | NULL    |                |
| host_ip   | varchar(20) | YES  |     | NULL    |                |
| region    | varchar(20) | YES  |     | NULL    |                |
+-----------+-------------+------+-----+---------+----------------+
Create a table named test_table with the structure above, then import 2,000 records of sample data into it. Some of the records are shown below:
mysql> select * from test_table;
+------+------------+-----------+-------------+-----------+
| id   | timestamp  | cpu_usage | host_ip     | region    |
+------+------------+-----------+-------------+-----------+
|    1 | 1527676339 |      0.23 | 192.168.1.1 | beijing   |
|    2 | 1527676399 |      0.78 | 192.168.1.2 | shanghai  |
|    3 | 1527676459 |       0.2 | 192.168.1.3 | guangzhou |
|    4 | 1527676519 |      0.47 | 192.168.1.4 | shanghai  |
|    5 | 1527676579 |      0.13 | 192.168.1.5 | beijing   |
|    6 | 1527676639 |      0.15 | 192.168.1.1 | beijing   |
|    7 | 1527676699 |      0.07 | 192.168.1.2 | shanghai  |
|    8 | 1527676759 |      0.17 | 192.168.1.3 | guangzhou |
|    9 | 1527676819 |      0.94 | 192.168.1.4 | shanghai  |
|   10 | 1527676879 |      0.06 | 192.168.1.5 | beijing   |
+------+------------+-----------+-------------+-----------+
At this point, MySQL sample data has been prepared.
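For reference, the following is a minimal sketch of the kind of Go data generator described above. The connection string, the mysql_es database name (chosen to match the sample configuration later in this document), and the value ranges are assumptions for illustration, not part of any official tool.
// gen_data.go: generate and insert sample rows into MySQL (sketch, assumptions noted above).
package main

import (
	"database/sql"
	"fmt"
	"log"
	"math/rand"
	"time"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

func main() {
	// The mysql_es database is assumed to exist already.
	db, err := sql.Open("mysql", "root:123456@tcp(127.0.0.1:3306)/mysql_es")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Create the table with the structure shown above.
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS test_table (
		id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
		timestamp BIGINT(20),
		cpu_usage FLOAT,
		host_ip VARCHAR(20),
		region VARCHAR(20))`)
	if err != nil {
		log.Fatal(err)
	}

	regions := []string{"beijing", "shanghai", "guangzhou"}
	start := time.Now().Unix()

	// Insert 2,000 sample rows, one data point per minute.
	for i := 0; i < 2000; i++ {
		_, err = db.Exec(
			"INSERT INTO test_table (timestamp, cpu_usage, host_ip, region) VALUES (?, ?, ?, ?)",
			start+int64(i*60),
			rand.Float32(),
			fmt.Sprintf("192.168.1.%d", i%5+1),
			regions[i%3],
		)
		if err != nil {
			log.Fatal(err)
		}
	}
	log.Println("inserted 2000 sample rows")
}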
Creating a metric in CTSDB
Create a metric in CTSDB with the same structure as the MySQL table to store the corresponding data, using the following API:
POST /_metric/test_metric
{
    "time": {
        "name": "timestamp",
        "format": "strict_date_optional_time || epoch_second"
    },
    "tags": {
        "region": "string",
        "host_ip": "string"
    },
    "fields": {
        "cpu_usage": "float"
    }
}
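If you prefer to create the metric from code rather than a REST client, the same request can be sent from Go as in the sketch below. The CTSDB address and credentials are taken from the sample configuration later in this document and are assumptions for your environment.
// create_metric.go: send the metric-creation request to CTSDB (sketch, endpoint and credentials assumed).
package main

import (
	"io"
	"log"
	"net/http"
	"strings"
)

func main() {
	body := `{
	  "time":   {"name": "timestamp", "format": "strict_date_optional_time || epoch_second"},
	  "tags":   {"region": "string", "host_ip": "string"},
	  "fields": {"cpu_usage": "float"}
	}`

	// Address and credentials match the es_addr/es_user/es_pass values in the sample configuration.
	req, err := http.NewRequest("POST", "http://9.6.174.42:13982/_metric/test_metric", strings.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.SetBasicAuth("root", "changeme")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	log.Printf("status=%s body=%s", resp.Status, out)
}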
At this point, the CTSDB metric structure has been prepared. Then, use go-mysql-elasticsearch to sync data.
go-mysql-elasticsearch usage
As go-mysql-elasticsearch is developed in Go, you first need to install Go 1.6 or later. Installation is simple: download it from https://golang.org/dl/ and follow the Download and install instructions. Then fetch and build the tool as follows:
$ go get github.com/siddontang/go-mysql-elasticsearch
$ cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
$ make
After the tool is built, configure it before use. The following is a sample configuration with comments:
# MySQL address, user, and password (the user needs replication privileges)
my_addr = "127.0.0.1:3306"
my_user = "root"
my_pass = "123456"
my_charset = "utf8"

# CTSDB (Elasticsearch-compatible) address and credentials
es_addr = "9.6.174.42:13982"
es_user = "root"
es_pass = "changeme"

# Directory for storing the sync position and other runtime data
data_dir = "./var"

# Address of the internal status HTTP server
stat_addr = "127.0.0.1:12800"

# Pseudo server ID used when registering as a MySQL replica; must be unique
server_id = 1001

# "mysql" or "mariadb"
flavor = "mysql"

# Path to the mysqldump binary used for the initial full dump; leave empty to skip the dump
mysqldump = "mysqldump"

# Number of documents per bulk request, and the maximum time to wait before flushing a bulk
bulk_size = 512
flush_bulk_time = "200ms"

# Skip tables without a primary key
skip_no_pk_table = true

# MySQL database and tables to sync
[[source]]
schema = "mysql_es"
tables = ["test_*"]

# Mapping from the MySQL table to the CTSDB metric (Elasticsearch index and type)
[[rule]]
schema = "mysql_es"
table = "test_table"
index = "test_metric"
type = "doc"
The above configuration is only for testing. If you have more advanced requirements, see the official documentation for the appropriate settings. After completing the configuration, run go-mysql-elasticsearch as follows:
$ ./bin/go-mysql-elasticsearch -config=./etc/river.toml
2018/05/31 21:43:44 INFO create BinlogSyncer with config {1001 mysql 127.0.0.1 3306 root utf8 false false <nil> false false 0 0s 0s 0}
2018/05/31 21:43:44 INFO run status http server 127.0.0.1:12800
2018/05/31 21:43:44 INFO skip dump, use last binlog replication pos (mysql-bin.000002, 194296) or GTID %!s(<nil>)
2018/05/31 21:43:44 INFO begin to sync binlog from position (mysql-bin.000002, 194296)
2018/05/31 21:43:44 INFO register slave for master server 127.0.0.1:3306
2018/05/31 21:43:44 INFO start sync binlog at binlog file (mysql-bin.000002, 194296)
2018/05/31 21:43:44 INFO rotate to (mysql-bin.000002, 194296)
2018/05/31 21:43:44 INFO rotate binlog to (mysql-bin.000002, 194296)
2018/05/31 21:43:44 INFO save position (mysql-bin.000002, 194296)
Note that go-mysql-elasticsearch requires row-based binlogs, so you must set the following parameters in the MySQL configuration:
log_bin=mysql-bin
binlog_format = ROW
server-id=1
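To confirm these settings are in effect before starting the sync, you can query them directly from MySQL. The following is a minimal sketch in Go; the connection parameters are assumptions matching the sample configuration above.
// check_binlog.go: verify the binlog settings required by go-mysql-elasticsearch (sketch).
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

func main() {
	db, err := sql.Open("mysql", "root:123456@tcp(127.0.0.1:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var logBin, format string
	var serverID int
	if err := db.QueryRow("SELECT @@log_bin, @@binlog_format, @@server_id").
		Scan(&logBin, &format, &serverID); err != nil {
		log.Fatal(err)
	}
	// Expect log_bin=1 (enabled), binlog_format=ROW, and a non-zero server_id.
	log.Printf("log_bin=%s binlog_format=%s server_id=%d", logBin, format, serverID)
}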
Now, check whether the MySQL data has been successfully imported into CTSDB:
GET test_metric/_search?size=1000
{
    "sort": [
        {
            "timestamp": {
                "order": "desc"
            }
        }
    ],
    "docvalue_fields": ["timestamp", "host_ip", "region", "cpu_usage"]
}
{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2000,
        "max_score": null,
        "hits": [
            {
                "_index": "test_metric@1525363200000_30",
                "_type": "doc",
                "_id": "2000",
                "_score": null,
                "fields": {
                    "host_ip": [
                        "192.168.1.5"
                    ],
                    "region": [
                        "beijing"
                    ],
                    "cpu_usage": [
                        0.05000000074505806
                    ],
                    "timestamp": [
                        1527807286000
                    ]
                },
                "sort": [
                    1527807286000
                ]
            },
            ......
        ]
    }
}
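Besides inspecting the search response, you can verify the total programmatically with the _count API. The following is a minimal sketch; the address and credentials are the same assumptions used above.
// verify_count.go: check that the synced document count matches the MySQL row count (sketch).
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "http://9.6.174.42:13982/test_metric/_count", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.SetBasicAuth("root", "changeme")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out struct {
		Count int `json:"count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	// 2,000 rows were inserted into MySQL, so the same number is expected here.
	log.Printf("documents in test_metric: %d", out.Count)
}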
Summary
As you can see, with go-mysql-elasticsearch you only need to write a few rules in the configuration file to sync MySQL data to Elasticsearch. The sections above cover only simple scenarios; for more advanced needs, see the official go-mysql-elasticsearch documentation.
In addition to the tool described in this document, two other tools are also worth considering:
py-mysql-elasticsearch-sync: written in Python, it works in a way similar to go-mysql-elasticsearch, as both use binlogs to sync data. For more information on how to install and use it, see py-mysql-elasticsearch-sync.
Logstash: to sync data with it, you need to install the logstash-input-jdbc and logstash-output-elasticsearch plugins. For more information on how to use it, see Jdbc input plugin and Elasticsearch output plugin.
If you have any questions when using the above tools, please submit a ticket for assistance.