Tencent Cloud

Recent Pages

同义词配置

最后更新时间：2020-08-12 09:44:21
腾讯云 Elasticsearch Service 支持以下两种方式配置同义词：上传同义词文件、直接引用同义词。
方式一：上传同义词文件
注意事项
上传同义词文件操作将触发集群滚动重启
新上传/新变更的同义词文件对老索引不生效，需要重建索引。例如现有的索引myindex使用了synonym.txt同义词文件，当该同义词文件的内容变更并重新上传后，现有的索引myindex不会动态加载更新后的同义词，需要对现有索引进行reindex操作，否则更新后的同义词文件只对新建的索引生效。
同义词文件要求每行一个同义词表达式（表达式支持 Solr 规则和 WordNet 规则），并且文件需要为utf-8编码，扩展名为.txt。例如：
快乐水,可乐 => 可乐,快乐水
elasticsearch,es => es
同义词文件单个文件最大为10M，上传文件总数最多为10个。
操作步骤
1. 登录腾讯云Elasticsearch Service控制台
2. 在集群列表页，单击集群 ID 进入集群详情页。
3. 单击高级配置，进入同义词配置页面。
﻿
4. 单击更新词典，在更新同义词页面上传同义词文件。
﻿
5. 上传完成后，单击保存。
使用同义词文件
以下实例使用 filter 过滤器配置同义词，使用synonym.txt作为测试文件，文件内容为elasticsearch,es => es
1. 登录已上传同义词文件的集群对应的Kibana控制台。登录控制台的具体步骤请参考 通过 Kibana 访问集群。
2. 单击左侧导航栏的 Dev Tools。
3. 在 Console 中执行如下的命令，创建索引。
PUT /my_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_ik": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": [
              "my_synonym"
            ]
          }
        },
        "filter": {
          "my_synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonym.txt"
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_ik",
          "search_analyzer": "my_ik"
        }
      }
    }
  }
}
4. 执行如下命令，验证同义词配置。
GET /my_index/_analyze
{
  "analyzer": "my_ik",
  "text":"tencet elasticsearch service"
}
 命令执行成功，将返回如下结果。
{
  "tokens": [
    {
      "token": "tencet",
      "start_offset": 0,
      "end_offset": 6,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "es",
      "start_offset": 7,
      "end_offset": 20,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "service",
      "start_offset": 21,
      "end_offset": 28,
      "type": "ENGLISH",
      "position": 2
    }
  ]
}
 输出结果中，tokenes的类型是SYNONYM同义词。
5. 执行如下命令，添加一些文档。
POST /my_index/_doc/1
{
  "content": "tencet elasticsearch service"
}
﻿
POST /my_index/_doc/2
{
  "content": "hello es"
}
6. 执行如下命令，搜索同义词。
GET my_index/_search
{
  "query" : { "match" : { "content" : "es" }},
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {"content": {}}
  }
}
 命令执行成功后，返回如下结果。
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "content": "hello es"
        },
        "highlight": {
          "content": [
            "hello <tag1>es</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.25316024,
        "_source": {
          "content": "tencet elasticsearch service"
        },
        "highlight": {
          "content": [
            "tencet <tag1>elasticsearch</tag1> service"
          ]
        }
      }
    ]
  }
}
方式二：直接引用同义词
1. 登录集群对应的 Kibana 控制台。登录控制台的具体步骤请参考 通过 Kibana 访问集群。
2. 单击左侧导航栏的 Dev Tools。
3. 在 Console 中执行如下的命令，创建索引。
PUT /my_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_ik": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": [
              "my_synonym"
            ]
          }
        },
        "filter": {
          "my_synonym": {
            "type": "synonym",
            "synonyms": [
              "elasticsearch,es => es"
            ],
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_ik",
          "search_analyzer": "my_ik"
        }
      }
    }
  }
}
 这里与使用同义词文件方式的区别是，在 filter 中定义同义词时，直接引用了同义词，而不是同义词文件："synonyms": ["elasticsearch,es => es"]
4. 执行如下命令，验证同义词配置。
GET /my_index/_analyze
{
  "analyzer": "my_ik",
  "text":"tencet elasticsearch service"
}
 命令执行成功，将返回如下结果。
{
  "tokens": [
    {
      "token": "tencet",
      "start_offset": 0,
      "end_offset": 6,
      "type": "ENGLISH",
      "position": 0
    },
    {
      "token": "es",
      "start_offset": 7,
      "end_offset": 20,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "service",
      "start_offset": 21,
      "end_offset": 28,
      "type": "ENGLISH",
      "position": 2
    }
  ]
}
 输出结果中，tokenes的类型是SYNONYM同义词。
5. 执行如下命令，添加一些文档。
POST /my_index/_doc/1
{
  "content": "tencet elasticsearch service"
}
﻿
POST /my_index/_doc/2
{
  "content": "hello es"
}
6. 执行如下命令，搜索同义词。
GET my_index/_search
{
  "query" : { "match" : { "content" : "es" }},
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {"content": {}}
  }
}
 命令执行成功后，返回如下结果。
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "content": "hello es"
        },
        "highlight": {
          "content": [
            "hello <tag1>es</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.25316024,
        "_source": {
          "content": "tencet elasticsearch service"
        },
        "highlight": {
          "content": [
            "tencet <tag1>elasticsearch</tag1> service"
          ]
        }
      }
    ]
  }
}
﻿