es全文搜索

注意

查询语法跟 Elasticsearch 的版本有关系，这里使用的是 7.10.2，使用大版本 7 的应该都可以。官方文档 (opens new window)

# 构建数据

# 1.创建模板

# template_accounts：模板的名称
# @template_accounts.json：模板文件路径
curl -H 'Content-Type: application/json' 'localhost:9200/_template/template_accounts' -d '@template_accounts.json'

1
2
3

# 2.插入数据

点击下载：accounts.json 文件

# accounts-2023-06-30：索引名称，这个索引名称满足模板中的要求，创建数据时会自动引用模板中定义的数据类型。
# --data-binary：表示发送二进制数据，避免了数据在传输过程中被转义或编码
# @accounts.json：数据文件路径
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/accounts-2023-06-30/_bulk?pretty' --data-binary "@accounts.json"

1
2
3
4

# 匹配查询（match）

curl -H 'Content-Type: application/json' 'localhost:9200/accounts-2023-06-30/_search?pretty' -d '
{"query": {"match": {"firstname": "Amber"}}}'

1
2

# 返回条数（size）

#  默认返回 10 条数据，使用 size 可以改变，下面只返回一条结果
curl -H 'Content-Type: application/json' 'localhost:9200/accounts-2023-06-30/_search?pretty' -d '
{"query": {"match": {"gender": "M"}}, "size": 1}'

1
2
3

# 分页

# from/size

报错信息

不推荐使用，会消耗内存，因为查询会把所有数据载入内存中进行分页查询，超过 100000 可能会报错。这个 100000 可以通过更改 max_result_window 参数进行配置。默认好像是 10000。

Result window is too large, from + size must be less than or equal to: [100000] but was [100010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

# 通过 from 指定位移，默认从位置 0 开始，下面是从位置 1 开始只返回一条结果
curl -H 'Content-Type: application/json' 'localhost:9200/accounts-2023-06-30/_search?pretty' -d '
{"query": {"match": {"gender": "M"}}, "size": 1, "from": 1}'

1
2
3

# scroll

提示

滚动查询，第一次查询。scroll 官方文档 (opens new window)

scroll 游标查询使用的是 kibana 进行的操作，不是用的 shell 命令。

size：设置每次返回的数量，后面滚动都会按这个数量返回，直至没有数据返回。

1m：滚动的时间间隔单位为 1 分钟，超过一分钟返回的_scroll_id 就会失效

GET accounts-2023-06-30/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 20
}

1
2
3
4
5
6
7

第一次返回示例：

{
  "_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFklRQkJlcjQzUlgybUE3V2hXRWozc1EAAAAAAAFAKRZOMmJTZ19QVlJidW0zV09XdlNGSDR3",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "accounts-2023-06-30",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        },
        ... 省略其他数据，列表中的数量通过size控制。
      }
    ]
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40

提示

滚动查询，后续查询，每次都要带上之前返回的_scroll_id

1m：滚动的时间间隔单位为 1 分钟，超过一分钟返回的_scroll_id 就会失效

scroll_id：上次返回的 _scroll_id

GET _search/scroll
{
  "scroll": "1m",
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFklRQkJlcjQzUlgybUE3V2hXRWozc1EAAAAAAAFAKRZOMmJTZ19QVlJidW0zV09XdlNGSDR3"
}

1
2
3
4
5

后续滚动返回示例：

{
  "_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFklRQkJlcjQzUlgybUE3V2hXRWozc1EAAAAAAAFAXRZOMmJTZ19QVlJidW0zV09XdlNGSDR3",
  "took" : 2,
  "timed_out" : false,
  "terminated_early" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "accounts-2023-06-30",
        "_type" : "_doc",
        "_id" : "102",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 102,
          "balance" : 29712,
          "firstname" : "Dena",
          "lastname" : "Olson",
          "age" : 27,
          "gender" : "F",
          "address" : "759 Newkirk Avenue",
          "employer" : "Hinway",
          "email" : "denaolson@hinway.com",
          "city" : "Choctaw",
          "state" : "NJ"
        }
      },
        ... 省略其他数据，列表中的数量通过第一次查询中的size控制。
      }
    ]
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42

删除游标：

提示

xxxxxx 对应实际的 scroll_id。为了避免游标太多发生问题，用完以后可以手动删除游标。

为了防止打开太多卷轴导致的问题，游标默认最多打开 500 个，可以通过 search.max_open_scroll_context设置。

DELETE _search/scroll
{
  "scroll_id": ["xxxxxx", "xxxxxx"]
}

1
2
3
4

# 查询指定字段

# 查询指定字段使用fields，_source，字段对应的数据返回的是数组
curl -H 'Content-Type: application/json' 'localhost:9200/accounts-2023-06-30/_search?pretty' -d '
{"query": {"match_all": {}}, "fields": ["firstname", "lastname", "age"], "_source": false}'

1
2
3

# 逻辑运算

# 多个单词查询 or

# 如果有多个搜索关键字（匹配单词），默认是 or 的关系（要执行词组搜索使用 match_phrase）
# 注意：address 字段类型应该使用 text 而不是 keyword
# 查询 address 中包含 880 或者 Street 的数据。
curl -H 'Content-Type: application/json' 'localhost:9200/accounts-2023-06-30/_search?pretty' -d '
{"query": {"match": {"address": "880 Street"}}}'

1
2
3
4
5

# and 关系

# 查询 address 中包含 Bristol 和 Street 的数据。
curl -H 'Content-Type: application/json' 'localhost:9200/accounts-2023-06-30/_search?pretty' -d '
{"query": {"bool": {"filter": [{"match": {"address": "Bristol"}}, {"match": {"address": "Street"}}]}}}'

1
2
3

# 使用 Kibana 查询

提示

使用 Kibana 前需要先安装 Kibana。

为了更好看的演示查询，这里使用 Kibana 作为演示。查询语法跟命令行一样，这里用 json 格式更直观。

# match_all 所有

GET /accounts-2023-06-30/_search
{
  "query": {
    "match_all": {}
  },
  "size": 5
}

1
2
3
4
5
6
7

# 全文查询

提示

全文查询使您能够搜索分析的文本字段。更多信息请查看官方文档 (opens new window)。

# match 条件查询

提示

match 会使用分词查询。要使用分词查询，还要满足数据类型是 text 而不是 keyword。

GET /accounts-2023-06-30/_search
{
  "query": {
    "match": {
      "address": "880 Holmes Lane"
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9

GET /accounts-2023-06-30/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
            "address": "880 Holmes Lane"
          }
        }
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13

GET /accounts-2023-06-30/_search
{
  "query": {
    "bool": {
      "filter": {
        "match": {
            "address": "880 Holmes Lane"
          }
        }
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13

// Make sure to add code blocks to your code group

# match_phrase 条件查询

提示

match_phrase 不进行分词查询。下面的案例中，只会查询到一条匹配的数据。而使用 match 时会匹配多条，只要满足分词其中一个单词都会命中。

GET /accounts-2023-06-30/_search
{
  "query": {
    "match_phrase": {
      "address": "880 Holmes Lane"
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9

GET /accounts-2023-06-30/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "address": "880 Holmes Lane"
        }
      }
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13

GET /accounts-2023-06-30/_search
{
  "query": {
    "bool": {
      "filter": {
        "match": {
          "address": "880 Holmes Lane"
        }
      }
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13

// Make sure to add code blocks to your code group

# multi_match 多字段查询

提示

多个字段相同条件查询，满足其中一个字段即可。更多用法查看官方文档 (opens new window)

GET /accounts-2023-06-30/_search
{
  "query": {
    "multi_match": {
      "query": "Duke", 
      "fields": ["firstname", "lastname"]
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9
10

GET /accounts-2023-06-30/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "Duke",
          "fields": ["firstname", "lastname"]
        }
      }
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14

GET /accounts-2023-06-30/_search
{
  "query": {
    "bool": {
      "filter": {
        "multi_match": {
          "query": "Duke",
          "fields": ["firstname", "lastname"]
        }
      }
    }
  },
  "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14

// Make sure to add code blocks to your code group

# 精确查找

提示

根据结构化数据中的精确值查找文档。更多信息请查看官方文档 (opens new window)。

下面的案例都可以结合 bool 中的 must 和 filter，结构都是类似的这里就不做演示，单个查询条件直接使用 query 即可。

# term

注意

查询完全匹配的数据。避免在 text 类型字段中使用，会查不到数据。 text 类型使用 match 查询。更多信息请查看官方文档 (opens new window)

GET /accounts-2023-06-30/_search
{
    "query": {
        "term": {
            "firstname": {
                "value": "Amber"
            }
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11

# terms

terms同term类似，只是terms支持多个值查询。满足其中一个值即可。

GET /accounts-2023-06-30/_search
{
    "query": {
        "terms": {
          "firstname": [
            "Amber",
            "Duke"
          ]
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12

# exists 存在

查询存在该字段的数据。如果字段不存在，不返回。

GET /accounts-2023-06-30/_search
{
    "query": {
        "exists": {
          "field": "firstname"
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10

# range 区间

GET /accounts-2023-06-30/_search
{
    "query": {
        "range": {
          "age": {
            "gte": 10,
            "lte": 20
          }
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12

# prefix 前缀

GET /accounts-2023-06-30/_search
{
    "query": {
        "prefix": {
          "city": {
            "value": "C"
          }
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11

# regexp 正则

GET /accounts-2023-06-30/_search
{
    "query": {
        "regexp": {
          "email": "\\w*@pyrami.com"
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9

# bool

提示

多个条件互相关联查询。更多信息请查看官方文档 (opens new window)

# must

满足所有条件，并计入分数(_score)。

GET /accounts-2023-06-30/_search
{
    "query": {
        "bool": {
          "must": [
            {
              "match": {
                "address": "Court"
              }
            },
            {
              "term": {
                "age": 28
              }
            }
          ]
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

# filter

提示

满足所有条件，不计入分数(_score)。

通过 must 和 filter 案例，最后结果(_source)是一样的，但是分数(_score)不一样。

must 和 filter 的区别：filter 查询的分数(_score)将被忽略，值为 0。

GET /accounts-2023-06-30/_search
{
    "query": {
        "bool": {
          "filter": [
            {
              "match": {
                "address": "Court"
              }
            },
            {
              "term": {
                "age": 28
              }
            }
          ]
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

# should

满足至少其中之一条件，并计入分数(_score)。

GET /accounts-2023-06-30/_search
{
    "query": {
        "bool": {
          "should": [
            {
              "match": {
                "address": "Court"
              }
            },
            {
              "term": {
                "age": 28
              }
            }
          ]
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

# must_not

结果不能出现在匹配结果中，不计入分数(_score)。

GET /accounts-2023-06-30/_search
{
    "query": {
        "bool": {
          "should": [
            {
              "match": {
                "address": "Court"
              }
            },
            {
              "term": {
                "age": 28
              }
            }
          ]
        }
    },
    "size": 5
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

#ES

上次更新: 2023/08/08, 20:00:46

← es基本命令 mapping→