Structure of a search request

Elasticsearch (ES) searches are issued either as JSON request bodies or as URL-based requests.

1. Search scope

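All search requests use the _search endpoint, with either GET or POST. You can search the entire cluster, or limit the scope by naming indices or types in the search URL. You can also use an alias to search multiple indices. For better performance, restrict the query to the smallest possible number of indices and types.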

2. Basic components of a search request

Once the search scope is settled, the next step is to configure the main components of the search request. They are described below:

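query: the most important part. It defines which documents are the best matches, ranked by score, and also which documents should not be returned.
size: the number of documents to return.
from: used together with size for pagination.
_source: controls how the _source field is returned. By default the full _source is returned. Configuring this option filters the returned fields, which is useful when documents are large and you do not need all of their content.
sort: by default results are sorted by score. If the score does not matter, use this option to control the order in which documents are returned.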

A few more points:

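from and size: these specify the starting offset of the results and the number of results per page. For example, from=2, size=10 returns 10 results starting with the 3rd one (the 3rd through the 12th). Note that from is a zero-based document offset, not a page number in the traditional sense, so take care when using it. As another example, to fetch the second page of 10 results, the request is from=10, size=10, not from=2.
In URL-based requests, the q= parameter carries the query. For example, q=title:elastic searches the title field for the keyword elastic.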
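
The relationship between a 1-based page number and the zero-based from offset can be sketched in Python. The helper names page_to_from and pagination_params are hypothetical and not part of any Elasticsearch client; they only illustrate the arithmetic.

```python
def page_to_from(page, size):
    """Convert a 1-based page number into the zero-based `from` offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


def pagination_params(page, size=10):
    """Build the from/size part of a search request body."""
    return {"from": page_to_from(page, size), "size": size}


print(pagination_params(2))  # {'from': 10, 'size': 10}
```

The resulting dictionary can be merged into the rest of the request body before it is sent.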

3. Request-body search

Fields returned in the results

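The list of fields returned with the search results can be specified in the _source component. If the request does not specify it, the whole _source is returned by default. If _source is not stored, only the metadata of matching documents is returned: _id, _type, _index and _score.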

curl 'localhost:9200/get-together/_search' -d '{
    "query": {
        "match_all":{}
    },
    "_source":["date", "name"]
}'

Wildcards in the returned _source fields

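Besides a list of field names, you can also specify wildcards. For example, to return both title and title_ext, set _source to "tit*". An array of wildcard strings specifies several patterns at once, for example "_source": ["tit*", "summ*"].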

In addition to specifying which fields to return, you can also specify which fields not to return:

curl 'localhost:9200/get-together/_search' -d '{
    "query":{
        "match_all":{}
    },
    "_source":{
        "include":["name*", "date"],
        "exclude":["local*"]
    }
}'
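
To see how include and exclude patterns interact, here is a rough client-side emulation in Python. It uses fnmatch from the standard library as a stand-in for Elasticsearch's wildcard matching and only looks at top-level field names. The function filter_source and the sample document are made up for illustration, and the assumption that exclude wins over include should be checked against your Elasticsearch version.

```python
from fnmatch import fnmatchcase


def filter_source(doc, include=None, exclude=None):
    """Keep fields matching any include pattern, then drop those matching any exclude pattern."""
    result = {}
    for field, value in doc.items():
        # With no include list, every field is a candidate.
        included = not include or any(fnmatchcase(field, p) for p in include)
        excluded = any(fnmatchcase(field, p) for p in (exclude or []))
        if included and not excluded:
            result[field] = value
    return result


doc = {
    "name": "Denver Elasticsearch",
    "name_local": "Denver ES",
    "date": "2020-07-01",
    "tags": ["es"],
}
print(filter_source(doc, include=["name*", "date"], exclude=["name_l*"]))
# {'name': 'Denver Elasticsearch', 'date': '2020-07-01'}
```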

Query and filter DSL

The examples above already used the most basic query, match_all. This query matches every document. Typical use cases are:

When using a filter
Returning all documents in the searched indices or types

Examples:

With a filter

curl 'localhost:9200/get-together/_search' -d '{
    "query":{
        "filtered":{
            "query":{
                "match_all":{}
            },
            "filter":{
                ....
            }
        }
    }
}'

All documents

curl 'localhost:9200/_search' -d '{
    "query":{
        "match_all":{}
    }
}'

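Simple enough. In practice, though, this query is rarely used on its own, because most searches specify more precise criteria instead of retrieving everything.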

Next, the query_string query

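By default, the query_string query searches the _all field. To change this, either name the field inside the query string, for example title:elastic, or set default_field in the request. For example: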

curl -XPOST 'localhost:9200/get-together/_search' -d '{
    "query":{
        "query_string":{
            "default_field":"title",
            "query":"elastic"
        }
    }
}'

Beyond that, this syntax supports boolean operators such as AND and OR, and a leading - excludes documents from the result set.

curl -XPOST -H 'Content-Type:application/json' 'localhost:9200/get-together/_search?pretty' -d '{
    "query":{
        "query_string":{
            "default_field":"name",
            "query":"name:Elasticsearch AND -description:nosql"
        }
    }
}'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "new-events",
        "_id" : "1",
        "_score" : 0.18232156,
        "_source" : {
          "name" : "Late Night with Elasticsearch"
        }
      }
    ]
  }
}

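A notable drawback of the query_string query is that it is simply too powerful. Giving users that much control can put the cluster at considerable risk, so it is not recommended. Let's see what the alternatives are.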

term query and term filter

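The term query lets you specify the document field and the term to search for. Note that the search term is not analyzed, so it must match a term in the document exactly for the document to be returned.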

curl 'localhost:9200/get-together/group/_search' -d '{
    "query":{
        "term":{
            "tags":"elastic"
        }
    },
    "_source":["title", "tags"]
}'

curl 'localhost:9200/get-together/_search' -d '{
    "query":{
        "filtered": {
            "query":{
                "match_all":{}
            },
            "filter":{
                "term":{
                    "tags":"elastic"
                }
            }
        }
    },
    "_source":["title", "tags"]
}'
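
The filtered query shown above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same intent is usually expressed with a bool query and a filter clause. Here is a minimal Python sketch that builds such a request body. The helper name term_filter_body is hypothetical, and sending the body to a cluster is left out.

```python
import json


def term_filter_body(field, value, source_fields=None):
    """Build a request body that filters on an exact term without scoring."""
    body = {
        "query": {
            "bool": {
                # filter clauses restrict the result set but do not affect _score
                "filter": [{"term": {field: value}}]
            }
        }
    }
    if source_fields:
        body["_source"] = list(source_fields)
    return body


print(json.dumps(term_filter_body("tags", "elastic", ["title", "tags"]), indent=2))
```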

The terms query searches for multiple terms in one document field. For example:

curl 'localhost:9200/get-together/group/_search' -d '{
    "query":{
        "terms":{
            "tags": ["elastic", "nosql"]
        }
    },
    "_source":["title", "tags"]
}'

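The match query takes a hash map containing the field to search and the string to search for. The field can be a single field or the special _all field, which searches all fields.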

curl 'localhost:9200/get-together/group/_search' -d '{
    "query":{
        "match": {
            "title": "elastic"
        }
    }
}'

The match query can behave in several ways. The most common are boolean and phrase.

curl 'localhost:9200/get-together/group/_search' -d '{
    "query":{
        "match": {
            "title": {
                "query":"elastic sql",
                "operator":"and"
            }
        }
    }
}'

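The phrase query is very useful for finding a specific phrase in a document, and it allows some leeway between the positions of the words. This leeway is called slop, and it expresses the allowed distance between the tokens of the phrase. For example, suppose you remember that an article title contains Tokyo and oil, but not the rest. You can search for the phrase Tokyo oil with slop set to 1 or 2 instead of the default 0, and find matching results without knowing the exact title.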

curl 'localhost:9200/get-together/group/_search' -d '{
    "query":{
        "match": {
            "title": {
                "query":"Tokyo oil",
                "slop":2,
                "type":"phrase"
            }
        }
    }
}'
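
The "type": "phrase" form of the match query was deprecated in Elasticsearch 5.0 and later removed. Newer versions use the dedicated match_phrase query, which accepts the same slop setting. Here is a Python sketch of the equivalent request body. The helper name match_phrase_body is an assumption made for illustration.

```python
def match_phrase_body(field, phrase, slop=0):
    """Build a match_phrase request body; slop is the allowed positional leeway."""
    if slop < 0:
        raise ValueError("slop must not be negative")
    return {
        "query": {
            "match_phrase": {
                field: {"query": phrase, "slop": slop}
            }
        }
    }


print(match_phrase_body("title", "Tokyo oil", slop=2))
```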

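The phrase_prefix query takes phrase search a step further: it matches the last term of the phrase as a prefix. This is very handy for autocomplete in a search box. Because the prefix can expand to a large set of terms, you can cap the number of expansions with max_expansions so that results come back in reasonable time.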

curl 'localhost:9200/get-together/group/_search' -d '{
    "query":{
        "match": {
            "title": {
                "query":"Tokyo o",
                "max_expansions":1,
                "type":"phrase_prefix"
            }
        }
    },
    "_source":["title"]
}'

The multi_match query matches against multiple fields. For example, you can search both the title and the summary of an article for the same string.

curl 'localhost:9200/get-together/group/_search' -d '{
    "query":{
        "multi_match": {
            "query":"elastic sql",
            "fields":["title","summary"]
        }
    }
}'

The queries covered so far handle most search needs, but more complex scenarios may call for combining queries into compound queries.

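The bool query combines any number of queries in a single query. Its clauses state which parts must match (must), which should match (should), and which must not match (must_not) the data in the index.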

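must: only documents that match these queries are returned.
should: only documents that match at least the specified number of these clauses (minimum_should_match) are returned. If no must clause is specified, a document has to match at least one should clause to be returned.
must_not: documents that match these queries are removed from the result set.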

curl 'localhost:9200/get-together/_search' -d '{
    "query":{
        "bool": {
            "must":[{
               "term":{
                    "attendees": "zjw"
               }
            }],
            "should":[{
                "term":{
                    "attendees":"wows"
                }
            },
            {
                "term":{
                    "attendees":"andy"
                }
            }
            ],
            "must_not":[{
                "range":{
                    "date":{
                        "lt":"2020-08-01T12:22"
                    }
                }
            }],
            "minimum_should_match":1
        }
    }
}'
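
The interplay of must, should, must_not and minimum_should_match can be sketched with plain Python predicates. This is a toy model of the matching logic only: it does no scoring or analysis and ignores filter clauses. The function bool_matches, the predicate helpers and the sample documents are illustrative assumptions.

```python
def bool_matches(doc, must=(), should=(), must_not=(), minimum_should_match=None):
    """Return True if doc satisfies the clauses; each clause is a predicate doc -> bool."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Without must clauses at least one should clause has to match;
        # with must clauses, should clauses are optional.
        minimum_should_match = 1 if should and not must else 0
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


def term(field, value):
    """Predicate: the field (a list of terms) contains the exact value."""
    return lambda doc: value in doc.get(field, [])


def before(field, limit):
    """Predicate: the field is present and its ISO date string sorts before limit."""
    return lambda doc: field in doc and doc[field] < limit


query = dict(
    must=[term("attendees", "zjw")],
    should=[term("attendees", "wows"), term("attendees", "andy")],
    must_not=[before("date", "2020-08-01T12:22")],
    minimum_should_match=1,
)

print(bool_matches({"attendees": ["zjw", "andy"], "date": "2020-09-01T10:00"}, **query))  # True
print(bool_matches({"attendees": ["zjw"], "date": "2020-09-01T10:00"}, **query))          # False
print(bool_matches({"attendees": ["zjw", "wows"], "date": "2020-07-01T10:00"}, **query))  # False
```

The second document fails because no should clause matches, and the third because its date falls in the range excluded by must_not.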

The bool filter is essentially the same as the query version, except that it combines filters and is used in the filter part of a query.

The range query and filter find values within a given range. They work on numbers, dates and even strings.

curl 'localhost:9200/get-together/_search' -d '{
    "query":{
        "range":{
            "created_at":{
                "gt":"2020-06-21",
                "lt":"2020-07-21"
            }
        }
    }
}'

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
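
The four bounds can be modelled as a small Python predicate. This is only a sketch of the comparison semantics, assuming the values are directly comparable (numbers, or date strings in one consistent ISO format). The function name in_range is hypothetical.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Check a value against the supplied bounds; unspecified bounds are ignored."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


# ISO dates in the same format compare correctly as plain strings.
print(in_range("2020-07-01", gt="2020-06-21", lt="2020-07-21"))   # True
print(in_range("2020-06-21", gt="2020-06-21", lt="2020-07-21"))   # False, gt is exclusive
print(in_range("2020-06-21", gte="2020-06-21", lt="2020-07-21"))  # True, gte is inclusive
```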

In particular, the range query also supports string ranges, for example finding groups whose names fall between a and e:

curl 'localhost:9200/get-together/_search' -d '{
    "query":{
        "range": {
            "name":{
                "gt": "a",
                "lt": "e"
            }
        }
    }
}'

If you are unsure whether to use a query or a filter, use a filter.

The prefix query and filter search for terms by a given prefix. The prefix itself is not analyzed before the search.

curl 'localhost:9200/get-together/event/_search' -d '{
    "query": {
        "prefix": {
            "title": "ela"
        }
    }
}'

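The wildcard query offers pattern-based matching that works like shell globbing rather than full regular expressions: * matches any sequence of characters and ? matches any single character. The example below uses the pattern ela*search. A pattern with a single-character wildcard, such as elat?c, is written the same way.

curl 'localhost:9200/get-together/event/_search' -d '{
    "query": {
        "wildcard": {
            "title": "ela*search"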
        }
    }
}'
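
Since the wildcard query behaves much like shell globbing, Python's fnmatch module gives a quick feel for which terms a pattern would match. This is only an analogy: fnmatch also understands bracket expressions such as [abc], which are not part of the wildcard syntax described here. The helper wildcard_match and the term list are made up for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_match(terms, pattern):
    """Return the terms that match a glob-style pattern (* = any run, ? = one character)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


terms = ["elasticsearch", "elastic", "elastec", "search"]

print(wildcard_match(terms, "ela*search"))  # ['elasticsearch']
print(wildcard_match(terms, "elast?c"))     # ['elastic', 'elastec']
```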

exists filter and missing filter

The exists filter restricts results to documents that have a value in a particular field, whatever that value is.

curl 'localhost:9200/get-together/event/_search' -d '{
    "query": {
        "filtered":{
            "query":{
                "match_all":{}
            },
            "filter":{
                "exists":{
                    "field":"title"
                }
            }
        }
    }
}'

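The missing filter lets you search for documents that have no value in a field, or whose value is the default specified in the mapping (the null value).

curl 'localhost:9200/get-together/event/_search' -d '{
    "query": {
        "filtered":{
            "query":{
                "match_all":{}
            },
            "filter":{
                "missing":{
                    "field":"title",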
                    "existence":true,
                    "null_value": true
                }
            }
        }
    }
}'
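
The missing filter was removed in Elasticsearch 5.0. On newer versions the usual substitute is an exists query wrapped in the must_not clause of a bool query. Here is a Python sketch of that request body. The helper name missing_field_body is hypothetical.

```python
def missing_field_body(field):
    """Build a request body matching documents that have no value in the given field."""
    return {
        "query": {
            "bool": {
                # must_not + exists selects documents where the field is absent or null
                "must_not": [{"exists": {"field": field}}]
            }
        }
    }


print(missing_field_body("title"))
```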

Permanent link to this article: https://tech.souyunku.com/41000
