欢迎您的访问
专注于Java技术系列文章的Java技术分享网站

十五、Elasticsearch 教程: 聚合计算

文章永久连接:https://tech.souyunku.com/?p=2268

聚合框架用于收集搜索查询选择的所有数据。该框架由许多构建块组成,有助于构建复杂的数据摘要

下面的 JSON 对象使用聚合函数的一般请求正文格式

"aggregations" : {
   "<aggregation_name>" : {
      "<aggregation_type>" : {
         <aggregation_body>
      }

      [,"meta" : { [<meta_data_body>] } ]?
      [,"aggregations" : { [<sub_aggregation>]+ } ]?
   }
}

Elasticsearch 提供了大量的聚合函数,它们都有各自不同的目的

矩阵聚合 ( Metrics )

这些聚合函数可以根据聚合文档的字段值计算度量值,而且有时可以从脚本生成一些值

数字矩阵既可以是单值,也可以是平均聚合或多值统计等

平均数聚合 ( avg )

该聚合函数用于计算文档中出现的任何数字字段的平均值

例如

POST http://localhost:9200/user_admin/_search?pretty

请求正文

{
   "aggs":{
      "avg_money":{"avg":{"field":"money"}}
   }
}

返回响应结果

{
  "took" : 160,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "雅少",
          "description" : "虚怀若谷",
          "street" : "四川大学",
          "city" : "Chengdu",
          "state" : "Sichuan",
          "zip" : "610044",
          "location" : [
            104.094537,
            30.640174
          ],
          "money" : 68023,
          "tags" : [
            "Python",
            "HTML"
          ],
          "vitality" : "7.8"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "站长",
          "description" : "搜云库技术团队 ,教程 ",
          "street" : "东四十条",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100007",
          "location" : [
            116.432727,
            39.937732
          ],
          "money" : 5201814,
          "tags" : [
            "PHP",
            "Python"
          ],
          "vitality" : "9.0"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "歌者",
          "description" : "程序设计也是设计,研发新菜也是研发",
          "street" : "五道口",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100083",
          "location" : [
            116.346346,
            39.999333
          ],
          "money" : 71128,
          "tags" : [
            "Java",
            "Scala"
          ],
          "vitality" : "6.9"
        }
      }
    ]
  },
  "aggregations" : {
    "avg_money" : {
      "value" : 1780321.6666666667
    }
  }
}

如果一个或多个聚合文档中不存在此值,默认情况下它们会被忽略

我们可以在聚合中添加缺失字段来设置缺失字段的默认值

{
    "aggs":{
        "avg_money":{
            "avg":{
                "field":"money"
                "missing":0
            }
        }
    }
}

基数聚合 ( cardinality )

基数聚合 ( cardinality ) 用于计算特定字段的不同值的计数

例如

POST http://localhost:9200/user*/_search?pretty

请求正文

{
   "aggs":{
      "distinct_nickname_count":{"cardinality":{"field":"nickname"}}
   }
}

响应内容

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "user",
        "node" : "4zwAMlTzRCaioBeOE9PaNw",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status" : 400
}

很明显响应出错了,提示 Fielddata 要单独加载

好吧,那我们先运行下面的请求来修改下

PUT http://localhost:9200/user*/_mapping/user

请求正文

{
    "properties": {
        "nickname": { 
            "type":     "text",
            "fielddata": true
        }
    }
}

响应内容

{"acknowledged":true}

然后重新发起刚刚报错的请求,响应如下

{
  "took" : 186,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "枫晚",
          "description" : "停车坐爰枫林晚",
          "street" : "苏州大学",
          "city" : "Suzhou",
          "state" : "Jiangsu",
          "zip" : "215006",
          "location" : [
            120.65426,
            31.30797
          ],
          "money" : 10235,
          "tags" : [
            "Java",
            "Android"
          ],
          "vitality" : "3.5"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "雅少",
          "description" : "虚怀若谷",
          "street" : "四川大学",
          "city" : "Chengdu",
          "state" : "Sichuan",
          "zip" : "610044",
          "location" : [
            104.094537,
            30.640174
          ],
          "money" : 68023,
          "tags" : [
            "Python",
            "HTML"
          ],
          "vitality" : "7.8"
        }
      },
      {
        "_index" : "user",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "question",
          "description" : "问题少年也是少年",
          "street" : "张江高科技园区",
          "city" : "Shanghai",
          "state" : "Shanghai",
          "zip" : "201204",
          "location" : [
            121.60632,
            31.199305
          ],
          "money" : 13648,
          "tags" : [
            "VUE",
            "HTML"
          ],
          "vitality" : "8.8"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "站长",
          "description" : "搜云库技术团队 ,教程 ",
          "street" : "东四十条",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100007",
          "location" : [
            116.432727,
            39.937732
          ],
          "money" : 5201814,
          "tags" : [
            "PHP",
            "Python"
          ],
          "vitality" : "9.0"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "歌者",
          "description" : "程序设计也是设计,研发新菜也是研发",
          "street" : "五道口",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100083",
          "location" : [
            116.346346,
            39.999333
          ],
          "money" : 71128,
          "tags" : [
            "Java",
            "Scala"
          ],
          "vitality" : "6.9"
        }
      }
    ]
  },
  "aggregations" : {
    "distinct_nickname_count" : {
      "value" : 9
    }
  }
}

扩展统计聚合 ( extended_stats )

此聚合用于生成有关聚合文档中特定数字字段的所有统计信息

例如

POST http://localhost:9200/user_admin/user/_search?pretty

请求正文

{
   "aggs" : {
      "money_stats" : { "extended_stats" : { "field" : "money" } }
   }
}

响应内容

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "雅少",
          "description" : "虚怀若谷",
          "street" : "四川大学",
          "city" : "Chengdu",
          "state" : "Sichuan",
          "zip" : "610044",
          "location" : [
            104.094537,
            30.640174
          ],
          "money" : 68023,
          "tags" : [
            "Python",
            "HTML"
          ],
          "vitality" : "7.8"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "站长",
          "description" : "搜云库技术团队 ,教程 ",
          "street" : "东四十条",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100007",
          "location" : [
            116.432727,
            39.937732
          ],
          "money" : 5201814,
          "tags" : [
            "PHP",
            "Python"
          ],
          "vitality" : "9.0"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "歌者",
          "description" : "程序设计也是设计,研发新菜也是研发",
          "street" : "五道口",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100083",
          "location" : [
            116.346346,
            39.999333
          ],
          "money" : 71128,
          "tags" : [
            "Java",
            "Scala"
          ],
          "vitality" : "6.9"
        }
      }
    ]
  },
  "aggregations" : {
    "money_stats" : {
      "count" : 3,
      "min" : 68023.0,
      "max" : 5201814.0,
      "avg" : 1780321.6666666667,
      "sum" : 5340965.0,
      "sum_of_squares" : 2.7068555211509E13,
      "variance" : 5.853306500366889E12,
      "std_deviation" : 2419360.762756743,
      "std_deviation_bounds" : {
        "upper" : 6619043.192180153,
        "lower" : -3058399.858846819
      }
    }
  }
}

最大值聚合 ( max )

最大值聚合用于查找聚合文档中特定数字字段的最大值

例如

POST http://localhost:9200/user*/_search

请求正文

{
   "aggs" : {
      "max_money" : { "max" : { "field" : "money" } }
   }
}

响应内容

{
  "took" : 22,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "雅少",
          "description" : "虚怀若谷",
          "street" : "四川大学",
          "city" : "Chengdu",
          "state" : "Sichuan",
          "zip" : "610044",
          "location" : [
            104.094537,
            30.640174
          ],
          "money" : 68023,
          "tags" : [
            "Python",
            "HTML"
          ],
          "vitality" : "7.8"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "站长",
          "description" : "搜云库技术团队 ,教程 ",
          "street" : "东四十条",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100007",
          "location" : [
            116.432727,
            39.937732
          ],
          "money" : 5201814,
          "tags" : [
            "PHP",
            "Python"
          ],
          "vitality" : "9.0"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "歌者",
          "description" : "程序设计也是设计,研发新菜也是研发",
          "street" : "五道口",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100083",
          "location" : [
            116.346346,
            39.999333
          ],
          "money" : 71128,
          "tags" : [
            "Java",
            "Scala"
          ],
          "vitality" : "6.9"
        }
      }
    ]
  },
  "aggregations" : {
    "max_money" : {
      "value" : 5201814.0
    }
  }
}

最小值聚合 ( min )

最小值聚合用于查找聚合文档中特定数字字段的最小值

例如

POST http://localhost:9200/user*/_search?pretty

请求正文

{
   "aggs" : {
      "min_money" : { "min" : { "field" : "money" } }
   }
}

响应内容

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "枫晚",
          "description" : "停车坐爰枫林晚",
          "street" : "苏州大学",
          "city" : "Suzhou",
          "state" : "Jiangsu",
          "zip" : "215006",
          "location" : [
            120.65426,
            31.30797
          ],
          "money" : 10235,
          "tags" : [
            "Java",
            "Android"
          ],
          "vitality" : "3.5"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "雅少",
          "description" : "虚怀若谷",
          "street" : "四川大学",
          "city" : "Chengdu",
          "state" : "Sichuan",
          "zip" : "610044",
          "location" : [
            104.094537,
            30.640174
          ],
          "money" : 68023,
          "tags" : [
            "Python",
            "HTML"
          ],
          "vitality" : "7.8"
        }
      },
      {
        "_index" : "user",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "question",
          "description" : "问题少年也是少年",
          "street" : "张江高科技园区",
          "city" : "Shanghai",
          "state" : "Shanghai",
          "zip" : "201204",
          "location" : [
            121.60632,
            31.199305
          ],
          "money" : 13648,
          "tags" : [
            "VUE",
            "HTML"
          ],
          "vitality" : "8.8"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "站长",
          "description" : "搜云库技术团队 ,教程 ",
          "street" : "东四十条",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100007",
          "location" : [
            116.432727,
            39.937732
          ],
          "money" : 5201814,
          "tags" : [
            "PHP",
            "Python"
          ],
          "vitality" : "9.0"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "歌者",
          "description" : "程序设计也是设计,研发新菜也是研发",
          "street" : "五道口",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100083",
          "location" : [
            116.346346,
            39.999333
          ],
          "money" : 71128,
          "tags" : [
            "Java",
            "Scala"
          ],
          "vitality" : "6.9"
        }
      }
    ]
  },
  "aggregations" : {
    "min_money" : {
      "value" : 10235.0
    }
  }
}

求和聚合 ( sum )

求和聚合 ( sum ) 用于计算聚合文档中特定数字字段的总和

例如

POST http://localhost:9200/user*/_search?pretty

请求正文

{
    "aggs" :  {
      "total_money" : { "sum" : { "field" : "money" } }
    }
}

返回响应

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "枫晚",
          "description" : "停车坐爰枫林晚",
          "street" : "苏州大学",
          "city" : "Suzhou",
          "state" : "Jiangsu",
          "zip" : "215006",
          "location" : [
            120.65426,
            31.30797
          ],
          "money" : 10235,
          "tags" : [
            "Java",
            "Android"
          ],
          "vitality" : "3.5"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "雅少",
          "description" : "虚怀若谷",
          "street" : "四川大学",
          "city" : "Chengdu",
          "state" : "Sichuan",
          "zip" : "610044",
          "location" : [
            104.094537,
            30.640174
          ],
          "money" : 68023,
          "tags" : [
            "Python",
            "HTML"
          ],
          "vitality" : "7.8"
        }
      },
      {
        "_index" : "user",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "question",
          "description" : "问题少年也是少年",
          "street" : "张江高科技园区",
          "city" : "Shanghai",
          "state" : "Shanghai",
          "zip" : "201204",
          "location" : [
            121.60632,
            31.199305
          ],
          "money" : 13648,
          "tags" : [
            "VUE",
            "HTML"
          ],
          "vitality" : "8.8"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "站长",
          "description" : "搜云库技术团队 ,教程 ",
          "street" : "东四十条",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100007",
          "location" : [
            116.432727,
            39.937732
          ],
          "money" : 5201814,
          "tags" : [
            "PHP",
            "Python"
          ],
          "vitality" : "9.0"
        }
      },
      {
        "_index" : "user_admin",
        "_type" : "user",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "nickname" : "歌者",
          "description" : "程序设计也是设计,研发新菜也是研发",
          "street" : "五道口",
          "city" : "Beijing",
          "state" : "Beijing",
          "zip" : "100083",
          "location" : [
            116.346346,
            39.999333
          ],
          "money" : 71128,
          "tags" : [
            "Java",
            "Scala"
          ],
          "vitality" : "6.9"
        }
      }
    ]
  },
  "aggregations" : {
    "total_money" : {
      "value" : 5364848.0
    }
  }
}

此外,还存在一些其它聚合函数用于计算地理位置,如地理边界聚合和地理质心聚合

批量聚合 ( Bucket )

这些聚合包含了许多具有统一标准的不同类型的桶聚合,它们用于确定文档是否应该属于某个桶。

下面我们将会罗列这些桶聚合

子聚合

批量聚合会生成一组文档,这些文档将映射到父桶中

参数 type 用于定义父索引

例如,假如我们有一个品牌及其不同的模型,然后模型类型将包含以下 _parent 字段

{
   "model" : {
        "_parent" : {
            "type" : "brand"
        }
    }
}

还有很多其它的特殊的批量集合,在某些特定的情况下很好用,我们罗列如下

1、Date Histogram 聚合
2、Date Range 聚合
3、Filter 聚合
4、Filters 聚合
5、Geo Distance 聚合
6、GeoHash grid 聚合
7、Global 聚合
8、Histogram 聚合
9、IPv4 Range 聚合
10、Missing 聚合
11、Nested 聚合
12、Range 聚合
13、Reverse nested 聚合
14、Sampler 聚合
15、Significant Terms 聚合
16、Terms 聚合

聚合元数据

可以在请求时使用 meta 参数添加关于聚合的一些数据,然后就可以在响应时获取到这些数据

POST http://localhost:9200/user*/report/_search?pretty

请求正文

{
    "aggs" : {
        "min_money" : {
            "avg" : { "field" : "money" } ,
            "meta" :{"dsc" :"Lowest Moneys"}
        }
    }
}

响应内容

{
  "took" : 30,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "min_money" : {
      "meta" : {
        "dsc" : "Lowest Moneys"
      },
      "value" : null
    }
  }
}

干货推荐

本站推荐:精选优质专栏

附录:Elasticsearch 教程 系列文章

赞(72) 打赏



版权归原创作者所有,任何形式转载请联系作者;搜云库 » 十五、Elasticsearch 教程: 聚合计算

本站:免责声明!

评论 抢沙发

一个专注于Java技术系列文章的技术分享网站

觉得文章有用就打赏一下文章作者

微信扫一扫打赏

微信扫一扫打赏