Querying with search_after in Elasticsearch

This post summarizes how to page through data in Elasticsearch using search_after.

First, create the index we will use.

PUT wedul
{
  "mappings": {
    "cjung": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "name": {
          "type": "text",
          "analyzer": "nori",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Next, insert a few documents into the new index.

POST wedul/cjung?pretty
{
  "id": "wemakeprice",
  "name": "원더쇼핑"
}

POST wedul/cjung
{
  "id": "dauns",
  "name": "다운"
}

Then query the data the usual way, with from and size.

GET wedul/cjung/_search
{
  "from": 0, 
  "size": 2, 
  "query": {
    "match_all": {}
  }
}
 
 
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wedul",
        "_type" : "cjung",
        "_id" : "_update",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "id" : "wedul",
            "name" : "정철"
          }
        }
      },
      {
        "_index" : "wedul",
        "_type" : "cjung",
        "_id" : "tSNYH2cBvWxWFgHQJ6J4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "id" : "dauns",
            "name" : "다운"
          }
        }
      }
    ]
  }
}
 

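The query works as expected. But what happens if we need results beyond the 10,000th hit? As I wrote up in an earlier post, a request that tries to reach past 10,000 documents (from + size greater than index.max_result_window, which defaults to 10,000) fails with an error.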

Reference: https://wedul.tistory.com/518?category=680504
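
As a rough illustration of that limit, the check can be modeled in a few lines of Python. This is only a sketch of the rule, not Elasticsearch's own code: the function name `exceeds_result_window` is made up here, and the 10,000 default mirrors the `index.max_result_window` setting.

```python
# Hypothetical helper: models the from + size rule, not Elasticsearch internals.
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def exceeds_result_window(from_: int, size: int,
                          max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """Return True when a from/size request would reach past the result window."""
    return from_ + size > max_result_window


if __name__ == "__main__":
    print(exceeds_result_window(0, 2))      # the from/size query shown earlier
    print(exceeds_result_window(9_999, 2))  # window would end at hit 10,001
```

The point of the sketch is that the limit applies to the end of the requested window, so even a small size fails once from is large enough.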

So how do we get at those documents? This is what search_after is for.

search_after provides a live cursor: each request passes along the sort values of the last hit from the previous page and gets back the page that follows.

The basic usage looks like this.

GET wedul/cjung/_search
{
  "sort": [
    {
      "id": {
        "order": "asc"
      }
    },
    {
      "name.keyword": {
        "order": "desc"
      }
    }
  ],
  "size": 1, 
  "query": {
    "match_all": {}
  }
}

Running this search returns the result below. The search_after feature takes the values in each hit's sort array and uses them to fetch the documents that come next.

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "wedul",
        "_type" : "cjung",
        "_id" : "uyNoH2cBvWxWFgHQ86L9",
        "_score" : null,
        "_source" : {
          "id" : "wemakeprice",
          "name" : "원더쇼핑"
        },
        "sort" : [
          "wemakeprice",
          "원더쇼핑"
        ]
      }
    ]
  }
}
 

Use the values printed in the sort array of that result, wemakeprice and 원더쇼핑, to fetch the next documents.

GET wedul/cjung/_search
{
  "search_after": ["wemakeprice",
          "원더쇼핑"],
  "sort": [
    {
      "id": {
        "order": "asc"
      }
    },
    {
      "name.keyword": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "match_all": {}
  }
}
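
Copying the sort values by hand for every page gets tedious, so this step is usually wrapped in a loop. Below is a minimal Python sketch under stated assumptions: `search` is any callable that takes a request body and returns a response shaped like the one shown earlier (for example a thin wrapper around an Elasticsearch client), and the names `iter_hits` and `fake_search` are invented for this post. `fake_search` only imitates the server for an ascending sort on `id`, so the sketch runs without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, List

Body = Dict[str, Any]


def iter_hits(search: Callable[[Body], Body], body: Body) -> Iterator[Body]:
    """Yield every hit, following search_after until a page comes back empty."""
    request = dict(body)  # shallow copy so the caller's body is left untouched
    while True:
        hits: List[Body] = search(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        request["search_after"] = hits[-1]["sort"]


# Stand-in for Elasticsearch (assumption): handles only "sort by id ascending".
DOCS = [{"id": "dauns", "name": "다운"}, {"id": "wemakeprice", "name": "원더쇼핑"}]


def fake_search(body: Body) -> Body:
    after = body.get("search_after")
    ordered = sorted(DOCS, key=lambda doc: doc["id"])
    if after is not None:
        ordered = [doc for doc in ordered if doc["id"] > after[0]]
    page = ordered[: body.get("size", 10)]
    return {"hits": {"hits": [{"_source": d, "sort": [d["id"]]} for d in page]}}


if __name__ == "__main__":
    query = {"size": 1, "sort": [{"id": {"order": "asc"}}], "query": {"match_all": {}}}
    for hit in iter_hits(fake_search, query):
        print(hit["sort"], hit["_source"]["name"])
```

Passing the search function in as a parameter keeps the loop independent of any particular client library; swapping `fake_search` for a real call should not require changes to `iter_hits`, as long as the response has the same shape.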

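You may be wondering what the sort array in the response actually is. It holds that hit's values for the fields named in the sort clause of the search DSL, and search_after tells Elasticsearch to return the documents that come after those values. That is why sorting is mandatory whenever search_after is used. Most importantly, at least one of the sort fields must hold a value that is unique per document. Otherwise Elasticsearch cannot tell exactly where the next page should start, and documents that share the same sort values may be skipped or returned twice.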
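
To see why the unique field matters, here is a small self-contained Python simulation. It is not Elasticsearch itself: the `price` field, the sample documents, and the `page_after` and `collect_ids` helpers are all invented for the illustration. The helper mimics an ascending sort in which the next page is everything strictly after the cursor.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Doc = Dict[str, Any]

# Two documents share price 20, so price alone is not a unique sort key.
DOCS: List[Doc] = [
    {"id": "a", "price": 10},
    {"id": "b", "price": 20},
    {"id": "c", "price": 20},
    {"id": "d", "price": 30},
]


def page_after(fields: Sequence[str], cursor: Optional[Tuple], size: int) -> List[Doc]:
    """Return the next `size` docs whose sort values are strictly after `cursor`."""
    def key(doc: Doc) -> Tuple:
        return tuple(doc[f] for f in fields)

    ordered = sorted(DOCS, key=key)
    if cursor is not None:
        ordered = [doc for doc in ordered if key(doc) > cursor]
    return ordered[:size]


def collect_ids(fields: Sequence[str], size: int = 2) -> List[str]:
    """Walk all pages and return the ids in the order they were seen."""
    seen: List[str] = []
    cursor: Optional[Tuple] = None
    while True:
        page = page_after(fields, cursor, size)
        if not page:
            return seen
        seen.extend(doc["id"] for doc in page)
        cursor = tuple(page[-1][f] for f in fields)


if __name__ == "__main__":
    print(collect_ids(["price"]))        # sorting on price alone
    print(collect_ids(["price", "id"]))  # unique id added as a tiebreaker
```

With `price` alone, the first page ends on a document priced 20, and the second document at that price is lost because it is not strictly after the cursor. Adding the unique `id` as a second sort field gives every document its own position, so nothing is skipped.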

Reference

https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-search-after.html
