[OpenSearch] Vector Search


OpenSearch supports vector search.
However, because OpenSearch forked from Elasticsearch, its vector search has evolved independently, and the two implementations are now quite different.

AWS is behind the project, and since vector search is one of the most lucrative areas right now, it is being developed aggressively.




Creating a vector index

As with other field types, the vector options have to be defined when the index is created.

### 
PUT http://{{HOST}}:{{PORT}}/vector_index
Content-Type: application/json

{
    "settings": {
        "index.knn": true
    },
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 256,
                "space_type": "innerproduct",
                "mode": "on_disk",
                "method": {
                    "name": "hnsw"
                } 
            }
        }
    }
}  
###

Enable the index.knn setting, then choose the similarity metric (space_type), the vector dimension, the storage mode (mode; on_disk lowers memory usage at some cost in search latency), and so on.
Use l2 for Euclidean distance, cosinesimil for cosine similarity, and innerproduct for the dot product.
dimension is the length of the vector, i.e. its number of dimensions.
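Whatever the space_type, OpenSearch converts the raw distance into a relevance score where higher always means more similar. Below is a minimal Python sketch of that conversion, assuming the formulas listed for the Faiss engine on the k-NN spaces page in the references, as I understand them (other engines may normalize differently, so double-check there). knn_score is a hypothetical helper, not part of any OpenSearch client.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def knn_score(space_type, x, y):
    """Hypothetical helper: the score reported for vectors x and y.

    Assumes the Faiss-engine formulas from the OpenSearch k-NN spaces docs.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    if space_type == "l2":
        d = sum((a - b) ** 2 for a, b in zip(x, y))  # squared Euclidean distance
        return 1 / (1 + d)
    if space_type == "cosinesimil":
        cos = dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
        d = 1 - cos  # cosine distance
        return 2 - d
    if space_type == "innerproduct":
        d = -dot(x, y)  # the negated dot product is treated as the distance
        return 1 / (1 + d) if d >= 0 else -d + 1
    raise ValueError(f"unsupported space_type: {space_type}")

q = [1.0, 0.0]
print(knn_score("l2", q, [0.0, 1.0]))            # squared distance 2 -> 1/3
print(knn_score("cosinesimil", q, [0.0, 1.0]))   # orthogonal vectors
print(knn_score("innerproduct", q, [2.0, 0.0]))  # dot product 2
```

For innerproduct the score grows monotonically with the dot product, so the ranking follows the raw dot product; the piecewise formula only keeps the score positive, because negative scores are not allowed.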

For the index method you can choose between hnsw and ivf; hnsw is the more common choice. (ivf with the Faiss engine needs a separate training step before it can be used.)
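To make the method block more concrete, here is a small Python sketch that builds the same kind of request body as the PUT request above, with the HNSW parameters written out. vector_index_body is a hypothetical helper; "engine": "faiss" and the m / ef_construction values are example choices, not recommendations, and mode is left out to keep the sketch simple.

```python
import json

SPACE_TYPES = {"l2", "cosinesimil", "innerproduct"}

def vector_index_body(field, dimension, space_type, m=16, ef_construction=100):
    """Hypothetical helper: body for PUT /<index> with an HNSW knn_vector field."""
    if dimension <= 0:
        raise ValueError("dimension must be positive")
    if space_type not in SPACE_TYPES:
        raise ValueError(f"unknown space_type: {space_type}")
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "space_type": space_type,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        # m: links per graph node; ef_construction: candidate list size while building
                        "parameters": {"m": m, "ef_construction": ef_construction},
                    },
                }
            }
        },
    }

print(json.dumps(vector_index_body("vector", 256, "innerproduct"), indent=2))
```

Larger m and ef_construction generally improve recall at the cost of memory and indexing time, which is the main tuning knob HNSW exposes.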

Once the index is created, these options are stored in its mapping, which you can check with GET /vector_index/_mapping.

Inserting vector values

Vector values are inserted the same way as any other index field.
Just supply the vector as an array of floats whose length matches the dimension.
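For more than a handful of documents, the _bulk API is the usual route. A minimal Python sketch that builds the newline-delimited JSON body for POST /_bulk, assuming the field name vector from the mapping above; bulk_body is a hypothetical helper, and the 3-dimensional vectors only keep the example short (the index above expects 256).

```python
import json

def bulk_body(index, vectors, dimension):
    """Hypothetical helper: NDJSON body for POST /_bulk, one document per vector."""
    lines = []
    for doc_id, vec in vectors.items():
        # OpenSearch rejects documents whose vector length differs from the mapping's dimension
        if len(vec) != dimension:
            raise ValueError(f"doc {doc_id}: expected {dimension} floats, got {len(vec)}")
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))  # action line
        lines.append(json.dumps({"vector": [float(v) for v in vec]}))          # document line
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

body = bulk_body("vector_index", {"1": [0.1, 0.2, 0.3], "2": [0.3, 0.2, 0.1]}, dimension=3)
print(body)
```

The resulting string is sent as the request body with Content-Type: application/x-ndjson.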




Vector search

Searching uses the same search API as regular queries; the difference is that you use a knn query.

### 
POST http://{{HOST}}:{{PORT}}/vector_index/_search 
Content-Type: application/json

{
    "query": {
        "knn": {
            "vector": {
                "vector": [<256-dimensional query vector>],
                "k": 2
            }
        }
    }
}
###

vector is the query vector to compare against, and k is the number of nearest neighbors to retrieve.

Sending this request returns the two documents most similar to the query vector.
Overall, using it is not much different from other vector databases.
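What the knn query asks for can be pictured as an exact nearest-neighbor scan. Below is a brute-force Python sketch over a tiny hypothetical dataset (exact_knn and the 2-dimensional toy vectors are illustrative assumptions); the HNSW graph returns an approximation of this result without comparing the query against every stored vector. It ranks by the raw dot product, which gives the same order as OpenSearch's innerproduct score because that score increases monotonically with the dot product.

```python
import heapq

def exact_knn(docs, query, k):
    """Hypothetical helper: the k (doc_id, dot_product) pairs closest to query, best first."""
    scored = ((doc_id, sum(a * b for a, b in zip(vec, query))) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

docs = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
print(exact_knn(docs, [1.0, 0.0], k=2))
```

An exact scan costs one comparison per stored vector, which is what approximate indexes such as HNSW are designed to avoid at scale.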



References
https://docs.opensearch.org/docs/latest/vector-search/creating-vector-index/
https://docs.opensearch.org/docs/latest/vector-search/searching-data/
https://docs.opensearch.org/docs/latest/field-types/supported-field-types/knn-spaces/
https://docs.opensearch.org/docs/latest/query-dsl/specialized/k-nn/index/
https://docs.opensearch.org/docs/latest/vector-search/filter-search-knn/index/
https://docs.opensearch.org/docs/latest/vector-search/filter-search-knn/efficient-knn-filtering/