[Vector Search] Types of Vectors

This post ties in with my post on vector search.
https://blog.naver.com/sssang97/223790220320

There is more than one kind of vector used for search. Here I'll go through the most representative ones, type by type.




Dense Vector

This is the most common kind of vector. When people talk about vector search, this is usually the default.

It is a fixed-length array of floating-point numbers, and each element typically falls in the range -1 to 1 (the exact range depends on the embedding model and on whether its output is normalized).

[
    -0.013052909,
    0.020387933,
    -0.007869,
    -0.11111383,
    -0.030188112,
    -0.0053388323,
    0.0010654867,
    0.072027855,
    -0.04167721,
    0.014839341,
    -0.032948174,
    -0.062975034,
    -0.024837125,
    ....
]

There is nothing particularly special about it, but the one point that matters here is that almost none of the values should be 0.
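
As a rough illustration of the properties above (fixed length, floating-point elements, almost no zeros), here is a minimal Python sketch. The helper names zero_ratio and cosine_similarity and the sample values are hypothetical, made up for this example, and cosine similarity is only one common way to compare dense vectors.

```python
import math

def zero_ratio(vec):
    """Fraction of elements that are exactly 0."""
    return sum(1 for x in vec if x == 0.0) / len(vec)

def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of the same length."""
    if len(a) != len(b):
        raise ValueError("dense vectors must have the same fixed length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings; real models usually emit
# hundreds or thousands of dimensions.
doc = [-0.013, 0.020, -0.008, -0.111]
query = [-0.010, 0.025, -0.002, -0.090]

print(zero_ratio(doc))                # share of zero elements in doc
print(cosine_similarity(doc, query))  # closer to 1 means more similar
```

Every element takes part in the computation, which is why a dense vector is stored and sent as a plain full-length array.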




Sparse Vector

It isn't used all that often, but it does come up from time to time.

It isn't really much different from a dense vector. It is likewise an array of real numbers, again in the -1 to 1 range.
The only difference from a dense vector is that most of the elements are 0, and only a few hold actual values.

Because this changes how a vector index implementation is best optimized, vector DBs often treat sparse vectors as a special case.
Rather than receiving every one of the useless 0 values, they tend to take only the indexes of the positions that actually hold a value, passed explicitly together with those values.

Something like this:

// A sparse vector with 4 non-zero elements
{
    "indexes": [1, 3, 5, 7],
    "values": [0.1, 0.2, 0.3, 0.4]
}
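
To make the index/value format concrete, the Python sketch below converts between a full array and the representation above, and computes a dot product that only touches non-zero positions. It is an illustrative sketch, not any particular vector DB's API: to_sparse, to_dense, and sparse_dot are hypothetical helper names, and the "indexes"/"values" keys simply mirror the example above.

```python
def to_sparse(dense):
    """Keep only the non-zero positions of a full array."""
    pairs = [(i, v) for i, v in enumerate(dense) if v != 0.0]
    return {
        "indexes": [i for i, _ in pairs],
        "values": [v for _, v in pairs],
    }

def to_dense(sparse, dim):
    """Expand an index/value pair back into a full array of length dim."""
    dense = [0.0] * dim
    for i, v in zip(sparse["indexes"], sparse["values"]):
        dense[i] = v
    return dense

def sparse_dot(a, b):
    """Dot product that skips every position where either vector is 0."""
    b_lookup = dict(zip(b["indexes"], b["values"]))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(a["indexes"], a["values"]))

# The vector from the example above, plus a hypothetical query vector.
doc = {"indexes": [1, 3, 5, 7], "values": [0.1, 0.2, 0.3, 0.4]}
query = to_sparse([0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

print(to_dense(doc, 8))
print(sparse_dot(doc, query))
```

The dot product only iterates over the stored entries, so its cost scales with the number of non-zero values rather than with the full dimension, which hints at why sparse vectors get their own handling in an index.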


References
https://medium.com/@zilliz_learn/similarity-metrics-for-vector-search-62ccda6cfdd8