[MongoDB] Vector Search


MongoDB officially supports vector search.

In the 7.* releases it was only available on MongoDB's own managed cloud, which left it in an awkward position, but as of 8.* vector search is also included in the open-source edition.




Limitations and Compatibility

On MongoDB Cloud, vector search is available on both version 7 and version 8; with the open-source edition, version 8 is required. On Linux, most environments work.
https://www.mongodb.com/ko-kr/docs/atlas/atlas-vector-search/compatibility-limitations/




Setting Up Test Data

Start by loading some data. The vector itself is simply stored as an array of floating-point numbers.

db.products.insertMany([
  {
    name: "Red dress",
    category: "Dress",
    description: "Floral dress for summer",
    embedding: [0.1, 0.8, 0.3, 0.5, 0.2],
  },
  {
    name: "Jeans",
    category: "Pants",
    description: "Slim-fit denim pants",
    embedding: [0.9, 0.1, 0.7, 0.2, 0.4],
  },
  {
    name: "White blouse",
    category: "Tops",
    description: "Sheer blouse for office wear",
    embedding: [0.3, 0.6, 0.1, 0.8, 0.5],
  },
  {
    name: "Pleated skirt",
    category: "Skirt",
    description: "Midi-length pleated skirt",
    embedding: [0.2, 0.7, 0.4, 0.6, 0.3],
  },
]);




Creating the Vector Index

Next, create the index.
The following script creates a vector index that also indexes category as a metadata field for filtering.

// Create the Vector Search index
db.products.createSearchIndex(
  "vector_index", // index name
  "vectorSearch", // index type
  {
    fields: [
      // vector field
      {
        type: "vector",
        path: "embedding",
        numDimensions: 5,
        similarity: "cosine",
      },
      // category field, indexed for filtering
      {
        type: "filter",
        path: "category",
      },
    ],
  }
);

์ฐจ์›์€ 5, ์œ ์‚ฌ๋„ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์ฝ”์‚ฌ์ธ์œผ๋กœ ๋„ฃ์—ˆ๋‹ค.
์œ ์‚ฌ๋„ ํƒ€์ž…์€ euclidean, cosine, dotProduct๊ฐ€ ์ง€์›๋œ๋‹ค.

As you would expect, this is an HNSW-based approximate nearest-neighbor index,
and its behavior and internal implementation are nearly the same as in Elasticsearch/OpenSearch.

For the detailed options, see the following documentation.
https://www.mongodb.com/ko-kr/docs/atlas/atlas-vector-search/vector-search-type/?deployment-type=atlas&embedding=byo&interface=driver&language=python#mongodb-vector-search-index-fields




Vector Search Queries

Vector search cannot be run through a regular find; it is supported only as a stage of aggregate.
Add a $vectorSearch stage to the pipeline as follows.

db.products.aggregate([
  {
    $vectorSearch: {
      index: "vector_index",
      path: "embedding",
      queryVector: [0.2, 0.7, 0.3, 0.6, 0.4],  // query vector
      numCandidates: 10,   // number of candidates to consider
      limit: 3,            // number of results returned (K)
    },
  },
  {
    $project: {
      _id: 0,
      name: 1,
      category: 1,
      description: 1,
      score: { $meta: "vectorSearchScore" },  // similarity score
    },
  },
]);

This returns the matching documents sorted by similarity, together with their similarity scores.
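
As a sanity check, the expected ordering can be reproduced without a database. The sketch below ranks the four sample embeddings against the same query vector by exact cosine similarity and keeps the top 3, mirroring limit: 3. It is a brute-force reference under stated assumptions, not how MongoDB executes the query: the labels are shorthand for the sample documents, exactTopK is a made-up helper, and the raw cosine values are not the normalized score that $vectorSearch returns.

```javascript
// Brute-force reference ranking; labels are shorthand for the four sample documents.
const samples = [
  { label: "red dress", embedding: [0.1, 0.8, 0.3, 0.5, 0.2] },
  { label: "jeans", embedding: [0.9, 0.1, 0.7, 0.2, 0.4] },
  { label: "white blouse", embedding: [0.3, 0.6, 0.1, 0.8, 0.5] },
  { label: "pleated skirt", embedding: [0.2, 0.7, 0.4, 0.6, 0.3] },
];

function cosine(a, b) {
  const dot = (x, y) => x.reduce((sum, v, i) => sum + v * y[i], 0);
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Score every vector exactly, sort by descending similarity, keep the first k.
function exactTopK(query, docs, k) {
  return docs
    .map((doc) => ({ label: doc.label, cosine: cosine(query, doc.embedding) }))
    .sort((x, y) => y.cosine - x.cosine)
    .slice(0, k);
}

const topThree = exactTopK([0.2, 0.7, 0.3, 0.6, 0.4], samples, 3);
console.log(topThree.map((r) => r.label));
```

With exact scoring the order comes out as pleated skirt, red dress, white blouse. On a collection this small, the approximate search should agree with it.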

A filter clause can also be added to restrict the vector search.

The filter expression may only reference fields that were declared with type filter when the vector index was created; referencing any other field results in an error.
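
Here is a minimal sketch of what a filtered query might look like. It builds the pipeline as a plain object so it can be inspected outside the shell. buildFilteredPipeline and the category value passed to it are hypothetical, and the filter assumes category was indexed with type filter, as in the index definition earlier.

```javascript
// Hypothetical helper: builds a $vectorSearch pipeline restricted to one category.
// Assumes "category" was declared with type "filter" in the vector index.
function buildFilteredPipeline(queryVector, category) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "embedding",
        queryVector: queryVector,
        filter: { category: { $eq: category } }, // pre-filter on the indexed field
        numCandidates: 10,
        limit: 3,
      },
    },
    {
      $project: {
        _id: 0,
        name: 1,
        category: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// "some-category" is a placeholder; use a category value that exists in your data.
const filteredPipeline = buildFilteredPipeline([0.2, 0.7, 0.3, 0.6, 0.4], "some-category");
console.log(JSON.stringify(filteredPipeline, null, 2));
```

In mongosh, the resulting array would be passed to db.products.aggregate(filteredPipeline).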

The scan strategy for pre-filtering is very similar to that of Elasticsearch/OpenSearch. If filtering removes so many entries that the HNSW scan range does not contain K matching results, the HNSW scan is repeated until enough results are found.



References
https://www.mongodb.com/ko-kr/docs/atlas/atlas-vector-search/vector-search-overview/
https://www.mongodb.com/ko-kr/docs/atlas/atlas-vector-search/tutorials/vector-search-quick-start/?deployment-type=atlas&embedding=byo&interface=driver&language=python
https://www.mongodb.com/company/blog/product-release-announcements/supercharge-self-managed-apps-search-vector-search-capabilities
https://www.mongodb.com/ko-kr/docs/atlas/atlas-vector-search/vector-search-type/?deployment-type=atlas&embedding=byo&interface=driver&language=python