Code Summary: Optimizing Vector Search with Views

This summary collects the code used to create standard views on MongoDB collections and build a vector search index on each view.

Prerequisites

  • MongoDB Atlas Cluster
  • Python
  • The MongoDB Shell

Usage

Connect and Switch to the sample_mflix Database:

The following opens a mongosh session using the provided connection string, then switches to the sample_mflix database.

mongosh <connection-string>

use sample_mflix

Create a Filtered View of Movies with Embeddings:

The following creates a MongoDB view called documents_with_embeddings on top of the embedded_movies collection, filtering out any documents where the plot_embedding field is missing.

db.createView(
  "documents_with_embeddings",
  "embedded_movies",
  [
    {
      $match: {
        $expr: {
          $ne: [
            { $type: "$plot_embedding" },
            "missing"
          ]
        }
      }
    }
  ]
)
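One way to sanity-check the view is to run the same filter directly against the source collection and compare the result with the number of documents the view returns. The following is a minimal sketch for mongosh; the helper name missingFieldFilter is hypothetical (not part of the original code), and the database calls only run when a db object is available.

```javascript
// Hypothetical helper: rebuilds the $match stage used by the view above
// for any field name, so the filter can be reused or inspected.
function missingFieldFilter(fieldName) {
  return {
    $match: {
      $expr: { $ne: [{ $type: "$" + fieldName }, "missing"] }
    }
  };
}

const stage = missingFieldFilter("plot_embedding");

// Only query the database inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  // Count matches by applying the filter to the source collection.
  const viaSource = db.embedded_movies
    .aggregate([stage, { $count: "n" }])
    .toArray();
  // Count the documents exposed by the view.
  const viaView = db.documents_with_embeddings.countDocuments({});
  print("matched in source: " + (viaSource.length ? viaSource[0].n : 0));
  print("documents in view: " + viaView);
}
```

If the two counts agree, the view is exposing exactly the documents that have a plot_embedding field.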

Create a Vector Search Index on the Movies View:

The following creates a vector search index called EmbeddingsIndex on the plot_embedding field of the documents_with_embeddings view. The index expects 2048-dimensional vectors and uses cosine similarity.

db.documents_with_embeddings.createSearchIndex({
  name: "EmbeddingsIndex",
  type: "vectorSearch",
  definition: {
    fields: [{
      type: "vector",
      numDimensions: 2048,
      path: "plot_embedding",
      similarity: "cosine"
    }]
  }
})
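Once the index has finished building, it can be queried with a $vectorSearch aggregation stage. The following is a minimal sketch, not the course's own query: the helper name buildVectorSearchPipeline, the 10x numCandidates multiplier, and the placeholder query vector are all assumptions made for illustration. A real query vector would come from the same embedding model that produced plot_embedding.

```javascript
// Hypothetical helper: builds a $vectorSearch pipeline for EmbeddingsIndex.
function buildVectorSearchPipeline(queryVector, limit) {
  return [
    {
      $vectorSearch: {
        index: "EmbeddingsIndex",
        path: "plot_embedding",
        queryVector: queryVector,
        numCandidates: limit * 10, // assumption: consider 10x more candidates than results
        limit: limit
      }
    },
    {
      // Return only the title and the similarity score.
      $project: {
        _id: 0,
        title: 1,
        score: { $meta: "vectorSearchScore" }
      }
    }
  ];
}

// Placeholder vector; its length must match numDimensions in the index.
const queryVector = new Array(2048).fill(0.01);
const pipeline = buildVectorSearchPipeline(queryVector, 5);

// Only query the database inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.documents_with_embeddings.aggregate(pipeline).forEach(doc => printjson(doc));
}
```

Depending on the server version, the aggregation may need to be run against the source collection (embedded_movies) while still naming the view's index; check the documentation for the version in use.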

Create a Tenant-Scoped View:

The following creates a MongoDB view called tenantA_docs that contains only the documents in the documents collection whose tenant_id is tenantA, isolating that tenant's data.

db.createView(
  "tenantA_docs",
  "documents",
  [
    {
      $match: {
        $expr: {
          $eq: ["$tenant_id", "tenantA"]
        }
      }
    }
  ]
)
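The same pattern extends to other tenants by generating one view per tenant ID. The following is a minimal sketch under stated assumptions: the helper name tenantViewSpec and the tenant IDs tenantB and tenantC are hypothetical, and the naming scheme simply mirrors tenantA_docs.

```javascript
// Hypothetical helper: returns the arguments for db.createView for one tenant.
function tenantViewSpec(tenantId) {
  return {
    viewName: tenantId + "_docs",
    source: "documents",
    pipeline: [
      { $match: { $expr: { $eq: ["$tenant_id", tenantId] } } }
    ]
  };
}

// Hypothetical tenant IDs, used only for illustration.
const tenantIds = ["tenantB", "tenantC"];
const specs = tenantIds.map(tenantViewSpec);

// Only create views inside mongosh, where `db` is defined.
// createView fails if a view or collection with that name already exists.
if (typeof db !== "undefined") {
  specs.forEach(spec => db.createView(spec.viewName, spec.source, spec.pipeline));
}
```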

Create a Vector Search Index on the Tenant View:

The following creates a vector search index called tenantA_vector_index on the embedding field of the tenantA_docs view. The index expects 1024-dimensional vectors and uses dot product similarity. Because it is built on the view, vector searches that use it are scoped to tenantA's documents only.

db.tenantA_docs.createSearchIndex({
  name: "tenantA_vector_index",
  type: "vectorSearch",
  definition: {
    fields: [{
      type: "vector",
      path: "embedding",
      numDimensions: 1024,
      similarity: "dotProduct"
    }]
  }
})
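Dot product similarity is sensitive to vector magnitude, so vectors are typically normalized to unit length both when they are stored and when they are used as a query. The following is a minimal sketch of normalizing a query vector and searching the tenant index; the helper name normalize, the placeholder query vector, and the numCandidates and limit values are assumptions for illustration.

```javascript
// Hypothetical helper: scales a vector to unit length (L2 norm of 1).
function normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) {
    throw new Error("cannot normalize a zero vector");
  }
  return vector.map(x => x / norm);
}

// Placeholder vector; a real one would come from the embedding model
// used for the embedding field, and must have 1024 dimensions.
const rawQuery = new Array(1024).fill(1);

const tenantPipeline = [
  {
    $vectorSearch: {
      index: "tenantA_vector_index",
      path: "embedding",
      queryVector: normalize(rawQuery),
      numCandidates: 100, // assumption: illustrative value
      limit: 10
    }
  }
];

// Only query the database inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.tenantA_docs.aggregate(tenantPipeline).forEach(doc => printjson(doc));
}
```

Depending on the server version, the aggregation may need to be run against the source collection (documents) while still naming the view's index.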