Vector Search Fundamentals / Perform a Vector Search
Welcome back. Now that we have a sense for how things work under the hood, let's configure an Atlas Vector Search index. If you're familiar with Atlas Search, you'll find that configuring an index for vector search is similar to configuring an index for Atlas Search. The main difference is that we use a vector field type mapping along with a few other options in the search index definition that we'll explore in this video. But before we configure our vector search index, let's discuss two limitations.
The first one may seem obvious, but it's worth stating: we can only use the vector field type on fields that contain vector embeddings. Second, we can't index fields in subdocuments that live inside an array field.
So you shouldn't store your embeddings field in a document that's part of an array of documents. With that in mind, let's get to the fun part: configuring our vector search index. To start, let's use the createSearchIndex method to create this index on the movies collection.
Next, we'll name the index vector_plot_index, since we're indexing the plot's vector embeddings. After that, we set the index type to vectorSearch. If we leave this out, it'll default to a regular search index. Now let's configure the index definition. We use the fields option to specify which fields to index and filter on. So we'll define the field type as vector.
In the path, we specify the field that contains our embeddings: the plot_embeddings field.
Then we define the number of dimensions with the numDimensions option.
The voyage-3-large model offers flexible dimensionality options for vector embeddings. While the default setting is 1024 dimensions, you can also choose 256, 512, or 2048 dimensions, depending on your use case. Since we didn't specify a custom dimension count during vector generation, our embeddings use the standard 1024-dimensional format.
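Sketched out, the vector field entry in our index definition looks something like this so far. The plot_embeddings field name and the 1024 dimension count come from this lesson's embeddings; adjust both for your own data.

```python
# The vector field entry in our index definition so far.
vector_field = {
    "type": "vector",            # index this field for vector search
    "path": "plot_embeddings",   # field that stores the embedding array
    "numDimensions": 1024,       # voyage-3-large's default output size
}
```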
Now you may be wondering what you should do if you don't know or can't remember the number of dimensions from your model.
You can always find this information in the documentation for your embedding model. Or, if you've already generated the embeddings, you can count the number of elements in the embeddings array field, because each element represents one dimension. Here's what that quick check could look like.
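This is just a sketch using the pymongo driver. The connection string is a placeholder, the sample_mflix database name is an assumption, and plot_embeddings is the field we've been working with in this lesson.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
movies = client["sample_mflix"]["movies"]  # assumes the sample movie dataset

# Grab one document that has embeddings and count the elements in the array.
doc = movies.find_one({"plot_embeddings": {"$exists": True}})
print(len(doc["plot_embeddings"]))  # each element is one dimension, e.g. 1024
```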
Once we've defined the number of dimensions, we need to choose a similarity function. We have three choices: Euclidean, cosine, or dot product. Each of these measures similarity between vectors across multiple dimensions in a different way, so they can lead to vastly different results. Let's briefly go over each one of them. Euclidean similarity uses the distance between vectors in a multidimensional space. The underlying formula is derived from the Pythagorean theorem and generalized to any number of dimensions.
Cosine similarity uses the angle between vectors. Note that cosine does not take magnitude into account, and you can't use zero-magnitude vectors with cosine. To measure cosine similarity, we recommend that you normalize your vectors and use dot product instead.
Finally, we have dot product. Similar to cosine, it uses the angle between the vectors, but it also takes magnitude into consideration. To use dot product, your vectors must be normalized to unit length at index time and query time. So, given all this, how do we choose the correct similarity function? The first thing you should do is check your embedding model's documentation to see what the model was trained with.
This will ensure optimal results. If unsure, start with dot product. It's computationally efficient and measures similarity based on both angle and magnitude.
However, if your vectors aren't normalized, test Euclidean and cosine similarity with sample queries to determine which produces the best results for your use case. Now, back to our vector search index. The embedding model we chose uses dot product, so that's the similarity function we'll set.
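To build some intuition for how differently these functions behave, here's a tiny sketch in plain Python with made-up three-dimensional vectors. It isn't part of the index definition; it just shows what each function computes.

```python
import math

# Two made-up, unnormalized vectors, just for intuition.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # distance: smaller = closer
dot = sum(x * y for x, y in zip(a, b))                          # angle plus magnitude
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))
cosine = dot / (norm_a * norm_b)                                # angle only; undefined for zero vectors

print(euclidean, cosine, dot)  # ~3.46, ~0.73, 11.0

# If a and b were first normalized to unit length, cosine and dot product
# would give the same value, which is why normalizing your vectors and
# using dot product is the recommended way to get cosine-style behavior.
```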
Once we've defined our similarity function, we need to consider how our vectors will be stored and processed. This is where the quantization field comes into play. Vector quantization works by compressing your high-dimensional embeddings while preserving their semantic relationships.
This means you can reduce storage costs and memory usage substantially compared to full-precision vectors while still maintaining search quality that's often indistinguishable from the original embeddings. This is particularly valuable when working with large-scale vector datasets where storage and retrieval speed are critical factors.
We have three quantization options: none, scalar, or binary. Each of these handles vector compression differently, offering distinct trade-offs between accuracy, memory usage, and performance.
Let's briefly go over each one of them. None uses full-fidelity vectors with no compression. This maintains maximum accuracy, but requires the most memory and storage.
Scalar converts float32 values to int8 integers, providing a 3.75x RAM reduction while maintaining near-identical search accuracy. It's ideal for balancing performance with quality.
Binary converts float32 values to single bits, delivering a 24x RAM reduction and maximum speed. It retains up to 95% search accuracy, making it perfect when storage efficiency is paramount.
So how do we choose the correct quantization method?
Consider your dataset size, accuracy requirements, and resource constraints. For smaller datasets, use none. For larger datasets needing memory optimization, scalar provides an excellent balance.
For massive scale with tight resource budgets, binary offers the most aggressive optimization.
Additionally, you have the ability to bring pre-quantized vectors directly to Atlas. If your embedding provider or preprocessing pipeline already outputs quantized vectors, Atlas can ingest these directly.
This gives you even more control over your quantization strategy and can reduce storage costs from the moment of ingestion.
Now that we know a little bit about quantization, let's update our index definition.
Since we have a large dataset and we're concerned about resource costs, we'll go with scalar vector quantization.
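As a sketch, the vector field entry in our index definition now includes the similarity and quantization choices (same assumed field name and dimension count as before):

```python
vector_field = {
    "type": "vector",
    "path": "plot_embeddings",
    "numDimensions": 1024,
    "similarity": "dotProduct",   # matches what our embedding model was trained with
    "quantization": "scalar",     # compress float32 values to int8 in memory
}
```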
With that complete, let's move on to adding a pre-filter to our index. Pre-filtering makes search operations more efficient by filtering out irrelevant data and narrowing down the search space. We'll use a filter when we write a query, so let's configure our index for it. We define our filter below the vector field type definition.
Then, in the path, we specify the field to filter on.
Let's filter on the year field, which will allow us to focus on movies from specific years when we search. As of filming this video, you can filter on boolean, date, ObjectId, numeric, string, and UUID values, including arrays of these types.
Support for filtering on more data types will be released in the future, so be sure to check the documentation for any updates.
Now that we've added a filter, let's create the vector search index. We'll receive a confirmation message signifying that our index is being built.
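For reference, here's what the complete call might look like as a minimal sketch in Python with a recent pymongo driver (4.7 or newer). The connection string, database name, and exact index name are placeholders or assumptions from this lesson; the same definition works with the createSearchIndex helper in mongosh or other drivers.

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
movies = client["sample_mflix"]["movies"]  # assumes the sample movie dataset

index_model = SearchIndexModel(
    name="vector_plot_index",        # the name we chose for this index
    type="vectorSearch",             # omit this and you get a regular search index
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "plot_embeddings",
                "numDimensions": 1024,
                "similarity": "dotProduct",
                "quantization": "scalar",
            },
            {
                "type": "filter",    # pre-filter field for our queries
                "path": "year",
            },
        ]
    },
)

print(movies.create_search_index(index_model))  # returns the new index's name
```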
Behind the scenes, the index build constructs an HNSW (Hierarchical Navigable Small World) graph using the embeddings we generated for the movie data, along with any other metadata we'd like to filter our queries on. Because HNSW builds and searches a graph over vectors that contain thousands of dimensions, it's important to keep in mind that HNSW is memory-constrained.
To account for this, we recommend using dedicated search nodes for your vector search workloads. This ensures resource isolation from your main cluster operations, so you can meet your search workloads' indexing and query needs in a cost-effective way. Nice job. Let's recap what we learned.
First, we learned that we need to use the vector field type when configuring a search index for vector search. Next, we learned about the Euclidean, cosine, and dot product similarity functions, as well as the quantization options for compressing our vectors. Finally, we learned how to add a filter to our index. Up next, we get to run some searches. See you soon.
