Voyage AI with MongoDB / Chunking and context
Your embedding model is chosen, but more decisions remain. How do you handle documents that are tens of thousands of words long? While you can generate embeddings for a large document, you may want to consider another approach to ensure fine-grained, high-quality retrieval.
In this video, we'll help you understand the role of context length and chunking as you plan for your workload. And we'll introduce you to voyage-context-3, which makes working with large documents much easier.
Embedding models have a limit on how much data you can provide in a single input. This limit is called the context length or context window, and it's measured in tokens. For text, a token is a small unit, such as a word, subword, or character that the model uses as its basic building block. In English, one token is roughly four characters or about three quarters of a word. The process of converting raw input into tokens is called tokenization, and it can vary from model to model. For most of Voyage AI's current models, the context length is 32,000 tokens, which is approximately 24,000 English words. That's more than enough for the vast majority of documents.
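Here's a minimal sketch of that rule of thumb in Python. It only estimates: the four-characters-per-token ratio is rough, the exact number depends on the model's tokenizer, and the sample text is just a placeholder, so treat the result as a ballpark figure.

```python
# Rough back-of-the-envelope check: will this text fit in a 32,000-token
# context window? Uses the ~4 characters per token rule of thumb for English;
# the model's own tokenizer determines the exact count.

CONTEXT_LENGTH_TOKENS = 32_000  # context length of most current Voyage AI models

def estimate_tokens(text: str) -> int:
    """Approximate the token count of English text from its character length."""
    return len(text) // 4

def fits_in_context(text: str, limit: int = CONTEXT_LENGTH_TOKENS) -> bool:
    return estimate_tokens(text) <= limit

document = "The parties agree to encrypt all data at rest. " * 2000  # placeholder text
print(estimate_tokens(document), "estimated tokens;",
      "fits" if fits_in_context(document) else "needs chunking")
```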
Context length matters because if your documents are longer than your model's context window, you'll need to split them into smaller pieces. And that brings us to chunking. Chunking is the process of splitting a large document into smaller segments before embedding.
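In its simplest form, chunking is just splitting text into fixed-size groups of words, as in the sketch below. Real pipelines more often split on sentences, paragraphs, or token counts, and the 200-word size here is purely illustrative.

```python
def chunk_by_words(text: str, chunk_size: int = 200) -> list[str]:
    """Split text into consecutive chunks of roughly `chunk_size` words each."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Reusing the placeholder `document` from the previous sketch.
chunks = chunk_by_words(document)
print(len(chunks), "chunks of about 200 words each")
```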
But even when context length isn't a constraint, chunking can still improve retrieval accuracy. Here's why. You can think of embedding as a form of information compression. When you embed a short passage that focuses on one or a few concepts, the vector representation captures those concepts well.
But a larger document often contains many different ideas. And the embedding has to compress all of that information into a single vector. As a result, searching over that embedding may struggle to find the specific concept that you're looking for. By splitting documents into smaller chunks and embedding each one separately, the resulting vectors capture fine-grained, localized information.
Your search system can then more precisely identify relevant sections. For example, imagine a fifty-page legal document embedded as a single vector. A specific clause about AES-256 encryption in GCM mode could easily get diluted in the overall representation. But if you split the document into paragraphs and embed each one, those vectors are much more likely to preserve that detail.
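Here's a sketch of that per-chunk approach with the voyageai Python client. The model name and environment setup are assumptions, so check the current Voyage AI documentation, and note that very long chunk lists may need to be embedded in batches.

```python
# Embed each chunk separately so every vector captures localized detail.
# Assumes the `chunks` list from the earlier sketch, a VOYAGE_API_KEY
# environment variable, and that "voyage-3-large" is an available model
# name; verify both against the current Voyage AI docs.
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

result = vo.embed(
    chunks,
    model="voyage-3-large",
    input_type="document",  # use input_type="query" when embedding search queries
)
chunk_embeddings = result.embeddings  # one vector per chunk
```

In an Atlas-backed application, each chunk's text, its vector, and any metadata would typically be stored together as one document in a collection covered by a vector search index.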
However, there's a trade-off. While smaller chunks help us capture fine-grained context, an individual chunk may lose broader context from the full document. For instance, a paragraph might not include the client's name, making it harder to answer a query like, "What encryption methods does client X require?" We can try to compensate with common techniques such as maintaining overlap between chunks, adding metadata to each chunk, or chunking the text in line with the original document's structure.
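Overlap, for example, is easy to sketch: each chunk repeats the last few words of the previous one, so neighboring chunks share some context. The sizes below are illustrative, and in practice you'd attach metadata, like the client's name, to each chunk as well.

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into ~chunk_size-word chunks where each chunk repeats the
    last `overlap` words of the previous one, so neighbors share context."""
    words = text.split()
    step = chunk_size - overlap  # assumes overlap < chunk_size
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), step)
    ]
```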
But ultimately, these techniques add implementation complexity and usually require extensive testing and continuous fine-tuning. Voyage AI addresses this with voyage-context-3, a contextualized chunk embedding model that processes the entire document in a single pass while generating a distinct embedding for each chunk. Each vector encodes both the local passage information and the document-level context.
The model determines which global information from other sections should be incorporated into each chunk's embedding, giving each vector both local precision and document-level context. Contextualized chunk embeddings are especially effective for long, unstructured documents, cross-chunk reasoning where queries span multiple sections, and high-sensitivity retrieval tasks in domains like law, medicine, or finance. In short, chunking isn't just a workaround for long documents. It's a core part of how you build precise, context-aware retrieval systems.
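Here's a hedged sketch of what that looks like with the voyageai Python client. The contextualized_embed call, its inputs parameter, and the response shape reflect the client's documented usage for voyage-context-3 at the time of writing, so verify the exact signature against the current docs.

```python
# Contextualized chunk embeddings: the whole document is passed in one call,
# and the model returns one vector per chunk, each informed by the rest of
# the document. The method name and response shape are assumptions based on
# the Voyage AI Python client docs; confirm them before relying on this.
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

result = vo.contextualized_embed(
    inputs=[chunks],              # one inner list of chunks per document
    model="voyage-context-3",
    input_type="document",
)
contextualized_vectors = result.results[0].embeddings  # one vector per chunk
```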
Great work. Let's recap what we covered. Context length defines how much text a model can process at once, measured in tokens.
For most Voyage AI models, that's 32,000 tokens, enough for most documents. When documents exceed that limit, or when you want more precise retrieval, chunking breaks them into smaller, more focused segments.
And for cases where cross-chunk context matters, voyage-context-3 encodes both local passage information and document-level context in every chunk embedding.
