Vector Search Performance / Co-located versus Dedicated Search Nodes

6:00
You confirmed it. The index is bigger than available RAM and queries are hitting disk. You bring the findings to your manager and they ask, can't we just add more memory to the existing servers? It's a reasonable question, but dedicated search nodes are a stronger alternative. In this video, we'll look at the two deployment architectures for MongoDB search, co-located nodes and dedicated search nodes. We'll explain how your choice directly affects how much RAM is available to your vector indexes and touch on how sharding lets you scale beyond what a single cluster can hold.

Let's start with the two options. In the default setup, MongoDB runs mongod and mongot on the same nodes. This is called a co-located, or coupled, architecture. Your operational data workload and your search workload share the same hardware, which means they also share the same pool of RAM and CPU. The alternative is dedicated search nodes. In this model, mongot runs on completely separate, isolated nodes. Your operational cluster handles reads and writes, and your search nodes handle all your search needs. The two workloads no longer compete for the same resources.

So what does resource competition actually look like in practice? On a co-located node, mongod claims a large share of available memory for its storage engine, connections, and runtime operations. After accounting for the memory the OS file system cache needs, the memory left for mongot is significantly less than the hardware spec sheet suggests. CPU is also shared: on co-located nodes, mongot's search operations compete with mongod for processing time, which limits how much processing capacity is available for query execution. On co-located nodes, there's also a cold start problem. After a database restart, mongot must reload the HNSW graph from disk one page at a time as queries arrive.
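To make the contention concrete, here's a rough back-of-envelope sketch. It assumes WiredTiger's default cache size of roughly 50% of (RAM minus 1 GB) and an illustrative 2 GB of OS and file-system-cache overhead; the real split varies with configuration and workload, and the ~90% figure for dedicated search nodes is the approximation discussed in this video:

```python
def colocated_mongot_budget_gb(ram_gb, os_overhead_gb=2.0):
    """Rough memory left for mongot on a co-located node.

    Assumes WiredTiger's default cache of 50% of (RAM - 1 GB);
    the OS overhead figure is an illustrative assumption.
    """
    wiredtiger_cache = 0.5 * (ram_gb - 1)
    return max(ram_gb - wiredtiger_cache - os_overhead_gb, 0.0)


def dedicated_mongot_budget_gb(ram_gb):
    """Dedicated Atlas search nodes give ~90% of node RAM to index data."""
    return 0.9 * ram_gb


for ram in (16, 32, 64):
    print(f"{ram} GB node: co-located ~{colocated_mongot_budget_gb(ram):.1f} GB, "
          f"dedicated ~{dedicated_mongot_budget_gb(ram):.1f} GB for mongot")
```

The exact numbers aren't the point; the shape is. On a co-located node, well under half the RAM on the spec sheet is realistically available to mongot, while a dedicated search node of the same size offers roughly twice that.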
This causes a latency spike while the cache warms up, because the first queries to arrive pay the cost of loading data from disk. The cold start problem can happen whenever mongot restarts or the underlying machine reboots, which are normal events: upgrades, failovers, and maintenance windows.

So, back to the question: can't we just add more memory to the existing servers? Adding RAM to a co-located node does raise the ceiling. WiredTiger will still use the same percentage of memory, and mongot will benefit from the additional memory available to the OS file system cache to keep its data files in memory. But the contention problem doesn't go away; it just moves to a higher threshold. The most reliable way to give mongot a predictable memory budget for your vector indexes is to move it onto dedicated search nodes, where it isn't competing for resources.

So what do you actually get when you move to dedicated search nodes? The most immediate benefit is memory. On a dedicated search node, mongot isn't competing with WiredTiger for RAM. Atlas search nodes allocate approximately ninety percent of node RAM to index data, with the remainder used by the JVM. The cold start problem is also much less of a concern: mongot on dedicated search nodes has a cache warmer that preloads vector data at startup, so warm-up is confined to search node lifecycle events rather than being triggered by every database restart.

Dedicated search nodes also unlock concurrent segment search. The Lucene index is divided into segments, and by default, mongot searches them serially, one after another. Concurrent segment search parallelizes that work, searching multiple segments simultaneously and reducing the time each individual query spends waiting. For vector search queries on dedicated search nodes, this is on by default with no query changes required. For search queries, you can opt in per query using the concurrent option.
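As a sketch of that opt-in, here's what the two query shapes might look like, built as plain aggregation pipelines (the index names, field paths, and query values are placeholders):

```python
# Opting in to concurrent segment search for a $search query.
# Index name, field path, and query text are illustrative placeholders.
search_pipeline = [
    {
        "$search": {
            "index": "default",
            "text": {"query": "wireless headphones", "path": "title"},
            "concurrent": True,  # per-query opt-in to concurrent segment search
        }
    },
    {"$limit": 10},
]

# $vectorSearch needs no such flag: on dedicated search nodes,
# concurrent segment search is on by default for vector queries.
vector_pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": [0.1, 0.2, 0.3],  # placeholder embedding
            "numCandidates": 100,
            "limit": 10,
        }
    }
]

# With pymongo, either pipeline would run as, e.g.:
#   db.products.aggregate(search_pipeline)
```

Note that the `concurrent` flag is a hint: on co-located nodes, or when resources are constrained, mongot can still fall back to serial segment search.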
Either way, the more CPUs available on your search node tier, the more parallelism you get, which is why CPU provisioning matters as much as RAM when sizing your search infrastructure. You also gain the ability to size and scale your search infrastructure independently. Need more query throughput? Add more search nodes. Need lower latency? Upgrade to a higher search tier. You're no longer forced to overprovision your database nodes just to give mongot enough breathing room.

With all that said, co-located nodes are not always the wrong choice. If your index is small and query volume is low, co-location is fine. If your index comfortably fits within the memory available to mongot on your dev tier, you won't see the contention problems we've been describing. The signal to move to dedicated search nodes is when those conditions stop being true: if your index has grown to several gigabytes, if you're running under sustained query load, if you have production latency requirements that can't tolerate unpredictable spikes, or if the number of indexes has grown significantly, dedicated search nodes are the right choice. The contention, cold start, and sizing problems we've covered don't get better as your workload scales; they get worse.

Nice work. Let's recap what we covered. The default co-located architecture runs mongod and mongot on the same nodes, which means vector search competes with WiredTiger for RAM and CPU. Dedicated search nodes solve the contention problem by giving mongot its own isolated resources: a predictable memory budget, cache warm-up at the search node level, and the ability to scale search independently from your database.
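One practical follow-up on the sizing question, i.e. when an index "comfortably fits": a rough estimate of raw vector index size is vectors times dimensions times four bytes for float32, plus some graph overhead. The 10% HNSW overhead below is an illustrative assumption, not an official formula; compare the result against the mongot memory budget of your tier:

```python
def vector_index_bytes(num_vectors, dims, bytes_per_dim=4, hnsw_overhead=1.1):
    """Rough size of a float32 vector index.

    The ~10% HNSW graph overhead is an illustrative assumption;
    real overhead depends on graph parameters and quantization.
    """
    return num_vectors * dims * bytes_per_dim * hnsw_overhead


# Example: 10 million vectors at 1,536 dimensions.
size_gb = vector_index_bytes(10_000_000, 1536) / 1e9
print(f"~{size_gb:.0f} GB")  # ~68 GB: far beyond a co-located node's budget
```

An index in that range is a clear case for dedicated search nodes, and eventually for sharding to spread the index across machines.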