Advanced Schema Patterns and Antipatterns / Identify Advanced Antipatterns
In theory, MongoDB can handle an unlimited number of collections. But the real world has limits. And they come in the form of hardware and workloads. So you'll typically see a decrease in performance if you have too many collections.
The recommended limit for the number of collections in a replica set is 10,000. But in practice, this limit depends on your workload and database resources. So the question is, what should you do if you have too many collections? In this video, we'll discuss the massive number of collections antipattern and how to solve it.
When modeling data with MongoDB, it is natural to organize it into separate documents and separate collections. For example, imagine we are storing weather data obtained from multiple sensors that take measurements every minute. We may decide to use a collection per day to keep the number of documents per collection low. But this data model has a problem.
Unless we drop the old collections, the number of collections is unbounded. Pretty soon, our database will be managing tens of thousands of collections and indexes. Having too many collections slows down performance. This is because MongoDB's default storage engine, WiredTiger, stores a separate file for each collection and each index.
And every collection in MongoDB has at least one index file. Users who host their databases in Atlas typically begin to see a decrease in performance once they exceed 5,000 collections on an M10 cluster. On an M20 or an M30 cluster, this begins around 10,000 collections. We also see an impact once our replica set exceeds 10,000 collections.
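To make the file arithmetic concrete, here is a rough sketch in Python. It assumes one file per collection and one file per index, as described above. The function name and the example figures (one collection per day kept for three years, two secondary indexes each) are illustrative assumptions, not measurements.

```python
def wiredtiger_file_count(num_collections: int,
                          secondary_indexes_per_collection: int = 0) -> int:
    """Estimate how many files WiredTiger manages for a set of collections.

    Assumes one file per collection plus one file per index, and that every
    collection carries the default _id index.
    """
    files_per_collection = 1 + 1 + secondary_indexes_per_collection
    return num_collections * files_per_collection


# Hypothetical weather workload: one collection per day, kept for three years,
# each with two secondary indexes in addition to _id.
days = 3 * 365
print(wiredtiger_file_count(days, secondary_indexes_per_collection=2))
```

Under these assumptions, about a thousand daily collections already translate into several thousand files, which is why the collection count matters more than it first appears.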
Sharded clusters, on the other hand, handle up to 10,000 collections per shard. When our database has more than the recommended number of collections and performance is affected, we've run into the massive number of collections antipattern. One way to mitigate this is to drop or archive unused collections. To do so, we recommend regularly monitoring your database for collections that aren't being used, so you can drop or archive them after a certain period of time.
If you still have a large number of active collections after doing this, your existing schema is not an optimal solution, and you should remodel your data. Let's examine our bookstore application to learn how updating the schema design can help.
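Before turning to the bookstore, the monitoring step described above can be sketched in Python. This is one possible heuristic, not an official procedure. It assumes a PyMongo-style database handle (one that offers list_collection_names() and collection.aggregate()) and uses the $indexStats aggregation stage, which reports how often each index has been used since the server process last started. The function name is hypothetical. A collection whose indexes all show zero operations is only a candidate for review: the statistics are kept per node and do not count collection scans, so confirm before dropping or archiving anything.

```python
def find_idle_collections(db):
    """Return names of collections whose indexes all report zero usage.

    `db` is assumed to behave like a pymongo.database.Database.
    """
    idle = []
    # Restrict to real collections; $indexStats cannot run on views.
    for name in db.list_collection_names(filter={"type": "collection"}):
        if name.startswith("system."):
            continue  # skip internal collections
        stats = db[name].aggregate([{"$indexStats": {}}])
        # Each result document describes one index and its access counter.
        total_ops = sum(int(s["accesses"]["ops"]) for s in stats)
        if total_ops == 0:
            idle.append(name)
    return sorted(idle)
```

Against a live deployment, this would be called with a database object obtained from a MongoClient; the connection string and database name depend on your environment.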
We wanted to track the number of user views per book in our database. To do this, we created a separate views collection for each book in our inventory. Every user view creates a separate view document in the corresponding collection. As the number of books and their related views grows, we learn that we've made a mistake.
With more books come more views collections. And we quickly exceed the recommended number of collections in our database. In this case, we can easily fix the problem by placing all the views in one collection. This significantly reduces the number of collections we have.
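A minimal Python sketch of that remodel follows. It works on plain dictionaries rather than a live database, and the naming scheme (collections called views_<book id>, a book_id field on each merged document) is assumed for illustration; the actual names used in the bookstore application are not given here.

```python
def merge_view_collections(per_book_views, prefix="views_"):
    """Flatten per-book view collections into documents for one collection.

    `per_book_views` maps a collection name such as "views_101" to the list
    of view documents stored in it. Each output document gains a `book_id`
    field, so the book is no longer encoded in the collection name.
    """
    merged = []
    for collection_name, docs in per_book_views.items():
        if not collection_name.startswith(prefix):
            continue  # not a per-book views collection
        book_id = collection_name[len(prefix):]
        for doc in docs:
            merged.append({**doc, "book_id": book_id})
    return merged


# Hypothetical data: two per-book collections holding three views in total.
before = {
    "views_101": [{"user": "ana"}, {"user": "raj"}],
    "views_102": [{"user": "ana"}],
}
views = merge_view_collections(before)
print(len(before), "collections ->", 1, "collection with", len(views), "documents")
```

With this shape, counting views for a book becomes a query on book_id within the single collection, which an index on that field can support.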
So our database remains performant. Let's recap what you've learned in this video. The massive number of collections antipattern occurs when a database surpasses the recommended number of collections and performance suffers. One way to avoid or mitigate this antipattern is to drop or archive unused collections.
If the number of collections continues to be a problem, updating the schema design can help reorganize your data and decrease the number of collections in your database. Sharding your database is another potential solution. Great job. See you in the next one.
