Advanced Schema Patterns and Antipatterns / Identify Advanced Antipatterns
Most apps these days have search bars that return the search results regardless of whether the user inputs upper- or lowercase letters. But you might have had that frustrating experience when you search for something in an app but don't get the results you want unless you use the exact capitalization. The legacy search feature in our bookstore app, which was implemented without MongoDB Atlas Search, has become very slow and isn't returning the search results our user expects. Upon digging into this further, we find out that our database queries are doing an exact match on the search terms.
This is usually not a problem. But what if we want our search terms to match more broadly. In this video, we'll discuss the case sensitivity antipattern and how to solve it. When upper- and lowercase letters are treated differently, this is known as case sensitivity.
In this example, we search for "Apple" with a capital A. And the only result returned is an exact match. On the other hand, when upper- and lowercase letters are treated the same, we refer to these as case insensitive. Here, searching for the same capitalized "Apple," the results include all "apple" words, no matter the case.
Keep in mind that MongoDB queries are case sensitive by default. Using the default MongoDB query settings but expecting search terms to ignore case is known as case-sensitivity antipattern. As well as producing unexpected results, this antipattern can also reduce performance. You may be asking, how do I configure MongoDB to support case-insensitive queries?
Great question. The answer is to use collations. A collation defines the language-specific rules that are used by MongoDB to determine how characters in a string are sorted and compared during a query. Collations must specify a locale.
So language-specific rules are used during the comparison. Collations also have strength or comparison level from 1 to 5. The default strength is 3, which effectively makes comparisons case sensitive. In this example, we set the locale to English.
To make the comparison case insensitive, we need to set the strength to 1 or 2. A value of 1 compares base characters only, ignoring diacritics or accents and case. For example, a query for "voila" will return all results, regardless of the letter casing or accenting on letters. A value of 2 includes secondary differences and diacritics.
Here a query for voila will only return results that include the accent on A. There are a couple of different ways to use collations in order to ensure our queries are case insensitive. First, we can build an index with a given collation, making sure the collation is not case sensitive. We then specify that same collation in our query.
We can also assign a default collation to a collection when we create that collection. Keep in mind, the default collation assigned to a collection applies to all indexes on that collection. This collation cannot be changed after the collection has been created. Any queries against the collection will also use the same collation.
We can override the collation at the query level only. However, we don't recommend it when using default collation on a collection. We may also be tempted to use the regex operator with the i option. The regex operator in MongoDB is well supported by indexes when used for exact matches.
However, it is not very efficient when our search terms are case insensitive. As a result, this approach isn't recommended. Let's take a look at how we can fix the case sensitivity antipattern in our bookstore app. Our online bookstore has a search box that allows you to find books by title.
As we mentioned previously, this is a legacy feature. And it uses inefficient regex matching with the i option, which is not recommended. We have an index with a case-insensitive collation. But our query is not using it, because we are using regex.
To solve this problem, we need to rewrite our query shown here to use our case-insensitive index without using regex. Remember, when we write the query, it must use the same collation as the case-insensitive index and the correct locale, or else we will experience the case-sensitivity antipattern. With this new query, bookstore users can now efficiently query books by title. Let's take a look at another example.
The bookstore also has a search box for authors, which supports case-insensitive queries. The feature is returning the expected results. But the query response time is very high. Our query includes the desired collation.
But if we take a look at explain output for the query, we can see that it's performing a collection scan. So it's not using an index. Using getIndexes to indexes, we see that there is an index in the author field. But it does not have the collation that matches the query.
And therefore, it is case sensitive and will not be used. This is another form of the case-sensitivity antipattern, where case-insensitive queries are missing a case-insensitive index to support them. To fix this problem, we need to drop the current index on the author fields and re-create it to make it case insensitive using the same collation as our queries. Now our query is supported by a case-insensitive index.
And the response time should improve. We fixed the examples of the case-sensitivity antipattern in this demonstration. But if you are running your application on MongoDB Atlas, you should consider using Atlas Search for advanced search features. While a case-insensitive index improves performance for case-insensitive queries, Atlas Search queries significantly improve the performance of text queries and offer more options for customizing your query parameters.
Let's recap what you've learned in this video. The case-sensitivity antipattern occurs when queries return unexpected results due to improperly configured case-insensitive queries and indexes. To solve this antipattern, case-insensitive queries must be supported by case-insensitive indexes and queries. And indexes must have the same collation to provide case-insensitive results.
Additionally, case-insensitive queries and indexes must have a collation strength of 1 or 2. A default collation can be set when we create a collection to ensure that queries and indexes on that collection use the same collation and produce case-insensitive results. Hopefully, you are now more sensitive to case. Until we meet again, keep learning.
