Advanced Schema Design Patterns and Anti-patterns / Identify Advanced Anti-patterns

6:47

Video Transcript (English)

Most apps these days have search bars that return desired results regardless of whether the user inputs upper or lower case letters.

But you might have had that frustrating experience when you search for something in an app but don’t get the results you want unless you use the exact capitalization.

The legacy search feature in our bookstore app, which was implemented without MongoDB Atlas Search, has become very slow and isn’t returning the search results our users expect.

Upon digging into this further, we find that our database queries are doing an exact match on the search terms.

This is usually not a problem, but what if we want our search terms matched more broadly?

In this video, we’ll discuss the case-sensitivity anti-pattern and how to solve it.

When upper and lowercase letters are treated differently, this is known as case-sensitivity.

In this example, we search for “Apple” with a capital “A” and the only result returned is an exact match.

On the other hand, when upper and lowercase letters are treated the same, we refer to this as case-insensitive.

Here, searching for the same capitalized “Apple”, the results include all apple words no matter the case.

Keep in mind that MongoDB queries are case-sensitive by default.

Using the default MongoDB query settings but expecting search terms to ignore case is known as the “case sensitivity anti-pattern.”

As well as producing unexpected results, this anti-pattern can also reduce performance.

You may be asking: How do I configure MongoDB to support case-insensitive queries?

Great question!

The answer is to use collations.

A collation defines the language-specific rules that are used by MongoDB to determine how characters in a string are sorted and compared during a query.

Collations must specify a locale so language specific rules are used during the comparison.

Collations also have a strength, or comparison level, from 1 to 5.

The default strength is 3 which effectively makes comparisons case sensitive.

In this example, we set the locale to English.

To make the comparison case insensitive we need to set the strength to 1 or 2.

A value of 1 compares base characters only. Ignoring diacritics, or accents, and case. For example, a query for “viola” will return all results regardless of the letter casing or accenting on letters.

A value of 2 will include secondary differences such as diacritics. Here, a query for “viola” will only return results that include the accent on “a”.

There are a couple of different ways to use collations in order to ensure our queries are case-insensitive.

First, we can build an index with a given collation; making sure the collation is not case sensitive.

We then specify that same collation in our query.

We can also assign a default collation to a collection when we create that collection.

Keep in mind, the default collation assigned to a collection applies to all indexes on that collection. This collation cannot be changed after the collection has been created.

Any queries against the collection will also use the same collation.

We can override the collation at the query level only. However, we don’t recommend it when using a default collation on a collection.

We may also be tempted to use the regex operator with the i option.

The regex operator in MongoDB is well supported by indexes when used for exact matches.

However, it is not very efficient when our search terms are case-insensitive.

As a result, this approach isn’t recommended. These queries cannot efficiently use case-insensitive indexes, so they can be very slow in large collections.

Let's take a look at how we can fix the Case Sensitivity Anti-Pattern in our bookstore app.

Our online bookstore has a search box that allows you to find books by title.

As we mentioned previously this is a legacy feature and it uses inefficient regex matching with the i option, which is not recommended.

We have an index with a case-insensitive collation, but our query is not using it because we are using regex.

To solve this problem we need to rewrite our query, shown here, to use our case-insensitive index without using $regex.

Remember, when we write the query, it must use the same collation as the case-insensitive index and the correct locale, or else we will experience the Case Sensitivity anti-pattern.

With this new query, bookstore users can now efficiently query books by title.

Let’s look at another example.

The bookstore also has a search box for authors which supports case-insensitive queries.

The feature is returning the expected results but the query response time is very high.

Our query includes the desired collation.

But if we take a look at the explain() output for the query we can see that it is performing a collection scan, so it is not using an index.

Using getIndexes() to check our indexes, we see that there is an index for the author field but it does not have a collation that matches the query and therefore it is case sensitive and will not be used.

This is another form of the Case Sensitivity anti-pattern, where case-insensitive queries are missing a case-insensitive index to support them.

To fix this problem we need to drop the current index on the author field and recreate it to make it case-insensitive using the same collation as our queries.

Now our query is supported by a case-insensitive index and the response time should improve.

We fixed the examples of the case-sensitivity anti-pattern in this demonstration, but if you are running your applications on MongoDB Atlas, you should consider using Atlas Search for advanced search features.

While a case-insensitive index improves performance for case-insensitive queries, Atlas Search queries significantly improve the performance of text queries and offer more options for customizing query parameters.

Let's recap what you learned in this video:

The case sensitivity antipattern occurs when queries return unexpected results due to improperly configured case insensitive queries and indexes.

To solve this antipattern, case-insensitive queries must be supported by case-insensitive indexes and queries and indexes must have the same collation to provide case-insensitive results.

Additionally, case-insensitive queries and indexes must have a collation strength of 1 or 2.

A default collation can be set when we create a collection to ensure that queries and indexes on that collection use the same collation and produce case-insensitive results.

Hopefully you are now more sensitive to case. Until we meet again. Keep learning!