Relational to Document Model / Model for Workloads
The first step in the journey toward an effective data model is identifying our workload. To identify the workload, we need to identify and quantify the database entities. In this video, we'll examine features of the bookstore application to identify database entities and their attributes. Then, we'll quantify them with information from our internal stakeholders.
As a quick recap, entities are the things that exist in our database and are unique and independent of each other. For example, an entity could be a person, a product, an organization, or a location. Attributes are the individual properties that describe an entity. They represent specific characteristics or information associated with an entity.
For example, a product's attributes could be its name, description, and price. Identifying entities and their attributes first will help us understand how the data is used in the application. Later, we'll use this information to identify reads and writes. Let's take a look at the functional requirements of our app in order to understand what entities we'll be working with.
With our bookstore app, we want users to be able to search for e-books, audiobooks, and printed books. Users can select a book to see more information about it, and to leave book reviews and ratings. This is just a basic outline, but we can already get a sense for the database entities we'll need. Users need to be able to search for e-books, audiobooks, and printed books.
Therefore, it's safe to assume that each of these represents an entity, since they are unique, independent things that live in our database. Each entity will have a set of attributes associated with it. Let's take a look at the e-book entity first. An e-book should have an SKU as a unique identifier.
When a user clicks on a book, they'll want to see additional information. So it should contain attributes for title, author, publisher, language, price, summary, rating, release date, and the number of pages. The other two types of books, audiobooks and printed books, will share most of these attributes with e-books and add some of their own. Audiobooks are the only book type that has attributes for duration and a list of narrators.
Both e-books and printed books include an attribute for the number of pages. Printed books are the only physical entities that need to be stored and shipped. For that reason, only printed books have an attribute for stock levels and delivery times. So far, we've defined entities related to the media type of each book, along with the attributes.
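To make the three book entities concrete, here is a minimal sketch of them as Python dataclasses. The class names, field names, types, and units (minutes for duration, days for delivery time) are assumptions made for illustration; the lesson only names the attributes. The sketch follows the statement that only e-books and printed books carry a page count.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class Book:
    """Attributes shared by every book type (names and types are assumed)."""
    sku: str            # unique identifier
    title: str
    author: str
    publisher: str
    language: str
    price: float
    summary: str
    rating: float
    release_date: date


@dataclass
class EBook(Book):
    pages: int


@dataclass
class AudioBook(Book):
    duration_minutes: int                               # unit is an assumption
    narrators: list[str] = field(default_factory=list)


@dataclass
class PrintedBook(Book):
    pages: int
    stock_level: int
    delivery_days: int                                  # unit is an assumption


def attribute_names(cls):
    """Return the set of attribute names declared on an entity class."""
    return {f.name for f in fields(cls)}


# Attributes that audiobooks have and e-books do not.
print(sorted(attribute_names(AudioBook) - attribute_names(EBook)))
```

Comparing the attribute sets this way is a quick check that the differences between book types match what was described: duration and narrators for audiobooks, stock level and delivery time for printed books.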
But sometimes, an attribute can be expanded into its own entity. Author and publisher are currently attributes for our different book entities, but we'll make them their own entities, since they can be used independently. Authors will have a unique identifier, "author ID," to avoid having duplicates of the same author. Users will also be interested in learning a bit about each author, so we will include attributes such as name, birth year, biography, and their social media links.
The publisher entity will include similar attributes to the author entity, but instead of having a birth year attribute, publishers will have a founded date. After thinking about the search feature and what happens when a user clicks on a book, we've identified e-books, audiobooks, printed books, authors, and publishers as the entities. Let's examine one more feature of our app to see if there are any more. We want users to be able to rate and review a book.
For this, it looks like we need to add two more entities for users and reviews. Let's look at their attributes. A user is uniquely identified by a user ID. Users also have attributes for name, email address, phone number, delivery address, and the date they became a member.
A review will have fields for the product SKU, review date, rating, and the content of the review. Great. We've identified the following entities: e-books, audiobooks, printed books, authors, publishers, users, and reviews.
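The non-book entities can be sketched the same way. The example below covers authors, users, and reviews; field names, types, and the sample values are hypothetical and only mirror the attributes listed so far. A publisher would look much like the author, with a founded date in place of the birth year.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Author:
    author_id: str      # unique identifier, avoids duplicate authors
    name: str
    birth_year: int
    biography: str
    social_links: list[str] = field(default_factory=list)


@dataclass
class User:
    user_id: str        # unique identifier
    name: str
    email: str
    phone: str
    delivery_address: str
    member_since: date


@dataclass
class Review:
    sku: str            # SKU of the reviewed product
    review_date: date
    rating: int         # rating scale is not specified in the lesson
    content: str


# Hypothetical sample values, for illustration only.
review = Review(
    sku="EB-0001",
    review_date=date(2024, 1, 15),
    rating=5,
    content="Great read.",
)
print(review.sku)
```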
The next step is to quantify these entities. For this, we'll rely on data provided by the internal stakeholders. Let's compile this data into a table to visualize each entity and its quantity. Based on the business requirements, most of the books available will be e-books, at 450,000 titles, followed by audiobooks, at 200,000.
Printed books require us to hold physical inventory, so we'll only stock what we expect to be the top sellers. The total comes to around 50,000 for printed books, according to stakeholders. Our business plan assumes we'll hit 20,000 authors within the first three years of operation. For now, we're not sure how many publishers we're going to have, but we'll make a rough estimate of 500.
Next, let's move on to quantifying the users and reviews. Like before, we have to rely on the information provided by internal stakeholders. Our business goal for this bookstore is to have 25 million users after three years of operation. The marketing strategy is to have a community-driven experience, so we will be encouraging users to leave reviews.
With this in mind, we are going to assume that reviews will be one of our largest entities, at 1 billion. For the most part, this is an educated guess. From our table, we can now clearly see our entities with their quantities. We can also see that users and reviews are going to make up the bulk of our data, which will be important to consider.
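The quantities gathered above can be compiled into a small table in code. This is only a sketch: the dictionary name and the output layout are assumptions, while the numbers are the stakeholder estimates quoted in this lesson.

```python
# Estimated entity counts, as quoted from the internal stakeholders.
ENTITY_QUANTITIES = {
    "e-books": 450_000,
    "audiobooks": 200_000,
    "printed books": 50_000,
    "authors": 20_000,
    "publishers": 500,
    "users": 25_000_000,
    "reviews": 1_000_000_000,
}

total = sum(ENTITY_QUANTITIES.values())

# Print each entity, its count, and its share of all records, largest first.
for name, count in sorted(
    ENTITY_QUANTITIES.items(), key=lambda item: item[1], reverse=True
):
    print(f"{name:<14}{count:>15,}{count / total:>9.2%}")
```

Sorting by count puts reviews and users at the top of the output, which makes their weight relative to the other entities easy to see.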
Awesome job. In this video, we learned how to identify entities based on application usage. Once we identified the entities, we quantified them using input from internal business stakeholders.
