- Understanding Database Indexes
- Types of Indexes in MySQL
- Creating Indexes in MySQL
- Using Indexes in Queries
- Modifying and Removing Indexes
- When to use indexes
Database indexes play a crucial role in optimizing the performance of database-driven web applications. They can significantly improve the speed of read operations by enabling the database management system (DBMS) to quickly locate records based on specific conditions. In this article, we'll provide an introduction to database indexes in MySQL, explore different types of indexes, and explain how to create, modify, and use them with practical examples.
Understanding Database Indexes
A database index is a data structure that allows the DBMS to quickly locate rows in a table based on specific column values. Indexes are analogous to the index of a book, which helps readers find specific topics or keywords more quickly without reading the entire book.
Without an index, the DBMS has to perform a full table scan, which can be slow and resource-intensive, particularly for large tables. By creating an index on one or more columns, the DBMS can look up the index to find the location of the desired rows, thereby reducing the time required for query execution.
However, keep in mind that indexes come with some trade-offs. While they can significantly speed up read operations, they can also slow down write operations (INSERT, UPDATE, DELETE) because the DBMS has to maintain the index data structure. Therefore, it's essential to carefully consider when and how to use indexes in your database schema.
Types of Indexes in MySQL
MySQL supports several types of indexes, each with its own characteristics and use cases:
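B-Tree Index: This is the default index type for the InnoDB and MyISAM storage engines and is suitable for most use cases. B-Tree indexes work with most column types (TEXT and BLOB columns require a prefix length). They efficiently handle equality comparisons, range queries, sorting, and LIKE patterns that don't start with a wildcard.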
Hash Index: This index type is supported by the MEMORY storage engine (where it is the default) and by NDB. It is used only for equality comparisons (= or <=>). Hash indexes can't be used for range queries, ORDER BY, or LIKE pattern matching.
Spatial Index: This index type is used for spatial data types, such as GEOMETRY and POINT, and is available for the MyISAM and InnoDB storage engines.
Full-Text Index: This index type is designed for text-based columns and supports advanced text search features, such as natural language search and relevance ranking. Full-text indexes are available for the MyISAM and InnoDB storage engines.
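The sketch below shows how each type is declared. The table and column names are hypothetical and not part of this article's schema. FULLTEXT and SPATIAL are keywords on the index definition, while USING BTREE or USING HASH selects the algorithm. The SRID column attribute assumes MySQL 8.0.

```sql
-- Hypothetical tables, used only to illustrate the declaration syntax.

-- B-Tree: already the InnoDB default, so USING BTREE is optional.
CREATE TABLE products (
  id INT AUTO_INCREMENT PRIMARY KEY,
  sku VARCHAR(64) NOT NULL,
  INDEX idx_sku (sku) USING BTREE
) ENGINE=InnoDB;

-- Hash: honoured by the MEMORY engine (where it is also the default).
CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL,
  user_id INT NOT NULL,
  INDEX idx_session_id (session_id) USING HASH
) ENGINE=MEMORY;

-- Full-text: supports MATCH ... AGAINST searches on text columns.
CREATE TABLE articles (
  id INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  body TEXT NOT NULL,
  FULLTEXT INDEX ft_title_body (title, body)
) ENGINE=InnoDB;

-- Spatial: the indexed column must be a NOT NULL spatial type (SRID needs MySQL 8.0).
CREATE TABLE stores (
  id INT AUTO_INCREMENT PRIMARY KEY,
  location POINT NOT NULL SRID 4326,
  SPATIAL INDEX idx_location (location)
) ENGINE=InnoDB;

-- The MATCH column list must match the FULLTEXT index definition exactly.
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('index performance' IN NATURAL LANGUAGE MODE);
```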
We'll be focusing on B-Tree Indexes in this article.
The MySQL documentation has a page comparing B-Tree and Hash indexes.
Please note: Certain index types are only available on certain storage engines.
Creating Indexes in MySQL
To create an index in MySQL, you can use the CREATE INDEX statement or include the index definition within the CREATE TABLE statement.
Here's an example of a simple table named employees:
CREATE TABLE employees (
  id INT AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  department_id INT,
  hire_date DATE
);
Suppose you frequently query the employees table by the last_name column. In that case, you can create an index on the last_name column to speed up query execution:
CREATE INDEX idx_last_name ON employees(last_name);
You can also create a composite index on multiple columns if you frequently query the table based on a combination of column values. For example, if you often search for employees by their last_name and department_id, you can create a composite index on these columns:
CREATE INDEX idx_last_name_department_id ON employees(last_name, department_id);
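Column order matters in a composite index. MySQL can use an index on (last_name, department_id) for queries that filter on its leftmost column or columns. It therefore serves lookups on last_name alone as well as on both columns. It generally can't serve lookups on department_id alone, although MySQL 8.0.13+ can sometimes apply a skip scan. One consequence is that idx_last_name becomes largely redundant once the composite index exists. The queries below, against the employees table above, sketch which cases apply:

```sql
-- Can use idx_last_name_department_id: filters on the leftmost column.
SELECT * FROM employees WHERE last_name = 'Doe';

-- Can use it: filters on both columns (their order in the WHERE clause doesn't matter).
SELECT * FROM employees WHERE department_id = 3 AND last_name = 'Doe';

-- Can use it: equality on the first column, range on the second.
SELECT * FROM employees WHERE last_name = 'Doe' AND department_id BETWEEN 2 AND 5;

-- Generally can't use it for a normal lookup: the leftmost column is skipped.
SELECT * FROM employees WHERE department_id = 3;
```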
To create an index while defining the table schema, you can use the INDEX keyword (or its synonym KEY) within the CREATE TABLE statement:
CREATE TABLE employees (
  id INT AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  department_id INT,
  hire_date DATE,
  INDEX idx_last_name (last_name),
  INDEX idx_last_name_department_id (last_name, department_id)
);
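To confirm which indexes exist on a table, you can run SHOW INDEX. For the employees table above, the list should include the two indexes you defined. It should also include the PRIMARY index on id and a unique index on email, which MySQL creates automatically from the PRIMARY KEY and UNIQUE constraints. A composite index appears once per column, and Seq_in_index gives each column's position.

```sql
-- List every index on the table: one row per indexed column.
SHOW INDEX FROM employees;

-- Or view the full table definition, including index clauses, as MySQL stores it.
SHOW CREATE TABLE employees;
```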
Using Indexes in Queries
MySQL automatically uses the appropriate index when executing queries if it determines that using an index would be more efficient than performing a full table scan. You don't need to explicitly specify the index in your queries.
For example, if you have an index on the last_name column, MySQL can use it when executing the following query:
SELECT * FROM employees WHERE last_name = 'Doe';
Similarly, if you have a composite index on the last_name and department_id columns, MySQL can use it when executing this query:
SELECT * FROM employees WHERE last_name = 'Doe' AND department_id = 3;
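You can also check the optimizer's decision directly by prefixing a query with EXPLAIN. In the output, the key column names the index MySQL chose (NULL means none), and a type of ALL indicates a full table scan. On a table with only a handful of rows the optimizer may still prefer a full scan, so it's worth testing with realistic data volumes.

```sql
-- Show the execution plan without running the query.
EXPLAIN SELECT * FROM employees WHERE last_name = 'Doe' AND department_id = 3;

-- MySQL 8.0.18 and later: run the query and report actual row counts and timings.
EXPLAIN ANALYZE SELECT * FROM employees WHERE last_name = 'Doe' AND department_id = 3;
```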
Modifying and Removing Indexes
To modify an existing index, you'll need to first drop the index and then recreate it with the new definition. Use the ALTER TABLE statement with the DROP INDEX and ADD INDEX clauses to achieve this:
ALTER TABLE employees DROP INDEX idx_last_name, ADD INDEX idx_last_name (last_name(10));
In this example, we've modified the index on the last_name column to store only the first 10 characters of the column values.
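One common way to choose a prefix length, offered here as a general technique rather than a rule from this article, is to compare selectivities. Measure the share of distinct values a prefix keeps against the same figure for the full column. If a candidate prefix scores close to the full column, it discriminates almost as well as the whole value. The lengths 5 and 10 below are arbitrary examples.

```sql
-- Selectivity (distinct values / total rows) of the full column versus candidate prefixes.
-- A prefix whose figure is close to full_selectivity loses little precision.
SELECT
  COUNT(DISTINCT last_name) / COUNT(*)           AS full_selectivity,
  COUNT(DISTINCT LEFT(last_name, 5)) / COUNT(*)  AS prefix_5,
  COUNT(DISTINCT LEFT(last_name, 10)) / COUNT(*) AS prefix_10
FROM employees;
```

Bear in mind that MySQL can't use a prefix index as a covering index or to satisfy ORDER BY on that column, so a shorter index has trade-offs beyond its size.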
To remove an index, use the ALTER TABLE statement with the DROP INDEX clause:
ALTER TABLE employees DROP INDEX idx_last_name;
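If you're unsure whether anything still depends on an index, MySQL 8.0 lets you make it invisible first. The optimizer ignores an invisible index, but MySQL keeps maintaining it on writes, so you can watch query performance before committing to the drop. A sketch of that workflow, using the equivalent standalone DROP INDEX syntax:

```sql
-- MySQL 8.0+: hide the index from the optimizer while it is still maintained.
ALTER TABLE employees ALTER INDEX idx_last_name INVISIBLE;

-- If nothing slows down, remove it (equivalent to ALTER TABLE ... DROP INDEX).
DROP INDEX idx_last_name ON employees;

-- Otherwise, make it visible again instead of dropping it; no rebuild is needed:
-- ALTER TABLE employees ALTER INDEX idx_last_name VISIBLE;
```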
When to use indexes
Formulating a good strategy for when to use indexes is important. As mentioned earlier in this article, an index on a column (or multiple columns) can dramatically speed up read operations that filter on it. It also slows down write operations (INSERT, UPDATE, DELETE), because the database engine has to update the index as well as the data itself.
The key to knowing when to use indexes is understanding how they work and how your application uses its data. Most of the time you will be fine adding indexes to columns that are frequently queried. There will be times, though, when you'll need an application-level strategy to optimise both read and write performance.
Example: Ad platform
Let's take a look at an example of an ad platform. Your application allows customers to upload an advert, which is served on websites on your ad network. Your customer can upload the advert in your dashboard, and then report on various data such as impressions and clicks. Indexes will be super important here, because you'll have potentially billions of rows to sift through.
Imagine a table that tracks impressions (when the advert is viewed):
+-------+-------------+-----------+---------------+---------------------+
| ID    | customer_id | advert_id | fingerprint   | created_at          |
+-------+-------------+-----------+---------------+---------------------+
| 23637 |          20 |        55 | 64255e95beb2d | 2023-03-30 11:04:01 |
| 23638 |          20 |        55 | 64255e95beb2d | 2023-03-30 11:04:06 |
| 23639 |          20 |        55 | 64255e95beb2d | 2023-03-30 11:04:18 |
+-------+-------------+-----------+---------------+---------------------+
Within our dashboard we're going to want to allow the user to view impression numbers on a per-date basis. For example, we'll want to show some headline numbers such as impressions in the last 30, 60 or 90 days, and possibly plot them on a graph. Because of that we'll need to query on two columns:
created_at, so it stands to reason we would need a compound/composite index on those two columns.
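Here is a sketch of what the queryable table might look like, assuming the dashboard filters by customer_id (advert_id would work the same way for per-advert reports). The table name, index name and column types are hypothetical; the columns follow the sample rows above. The equality column goes first and the range column (created_at) second, so that one index range covers each customer's date window.

```sql
-- Hypothetical, query-optimised impressions table.
CREATE TABLE impressions (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  advert_id INT NOT NULL,
  fingerprint VARCHAR(32) NOT NULL,
  created_at DATETIME NOT NULL,
  -- Equality column first, range column second.
  INDEX idx_customer_created (customer_id, created_at)
) ENGINE=InnoDB;

-- Headline figure: impressions for customer 20 in the last 30 days.
SELECT COUNT(*) AS impressions_30d
FROM impressions
WHERE customer_id = 20
  AND created_at >= NOW() - INTERVAL 30 DAY;

-- Per-day breakdown for a graph.
SELECT DATE(created_at) AS impression_date, COUNT(*) AS impressions
FROM impressions
WHERE customer_id = 20
  AND created_at >= NOW() - INTERVAL 30 DAY
GROUP BY DATE(created_at)
ORDER BY impression_date;
```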
But what about writes? This table could be recording thousands of impressions every hour, or even every minute. If there are indexes to maintain at the same time, every impression insert will be slower. That could increase the time it takes to serve the advert, which could in turn reduce the number of clicks.
So what do we do?
In this instance it makes sense to use an intermediary table with no secondary indexes (InnoDB always keeps the primary key as a clustered index). This table collects the data in a 'dumb' fashion: it just hoovers it up, with no concern for how long it will take to query later. The application then uses a background scheduled task to move the data to an optimised table with the indexes needed for querying. It's the best of both worlds.
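A minimal sketch of that pattern follows. It assumes an indexed impressions table with the columns shown in the sample rows above. The impressions_staging name and the batch logic are hypothetical. A cron job or MySQL's event scheduler could run the move periodically.

```sql
-- Hypothetical write-optimised staging table: clustered primary key only.
CREATE TABLE impressions_staging (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  advert_id INT NOT NULL,
  fingerprint VARCHAR(32) NOT NULL,
  created_at DATETIME NOT NULL
) ENGINE=InnoDB;

-- Scheduled job: move everything up to a fixed id into the indexed table.
START TRANSACTION;

-- Fix the batch boundary first so rows arriving later wait for the next run.
SELECT MAX(id) INTO @batch_max FROM impressions_staging;

INSERT INTO impressions (customer_id, advert_id, fingerprint, created_at)
SELECT customer_id, advert_id, fingerprint, created_at
FROM impressions_staging
WHERE id <= @batch_max;

DELETE FROM impressions_staging WHERE id <= @batch_max;

COMMIT;
```

If the staging table is empty, @batch_max is NULL and both statements affect no rows. For large backlogs you'd likely want to move smaller id ranges per transaction, which keeps locks short.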
You can take it further still by understanding exactly what data the application needs. If the application only needs to show headline figures to a user (total impressions in the last 30 days, for example), these numbers could be collated once a day and stored against the user record. That means only a single query against the raw data in a 24-hour period.
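This sketch is a variation on that idea: it uses a per-day summary table rather than a column on the user record. The daily_impressions table and its schema are hypothetical, and it assumes a raw impressions table with the columns shown earlier. A once-a-day job aggregates yesterday's rows. After that, the 30, 60 and 90-day figures become a sum over a few dozen small rows per customer.

```sql
-- Hypothetical summary table: one row per customer per day.
CREATE TABLE daily_impressions (
  customer_id INT NOT NULL,
  impression_date DATE NOT NULL,
  impressions INT UNSIGNED NOT NULL,
  PRIMARY KEY (customer_id, impression_date)
) ENGINE=InnoDB;

-- Daily job: aggregate yesterday's raw rows. REPLACE means a re-run
-- overwrites that day's totals instead of failing on the primary key.
REPLACE INTO daily_impressions (customer_id, impression_date, impressions)
SELECT customer_id, DATE(created_at), COUNT(*)
FROM impressions
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at < CURDATE()
GROUP BY customer_id, DATE(created_at);

-- Dashboard query: last 30 days for customer 20, read from the summary only.
SELECT SUM(impressions) AS impressions_30d
FROM daily_impressions
WHERE customer_id = 20
  AND impression_date >= CURDATE() - INTERVAL 30 DAY;
```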
You can speed things up further with a multi-tenancy strategy, where each user's data is stored in its own set of tables. Those tables might sit on dedicated hardware, on a virtualised instance, or in the same database. Either way, this greatly reduces the number of rows in each table. This is beyond the scope of this article, but there are plenty of resources available on the subject.
In this article, we've provided an introduction to database indexes in MySQL. We've explored different types of indexes, their use cases, and how to create, modify, and use them in your database schema. By understanding and effectively using indexes, you can significantly improve the performance of your database-driven web applications, leading to faster response times and better user experiences.
However, it's always important to understand the use cases of the application itself. It's not always possible to make the best indexing decisions based solely on a view of the data.