What is Database Indexing?

Database indexing is a technique that improves the efficiency and speed of accessing data from a database table. It involves creating a data structure, called an index, that stores a subset of the table’s columns and pointers to the corresponding rows. An index can be compared to a book’s index, which helps you find the page number of a specific topic without scanning the entire book.

How Does Database Indexing Work?

When you query a database table that has no suitable index, the database engine has to search through all the rows in the table to find the ones that match your filter conditions. This full table scan can be very slow and resource-intensive, especially if the table has millions of rows. However, if the table has an index on one or more columns that are used in the query, the database engine can use the index to quickly locate and retrieve the matching rows.

An index is organized as a sorted list or a tree structure, depending on the type of index. Each entry in the index consists of two parts: a key value and a pointer. The key value is a copy of one or more columns from the table that form the index key. The pointer is a reference to the disk block where the row with that key value is stored. The index is sorted by the key value, which allows for fast binary search or tree traversal operations.
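The key-and-pointer layout described above can be sketched in a few lines of Python. This is only an illustration, not how any particular database engine is implemented: the `table` data, the `build_index` and `lookup` helpers, and the use of a list position as the "pointer" are all assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical table: a row's position in the list stands in for its disk location.
table = [
    {"name": "Dana", "age": 41},
    {"name": "Eli", "age": 29},
    {"name": "Fay", "age": 35},
    {"name": "Gus", "age": 29},
]

def build_index(rows, column):
    """Return a list of (key, pointer) entries sorted by key."""
    return sorted((row[column], position) for position, row in enumerate(rows))

def lookup(index, rows, key):
    """Binary-search the index for key, then follow each pointer to its row."""
    # (key,) sorts before every (key, pointer) entry, so this finds the
    # first entry whose key is >= the search key.
    i = bisect_left(index, (key,))
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(rows[index[i][1]])
        i += 1
    return matches

age_index = build_index(table, "age")
print(age_index)                     # [(29, 1), (29, 3), (35, 2), (41, 0)]
print(lookup(age_index, table, 29))  # the rows for Eli and Gus
```

Because the entries are sorted by key, the lookup touches only a handful of index entries and then only the rows that actually match.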

To illustrate how database indexing works, let’s use an example of a table that stores information about bank accounts:

acc_no  name     balance  branch
1001    Alice    5000     A
1002    Bob      3000     B
1003    Charlie  4000     A
1004    David    2000     C
1005    Eve      6000     B

Suppose we want to query this table to find all accounts with balance greater than 3000. Without an index, the database engine would have to scan every row in the table and check if the balance column satisfies the condition. In a simplified model where reading each row costs one disk I/O operation, this would require five disk I/O operations and five comparisons.

However, if we create an index on the balance column, the database engine can use it to speed up the query. The index would look something like this:

balance  pointer
2000     1004
3000     1002
4000     1003
5000     1001
6000     1005

The database engine can use a binary search algorithm to find the first entry in the index that has a balance greater than 3000. In this case, it would be the third entry, with balance 4000 and pointer 1003. It can then follow the pointer to read the row with acc_no 1003 from the table. Because the index is sorted, every later entry also matches, so the engine reads the next two entries and follows their pointers to the rows with acc_no 1001 and 1005. Counting one disk I/O operation per row read and ignoring the cost of reading the index itself, this requires only three row reads instead of five, and the binary search over five entries needs at most three comparisons.
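The comparison between the two access paths can be reproduced with a small Python simulation of the example table. It is a sketch under the same simplified cost model (one row read per matching pointer, index reads not counted). The `full_scan` and `index_scan` helper names are invented for this illustration.

```python
from bisect import bisect_right

# The example table, keyed by acc_no (the value the index stores as its pointer).
accounts = {
    1001: ("Alice", 5000, "A"),
    1002: ("Bob", 3000, "B"),
    1003: ("Charlie", 4000, "A"),
    1004: ("David", 2000, "C"),
    1005: ("Eve", 6000, "B"),
}

# Index on balance: (balance, acc_no) entries sorted by balance.
balance_index = sorted((row[1], acc_no) for acc_no, row in accounts.items())

def full_scan(threshold):
    """Read every row and keep those with balance > threshold."""
    rows_read = 0
    matches = []
    for acc_no, row in accounts.items():
        rows_read += 1
        if row[1] > threshold:
            matches.append(acc_no)
    return sorted(matches), rows_read

def index_scan(threshold):
    """Binary-search the index, then read only the matching rows."""
    keys = [balance for balance, _ in balance_index]
    start = bisect_right(keys, threshold)  # first entry with balance > threshold
    rows_read = 0
    matches = []
    for _, acc_no in balance_index[start:]:
        _row = accounts[acc_no]  # following the pointer costs one row read
        rows_read += 1
        matches.append(acc_no)
    return sorted(matches), rows_read

print(full_scan(3000))   # ([1001, 1003, 1005], 5)
print(index_scan(3000))  # ([1001, 1003, 1005], 3)
```

Both paths return the same accounts. The difference is only in how many rows had to be read to find them, and that gap grows with the size of the table.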

Types of Database Indexes

There are different types of database indexes that can be used for different purposes and scenarios. Some of the common types of indexes are:

  • Primary index: An index that is created on the primary key of a table. A primary key is a column or a combination of columns that uniquely identifies each row in the table. A primary index ensures that there are no duplicate values in the primary key column(s) and enables fast retrieval of rows by their primary key values.
  • Secondary index: An index that is created on a column or a combination of columns that are not part of the primary key. A secondary index allows for faster retrieval of rows by their non-primary key values. For example, in our bank account table, we can create a secondary index on the branch column to quickly find all accounts belonging to a specific branch.
  • Clustered index: An index that determines the physical order of rows in a table. A clustered index stores both the key values and the row data in the same data structure, typically a B-tree. A clustered index ensures that rows with similar key values are stored close together on disk, which reduces disk I/O operations and improves the performance of range queries on the key. However, there can be only one clustered index per table, and any modification to the clustered index key values can cause expensive reorganization of the table data.
  • Non-clustered index: An index that does not affect the physical order of rows in a table. A non-clustered index stores only the key values and pointers to the row data in a separate data structure, such as a B-tree or a hash table. A non-clustered index does not require reorganization of the table data when the key values change, but it requires an extra disk I/O operation to follow the pointer from the index to the table. There can be multiple non-clustered indexes per table, and they can be used in conjunction with a clustered index.
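To see how a primary and a secondary index affect query execution, the sketch below uses Python's built-in `sqlite3` module with the bank account table from earlier. It is one engine's behavior, not a general rule: other databases report plans differently, and the exact wording of SQLite's plan text varies between versions. The index name `idx_accounts_branch` and the `plan` helper are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " acc_no INTEGER PRIMARY KEY,"  # primary index; SQLite stores rows in a B-tree keyed by it
    " name TEXT, balance INTEGER, branch TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?, ?)",
    [
        (1001, "Alice", 5000, "A"),
        (1002, "Bob", 3000, "B"),
        (1003, "Charlie", 4000, "A"),
        (1004, "David", 2000, "C"),
        (1005, "Eve", 6000, "B"),
    ],
)
# Secondary (non-clustered) index on a non-key column.
conn.execute("CREATE INDEX idx_accounts_branch ON accounts (branch)")

def plan(sql):
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

by_key = plan("SELECT * FROM accounts WHERE acc_no = 1003")
by_branch = plan("SELECT * FROM accounts WHERE branch = 'A'")
by_name = plan("SELECT * FROM accounts WHERE name = 'Eve'")

print(by_key)     # a search using the INTEGER PRIMARY KEY
print(by_branch)  # a search using idx_accounts_branch
print(by_name)    # a full scan, because no index covers name
```

The query on `name` falls back to a full scan because none of the indexes contains that column.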

Advantages and Disadvantages of Database Indexing

Database indexing has many benefits, but it also has some drawbacks. Some of the advantages and disadvantages of database indexing are:

  • Advantages:
    • Database indexing can significantly improve the performance of database queries by reducing the number of disk I/O operations and comparisons required to find the matching rows.
    • Database indexing can also improve the performance of other database operations, such as joins, aggregations, sorting, and grouping, by using the index to efficiently access the relevant rows or columns.
    • Database indexing can enforce uniqueness constraints on the table columns: a unique index rejects any insert or update that would create a duplicate value in the index key columns.
  • Disadvantages:
    • Database indexing can increase the storage space required by the database, as each index creates a separate data structure that occupies disk space.
    • Database indexing can also increase the maintenance overhead for the database, as each index has to be updated whenever the table data is inserted, updated, or deleted. This slows down write operations, because every change to a row must also be applied to each index that covers the affected columns.
    • Database indexing can also have a negative impact on query performance if the index is not designed properly or used appropriately. For example, creating too many indexes or indexes on columns that are rarely used in queries can cause unnecessary overhead and waste disk space. Similarly, using an index that does not match the query filter conditions or sorting order can cause inefficient index scans or extra sorting operations.
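The uniqueness advantage listed above can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch: the table deliberately has no primary key so that the unique index alone does the enforcing, and the index name `idx_accounts_acc_no`, the `try_insert` helper, and the rejected sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (acc_no INTEGER, name TEXT, balance INTEGER, branch TEXT)"
)
# The unique index is the only thing preventing duplicate account numbers here.
conn.execute("CREATE UNIQUE INDEX idx_accounts_acc_no ON accounts (acc_no)")
conn.execute("INSERT INTO accounts VALUES (1001, 'Alice', 5000, 'A')")

def try_insert(row):
    """Insert a row; return False if an index constraint rejects it."""
    try:
        conn.execute("INSERT INTO accounts VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted_new = try_insert((1002, "Bob", 3000, "B"))      # new key: accepted
accepted_dup = try_insert((1001, "Mallory", 1000, "C"))  # duplicate key: rejected
count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

print(accepted_new, accepted_dup, count)  # True False 2
```

The same check is also part of the write cost described in the disadvantages: every insert has to consult and update the index before it can succeed.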

Summary

Database indexing speeds up data access by maintaining a separate data structure, the index, that maps key values to the rows that contain them, much as a book’s index maps topics to page numbers.

There are different types of database indexes that can be used for different purposes and scenarios, such as primary index, secondary index, clustered index, and non-clustered index. Each type of index has its own advantages and disadvantages, depending on the characteristics of the table data and the query requirements.

Database indexing has many benefits, but it also has some drawbacks. Database indexing can significantly improve the performance of database queries and other operations by reducing the number of disk I/O operations and comparisons required to find the matching rows. However, database indexing can also increase the storage space required by the database, increase the maintenance overhead for the database, and have a negative impact on query performance if not designed properly or used appropriately.

Therefore, database indexing is an important technique that should be used carefully and wisely to optimize database performance and efficiency.
