's blog entry on Duplicate indexes and redundant indexes
certainly made a bit of a stir! It has already led to a new (mini) project by Baron Schwartz
, the Duplicate index/foreign key finder
which is publicly available on MySQLForge
as a perl script. Daniel Schneller
has entered the arena as well, devoting an entire blog
to his own java implementation to tackle this and other index trouble.
I figured it would be fun to add a pure SQL solution for the problem. This is not too hard to implement with the
system view found in the MySQL information_schema
database. This solution is limited in that it will only work for MySQL 5.0 (and higher). On the other hand, it has the advantage that it does not require any external programming languages or runtime environments.
The solution I'm presenting here also takes the uniqueness
of indexes into account, something that - to the best of my knowledge - has not been discussed by the other authors (although I believe that Daniel Schneller's tool does take uniqueness into account).
Those that don't feel like reading the article, hey - you decide. The code is available on MySQLForge
as a snippet. The article just explains a little bit of the background of the problem, and explores some techniques to solve it using a pure SQL solution. It finishes with a detailed explanation of the actual query, which you may or may not find interesting to read.
hi, i just skimmed through your article - interesting reading.
You seem to say that a BTREE index is redundant if a prefix of another index of any other index type is indexing the same columns. You also seem to say that a hash index is redundant only if another has index is indexing the exact same columns.
I may have misunderstood you and it may come down to how the indexes are used. But what i just want to add is, that the index types are good for different things; hash indexes may be good for equality-queries, while B-TREE indexes may be good for range queries and sorting, and also (but perhaps to a lesser extend) equality queries.
Hi, and thanks for your comments!
"You seem to say that a BTREE index is redundant if a prefix of another index of any other index type is indexing the same columns. You also seem to say that a hash index is redundant only if another has index is indexing the exact same columns."
Almost - I'm saying that a BTREE index is redundant if it's columns form the prefix of another BTREE index, and that a HASH index is redundant only if another HASH index is has the same columnlist (so, same columns *and* same order).
That's why my GROUP BY includes the INDEX_TYPE column; this ensures we will be comparing indexes only if they have the same type.
"hash indexes may be good for equality-queries, while B-TREE indexes may be good for range queries and sorting"
And this is entirely true. Lookups that use exact equality on *all* index columns (and sometimes, also IN using a list of constants) can be extremely fast for (unique) HASH indexes. BTREE is generally not as fast for this purpose, but can be used effectively in JOINS, range scans, and indeed sometimes ORDER BY and GROUP BY can also benefit from and ordered index like a BTREE.
If you want a version that doesn't rely on the information schema, so it's fast and works on MySQL 4.x and takes lots of complicated things into account (the index structure of InnoDB tables, the differences between BTREE and HASH indexes, etc...) Maatkit has a tool for it. See http://www.maatkit.org/