Monday, July 9, 2012

Removing Unused Indexes (Indices)

Index design and maintenance is vital when creating transactional and multi-dimensional database solutions. I have seen numerous occasions where a 10 minute query is reduced to 10 seconds simply by adding an index where one should be but isn’t. I have seen times where, in a company’s Development environment a query completed in milliseconds but the same exact query takes minutes in Production. After troubleshooting and reviewing the query plans we would determine that someone failed to move an index into Production with the rest of the objects. When used correctly, indexes make your SQL queries run faster. Unused indexes, however, are a drag on storage and performance; this article will detail how to identify and address the issue of unused indexes.

SQL Indexes 101

There are two kinds of indexes: clustered and nonclustered. To best understand the function of a clustered index, think of it as the page numbers in a book: each page is a unique record and they are sorted in a specific order. Nonclustered indexes are like the book’s index at the back: specific words or topics are listed alphabetically and tell you on what page/pages/page-range you can find what you are looking for. It is important to note, too, how you need a clustered index (page number in a book) if you are going to have an index (nonclustered) at the back of a book. If not for page numbers in a book, what would an index in the back of the book direct you to?

Without an index on a book it would be impossible to find occurrences of a certain word or topic without scanning the entire book page by agonizing page. If we’re talking about a 5-page children’s book … no big deal. With a 1000-page technical book… big deal. The same is true with tables in a DB: if the table is small (say 50K rows or less), the query engine can retrieve data rather quickly using a table scan (SQL’s version of scanning a book page-by-page looking for something). A table scan on a large table, we’ll say 1M rows+, can take forever.

Hopefully this helps explain what indexes are for and why they critical when dealing with large numbers of records.

Unused SQL indexes 101

Understanding the impact of not having an index where there should be one is important. Understanding the impact of unused indexes is also very important and often overlooked. Going back to the book analogy… Say you are the publisher of a text book and need to make the following updates and changes: the content of page 12 is moving to page 14, the contents of page 26 are changed and page 50 is removed (which now means page 51 is page 50, page 52 is 51, etc.) Logic would dictate that, after making these changes, you would also need to update the book’s index; otherwise the index would no longer be relevant.

The same is true with database indexes: when a query UPDATES/INSERTS/DELETES records in one or more tables, SQL server must also update the associated index(s). This adds overhead, especially when we are talking about large and/or multiple indexes. The additional work that SQL Server performs to keep your indexes up-to-date is justified if the indexes are used to improve performance. What benefit do these indexes provide if they are never used? None.

The problem is not just that unused indexes provide no value whatsoever, unused indexes consume vital system resources, RAM, local and/or SAN drive space as well as backup space. So what can we do about them? The first step is to identify them.

Identifying unused indexes

Starting with SQL Server 2005, Microsoft introduced Dynamic Management Views (DMV’s) and Dynamic Management Functions (DMF’s). These return state information that can be used for monitoring and diagnosing problems with a SQL Server Instance. The query below leverages a DMV and sys table to tell us what indexes are being used and which ones are not:

SELECT OBJECT_NAME(s.[object_id]) [Table Name],
       i.name [Index Name],
       user_updates [Total Writes],
       user_seeks + user_scans + user_lookups [Total Reads],
       user_updates - (user_seeks + user_scans + user_lookups) [Difference]
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i
       ON s.[object_id] = i.[object_id]
       AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
       AND s.database_id = DB_ID()
       AND user_updates > (user_seeks + user_scans + user_lookups)
       AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC

Sample Output:

Looking at index1 of my index stats: the totalwrites number (6,059,833) represents a negetive performance impact on the associated INSERT/UPDATE/DELETE queries. The TotalReads (0) represents the number of times this index was used to make a query faster. In this example it’s obvious that Index1 through Index6 have not provided any benefit.

So let’s just drop them right? Perhaps – but not quite yet. The (DMV) in the query above (dm_db_index_usage_stats) begins collecting index usage when SQL Server starts. To determine how long this DMV has been collecting stats run this:

SELECT Create_Date [UpSince],
       DATEDIFF(DD,Create_Date, GETDATE()) [DaysRunning]
FROM sys.databases WHERE name = 'tempdb';

You can also update your query to include this information as shown below:

DECLARE @RunningSince datetime =
      (SELECT Create_Date FROM sys.databases WHERE name = 'tempdb');

DECLARE @DaysRunning INT = DATEDIFF(DD,@RunningSince, GETDATE());

SELECT DB_NAME() [db],
       OBJECT_NAME(s.[object_id]) [Table],
       i.name [Index],
       user_updates [Total Writes],
       user_seeks + user_scans + user_lookups [Total Reads],
       user_updates - (user_seeks + user_scans + user_lookups) [Difference],
       @RunningSince [Running Since],
       @DaysRunning [Days Running]
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i
       ON s.[object_id] = i.[object_id]
       AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
AND s.database_id = DB_ID()
AND user_updates > (user_seeks + user_scans + user_lookups)
AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC

Sample Output:

Collecting Unused Index statistics over time

So what if we needed 3 months of stats but the server has only been running for a month? Bummer. To collect Index usage stats on a regular basis I suggest:

  1. Create a table to collect your index usage
  2. Update the above query to insert the result set into the table created in Step 1. Put the query into a stored procedure
  3. Schedule a SQL job that runs the stored procedure periodically (weekly, daily, whatever…)

Table DDL (Step 1):

USE {SomeDB}
GO

CREATE TABLE tbl_indexUsage
(
       [id] int IDENTITY(1,1),
       [db] varchar(100),
       [table] varchar(100),
       [Index] varchar(100),
       [Writes] int,
       [Reads] int,
       [Difference] int,
       [Running Since] datetime,
       [Days Running] int,
       PRIMARY KEY ([id])
)
GO

Stored Proc Query DDL (Step 2):

CREATE PROC usp_collectIndexUsage
AS
BEGIN
       DECLARE @RunningSince datetime =
            (SELECT Create_Date FROM sys.databases WHERE name = 'tempdb');
       DECLARE @DaysRunning INT = DATEDIFF(DD,@RunningSince, GETDATE());

       INSERT INTO tbl_indexUsage
            (db,[table],[Index],Writes,Reads,Difference,[Running Since],[Days Running])
             SELECT DB_NAME() [db],
                   OBJECT_NAME(s.[object_id]) [Table],
                   i.name [Index],
                   user_updates [Total Writes],
                   user_seeks + user_scans + user_lookups [Total Reads],
                   user_updates - (user_seeks + user_scans + user_lookups) [Difference],
                   @RunningSince [Running Since],
                   @DaysRunning [Days Running]
             FROM sys.dm_db_index_usage_stats s
             INNER JOIN sys.indexes i
                   ON s.[object_id] = i.[object_id]
             AND i.index_id = s.index_id
             WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
             AND s.database_id = DB_ID()
             AND user_updates > (user_seeks + user_scans + user_lookups)
             AND i.index_id > 1
             ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC
END
GO

Scheduled Job DDL (Step 3):

USE [msdb]
GO

BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

IF NOT EXISTS
(
       SELECT name
       FROM msdb.dbo.syscategories
       WHERE name=N'[Uncategorized (Local)]'
       AND category_class=1
)
BEGIN
       EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
       IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'collect Index Usage',
       @enabled=1,
       @notify_level_eventlog=0,
       @notify_level_email=0,
       @notify_level_netsend=0,
       @notify_level_page=0,
       @delete_level=0,
       @description=N'No description available.',
       @category_name=N'[Uncategorized (Local)]',
       @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'GetThatData',
       @step_id=1,
       @cmdexec_success_code=0,
       @on_success_action=1,
       @on_success_step_id=0,
       @on_fail_action=2,
       @on_fail_step_id=0,
       @retry_attempts=0,
       @retry_interval=0,
       @os_run_priority=0,
       @subsystem=N'TSQL',
       @command=N'EXEC usp_collectIndexUsage;',
       @database_name=N'ajbTest',
       @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'DailyIndexUsageCollection',
       @enabled=1,
       @freq_type=8,
       @freq_interval=1,
       @freq_subday_type=1,
       @freq_subday_interval=0,
       @freq_relative_interval=0,
       @freq_recurrence_factor=1,
       @active_start_date=20120709,
       @active_end_date=99991231,
       @active_start_time=0,
       @active_end_time=235959,
       @schedule_uid=N'305e6d6b-f93f-4158-bc91-769ecdb740e8'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
       IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

You can then collect your index usage history and use this information to make informed decisions about what indexes to keep and which ones to drop.

Other Considerations

A few other considerations include: what kind of indexes are these - clustered or nonclustered? If they are clustered do they exist because of a primary key? Is the table replicated? These are some of the questions I need answered before removing unused indexes.

If these are clustered indexes then understand the impact on any nonclustered indexes on that table. E.g. If you drop a clustered index on a table with nonclustered indexes, all the nonclustered indexes are rebuilt to replace the clustered index keys with row pointers. This can be time consuming depending on the size, quantity and content of the nonclustered indexes. Removing a clustered index from an indexed view automatically removes all nonclustered indexes.

Sometimes you will have unused indexes that cannot be removed. For example: I recently setup transactional replication. In transactional (and Merge) replication you must have a primary key on a published table. The published source tables in my transactional replication are written to by an application then replicated to data warehouses. Though the clustered indexes associated with my PK's on the source table are never used they associated with a PK and cannot be removed.

I have seen cases in the past where developers add a number of indexes on a table while doing performance tuning & testing. The indexes are garbage and would never be used in our query execution plans. Nonetheless, they are not removed because of sloppy developer work and go into Production.

Lastly: documentation is always key! In the BI world - we say, "If you can't measure it, you can't manage it." Documentation helps us understand why it exists in the first place.

Conclusion

Indexes speed up performance but aren’t overhead free. Understanding which indexes are helping performance and which ones are hurting performance will enable you to make informed decisions about which ones to keep and which ones are just wasting your server’s time.

1 comment: