Thursday, September 6, 2012

XML and NASA

"A key to addressing the data heterogeneity issue is the adoption and enforcement of standards across information systems. While XML is clearly emerging to be the lingua franca for databases and internet systems, it is encouraging to see the enforcement of standards even at the level of Microsoft® Word and Microsoft® Powerpoint® documents"
-- National Aeronautics and Space Administration (NASA)

XML is so wonderful that I created this blog to tell you about it. You don't need to be a rocket scientist to understand how powerful XML is, but apparently it doesn't hurt. Recently I Googled "NASA Business Intelligence" and found an article posted on stanford.edu titled The NASA Program Management Tool: A New Vision in Business Intelligence. So, what technology sits at the core of Business Intelligence (BI) at NASA? An elite team of data scientists from MIT, Berkeley, Cornell and Stanford determined that only XML would do. Of course. Below is a link to the article, but first let me share a few highlights:

This paper describes a novel approach to business intelligence and program management for large technology enterprises like the U.S. National Aeronautics and Space Administration (NASA). Two key distinctions of the approach are that 1) standard business documents are the user interface, and 2) a “schema-less” XML database enables flexible integration of technology information for use by both humans and machines in a highly dynamic environment. The implementation utilizes patent-pending NASA software called the NASA Program Management Tool (PMT) and its underlying “schema-less” XML database called Netmark.

...a 'schema-less' XML database enables integration and query-based document composition, eliminating the need for database administration by automating the integration of information that have diverse schemas.

...Netmark supports the XML standard for metadata and information interchange thus making it an open system that is interoperable with a wide variety of other information systems and tools.

It is worth noting that NASA uses the Xalan XSLT processor to transform their XML. Nice.

--ab

Additional Reading:

http://infolab.stanford.edu/~maluf/papers/ieeepmt06.pdf
http://www.nasa.gov/pdf/560758main_Program%20Management%20Tool.pdf

Wednesday, September 5, 2012

SP_WHO3

Wouldn't it be nice to be able to pass parameters to SP_WHO or SP_WHO2? Well, I just wrote SP_WHO3, which lets you pass the following parameters: @DBName, @SPID, @Status, @Login, @hostName, @Command, @CPUtime_floor, @DiskIO_floor, @ProgramName and @exact_flag.

Furthermore, you can set @exact_flag=0 to get close (LIKE) matches instead of exact ones (see the comments in the code). Enjoy!


SP_WHO3:

CREATE PROC sp_who3
(      @exact_flag          bit=1,
       @DBName              varchar(100)='*All*',
       @SPID                int=-1,
       @Status              varchar(40)='*All*',
       @Login               varchar(50)='*All*',
       @hostName            varchar(100)='*All*',
       @Command             varchar(200)='*All*',
       @CPUtime_floor       bigint=-1,
       @DiskIO_floor        bigint=-1,
       @ProgramName         varchar(100)='*All*'
)
AS
/******************************************************************
 Author:      Alan Burstein
 Created on:  9/5/2012
 Updated on:  4/19/2013

 Use:
 Returns the same columns as SP_WHO2 but you can pass parameters
 and filter the result set.

 With @exact_flag=1 (default) you only get records where there
 is an EXACT (=) match. For example, when @exact_flag=1 and
 @Command='TASK' nothing will be returned. If @exact_flag=0
 and @Command='TASK' then records where Command="TASK MANAGER",
 Command="BRKR TASK", etc. will be returned.
******************************************************************/
BEGIN
 SET NOCOUNT ON;

 -- Create a table to hold the results of EXEC SP_WHO2
 DECLARE @sp_who3 TABLE
 (     SPID          int,
       [Status]      varchar(40),
       [Login]       varchar(50),
       HostName      varchar(100),
       BlkBy         varchar(4),
       DBName        varchar(100),
       Command       varchar(200),
       CPUtime       bigint,
       DiskIO        bigint,
       LastBatch     varchar(100),
       ProgramName   varchar(100),
       SPID2         int,
       REQUESTID     int);

 INSERT INTO @sp_who3 EXEC SP_WHO2;

 -- output
 IF @exact_flag=1
 BEGIN
  -- exact (=) matches only
  SELECT * FROM @sp_who3
  WHERE (DBName=@DBName OR @DBName='*All*') AND
    ([Status]=@Status OR @Status='*All*') AND
    (SPID=@SPID OR @SPID=(-1)) AND
    ([Login]=@Login OR @Login='*All*') AND
    (HostName=@hostName OR @hostName='*All*') AND
    (Command=@Command OR @Command='*All*') AND
    (CPUtime>=@CPUtime_floor OR @CPUtime_floor=(-1)) AND
    (DiskIO>=@DiskIO_floor OR @DiskIO_floor=(-1)) AND
    (ProgramName=@ProgramName OR @ProgramName='*All*')
  ORDER BY LastBatch DESC
 END
 ELSE
 BEGIN
  -- wrap the string parameters in wildcards for LIKE (close-match) filtering
  SELECT @DBName='%'+CAST(@DBName as varchar(100))+'%',
    @Status='%'+CAST(@Status as varchar(40))+'%',
    @Login='%'+CAST(@Login AS varchar(50))+'%',
    @hostName='%'+CAST(@hostName AS varchar(100))+'%',
    @Command='%'+CAST(@Command AS varchar(200))+'%',
    @ProgramName='%'+CAST(@ProgramName AS varchar(100))+'%'

  SELECT * FROM @sp_who3
  WHERE (DBName LIKE @DBName OR @DBName='%*All*%') AND
    ([Status] LIKE @Status OR @Status='%*All*%') AND
    (SPID=@SPID OR @SPID=(-1)) AND
    ([Login] LIKE @Login OR @Login='%*All*%') AND
    (HostName LIKE @hostName OR @hostName='%*All*%') AND
    (Command LIKE @Command OR @Command='%*All*%') AND
    (CPUtime>=@CPUtime_floor OR @CPUtime_floor=(-1)) AND
    (DiskIO>=@DiskIO_floor OR @DiskIO_floor=(-1)) AND
    (ProgramName LIKE @ProgramName OR @ProgramName='%*All*%')
  ORDER BY LastBatch DESC
 END
END
GO
--SP_WHO3 @exact_flag=0, @ProgramName='repl', @CPUtime_floor=10000

 


Example:
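Here are a couple of hedged example calls (the database name, login and threshold values are made up for illustration):

-- Exact match: sessions from the 'sa' login in a specific database
EXEC sp_who3 @DBName='AdventureWorks', @Login='sa';

-- Close match: any command containing 'TASK' with a CPU time of at least 10,000
EXEC sp_who3 @exact_flag=0, @Command='TASK', @CPUtime_floor=10000;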


Epilogue

Before posting this I googled SP_WHO3. It turns out I was not the first to come up with this idea. Bummer. My SP_WHO3 is still great.

--ab

The new xmlsqlninja logo ('X-M-L'-'See-Quel' 'ninja')

Last Updated: 4/19/2013 (code cleanup)

Tuesday, August 28, 2012

Get the size of all databases

...without using dynamic SQL or sp_msForEachDB

A common task for DBAs is managing the size of all the databases on one or more instances. This often requires collecting database sizes using T-SQL; most methods I have found involve dynamic SQL or sp_msForEachDB. These methods work but, in each case, leave you exposed to the usual risks of dynamic SQL, including SQL injection.

SQL Server 2005 introduced the sys.master_files catalog view. sys.master_files contains everything you need for this (log and data file sizes) except the database name. That is okay, because it does provide database_id, which can be joined to sys.databases to get the name.

This query will get you all db sizes without any Dynamic SQL:

Get All Database Sizes:
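The original query was posted as an image; below is a sketch along the lines described above, assuming nothing beyond sys.master_files and sys.databases (sizes come back in 8 KB pages, converted to MB here):

-- One row per database: data, log and total size in MB
SELECT d.name AS [Database],
       SUM(CASE WHEN mf.type_desc = 'ROWS' THEN mf.size ELSE 0 END) * 8 / 1024.0 AS [DataMB],
       SUM(CASE WHEN mf.type_desc = 'LOG'  THEN mf.size ELSE 0 END) * 8 / 1024.0 AS [LogMB],
       SUM(mf.size) * 8 / 1024.0 AS [TotalMB]
FROM sys.master_files mf
INNER JOIN sys.databases d
       ON d.database_id = mf.database_id
GROUP BY d.name
ORDER BY [TotalMB] DESC;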

Monday, August 20, 2012

My Reusable, Element-Name Agnostic, XML to HTML table XSLT Transform - Part 1


Special Thanks

Like I said in my first post - this is my first blog. The best experience so far has been seeing where traffic to my site is coming from. Let me extend a special "Hello" to my readers in Hong Kong, Germany, Russia, Australia, South Africa, Indonesia, England, India (Hi Anu), Chicago, Bolingbrook, Woodridge and France. How cool is the Internet?!?! You say something in Chicago and someone listens in Bavaria. In some cases it's just one person but I will take it    :^))..

Intro

Ahhh... XML and SQL. As I have said before, none of this (blogs, the ability to communicate and collaborate globally with ease using the Internet) would be possible if not for the wonders of XML and SQL. I have written some posts about SQL but nothing yet about XML. Today that changes as I introduce my Reusable, Element-Name Agnostic, XML to HTML table XSLT Transform.

What it Does

You can take any XML file that has the same format/hierarchy structure as an HTML table, attach it to my transform, and get back an HTML table populated with your XML data. It accomplishes this very quickly, regardless of the element names, using some XSLT best practices and roughly 30 lines of XSLT code.

It will take this...

Source XML data

... and turn it into this...

XML data transformed into HTML table

... which, in a HTML browser, will produce this

HTML in browser

... and will do so regardless of the names used in the XML file passed to it.

Background

When doing XML XSLT development (or any kind of development) one of my primary objectives is to accomplish the task at hand with as little code as possible. This makes debugging easier and reduces the footprint for potential human error. My rule of thumb is “no solution is correct if a more elegant solution is available.” By elegant I mean: surprisingly simple yet highly effective. If a more elegant solution can deliver the same performance and accurate results, then you have more work to do.

To demonstrate I will start by taking a basic transform that performs this task for a specific XML file/structure with static element names and re-write it so that it is re-usable and takes advantage of the functional programming power of XSLT.

Example XML (Catalog.xml)

I took this right from W3Schools.com. This is the file they use in many of their examples.

Catalog.xml
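The file was embedded as an image in the original post; here is a truncated sketch of the W3Schools CD catalog structure it refers to (only the first two CDs shown):

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
  <cd>
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <country>UK</country>
    <company>CBS Records</company>
    <price>9.90</price>
    <year>1988</year>
  </cd>
  <!-- ...more cd elements... -->
</catalog>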


Desired Result (truncated for readability)

Title Artist Country Company Price Year
Empire Burlesque Bob Dylan USA Columbia 10.90 1985
Hide your heart Bonnie Tyler UK CBS Records 9.90 1988
Greatest Hits Dolly Parton USA RCA 9.90 1982

Original Transform (Transform V1)

This is the transform before applying some XSLT development best practices to it. It is simple and easy-to-understand but can be improved. 

Transform V1:
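The code itself was posted as an image; the sketch below is a hedged reconstruction of what such a "V1" transform typically looks like: the element names (catalog, cd, title, artist, and so on) are hard-coded and FOR-EACH loops build the rows and cells.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Works only for documents shaped exactly like Catalog.xml -->
  <xsl:template match="/catalog">
    <table border="1">
      <tr>
        <th>Title</th><th>Artist</th><th>Country</th>
        <th>Company</th><th>Price</th><th>Year</th>
      </tr>
      <xsl:for-each select="cd">
        <tr>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="artist"/></td>
          <td><xsl:value-of select="country"/></td>
          <td><xsl:value-of select="company"/></td>
          <td><xsl:value-of select="price"/></td>
          <td><xsl:value-of select="year"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>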


Newly Updated Transform

Below (Transform V7) is the finished product. The most notable change is that I removed the FOR-EACH loops for building columns (td) and rows (tr); FOR-EACH is not the best choice in XSLT for iterating through a node set (I still use one FOR-EACH to get the column headings). For the rest we use XSLT templates. All explicit references to element names (such as artist, price, year, etc.) are removed. Instead, the transform returns values based on their location in the node tree, using path expressions and predicates. Writing transforms in this manner makes them much more reusable because you are not married to XML structures based on element names.

Transform V7:
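The finished transform was also posted as an image. The sketch below is not the author's exact code, but it implements the approach described above: one FOR-EACH for the column headings, templates plus positional path expressions for everything else, and no element names anywhere.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Document element: build the table shell and one heading row -->
  <xsl:template match="/*">
    <table border="1">
      <tr>
        <!-- Headings come from the element names of the first "row" element -->
        <xsl:for-each select="*[1]/*">
          <th><xsl:value-of select="local-name()"/></th>
        </xsl:for-each>
      </tr>
      <xsl:apply-templates select="*"/>
    </table>
  </xsl:template>

  <!-- Every child of the document element becomes a table row -->
  <xsl:template match="/*/*">
    <tr><xsl:apply-templates select="*"/></tr>
  </xsl:template>

  <!-- Every grandchild becomes a table cell -->
  <xsl:template match="/*/*/*">
    <td><xsl:value-of select="."/></td>
  </xsl:template>
</xsl:stylesheet>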

In the next post (Part 2) we will examine how we made the original more effective and reusable.


Updated on 8/27/2012 at 8:49PM

Monday, July 9, 2012

Removing Unused Indexes (Indices)

Index design and maintenance is vital when creating transactional and multi-dimensional database solutions. I have seen numerous occasions where a 10 minute query was reduced to 10 seconds simply by adding an index where one should have been but wasn't. I have also seen a query complete in milliseconds in a company's Development environment while the exact same query took minutes in Production; after troubleshooting and reviewing the query plans we would determine that someone failed to move an index into Production with the rest of the objects. When used correctly, indexes make your SQL queries run faster. Unused indexes, however, are a drag on storage and performance; this article details how to identify and address unused indexes.

SQL Indexes 101

There are two kinds of indexes: clustered and nonclustered. To best understand the function of a clustered index, think of it as the page numbers in a book: each page is a unique record and the pages are stored in a specific order. Nonclustered indexes are like the index at the back of the book: specific words or topics are listed alphabetically and tell you on what page, pages, or page range you can find what you are looking for. It is worth noting, too, that a nonclustered index normally points back to the clustered index keys (the page numbers); if the table has no clustered index, it has to point to raw row locations instead. If not for page numbers in a book, what would an index in the back of the book direct you to?

Without an index on a book it would be impossible to find occurrences of a certain word or topic without scanning the entire book page by agonizing page. If we’re talking about a 5-page children’s book … no big deal. With a 1000-page technical book… big deal. The same is true with tables in a DB: if the table is small (say 50K rows or less), the query engine can retrieve data rather quickly using a table scan (SQL’s version of scanning a book page-by-page looking for something). A table scan on a large table, we’ll say 1M rows+, can take forever.

Hopefully this helps explain what indexes are for and why they are critical when dealing with large numbers of records.
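To make the analogy concrete, here is a minimal, hypothetical example (the table and column names are made up for illustration):

-- Hypothetical table
CREATE TABLE dbo.Orders
(      OrderID    int      NOT NULL,
       CustomerID int      NOT NULL,
       OrderDate  datetime NOT NULL);

-- Clustered index: physically orders the rows by OrderID (the "page numbers")
CREATE CLUSTERED INDEX cix_Orders_OrderID ON dbo.Orders (OrderID);

-- Nonclustered index: a separate structure that points back to the clustered key
-- (the "index at the back of the book")
CREATE NONCLUSTERED INDEX ix_Orders_CustomerID ON dbo.Orders (CustomerID);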

Unused SQL indexes 101

Understanding the impact of not having an index where there should be one is important. Understanding the impact of unused indexes is also very important and often overlooked. Going back to the book analogy… Say you are the publisher of a text book and need to make the following updates and changes: the content of page 12 is moving to page 14, the contents of page 26 are changed and page 50 is removed (which now means page 51 is page 50, page 52 is 51, etc.) Logic would dictate that, after making these changes, you would also need to update the book’s index; otherwise the index would no longer be relevant.

The same is true with database indexes: when a query UPDATEs/INSERTs/DELETEs records in one or more tables, SQL Server must also update the associated index(es). This adds overhead, especially when we are talking about large and/or multiple indexes. The additional work that SQL Server performs to keep your indexes up-to-date is justified when the indexes are used to improve performance. What benefit do these indexes provide if they are never used? None.

The problem is not just that unused indexes provide no value; they also consume vital system resources: RAM, local and/or SAN drive space, and backup space. So what can we do about them? The first step is to identify them.

Identifying unused indexes

Starting with SQL Server 2005, Microsoft introduced Dynamic Management Views (DMVs) and Dynamic Management Functions (DMFs). These return state information that can be used for monitoring and diagnosing problems with a SQL Server instance. The query below joins a DMV to a system catalog view to tell us which indexes are being used and which ones are not:

SELECT OBJECT_NAME(s.[object_id]) [Table Name],
       i.name [Index Name],
       user_updates [Total Writes],
       user_seeks + user_scans + user_lookups [Total Reads],
       user_updates - (user_seeks + user_scans + user_lookups) [Difference]
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i
       ON s.[object_id] = i.[object_id]
       AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
       AND s.database_id = DB_ID()
       AND user_updates > (user_seeks + user_scans + user_lookups)
       AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC

Sample Output:

Looking at Index1 of my index stats: the Total Writes number (6,059,833) represents a negative performance impact on the associated INSERT/UPDATE/DELETE queries. The Total Reads number (0) represents the number of times this index was used to make a query faster. In this example it's obvious that Index1 through Index6 have not provided any benefit.

So let's just drop them, right? Perhaps – but not quite yet. The DMV in the query above (dm_db_index_usage_stats) only begins collecting index usage when SQL Server starts; its counters are cleared on every restart. To determine how long this DMV has been collecting stats, run the query below (tempdb is recreated at startup, so its create date tells you when the instance last started):

SELECT Create_Date [UpSince],
       DATEDIFF(DD,Create_Date, GETDATE()) [DaysRunning]
FROM sys.databases WHERE name = 'tempdb';

You can also update your query to include this information as shown below:

DECLARE @RunningSince datetime =
      (SELECT Create_Date FROM sys.databases WHERE name = 'tempdb');

DECLARE @DaysRunning INT = DATEDIFF(DD,@RunningSince, GETDATE());

SELECT DB_NAME() [db],
       OBJECT_NAME(s.[object_id]) [Table],
       i.name [Index],
       user_updates [Total Writes],
       user_seeks + user_scans + user_lookups [Total Reads],
       user_updates - (user_seeks + user_scans + user_lookups) [Difference],
       @RunningSince [Running Since],
       @DaysRunning [Days Running]
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i
       ON s.[object_id] = i.[object_id]
       AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
AND s.database_id = DB_ID()
AND user_updates > (user_seeks + user_scans + user_lookups)
AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC

Sample Output:

Collecting Unused Index statistics over time

So what if we needed 3 months of stats but the server has only been running for a month? Bummer. To collect Index usage stats on a regular basis I suggest:

  1. Create a table to collect your index usage
  2. Update the above query to insert the result set into the table created in Step 1. Put the query into a stored procedure
  3. Schedule a SQL job that runs the stored procedure periodically (weekly, daily, whatever…)

Table DDL (Step 1):

USE {SomeDB}
GO

CREATE TABLE tbl_indexUsage
(
       [id] int IDENTITY(1,1),
       [db] varchar(100),
       [table] varchar(100),
       [Index] varchar(100),
       [Writes] int,
       [Reads] int,
       [Difference] int,
       [Running Since] datetime,
       [Days Running] int,
       PRIMARY KEY ([id])
)
GO

Stored Proc Query DDL (Step 2):

CREATE PROC usp_collectIndexUsage
AS
BEGIN
       DECLARE @RunningSince datetime =
            (SELECT Create_Date FROM sys.databases WHERE name = 'tempdb');
       DECLARE @DaysRunning INT = DATEDIFF(DD,@RunningSince, GETDATE());

       INSERT INTO tbl_indexUsage
            (db,[table],[Index],Writes,Reads,Difference,[Running Since],[Days Running])
             SELECT DB_NAME() [db],
                   OBJECT_NAME(s.[object_id]) [Table],
                   i.name [Index],
                   user_updates [Total Writes],
                   user_seeks + user_scans + user_lookups [Total Reads],
                   user_updates - (user_seeks + user_scans + user_lookups) [Difference],
                   @RunningSince [Running Since],
                   @DaysRunning [Days Running]
             FROM sys.dm_db_index_usage_stats s
             INNER JOIN sys.indexes i
                   ON s.[object_id] = i.[object_id]
             AND i.index_id = s.index_id
             WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
             AND s.database_id = DB_ID()
             AND user_updates > (user_seeks + user_scans + user_lookups)
             AND i.index_id > 1
             ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC
END
GO

Scheduled Job DDL (Step 3):

USE [msdb]
GO

BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

IF NOT EXISTS
(
       SELECT name
       FROM msdb.dbo.syscategories
       WHERE name=N'[Uncategorized (Local)]'
       AND category_class=1
)
BEGIN
       EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
       IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'collect Index Usage',
       @enabled=1,
       @notify_level_eventlog=0,
       @notify_level_email=0,
       @notify_level_netsend=0,
       @notify_level_page=0,
       @delete_level=0,
       @description=N'No description available.',
       @category_name=N'[Uncategorized (Local)]',
       @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'GetThatData',
       @step_id=1,
       @cmdexec_success_code=0,
       @on_success_action=1,
       @on_success_step_id=0,
       @on_fail_action=2,
       @on_fail_step_id=0,
       @retry_attempts=0,
       @retry_interval=0,
       @os_run_priority=0,
       @subsystem=N'TSQL',
       @command=N'EXEC usp_collectIndexUsage;',
       @database_name=N'ajbTest',
       @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'DailyIndexUsageCollection',
       @enabled=1,
       @freq_type=8,
       @freq_interval=1,
       @freq_subday_type=1,
       @freq_subday_interval=0,
       @freq_relative_interval=0,
       @freq_recurrence_factor=1,
       @active_start_date=20120709,
       @active_end_date=99991231,
       @active_start_time=0,
       @active_end_time=235959,
       @schedule_uid=N'305e6d6b-f93f-4158-bc91-769ecdb740e8'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
       IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

You can then collect your index usage history and use this information to make informed decisions about what indexes to keep and which ones to drop.
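As a hedged example of how that history might be used (assuming the tbl_indexUsage table created above), this query pulls the most recently collected rows for indexes that still show zero reads:

-- Most recent collection rows for indexes that have never been read
SELECT TOP (100)
       [db], [table], [Index], [Writes], [Reads], [Difference],
       [Running Since], [Days Running]
FROM tbl_indexUsage
WHERE [Reads] = 0
ORDER BY [id] DESC;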

Other Considerations

A few other considerations include: what kind of indexes are these - clustered or nonclustered? If they are clustered do they exist because of a primary key? Is the table replicated? These are some of the questions I need answered before removing unused indexes.

If these are clustered indexes then understand the impact on any nonclustered indexes on that table. E.g. If you drop a clustered index on a table with nonclustered indexes, all the nonclustered indexes are rebuilt to replace the clustered index keys with row pointers. This can be time consuming depending on the size, quantity and content of the nonclustered indexes. Removing a clustered index from an indexed view automatically removes all nonclustered indexes.

Sometimes you will have unused indexes that cannot be removed. For example: I recently set up transactional replication. In transactional (and merge) replication you must have a primary key on every published table. The published source tables in my transactional replication are written to by an application and then replicated to data warehouses. Though the clustered indexes associated with my PKs on the source tables are never used, they are associated with a PK and cannot be removed.

I have seen cases in the past where developers add a number of indexes to a table while doing performance tuning and testing. The indexes are garbage and would never be used in our query execution plans. Nonetheless, because of sloppy developer work they are never removed, and they make their way into Production.

Lastly: documentation is always key! In the BI world we say, "If you can't measure it, you can't manage it." Documentation helps us understand why an index exists in the first place.

Conclusion

Indexes speed up performance but aren’t overhead free. Understanding which indexes are helping performance and which ones are hurting performance will enable you to make informed decisions about which ones to keep and which ones are just wasting your server’s time.

Wednesday, June 27, 2012

Career Advice From Zombies

Normally I would pass on an article with this title but, as a subscriber to the Dice.com blog and someone currently reading World War Z (a great zombie book), I was intrigued and gave this a quick read...

This is a great article -- not just for job seekers but for everyone with aspirations to advance a skill and/or their careers, or those with a specific goal. Zombies lead by example: they demonstrate how to set goals and achieve those goals with a focused tenacity that all living persons should aspire to emulate.

Career Advice from Zombies


--ab

The new xmlsqlninja logo ('X-M-L'-'See-Quel' 'ninja')

Last Updated: 4/19/2013 (code cleanup, added new pic)

Tuesday, June 19, 2012

Recovering an unsaved SSMS Query

This is something I deal with from time to time; most recently yesterday, as a matter of fact.

So, you have been in SQL Server Management Studio (SSMS) all morning working on that query when the office lights flicker, then…. Darkness! Your monitor turns off and you hear your PC fans go quiet. You just lost power but, even worse, you just lost that query.

You know that you need to save your work as often as possible (and usually do) but now is not the time for that advice. Is there any way to get that work back?

Here's how the Ninja does it:

Steps to recover an unsaved query created in SSMS

1) Go to Start

2) Go into the Search programs and files box

Vista/Windows 7 - Search Programs and Files

3) Search for this (exactly as shown):

"~AutoRecover.

Your work should appear under Documents and is saved as ~autoRecover.{something}.sql

4) Right-click on one of the “~autoRecover” files and select Open File Location.

5) Your SQL files should be in that folder.

At this point you can open them and save them as a different name.



Behind the scene:

Starting with version 2005, SQL Server Management Studio (SSMS) includes an Auto Recover feature which automatically saves your work. It behaves like the Auto Recovery feature in Microsoft Word: if your system should crash or hang, you can retrieve the files from their recovered state.
This feature is not configurable via the SSMS GUI; it can only (to my knowledge) be enabled/disabled in the registry under:

2008: HKEY_CURRENT_USER\Software\Microsoft\Microsoft SQL Server\100\Tools\Shell\General\AutoRecover
2005: HKEY_CURRENT_USER\Software\Microsoft\Microsoft SQL Server\90\Tools\Shell\General\AutoRecover

Last Updated: May 1, 2013 (formatting)