Friday, April 26, 2013

mdq.XmlTransform -- Part 1: Basic XSLT in SQL Server


Intro

In my article, Missing CLR Part 2, I called mdq.XmlTransform the “baddest CLR ever” because it adds a ton of new functionality to your SQL Server instance. In this article I will demonstrate some of that functionality. For many, the concept of XML transforms is new so I’ll keep it basic. The purpose of this article is not to sell you on XML transforms but rather to inform you that this technology is available. Hopefully, as more people see the potential of mdq.XmlTransform, more developers will take some of these ideas and run with them.


XSLT is about transforming Data

The conventional wisdom is that XSLT is used to transform XML into something else (such as a new XML file or text). This is only part of the truth. XSLT is for transforming data into data; that’s what it does. This means that you are not limited to XML as input or as output. XSLT can receive text, numbers (of any kind), dates, XML files, XML fragments, delimited strings, XHTML, HTML, etc, then transform that data, and then return text, numbers, dates, XML, HTML, or pretty much anything else you need.

mdq.XmlTransform can do math, parse strings, perform date calculations and almost anything else SQL can do to data. XSLT solves many complex XML problems better than T-SQL's limited XPath & XQuery functionality. It can also solve many problems SQL cannot solve yet and, unlike any Microsoft language, you can create new XSLT functions without creating new CLRs or compiling more code (as would be the case with LINQ and F#).


A quick note about performance

For many this is a new topic so I am not going to discuss performance considerations yet. There is plenty I can do to make the test code below run much faster but, for the sake of simplicity, I will table that for a future post.


Transforming XML Into SQL

In Setting up mdq.XmlTransform I showed you how to use XSLT to calulate a factorial. I did this to demonstrate recursion and computability using XSLT. Today I will show you the type of task that XSLT does better than regex, padindex or anyting else: transforming XML into SQL.

To demonstrate, I grabbed the HTML code from this Onion article and structured it similar (kinda) to an RSS XML feed (namespaces and CDATA removed for simplicity). The resulting XML code looks like this:

<articles>

  <article name="Starbucks Interview">

    <p>

        BURLINGTON, VT—Citing the loud screeching sounds of milk being steamed in the background, the overly crowded setting, and the fact that a job applicant’s future is being

        discussed over a tiny Formica table at a national coffee shop chain, <b>sources can now confirm that <i>the most depressing job interview you could possibly imagine is currently

        being conducted within a <u>local Starbucks.</u></i></b></p>

    <p>

        The breathtakingly sad interview, which many speculate is for an entry-level position at a small nonprofit organization, was made even more depressing when reports indicated

        that the female job applicant in her early to mid-20s seemed <i style="width:95.5%; ">visibly eager</i> to impress her interviewer, a man who conducts his job interviews at a

        <b style="font-family: comic sans ms /*spelling_....?^?^?*/">Starbucks</b>.

        </p>

    <p>

      <span style="height:90.2%;[xxx=%]">Thanks for meeting me here,</span> said the middle-aged man, who went on to explain that the <i>fulfilling</i> job pays little money and doesn’t include

            health insurance for at least the first eight months of employment. <span style="fff%[%]">It says here you went to Notre Dame. What was that like?</span></p>

    <p>

      <span style="fff%[%]">Ultimately, we’re looking for motivated people who can think fast on their feet and are up to any challenge,</span> the man added while a Josh Groban song played over a

                storewide speaker and <b style="sfty5">seven people lined up three feet away to use the bathroom</b>.

        </p>

    <p>

        Following the interviewer’s description of the <span style="color:blue; address: http://www.x.xxx%50%20%%20%%%">rewarding work</span> and <span style="fff%[%]">character-building

                opportunities, </span> sources confirmed that the interviewee, who was the only individual in the establishment wearing professional attire, went on to describe <b>her</b> work

                experience and career aspirations in front of approximately two dozen café patrons and Starbucks <u>employees</u>.

        </p>

    <p>

        Reports indicate that the interview reached a <b><i>new level of despair when </i></b>a Starbucks patron interrupted the meeting and asked the job applicant if he could take

        the unused seat next to her back to his table.

        </p>

    <p>

      <span style="fff%[%]">Obviously, I’d be thrilled to be involved with an organization like this,</span> said the interviewee, who had been seen aimlessly milling about the outdoor

        commercial plaza that houses the coffee shop for approximately 45 minutes prior to the interview. <span style="width: 90%; f.x.^^">I’m sure you’re seeing a lot of candidates, but I’d like

        to think I can bring something to the table that you can’t get from anyone else.</span></p>

    <p>

      <span style="fff%[%]">I honestly feel that this could be a great chance for me to grow both <b>personally and professionally,</b></span> added the woman, whose earnestness

            juxtaposed with the Starbucks employee sweeping around her feet was, sources noted, absolutely heartbreaking.

        </p>

    <p>

        According to witnesses, while the interview itself was completely and utterly pathetic, the moments leading up to it were equally depressing, with the job applicant having

        difficulty identifying her interviewer when she first entered the Starbucks and asking another man if he was Jim Oswald. In addition, when the two ultimately found one another,

        the awkward small talk they engaged in while waiting in line to order coffee was reportedly a tragic spectacle, to say the least.

        </p>

    <p>

        Other unbearably bleak and humiliating features of the interview included the amount of muffin crumbs that needed to be wiped to the ground before the two sat down and the

        homeless people within earshot who were talking incoherently about a range of subjects.

        </p>

    <p>

        Official records also confirmed that job interviews don’t break your f[%removed%]ing heart when they are conducted in an office as opposed to what is essentially a fast-food restaurant.

        </p>

  </article>

</articles>

I also dirtied up the HTML code to look like the nonsense I often have to deal with (lot's stuff like brackets ([ ]), % signs and other things that make this tricky for T-SQL and Regular Expressions.)


The Requirement

Let's say you are asked to normalize the XML data posted above. In, other words, the requirement is to take that XML data, strip the tags and then insert the data into a couple tables like so:


How would you do that?

This would be extremely difficult if not impossible using anything else yet it's quite simple using XSLT. First we create two XSLT functions - one to strip the tags, another to clean up the white space like this:

<!-- XML to SQL Function -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="text" indent="yes" xml:space="preserve" />

  <xsl:template match="/articles" xml:space="default">

    <!-- need: (1)par_id (2)title (3)par_nbr (4)content -->

    <xsl:apply-templates select="/articles/article" />

  </xsl:template>

  <xsl:template match="/articles/article" xml:space="default">

    <xsl:value-of select="position()" />||<xsl:apply-templates select="@name" />||

                <xsl:apply-templates select="p" /></xsl:template>

  <xsl:template match="p" xml:space="default">

    <xsl:apply-templates />

    <xsl:if test="position()!=last()">||</xsl:if>

  </xsl:template>

</xsl:stylesheet>

 

<!-- Normalize Space Function -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

        <xsl:output method="text" indent="yes" xml:space="preserve" />

        <xsl:template match="text()" xml:space="default">

                <xsl:value-of select="normalize-space()" />

        </xsl:template>

</xsl:stylesheet>

Once my XSLT functions were ready I would pass them to mdq.xmlTransform like this:

 SET NOCOUNT ON

GO


/*************************************************

(0) PREP: Create output tables:

       1. for the article (article_id, article)

       2. for the content (content_id, article_id, paragraph_nbr, paragraph)

*************************************************/

IF OBJECT_ID('tempdb..#articles') IS NOT NULL

       DROP TABLE #articles;

IF OBJECT_ID('tempdb..#article_content') IS NOT NULL

       DROP TABLE #article_content;

IF OBJECT_ID('tempdb..#tmp') IS NOT NULL

       DROP TABLE #tmp;

GO


CREATE TABLE #articles (article_id int identity primary key, article varchar(100) unique NOT NULL);

CREATE TABLE #article_content (content_id int identity primary key, article_id int NOT NULL, p_nbr int NOT NULL, paragraph varchar(1000) NOT NULL);

GO


/*************************************************

(1) XML DATA: Get the article in XML format

*************************************************/

DECLARE @xml xml='

<articles>

    <article name="Starbucks Interview">

        <p>

        BURLINGTON, VT—Citing the loud screeching sounds of milk being steamed in the background, the overly crowded setting, and the fact that a job applicant’s future is being

        discussed over a tiny Formica table at a national coffee shop chain, <b>sources can now confirm that <i>the most depressing job interview you could possibly imagine is currently

        being conducted within a <u>local Starbucks.</u></i></b>

        </p>

        <p>

        The breathtakingly sad interview, which many speculate is for an entry-level position at a small nonprofit organization, was made even more depressing when reports indicated

        that the female job applicant in her early to mid-20s seemed <i style="width:95.5%; ">visibly eager</i> to impress her interviewer, a man who conducts his job interviews at a

        <b style="font-family: comic sans ms /*spelling_....?^?^?*/">Starbucks</b>.

        </p>

        <p>

        <span style="height:90.2%;[xxx=%]">Thanks for meeting me here,</span> said the middle-aged man, who went on to explain that the <i>fulfilling</i> job pays little money and doesn’t include

            health insurance for at least the first eight months of employment. <span style="fff%[%]">It says here you went to Notre Dame. What was that like?</span>

        </p>

         <p>

        <span style="fff%[%]">Ultimately, we’re looking for motivated people who can think fast on their feet and are up to any challenge,</span> the man added while a Josh Groban song played over a

              speaker and <b style="sfty5">seven people lined up three feet away to use the bathroom</b>.

        </p>

         <p>

        Following the interviewer’s description of the <span style="color:blue; address: http://www.x.xxx%50%20%%20%%%">rewarding work</span> and <span style="fff%[%]">character-building

              opportunities, </span> sources confirmed that the interviewee, who was the only individual in the establishment wearing professional attire, went on to describe <b>her</b> work

              experience and career aspirations in front of approximately two dozen café patrons and Starbucks <u>employees</u>.

        </p>

       <p>

        Reports indicate that the interview reached a <b><i>new level of despair when </i></b>a Starbucks patron interrupted the meeting and asked the job applicant if he could take

        the unused seat next to her back to his table.

        </p>

        <p>

        <span style="fff%[%]" >Obviously, I’d be thrilled to be involved with an organization like this,</span> said the interviewee, who had been seen aimlessly milling about the outdoor

        commercial plaza that houses the coffee shop for approximately 45 minutes prior to the interview. <span style="width: 90%; f.x.^^">I’m sure you’re seeing a lot of candidates, but I’d like

        to think I can bring something to the table that you can’t get from anyone else.</span>

        </p>

        <p>

        <span style="fff%[%]">I honestly feel that this could be a great chance for me to grow both <b>personally and professionally,</b></span> added the woman, whose earnestness

            juxtaposed with the Starbucks employee sweeping around her feet was, sources noted, absolutely heartbreaking.

        </p>

        <p>

        According to witnesses, while the interview itself was completely and utterly pathetic, the moments leading up to it were equally depressing, with the job applicant having

        difficulty identifying her interviewer when she first entered the Starbucks and asking another man if he was Jim Oswald. In addition, when the two ultimately found one another,

        the awkward small talk they engaged in while waiting in line to order coffee was reportedly a tragic spectacle, to say the least.

        </p>

        <p>

        Other unbearably bleak and humiliating features of the interview included the amount of muffin crumbs that needed to be wiped to the ground before the two sat down and the

        homeless people within earshot who were talking incoherently about a range of subjects.

        </p>

        <p>

        Official records also confirmed that job interviews don’t break your f[%removed%]ing heart when they are conducted in an office as opposed to what is essentially a fast-food restaurant.

        </p>

    </article>

</articles>

';


/*************************************************

(2) TRANSFORMS: Create two transforms:

       1. To parse the XML

       2. Trims excess space

*************************************************/

DECLARE @xslt xml='

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="text" indent="yes" xml:space="preserve" />

       <xsl:template match="/articles" xml:space="default">

              <!-- need: (1)par_id (2)title (3)par_nbr (4)content -->

              <xsl:apply-templates select="/articles/article" />

    </xsl:template>

       <xsl:template match="/articles/article" xml:space="default">

              <xsl:value-of select="position()" />||<xsl:apply-templates select="@name" />||

              <xsl:apply-templates select="p" />

       </xsl:template>

       <xsl:template match="p" xml:space="default">

              <xsl:apply-templates /><xsl:if test="position()!=last()">||</xsl:if>

       </xsl:template>

</xsl:stylesheet>

';

DECLARE @xslt_clean xml='

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="text" indent="yes" xml:space="preserve" />

       <xsl:template match="text()" xml:space="default">

              <xsl:value-of select="normalize-space()"/>

    </xsl:template>

</xsl:stylesheet>

';


/*************************************************

(3) EXECUTE TRANSFORMS & TRIM SPACE

       1. Transform the XML and pass the result tree to @val

       2. Trim excess space (free up memory)

*************************************************/

DECLARE @val varchar(7000)=(SELECT mdq.XmlTransform(@xml, @xslt));

SET @val=(SELECT mdq.XmlTransform(@val, @xslt_clean));

WITH article_info AS

(      SELECT sequence,

RTRIM(LTRIM(token)) AS value

       FROM mdq.split(@val, '||', 1, '', 0))

SELECT Sequence, ROW_NUMBER() OVER (ORDER BY sequence) ordinal, value

INTO #tmp FROM article_info;


/*************************************************

(4) PERFORM INSERTS

       1. articles

       2. content (with a_id as pk for a, fk for ac)

*************************************************/

INSERT INTO #articles

       SELECT value FROM #tmp WHERE Sequence=2;

INSERT INTO #article_content (article_id, p_nbr, paragraph)

       SELECT MAX(article_id) OVER (ORDER BY article_id), Sequence-2,

value

       FROM #tmp, #articles a

       WHERE value<>a.article AND value<>CAST(article_id AS varchar(10))

GO


/*************************************************

(5) TESTING: Ensure that data is correct.

*************************************************/

SELECT * FROM #articles;

SELECT * FROM #article_content;


/*************************************************

(6) CLEANUP

*************************************************/

DROP TABLE #tmp;

DROP TABLE #articles;

DROP TABLE #article_content;

... and that's it.

One thing to note is that I used a mdq.split, a CLR splitter function. You don't need a CLR splitter for this; that was the one I was using when I tested this code.


But what about portability?!?!

That's that best part. The XSLT functions I wrote were developed using XSLT 1.0. Not some "flavor" of XSLT, not T-XSLT, XSLT.NET-sharp, PL/XML or DBTransforms2 (I made those up). The XML format is version 1.0. People have been mastering XSLT 1.0 since last century (1999) because it works, it's proven and isn't going anywhere. Yeah, it's a little tricky but this opensource, non-proprietary functional coding goodness is available to you and your SQL instance at no cost and without compiling a new DLL or any new SQL objects. I have done transforms in MYSQL and Oracle using code I originally wrote for SQL Server and did not need to change a single line. That's what portability sounds like to me.


Conclusion

mdq.xmlTransform is a powerful tool; in this post I demonstrated how to use it to transform XML into relational data. As I said earlier, XSLT is about transforming data not just XML. If you don't like XML then think of mdq.xmlTransform as a way to turn your XML into SQL, CSV or anything else. XSLT 3.0 has great support for JSON. More about that in future posts. Thanks for reading!


--ab

The new xmlsqlninja logo ('X-M-L'-'See-Quel' 'ninja')

P.S. Two Declarative programming languages, 0 loops!



Last Updated: 4/27/2013 (Posted, re-posted with typo's fixed and extra code)

8 comments:

  1. Thank you Aparna! I'm glad you liked the article.

    ReplyDelete
  2. "MDQXMLTransform Part 1: Basic XSLT" appears to be a tutorial or guide on using XSLT VPN Movies Lag (Extensible Stylesheet Language Transformations) for XML transformation of context.

    ReplyDelete