Tuesday, May 28, 2013

mdq.regex CLR Functions -- Part 2: mdq.RegexMatches


Updated 12/7/2013

I updated this article on 12/7/2013 to include a link to instructions for enabling CLR Integration as and an updated link for the instructions to create the [Microsoft.MasterDataServices.DataQuality] assembly.


Intro

This is the second article in my series, mdq.regex CLR Functions. In my previous article about mdq.regex functions, mdq.regex CLR Functions -- Part 1 I introduced mdq.RegexIsMatch, a scalar CLR function that returns a bit based on if a regex pattern is matched. Today we will review mdq.RegexMatches.

mdq.RegexMatches is a Table Valued CLR function. Like mdq.RegexIsMatch, setup is fast, easy and this CLR was developed, tested, tuned and QA’d by Microsoft.

Setup

To create this function you will need:

  1. A schema named, “mdq” in your target DB
  2. CLR integration enabled
  3. The [Microsoft.MasterDataServices.DataQuality] assembly

See this article for details on enabling CLR integration and this article for the steps to create the assembly.


Run the following DDL to create mdq.RegexMatches:

CREATE FUNCTION mdq.[RegexMatches](@input [nvarchar](4000), @pattern [nvarchar](4000), @mask [tinyint] = 0)

RETURNS TABLE

(      [Sequence] [int] NULL,

       [Token] [nvarchar](4000) NULL)

WITH EXECUTE AS CALLER

AS

EXTERNAL NAME [Microsoft.MasterDataServices.DataQuality].[Microsoft.MasterDataServices.DataQuality.SqlClr].[RegexMatches]

GO



Using mdq.RegexMatches

mdq.RegexMatches searches an input string for all occurrences of a regular expression and returns all the successful matches. This function uses the regular expression functionality of the Matches method in the Microsoft .NET Framework.


Syntax

mdq.RegexMatches (input,pattern,mask)


Arguments

input

Is the input string for which to find matches. input is nvarchar(4000) with no default.

pattern

Is the regular expression pattern to use for matching. pattern is nvarchar(4000) with no default.

mask

Is the RegexOptions mask that specifies the behavior of the regular expression. For more information, see mdq.RegexMask (Transact-SQL).


mdq.RegexMatches Code Samples

In the first example below I am searching for numeric patterns. For the second example I will demonstrate how to search for records based on a word pattern. We will conclude with a technique for querying a table for valid and invalid EMail addresses.

mdq.regexMatches to find numeric patterns

-- (1) Let's find this pattern: (one or more numbers)-(one or more numbers)-(one or more numbers) [e.g. 222-111-9999]

SELECT Token AS pattern

FROM clr.RegexMatches(N'XYXYXYXYYX11-222-333XYXYXYXYXY', N'\d+-\d+-\d+', 0);

GO

 

-- (2) Let's find patterns of one or more numbers

DECLARE @text varchar(1000)='1 is one, 2 is two and 3 makes three',

        @find_cons_digits varchar(20)='\d+';

SELECT Sequence, Token AS value

FROM clr.RegexMatches(@text, @find_cons_digits, 0);

GO

 

 

-- (3) Let's find sequenses of two consecutive numbers

DECLARE @text varchar(1000)='1 is one, 2 is two and 3 makes three',

        @find_cons_digits varchar(20)='\d+';

 

SELECT  @text='1 is zero, 11 is one, 22 is two... 345678 makes three, four and five',

        @find_cons_digits='\d{2}';

 

SELECT Sequence, Token AS value

FROM clr.RegexMatches(@text, @find_cons_digits, 0);

GO


mdq.regexMatches to find word patterns

-- (4) Let's perform a word count

DECLARE @quote varchar(1000)='The world is a dangerous place to live; not because of the people who are evil, but because of the people who dont do anything about it. --Albert Einstein',

                @quote2 varchar(100)='apple orange orange pear',

                @word_count varchar(20)='\b(\w+)',

                @ab_count varchar(20)='\b([ab]\S+)',

                @find_duplicates varchar(50)='\b(\w+?)\s\1\b';

 

SELECT COUNT(*) AS [word count]

FROM clr.RegexMatches(@quote,@word_count,0);

-------------------------------------------------------

 

-- (5) Let's find how many words begin with A or B

SELECT COUNT(*) AS [words that begin with A or B]

FROM clr.RegexMatches(@quote, @ab_count,0);

 

-- (6) Let's find repeated words

SELECT @quote2 AS quote, Token AS [repeated words]

FROM clr.RegexMatches(@quote2,@find_duplicates,0);

 

-- (7) Let's normalize a string, a series of numbers or both:

SELECT Sequence, Token AS val

FROM clr.RegexMatches(N'NormalizeMe', N'([a-zA-z])', ''); --string

 

SELECT Sequence, Token AS val

FROM clr.RegexMatches(N'54321', N'\d', ''); --series of numbers

 

SELECT Sequence, Token AS val

FROM clr.RegexMatches(N'text1234', N'([a-zA-z0-9])', ''); --text and numbers


Query that returns valid Email addresses and one that only returns invalid addresses:

IF OBJECT_ID('tempdb..#customer_email') IS NOT NULL

        DROP TABLE #customer_email;

 

CREATE TABLE #customer_email

(       email_id int identity primary key,

        cust_id int NOT NULL,

        email nvarchar(100) unique NOT NULL);

 

INSERT INTO #customer_email (cust_id, email)

        SELECT  1,'john.doe@a+++cme. com'       UNION ALL

        SELECT  1,'JohnDoe@somewhere.com'   UNION ALL

        SELECT  2,'Sally.Smith@Microsoft'   UNION ALL

        SELECT  3,'AlChurch@turning.net'    UNION ALL

        SELECT  4,'JaneDoe@aol.com';

GO

 

DECLARE @validEmail nvarchar(1000);

SET @validEmail='^(([a-zA-Z0-9!#\$%\^&\*\{\}''`\+=-_\|/\?]+(\.[a-zA-Z0-9!#\$%\^&\*\{\}''`\+=-_\|/\?]+)*){1,64}@(([A-Za-z0-9]+[A-Za-z0-9-_]*){1,63}\.)*(([A-Za-z0-9]+[A-Za-z0-9-_]*){3,63}\.)+([A-Za-z0-9]{2,4}\.?)+){1,255}$';

 

-- (1) This will return records with only valid EMail addresses

SELECT e. *

FROM #customer_email e

CROSS APPLY clr.RegexMatches(e.email,@validEmail, 0);

 

-- (2) This will return records with only invalid EMail addresses

SELECT e.email_id, e.cust_id, e.email

FROM #customer_email e

EXCEPT

SELECT ex.email_id, ex.cust_id, ex.email

FROM #customer_email ex

CROSS APPLY clr.RegexMatches(ex.email, @validEmail, 0);

 

DROP TABLE #customer_email;

 


... EMail Query results:



Conclusion

Today we reviewed the steps for setting up and using mdq.RegexMatches. Thanks for reading!


--ab

The new xmlsqlninja logo ('X-M-L'-'See-Quel' 'ninja')

Last Updated: 12/7/2013 (minor updates, repaired broken links)

Monday, May 27, 2013

5 Career Killers You Must Absolutely Avoid

5 Career Killers You Must Absolutely Avoid

(via Dice News in Tech)

Here are a couple of obvious ways to get yourself in trouble: Insult your boss on Facebook, and look for a new job on company time. But as annoyed as your boss might get over those, they’re usually survivable mistakes, at least as long as you don’t repeat them. The real dangers that could lead…

Tuesday, May 21, 2013

Reading XML Data


Intro

In this post I will demonstrate how to query XML data. This is one of those things that is much easier to learn by example so I provided five examples. I won't get into how to get the data in or out of SQL Server because that topic is outside the scope of this article. All you need to know is that it's easy if you know a little WMI scripting, powershell or .NET.



Examples

In the examples below I will read XML data from:

  1. An XML fragment from a MS Word Doc
  2. An XML fragment from an SSRS Report
  3. An event from the Windows Event Viewer
  4. An XML fragment from an SSIS package
  5. An XML fragment from a SQL trace

For these examples I will not include the entire XML document because the XML can be quite verbose. Instead I am using XML fragments (a portion of the entire XML document). This way you can get the general idea without having to copy/paste a ton of XML.


Setting up the sample data

Use this code:

/*******************************************************

(0) Create some sample data

 

Note: For simplicity I am only including XML fragments and

I removed the namespaces, namespace declarations, etc.

*******************************************************/

IF OBJECT_ID('tempdb..#data') IS NOT NULL

       DROP TABLE #data;

 

CREATE TABLE #data (id int primary key, xmldesc varchar(100) not null, doc xml not null);

 

DECLARE @worddoc xml='

<wordDocument>

<body>

       <p><r><t>1:  Paragraph 1...</t></r></p>

       <p><r><t>2:  Paragraph 2...</t></r></p>

       <p><r><t>3:  Paragraph 3...</t></r></p>

</body>

</wordDocument>';

DECLARE @ssrs_report xml=

REPLACE(CAST(

'<Report>

       <DataSources>

              <DataSource Name="QuestNTDev">

                     <DataSourceReference>Something_Dev</DataSourceReference>

                     <rd:SecurityType>None</rd:SecurityType>

                     <rd:DataSourceID>54f84859-75fd-45a8-a5b7-5f6fec3ede7b</rd:DataSourceID>

              </DataSource>

       </DataSources>

       <DataSets>

              <DataSet Name="ds_p_major_region"/>

              <DataSet Name="ds_p_country_code"/>

              <DataSet Name="ds_p_acct_group"/>

              <DataSet Name="ds_p_currency_code"/>

              <DataSet Name="ds_p_profit_center"/>

              <DataSet Name="ds_p_office_id"/>

              <DataSet Name="ds_p_total_by_office"/>

              <DataSet Name="ds_p_subtotal_on_summary"/>

              <DataSet Name="ds_get_spid"/>

              <DataSet Name="ds_output"/>

       </DataSets>

</Report>' AS varchar(1000)),'rd:','');

DECLARE @events xml='

<Events>

       <Event>

              <System>

                     <Provider Name="MSSQL$SQL2012"/>

                     <EventID Qualifiers="16384">17137</EventID>

                     <TimeCreated SystemTime="2012-05-10T16:18:53.000000000Z"/>

                     <EventRecordID>16563</EventRecordID>

                     <Channel>Application</Channel>

                     <Computer>SQLSERVER1</Computer>

              </System>

              <EventData>

                     <Data>ReportServer$SQL2012TESTTempDB</Data>

                     <Binary>F14200000A000000...</Binary>

              </EventData>

       </Event>

       <Event>

              <System>

                     <Provider Name="MSSQL$SQL2012"/>

                     <EventID Qualifiers="19999">1000225</EventID>

                     <TimeCreated SystemTime="2013-05-6T16:18:53.000000000Z"/>

                     <EventRecordID>16999</EventRecordID>

                     <Channel>Application</Channel>

                     <Computer>SQLSERVER1</Computer>

              </System>

              <EventData>

                     <Data>blah</Data>

                     <Binary>F14200000A000000...</Binary>

              </EventData>

       </Event>

</Events>';

DECLARE @ssis_package xml=

REPLACE(CAST(

'<DTS:Executable DTS:ExecutableType="SSIS.Package.2">

  <DTS:Property DTS:Name="PackageFormatVersion">2</DTS:Property>

  <DTS:Property DTS:Name="VersionComments"></DTS:Property>

  <DTS:Property DTS:Name="CreatorName">DOMAIN\username</DTS:Property>

  <DTS:Property DTS:Name="CreatorComputerName">AJBTEST</DTS:Property>

  <DTS:Property DTS:Name="CreationDate"

                DTS:DataType="7">11/25/2012 3:35:28 PM</DTS:Property>

</DTS:Executable>

' AS varchar(1000)),'DTS:','');

DECLARE @sqlTrace xml='

<TraceData>

  <Events>

    <Event id="65534" name="Trace Start">

      <Column id="14" name="StartTime">2013-05-10T17:21:28.007-05:00</Column>

    </Event>

    <Event id="13" name="SQL:BatchStarting">

      <Column id="1" name="TextData">SELECT 1</Column>

      <Column id="3" name="DatabaseID">7</Column>

      <Column id="35" name="DatabaseName">ajbTest</Column>

      <Column id="12" name="SPID">60</Column>

      <Column id="14" name="StartTime">2013-05-10T17:21:48.917-05:00</Column>

      <Column id="26" name="ServerName">MYSERVER\SQL2012Express</Column>

    </Event>

    <Event id="65533" name="Trace Stop">

      <Column id="14" name="StartTime">2013-05-10T17:21:54.46-05:00</Column>

    </Event>

  </Events>

</TraceData>';

 

INSERT INTO #data

       SELECT 1, 'An XML fragment from a MS Word Doc', @worddoc UNION ALL

       SELECT 2, 'An XML fragment from an SSRS Report', @ssrs_report UNION ALL

       SELECT 3, 'An event from the Event Viewer',            @events UNION ALL

       SELECT 4, 'An XML fragment from an SSIS pkg',   @ssis_package UNION ALL

       SELECT 5, 'An XML fragment from a SQL trace',   @sqlTrace;


The contents of #data will look like this:




Reading XML data

Note my comments in the code

/*******************************************************

(1) Reading from a word doc

*******************************************************/

SELECT p.value('(r/t/text())[1]', 'varchar(20)') AS p_num

FROM #data

CROSS APPLY doc.nodes('wordDocument/body/p') AS doc(p)

WHERE id=1;

 

/*******************************************************

(2) Reading from an SSRS Report

*******************************************************/

SELECT datasource.value('(DataSourceReference/text())[1]', 'varchar(20)') AS DataSource,

              dataset.value('(@Name)', 'varchar(20)') AS DataSet

FROM #data

CROSS APPLY doc.nodes('Report/DataSources/DataSource') AS datasources(datasource)

CROSS APPLY doc.nodes('Report/DataSets/DataSet') AS datasets(dataset)

WHERE id=2;

 

/*******************************************************

(3) Reading from extracted event viewer data

*******************************************************/

SELECT [System].value('(Computer/text())[1]', 'varchar(20)') AS SQLBox,

              [System].value('(../EventData/Data/text())[1]', 'varchar(50)') AS DB,

              [System].value('(TimeCreated/@SystemTime)[1]', 'varchar(20)') AS StartupTime

FROM #data

CROSS APPLY doc.nodes('Events/Event/System') AS events([System])

WHERE id=3

AND [System].value('(EventRecordID/text())[1]', 'int')=16563

AND [System].value('(Channel/text())[1]', 'varchar(20)')='Application';

 

 

/*******************************************************

(4) Reading from a SSIS package

*******************************************************/

SELECT x.value('(@ExecutableType)[1]', 'varchar(100)') AS PkgType,

              x.value('(Property[@Name="CreatorName"])[1]', 'varchar(100)') AS Creator,

              x.value('(Property[@Name="CreationDate"])[1]', 'varchar(100)') AS CreatedOn,

              x.value('(Property[@Name="CreatorComputerName"])[1]', 'varchar(100)') AS Computer

FROM #data

CROSS APPLY doc.nodes('Executable') AS ex(x)

WHERE id=4;

 

 

/*******************************************************

(5) Reading from an SQL trace (XML format)

*******************************************************/

SELECT  e.value('(@name)[1]', 'varchar(100)') AS [event],

              e.value('(Column[@name="StartTime"]/text())[1]', 'varchar(100)') AS StartTime,

              e.value('(Column[@id="1"]/text())[1]', 'varchar(100)') AS Command

FROM #data

CROSS APPLY doc.nodes('TraceData/Events/Event') AS events(e)

WHERE id=5;



... and the results:




Conclusion

Today I demonstrated how to use T-SQL to read from different XML data structures. Thanks for reading!


--ab

The new xmlsqlninja logo ('X-M-L'-'See-Quel' 'ninja')

Last Updated: 5/21/2013 (Posted)