donderdag 24 september 2009

Using XML in an SQL query to UNPIVOT data

To continue with my growing addicition to the SQL Challenges, I was looking at Challenge #10 from Beyond Relational (prior to the winning posted solutions and their explanations). 

After doing a bit of seraching I came across a solution by Syed Mehroz Alam on his blog. His solution involved the unpivot/pivot method. I started thinking if there was another way of unpivotting the data using XML and thus this post which is my delvings into TSQL ,and XML .

Just to recap what the challenge was, and thus what started my delving into XML:

Horizontal and Vertical sorting of a result set

Given the following table, sort horizontally and vertically and remove duplicates once sorted:



Result should look like :



To retrieve data in XML format from a SQL Statement the FOR XML clause is used. There are 4 mode options that can be used with this clause:
  1. RAW
  2. AUTO
  3. EXPLICIT
  4. PATH
To get an excellent explanation of how to use the above clauses refer to the article Using the FOR XML Clause to Return Query Result as XML by Robert Shelden.

Test Data
DECLARE @t TABLE ( c1 CHAR(1), c2 CHAR(1), c3 CHAR(1) )
INSERT INTO@t VALUES ( '2' '1' '3' )
INSERT INTO@t VALUES ( '3' '2' '1' )
INSERT INTO@t VALUES ( 'Z' 'X' 'Y' )
INSERT INTO@t VALUES ( 'B' 'C' 'D' )
INSERT INTO@t VALUES ( 'Y' 'Z' 'X' )
INSERT INTO@t VALUES ( 'B' 'C' 'A' )


Goal XML
The following xml is what is what I want to achieve:





Version1: FOR XML RAW

SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS rowid,
           c1, c2, c3
FROM @t
FOR XML RAW('DataRow'), ROOT('Root'), Type

  • The Rownumber () function is used to achieve a unique rowid
  • The use of Select 1 in the order by statement of the Rownumber function is because the actual ordering is not important.

This will result in the following XML :



The above is really just an "XML" representation of the data in the table and not anywhere close to an "unpivotted" view of the table which is what I am trying to achieve.

VERSION2:  FOR XML AUTO

SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS rowid,
           c1, c2, c3
FROM @t
FOR XML AUTO , ROOT('Root'), Type

This option resulted in similar xml to the RAW option except that  the "DataRow" tag names were generated as Row Tag name is only allowed with RAW or PATH mode of FOR XML.

The tag names that are generated are generally the table names used in the query , but as I am using a table variable it generates a random name ( xt0040_t  in my case ). To alleviate the random generation of tags, one can use aliases. The following will generate identical xml to the FOR XML RAW option above.

SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS  rowid,
           DataRow.c1,  DataRow.c2,   DataRow.c3
FROM @t      DataRow
FOR XML AUTO , ROOT('Root'), Type


Moving to an Unpivotted view

With some minor adjustments to either of the above queries ( as they both give identical results) and the use of a CTE ,  a "parent" / "Child" relationship can be simulated whereby the Columns ( c1, c2, and c3) are seen as children of the Parent  , ie the rowid.  

WITH [Columns] AS  (
      SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 )) AS rowid,
                c1, c2, c3
     FROM @t
)
SELECT DataRow.RowID,
           (  SELECT c1, c2, c3
              FROM [Columns]
              WHERE [Columns].RowID = DataRow.RowID
              FOR XML AUTO, ELEMENTS, TYPE)
FROM [Columns] as DataRow
FOR XML AUTO, Root('Root');

This results in the following XML :



The above version gave a parent/child view but even though the columns c1, c2 and c3 are seperate tags now, they have different names, thus the result was still not what I wanted.

VERSION3: EXPLICIT

I found this mode rather cumbersome, and again could not produce the result I wanted.
I did manage to produce a similar result to above :

WITH theData AS

(
   SELECT ROW_NUMBER() over (order by (SELECT 1)) AS RowID,
              c1, C2, C3
   FROM @t
)
SELECT 1 AS Tag,
           NULL AS Parent,
           RowID AS [Row!1!RowID],
           NULL AS [Column!2!c1!ELEMENT],
           NULL AS [Column!2!c2!ELEMENT],
           NULL AS [Column!2!c3!ELEMENT]
FROM theData
UNION ALL
SELECT 2 AS Tag,
           1 AS Parent,
           RowID,
           c1,
           c2,
           c3
FROM theData
ORDER BY [Row!1!RowID], [Column!2!c1!ELEMENT]
FOR XML EXPLICIT;

Resulting XML:




VERSION3: XPATH

Last but not least I played with the XPath mode and finally got my result.

SELECT
(
     SELECT ROW_NUMBER() over (order by c1, c2, c3) AS '@rowid'
               ,'c1' AS 'Column/@Name'
               , c1 AS 'Column/@value'
               ,NULL
               ,'c2' AS 'Column/@Name'
               , c2 AS 'Column/@value'
               ,NULL
               , 'c3' AS 'Column/@Name'
               , c3  AS 'Column/@value'
               ,NULL
     FROM @t
     FOR XML PATH('DataRow'), Type
)
FOR XML PATH('Root')

The trick in the above statement is the insertion of the NULL selections between the "nodes".

UNPIVOT RESULT

Well finally having achieved the XML I needed using the XPATH, writing the statement to produce an unpivotted result was a piece of cake. 

Assuming the above query puts its result in the variable @Xml  ( ie set @xml = ( .... )  )

EXEC sp_xml_preparedocument @idoc OUTPUT, @xml
SELECT * , ROW_NUMBER() over (partition by rowID order by col) as colID
FROM OPENXML (@idoc, 'Root/DataRow/Column', 1 )
WITH ( Rowid char(1) '../@rowid',
          Col     char(1) '@value' )

To clarify the extra RowNumber function , this is necessary as pivotted data is always 3 dimensional.  The COLID column can be used to repivot the data as it signifies the "column number"  (c1, c2 and c3 )

The Challenges required end result:

To get back to the reason I started this delve into SQL, challenge #10 and my twist on things.
Here is the complete sql giving the end result for the challenge:

DECLARE @idoc int

DECLARE @XML varchar(max)
SET @xml = (
  SELECT
  (
     SELECT ROW_NUMBER() over (order by c1, c2, c3) AS '@rowid'
               ,'c1' AS 'Column/@Name'
               , c1 AS 'Column/@value'
               ,NULL
               ,'c2' AS 'Column/@Name'
               , c2 AS 'Column/@value'
               ,NULL
               , 'c3' AS 'Column/@Name'
               , c3  AS 'Column/@value'
               ,NULL
     FROM @t
     FOR XML PATH('DataRow'), Type
  )
  FOR XML PATH('Root')
)
EXEC sp_xml_preparedocument @idoc OUTPUT, @xml
SELECT DISTINCT  result.c1, result.c2, result.c3
FROM
(
      SELECT ROWID, [1] as c1, [2] as c2, [3] as c3
      FROM
      (
          SELECT * , ROW_NUMBER() over (partition by rowID order by col) as colID
          FROM OPENXML (@idoc, 'Root/DataRow/Column', 1 )
          WITH ( Rowid char(1) '../@rowid', Col char(1) '@value' )
       ) sorted
       PIVOT ( min(col) for colID in ([1],[2],[3])) as pvt
) result
ORDER BY result.c1, result.c2, result.c3







Geen opmerkingen:

Een reactie posten