After doing a bit of seraching I came across a solution by Syed Mehroz Alam on his blog. His solution involved the unpivot/pivot method. I started thinking if there was another way of unpivotting the data using XML and thus this post which is my delvings into TSQL ,and XML .
Just to recap what the challenge was, and thus what started my delving into XML:
Horizontal and Vertical sorting of a result set
Given the following table, sort horizontally and vertically and remove duplicates once sorted:
Result should look like :
To retrieve data in XML format from a SQL Statement the FOR XML clause is used. There are 4 mode options that can be used with this clause:
- RAW
- AUTO
- EXPLICIT
- PATH
Test Data
DECLARE @t TABLE ( c1 CHAR(1), c2 CHAR(1), c3 CHAR(1) )
INSERT INTO@t VALUES ( '2' , '1' , '3' )
INSERT INTO@t VALUES ( '3' , '2' , '1' )
INSERT INTO@t VALUES ( 'Z' , 'X' , 'Y' )
INSERT INTO@t VALUES ( 'B' , 'C' , 'D' )
INSERT INTO@t VALUES ( 'Y' , 'Z' , 'X' )
INSERT INTO@t VALUES ( 'B' , 'C' , 'A' )
Goal XML
The following xml is what is what I want to achieve:
Version1: FOR XML RAW
SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS rowid,
c1, c2, c3
FROM @t
FOR XML RAW('DataRow'), ROOT('Root'), Type
- The Rownumber () function is used to achieve a unique rowid
- The use of Select 1 in the order by statement of the Rownumber function is because the actual ordering is not important.
This will result in the following XML :
The above is really just an "XML" representation of the data in the table and not anywhere close to an "unpivotted" view of the table which is what I am trying to achieve.
VERSION2: FOR XML AUTO
SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS rowid,
c1, c2, c3
FROM @t
FOR XML AUTO , ROOT('Root'), Type
This option resulted in similar xml to the RAW option except that the "DataRow" tag names were generated as Row Tag name is only allowed with RAW or PATH mode of FOR XML.
The tag names that are generated are generally the table names used in the query , but as I am using a table variable it generates a random name ( xt0040_t in my case ). To alleviate the random generation of tags, one can use aliases. The following will generate identical xml to the FOR XML RAW option above.
SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS rowid,
DataRow.c1, DataRow.c2, DataRow.c3
FROM @t DataRow
FOR XML AUTO , ROOT('Root'), Type
Moving to an Unpivotted view
With some minor adjustments to either of the above queries ( as they both give identical results) and the use of a CTE , a "parent" / "Child" relationship can be simulated whereby the Columns ( c1, c2, and c3) are seen as children of the Parent , ie the rowid.
WITH [Columns] AS (
SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 )) AS rowid,
c1, c2, c3
FROM @t
)
SELECT DataRow.RowID,
( SELECT c1, c2, c3
FROM [Columns]
WHERE [Columns].RowID = DataRow.RowID
FOR XML AUTO, ELEMENTS, TYPE)
FROM [Columns] as DataRow
FOR XML AUTO, Root('Root');
This results in the following XML :
The above version gave a parent/child view but even though the columns c1, c2 and c3 are seperate tags now, they have different names, thus the result was still not what I wanted.
VERSION3: EXPLICIT
I found this mode rather cumbersome, and again could not produce the result I wanted.
I did manage to produce a similar result to above :
WITH theData AS
(
SELECT ROW_NUMBER() over (order by (SELECT 1)) AS RowID,
c1, C2, C3
FROM @t
)
SELECT 1 AS Tag,
NULL AS Parent,
RowID AS [Row!1!RowID],
NULL AS [Column!2!c1!ELEMENT],
NULL AS [Column!2!c2!ELEMENT],
NULL AS [Column!2!c3!ELEMENT]
FROM theData
UNION ALL
SELECT 2 AS Tag,
1 AS Parent,
RowID,
c1,
c2,
c3
FROM theData
ORDER BY [Row!1!RowID], [Column!2!c1!ELEMENT]
FOR XML EXPLICIT;
Resulting XML:
VERSION3: XPATH
Last but not least I played with the XPath mode and finally got my result.
SELECT
(
SELECT ROW_NUMBER() over (order by c1, c2, c3) AS '@rowid'
,'c1' AS 'Column/@Name'
, c1 AS 'Column/@value'
,NULL
,'c2' AS 'Column/@Name'
, c2 AS 'Column/@value'
,NULL
, 'c3' AS 'Column/@Name'
, c3 AS 'Column/@value'
,NULL
FROM @t
FOR XML PATH('DataRow'), Type
)
FOR XML PATH('Root')
The trick in the above statement is the insertion of the NULL selections between the "nodes".
UNPIVOT RESULT
Well finally having achieved the XML I needed using the XPATH, writing the statement to produce an unpivotted result was a piece of cake.
Assuming the above query puts its result in the variable @Xml ( ie set @xml = ( .... ) )
EXEC sp_xml_preparedocument @idoc OUTPUT, @xml
SELECT * , ROW_NUMBER() over (partition by rowID order by col) as colID
FROM OPENXML (@idoc, 'Root/DataRow/Column', 1 )
WITH ( Rowid char(1) '../@rowid',
Col char(1) '@value' )
To clarify the extra RowNumber function , this is necessary as pivotted data is always 3 dimensional. The COLID column can be used to repivot the data as it signifies the "column number" (c1, c2 and c3 )
The Challenges required end result:
To get back to the reason I started this delve into SQL, challenge #10 and my twist on things.
Here is the complete sql giving the end result for the challenge:
DECLARE @idoc int
DECLARE @XML varchar(max)
SET @xml = (
SELECT
(
SELECT ROW_NUMBER() over (order by c1, c2, c3) AS '@rowid'
,'c1' AS 'Column/@Name'
, c1 AS 'Column/@value'
,NULL
,'c2' AS 'Column/@Name'
, c2 AS 'Column/@value'
,NULL
, 'c3' AS 'Column/@Name'
, c3 AS 'Column/@value'
,NULL
FROM @t
FOR XML PATH('DataRow'), Type
)
FOR XML PATH('Root')
)
EXEC sp_xml_preparedocument @idoc OUTPUT, @xml
SELECT DISTINCT result.c1, result.c2, result.c3
FROM
(
SELECT ROWID, [1] as c1, [2] as c2, [3] as c3
FROM
(
SELECT * , ROW_NUMBER() over (partition by rowID order by col) as colID
FROM OPENXML (@idoc, 'Root/DataRow/Column', 1 )
WITH ( Rowid char(1) '../@rowid', Col char(1) '@value' )
) sorted
PIVOT ( min(col) for colID in ([1],[2],[3])) as pvt
) result
ORDER BY result.c1, result.c2, result.c3