I was surfing the net the other day trying to find a solution for a TSQL problem I was working on and came across this site Beyond Relational.
As I love challenges, I proceeded to try the TSQL challenge ( Nr 12 in August 2009 ).
The Challenge
"Build sequential ranges of dates with propogation to missing values".
The Context:
You are working for an online gaming company and as usual when we talk about games we need to manage scores. Some features in your system had recently changed. Before when a user get connected you only shown him its last score, but now you have to present him a graph month by month of its best score in each one since its first game until the current date.
The Scores:
YearMonth | Score |
----------------------------- |
200903 | 100 |
200803 | 95 |
200802 | 99 |
200801 | 100 |
200711 | 100 |
When a user connects for the first time after the deployment of the new system you will need to produce a table based on the original scores table with the following conditions:
- Create a new couple year/month for each missing month between two valid months of the original table original table
- For each new couple created, you should recopy the score of the last month he played.
- Continue the list until the current month (included).
Here is the resulting table you need to produce
YearMonth | Score |
----------------------------- |
200908 | 100 |
200907 | 100 |
200906 | 100 |
200905 | 100 |
200904 | 100 |
200903 | 100 |
200902 | 95 |
200901 | 95 |
200812 | 95 |
200811 | 95 |
200810 | 95 |
200809 | 95 |
200808 | 95 |
200807 | 95 |
200806 | 95 |
200805 | 95 |
200804 | 95 |
200803 | 95 |
200802 | 99 |
200801 | 100 |
200712 | 100 |
200711 | 100 |
My Solution
Well looking at the above I figured the first thing i needed to do was to produce a working set which had all the missing year months, thereby satisfying the first condition of the challenge ie Build sequential ranges of datesranges of dates
My second problem would be to figure out how to ensure that the values of these missing year/month values were propogated with the scores from the "last" month played prior to the inserted missing year/month value.
The approach I used involved the use of CTE (Common Table Expressions) .
I am going to post my whole solution first and then explain the individual parts.
-- Base Data
Declare @Scores TABLE
(
YearMonth
INT,
Score
INT,
)
INSERT @Scores VALUES ( 200903, 100 )
INSERT @Scores VALUES ( 200803, 95 )
INSERT @Scores VALUES ( 200802, 99 )
INSERT @Scores VALUES ( 200801, 100 )
INSERT @Scores VALUES ( 200711, 100 )
--------------------------------------------------------------------------------
WITH
-- relevant years
YEARS (someyear) AS
(
SELECT ( SELECT MIN ( YEAR ( CONVERT(datetime, Convert(varchar(6),Yearmonth) + '01' ) ) 0 )
FROM @Scores ) as someyear
UNION ALL
SELECT someyear + 1
FROM YEARS
WHERE someyear < YEAR( getdate() )
), -- all months
MONTHS (somemonth) AS
(
SELECT 1 AS somemonth
UNION ALL
SELECT somemonth + 1
FROM MONTHS
WHERE somemonth < 12
), -- COMBINE THE years and months to get an inflated range OF Year Month combo's.
YEARMONTHS ( YearMonth) AS
(
SELECT DISTINCT c.ym
FROM
(
SELECT CONVERT( int, CONVERT(varchar(4), a.someyear) +
CASE WHEN b.somemonth < 10
THEN '0' + CONVERT(varchar(2), b.somemonth)
ELSE CONVERT(varchar(2), b.somemonth)
END ) as ym
FROM YEARS a cross join MONTHS b
) c
WHERE c.ym >= ( SELECT MIN(Yearmonth) FROM @Scores )
AND c.ym <= ( SELECT CONVERT(int, CONVERT(varchar(6), YEAR( getdate() ) ) +
CASE
WHEN MONTH( getdate() ) < 10
THEN '0' + CONVERT( varchar(2), MONTH ( getdate() ) )
END )
)
), -- join to base data to fill in the existing known scores
result ( rownum, YearMonth, Score ) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY a.YearMonth) as rownum,
a.YearMonth, b.Score
FROM YEARMONTHS a
LEFT JOIN @Scores b ON b.YearMonth = a.YearMonth
) -- inflate with previous scores
endresult ( rownum, YearMonth, Score ) AS
(
SELECT rownum, YearMonth, Score
FROM result
UNION ALL
SELECT a.rownum, a.Yearmonth, b.Score
FROM result a
JOIN endresult b ON b.rownum = a.rownum-1
WHERE a.Score IS NULL
) -- get end result
SELECT DISTINCT x.YearMonth, x.Score
FROM
( SELECT ROW_NUMBER() OVER (ORDER BY rownum, yearmonth, score) AS num,
Yearmonth, Score
FROM endresult
) X
WHERE x.Score IS NOT NULL
ORDER BY Yearmonth DESC
Section 1 : YEARS
In this section I use a recursive CTE to create a "table" of relevant years.
- The anchor part of the CTE will select the smallest year from the scores table
- The recursive part of the CTE will accumulate the years with 1 UP UNTIL the current year.
Section 2: MONTHS
In this section I again use a recursive CTE to create a "table"of 12 months
Section 3: YEARMONTHS
In this section I create a combined "table" of years and months.
However as the MONTHS table is a comprehensive table of 12 records, I limit the YEARMONTHS "table" to start at the first Year Month combination from the original table scores up until the current Year month combination.
There are a few minor issues such as creating YYYYMM integers , thereby if the month number is less than 10, it must be prefixed with 0. You will notice converts to varchar with case statements and converts back to integers. This is all just to ensure that the yearmonth combinations are created as integers correctly.
Seciton 4: RESULT
In this section I (left) join the YEARMONTH table with the original Scores table so that the actual scores are filled in the months they occurred. A key factor in this table is the creation of a unique rownumber per record. This is vital for the next section.
At this stage I have achieved the first part of the solution ie : Build sequential ranges of datesranges of dates
Section 5: ENDRESULT
And now for the final pieces of the puzzle. The ENDRESULT table is a recursive query
whereby the result of a cross join with each previous record is generated.
The anchor part of the query returns the result table ( so not just 1 record )
the recursive part joins this table to itself based on previous rownumber.
Section 6: OUTPUT
In this section a unique rownumber over each of the records in ENDRESULT is generated in a subquery as when doing a select on ENDRESULT we need those records that do not have a NULL as a score.