maandag 7 september 2009

TSQL Challenge #12

I was surfing the net the other day trying to find a solution for a TSQL problem I was working on and came across this site Beyond Relational.

As I love challenges, I proceeded to try the TSQL challenge ( Nr 12 in August 2009 ).


The Challenge
"Build sequential ranges of dates with propogation to missing values".

 
The Context:

You are working for an online gaming company and as usual when we talk about games we need to manage scores. Some features in your system had recently changed. Before when a user get connected you only shown him its last score, but now you have to present him a graph month by month of its best score in each one since its first game until the current date.

The Scores:


YearMonthScore
-----------------------------
200903100
20080395
20080299
200801100
200711100

When a user connects for the first time after the deployment of the new system you will need to produce a table based on the original scores table with the following conditions:

  • Create a new couple year/month for each missing month between two valid months of the original table original table
  • For each new couple created, you should recopy the score of the last month he played.
  • Continue the list until the current month (included).

Here is the resulting table you need to produce

YearMonthScore
-----------------------------
200908100
200907100
200906100
200905100
200904100
200903100
20090295
20090195
20081295
20081195
20081095
20080995
20080895
20080795
20080695
20080595
20080495
20080395
20080299
200801100
200712100
200711100

My Solution

Well looking at the above I figured the first thing i needed to do was to produce a working set which had all the missing year months, thereby satisfying the first condition of the challenge ie Build sequential ranges of datesranges of dates

My second problem would be to figure out how to ensure that the values of these missing year/month values were propogated with the scores from the "last" month played prior to the inserted missing year/month value.

The approach I used involved the use of CTE (Common Table Expressions) .

I am going to post my whole solution first and then explain the individual parts.

-- Base Data
Declare @Scores TABLE

(
     YearMonth        INT,
     Score              INT,
)
INSERT @Scores VALUES ( 200903, 100 )
INSERT @Scores VALUES ( 200803, 95 )
INSERT @Scores VALUES ( 200802, 99 )
INSERT @Scores VALUES ( 200801, 100 )
INSERT @Scores VALUES ( 200711, 100 )
--------------------------------------------------------------------------------
WITH 
-- relevant years

YEARS  (someyear) AS

(
 
    SELECT (  SELECT MIN ( YEAR ( CONVERT(datetime, Convert(varchar(6),Yearmonth) + '01' ) ) 0 )

                      FROM @Scores  )  as someyear 
    UNION ALL
 
    SELECT someyear + 1

    FROM YEARS 
    WHERE someyear < YEAR( getdate() )

),
  -- all months
MONTHS (somemonth) AS
(
  SELECT 1 AS somemonth 
  UNION ALL 

  SELECT somemonth + 1
  FROM MONTHS 
  WHERE somemonth < 12

),  -- COMBINE THE years and months to get an inflated range OF Year Month 
combo's.
YEARMONTHS ( YearMonth) AS
(
   SELECT DISTINCT c.ym
   FROM
   (
      SELECT CONVERT( int, CONVERT(varchar(4), a.someyear) +
                               CASE WHEN b.somemonth < 10
                                   THEN '0' + CONVERT(varchar(2), b.somemonth)
                                   ELSE CONVERT(varchar(2), b.somemonth)
                               END ) as ym
       FROM YEARS a cross join MONTHS b
    ) c
   WHERE c.ym >= ( SELECT MIN(Yearmonth) FROM @Scores )
   AND     c.ym <= ( SELECT CONVERT(int, CONVERT(varchar(6), YEAR( getdate() ) ) +
                                       CASE
                                            WHEN MONTH( getdate() ) < 10
                                            THEN '0' + CONVERT( varchar(2), MONTH ( getdate() ) )
                                        END )
                           )
), -- join to base data to fill in the existing known scores
result ( rownum, YearMonth, Score ) AS
(
   SELECT ROW_NUMBER() OVER (ORDER BY a.YearMonth) as rownum,
              a.YearMonth, b.Score
   FROM YEARMONTHS  a
   LEFT JOIN @Scores   b ON b.YearMonth = a.YearMonth
) -- inflate with previous scores
endresult ( rownum, YearMonth, Score ) AS
(
   SELECT rownum, YearMonth, Score
   FROM result
   UNION ALL
   SELECT a.rownum, a.Yearmonth, b.Score
   FROM result a
   JOIN endresult b ON b.rownum = a.rownum-1
   WHERE a.Score IS NULL
) -- get end result
SELECT DISTINCT x.YearMonth, x.Score
FROM
(   SELECT ROW_NUMBER() OVER (ORDER BY rownum, yearmonth, score) AS num,
               Yearmonth, Score
     FROM endresult
) X
WHERE x.Score IS NOT NULL
ORDER BY Yearmonth DESC

Section 1 :  YEARS
In this section I use a recursive CTE  to create a "table" of relevant years.
  • The anchor part of the CTE will select the smallest year from the scores table
  • The recursive part of the CTE will accumulate the years with 1 UP UNTIL the current year.
Section 2:  MONTHS
In this section I again use a recursive CTE to create a "table"of 12 months

Section 3: YEARMONTHS
In this section I create a combined "table" of years and months.

However as the MONTHS table is a comprehensive table of 12 records, I limit the YEARMONTHS "table" to start at the first Year Month combination from the original table scores up until the current Year month combination.

There are a few minor issues such as creating YYYYMM integers , thereby if the month number is less than 10, it must be prefixed with 0. You will notice converts to varchar with case statements and converts back to integers. This is all just to ensure that the yearmonth combinations are created as integers correctly.

Seciton 4:  RESULT
In this section I (left) join the YEARMONTH table with the original Scores table so that the actual scores are filled in the months they occurred.  A key factor in this table is the creation of a unique rownumber per record. This is vital for the next section.

At this stage I have achieved the first part of the solution ie : Build sequential ranges of datesranges of dates

Section 5: ENDRESULT
And now for the final pieces of the puzzle.  The ENDRESULT table is a recursive query
whereby the result of a cross join with each previous record is generated.

The anchor part of the query returns the result table ( so not just 1 record )
the recursive part joins this table to itself based on previous rownumber.

Section 6: OUTPUT
In this section a unique rownumber over each of the records in ENDRESULT is generated in a subquery as when doing a select on ENDRESULT we need those records that do not have a NULL as a score. 







 

Geen opmerkingen:

Een reactie posten