woensdag 16 september 2009

TSQL Challenge #13

It seems I've gotten totally hooked on doing the SQL Challenges at Beyond Relational . So seeing the closing date of 14th September has passed for Challenge #13, herewith my solution.

The Challenge
"Break a batch of invoices into multiple parts based on rules"

The Context

A module of an application generates invoices for services provided to patients. The information is sent to the insurance company.

Sample Data
InvIdBatchNumberInvoiceNumberVisitDateAmount
1 10000001 20001 2009-01-01 50.00
2 10000001 20001 2009-01-02 50.00
3 10000001 20001 2009-01-03 50.00
4 10000001 20001 2009-01-04 50.00
5 10000001 20002 2009-01-01 50.00
6 10000001 20002 2009-01-02 50.00
7 10000001 20002 2009-01-03 50.00
8 10000001 20003 2009-01-01 50.00
9 10000001 20003 2009-01-02 50.00
10 10000001 20004 2009-01-01 50.00
11 10000001 20004 2009-01-02 50.00
12 10000001 20004 2009-01-03 50.00
13 10000001 20004 2009-01-04 50.00
14 10000001 20004 2009-01-05 50.00
15 10000001 20004 2009-01-06 50.00
16 10000001 20005 2009-01-01 50.00
17 10000001 20005 2009-01-02 50.00
18 10000001 20005 2009-01-03 50.00
19 10000001 20005 2009-01-04 50.00
20 10000001 20005 2009-01-05 50.00
21 10000001 20005 2009-01-06 50.00
22 10000001 20005 2009-01-07 50.00
23 10000001 20005 2009-01-08 50.00
24 10000001 20006 2009-01-01 50.00
25 10000001 20007 2009-01-01 50.00
26 10000001 20007 2009-01-02 50.00
27 10000001 20007 2009-01-03 50.00
28 10000001 20008 2009-01-01 50.00
29 10000001 20008 2009-01-02 50.00
30 10000001 20008 2009-01-03 50.00



The insurance company needs an electronic file in a certain format (the format is irrelevant for us). A separate file needs to be generated for each batch. So based on the sample data we need to generate two files (one for each batch).

Each batch should contain not more than 10 rows. The maximum number of lines we can put in a file is 10. So we need to break each batch into parts that contains not more than 10 rows. When breaking the batches, there is one more rule to follow. While breaking the batches, we should take care that we do not break invoices.

Result should like look:

InvIdBatchNumberInvoiceNumberVisitDateAmountSet
1 10000001 20001 2009-01-01 50.00 1
2 10000001 20001 2009-01-02 50.00 1
3 10000001 20001 2009-01-03 50.00 1
4 10000001 20001 2009-01-04 50.00 1
5 10000001 20002 2009-01-01 50.00 1
6 10000001 20002 2009-01-02 50.00 1
7 10000001 20002 2009-01-03 50.00 1
8 10000001 20003 2009-01-01 50.00 1
9 10000001 20003 2009-01-02 50.00 1
10 10000001 20004 2009-01-01 50.00 2
11 10000001 20004 2009-01-02 50.00 2
12 10000001 20004 2009-01-03 50.00 2
13 10000001 20004 2009-01-04 50.00 2
14 10000001 20004 2009-01-05 50.00 2
15 10000001 20004 2009-01-06 50.00 2
16 10000001 20005 2009-01-01 50.00 3
17 10000001 20005 2009-01-02 50.00 3
18 10000001 20005 2009-01-03 50.00 3
19 10000001 20005 2009-01-04 50.00 3
20 10000001 20005 2009-01-05 50.00 3
21 10000001 20005 2009-01-06 50.00 3
22 10000001 20005 2009-01-07 50.00 3
23 10000001 20005 2009-01-08 50.00 3
24 10000001 20006 2009-01-01 50.00 3
25 10000001 20007 2009-01-01 50.00 1
26 10000001 20007 2009-01-02 50.00 1
27 10000001 20007 2009-01-03 50.00 1
28 10000001 20008 2009-01-01 50.00 1
29 10000001 20008 2009-01-02 50.00 1
30 10000001 20008 2009-01-03 50.00 1

My Solution 

Personally I found this challenge quite easy and very fun to do. 

This query depends on first determining where the boundaries are. A given requirement is that Invoice numbers may not be broken.  This is in fact not a factor that makes the query more complex in my eyes, but makes it easier. The first step is thus to determine how many rows (count) there are within a batchnumber, invoice combination. This will form the basis for the boundaries.

With tblA ( id, BatchNumber, InvoiceNumber, RWN ) AS

(
  -- Step 1. Get the Count of Invoices Per BatchNumber/InvoiceNumber combination and in the
  -- output  generate a sequential id using the rownumber function
  SELECT ROW_NUMBER() OVER( ORDER BY BatchNumber, InvoiceNumber ), Batchnumber, 
            InvoiceNumber, COUNT(InvId)
  FROM @t
  GROUP BY Batchnumber, InvoiceNumber
) ,
rt ( id, BatchNumber, InvoiceNumber, runtot , cappedtotal, groupnr ) AS

(
  -- Step 2. Using recursive function, calculate a running total of the count of rows till the max
      allowed of 10 rows.
  -- Using this "running capped" total, which forms the final boundary , create a "running"
  -- group number.
  SELECT id, BatchNumber, InvoiceNumber, RWN, RWN , 1
  FROM tblA
  WHERE id = 1
  UNION ALL
  SELECT t.id, t.BatchNumber, t.InvoiceNumber, t.rwn + rt.runtot,
   -- check for change in batchnumber as per rule = new batch
           CASE WHEN t.BatchNumber = rt.BatchNumber
                THEN
                     -- now checking capped totals.
                     CASE WHEN t.rwn + rt.cappedtotal < 10
                              -- accumulate as capped total is less than 10 rows
                            THEN t.RWN + rt.cappedtotal
                             -- initialise for new accumulation
                            ELSE t.RWN
                      END
                 ELSE t.RWN
             END,
             -- check for change in batchnumber as per rule = new batch
            CASE WHEN t.BatchNumber = rt.BatchNumber
                  THEN
                        -- check if wihin capped totals
                        CASE WHEN t.rwn + rt.cappedtotal < 10
                                -- keep group number
                                THEN rt.groupnr
                                -- accumulate group nr.
                                 ELSE rt.groupnr + 1
                         END
                   ELSE 1 -- reset running group total as change of batchnumber has occurred.
             END
   FROM tblA t
   JOIN rt ON ( t.id = rt.id + 1 )
)
-- Combine with the original table to produce the result.
SELECT t.InvId, t.BatchNumber, t.InvoiceNumber, t.VisitDate, t.Amount, rt.groupnr
FROM @t t
JOIN rt    rt ON rt.BatchNumber = t.BatchNumber
                AND rt.InvoiceNumber = t.InvoiceNumber



And that was it, they key being initial boundaries set by Count of Rows within Invoices. Using these boundaries, generate running capped totals within a recursive CTE.








Geen opmerkingen:

Een reactie posten