Thursday, April 14, 2016

Resequence Records in a Table Using SQL

Every now and then I hear a question like "How can I reset the sequence numbers in my file using SQL?" If you are using DB2 for i, there is a simple solution involving OLAP specifications. These are expressions you can use to query things like rank, dense rank, and row number. It is ROW_NUMBER that we are specifically interested in here. Lets set up a sample table:

  create table resequence_sample (
    id             long int    primary key,
    order_number   dec(7,0)    not null,
    detail_line    dec(5,0)    not null,
    seq_number     dec(5,0)    not null,
    ...
  )

So you can see this table has a primary key named id an order number, a detail line number, and a sequence number for sorting the records, among other fields not named. Periodically this sort sequence gets gaps due to deletions, or other operations on the file, and we want to close those gaps up. If I were using RPG and record level access methods, I could simply read the records in order, and reassign seq_number from a counter starting at 1 and continuing until the rows were read. I might also want to reset back to 1 for each new detail_line and order_number. It would look something like this:

  counter = 0;
  read record;
  dow not %eof();
    counter += 1;
    if order_number <> sv_order_number or
       detail_line <> sv_detail_line;
      counter = 1;
      sv_order_number = order_number;
      sv_detail_line = detail_line;
    endif;
    seq_number = counter;
    update record;
    read record;
  enddo;

This is pretty simple and performance is lightening fast. But what if you want to do this with SQL? Well as I mentioned earlier you can use an OLAP feature of the database called ROW_NUMBER to do the same thing. It would look like this:

  update resequence_sample s
    set s.seq_number =
      ( select x.row
        from
          ( select row_number()
                     over(partition by order_number, detail_line
                          order by seq_number)
                     as row,
                   a.id
            from resequence_sample a
          ) x
        where x.id = s.id
      )

The magic is in the inner subselect which returns a list of record ids with it's row_number with breaks (partition by) on order_number and detail_line, sorted by (order by) seq_number. Notice the syntax of ROW_NUMBER expression. It is row_number() over(...). The clauses that you can put in the parenthesis of over() determine how the row numbers are assigned. Partition by defines how to group the selected records. Each group get's it's own sequence of numbers starting with 1. You could for example assign a new group of sequence numbers to each order, or as we have done above, to each order detail line. Or if you leave out the partition by clause entirely, you get a single sequence of numbers starting with 1 and incrementing by 1 to the last row selected. Order by allows you to select the sequence of records within each group. This is for numbering purposes only, and is independent of the statement order by. So you could select employee records ordering by name, but assigning row numbers by salary. This makes more semantic sense with the RANK and DENSE_RANK, but it works the same way with ROW_NUMBER.

One last point. You may be wondering why we didn't just put that where x.id = s.id on the inner subselect, and remove the outer subselect all together. The reason is in the way row numbers are assigned. That is by group of selected records starting with 1. If we moved the were clause to the inner subselect, only a single record would be selected there causing the row_number to always be 1. Pulling that where clause out one level allows us to assign row numbers to a group of rows, and then select the row we want the sequence number for. DB2's optimizer is smart enough that it doesn't perform that query more times than it needs to.

This example only shows the business end of the code for RPG. There are various declarations required to support the file and control variables to make it work that I have chosen to leave out. However the SQL example will run as is once you create the table.