2015-11-08

How to display user-friendly entity history

Most systems need data audit/versioning. This usually means that for every audited table FOO there is table FOO_AUD, which contains the current and all previous values of records from FOO. FOO_AUD can be populated using Hibernate Envers or simply by triggers on table FOO.

The question is, how to present historical/audit data to the user? Suppose we have a bug tracking system with tables ISSUE and ISSUE_AUD. Obviously we could display the results of the following query:
select * from issue_aud where issue_id = 10;

 rev  | revtype | issue_id |    issue_title    | issue_status | issue_assignee
------+---------+----------+-------------------+--------------+----------------
 1000 |       0 |       10 | Bar always fails! | Open         |
 1011 |       1 |       10 | Bar always fails! | In Progress  | Developer A
 1037 |       1 |       10 | Bar always fails! | Resolved     | Reporter B
 1057 |       1 |       10 | Bar always fails! | Closed       |
but that's not readable and user-friendly. Ideally, we would like to present data in the following format:
 rev  |    property    | prev_value  |    curr_value
------+----------------+-------------+-------------------
 1000 | issue_status   |             | Open
 1000 | issue_title    |             | Bar always fails!
 .... | ...........    | ..........  | .................
In this post I would like to show you how this can be done using SQL query only.

Note: the structure of table ISSUE_AUD is based on Hibernate Envers, but there is nothing Envers-specific in my solution. You should be able to easily modify it to your needs if your audit table is different.

1. Unpivot the data

The rows of table ISSUE_AUD contain multiple properties of an issue in each row, while the result contains only one property per row. This means that we need a query, which transforms multiple columns of one input row into multiple output rows.

This can be achieved using unnest (in PostgreSQL) or unpivot (in Oracle). I'll use Postgres syntax here and show the query for Oracle at the end of the post:
select rev, 
    unnest(array['issue_title', 'issue_status', 'issue_assignee']) property,
    unnest(array[issue_title, issue_status, issue_assignee]) value_
from issue_aud 
where issue_id = 10
order by rev;

 rev  |    property    |      value_
------+----------------+-------------------
 1000 | issue_title    | Bar always fails!
 1000 | issue_status   | Open
 1000 | issue_assignee |
 1011 | issue_title    | Bar always fails!
 1011 | issue_status   | In Progress
 1011 | issue_assignee | Developer A
 1037 | issue_title    | Bar always fails!
 1037 | issue_status   | Resolved
 1037 | issue_assignee | Reporter B
 1057 | issue_title    | Bar always fails!
 1057 | issue_status   | Closed
 1057 | issue_assignee |

2. Current and previuos values

Every row in the previous result set contains only the current value of property. In order to show what was modified we need current and previous values. This can be achieved using window function lag:
select rev, property, 
    lag(value_) over (partition by property order by rev) prev_value, 
    value_ curr_value
from (
    select rev, 
        unnest(array['issue_title', 'issue_status', 'issue_assignee']) property,
        unnest(array[issue_title, issue_status, issue_assignee]) value_ 
    from issue_aud 
    where issue_id = 10
) X;

 rev  |    property    |    prev_value     |    curr_value
------+----------------+-------------------+-------------------
 1000 | issue_assignee |                   |
 1011 | issue_assignee |                   | Developer A
 1037 | issue_assignee | Developer A       | Reporter B
 1057 | issue_assignee | Reporter B        |
 1000 | issue_status   |                   | Open
 1011 | issue_status   | Open              | In Progress
 1037 | issue_status   | In Progress       | Resolved
 1057 | issue_status   | Resolved          | Closed
 1000 | issue_title    |                   | Bar always fails!
 1011 | issue_title    | Bar always fails! | Bar always fails!
 1037 | issue_title    | Bar always fails! | Bar always fails!
 1057 | issue_title    | Bar always fails! | Bar always fails!

3. Final touch

The final step is to filter rows where data was actually modified and sort them for better readability:
select rev, property, prev_value, curr_value from (
    select rev, property, 
        lag(value_) over (partition by property order by rev) prev_value, 
        value_ curr_value 
    from (
        select rev, 
            unnest(array['issue_title', 'issue_status', 'issue_assignee']) property,
            unnest(array[issue_title, issue_status, issue_assignee]) value_ 
        from issue_aud 
        where issue_id = 10
    ) X
) Y
where (prev_value is null and curr_value is not null
    or prev_value is not null and curr_value is null
    or prev_value != curr_value)
order by rev, property;

 rev  |    property    | prev_value  |    curr_value
------+----------------+-------------+-------------------
 1000 | issue_status   |             | Open
 1000 | issue_title    |             | Bar always fails!
 1011 | issue_assignee |             | Developer A
 1011 | issue_status   | Open        | In Progress
 1037 | issue_assignee | Developer A | Reporter B
 1037 | issue_status   | In Progress | Resolved
 1057 | issue_assignee | Reporter B  |
 1057 | issue_status   | Resolved    | Closed
and we're done!

Oracle version

select rev, property, prev_value, curr_value 
from (
    select rev, property, 
        lag(value_) over (partition by property order by rev) prev_value, 
        value_ curr_value
    from (
        select rev, property, value_ 
        from (
            select *
            from issue_aud 
            where issue_id = 10
        )
        unpivot include nulls (value_ for property in (issue_title, issue_status, issue_assignee))
    )
)
where (prev_value is null and curr_value is not null
    or prev_value is not null and curr_value is null
    or prev_value != curr_value)
order by rev, property;

Links

Checkout working sqlfiddle examples for PostgreSQL and Oracle.