Storing History and Audit Trail in Relational Databases
Category: Architecture Posted by: at 8 March 2010
It is a common requirement for enterprise software solutions to store data changes history and audit information. It is a huge topic so this post will concentrate only on the concepts without discussing implementation details.
From architectural point of view there are several ways to approach such a requirement and before making the decision on the specific approach it is good to analyze the requirements and constraints:
- Is it planned to actively use the data history as part of the system functionality? Or in other words – is it needed to show history data to the users, or it will be stored only to perform audits and recover data in case of user mistakes?
- What is the expected volume and usage of the data? Is it frequently displayed and is it expected to have a large number of records?
- Are there any plans to regularly clean the history?
Depending on the answers there are usually two main routes – storing the history in the same table (live history) or using separated table (shadow history).
With this approach the history is stored in the same table as the live data.
When performing a data modification the original record is not updated, but instead a new record is created in the same table containing the updated record. The new (most recent) record becames the live record, while all previous records form the history.
Deleted records are usually managed the same way – a new record is inserted with a flag identifying the record as deleted.
In order to provide audit trail such tables often have fields containing the last modification date and the user who has made the changes.
- Functional history – the history records can be easily used by the system functionality. For example allowing users to view and comment different versions (revisions) of the same marketing material or article.
- History data can be manipulated using the same API used for the live records. Usually the API methods have additional parameters like ModificationDate or LiveRecordsOnly for data retrieval and do not require specific handling for update or delete, as the history records are normal records in the live table.
- Natural history for the related records – each history or live record in the primary table might have its own set of records in the related tables.
- Complicated history cleanup – as the history records might be reffered by some live records, it is not so simple to remove the history. Cleanup by date (records older than a week or month) is also much more complicated due to the mixture of live and history records.
- Relativley bad performance – the live and history records stored in the same place form a huge table with many records and no optimization strategy (history/live flags and indexes) can fully mitigate this.
- Use this approach in case of a low data volume and when the history records are expected to be actively used by the users.
- Optimize the live records retrieval by adding flags for “live record” or “history record”
- Avoid using linked list approach to identify a group of records (each record having a FK to the previous record), instead provide a field to easily retrive the whole list of records with a simple query – for example keep the initial record id in a field like “base id” in all consequent records, this way the whole group of records can be easily extracted by this id.
In this case the history records are stored in a separate table. Usually the history table mirrors the structure of the live records table and this is why the approach is referred as a shadow history.
The history table might contain only a set of the fields in the live table, or in some cases there might be a separated history table for each field. For audit trail purposes the history table might have modification date and user information.
When a record is updated in the live table, a new record containing the same data is stored in the history table. To keep the audit trail for deleted records it might be decided to add a history record with status of “deleted”.
- Good performance – the live table has small (only the required) number of records.
- Allows partial history – the shadow table could have only a subset of the fields, not the entire structure of the live table.
- Easy maintenance – the history for the last month/week/day can be removed simply deleting by modification date older than specified date. The history can be wiped with fast table truncation.
- Not suitable for regular usage of history records – linking records from live tables to records in the history tables is not a good practice and harms the maintainability.
- Separated API has to be built in order to support the live and history records.
- Keep the shadow history table as small as possible (go for partial history whenever possible).
- Duplicate the live (most recent) record in the history. This way the whole history including the most recent record could be extracted with a single call.
- Use consistent table naming conventions to make it easier for database administrators to identify and clean the history tables or provide stored procedure(s) for history cleanup.