Design a Table to Keep Historical Changes in Database
September 17, 2019
Back to months ago, I encountered a classical problem: how to design a table so that it can keep historical changes in database?
Imagine you have a transaction table, which has the following fields:
Transaction
Column Name | Data Type |
---|---|
Id | int |
Description | text |
TransactionDate | date |
DeliveryDate | date |
Status | varchar(30) |
Here comes the business requirements: the users want to keep the historical changes for transaction table for future usage, e.g. data analysis / auditing
How to design such a table which can keep the historical changes?
I have 3 different approaches to solve the problem.
1. Use Effective From and Effective To Fields
I DON’T like this approach but I’ve seen people practicing it, struggling with it but still insisting to do it. Hence, I would like to demonstrate why it’s a bad approach doing so.
The idea is to have EffectiveFrom
and EffectiveTo
fields inside of the tables, which looks like this:
Column Name | Data Type |
---|---|
Id | int |
Description | text |
TransactionDate | date |
DeliveryDate | date |
Status | varchar(30) |
EffectiveFrom | date |
EffectiveTo | date |
The EffectiveFrom
and EffectiveTo
fields are for validating the effectiveness of the records. For example, today is 2019-09-17
, if the record has the EffectiveFrom
being 2019-09-01
and EffectiveTo
being 2019-09-30
, then this record is active. If today’s date is out of the range, then the record is inactive / disabled.
In that case, the effectiveness of the record is indirectly determined by these 2 fields. Hence, the following situation may happen
Id | Description | TransactionDate | DeliveryDate | Status | EffectiveFrom | EffectiveTo |
---|---|---|---|---|---|---|
100 | A short text | 2019-09-15 | 2019-09-28 | Shipping | 2019-09-15 | 2019-09-17 |
101 | A short text | 2019-09-15 | null | Cancelled | 2019-09-17 | 2019-09-30 |
Once the value is changed, the EffectiveTo
field is set to a previous date, Disabled
is set to true
and a duplicated record is inserted into the table which contains the latest information, and DIFFERENT ID.
Such design actually meet the requirement of retaining historical changes in database, however, it’s actually bad. By duplicating the record, the ID is changed from 100
to 101
. As a developer, we know that there 2 transactions are the same transaction, it’s just one is newer and the other one is older. But now there are 2 IDs to represent 1 record, which becomes a disaster for other tables linking to it.
For example, if you have a TransactionDetails
table containing the transaction items which needs Transaction ID to establish the relationship, you would end up into troubles. Let’s say I bought an iPad and Apple Pencil in this transaction 100. Now the transaction is updated to 101 but the iPad and Apple Pencil are still linking to 100.
To solve such a issue, you need to do more queries to identify which is the latest active ID and then link to it, which significantly increase the complexity.
And it’s really bad.
2. Use a History Table
What if I want to keep a whole record as a history but doesn’t want to affect my actual table? Then you may want to create a history table for Transaction
, which has the same fields as Transaction
table itself.
TransactionHistory
Column Name | Data Type |
---|---|
Id | int |
Description | text |
TransactionDate | date |
DeliveryDate | date |
Status | varchar(30) |
AddBy | int |
AddDate | date |
The same example happens here: there is a transaction looks like this:
Id | Description | TransactionDate | DeliveryDate | Status |
---|---|---|---|---|
100 | A short text | 2019-09-15 | 2019-09-28 | Shipping |
Then a user with id 20
modifies the description to A not long text
and DeliveryDate
to 2019-10-01
.
Now, you only have to insert one record, which is a duplicated record of the one in Transaction
table, into TransactionHistory
table, which looks like this:
Id | Description | TransactionDate | DeliveryDate | Status | AddBy | AddDate |
---|---|---|---|---|---|---|
100 | A short text | 2019-09-15 | 2019-09-28 | Shipping | 20 | 2019-09-17 |
Lastly, update the original record in Transaction
table into
Id | Description | TransactionDate | DeliveryDate | Status |
---|---|---|---|---|
100 | A not long text | 2019-09-15 | 2019-10-01 | Shipping |
In this case, you are trying to save the old record completely into another table and then apply changes in the original table. It’s better if the users want to have a full picture of how the record is changing.
However, the down side of this approach is that, redundant information is stored. For example, if you have a large number of fields, but only one or two fields are updated every time, it’s actually a huge waste of space.
3. Use an Audit Table
A better solution is to create an audit table to record every single change in every field, which saves the spaces by eliminating redundant information. The table looks like this:
Audit
Column Name | Data Type |
---|---|
Id | int |
Table | varchar(50) |
Field | varchar(50) |
RecordId | int |
OldValue | varchar(255) |
NewValue | varchar(255) |
AddBy | int |
AddDate | date |
For example, there is a transaction looks like this:
Id | Description | TransactionDate | DeliveryDate | Status |
---|---|---|---|---|
100 | A short text | 2019-09-15 | 2019-09-28 | Shipping |
And now, another user with id 20
modifies the description to A not long text
and DeliveryDate
to 2019-10-01
Id | Description | TransactionDate | DeliveryDate | Status |
---|---|---|---|---|
100 | A not long text | 2019-09-15 | 2019-10-01 | Shipping |
Hence, there are 3 fields changed, Description
, ModBy
and ModDate
. Respectively, there should be 3 records added into the audit table as shown below.
Id | Table | Field | RecordId | OldValue | NewValue | AddBy | AddDate |
---|---|---|---|---|---|---|---|
1 | Transaction | Description | 100 | A short text | A not long text | 20 | 2019-09-17 |
2 | Transaction | DeliveryDate | 100 | 2019-09-28 | 2019-10-01 | 20 | 2019-09-17 |
Lastly, update the original record in Transaction
table into
Id | Description | TransactionDate | DeliveryDate | Status |
---|---|---|---|---|
100 | A not long text | 2019-09-15 | 2019-10-01 | Shipping |
In that case, it’s very easy to query for field changes. For example, if I want to find out all the description changes for the transaction with ID 100, I can just query by
Select * from Audit where RecordId = 100 and Field = "description"
.
The downside of this approach is the possible huge increase of records. Since every change in different fields is one record in the Audit table, it may grow drastically fast such as tens of changes resulting in hundreds of audit records. In this case, table indexing plays a vital role for enhancing the querying performance.
Conclusion
In this article, I talked about 3 approaches to keep historical changes in database, which are effective from and effective to fields, history table and audit table.
Effective from and effective to fields is NOT recommend to use, and I recommend to use history table or audit table to solve the problem depending on cases.
Here is the comparison.
Audit Table
- Record only data in changed fields
-
Pros:
- Not affecting actual table
- No Redundant information
-
Cons:
- No. of records may increase significantly
- Suitable for: actual table has many fields, but often only a few fields are changed
History Table
- Record the entire old record
-
Pros:
- Simple query to get the complete history
- Not affecting actual table
-
Cons
- Redundant information is stored
-
Suitable for:
- A lot of fields are changed in one time
- Generating a change report with full record history is needed
Featured image is credited to Joshua K. Jackson from Pexels