Zhiyue Yi

Posted on Jan 16, 2020

Design a Table to Keep Historical Changes in Database

#database

Visit my Blog for the original post: Design a Table to Keep Historical Changes in Database

Back to months ago, I encountered a classical problem: how to design a table so that it can keep historical changes in database?

Imagine you have a transaction table, which has the following fields:

Transaction

Column Name	Data Type
Id	int
Description	text
TransactionDate	date
DeliveryDate	date
Status	varchar(30)

Here comes the business requirements: the users want to keep the historical changes for transaction table for future usage, e.g. data analysis / auditing

How to design such a table which can keep the historical changes?

I have 3 different approaches to solve the problem.

1. Use Effective From and Effective To Fields

I DON'T like this approach but I've seen people practicing it, struggling with it but still insisting to do it. Hence, I would like to demonstrate why it's a bad approach doing so.

The idea is to have EffectiveFrom and EffectiveTo fields inside of the tables, which looks like this:

Column Name	Data Type
Id	int
Description	text
TransactionDate	date
DeliveryDate	date
Status	varchar(30)
EffectiveFrom	date
EffectiveTo	date

The EffectiveFrom and EffectiveTo fields are for validating the effectiveness of the records. For example, today is 2019-09-17, if the record has the EffectiveFrom being 2019-09-01 and EffectiveTo being 2019-09-30, then this record is active. If today's date is out of the range, then the record is inactive / disabled.

In that case, the effectiveness of the record is indirectly determined by these 2 fields. Hence, the following situation may happen

Id	Description	TransactionDate	DeliveryDate	Status	EffectiveFrom	EffectiveTo
100	A short text	2019-09-15	2019-09-28	Shipping	2019-09-15	2019-09-17
101	A short text	2019-09-15	null	Cancelled	2019-09-17	2019-09-30

Once the value is changed, the EffectiveTo field is set to a previous date, Disabled is set to true and a duplicated record is inserted into the table which contains the latest information, and DIFFERENT ID.

Such design actually meet the requirement of retaining historical changes in database, however, it's actually bad. By duplicating the record, the ID is changed from 100 to 101. As a developer, we know that there 2 transactions are the same transaction, it's just one is newer and the other one is older. But now there are 2 IDs to represent 1 record, which becomes a disaster for other tables linking to it.

For example, if you have a TransactionDetails table containing the transaction items which needs Transaction ID to establish the relationship, you would end up into troubles. Let's say I bought an iPad and Apple Pencil in this transaction 100. Now the transaction is updated to 101 but the iPad and Apple Pencil are still linking to 100.

To solve such a issue, you need to do more queries to identify which is the latest active ID and then link to it, which significantly increase the complexity.

And it's really bad.

2. Use a History Table

What if I want to keep a whole record as a history but doesn't want to affect my actual table? Then you may want to create a history table for Transaction, which has the same fields as Transaction table itself.

TransactionHistory

Column Name	Data Type
Id	int
Description	text
TransactionDate	date
DeliveryDate	date
Status	varchar(30)
AddBy	int
AddDate	date

The same example happens here: there is a transaction looks like this:

Id	Description	TransactionDate	DeliveryDate	Status
100	A short text	2019-09-15	2019-09-28	Shipping

Then a user with id 20 modifies the description to A not long text and DeliveryDate to 2019-10-01.

Now, you only have to insert one record, which is a duplicated record of the one in Transaction table, into TransactionHistory table, which looks like this:

Id	Description	TransactionDate	DeliveryDate	Status	AddBy	AddDate
100	A short text	2019-09-15	2019-09-28	Shipping	20	2019-09-17

Lastly, update the original record in Transaction table into

Id	Description	TransactionDate	DeliveryDate	Status
100	A not long text	2019-09-15	2019-10-01	Shipping

In this case, you are trying to save the old record completely into another table and then apply changes in the original table. It's better if the users want to have a full picture of how the record is changing.

However, the down side of this approach is that, redundant information is stored. For example, if you have a large number of fields, but only one or two fields are updated every time, it's actually a huge waste of space.

3. Use an Audit Table

A better solution is to create an audit table to record every single change in every field, which saves the spaces by eliminating redundant information. The table looks like this:

Audit

Column Name	Data Type
Id	int
Table	varchar(50)
Field	varchar(50)
RecordId	int
OldValue	varchar(255)
NewValue	varchar(255)
AddBy	int
AddDate	date

For example, there is a transaction looks like this:

Id	Description	TransactionDate	DeliveryDate	Status
100	A short text	2019-09-15	2019-09-28	Shipping

And now, another user with id 20 modifies the description to A not long text and DeliveryDate to 2019-10-01

Id	Description	TransactionDate	DeliveryDate	Status
100	A not long text	2019-09-15	2019-10-01	Shipping

Hence, there are 3 fields changed, Description, ModBy and ModDate. Respectively, there should be 3 records added into the audit table as shown below.

Id	Table	Field	RecordId	OldValue	NewValue	AddBy	AddDate
1	Transaction	Description	100	A short text	A not long text	20	2019-09-17
2	Transaction	DeliveryDate	100	2019-09-28	2019-10-01	20	2019-09-17

Lastly, update the original record in Transaction table into

Id	Description	TransactionDate	DeliveryDate	Status
100	A not long text	2019-09-15	2019-10-01	Shipping

In that case, it's very easy to query for field changes. For example, if I want to find out all the description changes for the transaction with ID 100, I can just query by

Select * from Audit where RecordId = 100 and Field = "description".

The downside of this approach is the possible huge increase of records. Since every change in different fields is one record in the Audit table, it may grow drastically fast such as tens of changes resulting in hundreds of audit records. In this case, table indexing plays a vital role for enhancing the querying performance.

Conclusion

In this article, I talked about 3 approaches to keep historical changes in database, which are effective from and effective to fields, history table and audit table.

Effective from and effective to fields is NOT recommend to use, and I recommend to use history table or audit table to solve the problem depending on cases.

Here is the comparison.

Audit Table

Record only data in changed fields
Pros:
- Not affecting actual table
- No Redundant information
Cons:
- No. of records may increase significantly
Suitable for: actual table has many fields, but often only a few fields are changed

History Table

Record the entire old record
Pros:
- Simple query to get the complete history
- Not affecting actual table
Cons
- Redundant information is stored
Suitable for:
- A lot of fields are changed in one time
- Generating a change report with full record history is needed

Featured image is credited to Joshua K. Jackson from Pexels

Top comments (12)

TiDu Nguyen • Apr 16 '21

I think History Table is the best and most popular way to do this. Audit table has 2 huge drawbacks:

Unscalable because 1 field = 1 new row. For highly interactive data, this can bring serious performance issues. Although History Table takes up storage space, it will rarely have such a significant impact on query speed. Most systems would trade storage space for query speed. Also, we DON'T usually need to store every fields of the original record in the History Table, store things that matter.
Use a foreign key to link to original table for history is actually a very bad idea because the other fields in the original table are very much likely not the same as when the change was recorded, even deleted like Wilson Liu mentioned. This is crucial for records where the meaning of 1 field is dependent on other fields. In those cases, the repeated data from History table are not "redundant" at all.

I want to add more that for an ordinary system where real-time data is not crucial, table history can be recorded periodically using a cron job (aka. scheduled task), record history per transaction is usually costly without much benefits.

encryptblockr • Dec 27 '21

how do we then insert records into the original table and history table at same time? do we run insert query to insert into the 2 tables at same time? since you said using foreign keys is a bad idea...can you please share details of how to approach this?

Syedalisait • Apr 24 '20

Hi - Very good post on Auditing Mysql.

Question:
Can you actually show an example of how would you write a trigger for 3rd approach which is the audit table. I have implemented this but the trigger query that I use is very costly. So wanted to understand how would you implement column level changes to be recorded in audit table

Zhiyue Yi • May 11 '20

Hi, thanks a lot!

I used C# and entity framework and the audit records are generated at code level. So unfortunately I didn't write a trigger for that.

Nestor Suarez Alfonso • Jul 10

What if your entity is stored serialized in a XML column? What strategy you think is best for that case?

Would be interesting having a History table storing the previous serialized value? What if we use a diff algorithm and store diff changes to reduce redundancy? In our case, when you get the history of an entity, you get all changes (maybe 10-20 changes records)

mpisau • Jan 23

A history table sounds like a simple solution to me. But I still see a few cons:

If I have a lot of tables, I also need a lot of historical counterparts.

And if the database is still in development or fields change, then the fields of the historical table must also be adjusted and scripts must be adapted.

So I now had the following idea for a single historical table:
ID, TableName, RecordID, JSON (with dataset dump from Table).

Would this solution have any disadvantages?

Arend Peter Castelein • Jul 31 '22

Great post this was very helpful!

To clarify, if I want to track the history for N tables, it sounds like the history pattern would require N duplicate tables, but the auditor pattern would only require 1. Is that correct?

Also, what are your thoughts on having a history column? Here you add a "history" column to your table with a JSON blob of that history for that row. We don't expect many changes to the data, so we don't expect the blob size to be a problem

geethaKrishna • Dec 27 '23

When I saw this I thought it will solve the N tables problem. But investigating further this is what i found...

The history column will only point to one history record, but to get a full history of the a single record we will need to recursively call each history record right? That could become very complicated in comparison to having different history tables.

WilsonLiu • Apr 7 '21

How to manage history table if I want to delete a record from the main table?
Thanks

Anand Powar • Jul 1 '21

One option is to use a soft-delete i.e. set IsEnabled = false

encryptblockr • Dec 27 '21

how do we then insert records into the original table and history table at same time? do we run insert query to insert into the 2 tables at same time?

Anand Powar • Dec 27 '21

You can run two distinctive inserts within the same transaction scope.


Begin transaction 
  Insert into original-table
  Get inserted auto-id (optional)
  Insert into history-table
Commit transaction

View full discussion (12 comments)

DEV Community

Design a Table to Keep Historical Changes in Database

1. Use Effective From and Effective To Fields

2. Use a History Table

3. Use an Audit Table

Conclusion

Top comments (12)

Read next

How to use Prisma Postgres 🚀

How to Move Data From MySQL to Redis in Real Time

Accelerating ClippyAI with Embedding LLM and Vector Database

Concurrency and Consistency: Juggling Multiple Users Without Missing a Beat