This post was written by my co-founder, Matheus.
An important part of my daily work is helping developers, database administrators, DevOps engineers, sysadmins, and project managers identify (using Nazar.ai) and fix bad SQL code. The idea behind the [SQL Performance Killers] series is to share practical SQL tuning examples.
Continuing the [SQL Performance Killers] series (after a very long time), in this post I’ll explain why bulk insert operations are generally faster than many individual insert operations.
Bulk inserts are significantly faster than individual inserts when working with a database for several reasons:
Reduced network traffic: Bulk inserts cut down the number of round trips between the application and the database server. In many cases, network latency is a bottleneck for database performance; by sending the data in a single batch, you reduce the impact of that latency and improve efficiency.
Reduced transaction overhead: Each individual insert is typically wrapped in its own transaction, which adds transaction-management overhead. Bulk inserts can be enclosed in a single transaction, reducing that overhead while still ensuring data consistency (see the sketch after this list).
Locking and concurrency: When you perform many individual inserts, each insert may require locks on the affected rows, leading to potential contention and concurrency issues in a multi-user environment. Bulk inserts often lock the entire table or a specific set of rows, reducing contention and improving concurrency.
Logging and indexing: Databases maintain transaction logs and indexes to ensure data consistency and query performance. Bulk inserts are more efficient here because they involve fewer transaction boundaries and allow log writes and index maintenance to be batched rather than performed row by row.
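To make the transaction-overhead point concrete, here is a minimal sketch using the same TB_INSERT table as the example below. The BEGIN/COMMIT syntax is PostgreSQL/MySQL-style; other systems use START TRANSACTION or BEGIN TRANSACTION instead:

-- In autocommit mode each statement is its own transaction, so the commit cost
-- (log flush, lock release) is paid once per row:
INSERT INTO TB_INSERT(NAME) VALUES ('AF9CB08DF4F7B71F033CC857ECF30C21');
INSERT INTO TB_INSERT(NAME) VALUES ('B16D3C99C04F223E362BC0E1B4FFE7CD');
-- ... 40,000 statements, 40,000 commits

-- Wrapping the same statements in one explicit transaction pays that cost once:
BEGIN;
INSERT INTO TB_INSERT(NAME) VALUES ('AF9CB08DF4F7B71F033CC857ECF30C21');
INSERT INTO TB_INSERT(NAME) VALUES ('B16D3C99C04F223E362BC0E1B4FFE7CD');
-- ... remaining rows
COMMIT;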
In the example below I inserted 40,000 rows into a sample table, first using individual inserts and then using a single bulk insert.
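The table definition isn't shown here, so the following is a minimal, hypothetical sketch of TB_INSERT; since the NAME values are 32-character hex strings, VARCHAR(32) is an assumption and the real definition may differ:

-- Hypothetical sample table used in the test.
CREATE TABLE TB_INSERT (
    NAME VARCHAR(32)
);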
INSERT INTO TB_INSERT(NAME) VALUES ('AF9CB08DF4F7B71F033CC857ECF30C21');
INSERT INTO TB_INSERT(NAME) VALUES ('B16D3C99C04F223E362BC0E1B4FFE7CD');
...
INSERT INTO TB_INSERT(NAME) VALUES ('BEDB35BC448FD5F32F37B86BECFDF225');
INSERT INTO TB_INSERT(NAME) VALUES ('4436A24C954EA17AEE9E92D4F16FAD20');
UPDATED ROWS 40000
START TIME MON NOV 06 14:52:02 PST 2023
FINISH TIME MON NOV 06 14:52:22 PST 2023
INSERT INTO TB_INSERT(NAME) VALUES
('AF9CB08DF4F7B71F033CC857ECF30C21'),
('B16D3C99C04F223E362BC0E1B4FFE7CD'),
...
('BEDB35BC448FD5F32F37B86BECFDF225'),
('4436A24C954EA17AEE9E92D4F16FAD20');
UPDATED ROWS 40000
START TIME MON NOV 06 14:56:32 PST 2023
FINISH TIME MON NOV 06 14:56:34 PST 2023
As we can see, inserting 40,000 rows into the sample table with a bulk insert was 10 times faster than using individual inserts: the individual inserts took 20 seconds to complete, while the bulk insert finished in only 2 seconds.
However, it's essential to note that the suitability of bulk inserts depends on the database management system (DBMS) you are using and the specific use case. Always consider the characteristics of your database and the requirements of your application when deciding whether to use bulk inserts or individual inserts.
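As one example of DBMS-specific behavior, most systems also offer a dedicated bulk-load path that can be faster still than a multi-row INSERT. Below is a minimal sketch using PostgreSQL's COPY; the file path and CSV layout are assumptions, and other systems have equivalents such as MySQL's LOAD DATA INFILE or SQL Server's BULK INSERT:

-- PostgreSQL: load the NAME column from a server-side CSV file.
-- '/tmp/names.csv' is a hypothetical path readable by the database server.
COPY TB_INSERT (NAME) FROM '/tmp/names.csv' WITH (FORMAT csv);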
"It is not lack of hardware,
It is not network traffic,
It is not slow front ends,
the main performance problem in the huge majority of database applications is bad SQL code."
Joe Celko