Introduction
A batch operation executes multiple single operations against the database in one go. The executions are ACID; an implicit transaction is created if none is provided.
Benefits
Below are the benefits you will gain from this tutorial.
- Understanding batch operations.
- How batch operations differ from bulk operations.
- Clean and simple code implementations.
- Separation of concerns and more SOLID code.
- Significant performance improvements across different use cases and scenarios.
ADO.NET
In ADO.NET, we usually do the following when doing a batch operation.
- Create a connection and command object.
- Set the command text.
- Create a transaction object.
- Loop through the entities and pass the parameters into the command object.
- Execute the command on every item in the iteration.
- Commit the transaction.
To elaborate at the code level, see below.
Let us say we have a Customer class like the one below and an identity-based table in the database named [dbo].[Customer].
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string SSN { get; set; }
    public string Address { get; set; }
    public DateTime CreatedUtc { get; set; }
    public DateTime ModifiedUtc { get; set; }
}
Then, to achieve the batch executions, we usually write the code below.
using (var connection = new SqlConnection(ConnectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "INSERT INTO [dbo].[Customer] (Name, SSN, Address, CreatedUtc, ModifiedUtc) VALUES (@Name, @SSN, @Address, @CreatedUtc, @ModifiedUtc);";
            command.Transaction = transaction;
            foreach (var customer in customers)
            {
                command.Parameters.Clear();
                command.Parameters.AddWithValue("@Name", customer.Name);
                command.Parameters.AddWithValue("@SSN", customer.SSN);
                command.Parameters.AddWithValue("@Address", customer.Address);
                command.Parameters.AddWithValue("@CreatedUtc", customer.CreatedUtc);
                command.Parameters.AddWithValue("@ModifiedUtc", customer.ModifiedUtc);
                command.ExecuteNonQuery();
            }
        }
        transaction.Commit();
    }
}
The code above is already somewhat optimized for execution. In practice, many developers create a new DbConnection in every iteration without being aware of it, and even those who avoid that often fail to reuse a single DbCommand object.
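For comparison, here is a sketch of that less efficient pattern (an illustration only, not taken from any particular codebase); it opens a new connection and builds a new command for every single row.

// A sketch of the common, less efficient pattern: a new connection and a new
// command are created for every row, and nothing is reused.
foreach (var customer in customers)
{
    using (var connection = new SqlConnection(ConnectionString))
    {
        connection.Open();
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "INSERT INTO [dbo].[Customer] (Name, SSN, Address, CreatedUtc, ModifiedUtc) VALUES (@Name, @SSN, @Address, @CreatedUtc, @ModifiedUtc);";
            command.Parameters.AddWithValue("@Name", customer.Name);
            command.Parameters.AddWithValue("@SSN", customer.SSN);
            command.Parameters.AddWithValue("@Address", customer.Address);
            command.Parameters.AddWithValue("@CreatedUtc", customer.CreatedUtc);
            command.Parameters.AddWithValue("@ModifiedUtc", customer.ModifiedUtc);
            command.ExecuteNonQuery();
        }
    }
}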
You usually write the same kind of code for the Update and Merge operations; only the command text differs.
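For example, the Update variant of the loop above would only swap the command text for something like the line below (a hypothetical statement, assuming Id is the primary key and an @Id parameter is added alongside the others); the connection, transaction, and parameter handling stay the same.

// Hypothetical Update command text; the surrounding loop is identical to the Insert example.
command.CommandText = "UPDATE [dbo].[Customer] SET Name = @Name, SSN = @SSN, Address = @Address, ModifiedUtc = @ModifiedUtc WHERE Id = @Id;";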
ADO.NET Drawback
The drawback is that each item is iterated and executed against the database atomically. Every call to the ExecuteNonQuery() method transmits data from the client application to the database server (or the cloud) over the wire, so inserting 1,000 rows this way results in 1,000 separate round trips.
In a normal scenario this is not a problem at all, but it becomes a big one when processing thousands of items over a very slow network connection (or when transmitting data from on-premise to the cloud). In short, it is not the most optimal approach.
Solution
We have written RepoDb in such a way that it caters for the scenario mentioned above. It also brings simplicity during development, resulting in more maintainable software.
Not many ORMs cater for batch operations, even for the simple CRUD operations (i.e., Insert, Update, Merge, Delete).
In RepoDb, the method calls are very direct and simple. You invoke them by calling the extension methods of the DbConnection object. On top of this, the methods are ACID in their own way.
Simply install the library and open a connection object, then you can do all the things you can do with your database.
To install, simply type the command below in your Package Manager Console.
Install-Package RepoDb
Let us say we would like to insert a certain number of Customer records into the database. You can then call the method below.
var customers = CreateCustomers(1000);
using (var connection = new SqlConnection(ConnectionString))
{
    connection.InsertAll<Customer>(customers);
}
If the target table has an identity column, the newly generated identity value will be mapped back to the data models accordingly.
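For example, assuming Id is the identity column of [dbo].[Customer], the generated values are readable on the entities right after the call. A minimal sketch:

using (var connection = new SqlConnection(ConnectionString))
{
    connection.InsertAll<Customer>(customers);

    // Each entity now carries the identity value generated by the database.
    foreach (var customer in customers)
    {
        Console.WriteLine($"{customer.Id}: {customer.Name}");
    }
}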
By default, the batch size is 10, but this can be overridden by simply specifying the batchSize argument. See the code below.
using (var connection = new SqlConnection(ConnectionString))
{
    connection.InsertAll<Customer>(customers, batchSize: 100);
}
The calls to the other batch operations (i.e., Update, Merge, and Delete) are identical to Insert. See the code snippets below.
// Update
using (var connection = new SqlConnection(ConnectionString))
{
    var rowsAffected = connection.UpdateAll<Customer>(customers);
}

// Merge
using (var connection = new SqlConnection(ConnectionString))
{
    var rowsAffected = connection.MergeAll<Customer>(customers);
}

// Delete
using (var connection = new SqlConnection(ConnectionString))
{
    var rowsAffected = connection.DeleteAll<Customer>(customers);
}
The operations are controllable in such a way that you (as a developer) can tune the number of rows per batch, specifically for the following factors (a hypothetical tuning example follows the list).
- No. of Columns
- Network Latency
- Type of Data (Blobs, etc)
- Database Network (Cloud, On-Premise)
- Etc
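As a purely hypothetical illustration, a table with wide or blob-heavy rows being pushed to a cloud database might call for a smaller batch size, while narrow rows on a fast on-premise network can afford a much larger one; the right value comes from measuring your own workload.

using (var connection = new SqlConnection(ConnectionString))
{
    // Hypothetical value: smaller batches reduce the size of each round trip,
    // which can help with wide rows, blobs, or a slow link to a cloud database.
    connection.InsertAll<Customer>(customers, batchSize: 30);
}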
Behind the scenes
To further understand what happens behind the scenes, below are the main steps (a conceptual sketch follows the list).
- Pre-compiling the transformation ahead-of-time (AOT) via Expression.
- Creating an implicit transaction if not passed by the caller.
- Composing the packed statements based on the batchSize.
- Mapping the entities into the given parameters of the packed statements.
- Executing the command in one-go.
- Committing the created implicit transaction.
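To make the flow above concrete, here is a conceptual sketch of what effectively happens for a single batch, written in plain ADO.NET. This is an illustration only, not the actual RepoDb source; InsertBatch and its parameters are hypothetical names.

// Conceptual sketch only: one packed statement per batch, indexed parameters,
// a single round trip. Requires System.Text and System.Data.SqlClient.
void InsertBatch(SqlConnection connection, SqlTransaction transaction, IList<Customer> batch)
{
    using (var command = connection.CreateCommand())
    {
        command.Transaction = transaction;

        var builder = new StringBuilder();
        for (var i = 0; i < batch.Count; i++)
        {
            // The first row uses plain parameters (@Name); subsequent rows are indexed (@Name1, @Name2, ...).
            var suffix = i == 0 ? string.Empty : i.ToString();
            builder.AppendLine(
                "INSERT INTO [dbo].[Customer] (Name, SSN, Address, CreatedUtc, ModifiedUtc) " +
                $"VALUES (@Name{suffix}, @SSN{suffix}, @Address{suffix}, @CreatedUtc{suffix}, @ModifiedUtc{suffix});");

            command.Parameters.AddWithValue($"@Name{suffix}", batch[i].Name);
            command.Parameters.AddWithValue($"@SSN{suffix}", batch[i].SSN);
            command.Parameters.AddWithValue($"@Address{suffix}", batch[i].Address);
            command.Parameters.AddWithValue($"@CreatedUtc{suffix}", batch[i].CreatedUtc);
            command.Parameters.AddWithValue($"@ModifiedUtc{suffix}", batch[i].ModifiedUtc);
        }

        command.CommandText = builder.ToString();
        command.ExecuteNonQuery(); // one round trip for the whole batch
    }
}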
Understanding the SQL Statement Composition
We would also like you to understand how the SQL statements are composed and how the parameters are passed.
When you call an operation like the one below:
using (var connection = new SqlConnection(ConnectionString))
{
    connection.InsertAll<Customer>(customers, batchSize: 100);
}
The library will then generate and pre-compile the packed statements below.
INSERT INTO [dbo].[Customer] (Name, SSN, Address, CreatedUtc, ModifiedUtc) VALUES (@Name, @SSN, @Address, @CreatedUtc, @ModifiedUtc);
INSERT INTO [dbo].[Customer] (Name, SSN, Address, CreatedUtc, ModifiedUtc) VALUES (@Name1, @SSN1, @Address1, @CreatedUtc1, @ModifiedUtc1);
...
...
...
INSERT INTO [dbo].[Customer] (Name, SSN, Address, CreatedUtc, ModifiedUtc) VALUES (@Name99, @SSN99, @Address99, @CreatedUtc99, @ModifiedUtc99);
The generated packed statements above will be utilized by the DbCommand object in a single execution per batch.
The library will then split the array of Customer entities based on the value of the batchSize argument, in this case 100. This makes the 1,000 rows execute in only 10 iterations of 100 rows each.
After the splits, the library will map the parameters as below.
- The first 10 customers will be on the parameters (@Name, ..., @ModifiedUtc) to (@Name9, ..., @ModifiedUtc9).
- The second 10 customers will be on the parameters (@Name10, ..., @ModifiedUtc10) to (@Name19, ..., @ModifiedUtc19).
- The last 10 customers will be on the parameters (@Name90, ..., @ModifiedUtc90) to (@Name99, ..., @ModifiedUtc99).
The library also handles a non-modular number of items passed to the batch operations; for example, 1,050 rows with a batchSize of 100 are executed as 10 batches of 100 plus a final batch of 50. Everything is pre-compiled ahead-of-time (AOT).
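As a hypothetical illustration of the splitting arithmetic (the grouping below is for explanation only and is not how RepoDb is implemented internally):

// 1,050 items with a batchSize of 100 split into 10 batches of 100 plus a final batch of 50.
// Requires System.Linq; CreateCustomers is the same helper used earlier.
var items = CreateCustomers(1050);
var batchSize = 100;

var batches = items
    .Select((customer, index) => new { customer, index })
    .GroupBy(x => x.index / batchSize)
    .Select(group => group.Select(x => x.customer).ToList())
    .ToList();

Console.WriteLine(batches.Count);        // 11
Console.WriteLine(batches.Last().Count); // 50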
Performance
The performance is far beyond that of the usual atomic operations we issue via ADO.NET. Unless developers make the effort to build a complex implementation of pre-compilation, packed-statement generation, automatic parameter mapping, and so on, a hand-written native implementation will not beat it, and implementing such complex logic is overkill and a headache for developers.
The performance improvement varies from 40% to 80%, depending on the number of records in the dataset being processed. The bigger the dataset, the higher the gain.
See some simple statistics below.
Executing via RepoDb. First execution is with compilations.
0.8159257 second(s) for RepoDb with 1000 customers.
Executing via ADO.NET.
0.323014 second(s) for ADO.NET with 1000 customers.
Executing via RepoDb. AOT compilation.
0.0850875 second(s) for RepoDb with 1000 customers.
--------------------------------------------------
Executing via ADO.NET.
1.6131903 second(s) for ADO.NET with 10000 customers.
Executing via RepoDb. AOT compilation.
0.606382 second(s) for RepoDb with 10000 customers.
--------------------------------------------------
Executing via ADO.NET.
13.872307 second(s) for ADO.NET with 100000 customers.
Executing via RepoDb. AOT compilation.
9.8594733 second(s) for RepoDb with 100000 customers.
Press any key to exit.
The actual Test project can be found here.
Thank you for reading this tutorial. Please do not forget to star our GitHub repository.
Top comments (2)
Hi there. I wrote pretty extensive libraries on top of Dapper to map IEnumerables to table variables. It works really well, but I will definitely check your approach out. Sounds like it can work on top of an Entity Framework model, providing extra beef.
Thanks. This library is a bit of a hybrid. It simplifies much of the implementation. Just do let us know if you have questions.