Typically, when inserting data from a CSV, you might read the file one line at a time and do batch inserts. Something like this:
$file = fopen(storage_path('app/test.csv'), 'r');

$i = 0;
$batch = [];
$batchSize = 1000;

while ($line = fgetcsv($file)) {
    // Skip the header row
    if ($i === 0) {
        $i++;
        continue;
    }

    $batch[] = [
        'column' => $line[1],
        // etc...
    ];

    $i++;

    if (($i % $batchSize) === 0) {
        Model::insert($batch);
        $batch = [];
    }
}

// Insert the remaining batch
if (! empty($batch)) {
    Model::insert($batch);
}

fclose($file);
This approach is great if you're dealing with a CSV that has hundreds or even thousands of rows. But what if the CSV has millions of rows? Enter MySQL's LOAD DATA statement.
The LOAD DATA statement is MySQL's lightning-fast, built-in way of loading CSVs into a table in your database. With it, you can specify a custom separator or line ending, or even transform the data before it's saved to your table - and much more!
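To make that concrete, here's a rough sketch of what a raw LOAD DATA statement can look like when run through Laravel's DB facade. To be clear, this is my own illustrative example, not output from the benchmark below: the file path and the people table's column list are assumptions (they mirror the schema used in the benchmark), and LOAD DATA LOCAL INFILE only works when local_infile is enabled on the server and PDO::MYSQL_ATTR_LOCAL_INFILE is set on the client connection.

use Illuminate\Support\Facades\DB;

// Illustrative only - file path and column list are assumptions.
// The @id variable swallows the CSV's ID column so it isn't inserted;
// @date_of_birth is captured so the SET clause can convert it to a DATE.
DB::statement(<<<'SQL'
    LOAD DATA LOCAL INFILE '/path/to/test.csv'
    INTO TABLE people
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
    (@id, first_name, last_name, email, phone, street_address,
     city, state, postal_code, country, @date_of_birth)
    SET date_of_birth = STR_TO_DATE(@date_of_birth, '%c/%d/%Y')
SQL);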
So how does it compare to the approach above? Let's see!
Benchmarking
Setup
I have a people table with first_name, last_name, email, phone, street_address, city, state, postal_code, country and date_of_birth columns. My 1.5 GB CSV has 10 million rows with similar columns - ID, First Name, Last Name, Email, Phone, Street Address, City, State, Postal Code, Country and Date of Birth. The Date of Birth field is in MM/DD/YYYY format, so we have to convert it to a DB-friendly format.
Computer: M1 Pro MacBook Pro
PHP: 8.3
Laravel: 11
MySQL/MariaDB: MariaDB 11.2.2
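If you want to reproduce the table, a minimal migration sketch could look like this. The column types are my assumption - the benchmark doesn't specify them - and I've skipped timestamps, since neither bulk approach populates created_at/updated_at.

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // Column types assumed; adjust to fit your data
        Schema::create('people', function (Blueprint $table) {
            $table->id();
            $table->string('first_name');
            $table->string('last_name');
            $table->string('email');
            $table->string('phone');
            $table->string('street_address');
            $table->string('city');
            $table->string('state');
            $table->string('postal_code');
            $table->string('country');
            $table->date('date_of_birth');
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('people');
    }
};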
Batched Inserts
This will look very similar to the approach above. The code:
use App\Models\Person;
use Illuminate\Support\Carbon;
use Illuminate\Support\Facades\DB;

// Disable unique checks and foreign key checks to speed up inserts
DB::statement('SET @@session.unique_checks = 0');
DB::statement('SET @@session.foreign_key_checks = 0');

$file = fopen(storage_path('app/test.csv'), 'r');

$i = 0;
$batch = [];
$batchSize = 1000;

while ($line = fgetcsv($file)) {
    // Skip the header row
    if ($i === 0) {
        $i++;
        continue;
    }

    $batch[] = [
        'first_name' => $line[1],
        'last_name' => $line[2],
        'email' => $line[3],
        'phone' => $line[4],
        'street_address' => $line[5],
        'city' => $line[6],
        'state' => $line[7],
        'postal_code' => $line[8],
        'country' => $line[9],
        // Convert MM/DD/YYYY to the YYYY-MM-DD format MySQL expects
        'date_of_birth' => Carbon::parse($line[10])->format('Y-m-d'),
    ];

    $i++;

    if (($i % $batchSize) === 0) {
        Person::insert($batch);
        $batch = [];
    }
}

// Insert the remaining batch
if (! empty($batch)) {
    Person::insert($batch);
}

fclose($file);
This takes an average of 6 minutes.
LOAD DATA Statement
To write the statement, I'm using a package I wrote - isaacdew/load-data.
The code:
LoadData::from(storage_path('app/test.csv'))
    ->to(Person::class)
    ->fieldsTerminatedBy(',')
    ->fieldsEnclosedBy('"', true)
    ->useFileHeaderForColumns()
    ->onlyLoadColumns([
        'first_name',
        'last_name',
        'email',
        'phone',
        'street_address',
        'city',
        'state',
        'postal_code',
        'country',
        'date_of_birth',
    ])
    ->setColumn('date_of_birth', "STR_TO_DATE(@date_of_birth, '%c/%d/%Y')")
    ->load();
This takes an average of 0.53 minutes (about 32 seconds).
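A note on the setColumn() call above: it keeps the date conversion in SQL rather than PHP - STR_TO_DATE parses the MM/DD/YYYY value captured in the @date_of_birth user variable into a proper DATE. If you're ever unsure about a format string, you can sanity-check it with a throwaway query (the sample date here is made up):

use Illuminate\Support\Facades\DB;

// Verify the format string against a sample value from the CSV
$dob = DB::selectOne("SELECT STR_TO_DATE('6/15/1990', '%c/%d/%Y') AS dob")->dob;
// $dob === '1990-06-15'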
To run the benchmarks yourself, see the repository I set up.
Conclusion
MySQL's LOAD DATA statement is incredibly fast and flexible. If you need to load large CSVs into your database, I highly recommend it. To make writing your load statements in Laravel easy, you can install my package via Composer: composer require isaacdew/load-data