Q: What is CSV?
A: Comma separated values or CSV is file format defined for exchanging information where comma(,) acts as delimiter and each row/record is separated by line break.
For example, consider this capital.csv
file.
country,capital
India,New Delhi
Ireland,Dublin
Germany,Berlin
Austria,Vienna
Adding a header is optional, but it very common to find one in a csv file as it provides context on the data.
Q: What are the benefits that I receive from using this format?
A: Firstly, it is quite easy to understand for a human. If you have used spreadsheets sometime in your life, you have interacted with some form of csv. Secondly, it is useful for storing structured data, where in each column is some data point, similar to a relational database.
Q: Okay, now tell me how can I read a csv file in Python?
A: You can read a csv file by using the standard csv module, as follows:
import csv
with open('capital.csv') as f:
reader = csv.reader(f)
header = next(reader)
for row in reader:
print(row)
Output:
['India', 'New Delhi']
['Ireland', 'Dublin']
['Germany', 'Berlin']
['Austria', 'Vienna']
OR
import csv
with open('capital.csv') as f:
reader = csv.DictReader(f)
for row in reader:
print('Country:', row['country'])
Output:
Country: India
Country: Ireland
Country: Germany
Country: Austria
By using the DictReader
we get the output in dictionary format, using which we can access the members as row['country']
, or row['capital']
. Field names (header) can be retrieved as reader.fieldnames
.
Q: Is it mandatory for the delimiter to be a comma or can we use a delimiter of our own choice?
A: Good question! There is no specific guideline for the same. Python allows to customize the delimiter and also provides several other options. A collection of such options is called a dialect
.
You can look up the different dialects:
import csv
print(csv.list_dialects())
Output:
['excel', 'excel-tab', 'unix']
The different options are:
- delimiter
- doublequote
- escapechar
- lineterminator
- quotechar
- quoting
- skipinitialspace
- strict
Most of them are self explanatory. You can read more about them here.
You can either create a separate dialect and pass it to the reader
function, or if you require only few options, and creating a separate dialect sounds overkill for you then you can pass these options directly to the reader
function.
For example, consider the sample capital.csv
file with some minor changes.
country, capital
India, New Delhi
Ireland, Dublin
Germany, Berlin
Austria, Vienna
The file is maintained with some spaces to be able to read it properly, but but but, we don't require these spaces while parsing it. Guess which option can we use to solve this? Yes, you have guessed it right! skipinitialspace
it is.
import csv
with open('captial.csv') as f:
reader = csv.reader(f, skipinitialspace=True)
header = next(reader)
for row in reader:
print(row)
Output:
['India', 'New Delhi']
['Ireland', 'Dublin']
['Germany', 'Berlin']
['Austria', 'Vienna']
Try playing with the rest of the options.
And if you want to create a separate dialect, you can use csv.Dialect
for it, and pass it to the reader
function as csv.reader(f, dialect=dialect)
.
...
my_dialect = csv.excel()
my_dialect.skipinitialspace = True
with open('capital.csv') as f:
reader = csv.reader(f, dialect=my_dialect)
...
Here I am using the default csv.excel
dialect, with a tweak in setting skipinitialspace
option as true
.
There is yet another option,
...
my_dialect = csv.excel()
my_dialect.skipinitialspace = True
csv.register_dialect('my_dialect', my_dialect)
with open('capital.csv') as f:
reader = csv.reader(f, dialect='my_dialect')
...
This way you can register a dialect and then use it throughout the program.
Q: That was quite insightful! Before I leave, can you elaborate on how can I create a csv file if I have some data? I am guessing we have a writer
function complementing the reader
?
A: You are absolutely right! Similar to reader
and DictReader
, we have writer
and DictWriter
.
import csv
header = ["country", "capital"]
data = [("India", "New Delhi"), ("Ireland", "Dublin"), ("Germany", "Berlin"), ("Austria", "Vienna")]
with open('new_captial.csv', 'w') as f:
writer = csv.writer(f)
writer.writerow(header)
for row in data:
writer.writerow(row)
Rather than writing our own "for loop" for writing row, we can also use the writerows
method of the writer.
...
with open('new_captial.csv', 'w') as f:
writer = csv.writer(f)
writer.writerow(header)
writer.writerows(data)
csv.DictWriter
is useful when working with json
data in a particular format.
import csv
header = ["country", "capital"]
data = [
{"country": "India", "capital": "New Delhi"},
{"country": "Ireland", "capital": "Dublin"},
{"country": "Germany", "capital": "Berlin"},
{"country": "Austria", "capital": "Vienna"},
]
with open('new_captial.csv', 'w') as f:
writer = csv.DictWriter(f, fieldnames=header)
writer.writeheader()
writer.writerows(data)
You can also pass the dialect
to writer
function to change the output format.
Top comments (0)