I've had a hard time recently working with the CSV files that Nutritionix published for their database. They are quite large, which makes them hard to open, but the main issue was deleting the right columns and getting the punctuation around values correct. Without that, PostgreSQL gave me various errors. Here are the main ones:
- The column names in the CSV did not match the column names in the database
- There was an empty column, which showed up as a stray ','
- Text values were missing quotation marks
- Single quotes (') needed to be escaped by doubling them (''), as sketched below
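That last point is standard SQL behavior: PostgreSQL escapes a single quote inside a string literal by doubling it. A minimal sketch of the rule (the item name here is made up):

```python
# A single quote inside a SQL string literal must be doubled,
# otherwise PostgreSQL reports a syntax error near the quote.
item = "Baker's Chocolate"  # hypothetical item name
escaped = item.replace("'", "''")
print(escaped)  # Baker''s Chocolate
```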
I initially tried solving these issues in Excel, but Excel tends to strip quotation marks when saving CSV files. That turned out to be a common complaint, so I switched to pandas. Here is the Python script I used for one of the CSV files:
```python
import pandas as pd
import csv

# Read the file
file = pd.read_csv('food_insertion3(Done).csv')

# Rename columns to match the database schema
file.rename(columns={'fdc_id': 'item_id',
                     'description': 'item_description',
                     'food_category_id': 'food_category'}, inplace=True)

# Drop unneeded columns
file.drop(columns='data_type', inplace=True)

# Apply quotations around every string value
file = file.applymap(lambda x: f'"{x}"' if isinstance(x, str) else x)
file.to_csv('food_insertion3(Done).csv', index=False)

# Deal with single quotes: double each one so PostgreSQL
# accepts it inside a string literal
with open('food_insertion3(Done).csv', 'r', newline='', encoding='utf-8') as infile:
    reader = csv.reader(infile)
    rows = []
    for row in reader:
        cor_row = []
        for field in row:
            cor_field = field.replace("'", "''")
            cor_row.append(cor_field)
        rows.append(cor_row)

with open('food_insertion3(Done).csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(rows)
```
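One note on the script: in pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map, which behaves the same way here. After the two passes, a quick way to confirm the renames took effect is to read the file back; a minimal sanity-check sketch (the expected column names follow from the renames above):

```python
import pandas as pd

# Re-read the cleaned file and confirm the renamed headers are present.
check = pd.read_csv('food_insertion3(Done).csv')
print(check.columns.tolist())  # should include 'item_id', 'item_description', 'food_category'
print(check.head())
```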