Getting a CSV from the internet with Pandas

#python #pandas #computerscience

Hi everyone.
Yesterday, I ran into a challenge while practicing what I have been studying about the Pandas library.

I usually run my analysis on CSV files that I download first, but this time my data was in an online CSV file that other people were constantly updating. Downloading the file again every time before using it in the code is, at least for me, pretty tedious.

So, here I am to show you how I solved this problem and managed to collect the data directly from the internet:

Case 1: CSV directly from a link

I got this link to download a CSV file from Google Drive:
https://drive.google.com/uc?authuser=0&id=1UzlPy6CZQeAzDXhfc_2sHEyK_Jb50vJs&export=download

If you click it, it will download a file with the price of the dollar in Brazilian reais.

Code:

import pandas as pd
# URL of the file you want to download
url = 'https://drive.google.com/uc?authuser=0&id=1UzlPy6CZQeAzDXhfc_2sHEyK_Jb50vJs&export=download'
# Read the CSV straight from the URL into a DataFrame
price_df = pd.read_csv(url)
price_df

Pretty easy ;)
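
If you have several shared files, the download link can be assembled from the file id. This is only a minimal sketch: it assumes other publicly shared Drive files follow the same URL pattern as the link in this post, and the helper name drive_csv_url is made up for illustration.

```python
from urllib.parse import urlencode


def drive_csv_url(file_id):
    # Hypothetical helper: rebuilds the same query string as the link above.
    # Assumes the file is shared publicly and follows the same URL pattern.
    query = urlencode({"authuser": 0, "id": file_id, "export": "download"})
    return "https://drive.google.com/uc?" + query


url = drive_csv_url("1UzlPy6CZQeAzDXhfc_2sHEyK_Jb50vJs")
print(url)
```

The resulting url can then be passed to pd.read_csv(url), just like in the code above.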

But sometimes it won't be that easy, and you will have to handle the file's format yourself.

Case 2: CSV in a request that needs to be handled

This page has the history of coffee prices in Brazil. I found it on Google:
http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp

At the end of the page there is a link to download the file. If you copy its URL, you get this link:
http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp?d-3496238-e=2&6578706f7274=1

Let's apply the same code:

coffe_url = 'http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp;jsessionid=FDD4F9458CC173838FCDE86C02BF6349?d-3496238-e=2&6578706f7274=1'
coffe_df = pd.read_csv(coffe_url)
coffe_df

And... oops... it doesn't work.
It fails because this link doesn't return the file in the same format as the Google Drive one.

To solve that problem, we have to import two libraries: requests and io.

import requests
import io

coffe_url = 'http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp;jsessionid=FDD4F9458CC173838FCDE86C02BF6349?d-3496238-e=2&6578706f7274=1'
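# Request the file from the website; .content holds the raw bytes
content_url = requests.get(coffe_url).content
# Decode the bytes as Latin-1 and wrap the text in a file-like object
file = io.StringIO(content_url.decode('latin1'))
# The columns are separated by tabs, not commas
coffe_df = pd.read_csv(file, sep='\t')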
coffe_df

And now we did it!!!
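
If you plan to rerun this often, the same steps can be wrapped in a small function. This is just a sketch: fetch_table is a hypothetical name, and the timeout value and the raise_for_status() check are my additions, not something this site requires.

```python
import io

import pandas as pd
import requests


def fetch_table(url, sep="\t", encoding="latin1", timeout=30):
    # Hypothetical helper wrapping the request + decode + read_csv steps.
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    # Decode the raw bytes and hand the text to pandas as a file-like object
    text = response.content.decode(encoding)
    return pd.read_csv(io.StringIO(text), sep=sep)
```

With that in place, coffe_df = fetch_table(coffe_url) would replace the four lines above. The defaults match this particular file, so other sites may need a different sep or encoding.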

Note 1: I decode with 'latin1' because the file has some special characters.
Note 2: The file doesn't use the usual comma separator. For some reason it uses \t (tab).
Note 3: The file is downloaded as text; the io library turns that text into a file-like object that read_csv can read.
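
To see the three notes in action without touching the network, here is a small sketch on made-up bytes. The sample content and column names are invented for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Made-up sample standing in for the downloaded bytes (not the real file)
raw = "Data\tPreço\n01/01/2020\t500\n".encode("latin1")

# Note 1: these bytes are not valid UTF-8, so decoding them as UTF-8 fails
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Note 3: decode to text, then wrap the text in a file-like object
file = io.StringIO(raw.decode("latin1"))

# Note 2: tell read_csv that the separator is a tab
sample_df = pd.read_csv(file, sep="\t")

print(utf8_ok)
print(sample_df)
```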

You can also use string methods, but it is simpler to use the io library.
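
For comparison, this is roughly what the string-method route could look like. It is a sketch on made-up text (the sample rows are invented), not the approach used above.

```python
import pandas as pd

# Made-up sample text standing in for the decoded download
text = "Data\tPreço\n01/01/2020\t500\n02/01/2020\t510\n"

# Split into lines, then split each line on the tab character
rows = [line.split("\t") for line in text.splitlines() if line]

# First row is the header; the rest are data rows (all values stay strings)
manual_df = pd.DataFrame(rows[1:], columns=rows[0])
print(manual_df)
```

Every value stays a string this way, while read_csv infers numeric types for you, which is one more reason the io route is simpler.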
