Hi everyone.
Yesterday, I came across a challenge while I applying my studies od Pandas library.
I usually use CSV files I download to do my analysis, but this time my data was in an online CSV file that was constantly updated by other people. So, at least for me, is pretty boring to have to download every time the file and use it on the code.
So, here I am to show you how I solved this problem and managed to collect the data directly from the internet:
Case 1: CSV directly from a link
I have gotten this link to download a CSV file from G Drive:
https://drive.google.com/uc?authuser=0&id=1UzlPy6CZQeAzDXhfc_2sHEyK_Jb50vJs&export=download
If you click will download a file of the price of the dollar to Real.
Code:
import pandas as pd
url from the file you wanna download
#url ='https://drive.google.com/uc?authuser=0&id=1UzlPy6CZQeAzDXhfc_2sHEyK_Jb50vJs&export=download'
#Creating a dataframe to get the file
price_df = pd.read_csv(url)
price_df
Pretty easy ;)
But, sometimes you don't be so easy. You will have to format the file.
Case 2: CSV in a request that needs to be handled
This link is a history of coffee price in Brazil I found on Google:
http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp
At the end of the page you will see a link to download the file:
If you copy the URL, you will get that link:
http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp?d-3496238-e=2&6578706f7274=1
Let´s apply the same code:
coffe_url = 'http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp;jsessionid=FDD4F9458CC173838FCDE86C02BF6349?d-3496238-e=2&6578706f7274=1'
coffe_df = pd.read_csv(coffe_url)
coffe_df
And... Ops... Doesn't work.
Don't work because that link is not with the format like the other file of GDrive.
To solve that problem, we have to import two libraries:
Import requests
Import io
import requests
import io
coffe_url = 'http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp;jsessionid=FDD4F9458CC173838FCDE86C02BF6349?d-3496238-e=2&6578706f7274=1'
# Doing a request from website
content_url = requests.get(coffe_url).content
file = io.StringIO(content_url.decode('latin1'))
coffe_df = pd.read_csv(file, sep='\t')
coffe_df
And now we did it!!!
Notes: I use decode 'latin1', because the file has some special characters.
Note2: the separators of the file aren't a common separator. Is \t for some reason.
Note3: The file download like a text, the library io is to transform to file.
You also can use the strings methods, but more simple use IO library
Top comments (0)