Recently, there has been talk of Instagram closing down its API and leaving access only to corporate partners.
Given Instagram's large user base, data scraping becomes even more important in this scenario. The platform holds data in every nook and cranny.
I decided to start by scraping whatever data we can find on a person's account page, which you can access at https://instagram.com/
For example, let's take a look at my page at https://instagram.com/manan.code
This is the main area I am interested in. What could we scrape from here, and how? Right-click on the page and click "View page source" to see the source file behind it.
You'll see something like this -
At first look, this seems incomprehensible, and finding any data in it seems almost impossible. It's just a sea of link and script tags.
But the data is there somewhere for sure.
I did some digging and found the script tag that contains basically everything we need.
Now that we know where the data is, let's move on to the code.
We'll use the requests module and BeautifulSoup.
Up to this point in the code, we've requested the page from Instagram and got its source, then converted it to a BeautifulSoup object to make it easy to find the script tag we need. Next, we've used BeautifulSoup's find_all function to collect all the script tags. With a little trial and error, I discovered that the script tag we need is the 5th one, so we index the list accordingly to pick it out.
But we need to do one more thing. What we have right now is a tag object, not a string, so we can't slice it to find what we need. Hence, we access the contents of the script tag.
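Here's a minimal sketch of the steps so far, not the exact code from the video. It runs on a made-up SAMPLE_HTML string so that it works offline; fetch_profile_html, find_data_script, SAMPLE_HTML and the window.__data variable name are all hypothetical names chosen for illustration. Instead of hard-coding the position of the script tag, this version looks for the first script tag whose text contains {"config":, since the position may differ from page to page.

```python
import requests
from bs4 import BeautifulSoup


def fetch_profile_html(username):
    """Download the raw HTML of a public profile page (needs network access)."""
    response = requests.get(f"https://instagram.com/{username}", timeout=10)
    response.raise_for_status()
    return response.text


def find_data_script(html, marker='{"config":'):
    """Return the text of the first <script> tag containing `marker`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for empty tags such as <script src="..."></script>
        text = script.string
        if text and marker in text:
            return text
    return None


# Made-up stand-in for a profile page, so the sketch runs without a request.
SAMPLE_HTML = """
<html><head>
<script src="https://example.com/a.js"></script>
<script>window.other = 1;</script>
<script>window.__data = {"config": {"viewer": null}, "username": "manan.code"};</script>
</head><body></body></html>
"""

script_text = find_data_script(SAMPLE_HTML)
print(script_text)
```

For a live page you would pass fetch_profile_html("manan.code") instead of SAMPLE_HTML; whether Instagram still serves the data this way to an anonymous request is not guaranteed. Picking the tag by position, as described above, would look like BeautifulSoup(html, "html.parser").find_all("script")[4].contents[0].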
The next step is to find where the part we need begins.
If you remember, the JavaScript object holding all the data started with {"config": so I've used a little string processing to slice out the whole object. With the object isolated, I've converted it to a Python dictionary using loads from the json package in the standard library.
If you print data_json, this is what you get -
On looking closely, I figured out all the right keys to the data we need. Here is the result.
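The slicing step can be sketched like this, assuming the script text looks like the made-up SAMPLE_SCRIPT below. extract_json_object is a hypothetical helper, and the "user", "username" and "followers" keys are invented for the example; the real keys have to be read off the printed data_json.

```python
import json


def extract_json_object(script_text, marker='{"config":'):
    """Slice the JSON object out of a script's text and parse it."""
    start = script_text.find(marker)
    if start == -1:
        raise ValueError(f"marker {marker!r} not found in script text")
    # Assumes the object runs up to the last closing brace in the script text.
    end = script_text.rfind("}") + 1
    return json.loads(script_text[start:end])


# Made-up script contents; the variable name and the keys are assumptions.
SAMPLE_SCRIPT = (
    'window.__data = {"config": {"viewer": null}, '
    '"user": {"username": "manan.code", "followers": 123}};'
)

data_json = extract_json_object(SAMPLE_SCRIPT)

# Once parsed, the data is an ordinary nested dictionary.
print(data_json["user"]["username"])
print(data_json["user"]["followers"])
```

Slicing from the marker to the last closing brace drops both the assignment in front of the object and the trailing semicolon, which json.loads would otherwise reject.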
And this marks the end of our journey into scraping Instagram!
Check out my video where I go over the same thing -
Top comments (8)
Hello!!! I've been working on an open source library for scraping Instagram data you might find interesting at this repo. It scrapes the same JSON data you explored in your blog post with steps as easy as
and that's it!! It scraped almost all the data points you can get from the data_json you were exploring. It has similar functionality for hashtags and posts as well, check it out :)
Oh that is great! I'll look into it for sure. Thanks for commenting about it.
Great writeup, thank you! I've just published a simple tutorial on Instagram scraping and discovering micro-influencers via SQL.
How to scrape Instagram followers with Node.js, put results to MySQL, and discover micro-influencers
restyler ・ Oct 3 ・ 9 min read
I will appreciate your feedback and comments.
Cheers!
Hi,
Thanks for sharing this video.
I see that some accounts have the data we need at script tag 4, not 5. I am trying to scrape data for a list of users. How to extract search tag index and make this work for all users?
Great boi
How to understand scratching? Can I scratch Instagram?)
WUT EVEN DUDE ?