apiharbor

Posted on Jun 30 • Updated on Jul 2 • Originally published at usemyapi.com

Analyzing Likes Using Instagram API with Python - part 3

#python #tutorial #programming


Some time ago, we started working on an app to analyze likes on Instagram posts. We want to answer the question: What percentage of followers like Instagram posts? This will tell us whether the account should focus on gaining new followers or on improving content, since its current followers might not be interested. The result will be a bar chart with one bar per post, comparing all likes with the likes that come from followers.

In our previous post, How to analyze Instagram likes – part 2, we created an API client in Python. We decided to use the Instagram Scraper 2023 API and wrote a client to fetch the data needed for our task. Today, it’s time to finish our app.

Save requests to Instagram API by adding file caching

Let’s revisit our rapidapi_client. We want to cache every call to the endpoint. We don’t want to waste requests while working, so it’s worth returning previously fetched data.

Currently, we have three functions that we will add caching to. Let’s write the two caching helper methods first:

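# Imports required at the top of the client module
import os
import pickle
from typing import Any, Optional

import requests

def __get_cache_file_path(self, endpoint: str, identifier: str, count: int, end_cursor: Optional[str]) -> str: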
    """
    Generate a file path for caching based on the endpoint, identifier, count, and end_cursor.
    """
    filename = f"{endpoint}_{identifier}_{count}_{self.__get_end_cursor(end_cursor)}.pkl"
    return os.path.join(self.cache_dir, filename)

def __get_or_set_cache(self, endpoint: str, identifier: str, count: int, end_cursor: Optional[str], url: str) -> Any:
    """
    Retrieve data from the cache if it exists; otherwise, fetch data from the URL, cache it, and return the data.

    :param endpoint: The API endpoint being queried.
    :param identifier: The identifier for the request (e.g., user ID or post shortcode).
    :param count: The number of items requested.
    :param end_cursor: The pagination cursor for the request.
    :param url: The URL to fetch data from if not cached.
    :return: The data retrieved from the cache or fetched from the URL.
    """
    cache_file_path = self.__get_cache_file_path(endpoint, identifier, count, end_cursor)

    # Check if the cache file exists
    if os.path.exists(cache_file_path):
        # Load and return data from cache
        with open(cache_file_path, 'rb') as cache_file:
            return pickle.load(cache_file)

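    # Make the API request if cache does not exist
    response = requests.get(url, headers=self.headers)
    # Raise on HTTP errors so that a failed response is never written to the cache
    response.raise_for_status()
    data = response.json()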

    # Save the fetched data to the cache
    with open(cache_file_path, 'wb') as cache_file:
        pickle.dump(data, cache_file)

    return data
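
One caveat about the file name: a pagination cursor can be long or contain characters such as `/` that are not safe in file names. Whether the cursors returned by this API actually do is an assumption I have not verified. Here is a minimal sketch of one way around it, hashing the key parts with a hypothetical helper named safe_cache_filename (it is not part of the original client):

```python
import hashlib


def safe_cache_filename(endpoint: str, identifier: str, count: int, end_cursor: str) -> str:
    """Build a cache file name that stays valid for any cursor value.

    Hypothetical helper for illustration; the original client builds the
    name directly from the raw values.
    """
    raw_key = f"{endpoint}_{identifier}_{count}_{end_cursor}"
    # SHA-256 yields a fixed-length hex digest made only of 0-9 and a-f
    digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    # Keep the endpoint as a readable prefix so files are easy to tell apart
    return f"{endpoint}_{digest}.pkl"


name = safe_cache_filename("userposts", "12345", 5, "QVFE/abc+def==")
print(name)
```

The trade-off is that the file names are no longer human-readable, so the readable endpoint prefix is kept on purpose.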

We also need to change the constructor to create a folder for caching files:

def __init__(self, cache_dir: str = 'cache'):
    self.headers = {
        'x-rapidapi-key': RAPIDAPI_KEY,
        'x-rapidapi-host': RAPIDAPI_HOST
    }
    self.cache_dir = cache_dir
    # Create the cache folder if it does not exist yet
    os.makedirs(self.cache_dir, exist_ok=True)

Everything looks fantastic! Now, we need to change the methods that communicate with the API:

def get_user_posts(self, userid, count, end_cursor=None):
    url = self.__get_api_url(f"/userposts/{userid}/{count}/{self.__get_end_cursor(end_cursor)}")
    return self.__get_or_set_cache("userposts", userid, count, end_cursor, url)

def get_post_likes(self, shortcode, count, end_cursor=None):
    url = self.__get_api_url(f"/postlikes/{shortcode}/{count}/{self.__get_end_cursor(end_cursor)}")
    return self.__get_or_set_cache("postlikes", shortcode, count, end_cursor, url)

def get_user_followers(self, userid, count, end_cursor=None):
    url = self.__get_api_url(f"/userfollowers/{userid}/{count}/{self.__get_end_cursor(end_cursor)}")
    return self.__get_or_set_cache("userfollowers", userid, count, end_cursor, url)

Nothing extraordinary here: each method now calls __get_or_set_cache, which returns the cached content if it is found and queries the Instagram API only otherwise.
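
To see the get-or-set pattern in isolation, here is a small standalone sketch. It is not the client code itself: get_or_set_cache is rewritten as a free function, and fake_fetch is a made-up stand-in for the HTTP request so the example runs without an API key.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def get_or_set_cache(cache_file_path: str, fetch: Callable[[], Any]) -> Any:
    """Return cached data if the file exists; otherwise call fetch() and cache the result."""
    if os.path.exists(cache_file_path):
        with open(cache_file_path, "rb") as cache_file:
            return pickle.load(cache_file)

    data = fetch()
    with open(cache_file_path, "wb") as cache_file:
        pickle.dump(data, cache_file)
    return data


call_count = 0


def fake_fetch() -> dict:
    """Stand-in for the HTTP request; counts how often it is called."""
    global call_count
    call_count += 1
    return {"data": {"likes": [{"username": "alice"}]}}


with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "postlikes_demo_50_.pkl")
    first = get_or_set_cache(path, fake_fetch)   # cache miss: calls fake_fetch
    second = get_or_set_cache(path, fake_fetch)  # cache hit: reads the pickle file

print(call_count)       # 1
print(first == second)  # True
```

The second call never reaches fake_fetch, which is exactly the request we want to save.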

We are now protected against wasting requests. Let’s move on!

Creating plain old Python objects (POPO)

POPOs are simple Python objects that only hold data: they have no methods or attributes other than those explicitly defined. We need them to pass data between classes.

Let’s think about the POPO we need. We must gather data from the API and merge it into one object to pass it further. So, we need a simple PostData class:

from typing import List

class PostData:
    def __init__(self, post_id, post_likers: List[str], followers: List[str]):
        self.post_id = post_id
        self.post_likers = post_likers
        self.followers = followers

Another POPO will hold the analysis result. Let’s name this class PostResult:

class PostResult:
    def __init__(self, post_id, all_likes_count, likers_likes_count):
        self.all_likes_count = all_likes_count
        self.likers_likes_count = likers_likes_count
        self.post_id = post_id
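
As an aside, the same two classes could also be written with the standard library’s dataclasses module, which generates __init__, __repr__, and __eq__ automatically. This is only an alternative sketch, and the field types (for example, post_id as a string) are my assumption:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PostData:
    post_id: str  # assumed to be a string
    post_likers: List[str]
    followers: List[str]


@dataclass
class PostResult:
    post_id: str  # assumed to be a string
    all_likes_count: int
    likers_likes_count: int


result = PostResult(post_id="123", all_likes_count=50, likers_likes_count=20)
print(result)  # PostResult(post_id='123', all_likes_count=50, likers_likes_count=20)
```

The readable repr is handy when printing results during debugging.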

Let’s create a list of PostData built from Instagram API data.

# Create an instance of the RapidApiClient
api_client = RapidApiClient()
end_cursor = None

# Replace with your actual Instagram account ID
ACCOUNT_ID = '__PASTE_ACCOUNT_ID__'

# Retrieve followers for the account (maximum 50 followers for testing purposes)
followers = api_client.get_user_followers(ACCOUNT_ID, 50)
# Extract usernames of the followers
followers_usernames = [user['username'] for user in followers["data"]["user"]]

# Initialize an empty list to store post data
posts_data: List[PostData] = []
PAGE_LIMIT = 1  # Limit the number of pages to retrieve for testing purposes

for i in range(PAGE_LIMIT):
    # Retrieve user posts (maximum 5 posts per page for testing purposes)
    posts = api_client.get_user_posts(ACCOUNT_ID, 5, end_cursor)

    data = posts["data"]
    edges = data["edges"]  # Extract the posts data

    for edge in edges:
        node = edge["node"]
        post_id = node["id"]  # Extract the post ID

        # Retrieve likes for the post (maximum 50 likes for testing purposes)
        post_likes = api_client.get_post_likes(node["shortcode"], 50)
        # Extract usernames of the users who liked the post
        post_likers = [like['username'] for like in post_likes["data"]["likes"]]

        # Create a PostData object and add it to the posts_data list
        posts_data.append(PostData(post_id, post_likers, followers_usernames))

    # Stop once the current page has been processed and there is no next page
    if not data["next_page"]:
        break
    end_cursor = data["end_cursor"]  # Update the end_cursor for the next page

At this point, posts_data contains a list of PostData objects composed of post IDs, lists of likers, and lists of followers. Great! We have the data ready; it’s time to feed it to the analyzer.
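
The paging logic from the loop above can also be isolated in a small generator. The sketch below is an illustration, not part of the original app: iter_post_nodes and FakeClient are hypothetical names, and the response shape (data.edges, data.next_page, data.end_cursor) is assumed to match the one used above.

```python
from typing import Any, Dict, Iterator, Optional


def iter_post_nodes(client: Any, account_id: str, page_size: int, page_limit: int) -> Iterator[Dict[str, Any]]:
    """Yield post nodes page by page, following end_cursor until pages run out."""
    end_cursor: Optional[str] = None
    for _ in range(page_limit):
        data = client.get_user_posts(account_id, page_size, end_cursor)["data"]
        for edge in data["edges"]:
            yield edge["node"]
        # Stop only after the current page has been consumed
        if not data["next_page"]:
            break
        end_cursor = data["end_cursor"]


class FakeClient:
    """Stand-in for RapidApiClient that serves two canned pages."""

    def get_user_posts(self, userid: str, count: int, end_cursor: Optional[str] = None) -> Dict[str, Any]:
        if end_cursor is None:
            return {"data": {"edges": [{"node": {"id": "1", "shortcode": "a"}}],
                             "next_page": True, "end_cursor": "page2"}}
        return {"data": {"edges": [{"node": {"id": "2", "shortcode": "b"}}],
                         "next_page": False, "end_cursor": None}}


post_ids = [node["id"] for node in iter_post_nodes(FakeClient(), "demo", 1, 5)]
print(post_ids)  # ['1', '2']
```

With a generator like this, the code that builds PostData objects no longer has to track the cursor itself.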

Analyzing data from the Instagram API

Let’s think about what such an analyzer should do. Essentially, it should count, just count 😉 But what exactly?

The total number of likes on a given post
The number of likes from followers

This is achieved by the following code:

from typing import List

class LikesAnalyzer:
    def __init__(self, posts_data: List[PostData]):
        """
        Initialize the LikesAnalyzer with a list of PostData objects.

        :param posts_data: List of PostData objects containing data for each post.
        """
        self.posts_data = posts_data

    def get_analysis(self) -> List[PostResult]:
        """
        Analyze the likes data to determine the total likes and the likes from followers for each post.

        :return: A list of PostResult objects containing the analysis results for each post.
        """
        results: List[PostResult] = []  # Initialize an empty list to store the analysis results

        # Iterate through each PostData object in the posts_data list
        for p in self.posts_data:
            all_likes_count = len(p.post_likers)  # Count the total number of likes for the post
            likers_likes_count = 0  # Initialize the count for likes from followers

            # Iterate through each liker of the post
            for liker in p.post_likers:
                # Check if the liker is also a follower
                if liker in p.followers:
                    likers_likes_count += 1  # Increment the count if the liker is a follower

            # Create a PostResult object with the post ID, total likes, and likes from followers
            results.append(PostResult(p.post_id, all_likes_count, likers_likes_count))

        return results  # Return the list of PostResult objects

The get_analysis method returns a list of PostResult objects. Yes, these are the results of our analysis. Such dry results don’t tell us much on their own. Let’s make a chart out of them!
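
Before charting, the two counts can also be turned into a single percentage per post. The helper below is a hypothetical addition (follower_like_share is not in the original code). It uses a set for the follower lookup, which avoids scanning the whole follower list for every liker:

```python
from typing import List


def follower_like_share(post_likers: List[str], followers: List[str]) -> float:
    """Return the percentage of a post's likes that come from followers."""
    if not post_likers:
        return 0.0  # Avoid dividing by zero for posts without likes
    follower_set = set(followers)  # Set lookups are O(1) on average
    likes_from_followers = sum(1 for liker in post_likers if liker in follower_set)
    return 100.0 * likes_from_followers / len(post_likers)


share = follower_like_share(["alice", "bob", "carol", "dave"], ["alice", "carol", "erin"])
print(f"{share:.1f}%")  # 50.0%
```

Keep in mind that with the test limits used earlier (50 followers, 50 likes), such a percentage only describes the sampled data.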

Displaying the chart based on data from the Instagram API

I propose the class name PostLikesPlotter to keep it simple. A bar chart is the best representation for the results: each bar shows at a glance how many of a post’s likes came from followers.

from typing import List

import matplotlib.pyplot as plt

# Constants for the bar chart
BAR_WIDTH = 0.5  # Width of the bars in the chart
ALL_LIKES_COLOR = 'blue'  # Color of the bars representing all likes
LIKERS_LIKES_COLOR = 'green'  # Color of the bars representing likes from followers

class PostLikesPlotter:

    def plot_analysis(self, results: List[PostResult]) -> None:
        """
        Plot the analysis of post likes, showing total likes and likes from followers for each post.

        :param results: A list of PostResult objects containing analysis results for each post.
        """
        # Extract post IDs, total likes counts, and likers likes counts from the results
        post_ids = [result.post_id for result in results]
        all_likes_counts = [result.all_likes_count for result in results]
        likers_likes_counts = [result.likers_likes_count for result in results]

        # Create a figure and axis for the bar chart
        fig, ax = plt.subplots()

        # Plot the bars for total likes
        bars = ax.bar(post_ids, all_likes_counts, BAR_WIDTH, color=ALL_LIKES_COLOR, label='All Likes')

        # Plot the bars for likes from followers at the same x positions,
        # so they overlay the total likes bars exactly
        ax.bar(post_ids, likers_likes_counts, BAR_WIDTH, color=LIKERS_LIKES_COLOR, label='Likes from Followers')

        # Set the labels and title of the chart
        ax.set_xlabel('Post ID')
        ax.set_ylabel('Likes Count')
        ax.set_title('Likes Analysis per Post')

        # Move the legend outside the bar chart area
        ax.legend(loc='upper left', bbox_to_anchor=(1, 1))

        # Adjust the appearance of the existing x-axis tick labels
        plt.setp(ax.get_xticklabels(), fontsize=8, rotation=45, ha='right')

        # Display the bar chart
        plt.show()

Summary

We have a working application written in Python for Instagram data analysis! For those who want to download the source code, visit our website UseMyApi.com where you will find a link to GitHub with the code.

If you liked the post ❤️ leave a comment or any reaction. If you need any other application or changes to the current code, you can "hire" me for programming work. Feel free to contact me!

All the best to you all!
