Introduction
Last year, I wrote a web scraping program to collect data from one of the NFT collections on the NFTrade site. My friend wanted the following data included in a CSV: all the NFTs currently for sale in the collection, the total price of each NFT in US dollars based on the current market price of the BNB cryptocurrency that the NFT is for sale in, and the price in USD per rarity point (a value randomly assigned to each NFT in the collection).
The NFTrade website does not have a public API, so instead of writing a Node.js script to fetch the data via HTTP calls, I built a small script to visit the website and actually "scrape" the data from it.
I hadn't written a web scraper before, so I chose to write the program in Python. As I built the scraper, the project requirements grew more complex, and I learned a bunch of useful Python techniques along the way, which I'm sharing in a series of posts.
After choosing the Selenium Python package to drive Selenium WebDriver, scraping the data from NFTrade, and extracting the details I wanted from each NFT (its ID and price in BNB), I needed to update my new list of NFT data in several ways:
- I needed to filter out any NFTs that weren't currently for sale (some that were scraped off the site weren't actually for sale),
- I needed to match all the NFTs for sale with their "rarity scores" (as defined in a separate JSON list) and include those scores along with the rest of the NFT data,
- I needed to compute the total cost and cost per rarity point for each NFT in USD based on the current market price of BNB and add those prices to each NFT in the list as well.
I know this sounds quite complicated, but I broke each of these requirements down into separate methods inside my Python script and learned a lot about working with lists in Python along the way.
Today, I'll show you how to filter lists by whether an attribute exists in an object, how to merge two lists of items together based on matching attributes, and even how to add new object properties to the objects within a list in Python.
NOTE: I am not normally a Python developer so my code examples may not be the most efficient or elegant Python code ever written, but they get the job done.
Sample Python Data
Before I dive into the specifics of my list manipulations in Python, here's a little background on the data I was working with. Below is a small sample of the list of NFT data before I started mutating it.
Sample NFT data scraped from the NFTrade site
[
{'id': 6774, 'nft_price': '0.22'},
{'id': 5710, 'nft_price': '0.16'},
{'id': 3187, 'nft_price': '0.8'},
{'id': 6482, 'nft_price': '1.1'},
{'id': 7689, 'nft_price': '0.5'},
{'id': 335, 'nft_price': '4'},
{'id': 7025, 'nft_price': '1.057'},
{'id': 597, 'nft_price': '5'},
{'id': 3936, 'nft_price': '3.1'},
{'id': 2834, 'nft_price': '0.649'},
{'id': 763, 'nft_price': '1.65'},
{'id': 7683, 'nft_price': None},
{'id': 7914, 'nft_price': None}
]
As you can see from the output above, the original data I started with was pretty sparse: the ID number for each NFT and the price in BNB (if it existed) were the only pieces of data present in each object from the info scraped off the NFTrade site. I had my work cut out for me to clean this list up and add more useful data to it, so let's move on to how I did so in the next section.
NOTE: If you'd like to see more about how to scrape the browser data and gather just the necessary bits, read my first two blog posts in this series, listed in the resources at the end of this post.
Filter objects in a list on whether an attribute exists or not
As I mentioned in the introduction, the first thing I needed to do was clean this list up by removing any NFTs that didn't have a price.
Due to how I had to lazily load and scrape the data from the NFTrade website initially, there was a good chance there were a handful of NFTs I gathered up that weren't for sale, and therefore didn't have prices, so I needed to weed them out first.
Technically every NFT in my list had an nft_price attribute, but if there was no price listed in the card's scraped data, the nft_price attribute was assigned None, which proved very useful.
Inside the if __name__ == '__main__': block of my Python script, I'd already scraped the data from the webpage with the get_cards() method, then looped through the NFT data to grab just the relevant bits with the get_nft_data() method. Now I wanted to filter the cards down to only the ones listed for sale.
Here's the __main__ block code first:
for_sale_scraper.py
if __name__ == '__main__':
    scraper = ForSaleNFTScraper()
    cards = scraper.get_cards(max_card_count=200)
    card_data = []
    for card in cards:
        info = scraper.get_nft_data(card)
        card_data.append(info)
    # filter out any extra cards that aren't for sale
    cards_for_sale = scraper.filter_priced_cards(card_data)
And here's the method I came up with to filter down to just the NFTs for sale, filter_priced_cards():
def filter_priced_cards(self, card_list):
    """Filter card list to only cards with NFT cost."""
    # keep only the cards whose NFT price is not None (i.e. cards listed for sale)
    cards_for_sale = list(filter(lambda card: card['nft_price'] is not None, card_list))
    return cards_for_sale
Let's break down what's happening in the line that calls filter() inside filter_priced_cards().
I used Python's built-in filter() function to iterate over the card_list passed to the method and create a new list named cards_for_sale. The anonymous lambda function inside filter() takes each card in the card_list and returns True if the card's nft_price value is not None, and False if it is; this is how it filters out all the cards that don't have a price.
The list() function that wraps filter() converts the result into a list, because filter() returns a filter object, which is a lazy iterator rather than a list.
And finally, the new cards_for_sale list is returned.
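To see the filtering step in isolation, here is a small standalone sketch that runs the same filter() and lambda over a few cards copied from the sample data, alongside an equivalent list comprehension, which is often considered the more idiomatic spelling in Python. It uses is not None because PEP 8 recommends comparing to None with is or is not rather than != or ==.

```python
# A few cards copied from the sample scraped data (nft_price is a string or None)
card_data = [
    {'id': 6774, 'nft_price': '0.22'},
    {'id': 5710, 'nft_price': '0.16'},
    {'id': 7683, 'nft_price': None},
    {'id': 7914, 'nft_price': None},
]

# filter() returns a lazy iterator, so list() is needed to get an actual list
cards_for_sale = list(filter(lambda card: card['nft_price'] is not None, card_data))

# The same filter written as a list comprehension
cards_for_sale_lc = [card for card in card_data if card['nft_price'] is not None]

print(cards_for_sale)
print(cards_for_sale == cards_for_sale_lc)
```

Both versions keep only the two priced cards; which one to use is mostly a matter of taste.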
Merge two lists together by matching object keys
Once the NFTs not for sale have been filtered out, the next step is to add the rarity score to each NFT based on its ID.
For this particular set of NFTs, each NFT had a "rarity score" that had been randomly assigned to it. The rarity scores for each NFT were listed in a separate JSON file in the project and they look like this.
id_rs_list.json
[
{"id": 1, "rs": 18},
{"id": 2, "rs": 13},
{"id": 3, "rs": 14},
{"id": 4, "rs": 10},
{"id": 5, "rs": 22},
{"id": 6, "rs": 13},
{"id": 7, "rs": 10},
{"id": 8, "rs": 13},
{"id": 9, "rs": 13},
{"id": 10, "rs": 9},
// more ids and rarity scores ("rs") below
]
I needed to combine my list of cards_for_sale with the rarity scores in the JSON file by matching up the id attribute in each list of objects. For this task, I came up with the following function, get_cards_rarity_score():
# imports needed at the top of for_sale_scraper.py
import json
from collections import ChainMap
from itertools import groupby
from operator import itemgetter

def get_cards_rarity_score(self, card_list):
    """Combine rarity scores with card list by ID."""
    # get rs data for each card from json file
    with open("id_rs_list.json") as file:
        id_rs_list = json.load(file)
    # merge together cards with id_rs_list by their matching ID numbers
    # (groupby() only groups consecutive items, so the combined list is sorted by ID first)
    match_cards_with_rs_list = groupby(sorted(card_list + id_rs_list, key=itemgetter("id")), itemgetter("id"))
    combined_cards = [dict(ChainMap(*g)) for k, g in match_cards_with_rs_list]
    # filter out all the items in the merged list without a "for sale" value
    filtered_combined_cards = []
    for card in combined_cards:
        if 'nft_price' in card:
            filtered_combined_cards.append(card)
    return filtered_combined_cards
To combine the rarity score with each of the NFT objects contained in the card_list list, the first thing that had to happen was to read the data from the id_rs_list.json file and assign it to a variable.
# get rs data for each card from json file
with open("id_rs_list.json") as file:
    id_rs_list = json.load(file)
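As a quick illustration of what json.load() hands back, here is a minimal sketch in which an io.StringIO object stands in for the real id_rs_list.json file (an assumption made only so the example runs without the file). A JSON array of objects becomes a plain Python list of dictionaries. Note that the real file can't contain the "// more ids" line from the sample above: JSON doesn't allow comments, and json.load() would raise a JSONDecodeError on it.

```python
import io
import json

# io.StringIO stands in for open("id_rs_list.json"); the contents mirror the sample file
fake_file = io.StringIO('[{"id": 1, "rs": 18}, {"id": 2, "rs": 13}, {"id": 3, "rs": 14}]')

id_rs_list = json.load(fake_file)

# A JSON array of objects becomes a Python list of dictionaries
print(type(id_rs_list).__name__)  # list
print(id_rs_list[0])              # {'id': 1, 'rs': 18}
```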
Once the JSON list was extracted from the file, the card_list and id_rs_list needed to be merged together based on their matching IDs.
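Because groupby() only groups consecutive elements, the two lists were first concatenated and sorted by ID (using itemgetter("id") as the sort key). groupby() then grouped the elements with the same ID, and ChainMap() merged each group into a single Python dictionary (object). The result was a list of dictionaries (combined_cards) where each dictionary represented a card with combined information from both lists.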
# merge together cards with id_rs_list by their matching ID numbers
match_cards_with_rs_list = groupby(sorted(card_list + id_rs_list, key=itemgetter("id")), itemgetter("id"))
combined_cards = [dict(ChainMap(*g)) for k, g in match_cards_with_rs_list]
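Here is a small self-contained sketch of the same sort, groupby(), and ChainMap() combination on two tiny made-up lists, plus what happens if the sort is skipped. Since sorted() is stable, cards from card_list stay ahead of the rarity entries with the same ID, so if both dictionaries ever shared a key other than id, ChainMap would return the card's value.

```python
from collections import ChainMap
from itertools import groupby
from operator import itemgetter

# Made-up sample data shaped like the scraped cards and the rarity score file
card_list = [
    {'id': 3, 'nft_price': '0.8'},
    {'id': 1, 'nft_price': '0.22'},
]
id_rs_list = [
    {'id': 1, 'rs': 18},
    {'id': 2, 'rs': 13},
    {'id': 3, 'rs': 14},
]

# groupby() only groups *consecutive* items with equal keys, so sort by id first
merged_input = sorted(card_list + id_rs_list, key=itemgetter('id'))

combined_cards = [
    # ChainMap(*group) views every dict sharing an id as one mapping;
    # dict() flattens that view into a single new dictionary
    dict(ChainMap(*group))
    for _id, group in groupby(merged_input, key=itemgetter('id'))
]

# Without sorting, ids that aren't adjacent end up in separate groups
unsorted_group_keys = [key for key, _ in groupby(card_list + id_rs_list, key=itemgetter('id'))]

print(combined_cards)
print(unsorted_group_keys)  # [3, 1, 2, 3]: id 3 shows up in two separate groups
```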
One thing to note: the combined_cards list has every NFT listed in the id_rs_list, not just the ones whose IDs match the IDs in the card_list. So the combined_cards list looks like the data below, but for every item in id_rs_list.
[
{'id': 1, 'rs': 4},
{'id': 2, 'nft_price': '3', 'rs': 6},
{'id': 3, 'rs': 22},
{'id': 4, 'rs': 4},
{'id': 5, 'rs': 10},
{'id': 6, 'nft_price': '5', 'rs': 1},
{'id': 7, 'rs': 1},
{'id': 8, 'nft_price': '0.1', 'rs': 14},
{'id': 9, 'nft_price': '1.5', 'rs': 5},
{'id': 10, 'rs': 1},
# more IDs and NFT data
]
Since the combined_cards list had every single NFT in it (not just the ones for sale), once more I had to filter the list down so that every item without an "nft_price" was omitted.
# filter out all the items in the merged list without a "for sale" value
filtered_combined_cards = []
for card in combined_cards:
    if 'nft_price' in card:
        filtered_combined_cards.append(card)
In this case, since most of the entries in the combined_cards list very likely did not have an "nft_price" key at all, I checked whether each card had the "nft_price" key, and if so, added the card to the new filtered_combined_cards list.
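The key-existence check in that loop can also be written as a one-line list comprehension. This sketch reuses a few rows from the combined_cards sample. One related detail: the in operator only checks whether the key exists, whereas card.get('nft_price') is not None would also drop cards where the key exists but holds None.

```python
# A few rows from the combined_cards sample
combined_cards = [
    {'id': 1, 'rs': 4},
    {'id': 2, 'nft_price': '3', 'rs': 6},
    {'id': 3, 'rs': 22},
    {'id': 6, 'nft_price': '5', 'rs': 1},
]

# Keep only dictionaries that actually contain the 'nft_price' key
filtered_combined_cards = [card for card in combined_cards if 'nft_price' in card]

print([card['id'] for card in filtered_combined_cards])  # [2, 6]
```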
The filtered_combined_cards list ended up looking like the code snippet below.
[
{'id': 4, 'nft_price': '0.8', 'rs': 10},
{'id': 42, 'nft_price': '1.1', 'rs': 5},
{'id': 174, 'nft_price': '1.4', 'rs': 5},
{'id': 184, 'nft_price': '1.6', 'rs': 19},
{'id': 256, 'nft_price': '2', 'rs': 15},
{'id': 335, 'nft_price': '4', 'rs': 2},
{'id': 562, 'nft_price': '1.2', 'rs': 2},
{'id': 584, 'nft_price': '5', 'rs': 14},
{'id': 597, 'nft_price': '5', 'rs': 17},
# more NFT data here
]
Once all this data manipulation and list combining was done, the function returned the final list of cards (filtered_combined_cards) that had both rarity score information and an "nft_price" attribute included.
return filtered_combined_cards
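For comparison, here is a different technique that reaches a similar result: build a dictionary that maps each ID to its rarity score, then look up only the cards that are for sale. merge_rarity_scores is a hypothetical helper, not part of the original script, and the sample data (including the made-up id 99999) is illustrative. Because it only visits cards from card_list, it needs no second filtering pass and keeps the cards in their original order rather than sorting them by ID. It also skips any card that has no rarity score; in the groupby() version, such a card would pass the "nft_price" filter without an rs key and later cause a KeyError when dividing by card['rs'].

```python
def merge_rarity_scores(card_list, id_rs_list):
    """Hypothetical helper: attach 'rs' to each card whose id has a rarity score."""
    # Build the id -> rarity score lookup once, instead of sorting and grouping
    rs_by_id = {item['id']: item['rs'] for item in id_rs_list}
    # Only cards from card_list are visited; cards without a score are skipped
    return [
        dict(card, rs=rs_by_id[card['id']])
        for card in card_list
        if card['id'] in rs_by_id
    ]

cards_for_sale = [
    {'id': 4, 'nft_price': '0.8'},
    {'id': 42, 'nft_price': '1.1'},
    {'id': 99999, 'nft_price': '2'},  # made-up id with no rarity score
]
id_rs_list = [{'id': 4, 'rs': 10}, {'id': 42, 'rs': 5}, {'id': 7, 'rs': 1}]

merged = merge_rarity_scores(cards_for_sale, id_rs_list)
print(merged)
```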
For reference, here's the __main__ block in the Python script, which now also calls get_cards_rarity_score():
for_sale_scraper.py
if __name__ == '__main__':
    scraper = ForSaleNFTScraper()
    cards = scraper.get_cards(max_card_count=200)
    card_data = []
    for card in cards:
        info = scraper.get_nft_data(card)
        card_data.append(info)
    # filter out any extra cards that aren't for sale
    cards_for_sale = scraper.filter_priced_cards(card_data)
    # add rarity scores to all cards in the list by merging them with the id_rs_list
    cards_with_rs = scraper.get_cards_rarity_score(cards_for_sale)
Add new object properties to each object in a list
All right, here's the last Python list manipulation tip I'll be sharing in this post: how to add new properties to each object in a list.
After filtering the NFTs down to just the ones for sale and adding the rarity scores from the id_rs_list JSON file, I needed to fetch the current price of 1 BNB in US dollars, calculate the current price of each NFT in USD, and calculate the cost per rarity point for each NFT.
Fortunately, the cryptocurrency data aggregation site CoinGecko has a REST API that I could use to get the current market price of BNB in US dollars, and then I could calculate the rest of the required data from the info in my NFT card list.
Here is the add_pricing_to_cards() function I came up with to calculate the prices.
# import needed at the top of for_sale_scraper.py
import requests

def add_pricing_to_cards(self, card_list):
    """Get current price of BNB and compute cost per rarity point"""
    URL="https://api.coingecko.com/api/v3/simple/price?ids=binancecoin&vs_currencies=USD"
    response = requests.get(URL).json()
    bnb = response['binancecoin']['usd']
    # add the current value of bnb to the card_list
    cards_bnb_price = [dict(card, bnb=bnb) for card in card_list]
    # compute the current price in usd for each card based on its bnb price
    cards_with_usd_price = [dict(card, price_usd=round(float(card['nft_price']) * card['bnb'], 2)) for card in cards_bnb_price]
    # compute the current cost in usd of each rarity score point
    cards_with_rs_prices = [dict(card, cost_per_rs=round(card['price_usd'] / card['rs'], 2)) for card in cards_with_usd_price]
    return cards_with_rs_prices
In the function, the first thing I did was call the CoinGecko price API to get the current price of BNB in USD.
URL="https://api.coingecko.com/api/v3/simple/price?ids=binancecoin&vs_currencies=USD"
response = requests.get(URL).json()
bnb = response['binancecoin']['usd']
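The code above assumes the response always has the {'binancecoin': {'usd': ...}} shape. A minimal sketch of a slightly more defensive parse follows; extract_bnb_usd is a hypothetical helper, and the sample payload (with 352.44 as an example price) stands in for requests.get(URL).json() so the example runs without a network call. If the API ever returns something unexpected, such as an error body, this raises a clear ValueError instead of a bare KeyError deep inside the pricing code.

```python
def extract_bnb_usd(payload):
    """Hypothetical helper: pull the BNB price in USD out of a parsed price response."""
    try:
        # float() also normalizes an integer price such as 300 to 300.0
        return float(payload['binancecoin']['usd'])
    except (KeyError, TypeError) as err:
        raise ValueError(f'Unexpected price response: {payload!r}') from err

# Stand-in for requests.get(URL).json()
sample_payload = {'binancecoin': {'usd': 352.44}}
bnb = extract_bnb_usd(sample_payload)
print(bnb)  # 352.44
```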
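Next, I added the bnb price to each object in the input card_list, creating a new list named cards_bnb_price. Calling dict(card, bnb=bnb) builds a new dictionary that is a shallow copy of card plus the extra bnb key, so the original card_list is left unchanged.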
# add the current value of bnb to the card_list
cards_bnb_price = [dict(card, bnb=bnb) for card in card_list]
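This short sketch shows that copying behavior directly, using one card from the sample data and an example BNB price. It also shows {**card, 'bnb': bnb}, an equivalent dictionary-unpacking spelling available since Python 3.5. Assigning card['bnb'] = bnb inside a plain for loop would instead mutate the original dictionaries.

```python
card_list = [{'id': 4, 'nft_price': '0.8', 'rs': 10}]
bnb = 352.44  # example BNB price in USD

# dict(card, bnb=bnb) builds a NEW dictionary: a shallow copy of card plus the bnb key
cards_bnb_price = [dict(card, bnb=bnb) for card in card_list]

# Equivalent spelling using dictionary unpacking
cards_bnb_price_unpacked = [{**card, 'bnb': bnb} for card in card_list]

print(card_list[0])        # {'id': 4, 'nft_price': '0.8', 'rs': 10}
print(cards_bnb_price[0])  # {'id': 4, 'nft_price': '0.8', 'rs': 10, 'bnb': 352.44}
```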
After including the current BNB price in USD, I was able to compute the total price in USD for each NFT in the list by converting the card's price in BNB (scraped as a string) to a number with float() and multiplying it by the current price of BNB in USD.
# compute the current price of usd for each card based on its bnb price
cards_with_usd_price = [dict(card, price_usd=round(float(card['nft_price']) * card['bnb'], 2)) for card in cards_bnb_price]
I also calculated the price in USD per rarity score point, simply by dividing the card's total price in USD by its rarity score (rs).
# compute the current cost usd of each rarity score point
cards_with_rs_prices = [dict(card, cost_per_rs=round(card['price_usd']/card['rs'], 2)) for card in cards_with_usd_price]
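To check the arithmetic without calling CoinGecko, here is a minimal offline sketch of the same two calculations. add_usd_pricing is a hypothetical helper that takes the BNB price as a parameter, and it does all the steps in a single pass instead of the three list comprehensions above. With the example BNB price of 352.44, the card with id 42 comes out to the same values as the id 42 row in the sample output below.

```python
def add_usd_pricing(card_list, bnb):
    """Hypothetical offline version of the pricing steps, with the BNB price passed in."""
    priced = []
    for card in card_list:
        # nft_price was scraped as a string, so convert it before multiplying
        price_usd = round(float(card['nft_price']) * bnb, 2)
        # each card becomes a new dict; the input dictionaries are not modified
        priced.append(dict(card, bnb=bnb, price_usd=price_usd,
                           cost_per_rs=round(price_usd / card['rs'], 2)))
    return priced

cards = [{'id': 42, 'nft_price': '1.1', 'rs': 5}]
priced_cards = add_usd_pricing(cards, 352.44)
# price_usd: 1.1 * 352.44 = 387.684 -> 387.68; cost_per_rs: 387.68 / 5 = 77.536 -> 77.54
print(priced_cards)
```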
The function then returned the list of cards with the added pricing info, including BNB price, USD price, and USD cost per rarity score point. The final list data looked like this.
[
{'bnb': 352.44, 'cost_per_rs': 28.2, 'id': 4, 'nft_price': '0.8', 'price_usd': 281.95, 'rs': 10},
{'bnb': 352.44, 'cost_per_rs': 77.54, 'id': 42, 'nft_price': '1.1', 'price_usd': 387.68, 'rs': 5},
{'bnb': 352.44, 'cost_per_rs': 98.68, 'id': 174, 'nft_price': '1.4', 'price_usd': 493.42, 'rs': 5},
{'bnb': 352.44, 'cost_per_rs': 29.68, 'id': 184, 'nft_price': '1.6', 'price_usd': 563.9, 'rs': 19},
{'bnb': 352.44, 'cost_per_rs': 46.99, 'id': 256, 'nft_price': '2', 'price_usd': 704.88, 'rs': 15},
# more NFT data
]
The add_pricing_to_cards() function is called from the __main__ block like so:
for_sale_scraper.py
if __name__ == '__main__':
    scraper = ForSaleNFTScraper()
    cards = scraper.get_cards(max_card_count=200)
    card_data = []
    for card in cards:
        info = scraper.get_nft_data(card)
        card_data.append(info)
    # filter out any extra cards that aren't for sale
    cards_for_sale = scraper.filter_priced_cards(card_data)
    # add rarity scores to all cards in the list by merging them with the id_rs_list
    cards_with_rs = scraper.get_cards_rarity_score(cards_for_sale)
    # add the BNB price, USD price, and USD cost per rarity point to each card
    cards_with_prices = scraper.add_pricing_to_cards(cards_with_rs)
And now that I had all the data that my friend requested for each NFT in the collection for sale on NFTrade, all that was left to do was turn the whole list into a downloadable CSV that would be easy to sort and manipulate. I'll save that for a future post.
Conclusion
When I had to use Python to build a website scraper to get NFT data off the NFTrade site, I learned a lot of useful new coding tricks along the way.
After I'd managed to scrape the data with the help of Selenium Python, and extract the initial data I needed from each NFT using XPath locators with WebDriver, my job was far from complete.
I needed to take the little data I had, combine those NFTs with their "rarity scores" from a JSON file, fetch the current market price of BNB in US dollars, and then compute the total cost of each NFT and its cost per rarity point. As I completed these tasks, I learned a heck of a lot about working with lists of complex objects in new ways, and I feel confident these techniques will help me out in any future Python endeavors I might undertake.
Check back in a few weeks: I’ll be writing more posts about the problems I had to solve while building this Python website scraper, in addition to other topics on JavaScript or something else related to web development.
If you’d like to make sure you never miss an article I write, sign up for my newsletter here: https://paigeniedringhaus.substack.com
Thanks for reading. I hope learning to filter, merge, and alter objects within lists in Python proves helpful for you in your own projects.
Further References & Resources
- NFTrade website
- Python documentation
- CoinGecko cryptocurrency tracker and analytics site
- First blog post about scraping data from a lazy-loading website using Selenium Python
- Follow up blog post about limiting data searches to a particular element on a page instead of the whole page when using XPath