As you may already know, dev.to has an API to fetch articles and users.
Let's give it a try with my favorite HTTP client, httpie:
http https://dev.to/api/users/1
{
"github_username": "benhalpern",
"id": 1,
"joined_at": "Dec 27, 2015",
"location": "Brooklyn, NY",
"name": "Ben Halpern",
"profile_image": "https://res.cloudinary.com/practicaldev/image/fetch/s--DOU9qJSH--/c_fill,f_auto,fl_progressive,h_320,q_auto,w_320/https://thepracticaldev.s3.amazonaws.com/uploads/user/profile_image/1/f451a206-11c8-4e3d-8936-143d0a7e65bb.png",
"summary": "A Canadian software developer who thinks he's funny.",
"twitter_username": "bendhalpern",
"type_of": "user",
"username": "ben",
"website_url": "http://benhalpern.com"
}
The first thing I noticed is that the id field seems to be auto-incremented.
After a few tries, I found the total number of users on dev.to: around 119000.
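The article does not say how those tries were made. One possible way to narrow down the highest id is a binary search, sketched below in Python. The function name `find_last_id`, the `exists` callback and the `upper` bound are hypothetical, and the sketch assumes ids are contiguous (every id up to the last one answers), which deleted accounts may break in practice.

```python
def find_last_id(exists, upper=1_000_000):
    """Return the largest id in [1, upper] for which exists(id) is True.

    Assumption: ids are contiguous, so exists() is True for every id up to
    the last one and False for every id after it.
    """
    low, high = 1, upper
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if exists(mid):
            low = mid        # mid is valid: the answer is mid or higher
        else:
            high = mid - 1   # mid is past the end: the answer is lower
    return low


# Stand-in for a real probe of https://dev.to/api/users/<id>
probes = []

def fake_exists(user_id):
    probes.append(user_id)
    return user_id <= 119000

last_id = find_last_id(fake_exists)
print(last_id, "found in", len(probes), "probes")
```

With an upper bound of one million, about twenty probes are enough, which is far fewer than walking the ids one by one.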
That gave me the idea to scrape all the users' public data and play with it.
Let's try with this command:
for i in {1..119000}; do http https://dev.to/api/users/$i >> users.dev.to.json && echo "," >> users.dev.to.json && echo "https://dev.to/api/users/$i"; done
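The shell loop above appends comma-separated objects, so the resulting file is not a single valid JSON document. As a hedged alternative, here is a minimal Python sketch that writes one JSON object per line (JSON Lines) and pauses between requests. The names `fetch_user` and `scrape_users`, the `delay` parameter and the 404 handling are assumptions for illustration, not the command used in the article; the server may also reject the default `urllib` user agent.

```python
import json
import time
import urllib.error
import urllib.request

API_URL = "https://dev.to/api/users/{}"  # endpoint shown in the article


def fetch_user(user_id):
    """Return the decoded profile for one id, or None if there is no such user."""
    try:
        with urllib.request.urlopen(API_URL.format(user_id), timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: unused or deleted ids answer 404
            return None
        raise


def scrape_users(first_id, last_id, out, fetch=fetch_user, delay=0.0):
    """Write one JSON object per line to `out`; return how many users were saved."""
    saved = 0
    for user_id in range(first_id, last_id + 1):
        user = fetch(user_id)
        if user is not None:
            out.write(json.dumps(user) + "\n")
            saved += 1
        time.sleep(delay)  # pause between requests to go easy on the server
    return saved
```

To run it for real, open a file such as `users.dev.to.jsonl` for writing and call `scrape_users(1, 119000, out, delay=0.5)`. Passing `fetch` as a parameter keeps the loop testable without touching the network.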
After a few hours running in the background, I finally got all the users in my JSON file.
I ended up importing the JSON file into a MongoDB collection using NoSQLBooster.
There isn't much information in the users' profiles, but two fields are worth a look: joined_at and location. It would be very interesting to chart the progression of registrations over time. Maybe I'll do it some day, but for now, let's compute some statistics on users' locations.
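For readers who want to try the registration chart, the counting step could look like the Python sketch below. It assumes every profile carries a joined_at value in the same "Dec 27, 2015" format as the sample response above; the function name `signups_per_month` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def signups_per_month(users):
    """Count users per "YYYY-MM" month of their joined_at date, oldest first."""
    months = Counter()
    for user in users:
        # "%b %d, %Y" matches dates such as "Dec 27, 2015" (English month names)
        joined = datetime.strptime(user["joined_at"], "%b %d, %Y")
        months[joined.strftime("%Y-%m")] += 1
    return sorted(months.items())


sample = [
    {"joined_at": "Dec 27, 2015"},
    {"joined_at": "Dec 30, 2015"},
    {"joined_at": "Jan 2, 2016"},
]
print(signups_per_month(sample))
```

The sorted list of (month, count) pairs can then be fed to any plotting tool.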
I used two queries to compute the statistics. This one helped me to get the big trends in people's location:
db.getCollection("users.dev.to").aggregate(
{"$group": {_id: "$location", count:{$sum:1}}}
).sort("-count")
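The same grouping can be reproduced without MongoDB. The Python sketch below counts users per raw location value and lists the most frequent first; it assumes the profiles are already loaded as dictionaries, and `count_locations` is a hypothetical name.

```python
from collections import Counter


def count_locations(users):
    """Count users per raw `location` value, most frequent first."""
    # A missing field is counted as None, like a null _id in the $group stage
    counts = Counter(user.get("location") for user in users)
    return counts.most_common()


sample = [
    {"location": "Brooklyn, NY"},
    {"location": None},
    {"location": None},
    {"location": ""},
    {"location": "Brooklyn, NY"},
    {},
]
print(count_locations(sample))
```

Because the raw strings are grouped as typed, "Brooklyn, NY" and "brooklyn" end up in different buckets, which is why this step only reveals the big trends.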
Then I used another query to find the users matching a given location (here Dublin), which gives the count of people per location:
db.getCollection("users.dev.to").find({"location": /\bdublin\b/i})
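The regular expression does the real work here: `\b` marks a word boundary and the `i` flag ignores case. A rough Python equivalent, assuming the profiles are loaded as dictionaries (`count_in_location` is a hypothetical name):

```python
import re


def count_in_location(users, place):
    """Count users whose location contains `place` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(place) + r"\b", re.IGNORECASE)
    # Treat a missing or null location as an empty string
    return sum(1 for user in users if pattern.search(user.get("location") or ""))


sample = [
    {"location": "Dublin, Ireland"},
    {"location": "dublin"},
    {"location": "Dublinia"},    # not a whole-word match
    {"location": None},
    {"location": "Dublin, CA"},  # matches too, although it is not in Ireland
]
print(count_in_location(sample, "Dublin"))
```

The last sample row shows the limit of this approach: a whole-word match cannot tell two cities with the same name apart, so the per-location counts are approximate.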
The conclusions are the following:
Most of the users (90%) don't fill in their location.
16.3% of the users who filled in their location are from the USA.
5.8% are from India.
4.9% are from the UK.
4% are from Japan.
3.8% are from Germany.
Let's have a look at the full statistics below:
| # | country | region or city | # region or city | # country | % |
|---|---|---|---|---|---|
| 1 | USA | | | 1850 | 16.32% |
| | | Incl New York | 205 | | |
| | | Incl CA | 194 | | |
| | | Incl NY | 147 | | |
| | | Incl San Francisco | 137 | | |
| | | Incl TX | 119 | | |
| | | Incl Seattle | 117 | | |
| | | Incl Los Angeles | 107 | | |
| | | Incl Chicago | 101 | | |
| | | Incl Washington | 81 | | |
| | | Incl FL | 78 | | |
| | | Incl PA | 68 | | |
| | | Incl MA | 67 | | |
| | | Incl AZ | 38 | | |
| | | Incl MI | 34 | | |
| | | Incl OH | 30 | | |
| 2 | India | | | 660 | 5.82% |
| | | Incl Bangalore | 107 | | |
| | | Incl Delhi | 103 | | |
| 3 | UK | | | 551 | 4.86% |
| | | Incl London | 258 | | |
| | | Incl England | 35 | | |
| | | Incl Scotland | 33 | | |
| | | Incl Wales | 7 | | |
| 4 | Japan | | | 451 | 3.98% |
| | | Incl Tokyo | 242 | | |
| 5 | Germany | | | 428 | 3.77% |
| | | Incl Berlin | 128 | | |
| 6 | Brazil | | | 301 | 2.65% |
| 7 | France | | | 294 | 2.59% |
| | | Incl Paris | 98 | | |
| 8 | Canada | | | 257 | 2.27% |
| | | Incl Toronto | 107 | | |
| 9 | Nigeria | | | 187 | 1.65% |
| | | Incl Lagos | 97 | | |
| 10 | Netherlands | | | 154 | 1.36% |
| | | Incl Amsterdam | 62 | | |
| 11 | Spain | | | 110 | 0.97% |
| 12 | Argentina | | | 109 | 0.96% |
| 13 | Italy | | | 102 | 0.90% |
| 14 | Indonesia | | | 92 | 0.81% |
| 15 | Russia | | | 89 | 0.78% |
| 16 | Mexico | | | 86 | 0.76% |
| 17 | Philippines | | | 85 | 0.75% |
| 18 | Poland | | | 81 | 0.71% |
| 19 | Ukraine | | | 75 | 0.66% |
| 20 | Portugal | | | 63 | 0.56% |
| 21 | South Africa | | | 57 | 0.50% |
| 22 | Pakistan | | | 56 | 0.49% |
| 23 | Turkey | | | 55 | 0.49% |
| 24 | Belgium | | | 53 | 0.47% |
| 25 | Egypt | | | 52 | 0.46% |
| 26 | Bangladesh | | | 50 | 0.44% |
| 27 | Colombia | | | 46 | 0.41% |
| 28 | Romania | | | 45 | 0.40% |
| 29 | Kenya | | | 45 | 0.40% |
| 30 | Switzerland | | | 45 | 0.40% |
| 31 | Austria | | | 43 | 0.38% |
| 32 | China | | | 40 | 0.35% |
| 33 | Norway | | | 38 | 0.34% |
| 34 | Ireland | | | 33 | 0.29% |
| | Total main locations (listed above) | | | 6683 | 58.94% |
| | Other locations (unlisted) | | | 4656 | 41.06% |
| | Total locations | | | 11339 | 100.00% |
| | <null> | | | 90260 | 80.80% |
| | <empty> | | | 10108 | 9.05% |
| | Total | | | 111707 | 100% |
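The percentages in the table use two different bases: country shares are relative to the 11339 filled-in locations, while the null and empty rows are relative to all 111707 profiles. The short Python check below recomputes a few of them from the counts in the table; the helper name `share` is hypothetical.

```python
def share(count, total):
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)


FILLED = 11339   # profiles with a usable location
NULL = 90260     # profiles whose location is null
EMPTY = 10108    # profiles whose location is an empty string
TOTAL = FILLED + NULL + EMPTY

usa_share = share(1850, FILLED)           # share among filled-in locations
missing_share = share(NULL + EMPTY, TOTAL)  # share of all profiles

print(TOTAL, usa_share, missing_share)
```

The missing share comes out just under 90%, which is the rounded figure quoted in the conclusions.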
Note: these numbers are one month old, but they give a good overview of the trends.
Thanks for reading!
Top comments (47)
Cool!!
I guess you might want to add throttling on the API based on IP ;)
Yes, but it's also served from our CDN so it's not a resource-gobbling endpoint.
😱
Although cool in and of itself, this also highlights why using autoincremented ids is not a good idea when it comes to protecting users privacy (this is not the case of course since I don't see any critical data in the JSON response).
I always use UUIDs for exactly that reason, and also because it helps when building distributed systems (id clashing is less probable with them)
I'm sorry, my english level is quite low. Should I have written "scrape" ?
I wouldn't say your English is bad, your writing is better than a lot of native speakers
At the time of posting this, DEV has 119,105 registered users. This is so cool
dev.to/api/users/119105
yeah => dev.to/api/users/254188
What about me? I'm in Norway...says so on my profile :P
Norway: 33 users.
38 with Oslo.
i'm sure there's quite a few from Denmark as well :)
Sure, but there are a lot of other countries I didn't put in the list because their count of users is low.
Ah I see, I just assumed it was a complete list because of this line:
Let's have a look at the full statistics below:
And to be fair 38 is more than Ireland ;)
In fact, Norway and Denmark are in "other locations".
I am from Romania. :D
Romania: 45 (Incl Bucharest)
Why?
I've been on the internet since 1997. I have never bothered about location, age, sex, species. (Although secretly I would be excited to converse with a cat.)
1 of 57 from South Africa!
Nice, it shows that this is a small world!
I'm 🇫🇷, but living in Barbados 🇧🇧, i would be interested in knowing if i'm the only one from there?
I'll have a look at your query this evening.