ATLG Sidebar
Hey all! If this is your first time, welcome! If not, welcome back! This week I'm starting a new "series" here on Dev.to. I decided to flex my muscle memory and write a small utility that interacts with an API, using the dev.to/api/articles endpoint in this case. This dovetails into another side project I want to start poking around with! I need a chunk of data on hand for that (we'll come back to it in a different sidebar post if it ever comes together).
Code Walkthrough
```go
package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
	"sync"
	"time"
)
```
If you write Go, JSON-to-Go is going to be your friend. When we need to unmarshal JSON, we can simply paste in a sample of it and it will give us a struct. Nice. We don't actually need the entire struct; we could have kept only ``ID int32 `json:"id"` ``, as it's the only field we use. I've included the entire thing for now, since we may use it in the future.
```go
// Articles array JSON struct
type Articles []struct {
	TypeOf                 string    `json:"type_of"`
	ID                     int32     `json:"id"`
	Title                  string    `json:"title"`
	Description            string    `json:"description"`
	CoverImage             string    `json:"cover_image"`
	PublishedAt            time.Time `json:"published_at"`
	TagList                []string  `json:"tag_list"`
	Slug                   string    `json:"slug"`
	Path                   string    `json:"path"`
	URL                    string    `json:"url"`
	CanonicalURL           string    `json:"canonical_url"`
	CommentsCount          int       `json:"comments_count"`
	PositiveReactionsCount int       `json:"positive_reactions_count"`
	User                   struct {
		Name            string      `json:"name"`
		Username        string      `json:"username"`
		TwitterUsername string      `json:"twitter_username"`
		GithubUsername  interface{} `json:"github_username"`
		WebsiteURL      string      `json:"website_url"`
		ProfileImage    string      `json:"profile_image"`
		ProfileImage90  string      `json:"profile_image_90"`
	} `json:"user"`
}
```
I started to lay this out as if we were going to use it as an importable package. It could have been done mostly inline in main(), but maybe we'll flesh this out into a full DEV API client. I'll have to take a closer look at the v0 API and see what is actually supported.
```go
// DevtoClient struct
type DevtoClient struct {
	DevtoAPIURL string
	Client      *http.Client
}

// New returns our DevtoClient
func New(apiurl string, client *http.Client) *DevtoClient {
	if client == nil {
		client = http.DefaultClient
	}
	return &DevtoClient{
		apiurl,
		client,
	}
}
```
Formatting our requests this way might be overkill, but it puts us in a good spot to start. The requests can take a few different parameters, but we aren't concerned with any of those. At least not this time around; we only want the articles themselves.
```go
// FormatPagedRequest returns an *http.Request ready to Do() to get one page
func (dtc DevtoClient) FormatPagedRequest(param, paramValue string) (r *http.Request, err error) {
	URL := dtc.DevtoAPIURL + "articles/?" + param + "=" + paramValue
	fmt.Printf("%v\n", URL)
	r, err = http.NewRequest(http.MethodGet, URL, nil)
	if err != nil {
		return nil, err
	}
	return r, nil
}
```
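String concatenation works fine for the parameters we use here, but if we ever pass values that need escaping, the standard library's net/url can build the query string safely. A small sketch of the same URL built that way (the pagedURL helper is hypothetical, not part of the client above):

```go
package main

import (
	"fmt"
	"net/url"
)

// pagedURL builds the paged articles URL using url.Values,
// which handles percent-escaping for us.
func pagedURL(base, param, value string) string {
	q := url.Values{}
	q.Set(param, value)
	return base + "articles/?" + q.Encode()
}

func main() {
	fmt.Println(pagedURL("https://dev.to/api/", "page", "2"))
	// prints https://dev.to/api/articles/?page=2
}
```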
```go
// FormatArticleRequest returns an *http.Request ready to Do() to get an article
func (dtc DevtoClient) FormatArticleRequest(i int32) (r *http.Request, err error) {
	URL := fmt.Sprintf(dtc.DevtoAPIURL+"articles/%d", i)
	r, err = http.NewRequest(http.MethodGet, URL, nil)
	if err != nil {
		return nil, err
	}
	return r, nil
}
```
This time around I am experimenting with sync.WaitGroup. WaitGroups allow us to kick off a series of goroutines and wait for them to finish before moving on with the code. We'll see further on in the code where getArticle() executes as a goroutine; it is what actually gets the article from the API and writes it to disk. This way we grab one set of 30 article IDs, and as we parse those we begin getting the articles. Only once we've received them all do we move on to the next set.
```go
func getArticle(dtc *DevtoClient, i int32, wg *sync.WaitGroup) {
	defer wg.Done()
	r, err := dtc.FormatArticleRequest(i)
	if err != nil {
		panic(err)
	}
	resp, err := dtc.Client.Do(r)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fileName := fmt.Sprintf("%d.json", i)
	if err := ioutil.WriteFile("./out/"+fileName, body, 0666); err != nil {
		panic(err)
	}
}
```
main() is straightforward enough. We create our client using http.DefaultClient, though we've provided the ability to use an alternate configuration if we need it in the future. doit and c will control our for loop and the main body of the program.
```go
func main() {
	dtc := New("https://dev.to/api/", nil)
	doit := true
	c := 1
```
On each run through the loop we get a single page of articles. We then set up our WaitGroup and our articles variable. Once we have unmarshalled the articles JSON, we get the length of the resulting array; that length tells the WaitGroup how many goroutines to wait for. Note that we call defer wg.Done() as the first line in getArticle(); this subtracts one from the WaitGroup total, allowing us to move on when everything has finished. The current Dev.to articles API returns an empty array, [], when there is no data for a page. We check for that response and stop if we see it.
```go
	for doit {
		req, err := dtc.FormatPagedRequest("page", fmt.Sprintf("%d", c))
		if err != nil {
			panic(err)
		}
		resp, err := dtc.Client.Do(req)
		if err != nil {
			panic(err)
		}
		body, err := ioutil.ReadAll(resp.Body)
		// close inside the loop rather than defer, so bodies
		// don't pile up until main returns
		resp.Body.Close()
		if err != nil {
			panic(err)
		}
		var wg sync.WaitGroup
		var articles Articles
		if err := json.Unmarshal(body, &articles); err != nil {
			panic(err)
		}
		wg.Add(len(articles))
		for i := range articles {
			go getArticle(dtc, articles[i].ID, &wg)
		}
		wg.Wait()
		if string(body) != "[]" {
			c++
			continue
		}
		doit = false
	}
}
```
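As an aside, instead of comparing the raw body against "[]", we could check the slice length after unmarshalling, which also tolerates whitespace in the response. A sketch with a hypothetical pageIsEmpty helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Articles is trimmed to the one field we need for this sketch.
type Articles []struct {
	ID int32 `json:"id"`
}

// pageIsEmpty reports whether a page of results contained no articles.
func pageIsEmpty(body []byte) (bool, error) {
	var articles Articles
	if err := json.Unmarshal(body, &articles); err != nil {
		return false, err
	}
	return len(articles) == 0, nil
}

func main() {
	empty, err := pageIsEmpty([]byte(" [] "))
	if err != nil {
		panic(err)
	}
	fmt.Println(empty) // prints true
}
```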
Wrapping Up
There we go, the first "sidebar"! I'll probably be adding the code for this to the main "Attempting To Learn Go" repo over on GitHub. Now that I think about it, I still need to post last week's over there!
Have you done any API work in Go? Anything you would do a bit differently? Let me know in the comments below. Constructive comments are always welcome!
You can find the code for this and most of the other Attempting to Learn Go posts in the repo on GitHub.
shindakun / atlg
Source repo for the "Attempting to Learn Go" posts I've been putting up over on dev.to
Top comments (8)
Hey there!
I'll point out a couple of things. Firstly, you are returning a *http.Request and an error from FormatArticleRequest, which means you can simplify the function. Later on, in your worker, you are forgetting to check the error on json.Unmarshal(body, &articles).
Other than that, good job on parameterizing the HTTP client instance. Were I to use your API client, I would certainly love to set my own timeouts etc. so this is the right way to go!
Good job!
Thanks for the comment! Ungh! I keep forgetting that I can shorten the returns when I do it that way. One of these days! Good catch on the json.Unmarshal; not sure how that snuck through. I'll make sure to fix that up before putting the code on GitHub.
Cool, another post about the Dev.to API! 🙂
Correct me if I am wrong, but that will go over every single page till it runs out of data, right? If so, you might want to be careful actually running that, as there would be hundreds of pages and you're grabbing the contents for each article on every page. Maybe chuck a "max articles" count on there somewhere? Or maybe limit it to a specific tag?
The articles API endpoint looks to only return 30 articles per page and seems to cut off around ~460 pages or so. The idea is indeed to grab all public posts as I need a good amount of text for something else I'm toying around with so, in this case, I have a rough idea of how many articles we'll be getting. However, we're only getting 30 articles worth of content in one go which should not cause any issues on the server side or locally. You definitely wouldn't want to be running this repeatedly in a short span or anything though.
Yeah as it is still about 13,000 requests - if that ran often, it could easily hammer the site.
The articles endpoint is cached as far as I know, so as-is the code shouldn't cause any significant load on Varnish. Heh. But yeah, imagine trying to simply download the articles one at a time via api/articles/1 and so on; that'd be, what, like ~90,000 hits. Yuck.
Is there documentation somewhere for the API? I can't seem to find it anywhere. I want to retrieve my articles.
There is no documentation as of yet; I pulled the details out of the code itself on GitHub. If all you need is a dump of your own articles, you can request an emailed export at the bottom of the settings page, dev.to/settings/misc.