This was built based on the Backend Project Idea 1 given in the article https://hackernoon.com/15-project-ideas-for-front-end-back-end-and-full-stack-web-developers-j06k35pi
Find project repository at https://github.com/topeomot2/simple-web-crawler-service
Requirements
- Simple web crawler service that takes a page URL and returns the HTML markup of that page.
- Only handles absolute urls.
GET /?url={page absolute url}
Host: localhost:3000
Response
status: 200 OK
content-type: json
body: {
data: "html Content"
}
GET /?url={wrong string}
Host: localhost:3000
Response
status: 400
text: 'send absolute url with protocol included'
Installation
npm install
npm start
Libraries used
Express
Personally, my go to web framework for Node.js apis.
Express actually lives up to the definition on its site. It is Fast, unopinionated, minimalist Framework for Node.js. The unopinionated and minimalist can be a blessing or a curse, depending on what your preferences are.
It means you need to make decisions on what tools you want to use. Express makes no assumptions for you.
But no worries, with the express-generator, spinning up a basic api is simple.
The code below creates a project with express and some folder and setup opinions. The --no-view means we are not using any view template engines.
npx express-generator
express --no-view simple-web-crawler-service
Find out more at https://expressjs.com/en/starter/generator.html
Validator
A library of string validators and sanitizers. Chose this because of the simple isURL function it has which helps us check if the url query parameter is an absolute url with the protocol set.
Never use external inputs to your api without validation and sanitization
if (!req.query || !req.query.url
|| !validator.isURL(req.query.url,
{ require_host: true, require_protocol: true })) {
return res.status(400).send('send absolute url with protocol included')
}
Axios
A very simple promise based HTTP Client. If you know how to use Promises, using Axios will be a breeze. This does all the work of retrieving the content of a page by making a GET request to the url.
const axios = require('axios')
async function getContent(url) {
try {
let response = await axios(url)
return response.data
} catch (error) {
return null
}
}
Jest
Jest is a JavaScript Testing Framework. It works for any form of JavaScript code or anything that compiles to JavaScript i.e TypeScript. It is simple and I would recommend it anytime. It is the only testing framework I use in JavaScript.
- install as a devDependency
npm install jest --save-dev
- add the following line in the scripts section of package.json.
"test": "jest --coverage --watchAll"
--coverage : you want jest to create a coverage report
--watchAll means you want continuous checking of code change and rerunning tests. (This is good for TDD, but can be removed if not desired)
The test can be found in the tests/app.test.js file.
Supertest
The most important tests you can write for apis (and software in general) are integration tests. For apis, "route tests" are the integration tests. Supertest
Route tests are tests that actually call endpoints in the apis and tests for the happy path and sad paths. Supertest is the package for write route test. Supertest is built on superagent, which is an HTTP request library. So your Express app is actually called like if a user was making a request
Happy path is when you call the api correctly with all the expected parameters, you should the correct successful response. Below is a test that checks the response for the happy path.
The sad path is when you call the api incorrectly and you expect api to respond with the agreed response.
But something very important to note, calling apis this way means that all dependencies will be called. Dependencies include things like Databases, 3rd party apis etc. There are 2 ways practically to handle dependencies
Mocking: This is the process of substituting the response from 3rd dependencies so that they are not actually called during the test. This is the approach used here. Instead of using the crawler.js module to call the url, I used Jest to Mock the module and return a response. This makes the test faster and more predictable.
Containerization: this is good for database dependent apis, instead of mocking the database, you can just spin up a container for that database, seed it (fill it with test data) and then run your test against it. This can also be used for other infrastructural dependencies that the pai depends on.
Note: You can also use Mocking for the situation described in the Containerization section. I would advise that database are encapsulated in a service/model and then you can then mock the service/model
This is the first of many project ideas, I want to get done. Most of them will be picked from project ideas, I find online. Please reach out with any advice, improvements or corrections you feel that is needed.
Top comments (1)
Thanks for informing about Validator, it was helpful. Would like to see more of your projects.