I was working on a proof-of-concept RabbitMQ data pipeline in Node, where a web app uploads a large CSV file to an Express server and the server streams its content into the pipeline as JSON.
There are two ways to upload a file:
1) Send the entire file
2) Stream the file
Send Entire File
Send entire csv file from browser
fetch('http://localhost:3000/upload', { // Your POST endpoint
  method: 'POST',
  headers: {
    'Content-Type': 'text/csv' // Optional if the browser already reports the file's type as text/csv
  },
  body: file // This is your file object
})
  .then(success => console.log(success)) // Handle the success response object
  .catch(error => console.log(error)) // Handle the error response object
On the server, the two important points are:
- How to handle the request
- How to stream the CSV file content as JSON into the pipeline
To get a stream of JSON objects from the CSV file, wrap the request body in a readable stream and pipe that stream into fast-csv.
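The following minimal sketch makes "a stream of JSON objects" concrete without pulling in fast-csv. It uses only Node's built-in stream and string_decoder modules. The CsvToObjects transform and the parseChunks helper are hypothetical and deliberately naive. They assume comma-separated lines with a header row and no quoted fields or embedded commas, which fast-csv does handle. The sketch shows the shape of the output: one plain object per data row, keyed by the header, even when a row is split across chunks.

```javascript
const { Readable, Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Hypothetical, naive CSV -> object transform (no quoting/escaping support).
// Input: byte/text chunks. Output (object mode): one object per data row, keyed by the header row.
class CsvToObjects extends Transform {
  constructor() {
    super({ readableObjectMode: true })
    this.decoder = new StringDecoder('utf8') // keeps multi-byte characters intact across chunks
    this.buffer = '' // partial line carried over to the next chunk
    this.headers = null
  }

  _transform(chunk, _encoding, callback) {
    this.buffer += this.decoder.write(chunk)
    const lines = this.buffer.split(/\r?\n/)
    this.buffer = lines.pop() // the last element may be an incomplete line
    for (const line of lines) this.handleLine(line)
    callback()
  }

  _flush(callback) {
    this.buffer += this.decoder.end()
    if (this.buffer) this.handleLine(this.buffer) // last line without a trailing newline
    callback()
  }

  handleLine(line) {
    if (line === '') return // skip blank lines
    const values = line.split(',')
    if (this.headers === null) {
      this.headers = values // first line is the header row
      return
    }
    const row = {}
    this.headers.forEach((header, i) => {
      row[header] = values[i]
    })
    this.push(row)
  }
}

// Collect every row produced from a CSV string delivered in arbitrary chunks.
function parseChunks(chunks) {
  return new Promise((resolve, reject) => {
    const rows = []
    Readable.from(chunks)
      .pipe(new CsvToObjects())
      .on('data', (row) => rows.push(row))
      .on('end', () => resolve(rows))
      .on('error', reject)
  })
}

// "Alice" is split across two chunks on purpose
parseChunks(['id,name\n1,Al', 'ice\n2,Bob\n']).then((rows) => console.log(rows))
```

In the server, csv.parse({ headers: true }) produces the same kind of object stream. fast-csv additionally handles quoted values, escapes, and custom delimiters.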
The resulting code
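const app = require('express')()
const textBodyParser = require('body-parser').text
const csv = require('fast-csv')
const { Readable } = require('stream')
// Handle very large file: body-parser reads the whole body into memory, up to this limit
app.use(textBodyParser({ type: 'text/csv', limit: '500mb' }))
app.post('/upload', (req, res) => {
  // req.body is the entire CSV as one string; Readable.from emits it as a single chunk
  const content = Readable.from(req.body)
  content
    .pipe(csv.parse({ headers: true }))
    .on('error', (error) => res.status(400).send(error.message)) // Malformed CSV
    .on('data', (data) => {
      console.log(data) // Handle JSON object
    })
    .on('end', () => res.sendStatus(200)) // Respond once every row has been parsed
})
app.listen(3000)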
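A lot of tutorials suggest using express-fileupload. It doesn't work here, because it only parses multipart/form-data requests, and in this approach the CSV file is sent as the raw request body rather than streamed as form data.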
Stream File
Stream csv file from browser
// Important that file is sent as FormData
const data = new FormData()
data.append('file', file)
fetch('http://localhost:3000/upload', {
  method: 'POST',
  body: data,
})
  .then((success) => console.log(success)) // Handle the success response object
  .catch((error) => console.log(error)) // Handle the error response object
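For the server to handle the stream, the HTTP request must have the header Content-Type: multipart/form-data; boundary=aBoundaryString (more info found here).
By sending the file as FormData, we avoid having to specify this header. The browser sets it for us, including a generated boundary string. Setting Content-Type by hand would break the upload, because the header would not contain the boundary the browser used to separate the parts.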
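You can check what fetch does with a FormData body without opening a browser. Node 18+ ships the same fetch API, with FormData, Blob and Request available as globals. The sketch below builds the same kind of request as the browser snippet but does not send it. It then reads back the generated Content-Type header and the multipart body. The function name inspectMultipartRequest, the file name data.csv and its contents are made up for illustration. A Blob stands in for the File object a browser would get from a file input.

```javascript
// Requires Node 18+ (global FormData, Blob and Request from the built-in fetch implementation).
// Builds a multipart upload request like the browser snippet, without sending it.
async function inspectMultipartRequest() {
  // Stand-in for the File object a browser would get from <input type="file">
  const file = new Blob(['id,name\n1,Alice\n'], { type: 'text/csv' })

  const data = new FormData()
  data.append('file', file, 'data.csv') // third argument sets the filename of the part

  // No Content-Type set by hand: fetch derives it, boundary included, from the FormData body
  const request = new Request('http://localhost:3000/upload', { method: 'POST', body: data })

  return {
    contentType: request.headers.get('content-type'),
    body: await request.text(), // the raw multipart payload busboy would receive
  }
}

inspectMultipartRequest().then(({ contentType, body }) => {
  console.log(contentType) // starts with "multipart/form-data; boundary="
  console.log(body)
})
```

The printed body shows each part delimited by the boundary, with a Content-Disposition line carrying name="file" and the filename. This is the structure busboy parses on the server.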
Use busboy to get the file stream and pipe it into fast-csv to get a stream of JSON objects.
The resulting code
const busboy = require('busboy') // busboy >= 1.0 API
const csv = require('fast-csv')

app.post('/upload', (req, res) => {
  const bb = busboy({ headers: req.headers })
  // Busboy gives us a lot of information regarding the file (third argument: filename, encoding, mimeType)
  bb.on('file', (_name, file) => {
    file.pipe(csv.parse({ headers: true })).on('data', (row) => {
      // Handle data here. Row is a csv row in JSON
      console.log('Row in JSON', row)
    })
    file.on('end', () => {
      // Handle end case here
      console.log('file ended')
    })
  })
  // 'close' fires once the whole form has been parsed (busboy 0.x examples use 'finish')
  bb.on('close', () => {
    res.writeHead(303, { Connection: 'close', Location: '/' })
    res.end()
  })
  req.pipe(bb)
})
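Neither handler shows the RabbitMQ side of the pipeline. One way to connect them is an object-mode Writable that publishes each parsed row as a JSON message. This is a sketch under assumptions. It assumes an amqplib-style channel whose sendToQueue(queue, content) takes a Buffer and returns false when the channel's write buffer is full, after which the channel emits 'drain'. The names createRowPublisher, publishAll and fakeChannel, and the queue name csv-rows, are hypothetical. A fake channel is enough to try it without a broker.

```javascript
const { Readable, Writable } = require('stream')

// Hypothetical helper: turns each parsed row into one JSON message on `queue`.
// Assumes an amqplib-style channel: sendToQueue(queue, Buffer) returns false when
// the channel's write buffer is full, and the channel later emits 'drain'.
function createRowPublisher(channel, queue) {
  return new Writable({
    objectMode: true,
    write(row, _encoding, callback) {
      const ok = channel.sendToQueue(queue, Buffer.from(JSON.stringify(row)))
      if (ok) callback()
      else channel.once('drain', () => callback()) // backpressure: wait before taking the next row
    },
  })
}

// Pipe an array of rows through the publisher; resolves once every row was handed to the channel.
function publishAll(rows, channel, queue) {
  return new Promise((resolve, reject) => {
    Readable.from(rows)
      .pipe(createRowPublisher(channel, queue))
      .on('finish', resolve)
      .on('error', reject)
  })
}

// Fake channel for experimenting without a broker: records messages instead of sending them.
const fakeChannel = {
  sent: [],
  sendToQueue(queue, content) {
    this.sent.push({ queue, message: JSON.parse(content.toString()) })
    return true
  },
}

publishAll([{ id: '1', name: 'Alice' }], fakeChannel, 'csv-rows').then(() => console.log(fakeChannel.sent))
```

With a real channel, the busboy handler could pipe straight into the publisher: file.pipe(csv.parse({ headers: true })).pipe(createRowPublisher(channel, 'csv-rows')). Piping, rather than publishing inside a 'data' listener, lets backpressure propagate back toward the upload instead of buffering rows in memory.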