How to Create and Download Files of Unlimited Size in Node.js/NestJS

Hey, folks! Have you ever wondered if it's possible to send a file of unlimited size to your users, all while giving them real-time feedback during the download process? Well, today, we're diving into exactly that. Let’s explore how we can achieve this in NestJS using one of Node.js's most powerful features—Streams.

In this guide, I’m not just going to show you how to download files of unlimited size—I’ll also walk you through the magic of streams and how they really work behind the scenes.

The Scenario: A Manager's Report Download

Imagine a manager needs to download a dynamically generated report containing invoice data, based on a user-defined date range. The twist? We have no idea how big the report will be because the user sets the parameters. We need to stream this data efficiently without overwhelming our server’s memory. Here’s where chunked transfer encoding comes into play. By setting the Transfer-Encoding header to chunked (learn more here), we can send the data in chunks, eliminating the need for a predetermined file size.
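
To picture what that looks like on the wire, here is a rough sketch of a chunked response (hand-written for illustration, not captured from the demo): each chunk of the body is preceded by its size in hexadecimal, and a zero-length chunk marks the end.

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

1b
{"id":1,"name":"element1"},
1b
{"id":2,"name":"element2"},
...
0

The exact chunk boundaries depend on how Node batches the writes; the point is that no Content-Length header is ever needed.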

But wait—how do we handle massive file sizes? The answer is simple: Node.js Streams.

Why Not Buffers?

At first, you might think we can handle this using file buffers. However, buffers are limited by the server’s memory. If you’re dealing with large files, buffers will quickly hit a ceiling. Instead, we turn to streams, which are the backbone of Node.js when it comes to data management.

Streams process data in small chunks, sending it out without holding everything in memory. This way, we can handle "infinite" files, or at least ones large enough to blow your mind.
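
To make that ceiling concrete, here is a minimal sketch of the buffered approach we are avoiding (the file name is just an example): fs.readFile has to pull the entire contents into one Buffer before we can send the first byte, and a single Buffer is capped both by the available heap and by Node's hard maximum buffer size.

import { readFile } from 'node:fs/promises';
import { constants } from 'node:buffer';

// A single Buffer can never exceed this hard cap, and in practice the heap
// gives out long before that if several large downloads are buffered at once.
console.log(`Max Buffer size on this build: ${constants.MAX_LENGTH} bytes`);

function bufferedDownload() {
  // Buffered: the whole report sits in memory before a single byte reaches the client.
  return readFile('huge-report.json'); // hypothetical file, resolves to one big Buffer
}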

What are streams?

Streams in Node.js are powerful objects that let us process batches of data without loading everything into memory. Instead of holding the entire dataset, streams process chunks of data and pass them along as they go. So, because they don’t hold all the data in memory, they can handle massive files. But where do these chunks go? They’re passed on to other streams!

The two types of streams you’ll be working with most in Node.js are readable and writable streams (there are also duplex and transform streams, which combine the two).

Let’s break it down with an example:

If you’re reading a file’s contents, you’d use the fs.createReadStream method. This returns a readable stream object, which gives you chunks of the file’s data bit by bit.

But once you’ve read the data, how do you send it to the user as an HTTP response?

Well, here’s the cool part: in Node.js, the response object itself is a writable stream. What does that mean? It means you can take the data you’re reading from the readable stream and send it directly to the writable stream (the HTTP response) using a method called .pipe.

With just a few lines of code, the .pipe method lets you effortlessly stream file content directly to the client as an HTTP response—no need to worry about memory limits or large file sizes. Simple, right?
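
In plain Node.js (outside NestJS), a minimal sketch of that idea looks like this; the file path is made up for illustration:

import { createServer } from 'node:http';
import { createReadStream } from 'node:fs';

createServer((req, res) => {
  res.setHeader('Content-Type', 'application/json');

  // Read the file chunk by chunk and forward each chunk to the client.
  createReadStream('big-report.json')
    .on('error', (err) => {
      console.error(err);
      res.end();
    })
    .pipe(res);
}).listen(3000);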

The Three Methods to Stream Large Files in NestJS

We’ll walk through three different ways to stream large files in NestJS, ranging from basic to more advanced, giving you both performance and flexibility.

  1. The Easy Version: Node.js Out of the Box
  2. The Performant Version: String-Based Stream Chunks
  3. The Buffer Version: For When Precision Matters

1. The Easy Version

In this approach, we use Node.js's built-in interfaces to handle the heavy lifting for us. Here’s what the controller code looks like:



@Get('/simple')
@Header('Content-Disposition', `attachment; filename="bigJson.json"`)
@Header('Content-Type', 'application/json')
@Header('Transfer-Encoding', 'chunked')
getBigFileSimple(@Res() res: Response) {
  this.simpleService.getBigFile(res);
}



The @Header decorators set the file to be an attachment, specify the content type, and enable chunked transfer encoding. This allows us to send the file in parts without knowing its total size upfront.

In our service, we have the following method:



import { Injectable } from '@nestjs/common';
import { Response } from 'express'; // assuming the default Express platform
import { JsonService } from './json.service'; // adjust the path to your project structure

@Injectable()
export class SimpleService {
  constructor(private readonly jsonService: JsonService) {}

  getBigFile(res: Response) {
    return this.jsonService.createBigJsonFileStream().pipe(res);
  }
}



With just one line of code, we pipe the file stream to the response! This elegant solution leverages Node.js's stream piping, making it both simple and powerful.

Now, you might wonder—what exactly is this createBigJsonFileStream() method doing?



import { Injectable } from '@nestjs/common';
import { Readable } from 'stream';

@Injectable()
export class JsonService {
  createBigJsonFileStream(rowsLength = 25000000): Readable {
    let currentRow = 1;
    rowsLength = this.getRandomLengthFromRange(rowsLength / 2, rowsLength);

    const stream = new Readable({
      read() {
        // Called whenever the stream wants more data; we keep pushing rows
        // until we push null to signal the end.
        if (currentRow === 1) {
          this.push('[');
          this.push(`{"id":${currentRow},"name":"element${currentRow}"},`);
        } else if (currentRow === rowsLength) {
          this.push(`{"id":${currentRow},"name":"element${currentRow}"}`);
          this.push(']');
          this.push(null); // end of stream
        } else {
          this.push(`{"id":${currentRow},"name":"element${currentRow}"},`);
        }
        currentRow++;
      },
    });

    return stream;
  }

  private getRandomLengthFromRange(min: number, max: number): number {
    return Math.floor(Math.random() * (max - min + 1) + min);
  }
}



In this case, we’re not serving a file from disk. Remember, the scenario we’re addressing is where a manager wants to download a report, choosing their own start and end dates. This means we need to dynamically generate the file based on the user's request. And how do we do that? You guessed it—with streams!

What’s happening here?

We’re creating a readable stream using Node.js’s Readable constructor (documentation here). We’re keeping things simple by focusing on the required read() method.

Here’s how it works: whenever the stream needs more data to hand out, the read() method is called. As long as we keep pushing data into the stream (and don’t push null), the stream stays active and keeps sending data.

In this example, we’ve set a rowsLength—the total number of rows our JSON file will have. (Note: We’re dynamically altering the rowsLength to simulate the manager selecting a start and end date, creating a random value to represent the report’s size.) Based on this, we push one row at a time into the stream. It’s that simple! We’re dynamically creating a huge JSON file, sending data chunk by chunk, and handling it all seamlessly without loading everything into memory.
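
If you want to poke at this generator on its own, here is a small sketch that pipes it into a file on disk instead of an HTTP response (the output path and the small row count are just for a quick local test, and jsonService is assumed to be an instance of the JsonService above):

import { createWriteStream } from 'node:fs';

const stream = jsonService.createBigJsonFileStream(1000); // small run for testing

stream.pipe(createWriteStream('sample.json')).on('finish', () => {
  console.log('Finished writing sample.json');
});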

2. The Performant Version: String-Based Chunks

In this approach, we fine-tune performance by manually controlling how the data is written to the response stream:



import { Injectable } from '@nestjs/common';
import { Response } from 'express'; // assuming the default Express platform
import { JsonService } from './json.service'; // adjust the path to your project structure

@Injectable()
export class PerformantService {
  constructor(private readonly jsonService: JsonService) {}

  getBigFile(res: Response) {
    const jsonStream = this.jsonService.createBigJsonFileStream();
    jsonStream.setEncoding('utf-8');
    let chunkSize = 0;
    let bigChunk = '';

    jsonStream.on('data', (chunk: string) => {
      bigChunk += chunk;
      chunkSize++;
      if (chunkSize === 1000) {
        const writable = res.write(bigChunk);
        if (!writable) {
          jsonStream.pause();
        }
        bigChunk = '';
        chunkSize = 0;
      }
    });

    res.on('drain', () => {
      jsonStream.resume();
    });

    jsonStream.on('error', (err) => {
      console.error(err);
      res.end();
    });

    jsonStream.on('end', () => {
      if (bigChunk.length) {
        res.write(bigChunk);
      }
      res.end();
    });
  }
}



So, what do we have here?

Well, the logic might look a bit complex, but that’s because we’re handling the writes to the res writable stream manually. Why? Because batching many small chunks into fewer, larger writes means far fewer calls to res.write, which can outperform simply piping the stream chunk by chunk.

So, how do we do it? First, we set the encoding of our JSON stream to UTF-8 using jsonStream.setEncoding('utf-8');. This ensures that instead of receiving default Buffer objects from the stream, we get data as UTF-8 encoded strings.

Next, we define two variables: chunkSize, initialized to 0, and bigChunk, an empty string. The cool thing about streams is that we can listen to events on them. For example, the data event hands us a chunk every time the stream produces one. And when does that start? As soon as we attach a 'data' listener, the stream switches into flowing mode and keeps emitting chunks until all the data has been read. We know the stream has finished when the end event fires.

Now, when data arrives from the stream, we want to be efficient. Instead of sending small chunks to the res stream, we batch 1,000 chunks together and send them as one large chunk to our HTTP response, which is itself a writable stream. But wait—why does the write method return a boolean? This introduces us to a crucial concept in Node.js streams: backpressure. As Node.js describes it, “backpressure occurs when data builds up behind a buffer during data transfer.” In simpler terms, it means the res stream is telling us, "Slow down! I can't keep up!" The solution? We stop sending data! This is done by calling jsonStream.pause(), which halts the stream from reading further.

But what happens next? Now, we need to listen for another stream event—this time on the res writable stream. We need the stream to signal when it has recovered from the backpressure. This is where the "drain" event comes in. The "drain" event is emitted when it’s appropriate to resume writing data to the stream. So, once we catch this event, we simply resume reading from our big JSON stream!

Finally, when the JSON stream reaches the end event, we check if there’s any leftover data in bigChunk and write it to the res stream. After that, we call res.end() to signal that the response is complete.

And just like that, we’ve manually controlled how the data flows through the streams, handled backpressure, and delivered a dynamic file as a response!
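
One side note: if you don’t need the batching, Node’s built-in stream.pipeline gives you the same backpressure handling as .pipe plus automatic error cleanup on both streams. A sketch of what the service method could look like with it:

import { pipeline } from 'node:stream';

// Inside the service method: an error on either stream destroys both and lands here.
pipeline(this.jsonService.createBigJsonFileStream(), res, (err) => {
  if (err) {
    console.error('Streaming failed:', err);
  }
});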

3. The Buffer Version: Chunking with Buffers

For a more complex scenario, say we want to send large buffer chunks. Here's how we can handle it:



import { Injectable } from '@nestjs/common';
import { Response } from 'express'; // assuming the default Express platform
import { JsonService } from './json.service'; // adjust the path to your project structure

@Injectable()
export class BufferService {
  constructor(private readonly jsonService: JsonService) {}

  getBigFile(res: Response) {
    const jsonStream = this.jsonService.createBigJsonFileStream();

    const sendingChunkSize = 50 * 1024 * 1024; // 50 MiB; keep this larger than the source stream's chunk size to avoid unnecessary iterations in the while loop below.
    let currentChunkSize = 0;
    let sendingChunk = Buffer.alloc(sendingChunkSize);

    jsonStream.on('data', (chunk: Buffer) => {
      while (chunk.byteLength > 0) {
        const availableSpace = sendingChunkSize - currentChunkSize;

        if (chunk.byteLength <= availableSpace) {
          // If the chunk fits within the remaining space
          chunk.copy(sendingChunk, currentChunkSize);
          currentChunkSize += chunk.byteLength;
          break;
        }

        // Fill the remaining space in the sendingChunk
        chunk.subarray(0, availableSpace).copy(sendingChunk, currentChunkSize);
        currentChunkSize += availableSpace;

        // Send the filled buffer
        const writable = res.write(sendingChunk);

        if (!writable) {
          jsonStream.pause();
        }

        // Reset for the next chunk
        sendingChunk = Buffer.alloc(sendingChunkSize);
        currentChunkSize = 0;
        chunk = chunk.subarray(availableSpace); // Process the rest of the chunk
      }
    });

    res.on('drain', () => {
      jsonStream.resume(); // Resume when writable
    });

    jsonStream.on('end', () => {
      // Send any remaining data that hasn't been flushed
      if (currentChunkSize > 0) {
        res.write(sendingChunk.subarray(0, currentChunkSize));
      }
      res.end();
    });

    jsonStream.on('error', (err) => {
      console.error(err); // Log error for debugging
      res.end();
    });
  }
}



Oh no, what’s going on here? It looks so complicated!

Well, that’s because it is a bit tricky! But here’s the backstory: we received a request from our boss to send file downloads in chunks of 50 MiB, because he has blazing-fast internet and wants to take full advantage of it.

So, what’s the plan?

We start by setting our sending chunk size to 50 MiB. Next, we initialize a currentChunkSize variable to 0 and create a new buffer object, allocating exactly 50 MiB of memory.

Now, here’s where it gets interesting. Since we’re using fixed-size buffers, we need to ensure that all the JSON data fits perfectly within the allocated space. To do this, we calculate the available space in the current sending chunk. As long as the JSON chunk fits within the remaining space, we copy it to the sending chunk and update the currentChunkSize:



chunk.copy(sendingChunk, currentChunkSize);
currentChunkSize += chunk.byteLength;



The copy() method copies the bytes from the chunk buffer into sendingChunk, starting at the currentChunkSize (which marks the first free byte in the buffer).
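
A quick standalone illustration of copy() with made-up values:

const src = Buffer.from('abcdef');
const dst = Buffer.alloc(10);

// Copy all of src into dst, starting at byte offset 4 of dst.
src.copy(dst, 4);
console.log(dst.toString('utf-8', 4)); // 'abcdef'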

What if the JSON chunk is larger than the available space?

This is where things get a bit messy! In this case, we need to extract exactly the right number of bytes from the JSON chunk to fill the available space in the sending chunk. Once it’s perfectly full, we send the chunk to the HTTP response.

After that, we reset the sendingChunk to a fresh buffer and set currentChunkSize back to 0.

But what about the rest of the JSON chunk?

Good question! Any remaining data in the JSON chunk is processed in the next iteration of our loop. The loop ensures that this remaining data is added to the new sendingChunk. This loop continues as long as the JSON chunk is larger than the available space in the buffer.
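
Since subarray() does the carving here, a tiny example of how it behaves (it returns a view over the same memory, no copying involved):

const chunk = Buffer.from('hello world');

const head = chunk.subarray(0, 5); // 'hello', goes into the current sendingChunk
const rest = chunk.subarray(5);    // ' world', handled in the next loop iteration

console.log(head.toString(), '|', rest.toString());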

Technically, we could set the buffer size to send chunks as small as 1 byte to the response stream, but that would be incredibly inefficient. So, to avoid unnecessary iterations in the while loop, it’s important to set a reasonable chunk size.

Demo

For the demo, let's git clone the project here. After cloning, install the dependencies using npm install, and then run the project with npm run start.

On the index page, you’ll find a short tutorial explaining how to trigger each version, as shown in the screenshot below:

Demo screenshot


Conclusion

By leveraging the power of Node.js streams, we can deliver files of virtually any size without consuming large amounts of memory. Whether you're aiming for simplicity, performance, or handling large buffer sizes, these approaches have you covered.

I hope this guide helped you understand how to work with large file downloads in NestJS. Have any ideas or topics you'd like me to cover next? Drop a comment! And don’t forget to subscribe to my newsletter on rabbitbyte.club for more tips!
