Franck Pachot

MongoDB TTL and Disk Storage

In a previous blog post, I explained how MongoDB TTL indexes work and their optimization to avoid fragmentation during scans. However, I didn’t cover the details of on-disk storage. A recent Reddit question is the occasion to explore this aspect further.

Reproducible example

Here is a small program that inserts documents in a loop, with a timestamp and some random text:

// Random string generator so the data is not too compression-friendly
function getRandomString(length) {
  const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  let result = '';
  const charactersLength = characters.length;
  for (let i = 0; i < length; i++) {
    result += characters.charAt(Math.floor(Math.random() * charactersLength));
  }
  return result;
}
// Insert loop: each document carries ~1 KB of random text and a timestamp.
// insertedCount is declared by the statistics script below, which runs in the same mongosh session.
while (true) {
  const doc = { text: getRandomString(1000), ts: new Date() };
  db.ttl.insertOne(doc);
  insertedCount++;
}

Before executing this, I ran a background function to display statistics every minute, including the number of records, memory usage, and disk size.

// Prints stats every minute
 let insertedCount = 0;
 let maxThroughput = 0;
 let maxCollectionCount = 0;
 let maxCollectionSize = 0;
 let maxStorageSize = 0;
 let maxTotalIndexSize = 0;
 setInterval(async () => {
      const stats = await db.ttl.stats();
      const throughput = insertedCount / 10; // inserts counted since the last report (one minute), scaled down by 10
      const collectionCount = stats.count;
      const collectionSizeMB = stats.size / 1024 / 1024;
      const storageSizeMB = stats.storageSize / 1024 / 1024;
      const totalIndexSizeMB = stats.totalIndexSize / 1024 / 1024;
      maxThroughput = Math.max(maxThroughput, throughput);
      maxCollectionCount = Math.max(maxCollectionCount, collectionCount);
      maxCollectionSize = Math.max(maxCollectionSize, collectionSizeMB);
      maxStorageSize = Math.max(maxStorageSize, storageSizeMB);
      maxTotalIndexSize = Math.max(maxTotalIndexSize, totalIndexSizeMB);
      console.log(`Collection Name: ${stats.ns}
   Throughput:        ${throughput.toFixed(0).padStart(10)} docs/min (max: ${maxThroughput.toFixed(0)} docs/min)
   Collection Size:   ${collectionSizeMB.toFixed(0).padStart(10)} MB (max: ${maxCollectionSize.toFixed(0)} MB)
   Number of Records: ${collectionCount.toFixed(0).padStart(10)}     (max: ${maxCollectionCount.toFixed(0)} docs)
   Storage Size:      ${storageSizeMB.toFixed(0).padStart(10)}    MB (max: ${maxStorageSize.toFixed(0)} MB)
   Total Index Size:  ${totalIndexSizeMB.toFixed(0).padStart(10)} MB (max: ${maxTotalIndexSize.toFixed(0)} MB)`);
      insertedCount = 0;
 }, 60000); // every minute

I created the collection with a TTL index, which automatically expires data older than five minutes:

// TTL expire after 5 minutes
db.ttl.drop();
db.ttl.createIndex({ ts: 1 }, { expireAfterSeconds: 300 });

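As an illustrative aside (not part of the original test), you can check how many documents are already past the expiration threshold and are therefore candidates for the next TTL pass:

// Documents with ts older than expireAfterSeconds (300 s here) are eligible for TTL deletion
const threshold = new Date(Date.now() - 300 * 1000);
db.ttl.countDocuments({ ts: { $lt: threshold } });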

I let this run to see how the storage size evolves. Note that this was run on MongoDB 7.0.16 (without the auto-compact job that appeared in 8.0).

Output after 3 hours

The constant insert rate, combined with TTL expiration, keeps the number of documents in the collection relatively stable: the TTL monitor deletes expired documents every minute, so the overall document count stays roughly constant.

[Chart: the number of documents in the collection stays roughly constant over time]
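If you want to confirm that the TTL monitor is doing these deletions on your own deployment, serverStatus exposes TTL counters, and the pass frequency is driven by the ttlMonitorSleepSecs server parameter (60 seconds by default). A quick, illustrative check:

// Cumulative TTL passes and documents deleted by the TTL monitor since startup
db.serverStatus().metrics.ttl;
// The TTL monitor wakes up every ttlMonitorSleepSecs seconds (default: 60)
db.adminCommand({ getParameter: 1, ttlMonitorSleepSecs: 1 });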

I observe the same from the statistics I log every minute:

[Screenshot: per-minute statistics output from the monitoring script]

The storage size also remains constant, at 244 MB for the collection and 7 MB for the indexes. The size of the files grew during the first six minutes and then remained constant:

[Chart: collection and index storage size grow for about six minutes, then stay flat]

This is sufficient to show that deletions and insertions do not cause fragmentation that would require additional attention. About 25% of the file is marked as available for reuse and is effectively reused.
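One way to see that "available for reuse" figure is the WiredTiger block manager statistics exposed by collStats. A minimal sketch, using the field names returned in the wiredTiger section of db.collection.stats():

// Space inside the collection's WiredTiger file that is marked as reusable
const cstats = db.ttl.stats();
const reusable = cstats.wiredTiger["block-manager"]["file bytes available for reuse"];
print(`${(reusable / 1024 / 1024).toFixed(0)} MB reusable out of ${(cstats.storageSize / 1024 / 1024).toFixed(0)} MB on disk`);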

It is possible to force a compaction to temporarily reclaim more space, but it is not necessary:

Collection Name: test.ttl
   Throughput:              3286 docs/min (max: 3327 docs/min)
   Collection Size:          170 MB (max: 198 MB)
   Number of Records:     171026     (max: 198699 docs)
   Storage Size:             244    MB (max: 244 MB)
   Total Index Size:           7 MB (max: 7 MB)

Collection Name: test.ttl
   Throughput:              3299 docs/min (max: 3327 docs/min)
   Collection Size:          170 MB (max: 198 MB)
   Number of Records:     170977     (max: 198699 docs)
   Storage Size:             244    MB (max: 244 MB)
   Total Index Size:           6 MB (max: 7 MB)

Collection Name: test.ttl
   Throughput:              3317 docs/min (max: 3327 docs/min)
   Collection Size:          170 MB (max: 198 MB)
   Number of Records:     170985     (max: 198699 docs)
   Storage Size:             244    MB (max: 244 MB)
   Total Index Size:           6 MB (max: 7 MB)

test> db.runCommand({ compact: 'ttl' });
{ bytesFreed: 49553408, ok: 1 }

Collection Name: test.ttl
   Throughput:              1244 docs/min (max: 3327 docs/min)
   Collection Size:          150 MB (max: 198 MB)
   Number of Records:     150165     (max: 198699 docs)
   Storage Size:             197    MB (max: 244 MB)
   Total Index Size:           6 MB (max: 7 MB)
Collection Name: test.ttl
   Throughput:              3272 docs/min (max: 3327 docs/min)
   Collection Size:          149 MB (max: 198 MB)
   Number of Records:     149553     (max: 198699 docs)
   Storage Size:             203    MB (max: 244 MB)
   Total Index Size:           6 MB (max: 7 MB)


While this reduced the storage size, it eventually grows back to its normal volume. It is typical for a B-tree to maintain some free space, which helps minimize frequent space allocation and reclamation.

Here is a close-up of the period when I ran the manual compaction:

[Chart: storage size dips after the manual compaction, then returns to its previous level]

Conclusion

TTL deletion makes space available for reuse instead of reclaiming it immediately, but this does not increase fragmentation. The space is reused automatically, keeping the total size proportional to the document count, with a constant free space of about 25% in my case, which minimizes the frequent allocations typical of B-tree implementations.
The TTL mechanism operates autonomously and requires no manual compaction. If you have any doubt, monitor it: MongoDB offers statistics to compare logical and physical sizes at both the MongoDB layer and the WiredTiger storage layer.
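For example, at the database level, a quick comparison can be done with db.stats(), which reports the logical data size, the allocated storage, and the index size (a sketch, with sizes scaled to MB):

// Database-level view: logical data size vs. allocated storage vs. index size
const d = db.stats(1024 * 1024); // scale all size fields to MB
print(`dataSize: ${d.dataSize.toFixed(0)} MB, storageSize: ${d.storageSize.toFixed(0)} MB, indexSize: ${d.indexSize.toFixed(0)} MB`);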

Top comments (4)

Boopathi

First of all, thank you for the article. I have one doubt: you said that the storage will be reused, but what about the index? Will it also be reused?

Franck Pachot

Thanks. Great question. To get an idea, I ran the same test with an additional index on the text field:

test> db.ttl.createIndex({ text: 1  });
text_1

It is larger than the collection:

Collection Name: test.ttl
   Throughput:              3168 docs/min (max: 3230 docs/min)
   Collection Size:          182 MB (max: 191 MB)
   Number of Records:     182908     (max: 191774 docs)
   Storage Size:             239    MB (max: 240 MB)
   Total Index Size:         454 MB (max: 455 MB)

but it still reached its steady size quickly, after about six minutes, and the deleted space was reused:
[Chart: index storage size reaches a plateau after a few minutes and stays flat]
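To drill down to a single index, the same block manager statistics are available per index when requesting index details. A sketch, assuming the text_1 index created above:

// Per-index WiredTiger details, including the space available for reuse
const s = db.ttl.stats({ indexDetails: true });
const bm = s.indexDetails["text_1"]["block-manager"];
print(`index file: ${(bm["file size in bytes"] / 1024 / 1024).toFixed(0)} MB, reusable: ${(bm["file bytes available for reuse"] / 1024 / 1024).toFixed(0)} MB`);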

Franck Pachot

I also tested with descending indexes (because there was a problem in the past):

test> db.ttl.createIndex({ ts: -1 }, { expireAfterSeconds: 300 });
test> db.ttl.createIndex({ ts: -1 , text: 1  });

Still looks good:

Collection Name: test.ttl
   Throughput:              3221 docs/min (max: 3265 docs/min)
   Collection Size:          166 MB (max: 194 MB)
   Number of Records:     166333     (max: 195073 docs)
   Storage Size:             233    MB (max: 238 MB)
   Total Index Size:         264 MB (max: 265 MB)
Boopathi

Thank you so much for the detailed explanation, much appreciated.
