DEV Community

Job

Large Sitemaps: 16M Pages Challenge

Colorify.Rocks is a project that generates color palettes and provides a page for every HEX color. Each page includes information like:

  • Color Values
  • Complementary Colors
  • Analogous Colors
  • Triadic Colors
  • Shades and Tints
  • Color Blindness Combinations

With 16,777,216 possible HEX colors (256 × 256 × 256), the site has roughly 16 million pages, and the challenge is managing that many efficiently. While all pages are useful, not all colors are equally important. The goal is to optimize the crawl budget so that search engines focus on the most relevant pages.

Including all 16 million pages in sitemaps would require over 300 files, 336 to be exact, since each sitemap file can hold at most 50,000 URLs.

This approach presents two major issues:

  1. Inefficient Crawling: Search engines might waste time crawling less relevant colors.
  2. Complex Management: Maintaining hundreds of sitemap files is time-consuming and prone to errors.

An optimized solution was necessary to prioritize important pages while still ensuring the rest remain discoverable.
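
As a quick sanity check on the file count, here is a minimal Python sketch of the arithmetic. It assumes one page per 24-bit HEX color and the 50,000-URL limit per sitemap file mentioned above. The function name `sitemap_files_needed` is purely illustrative and not part of the project.

```python
# Sketch only: estimates how many sitemap files a given number of URLs needs.
# Assumes one page per 24-bit HEX color and 50,000 URLs per sitemap file.

TOTAL_HEX_COLORS = 256 ** 3        # 16,777,216 possible HEX colors
MAX_URLS_PER_SITEMAP = 50_000      # per-file limit of the sitemap protocol


def sitemap_files_needed(url_count: int, per_file: int = MAX_URLS_PER_SITEMAP) -> int:
    """Return the number of sitemap files needed (ceiling division)."""
    return -(-url_count // per_file)


print(sitemap_files_needed(TOTAL_HEX_COLORS))  # 336
print(sitemap_files_needed(25_000))            # 1
```

The full set needs 336 files, while a prioritized subset of about 25,000 pages fits in a single file.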

By focusing on Material Design-like colors, I prioritized around 25,000 pages representing a diverse and balanced subset of the color spectrum. For example, #F44336 (Material Design's Red 500, a vibrant red) is the kind of visually impactful color that shows up frequently in design systems. These carefully selected pages keep the sitemap focused on relevance without overwhelming search engines.

Here’s why I didn’t choose other methods:

  • Web Safe Colors: Only 216 colors, which is far too few for modern use.
  • Named CSS Colors: Roughly 140 colors in current CSS (only 16 in the original basic set), which doesn’t even begin to cover the spectrum.
  • Pure/Near-Pure Colors: Around 2,000 colors, but these miss subtle and popular shades.
  • Popular Color Ranges: About 2.1 million colors, which covers a broader set but is still far too many for a focused sitemap.

By using Material Design-like colors, I created a smaller, more manageable sitemap that still represents a diverse range of colors.

The sitemap doesn’t need to include every page. Each color page links to complementary, analogous, and related colors, allowing search engines to discover the rest of the site naturally.

This way, the sitemap becomes a prioritization tool rather than a full index. Google can focus on the most important pages first, while still crawling the others over time.
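
To illustrate how one page can lead crawlers to others, here is a rough Python sketch that derives related colors from a single HEX value by rotating its hue. The post does not describe the formulas Colorify.Rocks actually uses, so the hue-rotation method, the ±30° analogous offset, and the function names `rotate_hue` and `related_colors` are all assumptions made for illustration.

```python
import colorsys


def rotate_hue(hex_color: str, degrees: float) -> str:
    """Rotate the hue of a HEX color by `degrees` and return a new HEX string."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    h = (h + degrees / 360) % 1.0          # wrap around the color wheel
    rotated = colorsys.hls_to_rgb(h, l, s)
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in rotated))


def related_colors(hex_color: str) -> dict:
    """Hypothetical set of colors a page could link to (assumed offsets)."""
    return {
        "complementary": rotate_hue(hex_color, 180),
        "analogous": [rotate_hue(hex_color, -30), rotate_hue(hex_color, 30)],
        "triadic": [rotate_hue(hex_color, 120), rotate_hue(hex_color, 240)],
    }


links = related_colors("#FF0000")
print(links["complementary"])  # #00FFFF
print(links["triadic"])        # ['#00FF00', '#0000FF']
```

Because every page links out to a handful of neighbors like these, a crawler that starts from the prioritized pages can reach colors that never appear in the sitemap.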

Technical Considerations

  • Including all pages would create more than 300 sitemaps, which is unnecessary for most projects.
  • A smaller sitemap with ~25,000 pages makes it easier for search engines to focus on high-value content.
  • Monitoring crawl reports and indexing status helps refine the approach over time.
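
For a sense of what generating the smaller sitemap could look like, here is a minimal Python sketch using the standard library. The URL pattern in `BASE_URL` and the function name `build_sitemap` are hypothetical, since the post does not show the site's real URL structure or tooling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://example.com/color/"   # hypothetical URL pattern
MAX_URLS_PER_SITEMAP = 50_000


def build_sitemap(hex_colors: list) -> bytes:
    """Build a single sitemap document (as UTF-8 bytes) for the given colors."""
    if len(hex_colors) > MAX_URLS_PER_SITEMAP:
        raise ValueError("a single sitemap file holds at most 50,000 URLs")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for color in hex_colors:
        url = ET.SubElement(urlset, "url")
        # One <loc> entry per prioritized color page
        ET.SubElement(url, "loc").text = BASE_URL + color.lstrip("#").lower()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Usage: two example colors stand in for the ~25,000 prioritized ones
xml_bytes = build_sitemap(["#F44336", "#2196F3"])
print(xml_bytes.decode("utf-8"))
```

With around 25,000 entries the output stays within the single-file limit, so no sitemap index is needed for the prioritized set.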

SEO often requires testing different strategies. This sitemap setup is part of an experiment to see how search engines handle a prioritization-first approach on a large site.

Over time, I’ll adjust based on how Google indexes and crawls the site. It’s about finding the best way to balance efficiency and discoverability.

This approach is just the first step, and I’ll keep refining it as I gather more data. If you’ve dealt with similar challenges, I’d love to hear your thoughts!
