DEV Community

Peli de Halleux


Make Your Docs Friendly To LLMs

This article shows how to modify your Astro Starlight documentation site so that it renders each markdown page as plain text at a known route (e.g. by adding .md to each page's path).

HTML to text is messy, lossy

Why would you go through the trouble of providing a clean, original copy of your markdown? Because it's much easier to download raw text than to try to reconstruct it from an HTML page.
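A tiny TypeScript sketch makes the loss concrete. `stripTags` is a deliberately naive, hypothetical converter (not a library API) that deletes every tag from a simplified rendering of one markdown snippet. Real converters do better than this, but they still have to guess at structure that the original markdown stated explicitly.

```typescript
// Deliberately naive HTML-to-text conversion: delete every tag, keep the text.
// `stripTags` is a hypothetical helper for illustration, not a library API.
function stripTags(html: string): string {
    return html.replace(/<[^>]*>/g, "")
}

// Markdown as the author wrote it...
const markdown = "## Install\n\nRun `npm install` first."

// ...and a simplified version of the HTML a docs site might render from it.
const html = '<h2 id="install">Install</h2><p>Run <code>npm install</code> first.</p>'

// The heading marker, the paragraph break and the inline-code backticks are gone.
console.log(stripTags(html)) // prints "InstallRun npm install first."
console.log(markdown)
```

Serving the original markdown avoids this guessing entirely: the crawler gets exactly what the author wrote.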

Here's an example of what tools have to deal with, using the GenAIScript tutorial page.

You really want LLM web crawlers to ingest your documentation cleanly: RAG works better, and downstream GenAI prompts work even better.

llms.txt

This idea comes from llms.txt, which proposes a new well-known file (/llms.txt) that tells LLMs how to crawl the site. It also proposes rendering documentation pages as markdown/plain text at a known route (<doc path>.md) so that LLM crawlers can consume the clean, original markdown content of your docs.
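To illustrate the `<doc path>.md` convention as this article applies it (a page served at /guides/intro/ gets its markdown at /guides/intro.md), here is a hypothetical TypeScript helper that maps a rendered page URL to its markdown route. The name `markdownUrlFor` and the choice to reject the site root are assumptions made for this sketch, not part of llms.txt.

```typescript
// Hypothetical helper: map a rendered docs page URL to its markdown twin,
// following the "<doc path>.md" convention (e.g. /guides/intro/ -> /guides/intro.md).
// Assumption: the site root has no path to append ".md" to, so it is rejected.
function markdownUrlFor(pageUrl: string): string {
    const url = new URL(pageUrl)
    // Drop trailing slashes so "/guides/intro/" and "/guides/intro" map the same way
    const path = url.pathname.replace(/\/+$/, "")
    if (path === "") {
        throw new Error(`no .md route for the site root: ${pageUrl}`)
    }
    // Query strings and fragments are dropped: they do not select different markdown
    return `${url.origin}${path}.md`
}

console.log(markdownUrlFor("https://example.com/guides/intro/"))
// prints "https://example.com/guides/intro.md"
```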

Here are some examples:

Astro Starlight

Astro Starlight is a popular documentation system.

The implementation of this feature is surprisingly short, and the Astro #starlight Discord was amazing at solving this request.

  • Add this file at src/pages/[...entry].md.ts:
// src/pages/[...entry].md.ts
/**
 * Adds a .md route that returns the raw markdown content of each docs page.
 * This works well for markdown pages; heavy MDX pages will need more work.
 */
import type { APIRoute, GetStaticPaths, InferGetStaticPropsType } from "astro"
import { getCollection } from "astro:content"

// Generate one static route per entry of the Starlight "docs" collection,
// e.g. the entry with slug "guides/intro" is served at /guides/intro.md.
// Note: newer Astro versions using the content layer expose `entry.id`
// instead of `entry.slug`.
export const getStaticPaths = (async () => {
    const entries = await getCollection("docs")
    return entries.map((entry) => ({
        params: { entry: entry.slug },
        props: { entry },
    }))
}) satisfies GetStaticPaths

type Props = InferGetStaticPropsType<typeof getStaticPaths>

// Respond with the raw, uncompiled markdown body of the entry
export const GET: APIRoute<Props> = (context) => {
    return new Response(context.props.entry.body, {
        headers: {
            "content-type": "text/markdown; charset=utf-8",
        },
    })
}

That's it: build again and enjoy easy RAG-ing of your docs.
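On the consuming side, an ingestion script can prefer the new routes. The sketch below is hypothetical: `fetchDocText`, the `Fetcher` type and the fallback policy are assumptions, not part of llms.txt or Starlight. It tries the page's `.md` twin first and falls back to the HTML page when that route is missing or answers with HTML.

```typescript
// Minimal response shape this sketch relies on. The built-in fetch Response
// (Node 18+, browsers) satisfies it; a fake can be used for offline tests.
interface MinimalResponse {
    ok: boolean
    headers: { get(name: string): string | null }
    text(): Promise<string>
}
type Fetcher = (url: string) => Promise<MinimalResponse>

// Hypothetical crawler helper: try the "<doc path>.md" twin first, then fall
// back to the HTML page.
async function fetchDocText(
    pageUrl: string,
    fetcher: Fetcher,
): Promise<{ source: "markdown" | "html"; text: string }> {
    const url = new URL(pageUrl)
    const path = url.pathname.replace(/\/+$/, "")
    // The site root has no "<doc path>.md" twin under this convention
    if (path !== "") {
        const md = await fetcher(`${url.origin}${path}.md`)
        const type = md.headers.get("content-type") ?? ""
        // Assumption: any successful non-HTML answer is the raw markdown, since
        // hosts may serve static .md files with different content types.
        if (md.ok && !type.includes("text/html")) {
            return { source: "markdown", text: await md.text() }
        }
    }
    const page = await fetcher(pageUrl)
    return { source: "html", text: await page.text() }
}
```

With Node 18 or later, the built-in `fetch` can be passed directly as the `fetcher` argument; injecting it keeps the helper testable without network access.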

What about MDX?

Oopsy: all the cool stuff in your MDX components won't render nicely to text, so this is still an unsolved issue.
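Until that is solved, one stopgap is to strip the MDX-specific syntax and keep the surrounding markdown. The following is a crude, hypothetical heuristic (`roughMdxToMarkdown` is not an existing API): it drops `import`/`export` lines and lines containing only a capitalized component tag, and leaves fenced code alone. It is not an MDX parser; content passed through component props, multi-line tags, and inline components are not handled.

```typescript
// Crude, hypothetical heuristic for MDX pages: drop ESM import/export lines and
// lines that are only a JSX component tag, keep everything else as markdown.
const FENCE = "`".repeat(3)

function roughMdxToMarkdown(mdx: string): string {
    const kept: string[] = []
    let inFence = false
    for (const line of mdx.split("\n")) {
        const trimmed = line.trim()
        // Leave fenced code blocks untouched, including any tags inside them
        if (trimmed.startsWith(FENCE)) {
            inFence = !inFence
            kept.push(line)
            continue
        }
        if (inFence) {
            kept.push(line)
            continue
        }
        // ESM statements at the top level of an MDX file
        if (/^(import|export)\s/.test(trimmed)) continue
        // Lines that are only a component tag: <Aside type="tip">, </Aside>, <Card />
        if (/^<\/?[A-Z][\w.]*(\s[^>]*)?\/?>$/.test(trimmed)) continue
        kept.push(line)
    }
    // Collapse runs of blank lines left behind by the removed lines
    return kept.join("\n").replace(/\n{3,}/g, "\n\n").trim()
}
```

In the route above, such a function could be applied to `context.props.entry.body` before building the `Response`, at the cost of losing whatever the components rendered.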
