This article shows how to modify your Astro Starlight documentation site so that it serves each markdown page as plain text at a known route (e.g. by appending .md to each page URL).
HTML to text is messy, lossy
Why go through the trouble of providing a clean, original copy of your markdown? Because it's much easier to download raw text than to reconstruct it from an HTML page.
Here's an example to illustrate what tools have to deal with, using the GenAIScript tutorial page.
You really want LLM web crawlers to ingest your documentation cleanly, so that RAG works better and downstream GenAI prompts work better too.
llms.txt
This idea comes from llms.txt. llms.txt proposes a new well-known file (./llms.txt) that tells LLMs how to crawl the site. It also proposes rendering documentation pages as markdown/plain text at a known route (<doc path>.md) so that LLM crawlers can consume the clean, original markdown content of your docs.
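To make the `<doc path>.md` convention concrete, here is a minimal sketch of how a crawler or script might derive the raw markdown URL from a rendered page URL. The helper name `toMarkdownUrl` and the `example.com` URLs are made up for illustration, and mapping the site root to `/index.md` is an assumption, not something the proposal spells out.

```typescript
// Hypothetical helper (not part of llms.txt or Astro): map a rendered docs
// page URL to the route where its raw markdown is assumed to be served.
export function toMarkdownUrl(pageUrl: string): string {
    const url = new URL(pageUrl)
    // Drop an optional `index.html` or `.html` suffix, then trailing slashes.
    let path = url.pathname
        .replace(/\/index\.html$/, "")
        .replace(/\.html$/, "")
        .replace(/\/+$/, "")
    // Assumption: the site root has no doc path, so fall back to `/index`.
    if (path === "") path = "/index"
    url.pathname = `${path}.md`
    return url.toString()
}

// Example usage with a made-up URL:
console.log(toMarkdownUrl("https://example.com/docs/getting-started/"))
```

Query strings and fragments are left untouched, since only the path is rewritten.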
Here are some examples:
Astro Starlight
Astro Starlight is a popular documentation system.
The implementation of this feature is surprisingly short, and the Astro #starlight Discord was amazing at helping to solve this request.
- add this file as pages/[...entry].md.ts
// pages/[...entry].md.ts
/**
 * Add a .md route that returns the raw markdown content of the page.
 * This is useful for markdown pages; heavy mdx pages will need more work.
 */
import type { APIRoute, GetStaticPaths, InferGetStaticPropsType } from "astro"
import { getCollection } from "astro:content"

// Generate one static .md route per entry in the "docs" collection.
export const getStaticPaths = (async () => {
    const entries = await getCollection("docs")
    return entries.map((entry) => ({
        params: { entry: entry.slug },
        props: { entry },
    }))
}) satisfies GetStaticPaths

type Props = InferGetStaticPropsType<typeof getStaticPaths>

// Serve the untouched markdown source of the entry.
export const GET: APIRoute<Props> = (context) => {
    return new Response(context.props.entry.body, {
        headers: {
            "content-type": "text/markdown",
        },
    })
}
That's it. Build again and enjoy easy RAG-ing of your docs.
What about MDX?
Oopsy, all the cool stuff in your MDX components won't render nicely to text, so this is still an unsolved issue.
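As a rough starting point rather than a solution, one could strip the MDX-only syntax before serving the body. The sketch below is a naive, line-based filter; `stripMdx` is a hypothetical helper, not an Astro or Starlight API. It assumes imports and exports fit on a single line and that component tags are capitalized with no `>` inside their attributes. Anything a component renders from its props is simply lost.

```typescript
// Hypothetical helper: crude MDX cleanup, not a real MDX parser.
export function stripMdx(source: string): string {
    const out: string[] = []
    let inFence = false
    for (const line of source.split("\n")) {
        // Leave fenced code blocks untouched (backtick or tilde fences).
        if (/^\s*(`{3,}|~{3,})/.test(line)) {
            inFence = !inFence
            out.push(line)
            continue
        }
        if (inFence) {
            out.push(line)
            continue
        }
        // Drop single-line ESM import/export statements.
        if (/^(import|export)\s/.test(line)) continue
        // Remove capitalized component tags but keep the text between them.
        out.push(line.replace(/<\/?[A-Z][A-Za-z0-9.]*(\s[^<>]*)?\/?>/g, ""))
    }
    // Collapse the blank-line runs left behind by removed lines.
    return out.join("\n").replace(/\n{3,}/g, "\n\n").trim() + "\n"
}
```

In the route above, this could be applied as `stripMdx(context.props.entry.body)` for entries that come from .mdx files, with plain markdown pages left as they are.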