elvisyao007

AI implementation engineer. I make LLM systems reliable, measurable, and production-ready — eval-driven, secure on-prem RAG/agents. Building in public from one RTX 5090.

Joined on Jun 2, 2026

elvisyao007

Jun 14

A Chinese 8B model beat the Western 8B models at Japanese RAG. I still wouldn't put it in the default deployment — and that distinction is the point.

#llm #rag #machinelearning #japan

4 min read

elvisyao007

Jun 13

Which Chinese open-source parser is better for Japanese RAG? It's a crossover — BM25 says DeepDoc, dense says MinerU

#rag #llm #machinelearning #japan

4 min read

elvisyao007

Jun 13

Structured parsing helps dense retrieval more than it helps BM25 — measured on Japanese docs, and the gap doubled

#rag #llm #machinelearning #japan

4 min read

elvisyao007

Jun 12

Half of agent evaluation needs no LLM judge — and it's the half that catches the failures that actually hurt

#ai #llm #agents #machinelearning

5 min read

elvisyao007

Jun 12

Does a Chinese document parser actually work on Japanese PDFs? I measured it — and the answer is 'it depends on the font path'

#rag #llm #japan #machinelearning

5 min read

elvisyao007

Jun 11

My local-LLM benchmark gave every model a perfect score. That was the most useful failure of the project.

#llm #machinelearning #ai #japan

5 min read

elvisyao007

Jun 11

I built a self-hosted LLM stack that grades itself — audit trail, per-user auth, and a built-in acceptance test

#llm #selfhosted #devops #ai

6 min read

elvisyao007

Jun 8

Your RAG dashboard can hide a failing retriever: detecting silent regression

#rag #evaluation #llm #opensource

3 min read

elvisyao007

Jun 8

I built a tiny tool to catch the metric trap from my last post

#rag #evaluation #python #opensource

1 min read

elvisyao007

Jun 8

The 33 'grounded-but-wrong' answers were a metric artifact: how ID-based context recall lies on multi-answer datasets

#rag #evaluation #llm #retrieval

4 min read

elvisyao007

Jun 7

faithfulness spread = 0.000: what self-grading RAG eval actually looks like

#rag #llm #ai #machinelearning

4 min read

elvisyao007

Jun 7

My RAG's faithfulness was 0.67. 1 in 3 answers were still wrong.

#rag #llm #ai #machinelearning

6 min read