I started an open-source project that helps A/B test LLM-based apps: LangBear. It'll help you manage prompts and few-shot examples, run A/B tests, get user feedback, and see results.
Since I can't build everything at once, I want to know what developers need the most.
If you were to run A/B tests to improve the quality of LLM responses, which aspect would you be most interested in testing? E.g., prompts, vector stores, foundation models, or agents.
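For context, an A/B test over prompts can be as simple as deterministic user bucketing plus per-variant feedback counts. Here is a minimal sketch in Python — this is hypothetical and not LangBear's actual API; every name here is made up for illustration:

```python
import hashlib
from collections import defaultdict

# Hypothetical prompt variants under test (not LangBear's API).
PROMPT_VARIANTS = {
    "A": "Summarize the following text in one sentence:",
    "B": "You are a concise editor. Summarize this text in one sentence:",
}

def assign_variant(user_id: str) -> str:
    """Deterministically bucket a user into a variant.

    Hashing the user ID (rather than random.choice) means the same
    user always sees the same prompt across sessions.
    """
    keys = sorted(PROMPT_VARIANTS)
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return keys[digest % len(keys)]

# Per-variant feedback tallies: thumbs-up count and total responses.
feedback = defaultdict(lambda: {"up": 0, "total": 0})

def record_feedback(user_id: str, thumbs_up: bool) -> None:
    variant = assign_variant(user_id)
    feedback[variant]["total"] += 1
    feedback[variant]["up"] += int(thumbs_up)

def results() -> dict:
    """Thumbs-up rate per variant, skipping variants with no data."""
    return {v: s["up"] / s["total"] for v, s in feedback.items() if s["total"]}
```

The same bucketing-plus-tally pattern extends to the other axes mentioned above (vector stores, foundation models, agents) by swapping what the variant key selects.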
hey Sungwon! cool stuff, maybe this would be interesting for you and LangBear 100.builders?