
Jannik Maierhoefer

Langfuse Launch Week #2

Langfuse Launch Week Header Image

Langfuse, the open-source LLM engineering platform, is excited to announce its second Launch Week, starting on Monday, November 18, 2024. The week features daily platform updates, a Virtual Town Hall on Wednesday, and a Product Hunt launch on Friday to close it out.


Focus of Launch Week

Langfuse's second Launch Week is all about supporting the next generation of AI models and integrating the platform more deeply into developer workflows. The updates aim to deliver end-to-end prompt engineering tools specifically designed for product teams, enhancing the robustness and versatility of AI applications.


πŸ”» Day 0: Prompt Management for Vercel AI SDK

On the first day, Langfuse introduced native integration of its Prompt Management with the Vercel AI SDK. This integration enables developers to:

  • Version and release prompts directly in Langfuse.
  • Utilize prompts via the Vercel AI SDK.
  • Seamlessly monitor metrics like latency, costs, and usage.

This update answers critical questions for developers:

  • Which prompt version caused a specific bug?
  • What’s the cost and latency impact of each prompt version?
  • Which prompt versions are most used?

πŸ†š Day 1: Dataset Experiment Run Comparison View

The second day brought a new comparison view for dataset experiment runs within Langfuse Datasets. This powerful feature allows teams to:

  • Analyze multiple experiment runs side-by-side.
  • Compare application performance across test dataset experiments.
  • Explore metrics like latency and costs.
  • Drill down into individual dataset items.

This enhancement is particularly valuable for testing different prompts, models, or application configurations, making it a must-have tool for teams working on AI-powered products.
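
For context, experiment runs are created by executing your application over a Langfuse dataset and linking each resulting trace to a named run; the comparison view then lines those runs up side-by-side. A rough sketch with the Langfuse JS SDK, where the dataset name, run names, and the `runMyApp` helper are placeholders:

```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse();

// Placeholder for the application (prompt + model + logic) under test.
async function runMyApp(input: unknown): Promise<string> {
  return `answer for ${JSON.stringify(input)}`;
}

async function runExperiment(runName: string) {
  const dataset = await langfuse.getDataset("qa-test-set"); // placeholder dataset

  for (const item of dataset.items) {
    // Trace one execution of the app on this dataset item.
    const trace = langfuse.trace({ name: "experiment", input: item.input });
    const output = await runMyApp(item.input);
    trace.update({ output });

    // Link the trace to the named experiment run so it appears in the
    // comparison view next to other runs.
    await item.link(trace, runName, {
      description: "placeholder run description",
    });
  }

  await langfuse.flushAsync();
}

// Run the same dataset with two configurations, then compare them in the UI.
runExperiment("prompt-v1-baseline")
  .then(() => runExperiment("prompt-v2-candidate"))
  .catch(console.error);
```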


βš–οΈ Day 2: LLM-as-a-Judge Evaluations for Datasets

Day 2 of Launch Week 2 brought managed LLM-as-a-judge evaluators to dataset experiments. Assign evaluators to your datasets and they will automatically run on new experiment runs, scoring the outputs against your evaluation criteria.

You can run any LLM-as-a-judge prompt, and Langfuse ships with templates for the following evaluation criteria: Hallucination, Helpfulness, Relevance, Toxicity, Correctness, Context Relevance, Context Correctness, and Conciseness.

Langfuse LLM-as-a-judge works with any LLM that supports tool/function calling and is accessible via one of the following APIs: OpenAI, Azure OpenAI, Anthropic, or AWS Bedrock. Through LLM gateways such as LiteLLM, virtually any popular model can be used via the OpenAI connector.
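
The managed evaluators are configured in the Langfuse UI rather than in code, but conceptually each one asks a judge model for a structured verdict, which is why tool/function calling is required. Below is a hand-rolled sketch of that idea using the Vercel AI SDK's `generateObject`; the hallucination criterion, schema, and model choice are illustrative assumptions, not Langfuse internals:

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Schema the judge model must fill in; structured output relies on the
// model's tool/function-calling support under the hood.
const judgement = z.object({
  score: z.number().min(0).max(1).describe("1 = fully grounded, 0 = hallucinated"),
  reasoning: z.string().describe("Short justification for the score"),
});

export async function judgeHallucination(context: string, answer: string) {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"), // any tool-calling model would work here
    schema: judgement,
    prompt: [
      "You are an evaluator. Judge whether the answer is grounded in the context.",
      `Context:\n${context}`,
      `Answer:\n${answer}`,
    ].join("\n\n"),
  });
  return object; // { score, reasoning }
}
```

In the managed setup, Langfuse runs this kind of check automatically for every new experiment run and attaches the resulting score to the corresponding trace.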


Upcoming Events

πŸ“† Virtual Town Hall

Join Langfuse for a live Virtual Town Hall on Wednesday, November 20, 2024, at 10 am PT / 7 pm CET. This session will include:

  • Live demonstrations of the new features.
  • Insights into integrating these updates into workflows.
  • A sneak peek into the future of Langfuse, including the upcoming V3 release.

πŸ…ΏοΈ Product Hunt Launch

Langfuse will make its third appearance on Product Hunt on Friday, November 22, 2024, showcasing the highlights of Launch Week and engaging with the tech community.


Stay Updated

Stay connected with Langfuse during Launch Week:

  • 🌟 Star the project on GitHub to show your support.
  • Follow Langfuse on Twitter and LinkedIn for updates.
  • Subscribe to the Langfuse mailing list to receive daily updates throughout the week.

Learn more: Langfuse Blog
