As AI models become more powerful and widely used, managing costs is crucial—especially when working with multiple LLM providers like OpenAI, Anthropic, or Mistral. Without proper tracking, expenses can spiral out of control.
Enter LiteLLM, a lightweight library that standardizes interactions with various LLM APIs while offering built-in cost-tracking features. In this post, we'll explore how to implement cost monitoring and spend analytics to keep your AI budget in check.
Why Track LLM Costs?
LLM providers charge based on:
- Tokens processed (input + output)
- Model choice (GPT-4 Turbo vs. Claude Haiku)
- API usage frequency
Without monitoring, you might:
- Accidentally exceed budgets with high-volume requests.
- Waste money on overpriced models for simple tasks.
- Lack visibility into which projects or users consume the most resources.
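To see how these factors multiply out, here is a back-of-the-envelope cost calculator. The per-1K-token prices below are illustrative placeholders, not current rates — always check your provider's pricing page:

```python
# Illustrative per-1K-token prices (placeholders, NOT current rates --
# check your provider's pricing page before relying on these numbers).
PRICES_PER_1K = {
    "gpt-4":         {"input": 0.03,   "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input tokens * input rate + output tokens * output rate."""
    p = PRICES_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# 1M input + 200K output tokens on each model:
print(f"gpt-4:         ${estimate_cost('gpt-4', 1_000_000, 200_000):.2f}")
print(f"gpt-3.5-turbo: ${estimate_cost('gpt-3.5-turbo', 1_000_000, 200_000):.2f}")
```

The same workload can differ by an order of magnitude depending on model choice alone — which is exactly why tracking matters.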
Step 1: Setting Up LiteLLM for Cost-Tracking
LiteLLM provides a unified interface for multiple LLM providers and logs token usage + costs automatically.
Installation
pip install litellm
Basic Usage with Cost Tracking
from litellm import completion, completion_cost
import os

# Set API keys (e.g., OpenAI)
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain AI in 1 sentence."}],
)

print(f"Response: {response.choices[0].message.content}")
print(f"Cost: ${completion_cost(completion_response=response):.4f}")  # LiteLLM computes the cost from token usage
Output
Response: AI is the simulation of human intelligence processes by machines.
Cost: $0.0001
Step 2: Monitoring Spend Across Teams & Projects
LiteLLM can log requests to SQL, BigQuery, or Prometheus for deeper analysis.
Logging to SQLite
LiteLLM's success callbacks receive the computed cost for every call, so a few lines of `sqlite3` are enough to build a spend log:

import sqlite3
import litellm
from litellm import completion

# Table for model, cost, and timestamp per request
conn = sqlite3.connect("./llm_spend.db")
conn.execute("CREATE TABLE IF NOT EXISTS llm_logs (model TEXT, cost REAL, ts TEXT)")

# LiteLLM runs success callbacks after every completed call
def log_to_sqlite(kwargs, completion_response, start_time, end_time):
    conn.execute(
        "INSERT INTO llm_logs VALUES (?, ?, ?)",
        (kwargs["model"], kwargs.get("response_cost", 0.0), str(end_time)),
    )
    conn.commit()

litellm.success_callback = [log_to_sqlite]

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a Python function for Fibonacci."}],
)
Now, query your database:
SELECT model, SUM(cost) as total_cost
FROM llm_logs
GROUP BY model;
Example Output
| Model          | Total Cost |
|----------------|------------|
| gpt-3.5-turbo  | $12.45     |
| claude-3-haiku | $3.20      |
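You can run the same aggregation from Python with the standard library. This sketch assumes a `llm_logs` table with `model` and `cost` columns (the schema is whatever your logger writes); an in-memory database stands in for `./llm_spend.db`:

```python
import sqlite3

# In-memory database standing in for ./llm_spend.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_logs (model TEXT, cost REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO llm_logs VALUES (?, ?, ?)",
    [
        ("gpt-3.5-turbo", 0.002, "2024-01-01"),
        ("gpt-3.5-turbo", 0.003, "2024-01-02"),
        ("claude-3-haiku", 0.001, "2024-01-02"),
    ],
)

rows = conn.execute(
    "SELECT model, SUM(cost) AS total_cost FROM llm_logs "
    "GROUP BY model ORDER BY total_cost DESC"
).fetchall()
for model, total in rows:
    print(f"{model}: ${total:.3f}")
```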
Step 3: Setting Budget Alerts
Prevent overspending by adding hard limits or Slack alerts.
Hard Budget Limit
LiteLLM's `BudgetManager` tracks spend per user against a budget you set; check the budget before each call and record the cost afterwards:

from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="marketing-campaign")
user = "content-team"
budget_manager.create_budget(total_budget=100, user=user)

if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Generate 10 blog ideas"}],
    )
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    print("Budget exceeded: request blocked")
Slack Alerts
The LiteLLM proxy ships with built-in Slack alerting; from plain Python, posting to a Slack incoming webhook is a single HTTP call (using the `requests` package):

import requests  # third-party; pip install requests

def slack_alert(webhook_url, message):
    requests.post(webhook_url, json={"text": message})

slack_alert(
    "your-slack-webhook",
    "Warning: Project 'marketing-campaign' has spent 90% of its budget!",
)
Step 4: Optimizing Costs
Once you track spending, optimize with:
- Model Switching: Use cheaper models (e.g., Haiku for simple tasks).
- Caching: Cache frequent queries with Redis.
- Batching: Combine multiple requests into one.
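A minimal in-process caching sketch (a dict-backed `lru_cache` keyed on the prompt; production setups would use Redis with a TTL). `fake_llm_call` is a stand-in for `completion` so you can see the effect without spending anything:

```python
from functools import lru_cache

CALLS = {"count": 0}

def fake_llm_call(prompt: str) -> str:
    """Stand-in for litellm.completion -- counts how often we actually 'pay'."""
    CALLS["count"] += 1
    return f"answer to: {prompt}"

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    # Identical prompts hit the cache instead of the (billed) API.
    return fake_llm_call(prompt)

cached_completion("Explain AI in 1 sentence.")
cached_completion("Explain AI in 1 sentence.")  # served from cache, not billed
print(f"billed calls: {CALLS['count']}")
```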
Example: Fallback to Cheaper Model
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    fallbacks=["gpt-3.5-turbo"],  # tried in order if the primary call fails
)
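Batching can be sketched in the same spirit. `batch_prompts` below is a hypothetical helper, not a LiteLLM API — the idea is simply that one request carrying N questions pays the per-request overhead once instead of N times (you still need to split the model's answer back apart):

```python
def batch_prompts(prompts, sep="\n---\n"):
    """Combine several small prompts into one numbered request body (sketch)."""
    numbered = [f"{i + 1}. {p}" for i, p in enumerate(prompts)]
    return "Answer each item separately:" + sep + sep.join(numbered)

body = batch_prompts(["Define AI.", "Define ML.", "Define NLP."])
print(body)
```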
Conclusion
With LiteLLM, you can:
✅ Track costs in real-time across providers.
✅ Log spending per team/project.
✅ Set budget limits and alerts.
✅ Optimize model usage for cost efficiency.
Start implementing today, and never get blindsided by an unexpected AI bill again!
What's your biggest cost challenge with LLMs? Let's discuss in the comments! 🚀