As AI models become more powerful and widely used, managing costs is crucial—especially when working with multiple LLM providers like OpenAI, Anthropic, or Mistral. Without proper tracking, expenses can spiral out of control.
Enter LiteLLM, a lightweight library that standardizes interactions with various LLM APIs while offering built-in cost-tracking features. In this post, we'll explore how to implement cost monitoring and spend analytics to keep your AI budget in check.
Why Track LLM Costs?
LLM providers charge based on:
- Tokens processed (input + output)
- Model choice (GPT-4 Turbo vs. Claude Haiku)
- API usage frequency
Without monitoring, you might:
- Accidentally exceed budgets with high-volume requests.
- Waste money on overpriced models for simple tasks.
- Lack visibility into which projects or users consume the most resources.
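To see how these factors multiply out, here is a back-of-the-envelope cost calculator. The per-1K-token prices below are illustrative placeholders, not current rates — always check your provider's pricing page:

```python
# Illustrative per-1K-token prices (placeholders, NOT current rates --
# check your provider's pricing page before relying on these numbers).
PRICES_PER_1K = {
    "gpt-4":         {"input": 0.03,   "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input tokens * input rate + output tokens * output rate."""
    p = PRICES_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# 1M input + 200K output tokens on each model:
print(f"gpt-4:         ${estimate_cost('gpt-4', 1_000_000, 200_000):.2f}")
print(f"gpt-3.5-turbo: ${estimate_cost('gpt-3.5-turbo', 1_000_000, 200_000):.2f}")
```

The same workload can differ by an order of magnitude depending on model choice alone — which is exactly why tracking matters.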
Step 1: Setting Up LiteLLM for Cost-Tracking
LiteLLM provides a unified interface for multiple LLM providers and logs token usage + costs automatically.
Installation
pip install litellm
Basic Usage with Cost Tracking
from litellm import completion, completion_cost
import os

# Set API keys (e.g., OpenAI)
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain AI in 1 sentence."}],
)

print(f"Response: {response.choices[0].message.content}")
print(f"Cost: ${completion_cost(completion_response=response):.4f}")  # LiteLLM computes the cost from token usage
Output
Response: AI is the simulation of human intelligence processes by machines.
Cost: $0.0001
Step 2: Monitoring Spend Across Teams & Projects
LiteLLM can log requests to SQL, BigQuery, or Prometheus for deeper analysis.
Logging to SQLite
LiteLLM's success callbacks receive the computed cost for every call, so a few lines of `sqlite3` are enough to build a spend log:

import sqlite3
import litellm
from litellm import completion

# Table for model, cost, and timestamp per request
conn = sqlite3.connect("./llm_spend.db")
conn.execute("CREATE TABLE IF NOT EXISTS llm_logs (model TEXT, cost REAL, ts TEXT)")

# LiteLLM runs success callbacks after every completed call
def log_to_sqlite(kwargs, completion_response, start_time, end_time):
    conn.execute(
        "INSERT INTO llm_logs VALUES (?, ?, ?)",
        (kwargs["model"], kwargs.get("response_cost", 0.0), str(end_time)),
    )
    conn.commit()

litellm.success_callback = [log_to_sqlite]

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a Python function for Fibonacci."}],
)
Now, query your database:
SELECT model, SUM(cost) as total_cost
FROM llm_logs
GROUP BY model;
Example Output
| Model          | Total Cost |
|----------------|------------|
| gpt-3.5-turbo  | $12.45     |
| claude-3-haiku | $3.20      |
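You can run the same aggregation from Python with the standard library. This sketch assumes a `llm_logs` table with `model` and `cost` columns (the schema is whatever your logger writes); an in-memory database stands in for `./llm_spend.db`:

```python
import sqlite3

# In-memory database standing in for ./llm_spend.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_logs (model TEXT, cost REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO llm_logs VALUES (?, ?, ?)",
    [
        ("gpt-3.5-turbo", 0.002, "2024-01-01"),
        ("gpt-3.5-turbo", 0.003, "2024-01-02"),
        ("claude-3-haiku", 0.001, "2024-01-02"),
    ],
)

rows = conn.execute(
    "SELECT model, SUM(cost) AS total_cost FROM llm_logs "
    "GROUP BY model ORDER BY total_cost DESC"
).fetchall()
for model, total in rows:
    print(f"{model}: ${total:.3f}")
```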
Step 3: Setting Budget Alerts
Prevent overspending by adding hard limits or Slack alerts.
Hard Budget Limit
LiteLLM's `BudgetManager` tracks spend per user against a budget you set; check the budget before each call and record the cost afterwards:

from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="marketing-campaign")
user = "content-team"
budget_manager.create_budget(total_budget=100, user=user)

if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Generate 10 blog ideas"}],
    )
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    print("Budget exceeded: request blocked")
Slack Alerts
The LiteLLM proxy ships with built-in Slack alerting; from plain Python, posting to a Slack incoming webhook is a single HTTP call (using the `requests` package):

import requests  # third-party; pip install requests

def slack_alert(webhook_url, message):
    requests.post(webhook_url, json={"text": message})

slack_alert(
    "your-slack-webhook",
    "Warning: Project 'marketing-campaign' has spent 90% of its budget!",
)
Step 4: Optimizing Costs
Once you track spending, optimize with:
- Model Switching: Use cheaper models (e.g., Haiku for simple tasks).
- Caching: Cache frequent queries with Redis.
- Batching: Combine multiple requests into one.
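A minimal in-process caching sketch (a dict-backed `lru_cache` keyed on the prompt; production setups would use Redis with a TTL). `fake_llm_call` is a stand-in for `completion` so you can see the effect without spending anything:

```python
from functools import lru_cache

CALLS = {"count": 0}

def fake_llm_call(prompt: str) -> str:
    """Stand-in for litellm.completion -- counts how often we actually 'pay'."""
    CALLS["count"] += 1
    return f"answer to: {prompt}"

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    # Identical prompts hit the cache instead of the (billed) API.
    return fake_llm_call(prompt)

cached_completion("Explain AI in 1 sentence.")
cached_completion("Explain AI in 1 sentence.")  # served from cache, not billed
print(f"billed calls: {CALLS['count']}")
```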
Example: Fallback to Cheaper Model
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    fallbacks=["gpt-3.5-turbo"],  # tried in order if the primary call fails
)
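Batching can be sketched in the same spirit. `batch_prompts` below is a hypothetical helper, not a LiteLLM API — the idea is simply that one request carrying N questions pays the per-request overhead once instead of N times (you still need to split the model's answer back apart):

```python
def batch_prompts(prompts, sep="\n---\n"):
    """Combine several small prompts into one numbered request body (sketch)."""
    numbered = [f"{i + 1}. {p}" for i, p in enumerate(prompts)]
    return "Answer each item separately:" + sep + sep.join(numbered)

body = batch_prompts(["Define AI.", "Define ML.", "Define NLP."])
print(body)
```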
Conclusion
With LiteLLM, you can:
✅ Track costs in real-time across providers.
✅ Log spending per team/project.
✅ Set budget limits and alerts.
✅ Optimize model usage for cost efficiency.
Start implementing today, and never get blindsided by an unexpected AI bill again!
What's your biggest cost challenge with LLMs? Let's discuss in the comments! 🚀