Token Counting for OpenAI API Requests
Problem Statement
OpenAI models have fixed context windows (e.g., 16,385 tokens for current GPT-3.5 Turbo models; see the limits listed below). The prompt and the completion must fit in this window together, so the max_tokens parameter, which caps the completion length, has to leave room for the prompt. The challenge is counting prompt tokens accurately before sending an API request, so you can set max_tokens correctly and avoid truncation or context-length errors.
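The arithmetic itself is simple: the completion budget is whatever the context window leaves after the prompt. A minimal sketch (the prompt size here is made up for illustration; 16,385 is the GPT-3.5 Turbo window from the tip below):

# Sketch: the completion budget is the context window minus the prompt tokens.
context_window = 16_385
prompt_tokens = 1_200  # hypothetical prompt size
max_tokens = context_window - prompt_tokens
assert prompt_tokens + max_tokens <= context_window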
Recommended Solution
Use OpenAI's official tiktoken Python library to count tokens. For Chat Completions API messages, use OpenAI's helper function (shown below), which also accounts for the formatting tokens each message adds.
Installation
pip install --upgrade tiktoken openai
Basic Token Counting
For simple string prompts:
import tiktoken

def count_tokens(text: str, model: str) -> int:
    """Return the number of tokens `text` uses with the given model's encoding."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))
# Example usage
text = "Hello world, let's test tiktoken."
model = "gpt-3.5-turbo"
print(count_tokens(text, model)) # Output: 9
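encoding_for_model only recognizes models that existed when your installed tiktoken release was published; for newer names it raises KeyError. A hedged fallback sketch (get_encoding_for is a hypothetical helper; the choice of fallback encodings is an assumption, though gpt-4o-family models do use o200k_base and earlier chat models cl100k_base):

import tiktoken

def get_encoding_for(model: str) -> tiktoken.Encoding:
    # encoding_for_model() raises KeyError if the installed tiktoken
    # release does not yet know this model name.
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        # Assumption: o200k_base for gpt-4o-family models, cl100k_base otherwise.
        fallback = "o200k_base" if model.startswith("gpt-4o") else "cl100k_base"
        return tiktoken.get_encoding(fallback)

print(get_encoding_for("gpt-3.5-turbo").name)  # cl100k_base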
Token Counting for Chat Messages
For Chat Completions API messages (with roles/metadata):
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of prompt tokens used by a list of chat messages."""
    # Map model families to their per-message and per-name token overhead
    model_data = {
        "gpt-3.5-turbo": {"token_count": 3, "name_tokens": 1},
        "gpt-4": {"token_count": 3, "name_tokens": 1},
        "gpt-3.5-turbo-0301": {"token_count": 4, "name_tokens": -1},
    }
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    # Handle model variants; check the dated -0301 snapshot before the generic prefix
    if "gpt-3.5-turbo-0301" in model:
        model_to_use = "gpt-3.5-turbo-0301"
    elif "gpt-3.5-turbo" in model:
        model_to_use = "gpt-3.5-turbo"
    elif "gpt-4" in model:
        model_to_use = "gpt-4"
    else:
        raise ValueError(f"Unsupported model: {model}")
    tokens_per_message = model_data[model_to_use]["token_count"]
    tokens_per_name = model_data[model_to_use]["name_tokens"]
    total_tokens = 0
    for message in messages:
        total_tokens += tokens_per_message
        for key, value in message.items():
            total_tokens += len(encoding.encode(value))
            if key == "name":
                total_tokens += tokens_per_name
    total_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return total_tokens
# Example usage
messages = [
    {"role": "system", "content": "Translate corporate jargon to plain English."},
    {"role": "user", "content": "Optimize cross-platform engagement channels."},
]
print(num_tokens_from_messages(messages, "gpt-3.5-turbo")) # Output: 24
Setting max_tokens
def calculate_max_tokens(messages, model):
    """Return the largest max_tokens value that still fits the model's context window."""
    # Context window sizes (see the tip below; check OpenAI's model docs for current values)
    context_lengths = {
        "gpt-3.5-turbo": 16385,
        "gpt-4": 8192,
        "gpt-4o": 128000,
    }
    prompt_tokens = num_tokens_from_messages(messages, model)
    return context_lengths[model] - prompt_tokens
# Example of setting max_tokens in API request
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=calculate_max_tokens(messages, "gpt-3.5-turbo"),
)
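Passing the entire remaining window as max_tokens leaves no headroom, and some models also cap completion length separately from their context window. A hedged variant that keeps a safety buffer, reusing client and the functions defined above (calculate_max_tokens_with_buffer is a hypothetical helper; the 10% buffer and the 4,096-token output cap are assumptions you should adjust to your model's documented limits):

def calculate_max_tokens_with_buffer(messages, model, buffer_ratio=0.1, output_cap=4096):
    # Assumption: keep ~10% of the remaining window free and never request
    # more than `output_cap` completion tokens.
    remaining = calculate_max_tokens(messages, model)
    budget = int(remaining * (1 - buffer_ratio))
    return max(1, min(budget, output_cap))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=calculate_max_tokens_with_buffer(messages, "gpt-3.5-turbo"),
)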
Alternative Libraries
For non-Python environments:
- JavaScript: @dqbd/tiktoken
- C# / .NET: SharpToken
- Java: jtokkit
- PHP: GPT-3-Encoder-PHP
Important Considerations
WARNING
Token counting accuracy depends on the exact model version. Always verify against the API's reported usage when possible.
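One way to verify is to compare the local estimate with the prompt_tokens value in the usage object returned by the official openai Python SDK, reusing messages, client, and num_tokens_from_messages from above. A quick sketch:

local_count = num_tokens_from_messages(messages, "gpt-3.5-turbo")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=50,
)
api_count = response.usage.prompt_tokens
if local_count != api_count:
    print(f"Local estimate {local_count} differs from API-reported {api_count}")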
TIP
Model context window sizes change between versions. Current limits (as of June 2024):
- GPT-4o: 128,000 tokens
- GPT-4 Turbo: 128,000 tokens
- GPT-3.5 Turbo: 16,385 tokens
Best Practices
- Always count tokens server-side before API requests
- Keep a 10-20% buffer below the context limit (as in the buffered example above)
- Test token counting with known examples
- Monitor API usage statistics for validation
Using these token counting methods lets you make full use of each model's context window while avoiding truncation and context-length errors.