Token Counting for OpenAI API Requests

Problem Statement

OpenAI models have fixed context windows (e.g., 16,385 tokens for GPT-3.5 Turbo, 8,192 for GPT-4). The prompt and the completion share this window: with a 16,385-token window and a 15,000-token prompt, the response can be at most 1,385 tokens. When generating text, you need to set the max_tokens parameter so that the combined prompt and response do not exceed the limit. The challenge is accurately counting prompt tokens before sending API requests, so you can calculate a safe max_tokens value and avoid truncation or errors.

Solution

Use OpenAI's official tiktoken Python library to count tokens. For Chat Completions API messages, use a helper based on OpenAI's cookbook function (shown below), which also accounts for the extra tokens added by message formatting.

Installation

bash
pip install --upgrade tiktoken openai

Basic Token Counting

For simple string prompts:

python
import tiktoken

def count_tokens(text: str, model: str) -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Example usage
text = "Hello world, let's test tiktoken."
model = "gpt-3.5-turbo"
print(count_tokens(text, model))  # Output: 9
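
tiktoken.encoding_for_model raises a KeyError for model names it does not recognize (for example, a model released after your installed tiktoken version). A common pattern, reused in the chat-message helper below, is to fall back to a general-purpose encoding; cl100k_base is assumed here as a default for GPT-3.5/GPT-4-era models, while GPT-4o-family models use o200k_base. count_tokens_safe is a hypothetical variant of the function above:

python
import tiktoken

def count_tokens_safe(text: str, model: str) -> int:
    # Fall back to cl100k_base when the model name is unknown to tiktoken
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

print(count_tokens_safe("Hello world", "my-future-model"))  # uses the fallback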

Token Counting for Chat Messages

For Chat Completions API messages (with roles/metadata):

python
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Estimate prompt tokens for Chat Completions messages (adapted from the OpenAI cookbook)."""
    # Per-message and per-name token overhead by model
    model_data = {
        "gpt-3.5-turbo": {"tokens_per_message": 3, "tokens_per_name": 1},
        "gpt-4": {"tokens_per_message": 3, "tokens_per_name": 1},
        "gpt-3.5-turbo-0301": {"tokens_per_message": 4, "tokens_per_name": -1}
    }

    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")

    # Resolve model variants: exact matches first, then model families
    if model in model_data:
        model_to_use = model
    elif "gpt-3.5-turbo" in model:
        model_to_use = "gpt-3.5-turbo"
    elif "gpt-4" in model:
        model_to_use = "gpt-4"
    else:
        raise ValueError(f"Unsupported model: {model}")

    tokens_per_message = model_data[model_to_use]["tokens_per_message"]
    tokens_per_name = model_data[model_to_use]["tokens_per_name"]
    total_tokens = 0

    for message in messages:
        total_tokens += tokens_per_message
        for key, value in message.items():
            total_tokens += len(encoding.encode(value))
            if key == "name":
                total_tokens += tokens_per_name

    total_tokens += 3  # Every reply is primed with <|start|>assistant<|message|>
    return total_tokens

# Example usage
messages = [
    {"role": "system", "content": "Translate corporate jargon to plain English."},
    {"role": "user", "content": "Optimize cross-platform engagement channels."}
]
print(num_tokens_from_messages(messages, "gpt-3.5-turbo"))  # Output: 24
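
The tokens_per_name adjustment only matters when a message carries the optional name field (used to distinguish participants). Its effect can be checked directly, using a hypothetical named message:

python
named = [{"role": "user", "name": "alice", "content": "Hi there"}]
unnamed = [{"role": "user", "content": "Hi there"}]

# Difference = tokens for the encoded name plus the per-name adjustment
print(num_tokens_from_messages(named) - num_tokens_from_messages(unnamed))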

Setting max_tokens

python
def calculate_max_tokens(messages, model):
    # Context window sizes (June 2024); verify against current model documentation
    context_lengths = {
        "gpt-3.5-turbo": 16385,
        "gpt-4": 8192,
        "gpt-4o": 128000
    }
    prompt_tokens = num_tokens_from_messages(messages, model)
    return context_lengths[model] - prompt_tokens

# Example of setting max_tokens in an API request
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    # gpt-3.5-turbo also caps completion tokens at 4,096, so clamp the request
    max_tokens=min(4096, calculate_max_tokens(messages, "gpt-3.5-turbo"))
)
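
The clamp above can be generalized into a small wrapper that also guards against prompts that already fill the window. The per-model completion caps below are assumptions to verify against OpenAI's current model documentation, as they have changed between model snapshots:

python
# Hypothetical per-model completion-token caps; confirm against current OpenAI docs
COMPLETION_CAPS = {"gpt-3.5-turbo": 4096, "gpt-4": 8192, "gpt-4o": 4096}

def safe_max_tokens(messages, model):
    remaining = calculate_max_tokens(messages, model)
    if remaining <= 0:
        raise ValueError(f"Prompt already exceeds the context window for {model}")
    # Never request more completion tokens than the model will accept
    return min(remaining, COMPLETION_CAPS.get(model, remaining))

# Usage: max_tokens=safe_max_tokens(messages, "gpt-3.5-turbo")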

Alternative Libraries

For non-Python environments:

  • JavaScript: @dqbd/tiktoken
  • C#/.NET: SharpToken
  • Java: jtokkit
  • PHP: GPT-3-Encoder-PHP

Important Considerations

WARNING

Token counting accuracy depends on the exact model version. Always verify against the API's reported usage when possible.
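
For example, the usage object returned by the API can be compared with the local estimate (reusing the client and messages from the earlier examples; the max_tokens value here is arbitrary):

python
# Compare the local estimate with the prompt tokens the API actually counted
local_estimate = num_tokens_from_messages(messages, "gpt-3.5-turbo")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=50
)
print(local_estimate, response.usage.prompt_tokens)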

TIP

Model context sizes change between versions. Current defaults (June 2024):

  • GPT-4o: 128,000 tokens
  • GPT-4 Turbo: 128,000 tokens
  • GPT-3.5 Turbo: 16,385 tokens

Best Practices

  1. Always count tokens server-side before sending API requests
  2. Maintain a 10-20% buffer within the context limit (see the sketch after this list)
  3. Test token counting with known examples
  4. Monitor API usage statistics for validation
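
One way to apply the buffer from item 2, building on calculate_max_tokens above (the 15% default is an arbitrary illustration, not an OpenAI recommendation):

python
def buffered_max_tokens(messages, model, buffer_ratio=0.15):
    # Reserve a fraction of the remaining window as a safety margin
    # against small differences between the local count and the server's count
    remaining = calculate_max_tokens(messages, model)
    return int(remaining * (1 - buffer_ratio))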

Using these token counting methods ensures optimal use of OpenAI models while avoiding truncation or overage errors.