icon: LiNotebookTabs
Title: Tokens
✦ For every Large Language Model (LLM), "tokens" play a crucial role. They are the smallest units of text that the model can understand and manipulate.
GPT-4 likely uses a different tokenization process compared to Gemini from Google. A helpful rule of thumb is that one token generally corresponds to ~4 characters of common English text. This translates to roughly ¾ of a word (so 100 tokens ≈ 75 words). Note that this is only useful for a rough estimation; each model provider offers tools that count tokens more accurately.
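To make this rule of thumb concrete, here is a minimal sketch (the function name is ours, purely illustrative) that estimates a token count from character length alone:

```python
# A minimal sketch of the ~4-characters-per-token rule of thumb.
# This is only a heuristic; use the provider's tokenizer tools for exact counts.
def rough_token_estimate(text):
    return max(1, len(text) // 4)

text = "Tokens are the smallest units of text that the model can understand."
print(rough_token_estimate(text))  # 17, for this 68-character sentence
```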
From gpt-4-turbo, gpt-4o, and gemini 1.5 pro onwards, many newer models can process a significantly larger number of tokens, making strict token counting less critical.

MODEL | Description (by the respective companies) | CONTEXT WINDOW |
---|---|---|
OpenAI models overview | ||
gpt-4-0125-preview | The latest GPT-4 model. Returns a maximum of 4,096 output tokens. | 128,000 tokens |
gpt-3.5-turbo-0125 | The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats. Returns a maximum of 4,096 output tokens. | 16,385 tokens |
Claude models overview | ||
claude-3-5-sonnet | Highest level of intelligence and capability (among Claude models). Returns a maximum of 8,192 output tokens. | 200,000 tokens |
claude-3-opus | Powerful model for highly complex tasks. Top-level performance, intelligence, fluency, and understanding. Returns a maximum of 4,096 output tokens. | 200,000 tokens |
Google Gemini models overview | ||
gemini 1.5 flash | Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. | 1,000,000 tokens |
gemini 1.5 pro | Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide range of reasoning tasks. 1.5 Pro can process large amounts of data at once. | 2,000,000 tokens |
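In practice, the context window caps the prompt and the generated output together. Here is a minimal sketch of that check, assuming the window sizes from the table above (the dictionary and helper are ours, purely illustrative, not an official API):

```python
# Hypothetical helper: check whether a prompt is likely to fit a model's
# context window, leaving room for the model's output tokens.
# The values mirror the CONTEXT WINDOW column in the table above.
CONTEXT_WINDOWS = {
    "gpt-4-0125-preview": 128_000,
    "gpt-3.5-turbo-0125": 16_385,
    "claude-3-5-sonnet": 200_000,
    "claude-3-opus": 200_000,
    "gemini-1.5-flash": 1_000_000,
    "gemini-1.5-pro": 2_000_000,
}

def fits_context(prompt_tokens, model, max_output_tokens=4_096):
    # The prompt and the model's reply must both fit inside the window.
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits_context(120_000, "gpt-4-0125-preview"))  # True: 124,096 <= 128,000
print(fits_context(15_000, "gpt-3.5-turbo-0125"))   # False: 19,096 > 16,385
```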
You can use the tool below to understand how a piece of text might be tokenized by a language model, and the total count of tokens in that piece of text.
Note: This widget currently does not support gpt-4o and gpt-4o-mini.
✦ For many of the LLMs, the pricing is based on the number of tokens processed. One prominent example is OpenAI.
✦ Below is the pricing table for OpenAI's GPT models, for reference:
Model | Pricing for Input Tokens | Pricing for Output Tokens |
---|---|---|
gpt-4o | $5.00 / 1M input tokens | $15.00 / 1M output tokens |
gpt-4o-mini | $0.150 / 1M input tokens | $0.600 / 1M output tokens |
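To see how these rates translate into the cost of a single call, here is an illustrative sketch (the dictionary and helper are ours; the prices are the ones tabled above and may change):

```python
# Illustrative cost estimate from token counts, using the per-million-token
# prices in the table above. Check the official pricing page for current rates.
PRICES_PER_1M = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.150, "output": 0.600},
}

def estimate_cost(model, input_tokens, output_tokens):
    price = PRICES_PER_1M[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply on gpt-4o-mini:
print(f"${estimate_cost('gpt-4o-mini', 2_000, 500):.6f}")  # $0.000600
```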
gpt-4o-mini and gpt-4o-mini-2024-07-18

The name gpt-4o-mini serves as a generic reference to the latest model in this class, while gpt-4o-mini-2024-07-18 is the fully qualified name of the specific version released on July 18, 2024. This naming convention helps distinguish between different versions and updates of the model, ensuring clarity and precision when referring to a particular release.
For the training content in this Bootcamp, we can safely use the generic name gpt-4o-mini in our notebooks, as it points to the latest model. For more info, visit Models - OpenAI API.
The prices are accurate at the time of writing.
Official Pricing Page: https://openai.com/pricing
We can use the code below to estimate the token count of the prompt that we will send to the LLM.
```python
# This simplified function calculates the number of tokens in the given "text"
# ⚠️ This is a simplified implementation that should only be used for a rough estimation
import tiktoken

def count_tokens(text):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    return len(encoding.encode(text))
```
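For example, calling this function on a short prompt:

```python
prompt = "Summarize the following article in three bullet points."
print(count_tokens(prompt))  # roughly 10 tokens; the exact value depends on the encoding
```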
✦ While the above code is sufficient for approximating token counts, if you need more accurate counts for your prompts, please refer to the code below:
We recommend using this function for calculating tokens in actual projects.

Don't worry about understanding this function line by line; it's a utility tool. The key step is encoding.encode(value) in the last few lines of the code.

```python
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    tokens_per_message = 3
    tokens_per_name = 1
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

# For more details, see https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
```
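To illustrate how the function is called, here is a small example with a typical chat payload (the message contents are just placeholders):

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many tokens does this conversation use?"},
]
print(num_tokens_from_messages(messages, model="gpt-3.5-turbo"))
```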
tokens_per_message = 3 and tokens_per_name = 1

The variable tokens_per_message is set to 3 for certain models (including "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-4-0314", "gpt-4-32k-0314", "gpt-4-0613", "gpt-4-32k-0613") because each message in these models is encoded with three special tokens: start, role, and end.
Here’s a breakdown:
The variable tokens_per_name is set to 1 because when a name is present in the message, it adds one extra overhead token on top of the encoded name itself.
For tokens_per_name, note that name is an optional field in the message dictionary that represents the name of the sender. If a name is provided, its encoded tokens are included in the count, plus the one extra overhead token described above.
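To see the name overhead in action, assuming the num_tokens_from_messages function above, we can compare the same message with and without a name field:

```python
with_name = [{"role": "user", "name": "alice", "content": "Hi"}]
no_name = [{"role": "user", "content": "Hi"}]

# The difference equals the encoded length of "alice" plus the 1-token name overhead.
print(num_tokens_from_messages(with_name) - num_tokens_from_messages(no_name))
```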