icon: LiNotebookTabs
Title: Key Parameters for LLMs
Helper Function
In the notebook, we only pass in three arguments to the `create()` method.

```python
# This function sends the input (i.e., the prompt) to the LLM and returns the LLM's output
def get_completion(prompt, model="gpt-4o-mini"):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # this is the degree of randomness of the model's output
    )
    return response.choices[0].message.content
```
To learn more about the `client.chat.completions.create()` method, visit the official API reference here.
✦ In the context of Large Language Models (LLMs) like GPT-3.5 or GPT-4o, “temperature” refers to a parameter that controls the randomness of the model’s predictions.
✦ Technically, it adjusts the probability distribution over the next token to be generated, influencing the diversity of the generated text.
Softmax Function

✦ The softmax function is often used in machine learning models to convert raw scores (also known as logits) into probabilities.

✦ The table below shows candidate words for completing the prompt "Singapore has a lot of beautiful ...".
✦ Softmax with temperature divides each logit by a temperature value T before applying the softmax function.

✦ With a low temperature, the resulting probabilities become more extreme (the top candidate dominates); with a high temperature, the differences between the probabilities are less extreme.

✦ See the following for the illustration of the concept.

✦ By adjusting the temperature, we can control the trade-off between diversity and confidence in the model’s predictions.

Word | Logits | Softmax | Softmax with LOW temperature | Softmax with HIGH temperature |
---|---|---|---|---|
sceneries | 20 | 0.881 | 1.000 | 0.8808 |
buildings | 18 | 0.119 | 0.000 | 0.1192 |
people | 5 | 0.000 | 0.000 | 0.000 |
gardens | 2 | 0.000 | 0.000 | 0.000 |
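To make the numbers in the table concrete, here is a minimal sketch of softmax with temperature in plain Python; the temperature values 0.2 and 5.0 are illustrative choices, not necessarily the exact values behind the LOW/HIGH columns:

```python
import math

def softmax(logits, temperature=1.0):
    # Divide each logit by the temperature before exponentiating
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Logits for "Singapore has a lot of beautiful ..."
logits = [20, 18, 5, 2]  # sceneries, buildings, people, gardens

print([round(p, 4) for p in softmax(logits)])       # [0.8808, 0.1192, 0.0, 0.0]
print([round(p, 4) for p in softmax(logits, 0.2)])  # low T: [1.0, 0.0, 0.0, 0.0]
print([round(p, 4) for p in softmax(logits, 5.0)])  # high T: flatter distribution
```

A lower temperature sharpens the distribution toward the top candidate, while a higher one flattens it, matching the pattern in the table above.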
💡 You don't have to worry about understanding the equation or memorizing it.
It's more for us to understand the intuition of where the temperature is being used.
Softmax:

$$P(w_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$$

Softmax with Temperature:

$$P(w_i) = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}$$
The live calculation to show the intuition of the temperature is included in this week's Notebook. Try it out!
Top-K Sampling Strategy

✦ With Top-K sampling, the model samples the next token from only the K most likely candidates, rather than from the full vocabulary.

The live calculation to show the intuition of the Top-K process is included in this week's Notebook. Try it out!
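As a rough sketch of the idea (illustrative, not the notebook's code), Top-K sampling keeps only the K highest-scoring tokens, renormalizes their probabilities with a softmax, and samples from that reduced set:

```python
import math
import random

def top_k_sample(logits, k, temperature=1.0):
    # Keep only the k tokens with the highest logits
    top = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)[:k]
    # Renormalize the surviving logits with a temperature-scaled softmax
    scaled = [z / temperature for _, z in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in scaled]
    # Sample one token index according to the renormalized probabilities
    return random.choices([i for i, _ in top], weights=weights)[0]

words = ["sceneries", "buildings", "people", "gardens"]
logits = [20, 18, 5, 2]
# With k=2, only "sceneries" or "buildings" can ever be picked
print(words[top_k_sample(logits, k=2)])
```

Tokens outside the top K get zero probability no matter how the temperature is set, which is how Top-K cuts off the long tail of unlikely words.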
✦ Typically, either Top-K or Top-P is used, but not both at the same time. They are different strategies for controlling the trade-off between diversity and confidence in the model’s predictions.

max_tokens

✦ `max_tokens` sets an upper limit on the number of tokens the model can generate in a single response.

n

✦ `n` specifies how many completions to generate for the same prompt.
We can update the helper function that we use to call LLMs to expose these parameters, like the one below:

```python
!pip install tiktoken
!pip install openai

# This is the "updated" helper function for calling the LLM,
# exposing the parameters that we have discussed
def get_completion(prompt, model="gpt-3.5-turbo", temperature=0, top_p=1.0, max_tokens=1024, n=1):
    messages = [{"role": "user", "content": prompt}]
    response = openai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=n,  # pass the argument through instead of hard-coding n=1
    )
    return response.choices[0].message.content
```
On OpenAI's API reference, it is stated that "we generally recommend altering `temperature` or `top_p` but not both."
We suggest sticking with the official recommendation from OpenAI and only changing the `temperature` as the primary way to adjust the "creativity" of the LLM output.
For those who want to explore or experiment further with both parameters, the table below contains various combinations of the two and a description of the scenarios they can potentially be useful for. We caveat that this is not officially recommended by OpenAI and should be used with caution.
Use Case | Temperature | Top_p | Description |
---|---|---|---|
Code Generation | 0.2 | 0.1 | Generates code that adheres to established patterns and conventions. Output is more deterministic and focused. Useful for generating syntactically correct code. |
Creative Writing | 0.7 | 0.8 | Generates creative and diverse text for storytelling. Output is more exploratory and less constrained by patterns. |
Chatbot Responses | 0.5 | 0.5 | Generates conversational responses that balance coherence and diversity. Output is more natural and engaging. |
Code Comment Generation | 0.3 | 0.2 | Generates code comments that are more likely to be concise and relevant. Output is more deterministic and adheres to conventions. |
Data Analysis Scripting | 0.2 | 0.1 | Generates data analysis scripts that are more likely to be correct and efficient. Output is more deterministic and focused. |
Exploratory Code Writing | 0.6 | 0.7 | Generates code that explores alternative solutions and creative approaches. Output is less constrained by established patterns. |
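For intuition on what `top_p` does mechanically, here is a minimal sketch of nucleus (Top-P) sampling in plain Python; this illustrates the general technique, not OpenAI's internal implementation:

```python
import math
import random

def top_p_sample(logits, p=0.9):
    # Convert logits to probabilities with a numerically stable softmax
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank token indices by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep the smallest set of tokens whose cumulative probability reaches p
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Sample from the nucleus, weighted by the original probabilities
    return random.choices(nucleus, weights=[probs[i] for i in nucleus])[0]

words = ["sceneries", "buildings", "people", "gardens"]
logits = [20, 18, 5, 2]
# With p=0.9, the nucleus is {"sceneries", "buildings"} (0.881 + 0.119 >= 0.9)
print(words[top_p_sample(logits, p=0.9)])
```

A smaller `p` shrinks the nucleus toward the single most likely token, much as a lower temperature does, which is why the two parameters overlap and are not usually tuned together.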