In-Context Prompt Learning Evaluation
Hrithik Datta, Founding Software Engineer
July 15, 2024
In-context learning is a fast and easy way to start improving prompt results. Let's learn how to use in-context learning to quickly improve a simple LLM response for a common business use case: customer service queries on an e-commerce platform. The goal is to generate responses that are not only concise and accurate but also appropriate in tone and context.
What is In-Context Learning?
In-Context Learning is a fancy way of saying "provide examples." It is useful to think of LLMs as brilliant but inexperienced assistants. Without experience, anyone's response to a question (much less an LLM's) will be unpredictable. By adding examples with explanations, the LLM gains experience and "learns" how you want it to behave. But you can't provide an example for every possible user input. This is where scenarios and evaluation come in. By establishing a metric baseline for the model's initial behavior, you can measurably add "just enough" in-context learning to improve efficiently. Let's improve a simple prompt for WebBizz to see how this works.
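To make that concrete, here is a minimal, hypothetical sketch of the difference (the WebBizz shipping example below is invented purely for illustration):
# Zero-shot: the model gets only an instruction, so its style and
# level of detail are unpredictable.
zero_shot_prompt = "Answer the question based on the context."

# In-context (few-shot): the same instruction plus a worked example,
# so the model can imitate the desired behavior. The example content
# here is hypothetical.
few_shot_prompt = """Answer the question based on the context.

Example:
Question: Does WebBizz ship internationally?
Context: WebBizz ships to the US and Canada. WebBizz was founded in 2015.
Answer: WebBizz currently ships to the US and Canada.
Note how the founding date is omitted because it is not relevant.
"""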
Setting up the model and prompt
With Okareo, the evaluation process begins with setting up a scenario: the set of inputs that will go to the language model. The LLM then generates a response for each input, which Okareo analyzes.
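For orientation, creating a scenario with Okareo's Python SDK looks roughly like the sketch below; the scenario name and seed data are illustrative, and the exact setup lives in the linked notebook.
from okareo import Okareo
from okareo_api_client.models import ScenarioSetCreate, SeedData

okareo = Okareo("<OKAREO_API_KEY>")  # your Okareo API key

# Each SeedData row is one scenario: the input sent to the model and
# the expected result the evaluation can compare against.
scenario = okareo.create_scenario_set(
    ScenarioSetCreate(
        name="WebBizz Customer Service Questions",  # illustrative name
        seed_data=[
            SeedData(
                input_="What are some ways WebBizz uses technology to improve customer experience?",
                result="A chatbot, a loyalty program, shareable wishlists, and personalized recommendations.",
            ),
        ],
    )
)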
For those who want to follow along or see the complete code, you can find the full notebook here.
Using a simple customer service example, we first set up the model with the bare-bones prompt "Answer the question". The results generated by the model are then evaluated by Okareo, as shown below.
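As a sketch, the baseline registration mirrors the improved version shown later, with the bare instruction as the entire system prompt (USER_PROMPT_TEMPLATE is defined in the notebook):
from okareo.model_under_test import OpenAIModel

# Baseline: "Answer the question" is the whole system prompt.
baseline_model = okareo.register_model(
    name="OpenAI Answering Model",
    model=OpenAIModel(
        model_id="gpt-3.5-turbo",
        temperature=0,
        system_prompt_template="Answer the question",
        user_prompt_template=USER_PROMPT_TEMPLATE,
    ),
    update=True,
)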
This prompt and model got a score of 4.04 on conciseness and 4.26 on relevance. Let's see how we can improve this.
Improving the prompt
We can improve the prompt by using in-context learning to teach the model the right way to answer a question, providing examples of good and bad responses. For this example, we will add a single example of a good response to the prompt. See below for the code that updates the prompt.
# Define a template to prompt the model to provide an answer
# based on the context
ANSWER_GIVEN_CONTEXT_TEMPLATE = """
You will be given a question and a context. You should
provide an answer to the question based on the context.

Here is an example of how to answer the question based on the context:

Question: What are some ways WebBizz uses technology to improve customer experience?

Context: WebBizz has implemented a chatbot on their website to provide instant support to customers. They have also introduced a loyalty program that rewards customers for repeat purchases. WebBizz allows for the creation of wishlists, which can be shared with friends and family. They also offer personalized recommendations based on past purchases. WebBizz uses technology in the warehouse to optimize inventory management and ensure timely delivery of orders.

Answer: WebBizz uses technology to improve customer experience by implementing a chatbot on their website, introducing a loyalty program, allowing for the creation of wishlists, and offering personalized recommendations.

This is an ideal example of how you should answer the question based on the context provided. Note how the non-relevant information is omitted, and the focus is on the key points related to the question.
"""
# Register the model to use in a test run
# (USER_PROMPT_TEMPLATE is defined earlier in the notebook)
model_under_test = okareo.register_model(
    name="OpenAI Answering Model",
    model=OpenAIModel(
        model_id="gpt-3.5-turbo",
        temperature=0,
        system_prompt_template=ANSWER_GIVEN_CONTEXT_TEMPLATE,
        user_prompt_template=USER_PROMPT_TEMPLATE,
    ),
    update=True,
)
Now, we can run the evaluation again to see if the changes have improved the model's performance.
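In SDK terms, re-running the evaluation is a single run_test call. Here is a sketch, assuming Okareo's natural-language generation run type; the evaluation name and check names are placeholders for whatever the notebook actually registers:
from okareo_api_client.models.test_run_type import TestRunType

# Score the registered model against the scenario; the check names
# below are illustrative placeholders.
evaluation = model_under_test.run_test(
    name="In-Context Learning Evaluation",
    scenario=scenario,
    api_key="<OPENAI_API_KEY>",  # key for the model provider
    test_run_type=TestRunType.NL_GENERATION,
    checks=["conciseness", "relevance"],
)
print(evaluation.app_link)  # link to the full results in the Okareo app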
As we can see from the second evaluation, our in-context learning approach has improved both the conciseness and relevance scores. The answer now directly addresses the customer's specific question without any extraneous information.
Conclusion
This improvement in response quality can have significant real-world impacts:
Increased customer satisfaction due to clear, direct answers
Reduced time spent by customers reading responses
Potential reduction in follow-up queries, easing the load on customer service
By iteratively using Okareo to evaluate and refine our prompts, we can continuously improve our LLM's performance for specific tasks. This process of evaluation, refinement, and re-evaluation becomes a powerful tool in developing more effective AI-driven solutions.
Remember, the key to effective LLM optimization is continuous evaluation and refinement. Regular use of evaluation tools, combined with thoughtful prompt engineering, can lead to significant improvements in your generated responses.
Okareo automates the process of LLM evaluation
Start evaluating LLM applications in CI with Okareo and ensure that your LLM-powered functionality works as expected while saving countless hours of manual testing. Okareo is free to use for smaller projects (up to 5k model data points, 1k evaluated rows, and 50k scenario tokens).
You can get started with Okareo immediately for free. The example evaluation in this tutorial is a good starting point, as is our cookbook project, which contains many more working examples.