In-Context Prompt Learning
Evaluation

Hrithik Datta, Founding Software Engineer
July 15, 2024
In-context learning is a fast and easy way to start improving prompt results. Let's learn how to use in-context learning to quickly improve a simple LLM response for a common business use case: customer service queries on an e-commerce platform. The goal is to generate responses that are not only concise and accurate but also appropriate in tone and context.
What is In-Context Learning
In-Context Learning is a fancy way of saying "provide examples." It is useful to think of LLMs as brilliant but inexperienced assistants. Without experience, anyone's response to a question (let alone an LLM's) will be unpredictable. By adding examples with explanations, the LLM gains experience and "learns" how you want it to behave. But you can't provide an example for every user input. This is where scenarios and evaluation come in. By creating a metric baseline for the model's initial behavior, you can measurably add "just enough" in-context learning to improve efficiently. Let's improve a simple prompt for WebBizz to see how this works.
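As a minimal sketch of what "provide examples" looks like in practice, the snippet below assembles a prompt from a base instruction plus worked examples with explanations. The `Example` class and `build_prompt` function are hypothetical names used only for illustration; they are not part of Okareo or any LLM SDK.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example to show the model (hypothetical structure)."""
    question: str
    answer: str
    explanation: str = ""


def build_prompt(instruction: str, examples: list[Example]) -> str:
    """Append each example, with its explanation if present, to the instruction."""
    parts = [instruction.strip()]
    for i, ex in enumerate(examples, start=1):
        block = f"Example {i}:\nQuestion: {ex.question}\nAnswer: {ex.answer}"
        if ex.explanation:
            block += f"\nWhy this is a good answer: {ex.explanation}"
        parts.append(block)
    return "\n\n".join(parts)


# Illustrative usage with a single example.
prompt = build_prompt(
    "Answer the question.",
    [
        Example(
            question="Can I share my wishlist?",
            answer="Yes, wishlists can be shared with friends and family.",
            explanation="It answers directly and omits unrelated details.",
        )
    ],
)
print(prompt)
```

Keeping examples as data rather than hand-editing one long string makes it easier to add or remove a single example and re-run the evaluation.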
Setting up the model and prompt
With Okareo, evaluation begins by setting up a scenario. Scenarios are the inputs that will go to the language model. The LLM then generates a response for each input, which Okareo analyzes.
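The flow described here (scenario inputs go to the model, and each response is then scored) can be sketched in plain Python. This is only a conceptual illustration, not the Okareo API: `run_evaluation`, the `model` callable, and the `metrics` mapping are hypothetical stand-ins for what the platform does for you.

```python
from statistics import mean
from typing import Callable


def run_evaluation(
    scenarios: list[str],
    model: Callable[[str], str],
    metrics: dict[str, Callable[[str, str], float]],
) -> dict[str, float]:
    """Send every scenario input to the model and average each metric's score."""
    if not scenarios:
        raise ValueError("at least one scenario input is required")
    scores: dict[str, list[float]] = {name: [] for name in metrics}
    for scenario_input in scenarios:
        response = model(scenario_input)
        for name, metric in metrics.items():
            # Each metric sees both the input and the generated response.
            scores[name].append(metric(scenario_input, response))
    return {name: mean(values) for name, values in scores.items()}
```

In a real run, `model` would call the LLM with your prompt and each metric would be an evaluator such as conciseness or relevance.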
For those who want to follow along or see the complete code, you can find the full notebook here.
Using a simple customer service example, we first set up the model with our simple prompt, "Answer the question". The results generated by the model are then evaluated by Okareo as shown below.

This prompt and model got a score of 4.04 on conciseness and 4.26 on relevance. Let's see how we can improve this.
Improving the prompt
We can improve the prompt by using in-context learning to teach the model the right way to answer a question, typically by providing examples of good and bad responses. For this example, we add a single example of a good response to the prompt. See below for the code that updates the prompt.
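# OpenAIModel comes from the Okareo Python SDK. The `okareo` client and
# USER_PROMPT_TEMPLATE are assumed to be defined earlier in the notebook.
from okareo.model_under_test import OpenAIModel

# Define a template to prompt the model to provide an answer
# based on the context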
ANSWER_GIVEN_CONTEXT_TEMPLATE = """ 
  You will be given a question and a context. You should 
  provide an answer to the question based on the context. 
  Here is an example on how to answer the question based 
  on the context: Question: What are some ways WebBizz 
  uses technology to improve customer experience? Context: 
  WebBizz has implemented a chatbot on their website to 
  provide instant support to customers. They have also 
  introduced a loyalty program that rewards customers for 
  repeat purchases. WebBizz allows for the creation of 
  wishlists, which can be shared with friends and family. 
  They also offer personalized recommendations based on 
  past purchases. WebBizz uses technology in the warehouse 
  to optimize inventory management and ensure timely 
  delivery of orders. Answer: WebBizz uses technology to 
  improve customer experience by implementing a chatbot on 
  their website, introducing a loyalty program, allowing 
  for the creation of wishlists, and offering personalized 
  recommendations. This is an ideal example of how you 
  should answer the question based on the context provided. 
  Note how the non-relevant information is omitted, and 
  the focus is on the key points related to the question. 
""" 
# Register the model to use in a test run 
model_under_test = okareo.register_model(
  name="OpenAI Answering Model",
  model=OpenAIModel(
    model_id="gpt-3.5-turbo",
    temperature=0,
    system_prompt_template=ANSWER_GIVEN_CONTEXT_TEMPLATE,
    user_prompt_template=USER_PROMPT_TEMPLATE,
  ),
  update=True
)
Now, we can run the evaluation again to see if the changes have improved the model's performance.

As we can see from the second picture, our in-context learning approach has improved both the conciseness and relevance of the response. The answer now better addresses the customer's specific questions without any extraneous information.
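To make a comparison like this explicit rather than reading it off a chart, a small helper can compute the per-metric change between two runs. This is a sketch under stated assumptions: `score_deltas` and `improved_on_all` are hypothetical helper names, and only the baseline values come from the first evaluation reported above; the second run's scores would come from your own results.

```python
# Baseline scores from the first evaluation in this post.
BASELINE = {"conciseness": 4.04, "relevance": 4.26}


def score_deltas(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Return candidate minus baseline for every metric present in both runs."""
    return {
        metric: round(candidate[metric] - baseline[metric], 2)
        for metric in baseline
        if metric in candidate
    }


def improved_on_all(baseline: dict[str, float], candidate: dict[str, float]) -> bool:
    """True only if at least one metric is shared and every shared metric went up."""
    deltas = score_deltas(baseline, candidate)
    return bool(deltas) and all(delta > 0 for delta in deltas.values())
```

Calling `improved_on_all(BASELINE, new_scores)` with the scores from the second run gives a simple pass/fail check that could also gate a prompt change in CI.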
Conclusion
This improvement in response quality can have significant real-world impacts:
- Increased customer satisfaction due to clear, direct answers 
- Reduced time spent by customers reading responses 
- Potential reduction in follow-up queries, easing the load on customer service 
By iteratively using Okareo to evaluate and refine our prompts, we can continuously improve our LLM's performance for specific tasks. This process of evaluation, refinement, and re-evaluation becomes a powerful tool in developing more effective AI-driven solutions.
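The evaluate, refine, re-evaluate cycle can be sketched as a simple loop that keeps a candidate example only when it raises the score. This is one possible greedy strategy, not Okareo's method: `refine_prompt` is a hypothetical function, and `evaluate` is a placeholder for whatever runs your evaluation and returns a single number.

```python
from typing import Callable


def refine_prompt(
    base_prompt: str,
    candidate_examples: list[str],
    evaluate: Callable[[str], float],
) -> tuple[str, float]:
    """Greedily add examples, keeping each one only if the score improves."""
    best_prompt = base_prompt
    best_score = evaluate(best_prompt)  # baseline before any examples
    for example in candidate_examples:
        trial_prompt = f"{best_prompt}\n\n{example}"
        trial_score = evaluate(trial_prompt)
        if trial_score > best_score:
            # The example helped, so it becomes part of the prompt.
            best_prompt, best_score = trial_prompt, trial_score
    return best_prompt, best_score
```

Because every candidate is measured against the current best, the loop adds "just enough" examples and discards the ones that do not move the metric.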
Remember, the key to effective LLM optimization is continuous evaluation and refinement. Regular use of evaluation tools, combined with thoughtful prompt engineering, can lead to significant improvements in your generated responses.
Okareo automates the process of LLM evaluation
Start evaluating LLM applications in CI with Okareo and ensure that your LLM-powered functionality works as expected while also saving you countless hours of manual testing. Okareo is free to use for smaller projects (of up to 5k model data points, 1k evaluated rows and 50k scenario tokens).
You can get started with Okareo immediately for free. The example evaluation in this tutorial and our cookbook project, which contains many more working examples, will help you get started.