In-Context Prompt Learning

Evaluation

Hrithik Datta
Founding Software Engineer
July 15, 2024
In-context learning is a fast and easy way to start improving prompt results. Let's learn how to use in-context learning to quickly improve a simple LLM response for a common business use case: customer service queries on an e-commerce platform. The goal is to generate responses that are not only concise and accurate but also appropriate in tone and context.
What is In-Context Learning?
In-context learning is a fancy way of saying "provide examples." It is useful to think of LLMs as brilliant but inexperienced assistants. Without experience, anyone's (much less an LLM's) response to a question will be unpredictable. By adding examples with explanations, the LLM gains experience and "learns" how you want it to behave. But you can't provide an example for every user input. This is where scenarios and evaluation come in. By creating a metric baseline for the model's initial behavior, you can measurably add "just enough" in-context learning to improve efficiently. Let's improve a simple prompt for WebBizz to see how this works.
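To make the idea concrete, here is a minimal sketch of the difference between a bare prompt and one with an in-context example. The prompt text, example Q&A, and customer question below are all illustrative assumptions, not WebBizz's actual data or Okareo's API:

```python
# A zero-shot prompt gives the model no guidance beyond the instruction.
ZERO_SHOT_PROMPT = "Answer the question"

# A few-shot prompt adds a worked example so the model can infer the
# desired tone and length. The example below is hypothetical.
FEW_SHOT_PROMPT = """Answer the customer's question.

Example question: Where is my order?
Example answer: You can track your order any time from the Orders page of
your account. If it hasn't shipped within 2 business days, contact support.
"""

def build_messages(system_prompt: str, question: str) -> list[dict]:
    """Assemble a chat-style message list for an LLM call."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

messages = build_messages(FEW_SHOT_PROMPT, "How do I return an item?")
```

The same user question is sent either way; only the system prompt changes, which is what makes in-context learning cheap to iterate on.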
Setting up the model and prompt
With Okareo, the evaluation process begins by setting up a scenario. Scenarios are the inputs that will be sent to the language model. The LLM then generates a response, which Okareo analyzes.
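Conceptually, a scenario is just a set of inputs paired with descriptions of what a good result looks like. The sketch below shows this as plain Python data rather than Okareo SDK calls; the queries and expected results are hypothetical stand-ins:

```python
# Hypothetical seed data for a customer-service scenario. In Okareo this
# would be registered as a scenario set; here it is plain Python data.
scenario_seed = [
    {
        "input": "How do I return an item I bought last week?",
        "result": "Explain the return window and link to the returns portal.",
    },
    {
        "input": "My discount code isn't working at checkout.",
        "result": "Ask for the code and order details, then offer to apply it.",
    },
]

# Each input is sent to the LLM; each result describes what a good answer covers.
inputs = [row["input"] for row in scenario_seed]
```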
For those who want to follow along or see the complete code, you can find the full notebook here.
Using a simple customer service example, we first set up the model with our simple prompt, "Answer the question". Okareo then evaluates the results generated by the model, as shown below.

This prompt and model got a score of 4.04 on conciseness and 4.26 on relevance. Let's see how we can improve this.
Improving the prompt
We can improve the prompt by using in-context learning to teach the model the right way to answer a question. We can do this by providing examples of good and bad responses to the model. For this example, we will be adding an example of a good response to the prompt. See below for the code that updates the prompt.
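A minimal sketch of that update is shown below. The helper function, example text, and added instruction are assumptions for illustration; the full notebook linked above contains the actual Okareo code:

```python
# Hypothetical prompt update: append one worked example of a good response
# to the original system prompt. The example text is illustrative.
BASE_PROMPT = "Answer the question"

GOOD_EXAMPLE = (
    "Example of a good response:\n"
    "Question: Do you ship internationally?\n"
    "Answer: Yes, we ship to over 40 countries. Shipping costs and delivery "
    "times are shown at checkout once you enter your address."
)

def add_example(prompt: str, example: str) -> str:
    """Return a new prompt with the in-context example appended."""
    return f"{prompt}\n\n{example}\n\nKeep answers concise and on-topic."

IMPROVED_PROMPT = add_example(BASE_PROMPT, GOOD_EXAMPLE)
```

Keeping the base prompt unchanged and appending examples makes it easy to compare the two prompt versions in the same evaluation run.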
Now, we can run the evaluation again to see if the changes have improved the model's performance.

As we can see from the second evaluation, our in-context learning approach has improved both the conciseness and relevance of the response. The answer now better addresses the customer's specific questions without any extraneous information.
Conclusion
This improvement in response quality can have significant real-world impacts:
Increased customer satisfaction due to clear, direct answers
Reduced time spent by customers reading responses
Potential reduction in follow-up queries, easing the load on customer service
By iteratively using Okareo to evaluate and refine our prompts, we can continuously improve our LLM's performance for specific tasks. This process of evaluation, refinement, and re-evaluation becomes a powerful tool in developing more effective AI-driven solutions.
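That evaluate-refine-re-evaluate cycle can be sketched as a simple loop. The `evaluate` function below is a stand-in stub with made-up scoring, not the Okareo API; it exists only to show the control flow:

```python
# Hypothetical refinement loop: keep adding in-context examples until the
# evaluation score clears a target threshold.
def evaluate(prompt: str) -> dict:
    """Stand-in for an evaluation run; returns a hypothetical score.

    In this stub, prompts with more in-context examples score higher.
    """
    n_examples = prompt.count("Example")
    return {"conciseness": min(4.0 + 0.3 * n_examples, 5.0)}

prompt = "Answer the question"
while evaluate(prompt)["conciseness"] < 4.5:
    # In practice, each iteration would add a carefully chosen example.
    prompt += "\nExample: ..."
```

In a real workflow, each iteration would re-run the Okareo evaluation and the examples would be chosen based on where the model scored poorly.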
Remember, the key to effective LLM optimization is continuous evaluation and refinement. Regular use of evaluation tools, combined with thoughtful prompt engineering, can lead to significant improvements in your generated responses.
Okareo automates the process of LLM evaluation
Start evaluating LLM applications in CI with Okareo and ensure that your LLM-powered functionality works as expected while also saving you countless hours of manual testing. Okareo is free to use for smaller projects (up to 5k model data points, 1k evaluated rows, and 50k scenario tokens).
You can get started with Okareo immediately for free. The example evaluation in this tutorial will help you get started, as well as our cookbook project which contains many more working examples.



