Technology Radar

promptfoo

Promptfoo is a tool for testing prompts and evaluating LLM applications. It helps you choose effective prompts and models by letting you define benchmarks specific to your needs, speeds up evaluation with caching and concurrency, and scores outputs automatically against metrics you define.

Updates

Assess

We are integrating GenAI features into a website, such as a bot that lets you chat with a virtual assistant. The virtual assistant will be able to answer questions about the company and its products, and help you discover content on the website.

We are using promptfoo to define test cases for the prompts we expect, so that we can (a) be sure we have fitting answers to the questions users may ask, and (b) safely upgrade the model or switch to a more tailored one. In this we are inspired by what Mark Russinovich, CTO of Microsoft Azure, says about practices for benchmarking results in Changelog Podcast episode #594.
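
To make the idea concrete, here is a minimal sketch in plain Python of what such test cases amount to: a fixed set of questions with expected answer fragments, run unchanged against the current model and a candidate replacement. This is not promptfoo's API (promptfoo expresses the same idea declaratively in a configuration file). The names `PromptCase`, `run_suite`, `pass_rate`, the prompt template, and both stand-in models are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "model" is any function from prompt text to answer text (assumption for this sketch).
Model = Callable[[str], str]

# Hypothetical prompt template for the website assistant.
PROMPT_TEMPLATE = "You are the website assistant. Answer briefly.\nQuestion: {question}"


@dataclass
class PromptCase:
    question: str
    must_contain: List[str]  # substrings expected in the answer, case-insensitive


def run_suite(model: Model, cases: List[PromptCase]) -> Dict[str, bool]:
    """Run every case against the model and record pass/fail per question."""
    results: Dict[str, bool] = {}
    for case in cases:
        answer = model(PROMPT_TEMPLATE.format(question=case.question)).lower()
        results[case.question] = all(s.lower() in answer for s in case.must_contain)
    return results


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed; 0.0 for an empty suite."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical stand-ins for real model calls, so the sketch runs offline.
def current_model(prompt: str) -> str:
    if "opening hours" in prompt.lower():
        return "We are open Monday to Friday, 9:00 to 17:00."
    return "Our products include the Example Widget."


def candidate_model(prompt: str) -> str:
    return "I'm not sure."


CASES = [
    PromptCase("What are your opening hours?", ["monday", "friday"]),
    PromptCase("Which products do you offer?", ["example widget"]),
]

if __name__ == "__main__":
    for name, model in [("current", current_model), ("candidate", candidate_model)]:
        print(name, pass_rate(run_suite(model, CASES)))
```

Because the cases are independent of the model, swapping in a new model only means passing a different function to `run_suite`. A drop in the pass rate signals that the change is not yet safe to ship.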