Testing and Improving Prompts with Evalite
A/B test Nudge prompt variants and iteratively improve them using Evalite.
This guide shows how Nudge and Evalite work together to A/B test prompts and drive continuous improvement.
- Nudge defines prompt structure and variants
- Evalite executes those variants on real data and scores behavior
A/B Testing Prompt Variants
Variants in Nudge represent controlled changes on top of a shared base prompt.
import { prompt } from "@nudge-ai/core";

export const summarizerPrompt = prompt("summarizer", (p) =>
  p
    .persona("expert summarizer")
    .input("text to summarize")
    .output("concise summary")
    .variant("short", (v) =>
      v.constraint("limit to 1–2 sentences")
    )
    .variant("detailed", (v) =>
      v
        .do("explain context")
        .do("include examples")
    )
);

Each variant produces a distinct system prompt while keeping intent consistent.
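To make the idea of "controlled changes on top of a shared base" concrete, here is a minimal, self-contained sketch. It does not use Nudge and is not how Nudge is implemented; the `Section` type, the `base` and `variants` tables, and the `render` function are hypothetical names chosen only for illustration.

```typescript
// Hypothetical model of a base prompt plus named variants (not Nudge's internals).
type Section = { kind: string; text: string };

// Sections shared by every variant.
const base: Section[] = [
  { kind: "persona", text: "expert summarizer" },
  { kind: "input", text: "text to summarize" },
  { kind: "output", text: "concise summary" },
];

// Each variant only adds sections; it never rewrites the base.
const variants: Record<string, Section[]> = {
  short: [{ kind: "constraint", text: "limit to 1-2 sentences" }],
  detailed: [
    { kind: "do", text: "explain context" },
    { kind: "do", text: "include examples" },
  ],
};

// Render the base alone, or the base followed by one variant's additions.
function render(variant?: string): string {
  const extra = variant !== undefined ? variants[variant] ?? [] : [];
  return [...base, ...extra].map((s) => `${s.kind}: ${s.text}`).join("\n");
}

const shortPrompt = render("short");
const detailedPrompt = render("detailed");
```

Because both rendered prompts start with the same base text, any difference in evaluation results can be attributed to the variant's additions rather than to unrelated wording changes.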
Comparing Variants with Evalite
Evalite’s each() API maps directly to Nudge variants.
import { evalite } from "evalite";
import { generateText } from "ai";
import { exactMatch } from "evalite/scorers";
import { summarizerPrompt } from "./summarizer.prompt";
import "./prompts.gen";

// Create one Evalite variant per Nudge variant name.
evalite.each(
  summarizerPrompt.variantNames.map((name) => ({
    name,
    input: { variant: name },
  }))
)("Summarizer Variants", {
  data: async () => [
    {
      input: "Climate change refers to long-term shifts...",
      expected: "Climate change involves long-term shifts...",
    },
  ],
  // `input` is the data item's input string; `variant` is the object passed to each().
  task: async (input, variant) => {
    const result = await generateText({
      model: "gpt-4o-mini",
      system: summarizerPrompt.toString({ variant: variant.variant }),
      prompt: input,
    });
    // Return only the generated text so the scorer compares strings.
    return result.text;
  },
  scorers: [
    {
      scorer: ({ output, expected }) =>
        exactMatch({ actual: output, expected }),
    },
  ],
});

All variants are evaluated on the same inputs, making comparisons repeatable.
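Conceptually, this kind of comparison runs every variant over the same dataset and aggregates a score per variant. The sketch below illustrates that loop in plain TypeScript with a stubbed task in place of a model call. It is an assumption-laden illustration, not Evalite's implementation; `compareVariants`, `stubTask`, and `exact` are hypothetical names.

```typescript
type Example = { input: string; expected: string };
type Task = (input: string, variant: string) => Promise<string>;
type Scorer = (output: string, expected: string) => number;

// Run every variant over the same examples and return the mean score per variant.
async function compareVariants(
  variantNames: string[],
  data: Example[],
  task: Task,
  scorer: Scorer,
): Promise<Record<string, number>> {
  const means: Record<string, number> = {};
  for (const name of variantNames) {
    let total = 0;
    for (const example of data) {
      const output = await task(example.input, name);
      total += scorer(output, example.expected);
    }
    means[name] = data.length > 0 ? total / data.length : 0;
  }
  return means;
}

// Stub standing in for a model call: "short" truncates, anything else echoes the input.
const stubTask: Task = async (input, variant) =>
  variant === "short" ? input.slice(0, 20) : input;

// Strict string equality: 1 on an exact match, 0 otherwise.
const exact: Scorer = (output, expected) => (output === expected ? 1 : 0);
```

Since each variant sees identical inputs and is scored by the same function, rerunning the comparison after a prompt change isolates the effect of that change.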
From A/B Results to Prompt Contracts
Evalite answers what performs better in practice. Nudge enforces what must never regress.
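A contract in this sense can be thought of as a predicate that every output must satisfy. The following self-contained sketch shows that idea without using Nudge; the `Contract` type, the `wordCount` helper, the `violations` function, and the specific rules are hypothetical examples rather than part of either library.

```typescript
// A contract pairs a human-readable rule with a check over the model output.
type Contract = { description: string; check: (output: string) => boolean };

// Count whitespace-separated words; an empty or blank string counts as zero.
function wordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Illustrative rules only; real limits would come from observed failure modes.
const contracts: Contract[] = [
  { description: "At most 150 words", check: (o) => wordCount(o) <= 150 },
  { description: "Must not be empty", check: (o) => wordCount(o) > 0 },
];

// Return the descriptions of every contract the output breaks.
function violations(output: string): string[] {
  return contracts.filter((c) => !c.check(output)).map((c) => c.description);
}
```

Keeping each rule as a small named predicate makes a regression easy to report: the failing description says which guarantee was broken.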
Once Evalite exposes stable failure modes (length, tone, missing facts), encode them directly as tests on the prompt:
.test(
  "Climate change refers to long-term shifts...",
  (output) => output.split(/\s+/).length <= 150,
  "Must not exceed 150 words"
);

Self-Improvement
Nudge can automatically refine prompts based on failing tests:
npx @nudge-ai/cli improve --judge