Schema Validation Retry with Cross-Step Learning

Nikola Balic (@nibzard) · emerging

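Problem

Large language models (LLMs) do not always produce structured output that conforms to the expected schema. With single-shot validation, the task fails even when a retry would have succeeded.

The problem compounds in multi-step workflows:

Schema violations: The LLM generates JSON that does not match the expected Zod/JSON Schema
One-shot failure: A single failed attempt terminates the entire workflow
No learning from errors: Each step independently repeats the same mistakes
Wasted tokens: Failed responses still occupy context and incur cost
Brittle workflows: Unstable LLM output makes the agent unreliable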

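Solution

Implement multi-attempt retries with detailed error feedback and cross-step error accumulation, so the agent can learn from validation failures across the whole workflow.

Core mechanisms

1. Multi-attempt retry with detailed feedback: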

const maxAttempts = 3;

for (let attempt = 0; attempt < maxAttempts; attempt++) {
  const result = await ctx.llm.invokeStructured({ schema, options }, msgs);

  if (result.parsed) {
    return result.parsed;  // Validation passed: leave the retry loop
  }

  // Extract the detailed Zod validation errors
  const validationError = getZodError(result.rawText || "");

  // Record the error for cross-step learning
  ctx.schemaErrors?.push({
    stepIndex: currStep,
    error: validationError,
    rawResponse: result.rawText || "",
    timestamp: Date.now(),
  });

  // Append the error feedback for the next attempt
  msgs = [
    ...msgs,
    { role: "assistant", content: result.rawText || "" },
    { role: "user", content: `Validation errors:\n${validationError}\nPlease fix.` },
  ];
}

// All attempts failed: throw with the accumulated error context
throw new SchemaValidationError(`Failed after ${maxAttempts} attempts`, {
  stepIndex: currStep,
  errors: ctx.schemaErrors?.slice(-3)  // The 3 most recent errors
});
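
The loop above throws a `SchemaValidationError`, which the source does not define. A minimal sketch of what such a class might look like follows, assuming it only needs to carry the step index and the recent errors. The `SchemaErrorEntry` and `SchemaValidationDetails` names are hypothetical and not taken from the original implementation.

```typescript
// Hypothetical shape of one recorded validation failure
// (mirrors the fields pushed into ctx.schemaErrors in the loop).
interface SchemaErrorEntry {
  stepIndex: number;
  error: string;
  rawResponse: string;
}

// Assumed details payload; the original source does not show this type.
interface SchemaValidationDetails {
  stepIndex: number;
  errors?: SchemaErrorEntry[];
}

class SchemaValidationError extends Error {
  readonly stepIndex: number;
  readonly errors: SchemaErrorEntry[];

  constructor(message: string, details: SchemaValidationDetails) {
    super(message);
    this.name = "SchemaValidationError";
    // Restore the prototype chain so `instanceof` also works for ES5 targets.
    Object.setPrototypeOf(this, SchemaValidationError.prototype);
    this.stepIndex = details.stepIndex;
    // `errors` may be undefined when the context has no history yet.
    this.errors = details.errors ?? [];
  }
}
```

A caller can then branch on `err instanceof SchemaValidationError` and log `err.errors` for diagnosis.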

2. Cross-step error accumulation:

The agent maintains a rolling window of recent schema errors and includes them in subsequent LLM calls:

interface AgentContext {
  schemaErrors: Array<{
    stepIndex: number;
    error: string;          // Zod validation error message
    rawResponse: string;    // What the LLM actually returned
    timestamp: number;
  }>;
}

// Before each step, append recent errors to guide the LLM
const recentErrors = ctx.schemaErrors
  .slice(-3)  // Keep only the 3 most recent errors to avoid overloading the context
  .map(e => `Step ${e.stepIndex}: ${e.error}`)
  .join('\n');

if (recentErrors) {
  msgs.push({
    role: "system",
    content: `Recent schema validation errors to avoid:\n${recentErrors}`
  });
}
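
When the same mistake recurs several times, a window of three entries can fill up with identical messages. The sketch below is one possible refinement, not part of the original implementation: it builds the hint from the most recent distinct error messages. `buildErrorHint` and the trimmed-down `SchemaErrorEntry` are hypothetical names.

```typescript
interface SchemaErrorEntry {
  stepIndex: number;
  error: string;
}

// Returns a hint built from the last `windowSize` distinct error messages,
// or null when there is nothing to report.
function buildErrorHint(
  errors: SchemaErrorEntry[],
  windowSize = 3
): string | null {
  const seen = new Set<string>();
  const distinct: SchemaErrorEntry[] = [];

  // Walk backwards so the most recent occurrence of each message wins.
  for (let i = errors.length - 1; i >= 0 && distinct.length < windowSize; i--) {
    const entry = errors[i];
    if (!seen.has(entry.error)) {
      seen.add(entry.error);
      distinct.push(entry);
    }
  }

  if (distinct.length === 0) return null;

  return distinct
    .reverse() // restore chronological order
    .map(e => `Step ${e.stepIndex}: ${e.error}`)
    .join("\n");
}
```

Deduplicating trades a little bookkeeping for a hint that covers more kinds of mistakes in the same number of lines.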

3. Structured feedback loop:

Each retry iteration provides specific, actionable feedback:

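function getZodError(rawText: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    return "Could not parse the response as JSON";
  }

  const result = zodSchema.safeParse(parsed);
  if (result.success) {
    return "";  // Nothing to report: the value matches the schema
  }

  // Format each Zod issue as "path: message"
  return result.error.issues
    .map(issue => {
      const path = issue.path.join('.') || '(root)';
      // Only some issue types (e.g. invalid_type) carry a `received` field
      const received = 'received' in issue
        ? ` (received: ${JSON.stringify(issue.received)})`
        : '';
      return `${path}: ${issue.message}${received}`;
    })
    .join('\n');
}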
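
To show the kind of feedback text such a formatter produces, here is a self-contained sketch that does not depend on Zod. `IssueLike` is a hand-written stand-in loosely modeled on the `path` and `message` fields of a validation issue, and `formatIssues` is a hypothetical helper name; the sample messages are illustrative.

```typescript
// Minimal stand-in for the issue fields used here. `received` is optional
// because not every kind of issue carries the offending value.
interface IssueLike {
  path: Array<string | number>;
  message: string;
  received?: unknown;
}

function formatIssues(issues: IssueLike[]): string {
  return issues
    .map(issue => {
      // An empty path means the problem is at the top level of the value.
      const path = issue.path.length > 0 ? issue.path.join(".") : "(root)";
      const received =
        issue.received !== undefined
          ? ` (received: ${JSON.stringify(issue.received)})`
          : "";
      return `${path}: ${issue.message}${received}`;
    })
    .join("\n");
}

const feedback = formatIssues([
  { path: ["action", "type"], message: "Invalid enum value", received: "tap" },
  { path: ["action", "confidence"], message: "Number must be less than or equal to 1" },
]);
console.log(feedback);
// action.type: Invalid enum value (received: "tap")
// action.confidence: Number must be less than or equal to 1
```

One line per issue, keyed by its path, gives the model a concrete field to correct rather than a generic "invalid output" message.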

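Architecture

sequenceDiagram
    participant LLM as LLM
    participant Validator as Zod Validator
    participant Context as Agent Context
    participant History as Error History

    loop Multi-step workflow
        Note over LLM,History: Step N
        LLM->>Validator: Generate structured output
        Validator->>Validator: Validate against the schema

        alt Validation passes
            Validator-->>LLM: Validation succeeded
            LLM->>History: Record success (reset consecutive error count)
        else Validation fails - attempt 1
            Validator-->>LLM: Zod error details
            LLM->>Context: Store error in history
            LLM->>LLM: Retry with error feedback
        end

        alt Still failing - attempt 2
            Validator-->>LLM: Zod error details
            LLM->>Context: Store error in history
            Context->>LLM: Include the 3 most recent past errors
            LLM->>LLM: Retry with cross-step context
        end

        alt Still failing - attempt 3
            Validator-->>LLM: Zod error details
            LLM->>Context: Store final error
            Context-->>Context: Mark the current step as failed
        end
    end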
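
The diagram mentions resetting a consecutive error count on success, which the code on this page does not show. A minimal sketch of that bookkeeping might look like the following; the `ErrorHistory` class and its method names are assumptions made for illustration.

```typescript
// Tracks failures for cross-step hints plus a streak counter for the current step.
class ErrorHistory {
  private consecutiveFailures = 0;
  private readonly entries: string[] = [];

  recordFailure(error: string): void {
    this.entries.push(error);
    this.consecutiveFailures += 1;
  }

  recordSuccess(): void {
    // Past errors stay available as hints; only the streak counter resets.
    this.consecutiveFailures = 0;
  }

  // True once the streak has used up the allowed attempts.
  shouldFailStep(maxAttempts: number): boolean {
    return this.consecutiveFailures >= maxAttempts;
  }

  // The most recent `count` errors, oldest first.
  recent(count: number): string[] {
    return count > 0 ? this.entries.slice(-count) : [];
  }
}
```

Keeping the streak counter separate from the stored entries lets a successful step reset the failure budget without discarding what earlier steps learned.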

How to use it

1. Define the schema with Zod

Use strict schemas with explicit error messages:

import { z } from 'zod';

const ActionSchema = z.object({
  type: z.enum(['click', 'fill', 'wait', 'scroll']),
  elementId: z.string().min(1, "elementId is required"),
  arguments: z.array(z.string()).optional(),
  confidence: z.number().min(0).max(1),
});

const StepOutputSchema = z.object({
  action: ActionSchema,
  reasoning: z.string().max(500, "Keep the reasoning concise"),
});

2. Implement the retry wrapper

async function invokeWithRetry<T>(
  ctx: AgentContext,
  schema: z.ZodSchema<T>,
  messages: Message[],
  stepIndex: number
): Promise<T> {
  const maxAttempts = 3;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await ctx.llm.invokeStructured({ schema }, messages);

    if (result.parsed) {
      // Success: drop this step's errors from the history
      ctx.schemaErrors = ctx.schemaErrors.filter(e => e.stepIndex !== stepIndex);
      return result.parsed;
    }

    // Build the detailed error message
    const error = formatZodError(result.error);
    const errorEntry = {
      stepIndex,
      error,
      rawResponse: result.rawText,
      timestamp: Date.now(),
      attempt,
    };

    // Record the error for cross-step learning
    ctx.schemaErrors.push(errorEntry);

    // Add feedback for the next attempt
    messages = [
      ...messages,
      { role: "assistant", content: result.rawText },
      { role: "user", content: `Validation failed:\n${error}\nPlease fix and retry.` },
    ];
  }

  throw new Error(`Schema validation still failing after ${maxAttempts} attempts`);
}
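
To see the retry loop recover from a bad first answer, the sketch below runs a simplified version of the wrapper against a fake model. Everything here is a stand-in: `StructuredResult`, `Invoke`, `retryStructured`, and `fakeInvoke` are hypothetical names, and the fake model replaces the `ctx.llm.invokeStructured` call used above.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical result shape: `parsed` is set only when validation succeeded.
interface StructuredResult<T> {
  parsed?: T;
  rawText: string;
  error?: string;
}

type Invoke<T> = (messages: Message[]) => Promise<StructuredResult<T>>;

async function retryStructured<T>(
  invoke: Invoke<T>,
  messages: Message[],
  maxAttempts = 3
): Promise<{ value: T; attempts: number }> {
  let msgs = messages;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await invoke(msgs);
    if (result.parsed !== undefined) {
      return { value: result.parsed, attempts: attempt };
    }
    // Feed the failed output and the validation error back to the model.
    msgs = [
      ...msgs,
      { role: "assistant", content: result.rawText },
      {
        role: "user",
        content: `Validation failed:\n${result.error ?? "unknown error"}\nPlease fix and retry.`,
      },
    ];
  }
  throw new Error(`Schema validation still failing after ${maxAttempts} attempts`);
}

// Fake model: answers incorrectly until it has seen validation feedback.
const fakeInvoke: Invoke<{ type: string }> = async msgs => {
  const sawFeedback = msgs.some(
    m => m.role === "user" && m.content.startsWith("Validation failed")
  );
  return sawFeedback
    ? { parsed: { type: "click" }, rawText: '{"type":"click"}' }
    : { rawText: '{"type":"tap"}', error: "type: Invalid enum value" };
};
```

Calling `retryStructured(fakeInvoke, [{ role: "user", content: "Click the button" }])` resolves on the second attempt, because the second call sees the feedback message appended after the first failure.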

3. Inject the error context

Before each step starts, inject recent errors so the model does not repeat them:

function prepareMessagesForStep(
  ctx: AgentContext,
  stepIndex: number,
  task: string
): Message[] {
  const messages: Message[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: task },
  ];

  // Build the error history
  const recentErrors = ctx.schemaErrors
    .filter(e => e.stepIndex < stepIndex)  // Only errors from earlier steps
    .slice(-3)  // The 3 most recent of those
    .map(e => `Step ${e.stepIndex}: ${e.error}`)
    .join('\n');

  if (recentErrors) {
    messages.push({
      role: "system",
      content: `Avoid repeating these past errors:\n${recentErrors}`,
    });
  }

  return messages;
}
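
When a three-entry window is combined with a "previous steps only" filter, the order of the two operations matters. The sketch below uses made-up history data to show the difference; it assumes the history can already contain entries for the current step, for example when messages are rebuilt after failed attempts.

```typescript
interface SchemaErrorEntry {
  stepIndex: number;
  error: string;
}

// Illustrative history: step 3 has already failed twice.
const history: SchemaErrorEntry[] = [
  { stepIndex: 0, error: "action.type: invalid enum value" },
  { stepIndex: 1, error: "action.elementId: required" },
  { stepIndex: 2, error: "reasoning: too long" },
  { stepIndex: 3, error: "action.confidence: must be <= 1" },
  { stepIndex: 3, error: "action.confidence: must be <= 1" },
];
const currentStep = 3;

// Window first, then filter: current-step entries use up window slots.
const sliceThenFilter = history
  .slice(-3)
  .filter(e => e.stepIndex < currentStep);

// Filter first, then window: up to three earlier-step entries survive.
const filterThenSlice = history
  .filter(e => e.stepIndex < currentStep)
  .slice(-3);

console.log(sliceThenFilter.length); // 1
console.log(filterThenSlice.length); // 3
```

Filtering before slicing keeps the window full of useful cross-step hints whenever enough earlier errors exist.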

4. Configure the agent context

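interface AgentConfig {
  maxValidationAttempts: number;
  errorHistorySize: number;      // Number of error records to keep
  crossStepErrorCount: number;   // Number of error records injected per step
}

// AgentContext as declared earlier has no `config` field, so it is added here
const context: AgentContext & { config: AgentConfig } = {
  schemaErrors: [],
  config: {
    maxValidationAttempts: 3,
    errorHistorySize: 10,
    crossStepErrorCount: 3,
  },
};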
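
The configuration declares `errorHistorySize`, but none of the code on this page enforces it. One way to apply the limit, sketched under the assumption that the oldest entries should be dropped first, is a bounded push. `pushBounded` is a hypothetical helper name.

```typescript
interface SchemaErrorEntry {
  stepIndex: number;
  error: string;
  rawResponse: string;
  timestamp: number;
}

// Appends an entry and drops the oldest ones beyond `maxSize`.
// Mutates the array in place and returns it for convenience.
function pushBounded(
  history: SchemaErrorEntry[],
  entry: SchemaErrorEntry,
  maxSize: number
): SchemaErrorEntry[] {
  history.push(entry);
  const overflow = history.length - maxSize;
  if (overflow > 0) {
    history.splice(0, overflow); // remove from the front (oldest first)
  }
  return history;
}
```

The retry wrapper could then call `pushBounded(ctx.schemaErrors, errorEntry, ctx.config.errorHistorySize)` in place of a plain `push`, assuming the context exposes its config that way.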

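Trade-offs

Pros:

Higher success rate: Retrying up to 3 times makes structured output markedly more reliable
Cross-step learning: The agent can avoid repeating the same mistakes across the workflow
Detailed error feedback: Zod errors point the LLM at the specific fields to fix
Easier debugging: The error history provides diagnostic information for troubleshooting
Tunable balance: The number of attempts can be tuned against cost and latency

Cons:

Higher latency: Each retry adds another LLM call
Higher cost: Failed attempts still consume tokens
Context bloat: An unbounded error history can consume a large number of tokens
No guarantee: Some LLMs struggle to correct their output from error messages
More complexity: Retry logic and error management have to be built and maintained

Mitigations:

Limit the cross-step error window (keep only the 3 most recent errors) to control token usage
Use caching to skip retries for repeated workflows
Set a timeout for each step to prevent unbounded retrying
Log failures and use them to refine prompts over time
Consider a model with better structured-output support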
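
Two of these mitigations, the per-step time limit and token control, can be sketched as small pure helpers. The names `RetryBudget`, `canRetry`, and `truncateForHistory`, along with the 500-character default, are assumptions made for illustration and do not come from the original implementation.

```typescript
interface RetryBudget {
  maxAttempts: number;
  stepTimeoutMs: number; // hypothetical per-step wall-clock limit
}

// Decides whether another attempt may start. Timestamps are passed in
// (rather than read from Date.now()) so the function is easy to test.
function canRetry(
  attemptsMade: number,
  stepStartedAt: number,
  now: number,
  budget: RetryBudget
): boolean {
  const withinAttempts = attemptsMade < budget.maxAttempts;
  const withinTime = now - stepStartedAt < budget.stepTimeoutMs;
  return withinAttempts && withinTime;
}

// Caps how much of a failed response is stored, limiting the tokens
// spent if stored responses are ever re-injected into a prompt.
function truncateForHistory(rawText: string, maxChars = 500): string {
  return rawText.length <= maxChars
    ? rawText
    : `${rawText.slice(0, maxChars)}... [truncated]`;
}
```

A retry loop would check `canRetry(...)` before each call and store `truncateForHistory(result.rawText)` instead of the full response.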

References

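HyperAgent GitHub repository - original implementation (see src/agent/tools/agent.ts, lines 424-509)

Zod documentation - schema validation library

Related patterns: Structured Output Specification, Action Caching & Replay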

Source: https://github.com/hyperbrowserai/HyperAgent