Action Caching & Replay Pattern

Nikola Balic (@nibzard) · emerging

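Problem

LLM-based agents are costly to run, in both money and latency, and they are non-deterministic. Running the same workflow repeatedly yields different results and pays the LLM cost every time.

This causes several problems:

Cost explosion: every workflow run consumes LLM tokens, even when the task is exactly the same
Non-determinism: the same input produces different outputs across runs
No regression testing: there is no way to verify that a fix does not break existing workflows
Slow iteration: changes cannot be tested quickly without paying LLM costs
No CI/CD integration: automated testing of agent workflows is impractical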

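Solution

Record every action during execution together with precise metadata (XPaths, frame indices, execution details) so that the workflow can later be replayed deterministically without calling the LLM. The cache keeps enough information to replay actions even when the page structure has changed slightly.

Core Approach

Action cache entries store complete execution metadata: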

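interface ActionCacheEntry {
  stepIndex: number;           // Execution order within the workflow
  instruction: string;         // Natural-language instruction for the action
  elementId: string;           // Encoded frame index + backend node ID
  method: string;              // Action method: click, fill, type, etc.
  arguments: string[];         // Method arguments
  frameIndex: number;          // Frame context index (for iframes)
  xpath: string;               // Normalized XPath of the element
  actionType: string;          // Action category
  success: boolean;            // Whether the action succeeded
  message: string;             // Output message or error
}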

interface ActionCacheOutput {
  actions: ActionCacheEntry[];
  finalState: {
    success: boolean;
    error?: string;
    duration: number;          // Total execution time
  };
}
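The elementId field packs the frame index and the backend node ID into one string, and buildActionCacheEntry recovers frameIndex from it. Assuming the encoding is literally "<frameIndex>-<backendNodeId>" (as the field comment suggests; the exact format used by HyperAgent may differ), decoding and encoding could be sketched as follows. decodeElementId and encodeElementId are hypothetical names.

```typescript
// Hypothetical codec for element IDs of the form "<frameIndex>-<backendNodeId>",
// e.g. "0-1523" for node 1523 in the main frame. Check your framework's real encoding.
interface DecodedElementId {
  frameIndex: number;    // 0 = main frame, higher values = iframes (assumed)
  backendNodeId: number; // DOM node ID assigned by the browser backend
}

function decodeElementId(elementId: string): DecodedElementId {
  const match = /^(\d+)-(\d+)$/.exec(elementId);
  if (match === null) {
    throw new Error(`Unrecognized element ID: "${elementId}"`);
  }
  return { frameIndex: Number(match[1]), backendNodeId: Number(match[2]) };
}

function encodeElementId(frameIndex: number, backendNodeId: number): string {
  return `${frameIndex}-${backendNodeId}`;
}
```

Keeping the frame index next to the XPath matters because an XPath is evaluated against a single document: the same path can point to a different element, or to nothing, depending on which iframe it is resolved in.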

Cache entries are built during execution:

// ActionType, ActionOutput, A11yDOMState and the extract*/normalizeXPath helpers
// are defined elsewhere in the framework (not shown)
export const buildActionCacheEntry = ({
  stepIndex,
  action,
  actionOutput,
  domState,
}: {
  stepIndex: number;
  action: ActionType;
  actionOutput: ActionOutput;
  domState: A11yDOMState;  // Accessibility-tree (a11y) DOM state
}): ActionCacheEntry => {
  const instruction = extractInstruction(action);  // Natural-language instruction
  const elementId = extractElementId(action);      // Encoded element ID
  const method = extractMethod(action);            // Action method
  const args = extractArguments(action);           // Method arguments
  const frameIndex = extractFrameIndex(elementId); // Frame index parsed from the element ID
  const xpath = normalizeXPath(
    // Prefer the DOM state's XPath map; fall back to the action's debug output
    domState.xpathMap?.[elementId] || extractXPathFromDebug(actionOutput)
  );

  return {
    stepIndex,
    instruction,
    elementId,
    method,
    arguments: args,
    frameIndex,
    xpath,
    actionType: action.type,
    success: actionOutput.success,
    message: actionOutput.message,
  };
};
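To show how stepIndex, success and finalState fit together during the first, LLM-driven run, here is a minimal recording loop. It is a sketch under assumptions: recordRun, PlannedAction and the execute callback are hypothetical stand-ins for the agent's own loop, entries keep only a subset of the ActionCacheEntry fields, and for brevity the actions are given up front (a real agent chooses each step after observing the page). Recording stops at the first failed step, and duration is measured in milliseconds here because the original interface does not state a unit.

```typescript
// Subset of ActionCacheEntry used by this sketch.
interface RecordedEntry {
  stepIndex: number;
  instruction: string;
  method: string;
  arguments: string[];
  xpath: string;
  success: boolean;
  message: string;
}

interface RecordedRun {
  actions: RecordedEntry[];
  finalState: { success: boolean; error?: string; duration: number };
}

// Hypothetical: one step the LLM decided on, already resolved to an XPath.
interface PlannedAction {
  instruction: string;
  method: string;
  arguments: string[];
  xpath: string;
}

// Hypothetical executor that performs the action against the page.
type Execute = (action: PlannedAction) => Promise<{ success: boolean; message: string }>;

async function recordRun(plan: PlannedAction[], execute: Execute): Promise<RecordedRun> {
  const startedAt = Date.now();
  const actions: RecordedEntry[] = [];
  let stepIndex = 0;

  for (const action of plan) {
    let outcome: { success: boolean; message: string };
    try {
      outcome = await execute(action);
    } catch (err) {
      // A thrown error is recorded as a failed step instead of being lost.
      outcome = { success: false, message: String(err) };
    }
    actions.push({ stepIndex, ...action, ...outcome });
    stepIndex++;

    if (!outcome.success) {
      // Later steps would depend on a page state that was never reached.
      return {
        actions,
        finalState: { success: false, error: outcome.message, duration: Date.now() - startedAt },
      };
    }
  }

  return { actions, finalState: { success: true, duration: Date.now() - startedAt } };
}
```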

Replay with intelligent fallback:

// Replay from the cache, with XPath retries and an LLM fallback
const replay = await page.runFromActionCache(cache, {
  maxXPathRetries: 3,           // Retry XPath resolution up to 3 times
  fallbackToLLM: true,          // Fall back to the LLM if XPath resolution fails
  debug: true,
});

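// Replay flow:
// 1. Try the cached XPath directly
// 2. If that fails, try normalized XPath variants
// 3. If that still fails, ask the LLM to locate the target element
// 4. Write successful resolutions back to the cache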
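The four-step flow above can be sketched for a single cache entry as follows. This is not HyperAgent's implementation: replayEntry, xpathVariants and both hooks are hypothetical, the two variants (drop positional predicates, then search for the last path step anywhere) are just one possible reading of "normalized XPath variants", and maxXPathRetries is interpreted as the total number of XPath candidates tried before falling back. Looser variants can match the wrong element, which is why the exact cached path always goes first.

```typescript
// Subset of ActionCacheEntry needed for replay.
interface ReplayableEntry {
  instruction: string;
  xpath: string;
  frameIndex: number;
}

interface ReplayHooks {
  // Hypothetical: performs the cached action on the element at `xpath`; false if not found.
  tryXPath: (xpath: string, frameIndex: number) => Promise<boolean>;
  // Hypothetical: asks an LLM to locate the element, returning a fresh XPath or null.
  resolveWithLLM?: (instruction: string) => Promise<string | null>;
}

type ResolvedBy = "cached-xpath" | "xpath-variant" | "llm";

// Looser variants of a cached XPath (assumes simple paths with no "/" inside predicates).
function xpathVariants(xpath: string): string[] {
  // Variant 1: drop positional predicates such as [2], tolerating reordered siblings.
  const withoutPositions = xpath.replace(/\[\d+\]/g, "");
  // Variant 2: search for the last step anywhere in the document.
  const steps = withoutPositions.split("/").filter((s) => s.length > 0);
  const candidates = [withoutPositions];
  if (steps.length > 0) candidates.push("//" + steps[steps.length - 1]);
  // Keep only candidates that differ from the original and from each other.
  return candidates.filter((c, i) => c !== xpath && candidates.indexOf(c) === i);
}

async function replayEntry(
  entry: ReplayableEntry,
  hooks: ReplayHooks,
  maxXPathRetries = 3
): Promise<ResolvedBy> {
  // Steps 1-2: exact cached XPath first, then looser variants, within the retry budget.
  const candidates = [entry.xpath].concat(xpathVariants(entry.xpath)).slice(0, maxXPathRetries);
  for (let i = 0; i < candidates.length; i++) {
    if (await hooks.tryXPath(candidates[i], entry.frameIndex)) {
      if (i === 0) return "cached-xpath";
      entry.xpath = candidates[i]; // Step 4: remember the variant that worked
      return "xpath-variant";
    }
  }
  // Step 3: LLM fallback, only if one is configured.
  if (hooks.resolveWithLLM) {
    const fresh = await hooks.resolveWithLLM(entry.instruction);
    if (fresh !== null && (await hooks.tryXPath(fresh, entry.frameIndex))) {
      entry.xpath = fresh; // Step 4: write the LLM's answer back into the cache
      return "llm";
    }
  }
  throw new Error(`Replay failed for step: ${entry.instruction}`);
}
```

Only the LLM branch costs tokens, so a run where every step returns "cached-xpath" or "xpath-variant" is the zero-LLM-cost path from the architecture diagram.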

Generate a standalone executable script:

// Export the cached workflow as a TypeScript script
const script = generateScriptFromCache(cache);

// Example of the generated, runnable code:
// import { chromium } from 'playwright';
// const browser = await chromium.launch();
// const page = await browser.newPage();
// await page.click('div[data-testid="login-button"]');
// await page.fill('input[name="email"]', 'user@example.com');
// ...
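A generator in the spirit of generateScriptFromCache can be sketched by mapping each successful cache entry to a Playwright call. Assumptions: the name toPlaywrightScript and the startUrl parameter are hypothetical (the cache entries above do not record a URL), only a handful of methods are handled, and selectors use Playwright's xpath= prefix with the cached XPath. String literals are emitted with JSON.stringify, so quotes inside XPaths or arguments cannot break the generated code.

```typescript
// Subset of ActionCacheEntry needed to emit a script.
interface ScriptableEntry {
  method: string;
  arguments: string[];
  xpath: string;
  success: boolean;
}

// Playwright Page methods handled by this sketch, mapped to the number
// of string arguments that follow the selector.
const ARITY = new Map<string, number>([
  ["click", 0],
  ["check", 0],
  ["fill", 1],
  ["type", 1],
  ["selectOption", 1],
]);

function toPlaywrightScript(startUrl: string, entries: ScriptableEntry[]): string {
  const lines: string[] = [
    "import { chromium } from 'playwright';",
    "",
    "(async () => {",
    "  const browser = await chromium.launch();",
    "  const page = await browser.newPage();",
    `  await page.goto(${JSON.stringify(startUrl)});`,
  ];
  for (const entry of entries) {
    if (!entry.success) continue; // failed steps never completed, so they are not emitted
    const arity = ARITY.get(entry.method);
    if (arity === undefined) {
      throw new Error(`No Playwright mapping for method "${entry.method}"`);
    }
    const args = [`xpath=${entry.xpath}`, ...entry.arguments.slice(0, arity)].map((a) => JSON.stringify(a));
    lines.push(`  await page.${entry.method}(${args.join(", ")});`);
  }
  lines.push("  await browser.close();", "})();");
  return lines.join("\n");
}
```

Because the generated script contains no LLM calls, it replays exactly what was cached; if the page changes, it fails instead of self-healing. That is the trade-off against runFromActionCache with fallbackToLLM enabled.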

Architecture

graph TB
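    subgraph "First run (LLM-driven)"
        A[User request] --> B[LLM-powered agent]
        B --> C["Action 1: click login button"]
        C --> D["Build cache entry 1"]
        D --> E["Action 2: fill in email"]
        E --> F["Build cache entry 2"]
        F --> G[Done]
    end

    subgraph "Subsequent runs (cache-driven)"
        H[User request] --> I{Cache exists?}
        I -->|Yes| J[Load action cache]
        I -->|No| B
        J --> K[Replay action 1]
        K --> L{XPath works?}
        L -->|Yes| M[Replay action 2]
        L -->|No| N[LLM fallback]
        N --> M
        M --> O{XPath works?}
        O -->|Yes| P[Done - zero LLM cost]
        O -->|No| Q[LLM fallback]
        Q --> P
    end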

    style B fill:#f9f,stroke:#333
    style P fill:#9f9,stroke:#333
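The "Cache exists?" decision at the top of the second subgraph can be sketched as a small dispatcher. runWorkflow, CacheStore and both callbacks are hypothetical, and keying the cache by the task string is an assumption (a real key would likely also include the start URL or application version). Writing the replayed cache back lets XPaths repaired by the LLM fallback be reused on the next run.

```typescript
// Hypothetical key-value store for caches; a Map satisfies this shape.
interface CacheStore<C> {
  get(key: string): C | undefined;
  set(key: string, cache: C): void;
}

interface WorkflowResult<C> {
  cache: C;
  source: "agent" | "cache";
}

async function runWorkflow<C>(
  task: string,
  store: CacheStore<C>,
  runAgent: (task: string) => Promise<C>, // first run: LLM-driven, records a new cache
  replay: (cache: C) => Promise<C>        // later runs: replays, returning the (possibly repaired) cache
): Promise<WorkflowResult<C>> {
  const existing = store.get(task);
  if (existing !== undefined) {
    const repaired = await replay(existing);
    store.set(task, repaired); // keep any XPaths fixed by the LLM fallback
    return { cache: repaired, source: "cache" };
  }
  const recorded = await runAgent(task);
  store.set(task, recorded);
  return { cache: recorded, source: "agent" };
}
```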

How to Use

1. Enable action caching

Enable caching when executing the task. With HyperAgent this is the enableActionCache option:

const agent = new HyperAgent(browser);
const cache = await agent.executeTask("Log in and navigate to the dashboard", {
  enableActionCache: true,
});

2. Persist the cache

Save the cache for later reuse:

import fs from 'fs';

fs.writeFileSync(
  'workflows/login-cache.json',
  JSON.stringify(cache, null, 2)
);
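If a run is interrupted mid-write, writeFileSync can leave a truncated JSON file that breaks every later replay. A slightly more defensive variant, sketched here as a hypothetical saveCacheAtomic helper (not part of any framework), creates the directory if needed, writes to a temporary file and then renames it into place. On POSIX filesystems a rename within one directory replaces the target in a single step.

```typescript
import * as fs from "fs";
import * as path from "path";

// Write to a temp file in the same directory, then rename over the target,
// so readers see either the old cache or the complete new one.
function saveCacheAtomic(filePath: string, cache: unknown): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(cache, null, 2), "utf-8");
  fs.renameSync(tmpPath, filePath);
}
```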

3. Replay from the cache

const savedCache = JSON.parse(
  fs.readFileSync('workflows/login-cache.json', 'utf-8')
);

const result = await page.runFromActionCache(savedCache, {
  maxXPathRetries: 3,
  fallbackToLLM: true,
});
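JSON.parse returns untyped data, so a hand-edited or outdated cache file only fails once replay reaches the bad entry. A hedged sketch of validating on load: parseActionCache, loadActionCache and the checked fields are assumptions based on the ActionCacheEntry and ActionCacheOutput interfaces above, and entries are sorted by stepIndex so replay order does not depend on file order.

```typescript
import * as fs from "fs";

// Fields checked on each entry (a subset of ActionCacheEntry).
interface CachedAction {
  stepIndex: number;
  method: string;
  arguments: string[];
  xpath: string;
  success: boolean;
}

interface ActionCacheFile {
  actions: CachedAction[];
  finalState: { success: boolean; error?: string; duration: number };
}

function isCachedAction(value: unknown): value is CachedAction {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const args = v.arguments;
  return (
    typeof v.stepIndex === "number" &&
    typeof v.method === "string" &&
    Array.isArray(args) &&
    args.every((a) => typeof a === "string") &&
    typeof v.xpath === "string" &&
    typeof v.success === "boolean"
  );
}

function parseActionCache(json: string): ActionCacheFile {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Action cache must be a JSON object");
  }
  const { actions, finalState } = data as { actions?: unknown; finalState?: unknown };
  if (!Array.isArray(actions)) {
    throw new Error("Action cache is missing an 'actions' array");
  }
  for (let i = 0; i < actions.length; i++) {
    if (!isCachedAction(actions[i])) {
      throw new Error(`Malformed action cache entry at index ${i}`);
    }
  }
  const state = finalState as { success?: unknown; duration?: unknown } | undefined;
  if (!state || typeof state.success !== "boolean" || typeof state.duration !== "number") {
    throw new Error("Action cache has an invalid 'finalState'");
  }
  // Replay order follows stepIndex, not the order entries appear in the file.
  const sorted = (actions as CachedAction[]).slice().sort((a, b) => a.stepIndex - b.stepIndex);
  return { actions: sorted, finalState: finalState as ActionCacheFile["finalState"] };
}

function loadActionCache(filePath: string): ActionCacheFile {
  return parseActionCache(fs.readFileSync(filePath, "utf-8"));
}
```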

4. CI/CD integration

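// test.js - automated regression test
import { readFileSync } from 'fs';

// Assumes page is created in a setup hook (not shown)
describe('User login flow', () => {
  it('completes login successfully', async () => {
    const cache = JSON.parse(readFileSync('workflows/login-cache.json', 'utf-8'));
    const result = await page.runFromActionCache(cache);

    expect(result.finalState.success).toBe(true);
  });
});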
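A passing replay does not tell you whether the cache is going stale. One option, sketched under the assumption that the replay reports how each step was resolved (the resolvedBy field, StepReport and assertCacheFresh are hypothetical, not documented HyperAgent result fields), is to fail CI when steps start needing the LLM fallback, since those steps cost tokens and are no longer deterministic.

```typescript
// Hypothetical per-step report; assumes the replay records how each step was resolved.
type ResolvedBy = "cached-xpath" | "xpath-variant" | "llm";

interface StepReport {
  stepIndex: number;
  resolvedBy: ResolvedBy;
}

interface DriftSummary {
  total: number;
  cachedXPath: number;
  xpathVariant: number;
  llm: number;
}

function summarizeReplay(steps: StepReport[]): DriftSummary {
  const summary: DriftSummary = { total: steps.length, cachedXPath: 0, xpathVariant: 0, llm: 0 };
  for (const step of steps) {
    if (step.resolvedBy === "cached-xpath") summary.cachedXPath++;
    else if (step.resolvedBy === "xpath-variant") summary.xpathVariant++;
    else summary.llm++;
  }
  return summary;
}

// Throws (failing the CI job) when more than maxLlmSteps steps needed the LLM,
// a sign that the cached XPaths have drifted from the live page.
function assertCacheFresh(steps: StepReport[], maxLlmSteps = 0): DriftSummary {
  const summary = summarizeReplay(steps);
  if (summary.llm > maxLlmSteps) {
    const drifted = steps.filter((s) => s.resolvedBy === "llm").map((s) => s.stepIndex);
    throw new Error(
      `${summary.llm} step(s) fell back to the LLM (steps ${drifted.join(", ")}); consider re-recording the cache`
    );
  }
  return summary;
}
```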

5. Script generation

Export the workflow as a standalone script for manual debugging or CI environments:

# Generate a TypeScript script from the cache
npx hyperagent script workflows/login-cache.json > login.test.ts


Source: https://github.com/hyperbrowserai/HyperAgent
