When architects discuss microservices, conversations often focus on APIs, cloud platforms, containers, service meshes, and databases. However, one of the most overlooked architectural decisions is the ...
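Background
In multi-turn conversations, if a new instance is created for every turn, the THROTTLED memory-save mode has no effect.

Impact
Each request consumes roughly 50% more tokens (the flush LLM call has to resend the full conversation context).
User-visible effect: after inference finishes, the user waits another 25-30 s before receiving the Agent completion event.
The setting flushMinGapSeconds: 120 is completely ineffective and exists in name only.

Root cause analysis
// ...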
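
The root-cause code above is truncated and is not reconstructed here. As a separate illustration of the scenario this report opens with, the sketch below shows one way a minimum-gap throttle can stop working when a new instance is created per turn. It assumes the last-flush timestamp is stored on the instance, which the report does not confirm. `FlushThrottle`, `shouldFlush`, and both helper functions are hypothetical names; only `flushMinGapSeconds` comes from the report.

```typescript
// Hypothetical sketch: a throttle whose state lives on the instance.
// Assumption (not confirmed by the report): the last-flush time is an
// instance field, so it is lost whenever a new instance is created.
class FlushThrottle {
  private lastFlushMs: number | null = null;
  private readonly flushMinGapSeconds: number;

  constructor(flushMinGapSeconds: number) {
    this.flushMinGapSeconds = flushMinGapSeconds;
  }

  // Returns true if a flush is allowed at nowMs, and records it if so.
  shouldFlush(nowMs: number): boolean {
    if (
      this.lastFlushMs !== null &&
      nowMs - this.lastFlushMs < this.flushMinGapSeconds * 1000
    ) {
      return false; // still inside the minimum gap
    }
    this.lastFlushMs = nowMs;
    return true;
  }
}

// New instance per turn: the throttle always starts empty, so every turn flushes.
function flushesWithNewInstancePerTurn(turnTimesMs: number[], gapSeconds: number): number {
  let count = 0;
  for (const t of turnTimesMs) {
    const throttle = new FlushThrottle(gapSeconds);
    if (throttle.shouldFlush(t)) count++;
  }
  return count;
}

// One instance reused across turns: the minimum gap is honoured.
function flushesWithSharedInstance(turnTimesMs: number[], gapSeconds: number): number {
  const throttle = new FlushThrottle(gapSeconds);
  let count = 0;
  for (const t of turnTimesMs) {
    if (throttle.shouldFlush(t)) count++;
  }
  return count;
}

// Four turns at 0 s, 30 s, 60 s and 150 s with a 120 s minimum gap.
const turnTimesMs = [0, 30000, 60000, 150000];
console.log(flushesWithNewInstancePerTurn(turnTimesMs, 120)); // 4
console.log(flushesWithSharedInstance(turnTimesMs, 120)); // 2
```

Under this assumption, the per-turn variant flushes on all four turns, while the shared variant flushes only at 0 s and 150 s. If the actual cause is similar, keeping the timestamp somewhere that outlives the instance (for example, keyed by session) would be one possible direction, but that depends on the elided analysis.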