MODEL
AgentDoG
Overview
AgentDoG (v1.5) is a lightweight, scalable safety-alignment framework for AI agents. It trains small open-model variants (0.8B–8B) on roughly 1k samples to act as real-time safety guardrails for frontier agents. It targets the cost-per-call constraint that makes closed-source guardrails impractical for high-volume agent deployments.
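To make the "real-time guardrail" role concrete, here is a minimal sketch of the guard-in-the-loop pattern: every action an agent proposes is checked by a guard before it executes. This is not AgentDoG's code or API. `toy_guard`, `guarded_step`, `Verdict`, and the risk labels are all hypothetical, and a keyword rule stands in for the trained 0.8B–8B guard model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    safe: bool
    category: str  # hypothetical risk label, standing in for a safety-taxonomy class


def toy_guard(action: str) -> Verdict:
    """Hypothetical stand-in for a small guard model: a keyword rule, NOT AgentDoG."""
    risky = {
        "rm -rf": "destructive_command",
        "send_credentials": "data_exfiltration",
    }
    for pattern, category in risky.items():
        if pattern in action:
            return Verdict(safe=False, category=category)
    return Verdict(safe=True, category="none")


def guarded_step(
    action: str,
    execute: Callable[[str], str],
    guard: Callable[[str], Verdict] = toy_guard,
) -> str:
    """Run the guard on one proposed agent action; execute it only if judged safe."""
    verdict = guard(action)
    if not verdict.safe:
        return f"BLOCKED ({verdict.category}): {action}"
    return execute(action)


def run(action: str) -> str:
    # Placeholder executor; a real agent would call a tool or shell here.
    return f"executed: {action}"


if __name__ == "__main__":
    print(guarded_step("ls ./reports", run))
    print(guarded_step("rm -rf /", run))
```

Because the guard is invoked once per agent action, its per-call cost scales with the number of agent steps. That is why a small, cheap guard model matters for high-volume deployments.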
Timeline
- 2026-05-30-AI-Digest — Paper published as arXiv:2605.29801 (▲82) by Dongrui Liu et al. Proposes a taxonomy-guided safety alignment framework that trains lightweight 0.8B–8B variants on ~1k samples to match closed-source guardrails (notably GPT-5.4), with a Docker-level RL/SFT environment cutting deployment overhead by roughly two orders of magnitude. Why it matters: pushes small open models into a credible role as real-time safety guardrails for frontier agents, where the cost-per-call is the real constraint.
Key Developments
- Lightweight 0.8B–8B Safety Variants on ~1k Samples: Taxonomy-guided alignment trains 0.8B–8B variants on roughly 1k samples to match closed-source guardrails (notably GPT-5.4). This makes for a credible cost architecture for real-time safety guardrails around frontier agents (2026-05-30-AI-Digest).
- Docker-Level RL/SFT Environment (~2 Orders of Magnitude Less Deployment Overhead): The framework ships a Docker-level RL/SFT environment that the authors report cuts deployment overhead by roughly two orders of magnitude, i.e. about 100× (2026-05-30-AI-Digest).
Related
See also: MOC - Open Source Models, MOC - Agent Security.