MODEL

AgentDoG

model · topic-note · open-source · safety · alignment

Overview

AgentDoG (v1.5) is a lightweight, scalable safety-alignment framework for AI agents. It trains small open-model variants (0.8B–8B) on roughly 1k samples to act as real-time safety guardrails for frontier agents. It targets the cost-per-call constraint that makes closed-source guardrails impractical for high-volume agent deployments.

Timeline

  • 2026-05-30-AI-Digest: Paper published as arXiv:2605.29801 (▲82) by Dongrui Liu et al. It proposes a taxonomy-guided safety-alignment framework that trains lightweight 0.8B–8B variants on ~1k samples to match closed-source guardrails (notably GPT-5.4), with a Docker-level RL/SFT environment that cuts deployment overhead by roughly two orders of magnitude. Why it matters: it pushes small open models into a credible role as real-time safety guardrails for frontier agents, where cost per call is the real constraint.

Key Developments

  1. Lightweight 0.8B–8B Safety Variants on ~1k Samples: Taxonomy-guided alignment trains these variants on roughly 1k samples to match closed-source guardrails (notably GPT-5.4), which makes them a credible low-cost option for real-time safety guardrails around frontier agents (2026-05-30-AI-Digest).

  2. Docker-Level RL/SFT Environment, ~2 Orders of Magnitude Lower Deployment Overhead: The framework's RL/SFT environment ships at Docker-level granularity, and the authors report roughly two orders of magnitude lower deployment overhead than comparable closed-source guardrail pipelines (2026-05-30-AI-Digest).
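
Illustrative sketch (not from the paper): the guardrail role described above can be pictured as a gate that screens each action an agent proposes before it runs. Everything in the block is a hypothetical stand-in. `AgentAction`, `Verdict`, `toy_guard`, `guarded_execute`, and the two taxonomy categories are invented for illustration, and the substring check merely takes the place of a call to a small fine-tuned guard model. It shows only where such a model would sit in the loop, under those assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentAction:
    """One action an agent proposes, e.g. a tool call (hypothetical shape)."""
    tool: str
    argument: str


@dataclass(frozen=True)
class Verdict:
    """Guard decision: whether the action is safe, plus a risk category."""
    safe: bool
    category: str  # taxonomy label, or "none" when nothing matched


# Hypothetical two-entry risk taxonomy. These categories and triggers are
# made up for the sketch; they are not the taxonomy used by AgentDoG.
HYPOTHETICAL_TAXONOMY = {
    "destructive_command": ("rm -rf", "mkfs"),
    "credential_exfiltration": ("id_rsa", ".aws/credentials"),
}


def toy_guard(action: AgentAction) -> Verdict:
    """Stand-in for a small guard model: flags actions by substring match."""
    text = f"{action.tool} {action.argument}".lower()
    for category, triggers in HYPOTHETICAL_TAXONOMY.items():
        if any(trigger in text for trigger in triggers):
            return Verdict(safe=False, category=category)
    return Verdict(safe=True, category="none")


def guarded_execute(
    action: AgentAction,
    guard: Callable[[AgentAction], Verdict],
    execute: Callable[[AgentAction], str],
) -> str:
    """Run the guard first; call the executor only if the action is judged safe."""
    verdict = guard(action)
    if not verdict.safe:
        return f"blocked ({verdict.category})"
    return execute(action)


if __name__ == "__main__":
    def fake_executor(action: AgentAction) -> str:
        # Pretends to run the action; a real agent would invoke the tool here.
        return f"ran {action.tool}"

    print(guarded_execute(AgentAction("shell", "ls"), toy_guard, fake_executor))
    print(guarded_execute(AgentAction("shell", "rm -rf /"), toy_guard, fake_executor))
```

Because the guard runs on every proposed action, its per-call cost is multiplied by the agent's step count, which is why the note treats cost per call as the constraint that matters.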

See also: MOC - Open Source Models, MOC - Agent Security.