IMPLEMENTATION-PLAN.md

AgentContainers — Implementation Plan

Delivery Philosophy

Ship in vertical slices. Each phase must be independently runnable and leave the repo in a better state than it found it. No phase should be a pure infrastructure phase that blocks all visible progress.

The first milestone is intentionally narrow: prove the matrix model works end to end before expanding the combination matrix.

Phase 0 — Repository Bootstrap

Goal: Establish the repo scaffold, toolchain decisions, and contribution conventions before any implementation begins.

Duration estimate: 1–2 days

Deliverables

[ ] Final repo directory structure created (see ARCHITECTURE.md §5)
[ ] .editorconfig and .gitattributes committed
[ ] CONTRIBUTING.md drafted
[ ] JSON Schema files created for all manifest types (stubs acceptable)
[ ] Generator project scaffold created (src/AgentContainers.Generator, src/AgentContainers.Core)
[ ] Script entrypoints created (scripts/generate.ps1, scripts/generate.sh) as no-op stubs
[ ] Baseline CI workflow committed (runs on PR; no build logic yet)

Acceptance Criteria

[ ] Repo structure matches ARCHITECTURE.md §5 exactly
[ ] dotnet build src/ succeeds (even with empty projects)
[ ] CI workflow runs and passes on an empty PR

Key Decisions Resolved in This Phase

Generator language: .NET 10 (C#) — enforced by project creation
Definition format: YAML — enforced by schema stub creation
Template engine: Scriban — added as NuGet dependency in this phase

Phase 1 — Core Matrix Model

Goal: Implement the manifest schema, generator core, and drift detection. No Docker required yet.

Duration estimate: 3–5 days

Deliverables

[ ] JSON Schema definitions finalized for all manifest types (see MANIFEST-MODEL.md)
[ ] AgentContainers.Core library implementing:
- Manifest deserialization (YAML → typed C# models)
- Inheritance/layer resolution
- Compatibility rule evaluation
- Capability token matching
- Combination matrix expansion
[ ] AgentContainers.Generator CLI implementing:
- generate command
- validate command (schema + reference + compatibility checks)
- list-matrix command (dry-run, prints planned combinations)
- Scriban template rendering
- Stable output writing with content-hash tracking
[ ] Sample definitions for at least one base, one agent, and one toolpack (stubs)
[ ] Generated output folder structure populated from stubs
[ ] Drift detection CI step: generate && git diff --exit-code generated/
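To make the capability-matching and matrix-expansion deliverables concrete, here is a minimal sketch in Python (not the planned C# implementation). It assumes each base declares a set of provided capability tokens and each agent a set of required tokens, and that image IDs take the form `<base>-<agent>`. The `Base`, `Agent`, `is_compatible`, and `expand_matrix` names, and the specific tokens, are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Base:
    id: str
    provides: frozenset  # capability tokens the base offers (assumed shape)


@dataclass(frozen=True)
class Agent:
    id: str
    requires: frozenset  # capability tokens the agent needs (assumed shape)


def is_compatible(base: Base, agent: Agent) -> bool:
    """An agent fits a base when every required token is provided."""
    return agent.requires <= base.provides


def expand_matrix(bases, agents):
    """Return sorted image IDs for every compatible base x agent pair."""
    return sorted(
        f"{base.id}-{agent.id}"
        for base, agent in product(bases, agents)
        if is_compatible(base, agent)
    )


# Hypothetical definitions: the token names are illustrative only.
bases = [
    Base("node-bun", frozenset({"node", "bun"})),
    Base("python", frozenset({"python"})),
]
agents = [Agent("claude", frozenset({"node"}))]

print(expand_matrix(bases, agents))
```

Sorting the result keeps the planned matrix stable across runs, which matters for the byte-identical output requirement.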

Acceptance Criteria

[ ] dotnet run -- validate passes on valid definitions and fails clearly on invalid ones
[ ] dotnet run -- generate produces byte-identical output on repeated runs
[ ] Invalid compatibility (e.g., Node-requiring agent on Python-only base) fails with a clear error
[ ] CI drift detection step fails on a PR that modifies a definition without regenerating
[ ] dotnet run -- list-matrix emits a tabular summary of planned image IDs
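The byte-identical output and drift detection criteria both rest on deterministic rendering plus a content hash per file. The following Python sketch illustrates one way this could work, assuming line endings are normalized to LF and files are processed in sorted path order. The function names and the choice of SHA-256 are assumptions, not decisions recorded in this plan.

```python
import hashlib


def render_stable(files: dict) -> dict:
    """Normalize line endings and return {path: bytes} in sorted path order."""
    return {
        path: files[path].replace("\r\n", "\n").encode("utf-8")
        for path in sorted(files)
    }


def content_hashes(rendered: dict) -> dict:
    """Map each output path to the SHA-256 digest of its bytes."""
    return {
        path: hashlib.sha256(data).hexdigest()
        for path, data in rendered.items()
    }


def drifted(committed: dict, regenerated: dict) -> list:
    """Paths whose hash differs, or that exist on only one side."""
    paths = set(committed) | set(regenerated)
    return sorted(p for p in paths if committed.get(p) != regenerated.get(p))


# Two runs over the same logical content, differing only in input order
# and line endings, should hash identically.
run_a = content_hashes(render_stable({"b.txt": "x\r\n", "a.txt": "y\n"}))
run_b = content_hashes(render_stable({"a.txt": "y\n", "b.txt": "x\n"}))
print(drifted(run_a, run_b))
```

An empty list from `drifted` corresponds to `git diff --exit-code generated/` exiting cleanly.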

Unit Test Targets (Phase 1)

Manifest deserialization round-trip
Inheritance resolution with three levels of nesting
Compatibility rule evaluation: valid and invalid combinations
Template rendering for a minimal Dockerfile
Drift detection: content hash comparison
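As an illustration of the three-level inheritance test target, the Python sketch below resolves a manifest against its ancestors. It assumes a hypothetical `extends` key, that list values accumulate from parent to child, and that scalar values are overridden by the child. The actual merge rules are defined in MANIFEST-MODEL.md and may differ. The `debian` root and the package names are invented for the example.

```python
def resolve(manifests: dict, name: str, _seen=()) -> dict:
    """Merge a manifest with its ancestors; child values take precedence."""
    if name in _seen:
        raise ValueError("inheritance cycle: " + " -> ".join(_seen + (name,)))
    manifest = manifests[name]
    parent = manifest.get("extends")
    merged = resolve(manifests, parent, _seen + (name,)) if parent else {}
    for key, value in manifest.items():
        if key == "extends":
            continue
        if isinstance(value, list):
            merged[key] = merged.get(key, []) + value  # lists accumulate
        else:
            merged[key] = value  # scalars override
    return merged


# Hypothetical three-level chain.
manifests = {
    "debian": {"packages": ["curl"], "user": "root"},
    "node-bun": {"extends": "debian", "packages": ["nodejs"], "user": "dev"},
    "node-bun-claude": {"extends": "node-bun", "packages": ["claude-cli"]},
}

print(resolve(manifests, "node-bun-claude"))
```

The cycle check is worth a dedicated test case as well, since a self-referencing manifest would otherwise recurse without bound.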

Phase 2 — Foundational Image Families

Goal: Ship real base images and a combo image. Prove the build pipeline works end to end.

Duration estimate: 3–5 days

Deliverables

[ ] definitions/bases/node-bun.yaml — full manifest
[ ] definitions/bases/python.yaml — full manifest
[ ] definitions/bases/dotnet.yaml — full manifest
[ ] definitions/combos/node-py-dotnet.yaml
[ ] definitions/common-tools/default.yaml
[ ] Generated Dockerfiles for all bases and the combo
[ ] Runtime smoke test expectations declared in each manifest
[ ] scripts/build-local.ps1 — builds a single image by ID
[ ] CI build workflow: builds all base and combo images on PR
[ ] Generated docs/image-table.md listing all available images

Acceptance Criteria

[ ] All base images build from generated Dockerfiles without errors
[ ] Each base image passes its declared runtime validation commands
[ ] Common tools are present and correct in each image
[ ] Combo image contains all three runtimes and passes combined validation
[ ] Tags are generated in the canonical format
[ ] OCI labels are present on built images
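For the OCI label criterion, a generator could emit `LABEL` instructions from a small map of standard annotation keys. The Python sketch below uses the `org.opencontainers.image.*` keys from the OCI image specification. Which keys the project actually requires is an assumption, as are the helper names and the example values (including the placeholder source URL).

```python
def oci_labels(image_id, version, revision, source, created):
    """Build the label map; keys are standard OCI annotation names."""
    return {
        "org.opencontainers.image.title": image_id,
        "org.opencontainers.image.version": version,
        "org.opencontainers.image.revision": revision,
        "org.opencontainers.image.source": source,
        "org.opencontainers.image.created": created,
    }


def label_lines(labels):
    """Render one Dockerfile LABEL instruction per key, sorted for stable output."""
    return [f'LABEL {key}="{value}"' for key, value in sorted(labels.items())]


# Hypothetical values; the timestamp is passed in rather than read from the
# clock so that repeated generation stays deterministic.
lines = label_lines(
    oci_labels(
        image_id="node-bun",
        version="2025.01.15",
        revision="abc1234",
        source="https://example.com/agentcontainers",
        created="2025-01-15T00:00:00Z",
    )
)
print("\n".join(lines))
```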

Deferred to Later Phases

Rust, C/C++, Haskell base images
ARM64 platform support
Multi-stage builds for size optimization

Phase 3 — Agent Overlays

Goal: Install and validate the first four agent providers (Claude, OpenClaw, Codex, Copilot). Formalize the overlay contract.

Duration estimate: 3–5 days

Deliverables

[ ] definitions/agents/claude.yaml — full manifest with env, mounts, health, install
[ ] definitions/agents/openclaw.yaml — full manifest
[ ] definitions/agents/codex.yaml — full manifest
[ ] definitions/agents/copilot.yaml — full manifest
[ ] Generated Dockerfiles for all base × agent combinations (v1 matrix)
[ ] Agent overlay template (templates/dockerfiles/agent.dockerfile.scriban)
[ ] Runtime smoke tests for each agent overlay
[ ] Per-agent documentation page (docs/agents/claude.md, docs/agents/openclaw.md, docs/agents/codex.md, docs/agents/copilot.md)

v1 Agent Matrix

| Base | Claude | OpenClaw | Codex | Copilot |
|---|---|---|---|---|
| node-bun | ✓ | ✓ | ✓ | ✓ |
| python | deferred | ✓ | deferred | deferred |
| dotnet | deferred | deferred | deferred | deferred |
| node-py-dotnet | ✓ | ✓ | ✓ | ✓ |

Acceptance Criteria

[ ] Each agent overlay builds successfully on compatible bases
[ ] Attempting to build an incompatible combination fails with a clear generator error
[ ] Each agent image passes agent-specific runtime smoke tests
[ ] All required environment variables are documented in the manifest
[ ] Config directories and persistence paths are documented in the manifest

Key Integration Points (from External Reference Synthesis)

Claude Code:

Likely requires ~/.config/claude bind mount or equivalent volume for config persistence
Expects ANTHROPIC_API_KEY injected via environment
Non-root user ergonomics: must run as the dev user without elevated permissions
Interactive and non-interactive modes both supported

OpenClaw:

Service/container pattern: exposes an API or service endpoint
Requires container networking awareness (compose network, port declaration)
Configuration driven via environment variables
Compose-friendly: health endpoint expected

Phase 4 — Tool Packs

Goal: Demonstrate that optional tool packs compose cleanly without image explosion.

Duration estimate: 2–3 days

Deliverables

[ ] definitions/toolpacks/headroom.yaml — full manifest
[ ] Generated Dockerfiles for all agent × headroom combinations in the v1 matrix
[ ] Headroom compose sidecar fragment (generated/compose/fragments/headroom-service.yaml)
[ ] Tool pack documentation page (docs/toolpacks/headroom.md)

Headroom Integration Specifics

Headroom runs primarily as a proxy/optimization sidecar rather than as a service baked into the agent image. The tool pack manifest must provide:

A standalone Headroom sidecar service definition (for compose stacks that want it as a separate container)
An optional embedded variant (if Headroom can be co-installed in the agent image)
Shared network and environment variable coordination schema
Health/readiness dependency declarations so agent containers wait for Headroom to be ready

The headroom tool pack installs the Headroom CLI into agent images and documents the environment variables required to route traffic through the proxy. The sidecar pattern is the recommended v1 deployment model.

Acceptance Criteria

[ ] Headroom tool pack builds and passes smoke tests
[ ] Tool pack compatibility rules are enforced (fails on incompatible base)
[ ] Headroom compose fragment includes healthcheck: and depends_on: condition: service_healthy
[ ] Documentation explains sidecar vs. embedded deployment options
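The healthcheck criterion above can be verified mechanically once the compose fragment is parsed. The Python sketch below checks an already-parsed compose document (a plain dict) for a `healthcheck` on the dependency and a `service_healthy` condition on every dependent. The `check_health_wiring` helper is hypothetical, and the healthcheck command in the example is a placeholder, not a real Headroom command.

```python
def check_health_wiring(compose: dict, dependency: str) -> list:
    """Return a list of problems; an empty list means the wiring is correct."""
    problems = []
    services = compose.get("services", {})
    if "healthcheck" not in services.get(dependency, {}):
        problems.append(f"{dependency}: missing healthcheck")
    for name, service in services.items():
        deps = service.get("depends_on", {})
        if dependency not in deps:
            continue
        # The short list form of depends_on cannot carry a condition.
        condition = (
            deps[dependency].get("condition") if isinstance(deps, dict) else None
        )
        if condition != "service_healthy":
            problems.append(f"{name}: depends_on {dependency} lacks service_healthy")
    return problems


# Parsed form of a hypothetical fragment.
good = {
    "services": {
        "headroom": {"healthcheck": {"test": ["CMD", "true"]}},  # placeholder test
        "openclaw": {
            "depends_on": {"headroom": {"condition": "service_healthy"}}
        },
    }
}
bad = {"services": {"headroom": {}, "openclaw": {"depends_on": ["headroom"]}}}

print(check_health_wiring(good, "headroom"))
print(check_health_wiring(bad, "headroom"))
```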

Phase 5 — Compose Topologies

Goal: Ship realistic, working Compose examples that prove multi-agent setups are easy to launch.

Duration estimate: 3–4 days

Deliverables

[ ] generated/compose/examples/solo-claude/docker-compose.yaml
[ ] generated/compose/examples/dual-agent/docker-compose.yaml
[ ] generated/compose/examples/gateway-headroom/docker-compose.yaml
[ ] Compose smoke test (scripts/validate.ps1 --compose solo-claude)
[ ] docs/compose/ documentation for each example
[ ] .env.example file per stack documenting required environment variables

Stack Descriptions

Solo Claude:

1 container: claude agent on node-bun base
Mounts: ./workspace:/workspace, ~/.config/claude:/home/dev/.config/claude
Network: none required externally
Health: Claude availability check

Dual Agent:

2 containers: claude (node-bun) + openclaw (node-bun)
Shared workspace volume
Separate state volumes
Shared internal network

Gateway + Headroom:

3 containers: openclaw + headroom sidecar + optional claude
OpenClaw depends on Headroom health
Headroom proxies token requests
External port exposure on OpenClaw API

Acceptance Criteria

[ ] Each stack starts with docker compose up using documented prerequisites
[ ] Health checks are wired with depends_on: condition: service_healthy where applicable
[ ] Persistent volumes are named and documented
[ ] .env.example lists every required variable
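The last criterion lends itself to an automated check: compare the variables a stack requires against those declared in its `.env.example`. The Python sketch below shows one possible approach, assuming simple `NAME=value` lines with `#` comments. The helper names and the `OPENCLAW_PORT` variable are hypothetical; `ANTHROPIC_API_KEY` comes from the Claude integration notes in Phase 3.

```python
def declared_vars(env_example: str) -> set:
    """Collect variable names from NAME=value lines, skipping comments."""
    names = set()
    for line in env_example.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.add(line.split("=", 1)[0].strip())
    return names


def missing_vars(required, env_example: str) -> list:
    """Required variables that the example file does not declare."""
    return sorted(set(required) - declared_vars(env_example))


example = """\
# Credentials for the Claude agent
ANTHROPIC_API_KEY=
"""

# OPENCLAW_PORT is an invented name used only to show a failing case.
print(missing_vars(["ANTHROPIC_API_KEY", "OPENCLAW_PORT"], example))
```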

Phase 6 — Publishing and Cataloging

Goal: Automate image publication and produce a usable catalog.

Duration estimate: 2–3 days

Deliverables

[ ] .github/workflows/publish.yaml — pushes to registry on merge to default branch
[ ] Tag policy enforcement in generator (definitions/tag-policies/default.yaml)
[ ] generated/manifests/image-catalog.json — machine-readable catalog
[ ] Generated IMAGE-CATALOG.md — human-readable catalog table
[ ] Selective build logic: only rebuild images whose definitions changed (based on manifest hash)
[ ] Registry configuration: support for GitHub Container Registry (ghcr.io) as primary
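The selective build deliverable can be sketched as a comparison between the current manifest hashes and those recorded at the last publish. The Python example below assumes manifests are hashed in a canonical JSON form with sorted keys. The function names and manifest fields are hypothetical, and a real implementation would likely need to hash the fully resolved manifest so that changes in inherited layers also trigger a rebuild.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash a canonical JSON encoding so key order does not affect the digest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def images_to_rebuild(current: dict, previous_hashes: dict) -> list:
    """Image IDs that are new or whose manifest hash has changed."""
    return sorted(
        image_id
        for image_id, manifest in current.items()
        if previous_hashes.get(image_id) != manifest_hash(manifest)
    )


# Hypothetical manifests and a recorded state from the previous publish run.
previous = {"node-bun": manifest_hash({"runtime": "node", "version": "22"})}
current = {
    "node-bun": {"version": "22", "runtime": "node"},  # same content, new key order
    "python": {"runtime": "python", "version": "3.12"},  # newly added
}

print(images_to_rebuild(current, previous))
```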

Acceptance Criteria

[ ] Only changed images are rebuilt and pushed on merge
[ ] Published images carry full OCI label set
[ ] Image catalog is regenerated and committed on each publish run
[ ] Tags include CalVer + latest + git SHA
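As a rough illustration of the tag set named in the last criterion, the Python sketch below derives three tags from a build date and a commit SHA. The `YYYY.MM.DD` CalVer layout, the `sha-` prefix, and the 7-character short SHA are assumptions. The canonical format is whatever definitions/tag-policies/default.yaml ends up specifying.

```python
from datetime import date


def image_tags(build_date: date, git_sha: str) -> list:
    """Return the CalVer, latest, and short-SHA tags for one build."""
    calver = f"{build_date.year}.{build_date.month:02d}.{build_date.day:02d}"
    return [calver, "latest", f"sha-{git_sha[:7]}"]


# Hypothetical build inputs.
tags = image_tags(date(2025, 1, 15), "abc1234def5678")
print(tags)
```

Passing the date in as a parameter, rather than reading the clock inside the function, keeps the tag logic easy to unit test.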

Phase 7 — Hardening and Scale-Out

Goal: Security hardening, provenance, community extensibility. Post-v1 milestone.

Deliverables (Post-v1)

[ ] Trivy/Grype vulnerability scan integrated into CI (gate on HIGH severity)
[ ] SBOM generation using Syft or Docker BuildKit SBOM output
[ ] SLSA provenance attestation (Level 1 minimum)
[ ] Policy check for risky manifest declarations (e.g., privileged: true without documented rationale)
[ ] Additional base images: rust, cpp, haskell
[ ] Additional agent overlays: opencode, gemini
[ ] Additional tool packs: gh-azure, discord, build-tools, diagnostics
[ ] ARM64 multi-platform builds

Work Tracks and Parallelism

These tracks can proceed in parallel after Phase 0 is complete.

| Track | Owner Focus | Primary Output |
|---|---|---|
| A: Generator & Schema | Core engine, schema design | `src/`, `schemas/`, `definitions/` |
| B: Image Authoring | Manifest writing, Dockerfile validation | `definitions/`, `generated/dockerfiles/` |
| C: Validation & Testing | Smoke tests, CI harness | `src/AgentContainers.Validation/`, CI |
| D: Compose & Examples | Compose stacks, fragment model | `generated/compose/` |
| E: Docs & Catalog | Docs, catalog generation | `docs/`, `IMAGE-CATALOG.md` |

Track A must complete Phase 1 before Track B can produce real artifacts. All other tracks can begin scaffold work as soon as Phase 0 is complete.

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Image matrix explosion | High | Medium | Explicit publish profiles; curated v1 matrix |
| Agent installer instability (upstream changes) | High | High | Pin installer versions; isolate per-agent install fragments |
| Compose complexity / fragile examples | Medium | Medium | Start with simple stacks; add complexity incrementally |
| Generator drift (defs vs. generated) | Medium | High | Drift detection CI step; enforce on every PR |
| Scriban template complexity leaking logic | Medium | Medium | Keep templates logic-minimal; complex decisions in generator |
| Non-root permission surprises per agent | Medium | Medium | Validate each agent at smoke-test time under `dev` user |

First End-to-End Milestone (MVP)

This milestone is the target for the first demo-able state of the system.

Scope:

Generator runs and produces correct Dockerfiles for node-bun-claude, node-bun-openclaw, node-bun-codex, node-bun-copilot, node-py-dotnet-openclaw-headroom
All five images build locally
solo-claude Compose example starts and passes health check
CI drift detection catches a definition change that was not regenerated
OCI labels are present on all built images

Not required for this milestone: publishing, full matrix, additional agents, SBOM.
