# Agent Orchestration Guide
How to plan and execute parallel LLM agent waves for documentation and test creation tasks.
## Overview
Use parallel agents when you have more than three independent tasks of similar complexity. This pattern was validated across 20+ waves (Sessions 9-29) producing 750+ Behat scenarios and updating 200+ documentation files.
Core rules:
| Rule | Rationale |
|---|---|
| One agent per output file | Prevents merge conflicts and keeps scope narrow |
| Opus model for all agents | Sub-opus models miss edge cases and hallucinate fixture data |
| Inherit parent permissions | Agents must not prompt for Bash/Edit/Read permissions the parent already has |
| Never poll subagents | Launch and wait. Polling wastes primary context and can cause race conditions |
| Pre-assign all resources | DBs, BT keys, output file names decided before launch |
Execution model: This guide covers the operational mechanics of parallel agents. For the higher-level LLM-human collaboration model, see LLM Workflow Guide. For Behat-specific creation patterns, see Behat Creation Guide.
## Resource Planning
### DB Pool
Ten MySQL databases are available: pcrai_test_01 through pcrai_test_10. Each test-running agent gets exactly one DB. Checkout and check-in go through tests/scripts/db-pool.json, guarded with flock to prevent races.
```shell
# Check current pool status
cat tests/scripts/db-pool.json
```
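A checkout helper can make the claim atomic. The sketch below is illustrative only — it assumes the pool JSON holds a `pools` array of `{db, locked}` objects, which may not match the real schema, and it operates on a throwaway file in /tmp:

```shell
# Demo pool file; the real one lives at tests/scripts/db-pool.json and
# its schema may differ — adjust the jq paths to match
POOL=/tmp/db-pool-demo.json
cat > "$POOL" <<'EOF'
{"pools": [{"db": "pcrai_test_01", "locked": false},
           {"db": "pcrai_test_02", "locked": false}]}
EOF

# Atomically claim the first unlocked DB: flock serializes concurrent
# agents, jq finds a free slot and marks it locked
checkout_db() {
  flock "$POOL" -c '
    db=$(jq -r "[.pools[] | select(.locked == false)][0].db" '"$POOL"')
    [ "$db" = "null" ] && exit 1
    jq "(.pools[] | select(.db == \"$db\")).locked = true" '"$POOL"' \
      > /tmp/pool.tmp && mv /tmp/pool.tmp '"$POOL"'
    echo "$db"
  '
}

DB=$(checkout_db)
echo "checked out: $DB"
```

Check-in is the reverse: the same flock wrapper setting `.locked = false` for the agent's DB.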
### Port Allocation (Browser Tests)
Browser test agents need three resources each: an artisan server, a Chrome instance, and a Chrome profile directory.
| Agent | DB | Artisan Port | Chrome Debug Port | Chrome Profile |
|---|---|---|---|---|
| 1 | pcrai_test_01 | 8001 | 9223 | /shared/chrome-profiles/agent-01 |
| 2 | pcrai_test_02 | 8002 | 9224 | /shared/chrome-profiles/agent-02 |
| 3 | pcrai_test_03 | 8003 | 9225 | /shared/chrome-profiles/agent-03 |
| ... | ... | ... | ... | ... |
| 8 | pcrai_test_08 | 8008 | 9230 | /shared/chrome-profiles/agent-08 |
### BT Key and Output File Pre-Assignment
Before launching any wave, assign sequential BT keys and output filenames. This prevents collisions and makes post-wave tracking straightforward.
| Agent | DB | BT Key | Target | Input Files | Output | Status |
|-------|-----|---------|--------|-------------|--------|--------|
| 1 | pcrai_test_01 | BT-XXXX | RULE_NAME | std-rule-xxx.md, existing BT-YYYY | new feature file | pending |
| 2 | pcrai_test_02 | BT-XXXY | RULE_NAME | std-rule-yyy.md | new feature file | pending |
| 3 | — | — | DOC_UPDATE | srs-domain.md | edited file | pending |
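Generating the table mechanically avoids manual numbering slips. A rough sketch (columns abbreviated; the starting BT number and rule names below are invented for illustration):

```shell
# Emit one pre-assignment row per agent: sequential pool DBs and BT keys
emit_assignments() {
  start=$1; shift
  i=1
  for rule in "$@"; do
    printf '| %d | pcrai_test_%02d | BT-%04d | %s | pending |\n' \
      "$i" "$i" "$start" "$rule"
    start=$((start + 1)); i=$((i + 1))
  done
}

emit_assignments 1450 RULE_A RULE_B RULE_C
```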
### Agent Count Estimation
| Scenario | Max Agents | Notes |
|---|---|---|
| Behat API creation | 8-10 | Limited by DB pool; each takes 2-5 min per scenario |
| Browser test creation | 8 | Limited by ports 8001-8008 and Chrome instances |
| Doc updates (SRS/SDS/STD) | 10+ | No DB needed; limited only by context management |
| Quality review (QR) | 10+ | No DB needed; file review only |
| Mixed wave (creation + QR) | 8 creation + QR after | QR runs as a second pass, never concurrent with creation |
Small rules (1-3 scenarios) can be bundled 2-3 per agent. Large rules (8+ scenarios) should always get a dedicated agent.
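Turned into arithmetic, the bundling rule gives a quick head-count for a wave; the scenario counts below are made-up examples:

```shell
# Agents needed: small rules (<=3 scenarios) bundle three per agent,
# large rules (>=8) get dedicated agents, the rest one agent each
estimate_agents() {
  small=0; medium=0; large=0
  for n in "$@"; do
    if [ "$n" -le 3 ]; then small=$((small + 1))
    elif [ "$n" -ge 8 ]; then large=$((large + 1))
    else medium=$((medium + 1)); fi
  done
  echo $(( (small + 2) / 3 + medium + large ))   # ceil(small/3) + others
}

estimate_agents 2 3 1 9 12 5
```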
## Pre-Wave Setup Checklist
### Reset Pool Databases
The `--path` flags are critical. Without them, only 2 of 276 migrations run (telescope + jobs), and the seeder crashes on missing tables.
```shell
cd /shared/code/req_docs/code
# %02g zero-pads so names match pcrai_test_01..08 (seq -w 1 8 would not pad)
for i in $(seq -f "%02g" 1 8); do
  mysql -h 127.0.0.1 -u sail -ppassword -e "DROP DATABASE IF EXISTS pcrai_test_$i; CREATE DATABASE pcrai_test_$i;"
  DB_HOST=127.0.0.1 DB_DATABASE=pcrai_test_$i DB_USERNAME=sail DB_PASSWORD=password \
    php artisan migrate:fresh --path=database/migrations --path=database/migrations/app --path=database/migrations/audit --seed --quiet
done
```
### Lock DBs in Pool JSON
Update the pool JSON to mark assigned DBs as in-use before launching agents:
```shell
# Lock DB 01 for agent 1
flock tests/scripts/db-pool.json -c 'jq ".pools[0].locked = true" tests/scripts/db-pool.json > /tmp/pool.json && mv /tmp/pool.json tests/scripts/db-pool.json'
```
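Unlocking after the wave is the mirror operation. The sketch below runs against a throwaway copy (the real file is tests/scripts/db-pool.json, and its schema may differ):

```shell
# Demo file with DB 01 held; swap in the real pool path for actual use
POOL=/tmp/db-pool-release.json
echo '{"pools": [{"db": "pcrai_test_01", "locked": true}]}' > "$POOL"

# Release under the same flock used for locking, so the two never race
flock "$POOL" -c '
  jq "(.pools[] | select(.db == \"pcrai_test_01\")).locked = false" '"$POOL"' \
    > /tmp/pool.tmp && mv /tmp/pool.tmp '"$POOL"'
'
jq '.pools[0].locked' "$POOL"
```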
### Browser Test Infrastructure (If Needed)
```shell
# Start Chrome instances (one per agent)
for i in $(seq 1 8); do
  PORT=$((9222 + i))
  PROFILE="/shared/chrome-profiles/agent-$(printf '%02d' $i)"
  mkdir -p "$PROFILE"
  google-chrome --headless --disable-gpu --remote-debugging-port=$PORT --user-data-dir="$PROFILE" &
done
```
```shell
# Start artisan servers (one per agent)
cd /shared/code/req_docs/code
for i in $(seq 1 8); do
  PORT=$((8000 + i))
  DB="pcrai_test_$(printf '%02d' $i)"
  PHP_CLI_SERVER_WORKERS=8 DB_HOST=127.0.0.1 DB_DATABASE=$DB DB_USERNAME=sail DB_PASSWORD=password \
    php artisan serve --port=$PORT --no-reload &
done
```
### Verify Clean State
```shell
# Check for stale Chrome locks
find /shared/chrome-profiles/ -name "SingletonLock" -delete 2>/dev/null

# Verify all DBs are accessible (%02g pads to match pcrai_test_01..08)
for i in $(seq -f "%02g" 1 8); do
  mysql -h 127.0.0.1 -u sail -ppassword -e "SELECT 1" pcrai_test_$i > /dev/null 2>&1 && echo "pcrai_test_$i: OK" || echo "pcrai_test_$i: FAIL"
done
```
## Creation Agent Prompt Template
### Generic Template
Adapt this template for any workstream. Replace {PLACEHOLDERS} with wave-specific values.
```markdown
## Task: {TASK_DESCRIPTION}
### READ FIRST
Read these files before starting:
1. {GUIDE_PATH} — authoritative guide for this workstream
2. {EXISTING_EXAMPLE} — example of the desired output format
3. {INPUT_FILE} — source requirements/specifications
### What to Create
- {OUTPUT_FILES_LIST}
### Details
{DETAILS_TABLE_OR_REQUIREMENTS}
### Iterative Strategy (for test creation)
1. Create all fixtures/files
2. Dry-run to verify parsing
3. Run with minimal assertions first
4. Check actual output values
5. Update assertions to match actual values
6. Re-run to confirm all pass
### Running Tests
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE={DB_NAME} \
DB_USERNAME=sail DB_PASSWORD=password RATE_LIMIT_MAX_ATTEMPTS=9999 \
./vendor/bin/behat "{FEATURE_FILE_PATH}"
### Success Criteria
- {CRITERIA_LIST}
```
### Variant: Doc Update Agent (No DB)
```markdown
## Task: Update {DOC_TYPE} for {DOMAIN}
### READ FIRST
1. {AUTHORING_GUIDE} — format and conventions
2. {EXISTING_DOC} — file to update
3. {CHANGE_MANIFEST_OR_DIFF} — what changed
### What to Update
- File: {DOC_PATH}
- Sections affected: {SECTION_LIST}
### Rules
- Preserve existing REQ IDs (immutable)
- Add {#anchor-name} for new sections
- Update cross-references to other docs
- Add Reviewer Notes entry for each change
### Output
Summary of changes: sections added/modified/removed, new REQ IDs (if any).
```
### Variant: Behat API Test Agent
```markdown
## Task: Create Behat scenarios for {RULE_NAME}
### READ FIRST
1. docusaurus/docs/guides/llm/guide-llm-behat-creation.md — 40 gotchas, config pitfalls
2. {STD_FILE} — test vectors to implement
3. {EXISTING_BT_FILE} — example of working test for this rule
### What to Create
- Feature file: tests/exports/v3/{OUTPUT_FILE}
- Config: tests/support_files/{BT_KEY}/config.xlsx (copy from {BASE_CONFIG})
- Run files: tests/support_files/{BT_KEY}/*.json
### Test Vectors
{TV_TABLE}
### Environment
cd /shared/code/req_docs/code && \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE={DB_NAME} \
DB_USERNAME=sail DB_PASSWORD=password RATE_LIMIT_MAX_ATTEMPTS=9999 \
./vendor/bin/behat "tests/exports/v3/{OUTPUT_FILE}"
### Tags
- Feature-level: @USE_SAME_CONFIG (line 1, before Feature:)
- Scenario-level: @{BT_KEY} @TV-{RULE}-{NNN}-{NNN}
### Success Criteria
- All scenarios pass
- Every TV tag maps to a real STD entry
- Assertions check specific output values (not just "no error")
```
### Variant: Browser Test Agent
```markdown
## Task: Create browser tests for {DOMAIN}
### READ FIRST
1. docusaurus/docs/guides/guide-browser-tests.md — step defs, gotchas, parallel execution
2. {STD_FILE} — test cases to implement
3. code/features/bootstrap/BrowserContext.php — available step definitions
### What to Create
- Feature file: tests/exports/browser/{OUTPUT_FILE}
### Infrastructure
Artisan: http://127.0.0.1:{ARTISAN_PORT}
Chrome: 127.0.0.1:{CHROME_PORT}
DB: {DB_NAME}
### Key Gotchas
- Vue SPA needs Pusher keys baked into build
- PHP_CLI_SERVER_WORKERS=8 required (--no-reload)
- BaseTextbox has 500ms debounce — wait 800ms+ before clicking submit
- SVG elements: use JS querySelector, not Mink CSS selectors
- Notifications auto-dismiss after 4s — use "I should see a notification containing"
### Success Criteria
- All scenarios pass
- Each scenario tests a specific TC from the STD
- No flaky waits (use explicit element waits, not sleep)
```
## QR (Quality Review) Protocol
Quality Review is the most important step in each wave. It catches fabricated tags, false passes, and weak assertions before they enter the test suite. QR ran after every wave in Sessions 9-29 and flagged 6-12% of items as WRONG in API tests, and 3-8% in browser tests.
### 7-Dimension Checklist
| # | Dimension | What to Check |
|---|---|---|
| 1 | Scenario execution | Every scenario runs and passes |
| 2 | Tag correctness | @TV-/@TC- tags map to real STD entries; no fabricated tags |
| 3 | Assertion quality | Assertions test specific content/behavior, not just "page loads" or "no error" |
| 4 | STD alignment | Scenarios match test cases defined in STD files |
| 5 | No duplicate coverage | New scenarios are not redundant with existing ones |
| 6 | File organization | Background usage, per-scenario tags, feature-level @USE_SAME_CONFIG on line 1 |
| 7 | Fixture availability | Referenced config files, run files, and users exist in tests/support_files/ |
### Rating Scale
| Rating | Meaning | Action Required |
|---|---|---|
| CORRECT | Scenario passes and assertions are meaningful | None |
| WEAK | Passes but assertions could be stronger | Suggest improvements; fix if straightforward (tag corrections, assertion strengthening) |
| WRONG | Tag mismatch, missing assertion, or false-pass risk | Must fix before proceeding to next wave |
### QR Agent Template
```markdown
## Task: Quality Review of {FILE_LIST}
You are a QR agent. Review each scenario in the given feature file(s).
### READ FIRST
1. {STD_FILE} — source of truth for TV/TC definitions
2. {FEATURE_FILES} — files to review
### For Each Scenario, Check:
1. Runs and passes (execute the test if DB assigned, otherwise review only)
2. @TV-/@TC- tags map to real STD entries
3. Assertions test the stated requirement (not just "page loads" or "no error")
4. No duplicate coverage with existing scenarios
5. Data fixtures exist in tests/support_files/
6. @USE_SAME_CONFIG on line 1 if single-config file
7. BT key tag is first scenario-level tag (not @USE_SAME_CONFIG)
### Report Format
For each scenario:
- **CORRECT**: Passes, assertions meaningful
- **WEAK**: Passes, but [specific improvement suggestion]
- **WRONG**: [specific issue — must fix]
### Fix Protocol
- Fix all WRONG items immediately
- Fix WEAK items if straightforward (tag corrections, assertion strengthening)
- For WEAK items requiring structural changes, document but do not fix
### Summary
Report totals: N CORRECT, N WEAK, N WRONG
List all WRONG items with file:line and fix applied.
```
### Timing and Resource Rules
| Rule | Detail |
|---|---|
| QR runs after all creation agents complete | Never run QR concurrent with creation |
| QR agents do not need a DB | They review files and check tags — no test execution unless specifically assigned |
| Fix all WRONG before next wave | WRONG items left unfixed compound into larger problems |
| One QR agent per 15-30 scenarios | ~15 min per agent at this load |
| QR agents can review multiple files | Unlike creation agents, QR agents handle batches |
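The batch-size guideline implies a simple staffing formula; the wave size here is an example:

```shell
# One QR agent per ~25 scenarios (midpoint of the 15-30 guideline)
scenarios=110
per_agent=25
qr_agents=$(( (scenarios + per_agent - 1) / per_agent ))   # ceiling division
echo "QR agents needed: $qr_agents"
```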
### Empirical WRONG Rates
| Test Type | WRONG Rate | Common Issues |
|---|---|---|
| API tests (Behat) | 6-12% | Fabricated TV tags, stale fixture references, assertions checking wrong field |
| Browser tests | 3-8% | Incorrect CSS selectors, timing issues, assertions on auto-dismissed elements |
| Doc updates | 2-5% | Broken cross-references, wrong anchor names, stale counts |
## Post-Wave Checklist
Run through this after every wave completes and QR passes:
- [ ] All creation agents completed successfully
- [ ] QR pass completed — 0 WRONG items remaining
- [ ] All new/modified files committed
- [ ] Tracking docs updated (dashboard, traceability matrix, coverage report)
- [ ] Resource pool released (DBs unlocked in pool JSON, artisan/Chrome stopped if browser wave)
- [ ] Session handoff updated with wave results
- [ ] Issues logged in traceability/known-issues.md (if any discovered)
- [ ] BT key counter updated in MEMORY.md
- [ ] Gap totals recalculated and recorded
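The resource-release item can be scripted. This teardown sketch operates on a throwaway pool copy, and the pkill patterns are assumptions — match them to whatever command lines the wave actually launched:

```shell
# Unlock every pool DB in one pass (demo file; real pool is
# tests/scripts/db-pool.json)
POOL=/tmp/db-pool-teardown.json
echo '{"pools": [{"db": "pcrai_test_01", "locked": true},
                 {"db": "pcrai_test_02", "locked": true}]}' > "$POOL"
flock "$POOL" -c '
  jq ".pools[].locked = false" '"$POOL"' > /tmp/pool.tmp \
    && mv /tmp/pool.tmp '"$POOL"'
'

# Browser waves: also stop per-agent artisan servers and Chrome instances
# (patterns are guesses — adjust to the actual launch commands)
pkill -f 'artisan serve --port=80' 2>/dev/null || true
pkill -f 'remote-debugging-port=92' 2>/dev/null || true

jq '[.pools[].locked] | all(. == false)' "$POOL"
```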
## Parallelization by Phase
This table maps the Version Update Workflow phases to agent parallelization patterns.
| Phase | Parallelizable | Pattern | Max Agents |
|---|---|---|---|
| 0: Confluence Ingestion | Partially | One agent per domain section | ~10 |
| 1: Change Detection | Partially | Inventory extraction: 6 parallel agents | 6 |
| 2: Impact Analysis | No | Requires complete Change Manifest | 1 |
| 3: SRS Updates | Yes | One agent per domain | ~10 |
| 4: SDS Updates | Partially | Independent sections in parallel | ~5 |
| 5a: STD Updates | Yes | One agent per STD file | ~10 |
| 5b: Behat API Creation | Yes | Wave-based, 8-10 per wave + QR pass | 10 |
| 5c: Browser Test Creation | Yes | Wave-based, 8 per wave + QR pass | 8 |
| 6: Code Mapping Sync | Yes | One agent per code module | ~10 |
| 7: Validation | No | Sequential validation suite | 1 |
Key constraint: Phases 5b and 5c are the most agent-intensive. Plan for multiple waves within each. A typical STD reconciliation takes 6-8 waves of 8-10 agents each, plus one QR wave per creation wave.
## Version Upgrade Agent Patterns
When upgrading application code between versions (e.g., v3.0.0 to v3.0.1), the work decomposes into distinct phases with different parallelization characteristics. These patterns were validated during the v3.0.0-to-v3.0.1 upgrade (Session 35).
### Typical Wave Structure
| Wave | What | Parallelism | DB Needed | Notes |
|---|---|---|---|---|
| 1. Code upgrade | Fetch files, merge conflicts, rebuild frontend, run migrations | Sequential, main context | Yes (all pool DBs for migration) | Human-in-the-loop for merge conflict decisions |
| 2. Regression + classify | Run full suite via `behat-optimizer.py run --suite all` | 1 optimizer run (internally parallel) | Yes (pool) | Produces JUnit XML + failure categories |
| 3. Test fixes | Fix broken tests per failure category | Parallel agents per category | Yes (1 DB per agent) | Follow with QR pass |
| 4. Research / manifest | Build change manifest, map files, identify new reqs | 4 parallel read-only agents | No | All read-only: manifest, file-mapping, new-reqs, new-files |
| 5. Doc updates | Update SRS, SDS, STD, code_tags.json | Parallel by doc type | No | Each agent writes different files to avoid conflicts |
| 6. Validation | Final regression, report generation | Sequential | Yes (pool) | Single optimizer run to confirm zero regressions |
### Scaling Lessons (from v3.0.1 Upgrade)
Wave 4 (research/manifest) agents are fully independent — they only read code and docs, so you can run many in parallel without DB slots. During v3.0.1, 4 research agents ran simultaneously (manifest, file-mapping, new-reqs, new-files) with zero conflicts.
More commits = more Wave 4 subagents. The v3.0.1 upgrade had 13 commits, handled by 4 research agents. For upgrades with 20+ commits, split file-mapping across commit groups (e.g., commits 1-10 and 11-20 to separate agents) to avoid context overflow.
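Commit-group splitting can be sketched as below; `assign_commits` is a hypothetical helper, and the `c1..c13` IDs stand in for real hashes from `git log`:

```shell
# Round commits into groups of N so each research agent gets a bounded slice
assign_commits() {
  group_size=$1; shift
  i=0
  for c in "$@"; do
    echo "agent-$(( i / group_size + 1 )) $c"
    i=$((i + 1))
  done
}

# 13 commits in groups of 10 -> agent-1 gets c1-c10, agent-2 gets c11-c13
assign_commits 10 $(seq -f "c%g" 1 13)
```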
QR agents do not need DB slots — save pool DBs for test execution agents only. During v3.0.1, QR agents reviewed all test fixes without consuming any DB resources.
Limit heavy-context agents to 6 concurrent. Running more than 6 background agents with large context loads (e.g., full code analysis + fixture creation) risks session instability. Lighter agents (QR, doc updates, read-only research) can safely exceed this limit.
Doc update agents (Wave 5) can run 4 in parallel if each writes to a different doc set (SRS, SDS, STD, code_tags.json). This was validated during v3.0.1 Wave 5 with zero file conflicts.
### Wave 3: Test Fix Agents (Parallel by Failure Category)
After regression identifies failures, group them by root cause (e.g., "19 WREP failures from new dependency prefix", "4 INHN flips from lims-status fix", "5 resolution guard failures"). Each category gets its own agent with a dedicated DB.
| Agent | DB | Category | Failure Count | Output |
|-------|-----|----------|--------------|--------|
| 1 | pcrai_test_01 | WREP dependency prefix | 19 | Updated assertions + version tags |
| 2 | pcrai_test_02 | INHN→INHP lims-status | 4 | Updated assertions + version tags |
| 3 | pcrai_test_03 | Resolution guard | 5 | Config fixes or KCI tags |
| QR-1 | — | Review agents 1-3 | — | Tag + assertion audit |
QR agents run after all fix agents complete and do not need a DB — they review files only (TV coverage, assertions, tags, fixture correctness).
### Wave 4: Research Agents (All Read-Only)
Four parallel agents, each with a distinct research scope. None need a DB since they only read code and docs.
| Agent | Scope | Key Outputs |
|---|---|---|
| Manifest | Diff upstream commits, catalog every changed file and function | Change manifest document |
| File mapping | Map changed files to existing SRS/SDS/STD docs | Impact matrix (which docs need updates) |
| New requirements | Identify new features, new rules, new config fields | New REQ candidates with proposed IDs |
| New files | Identify new app files not in code_tags.json | code_tags.json additions |
### Wave 5: Doc Update Agents (Parallel by Doc Type)
Each agent writes to a different doc set, so there are no file conflicts.
| Agent | Doc Type | Files | Notes |
|---|---|---|---|
| 1 | SRS | docusaurus/docs/srs/ | New/updated requirements |
| 2 | SDS | docusaurus/docs/sds/ | Architecture changes, new execution patterns |
| 3 | STD | docusaurus/docs/std/ | New test vectors, updated TV tables |
| 4 | Code mapping | tests/catalogue/code_tags.json | New file-to-req mappings |
After all four complete, run `python3 docusaurus/scripts/generate-unified-traceability.py --render-md` to regenerate the traceability matrix.
### Special Agent Types
**KCI/KL audit agents** are read-only. They review @KCI and @KL tagged scenarios against the new version's code to determine which known issues are now resolved. No DB needed; can run many in parallel.
**Code contamination audit** compares fetched files against the upstream tag to verify no unintended drift. Uses `gh api` to pull upstream file contents for comparison. Single agent, no DB.
**Browser test agents** for version upgrades follow the same resource pattern as new browser test creation: each needs an artisan server (unique port), a Chrome instance (unique debug port), a unique DB, and a unique Chrome profile directory. See Port Allocation above for the mapping table.
## Risk Register Template
Common risks encountered across Sessions 9-29, with proven mitigations:
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Agent context overflow | Medium | Agent fails mid-task | Keep scope narrow: 1 file per agent, limit input context to essential files only |
| DB pool exhaustion | Low | Agents queue or fail | Pre-assign DBs; max 8 test agents per wave; QR agents never get DBs |
| Concurrent file edits | Medium | Merge conflicts | Never assign same file to two agents; pre-assign output filenames |
| Fabricated TV tags | High | False traceability | QR pass catches these; 6-12% WRONG rate is normal and expected |
| Stale fixture data | Medium | Test failures | Always verify fixtures exist before running; copy from known-good base configs |
| PhpSpreadsheet corruption | Low | Config unusable | Use PHP (not Python openpyxl) for xlsx edits; keep clean base configs in input/example/ |
| Permission inheritance failure | Medium | Agent prompts for permissions, stalling | Use general-purpose subagent type; if still failing, run from main context |
| Subagent polling waste | Low | Primary context burned | Never poll — launch and wait; subagents report when done |
| Browser test flakiness | Medium | False failures | Use explicit element waits, clean Chrome profiles, PHP_CLI_SERVER_WORKERS=8 |
## Session Handoff Template
Use this at the end of each session to ensure continuity:
```markdown
## Session N Handoff — {DATE}
### Status
- Wave {N} of {TOTAL}: {STATUS}
- Agents launched: {COUNT}, Completed: {COUNT}, Failed: {COUNT}
### Counts
- Scenarios created: {N}
- Scenarios passing: {N} (KCI: {N}, KL: {N})
- Rules fully closed: {N}
- Gap TVs remaining: ~{N} (~{N} actionable)
### Files Modified
{LIST_OF_FILES}
### Issues Discovered
{LIST_OF_ISSUES_WITH_IDS_OR_NONE}
### BT Key Counter
Next available: BT-{XXXX}
### Next Steps
{WHAT_NEEDS_TO_HAPPEN_NEXT}
### Human Decisions Needed
{QUESTIONS_REQUIRING_HUMAN_INPUT_OR_NONE}
```