Version: 3.0.1

# Agent Orchestration Guide

How to plan and execute parallel LLM agent waves for documentation and test creation tasks.

## Overview

Use parallel agents when you have more than three independent tasks of similar complexity. This pattern was validated across 20+ waves (Sessions 9-29) producing 750+ Behat scenarios and updating 200+ documentation files.

Core rules:

| Rule | Rationale |
|------|-----------|
| One agent per output file | Prevents merge conflicts and keeps scope narrow |
| Opus model for all agents | Sub-opus models miss edge cases and hallucinate fixture data |
| Inherit parent permissions | Agents must not prompt for Bash/Edit/Read permissions the parent already has |
| Never poll subagents | Launch and wait. Polling wastes primary context and can cause race conditions |
| Pre-assign all resources | DBs, BT keys, output file names decided before launch |

Execution model: This guide covers the operational mechanics of parallel agents. For the higher-level LLM-human collaboration model, see LLM Workflow Guide. For Behat-specific creation patterns, see Behat Creation Guide.


## Resource Planning

### DB Pool

Ten MySQL databases are available: `pcrai_test_01` through `pcrai_test_10`. Each test-running agent gets exactly one DB. Checkout and checkin go through `tests/scripts/db-pool.json`, guarded with `flock` to prevent races.

```bash
# Check current pool status
cat tests/scripts/db-pool.json
```
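The checkout step can be sketched with `flock` plus `jq`. This is a minimal illustration only: it assumes a pool file shaped like `{"pools":[{"name":...,"locked":...}]}` (the real `db-pool.json` schema may differ), and a `/tmp` copy stands in for `tests/scripts/db-pool.json`:

```bash
# Sketch of an atomic DB checkout. Assumes the pool file is shaped like
# {"pools":[{"name":...,"locked":...}]}; the real db-pool.json schema
# may differ. A /tmp copy stands in for tests/scripts/db-pool.json.
POOL=/tmp/db-pool.json
cat > "$POOL" <<'EOF'
{"pools":[{"name":"pcrai_test_01","locked":false},{"name":"pcrai_test_02","locked":true}]}
EOF

checkout_db() {
  (
    flock -x 9   # exclusive lock held for the whole read-modify-write
    db=$(jq -r '[.pools[] | select(.locked == false)][0].name' "$POOL")
    [ "$db" = "null" ] && exit 1   # pool exhausted
    jq --arg db "$db" '(.pools[] | select(.name == $db)).locked = true' \
      "$POOL" > "$POOL.tmp" && mv "$POOL.tmp" "$POOL"
    echo "$db"
  ) 9>>"$POOL.lock"
}

DB=$(checkout_db)
echo "checked out: $DB"
```

Holding the lock across the whole read-modify-write is what stops two agents from checking out the same DB.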

### Port Allocation (Browser Tests)

Browser test agents need three resources each: an artisan server, a Chrome instance, and a Chrome profile directory.

| Agent | DB | Artisan Port | Chrome Debug Port | Chrome Profile |
|-------|-----|--------------|-------------------|----------------|
| 1 | pcrai_test_01 | 8001 | 9223 | /shared/chrome-profiles/agent-01 |
| 2 | pcrai_test_02 | 8002 | 9224 | /shared/chrome-profiles/agent-02 |
| 3 | pcrai_test_03 | 8003 | 9225 | /shared/chrome-profiles/agent-03 |
| ... | ... | ... | ... | ... |
| 8 | pcrai_test_08 | 8008 | 9230 | /shared/chrome-profiles/agent-08 |
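The table follows a fixed arithmetic pattern (artisan port = 8000 + agent number, Chrome debug port = 9222 + agent number), so each agent's resource triple can be derived rather than copied. A small helper, purely illustrative:

```bash
# Derive the per-agent resource triple from the agent number,
# matching the allocation table (8000 + i, 9222 + i, zero-padded names).
agent_resources() {
  local i=$1
  printf 'DB=pcrai_test_%02d ARTISAN_PORT=%d CHROME_PORT=%d PROFILE=/shared/chrome-profiles/agent-%02d\n' \
    "$i" "$((8000 + i))" "$((9222 + i))" "$i"
}

agent_resources 3
# -> DB=pcrai_test_03 ARTISAN_PORT=8003 CHROME_PORT=9225 PROFILE=/shared/chrome-profiles/agent-03
```

Deriving the triple in one place keeps prompt generation and infrastructure scripts in agreement with the table.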

### BT Key and Output File Pre-Assignment

Before launching any wave, assign sequential BT keys and output filenames. This prevents collisions and makes post-wave tracking straightforward.

| Agent | DB | BT Key | Target | Input Files | Output | Status |
|-------|-----|---------|--------|-------------|--------|--------|
| 1 | pcrai_test_01 | BT-XXXX | RULE_NAME | std-rule-xxx.md, existing BT-YYYY | new feature file | pending |
| 2 | pcrai_test_02 | BT-XXXY | RULE_NAME | std-rule-yyy.md | new feature file | pending |
| 3 | — | — | DOC_UPDATE | srs-domain.md | edited file | pending |

### Agent Count Estimation

| Scenario | Max Agents | Notes |
|----------|------------|-------|
| Behat API creation | 8-10 | Limited by DB pool; each takes 2-5 min per scenario |
| Browser test creation | 8 | Limited by ports 8001-8008 and Chrome instances |
| Doc updates (SRS/SDS/STD) | 10+ | No DB needed; limited only by context management |
| Quality review (QR) | 10+ | No DB needed; file review only |
| Mixed wave (creation + QR) | 8 creation + QR after | QR runs as a second pass, never concurrent with creation |

Small rules (1-3 scenarios) can be bundled 2-3 per agent. Large rules (8+ scenarios) should always get a dedicated agent.


## Pre-Wave Setup Checklist

### Reset Pool Databases

The `--path` flags are critical. Without them, only 2 of 276 migrations run (telescope + jobs), and the seeder crashes on missing tables.

```bash
cd /shared/code/req_docs/code
for i in $(seq -w 1 8); do
  mysql -h 127.0.0.1 -u sail -ppassword -e "DROP DATABASE IF EXISTS pcrai_test_$i; CREATE DATABASE pcrai_test_$i;"
  DB_HOST=127.0.0.1 DB_DATABASE=pcrai_test_$i DB_USERNAME=sail DB_PASSWORD=password \
    php artisan migrate:fresh --path=database/migrations --path=database/migrations/app --path=database/migrations/audit --seed --quiet
done
```

### Lock DBs in Pool JSON

Update the pool JSON to mark assigned DBs as in-use before launching agents:

```bash
# Lock DB 01 for agent 1
flock tests/scripts/db-pool.json -c 'jq ".pools[0].locked = true" tests/scripts/db-pool.json > /tmp/pool.json && mv /tmp/pool.json tests/scripts/db-pool.json'
```

### Browser Test Infrastructure (If Needed)

```bash
# Start Chrome instances (one per agent)
for i in $(seq 1 8); do
  PORT=$((9222 + i))
  PROFILE="/shared/chrome-profiles/agent-$(printf '%02d' $i)"
  mkdir -p "$PROFILE"
  google-chrome --headless --disable-gpu --remote-debugging-port=$PORT --user-data-dir="$PROFILE" &
done
```

```bash
# Start artisan servers (one per agent)
cd /shared/code/req_docs/code
for i in $(seq 1 8); do
  PORT=$((8000 + i))
  DB="pcrai_test_$(printf '%02d' $i)"
  PHP_CLI_SERVER_WORKERS=8 DB_HOST=127.0.0.1 DB_DATABASE=$DB DB_USERNAME=sail DB_PASSWORD=password \
    php artisan serve --port=$PORT --no-reload &
done
```

### Verify Clean State

```bash
# Check for stale Chrome locks
find /shared/chrome-profiles/ -name "SingletonLock" -delete 2>/dev/null

# Verify all DBs are accessible
for i in $(seq -w 1 8); do
  mysql -h 127.0.0.1 -u sail -ppassword -e "SELECT 1" pcrai_test_$i > /dev/null 2>&1 \
    && echo "pcrai_test_$i: OK" || echo "pcrai_test_$i: FAIL"
done
```
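It can also be worth probing the per-agent ports before launching anything. A sketch using bash's `/dev/tcp` redirection, with port math following the allocation table above:

```bash
# Probe each agent's artisan and Chrome debug ports; CLOSED means the
# corresponding server or Chrome instance failed to start.
probe() {
  if timeout 1 bash -c "echo > /dev/tcp/127.0.0.1/$1" 2>/dev/null; then
    echo "port $1: OPEN"
  else
    echo "port $1: CLOSED"
  fi
}

for i in $(seq 1 8); do
  probe $((8000 + i))   # artisan server
  probe $((9222 + i))   # Chrome debug
done
```

A CLOSED artisan or Chrome port caught here saves an agent from failing mid-task later.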

## Creation Agent Prompt Template

### Generic Template

Adapt this template for any workstream. Replace {PLACEHOLDERS} with wave-specific values.

```markdown
## Task: {TASK_DESCRIPTION}

### READ FIRST
Read these files before starting:
1. {GUIDE_PATH} — authoritative guide for this workstream
2. {EXISTING_EXAMPLE} — example of the desired output format
3. {INPUT_FILE} — source requirements/specifications

### What to Create
- {OUTPUT_FILES_LIST}

### Details
{DETAILS_TABLE_OR_REQUIREMENTS}

### Iterative Strategy (for test creation)
1. Create all fixtures/files
2. Dry-run to verify parsing
3. Run with minimal assertions first
4. Check actual output values
5. Update assertions to match actual values
6. Re-run to confirm all pass

### Running Tests
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE={DB_NAME} \
DB_USERNAME=sail DB_PASSWORD=password RATE_LIMIT_MAX_ATTEMPTS=9999 \
./vendor/bin/behat "{FEATURE_FILE_PATH}"

### Success Criteria
- {CRITERIA_LIST}
```

### Variant: Doc Update Agent (No DB)

```markdown
## Task: Update {DOC_TYPE} for {DOMAIN}

### READ FIRST
1. {AUTHORING_GUIDE} — format and conventions
2. {EXISTING_DOC} — file to update
3. {CHANGE_MANIFEST_OR_DIFF} — what changed

### What to Update
- File: {DOC_PATH}
- Sections affected: {SECTION_LIST}

### Rules
- Preserve existing REQ IDs (immutable)
- Add {#anchor-name} for new sections
- Update cross-references to other docs
- Add Reviewer Notes entry for each change

### Output
Summary of changes: sections added/modified/removed, new REQ IDs (if any).
```

### Variant: Behat API Test Agent

```markdown
## Task: Create Behat scenarios for {RULE_NAME}

### READ FIRST
1. docusaurus/docs/guides/llm/guide-llm-behat-creation.md — 40 gotchas, config pitfalls
2. {STD_FILE} — test vectors to implement
3. {EXISTING_BT_FILE} — example of working test for this rule

### What to Create
- Feature file: tests/exports/v3/{OUTPUT_FILE}
- Config: tests/support_files/{BT_KEY}/config.xlsx (copy from {BASE_CONFIG})
- Run files: tests/support_files/{BT_KEY}/*.json

### Test Vectors
{TV_TABLE}

### Environment
cd /shared/code/req_docs/code
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE={DB_NAME} \
DB_USERNAME=sail DB_PASSWORD=password RATE_LIMIT_MAX_ATTEMPTS=9999 \
./vendor/bin/behat "tests/exports/v3/{OUTPUT_FILE}"

### Tags
- Feature-level: @USE_SAME_CONFIG (line 1, before Feature:)
- Scenario-level: @{BT_KEY} @TV-{RULE}-{NNN}-{NNN}

### Success Criteria
- All scenarios pass
- Every TV tag maps to a real STD entry
- Assertions check specific output values (not just "no error")
```

### Variant: Browser Test Agent

```markdown
## Task: Create browser tests for {DOMAIN}

### READ FIRST
1. docusaurus/docs/guides/guide-browser-tests.md — step defs, gotchas, parallel execution
2. {STD_FILE} — test cases to implement
3. code/features/bootstrap/BrowserContext.php — available step definitions

### What to Create
- Feature file: tests/exports/browser/{OUTPUT_FILE}

### Infrastructure
Artisan: http://127.0.0.1:{ARTISAN_PORT}
Chrome: 127.0.0.1:{CHROME_PORT}
DB: {DB_NAME}

### Key Gotchas
- Vue SPA needs Pusher keys baked into build
- PHP_CLI_SERVER_WORKERS=8 required (--no-reload)
- BaseTextbox has 500ms debounce — wait 800ms+ before clicking submit
- SVG elements: use JS querySelector, not Mink CSS selectors
- Notifications auto-dismiss after 4s — use "I should see a notification containing"

### Success Criteria
- All scenarios pass
- Each scenario tests a specific TC from the STD
- No flaky waits (use explicit element waits, not sleep)
```

## QR (Quality Review) Protocol

Quality Review is the most important step in each wave. It catches fabricated tags, false passes, and weak assertions before they enter the test suite. QR ran after every wave in Sessions 9-29, flagging 6-12% of API-test items and 3-8% of browser-test items as WRONG.

### 7-Dimension Checklist

| # | Dimension | What to Check |
|---|-----------|---------------|
| 1 | Scenario execution | Every scenario runs and passes |
| 2 | Tag correctness | @TV-/@TC- tags map to real STD entries; no fabricated tags |
| 3 | Assertion quality | Assertions test specific content/behavior, not just "page loads" or "no error" |
| 4 | STD alignment | Scenarios match test cases defined in STD files |
| 5 | No duplicate coverage | New scenarios are not redundant with existing ones |
| 6 | File organization | Background usage, per-scenario tags, feature-level @USE_SAME_CONFIG on line 1 |
| 7 | Fixture availability | Referenced config files, run files, and users exist in tests/support_files/ |

### Rating Scale

| Rating | Meaning | Action Required |
|--------|---------|-----------------|
| CORRECT | Scenario passes and assertions are meaningful | None |
| WEAK | Passes but assertions could be stronger | Suggest improvements; fix if straightforward (tag corrections, assertion strengthening) |
| WRONG | Tag mismatch, missing assertion, or false-pass risk | Must fix before proceeding to next wave |

### QR Agent Template

```markdown
## Task: Quality Review of {FILE_LIST}

You are a QR agent. Review each scenario in the given feature file(s).

### READ FIRST
1. {STD_FILE} — source of truth for TV/TC definitions
2. {FEATURE_FILES} — files to review

### For Each Scenario, Check:
1. Runs and passes (execute the test if DB assigned, otherwise review only)
2. @TV-/@TC- tags map to real STD entries
3. Assertions test the stated requirement (not just "page loads" or "no error")
4. No duplicate coverage with existing scenarios
5. Data fixtures exist in tests/support_files/
6. @USE_SAME_CONFIG on line 1 if single-config file
7. BT key tag is first scenario-level tag (not @USE_SAME_CONFIG)

### Report Format
For each scenario:
- **CORRECT**: Passes, assertions meaningful
- **WEAK**: Passes, but [specific improvement suggestion]
- **WRONG**: [specific issue — must fix]

### Fix Protocol
- Fix all WRONG items immediately
- Fix WEAK items if straightforward (tag corrections, assertion strengthening)
- For WEAK items requiring structural changes, document but do not fix

### Summary
Report totals: N CORRECT, N WEAK, N WRONG
List all WRONG items with file:line and fix applied.
```

### Timing and Resource Rules

| Rule | Detail |
|------|--------|
| QR runs after all creation agents complete | Never run QR concurrent with creation |
| QR agents do not need a DB | They review files and check tags; no test execution unless specifically assigned |
| Fix all WRONG before next wave | WRONG items left unfixed compound into larger problems |
| One QR agent per 15-30 scenarios | ~15 min per agent at this load |
| QR agents can review multiple files | Unlike creation agents, QR agents handle batches |

### Empirical WRONG Rates

| Test Type | WRONG Rate | Common Issues |
|-----------|------------|---------------|
| API tests (Behat) | 6-12% | Fabricated TV tags, stale fixture references, assertions checking wrong field |
| Browser tests | 3-8% | Incorrect CSS selectors, timing issues, assertions on auto-dismissed elements |
| Doc updates | 2-5% | Broken cross-references, wrong anchor names, stale counts |

## Post-Wave Checklist

Run through this after every wave completes and QR passes:

- [ ] All creation agents completed successfully
- [ ] QR pass completed — 0 WRONG items remaining
- [ ] All new/modified files committed
- [ ] Tracking docs updated (dashboard, traceability matrix, coverage report)
- [ ] Resource pool released (DBs unlocked in pool JSON, artisan/Chrome stopped if browser wave)
- [ ] Session handoff updated with wave results
- [ ] Issues logged in traceability/known-issues.md (if any discovered)
- [ ] BT key counter updated in MEMORY.md
- [ ] Gap totals recalculated and recorded
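The resource-release item can be scripted. A sketch, assuming the `{"pools":[{"name":...,"locked":...}]}` shape shown under DB Pool and the process names started earlier; a `/tmp` demo file stands in for `tests/scripts/db-pool.json`, and the `pkill` patterns should be adjusted to your setup:

```bash
# Release a wave's resources: unlock every pool DB, stop per-agent servers.
# Assumes the {"pools":[{"name":...,"locked":...}]} shape from the DB Pool
# section; a /tmp demo file stands in for tests/scripts/db-pool.json.
POOL=/tmp/db-pool-release.json
echo '{"pools":[{"name":"pcrai_test_01","locked":true},{"name":"pcrai_test_02","locked":true}]}' > "$POOL"

flock "$POOL" -c \
  "jq '.pools[].locked = false' $POOL > $POOL.tmp && mv $POOL.tmp $POOL"

# Stop artisan servers and headless Chrome from a browser wave (no-op if none running)
pkill -f 'artisan serve' 2>/dev/null || true
pkill -f 'remote-debugging-port' 2>/dev/null || true

jq -c '.pools | map(.locked) | unique' "$POOL"   # prints [false] when all unlocked
```

Releasing everything in one script makes "pool released" a single checklist tick instead of three manual steps.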

## Parallelization by Phase

This table maps the Version Update Workflow phases to agent parallelization patterns.

| Phase | Parallelizable | Pattern | Max Agents |
|-------|----------------|---------|------------|
| 0: Confluence Ingestion | Partially | One agent per domain section | ~10 |
| 1: Change Detection | Partially | Inventory extraction: 6 parallel agents | 6 |
| 2: Impact Analysis | No | Requires complete Change Manifest | 1 |
| 3: SRS Updates | Yes | One agent per domain | ~10 |
| 4: SDS Updates | Partially | Independent sections in parallel | ~5 |
| 5a: STD Updates | Yes | One agent per STD file | ~10 |
| 5b: Behat API Creation | Yes | Wave-based, 8-10 per wave + QR pass | 10 |
| 5c: Browser Test Creation | Yes | Wave-based, 8 per wave + QR pass | 8 |
| 6: Code Mapping Sync | Yes | One agent per code module | ~10 |
| 7: Validation | No | Sequential validation suite | 1 |

Key constraint: Phases 5b and 5c are the most agent-intensive. Plan for multiple waves within each. A typical STD reconciliation takes 6-8 waves of 8-10 agents each, plus one QR wave per creation wave.


## Version Upgrade Agent Patterns

When upgrading application code between versions (e.g., v3.0.0 to v3.0.1), the work decomposes into distinct phases with different parallelization characteristics. These patterns were validated during the v3.0.0-to-v3.0.1 upgrade (Session 35).

### Typical Wave Structure

| Wave | What | Parallelism | DB Needed | Notes |
|------|------|-------------|-----------|-------|
| 1. Code upgrade | Fetch files, merge conflicts, rebuild frontend, run migrations | Sequential, main context | Yes (all pool DBs for migration) | Human-in-the-loop for merge conflict decisions |
| 2. Regression + classify | Run full suite via `behat-optimizer.py run --suite all` | 1 optimizer run (internally parallel) | Yes (pool) | Produces JUnit XML + failure categories |
| 3. Test fixes | Fix broken tests per failure category | Parallel agents per category | Yes (1 DB per agent) | Follow with QR pass |
| 4. Research / manifest | Build change manifest, map files, identify new reqs | 4 parallel read-only agents | No | All read-only: manifest, file-mapping, new-reqs, new-files |
| 5. Doc updates | Update SRS, SDS, STD, code_tags.json | Parallel by doc type | No | Each agent writes different files to avoid conflicts |
| 6. Validation | Final regression, report generation | Sequential | Yes (pool) | Single optimizer run to confirm zero regressions |

### Scaling Lessons (from v3.0.1 Upgrade)

Wave 4 (research/manifest) agents are fully independent: they only read code and docs, so you can run many in parallel without DB slots. During v3.0.1, 4 research agents ran simultaneously (manifest, file-mapping, new-reqs, new-files) with zero conflicts.

More commits means more Wave 4 subagents. The v3.0.1 upgrade had 13 commits, handled by 4 research agents. For upgrades with 20+ commits, split file-mapping across commit groups (e.g., commits 1-10 and 11-20 to separate agents) to avoid context overflow.
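Grouping commits for separate file-mapping agents can be as simple as `split` on the commit list. A sketch; the `git rev-list` range in the comment is a placeholder, and a generated stand-in list keeps the example self-contained:

```bash
# Split a commit list into groups of 10, one group per file-mapping agent.
# In a real upgrade the input would come from something like:
#   git rev-list v3.0.0..v3.0.1 > /tmp/commits.txt   (range is a placeholder)
rm -f /tmp/commit-group-*
seq -f 'commit-%02g' 1 13 > /tmp/commits.txt   # stand-in for 13 real commit hashes

split -l 10 -d /tmp/commits.txt /tmp/commit-group-
wc -l /tmp/commit-group-*   # each group file becomes one agent's input
```

With 13 commits this yields two groups (10 + 3), i.e. two file-mapping agents instead of one overloaded context.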

QR agents do not need DB slots — save pool DBs for test execution agents only. During v3.0.1, QR agents reviewed all test fixes without consuming any DB resources.

Limit heavy-context agents to 6 concurrent. Running more than 6 background agents with large context loads (e.g., full code analysis + fixture creation) risks session instability. Lighter agents (QR, doc updates, read-only research) can safely exceed this limit.

Doc update agents (Wave 5) can run 4 in parallel if each writes to a different doc set (SRS, SDS, STD, code_tags.json). This was validated during the v3.0.1 doc-update wave with zero file conflicts.

### Wave 3: Test Fix Agents (Parallel by Failure Category)

After regression identifies failures, group them by root cause (e.g., "19 WREP failures from new dependency prefix", "4 INHN flips from lims-status fix", "5 resolution guard failures"). Each category gets its own agent with a dedicated DB.

| Agent | DB | Category | Failure Count | Output |
|-------|-----|----------|--------------|--------|
| 1 | pcrai_test_01 | WREP dependency prefix | 19 | Updated assertions + version tags |
| 2 | pcrai_test_02 | INHN→INHP lims-status | 4 | Updated assertions + version tags |
| 3 | pcrai_test_03 | Resolution guard | 5 | Config fixes or KCI tags |
| QR-1 | — | Review agents 1-3 | — | Tag + assertion audit |

QR agents run after all fix agents complete and do not need a DB — they review files only (TV coverage, assertions, tags, fixture correctness).

### Wave 4: Research Agents (All Read-Only)

Four parallel agents, each with a distinct research scope. None need a DB since they only read code and docs.

| Agent | Scope | Key Outputs |
|-------|-------|-------------|
| Manifest | Diff upstream commits, catalog every changed file and function | Change manifest document |
| File mapping | Map changed files to existing SRS/SDS/STD docs | Impact matrix (which docs need updates) |
| New requirements | Identify new features, new rules, new config fields | New REQ candidates with proposed IDs |
| New files | Identify new app files not in code_tags.json | code_tags.json additions |

### Wave 5: Doc Update Agents (Parallel by Doc Type)

Each agent writes to a different doc set, so there are no file conflicts.

| Agent | Doc Type | Files | Notes |
|-------|----------|-------|-------|
| 1 | SRS | docusaurus/docs/srs/ | New/updated requirements |
| 2 | SDS | docusaurus/docs/sds/ | Architecture changes, new execution patterns |
| 3 | STD | docusaurus/docs/std/ | New test vectors, updated TV tables |
| 4 | Code mapping | tests/catalogue/code_tags.json | New file-to-req mappings |

After all four complete, run `python3 docusaurus/scripts/generate-unified-traceability.py --render-md` to regenerate the traceability matrix.

### Special Agent Types

KCI/KL audit agents are read-only. They review @KCI and @KL tagged scenarios against the new version's code to determine which known issues are now resolved. No DB needed; can run many in parallel.

Code contamination audit compares fetched files against the upstream tag to verify no unintended drift. Uses gh api to pull upstream file contents for comparison. Single agent, no DB.

Browser test agents for version upgrades follow the same resource pattern as new browser test creation: each needs an artisan server (unique port), a Chrome instance (unique debug port), a unique DB, and a unique Chrome profile directory. See Port Allocation above for the mapping table.


## Risk Register Template

Common risks encountered across Sessions 9-29, with proven mitigations:

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Agent context overflow | Medium | Agent fails mid-task | Keep scope narrow: 1 file per agent; limit input context to essential files only |
| DB pool exhaustion | Low | Agents queue or fail | Pre-assign DBs; max 8 test agents per wave; QR agents never get DBs |
| Concurrent file edits | Medium | Merge conflicts | Never assign same file to two agents; pre-assign output filenames |
| Fabricated TV tags | High | False traceability | QR pass catches these; 6-12% WRONG rate is normal and expected |
| Stale fixture data | Medium | Test failures | Always verify fixtures exist before running; copy from known-good base configs |
| PhpSpreadsheet corruption | Low | Config unusable | Use PHP (not Python openpyxl) for xlsx edits; keep clean base configs in input/example/ |
| Permission inheritance failure | Medium | Agent prompts for permissions, stalling | Use general-purpose subagent type; if still failing, run from main context |
| Subagent polling waste | Low | Primary context burned | Never poll; launch and wait; subagents report when done |
| Browser test flakiness | Medium | False failures | Use explicit element waits, clean Chrome profiles, PHP_CLI_SERVER_WORKERS=8 |

## Session Handoff Template

Use this at the end of each session to ensure continuity:

```markdown
## Session N Handoff — {DATE}

### Status
- Wave {N} of {TOTAL}: {STATUS}
- Agents launched: {COUNT}, Completed: {COUNT}, Failed: {COUNT}

### Counts
- Scenarios created: {N}
- Scenarios passing: {N} (KCI: {N}, KL: {N})
- Rules fully closed: {N}
- Gap TVs remaining: ~{N} (~{N} actionable)

### Files Modified
{LIST_OF_FILES}

### Issues Discovered
{LIST_OF_ISSUES_WITH_IDS_OR_NONE}

### BT Key Counter
Next available: BT-{XXXX}

### Next Steps
{WHAT_NEEDS_TO_HAPPEN_NEXT}

### Human Decisions Needed
{QUESTIONS_REQUIRING_HUMAN_INPUT_OR_NONE}
```