# Agent Orchestration Guide
How to plan and execute parallel LLM agent waves for documentation and test creation tasks.
## Overview
Use parallel agents when you have more than three independent tasks of similar complexity. This pattern was validated across 20+ waves (Sessions 9-29) producing 750+ Behat scenarios and updating 200+ documentation files.
Core rules:
| Rule | Rationale |
|---|---|
| One agent per output file | Prevents merge conflicts and keeps scope narrow |
| Opus model for all agents | Sub-opus models miss edge cases and hallucinate fixture data |
| Inherit parent permissions | Agents must not prompt for Bash/Edit/Read permissions the parent already has |
| Never poll subagents | Launch and wait. Polling wastes primary context and can cause race conditions |
| Pre-assign all resources | DBs, BT keys, output file names decided before launch |
Execution model: This guide covers the operational mechanics of parallel agents. For the higher-level LLM-human collaboration model, see LLM Workflow Guide. For Behat-specific creation patterns, see Behat Creation Guide.
## Resource Planning
### DB Pool
Ten MySQL databases are available: pcrai_test_01 through pcrai_test_10. Each test-running agent gets exactly one DB. Checkout and check-in go through tests/scripts/db-pool.json, guarded with flock to prevent races.
```shell
# Check current pool status
cat tests/scripts/db-pool.json
```
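A checkout helper can make the claim atomic. The sketch below is illustrative only — it assumes the pool JSON holds a `pools` array of `{db, locked}` objects, which may not match the real schema, and it operates on a throwaway file in /tmp:

```shell
# Demo pool file; the real one lives at tests/scripts/db-pool.json and
# its schema may differ — adjust the jq paths to match
POOL=/tmp/db-pool-demo.json
cat > "$POOL" <<'EOF'
{"pools": [{"db": "pcrai_test_01", "locked": false},
           {"db": "pcrai_test_02", "locked": false}]}
EOF

# Atomically claim the first unlocked DB: flock serializes concurrent
# agents, jq finds a free slot and marks it locked
checkout_db() {
  flock "$POOL" -c '
    db=$(jq -r "[.pools[] | select(.locked == false)][0].db" '"$POOL"')
    [ "$db" = "null" ] && exit 1
    jq "(.pools[] | select(.db == \"$db\")).locked = true" '"$POOL"' \
      > /tmp/pool.tmp && mv /tmp/pool.tmp '"$POOL"'
    echo "$db"
  '
}

DB=$(checkout_db)
echo "checked out: $DB"
```

Check-in is the reverse: the same flock wrapper setting `.locked = false` for the agent's DB.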
### Port Allocation (Browser Tests)
Browser test agents need three resources each: an artisan server, a Chrome instance, and a Chrome profile directory.
| Agent | DB | Artisan Port | Chrome Debug Port | Chrome Profile |
|---|---|---|---|---|
| 1 | pcrai_test_01 | 8001 | 9223 | /shared/chrome-profiles/agent-01 |
| 2 | pcrai_test_02 | 8002 | 9224 | /shared/chrome-profiles/agent-02 |
| 3 | pcrai_test_03 | 8003 | 9225 | /shared/chrome-profiles/agent-03 |
| ... | ... | ... | ... | ... |
| 8 | pcrai_test_08 | 8008 | 9230 | /shared/chrome-profiles/agent-08 |
### BT Key and Output File Pre-Assignment
Before launching any wave, assign sequential BT keys and output filenames. This prevents collisions and makes post-wave tracking straightforward.
| Agent | DB | BT Key | Target | Input Files | Output | Status |
|-------|-----|---------|--------|-------------|--------|--------|
| 1 | pcrai_test_01 | BT-XXXX | RULE_NAME | std-rule-xxx.md, existing BT-YYYY | new feature file | pending |
| 2 | pcrai_test_02 | BT-XXXY | RULE_NAME | std-rule-yyy.md | new feature file | pending |
| 3 | — | — | DOC_UPDATE | srs-domain.md | edited file | pending |
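Generating the table mechanically avoids manual numbering slips. A rough sketch (columns abbreviated; the starting BT number and rule names below are invented for illustration):

```shell
# Emit one pre-assignment row per agent: sequential pool DBs and BT keys
emit_assignments() {
  start=$1; shift
  i=1
  for rule in "$@"; do
    printf '| %d | pcrai_test_%02d | BT-%04d | %s | pending |\n' \
      "$i" "$i" "$start" "$rule"
    start=$((start + 1)); i=$((i + 1))
  done
}

emit_assignments 1450 RULE_A RULE_B RULE_C
```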
### Agent Count Estimation
| Scenario | Max Agents | Notes |
|---|---|---|
| Behat API creation | 8-10 | Limited by DB pool; each takes 2-5 min per scenario |
| Browser test creation | 8 | Limited by ports 8001-8008 and Chrome instances |
| Doc updates (SRS/SDS/STD) | 10+ | No DB needed; limited only by context management |
| Quality review (QR) | 10+ | No DB needed; file review only |
| Mixed wave (creation + QR) | 8 creation + QR after | QR runs as a second pass, never concurrent with creation |
Small rules (1-3 scenarios) can be bundled 2-3 per agent. Large rules (8+ scenarios) should always get a dedicated agent.
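Turned into arithmetic, the bundling rule gives a quick head-count for a wave; the scenario counts below are made-up examples:

```shell
# Agents needed: small rules (<=3 scenarios) bundle three per agent,
# large rules (>=8) get dedicated agents, the rest one agent each
estimate_agents() {
  small=0; medium=0; large=0
  for n in "$@"; do
    if [ "$n" -le 3 ]; then small=$((small + 1))
    elif [ "$n" -ge 8 ]; then large=$((large + 1))
    else medium=$((medium + 1)); fi
  done
  echo $(( (small + 2) / 3 + medium + large ))   # ceil(small/3) + others
}

estimate_agents 2 3 1 9 12 5
```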
## Pre-Wave Setup Checklist
### Reset Pool Databases
The `--path` flags are critical. Without them, only 2 of 276 migrations run (telescope + jobs), and the seeder crashes on missing tables.
```shell
cd /shared/code/req_docs/code
# %02g zero-pads so names match pcrai_test_01..08 (seq -w 1 8 would not pad)
for i in $(seq -f "%02g" 1 8); do
  mysql -h 127.0.0.1 -u sail -ppassword -e "DROP DATABASE IF EXISTS pcrai_test_$i; CREATE DATABASE pcrai_test_$i;"
  DB_HOST=127.0.0.1 DB_DATABASE=pcrai_test_$i DB_USERNAME=sail DB_PASSWORD=password \
    php artisan migrate:fresh --path=database/migrations --path=database/migrations/app --path=database/migrations/audit --seed --quiet
done
```
### Lock DBs in Pool JSON
Update the pool JSON to mark assigned DBs as in-use before launching agents:
```shell
# Lock DB 01 for agent 1
flock tests/scripts/db-pool.json -c 'jq ".pools[0].locked = true" tests/scripts/db-pool.json > /tmp/pool.json && mv /tmp/pool.json tests/scripts/db-pool.json'
```
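Unlocking after the wave is the mirror operation. The sketch below runs against a throwaway copy (the real file is tests/scripts/db-pool.json, and its schema may differ):

```shell
# Demo file with DB 01 held; swap in the real pool path for actual use
POOL=/tmp/db-pool-release.json
echo '{"pools": [{"db": "pcrai_test_01", "locked": true}]}' > "$POOL"

# Release under the same flock used for locking, so the two never race
flock "$POOL" -c '
  jq "(.pools[] | select(.db == \"pcrai_test_01\")).locked = false" '"$POOL"' \
    > /tmp/pool.tmp && mv /tmp/pool.tmp '"$POOL"'
'
jq '.pools[0].locked' "$POOL"
```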
### Browser Test Infrastructure (If Needed)
```shell
# Start Chrome instances (one per agent)
for i in $(seq 1 8); do
  PORT=$((9222 + i))
  PROFILE="/shared/chrome-profiles/agent-$(printf '%02d' $i)"
  mkdir -p "$PROFILE"
  google-chrome --headless --disable-gpu --remote-debugging-port=$PORT --user-data-dir="$PROFILE" &
done
```
```shell
# Start artisan servers (one per agent)
cd /shared/code/req_docs/code
for i in $(seq 1 8); do
  PORT=$((8000 + i))
  DB="pcrai_test_$(printf '%02d' $i)"
  PHP_CLI_SERVER_WORKERS=8 DB_HOST=127.0.0.1 DB_DATABASE=$DB DB_USERNAME=sail DB_PASSWORD=password \
    php artisan serve --port=$PORT --no-reload &
done
```
### Verify Clean State
```shell
# Check for stale Chrome locks
find /shared/chrome-profiles/ -name "SingletonLock" -delete 2>/dev/null

# Verify all DBs are accessible (%02g pads to match pcrai_test_01..08)
for i in $(seq -f "%02g" 1 8); do
  mysql -h 127.0.0.1 -u sail -ppassword -e "SELECT 1" pcrai_test_$i > /dev/null 2>&1 && echo "pcrai_test_$i: OK" || echo "pcrai_test_$i: FAIL"
done
```
## Creation Agent Prompt Template
### Generic Template
Adapt this template for any workstream. Replace {PLACEHOLDERS} with wave-specific values.
```markdown
## Task: {TASK_DESCRIPTION}
### READ FIRST
Read these files before starting:
1. {GUIDE_PATH} — authoritative guide for this workstream
2. {EXISTING_EXAMPLE} — example of the desired output format
3. {INPUT_FILE} — source requirements/specifications
### What to Create
- {OUTPUT_FILES_LIST}
### Details
{DETAILS_TABLE_OR_REQUIREMENTS}
### Iterative Strategy (for test creation)
1. Create all fixtures/files
2. Dry-run to verify parsing
3. Run with minimal assertions first
4. Check actual output values
5. Update assertions to match actual values
6. Re-run to confirm all pass
### Running Tests
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE={DB_NAME} \
DB_USERNAME=sail DB_PASSWORD=password RATE_LIMIT_MAX_ATTEMPTS=9999 \
./vendor/bin/behat "{FEATURE_FILE_PATH}"
### Success Criteria
- {CRITERIA_LIST}
```
### Variant: Doc Update Agent (No DB)
```markdown
## Task: Update {DOC_TYPE} for {DOMAIN}
### READ FIRST
1. {AUTHORING_GUIDE} — format and conventions
2. {EXISTING_DOC} — file to update
3. {CHANGE_MANIFEST_OR_DIFF} — what changed
### What to Update
- File: {DOC_PATH}
- Sections affected: {SECTION_LIST}
### Rules
- Preserve existing REQ IDs (immutable)
- Add {#anchor-name} for new sections
- Update cross-references to other docs
- Add Reviewer Notes entry for each change
### Output
Summary of changes: sections added/modified/removed, new REQ IDs (if any).
```
### Variant: Behat API Test Agent
```markdown
## Task: Create Behat scenarios for {RULE_NAME}
### READ FIRST
1. docusaurus/docs/guides/llm/guide-llm-behat-creation.md — 40 gotchas, config pitfalls
2. {STD_FILE} — test vectors to implement
3. {EXISTING_BT_FILE} — example of working test for this rule
### What to Create
- Feature file: tests/exports/v3/{OUTPUT_FILE}
- Config: tests/support_files/{BT_KEY}/config.xlsx (copy from {BASE_CONFIG})
- Run files: tests/support_files/{BT_KEY}/*.json
### Test Vectors
{TV_TABLE}
### Environment
cd /shared/code/req_docs/code && \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE={DB_NAME} \
DB_USERNAME=sail DB_PASSWORD=password RATE_LIMIT_MAX_ATTEMPTS=9999 \
./vendor/bin/behat "tests/exports/v3/{OUTPUT_FILE}"
### Tags
- Feature-level: @USE_SAME_CONFIG (line 1, before Feature:)
- Scenario-level: @{BT_KEY} @TV-{RULE}-{NNN}-{NNN}
### Success Criteria
- All scenarios pass
- Every TV tag maps to a real STD entry
- Assertions check specific output values (not just "no error")
```
### Variant: Browser Test Agent
```markdown
## Task: Create browser tests for {DOMAIN}
### READ FIRST
1. docusaurus/docs/guides/guide-browser-tests.md — step defs, gotchas, parallel execution
2. {STD_FILE} — test cases to implement
3. code/features/bootstrap/BrowserContext.php — available step definitions
### What to Create
- Feature file: tests/exports/browser/{OUTPUT_FILE}
### Infrastructure
Artisan: http://127.0.0.1:{ARTISAN_PORT}
Chrome: 127.0.0.1:{CHROME_PORT}
DB: {DB_NAME}
### Key Gotchas
- Vue SPA needs Pusher keys baked into build
- PHP_CLI_SERVER_WORKERS=8 required (--no-reload)
- BaseTextbox has 500ms debounce — wait 800ms+ before clicking submit
- SVG elements: use JS querySelector, not Mink CSS selectors
- Notifications auto-dismiss after 4s — use "I should see a notification containing"
### Success Criteria
- All scenarios pass
- Each scenario tests a specific TC from the STD
- No flaky waits (use explicit element waits, not sleep)
```
## QR (Quality Review) Protocol
Quality Review is the most important step in each wave. It catches fabricated tags, false passes, and weak assertions before they enter the test suite. QR ran after every wave in Sessions 9-29 and flagged 6-12% of items as WRONG in API tests, and 3-8% in browser tests.
### 7-Dimension Checklist
| # | Dimension | What to Check |
|---|---|---|
| 1 | Scenario execution | Every scenario runs and passes |
| 2 | Tag correctness | @TV-/@TC- tags map to real STD entries; no fabricated tags |
| 3 | Assertion quality | Assertions test specific content/behavior, not just "page loads" or "no error" |
| 4 | STD alignment | Scenarios match test cases defined in STD files |
| 5 | No duplicate coverage | New scenarios are not redundant with existing ones |
| 6 | File organization | Background usage, per-scenario tags, feature-level @USE_SAME_CONFIG on line 1 |
| 7 | Fixture availability | Referenced config files, run files, and users exist in tests/support_files/ |
### Rating Scale
| Rating | Meaning | Action Required |
|---|---|---|
| CORRECT | Scenario passes and assertions are meaningful | None |
| WEAK | Passes but assertions could be stronger | Suggest improvements; fix if straightforward (tag corrections, assertion strengthening) |
| WRONG | Tag mismatch, missing assertion, or false-pass risk | Must fix before proceeding to next wave |
### QR Agent Template
```markdown
## Task: Quality Review of {FILE_LIST}
You are a QR agent. Review each scenario in the given feature file(s).
### READ FIRST
1. {STD_FILE} — source of truth for TV/TC definitions
2. {FEATURE_FILES} — files to review
### For Each Scenario, Check:
1. Runs and passes (execute the test if DB assigned, otherwise review only)
2. @TV-/@TC- tags map to real STD entries
3. Assertions test the stated requirement (not just "page loads" or "no error")
4. No duplicate coverage with existing scenarios
5. Data fixtures exist in tests/support_files/
6. @USE_SAME_CONFIG on line 1 if single-config file
7. BT key tag is first scenario-level tag (not @USE_SAME_CONFIG)
### Report Format
For each scenario:
- **CORRECT**: Passes, assertions meaningful
- **WEAK**: Passes, but [specific improvement suggestion]
- **WRONG**: [specific issue — must fix]
### Fix Protocol
- Fix all WRONG items immediately
- Fix WEAK items if straightforward (tag corrections, assertion strengthening)
- For WEAK items requiring structural changes, document but do not fix
### Summary
Report totals: N CORRECT, N WEAK, N WRONG
List all WRONG items with file:line and fix applied.
```
### Timing and Resource Rules
| Rule | Detail |
|---|---|
| QR runs after all creation agents complete | Never run QR concurrent with creation |
| QR agents do not need a DB | They review files and check tags — no test execution unless specifically assigned |
| Fix all WRONG before next wave | WRONG items left unfixed compound into larger problems |
| One QR agent per 15-30 scenarios | ~15 min per agent at this load |
| QR agents can review multiple files | Unlike creation agents, QR agents handle batches |
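The batch-size guideline implies a simple staffing formula; the wave size here is an example:

```shell
# One QR agent per ~25 scenarios (midpoint of the 15-30 guideline)
scenarios=110
per_agent=25
qr_agents=$(( (scenarios + per_agent - 1) / per_agent ))   # ceiling division
echo "QR agents needed: $qr_agents"
```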
### Empirical WRONG Rates
| Test Type | WRONG Rate | Common Issues |
|---|---|---|
| API tests (Behat) | 6-12% | Fabricated TV tags, stale fixture references, assertions checking wrong field |
| Browser tests | 3-8% | Incorrect CSS selectors, timing issues, assertions on auto-dismissed elements |
| Doc updates | 2-5% | Broken cross-references, wrong anchor names, stale counts |
## Post-Wave Checklist
Run through this after every wave completes and QR passes:
- [ ] All creation agents completed successfully
- [ ] QR pass completed — 0 WRONG items remaining
- [ ] All new/modified files committed
- [ ] Tracking docs updated (dashboard, traceability matrix, coverage report)
- [ ] Resource pool released (DBs unlocked in pool JSON, artisan/Chrome stopped if browser wave)
- [ ] Session handoff updated with wave results
- [ ] Issues logged in traceability/known-issues.md (if any discovered)
- [ ] BT key counter updated in MEMORY.md
- [ ] Gap totals recalculated and recorded
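The resource-release item can be scripted. This teardown sketch operates on a throwaway pool copy, and the pkill patterns are assumptions — match them to whatever command lines the wave actually launched:

```shell
# Unlock every pool DB in one pass (demo file; real pool is
# tests/scripts/db-pool.json)
POOL=/tmp/db-pool-teardown.json
echo '{"pools": [{"db": "pcrai_test_01", "locked": true},
                 {"db": "pcrai_test_02", "locked": true}]}' > "$POOL"
flock "$POOL" -c '
  jq ".pools[].locked = false" '"$POOL"' > /tmp/pool.tmp \
    && mv /tmp/pool.tmp '"$POOL"'
'

# Browser waves: also stop per-agent artisan servers and Chrome instances
# (patterns are guesses — adjust to the actual launch commands)
pkill -f 'artisan serve --port=80' 2>/dev/null || true
pkill -f 'remote-debugging-port=92' 2>/dev/null || true

jq '[.pools[].locked] | all(. == false)' "$POOL"
```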
## Parallelization by Phase
This table maps the Version Update Workflow phases to agent parallelization patterns.
| Phase | Parallelizable | Pattern | Max Agents |
|---|---|---|---|
| 0: Confluence Ingestion | Partially | One agent per domain section | ~10 |
| 1: Change Detection | Partially | Inventory extraction: 6 parallel agents | 6 |
| 2: Impact Analysis | No | Requires complete Change Manifest | 1 |
| 3: SRS Updates | Yes | One agent per domain | ~10 |
| 4: SDS Updates | Partially | Independent sections in parallel | ~5 |
| 5a: STD Updates | Yes | One agent per STD file | ~10 |
| 5b: Behat API Creation | Yes | Wave-based, 8-10 per wave + QR pass | 10 |
| 5c: Browser Test Creation | Yes | Wave-based, 8 per wave + QR pass | 8 |
| 6: Code Mapping Sync | Yes | One agent per code module | ~10 |
| 7: Validation | No | Sequential validation suite | 1 |
Key constraint: Phases 5b and 5c are the most agent-intensive. Plan for multiple waves within each. A typical STD reconciliation takes 6-8 waves of 8-10 agents each, plus one QR wave per creation wave.
## Version Upgrade Agent Patterns
When upgrading application code between versions (e.g., v3.0.0 to v3.0.1), the work decomposes into distinct phases with different parallelization characteristics. These patterns were validated during the v3.0.0-to-v3.0.1 upgrade (Session 35).
### Typical Wave Structure
| Wave | What | Parallelism | DB Needed | Notes |
|---|---|---|---|---|
| 1. Code upgrade | Fetch files, merge conflicts, rebuild frontend, run migrations | Sequential, main context | Yes (all pool DBs for migration) | Human-in-the-loop for merge conflict decisions |
| 2. Regression + classify | Run full suite via `behat-optimizer.py run --suite all` | 1 optimizer run (internally parallel) | Yes (pool) | Produces JUnit XML + failure categories |
| 3. Test fixes | Fix broken tests per failure category | Parallel agents per category | Yes (1 DB per agent) | Follow with QR pass |
| 4. Research / manifest | Build change manifest, map files, identify new reqs | 4 parallel read-only agents | No | All read-only: manifest, file-mapping, new-reqs, new-files |
| 5. Doc updates | Update SRS, SDS, STD, code_tags.json | Parallel by doc type | No | Each agent writes different files to avoid conflicts |
| 6. Validation | Final regression, report generation | Sequential | Yes (pool) | Single optimizer run to confirm zero regressions |
### Scaling Lessons (from v3.0.1 Upgrade)
Wave 4 (research/manifest) agents are fully independent — they only read code and docs, so you can run many in parallel without DB slots. During v3.0.1, 4 research agents ran simultaneously (manifest, file-mapping, new-reqs, new-files) with zero conflicts.
More commits = more Wave 4 subagents. The v3.0.1 upgrade had 13 commits, handled by 4 research agents. For upgrades with 20+ commits, split file-mapping across commit groups (e.g., commits 1-10 and 11-20 to separate agents) to avoid context overflow.
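Commit-group splitting can be sketched as below; `assign_commits` is a hypothetical helper, and the `c1..c13` IDs stand in for real hashes from `git log`:

```shell
# Round commits into groups of N so each research agent gets a bounded slice
assign_commits() {
  group_size=$1; shift
  i=0
  for c in "$@"; do
    echo "agent-$(( i / group_size + 1 )) $c"
    i=$((i + 1))
  done
}

# 13 commits in groups of 10 -> agent-1 gets c1-c10, agent-2 gets c11-c13
assign_commits 10 $(seq -f "c%g" 1 13)
```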
QR agents do not need DB slots — save pool DBs for test execution agents only. During v3.0.1, QR agents reviewed all test fixes without consuming any DB resources.
Limit heavy-context agents to 6 concurrent. Running more than 6 background agents with large context loads (e.g., full code analysis + fixture creation) risks session instability. Lighter agents (QR, doc updates, read-only research) can safely exceed this limit.
Doc update agents (Wave 5) can run 4 in parallel if each writes to a different doc set (SRS, SDS, STD, code_tags.json). This was validated during v3.0.1 Wave 5 with zero file conflicts.
### Wave 3: Test Fix Agents (Parallel by Failure Category)
After regression identifies failures, group them by root cause (e.g., "19 WREP failures from new dependency prefix", "4 INHN flips from lims-status fix", "5 resolution guard failures"). Each category gets its own agent with a dedicated DB.
| Agent | DB | Category | Failure Count | Output |
|-------|-----|----------|--------------|--------|
| 1 | pcrai_test_01 | WREP dependency prefix | 19 | Updated assertions + version tags |
| 2 | pcrai_test_02 | INHN→INHP lims-status | 4 | Updated assertions + version tags |
| 3 | pcrai_test_03 | Resolution guard | 5 | Config fixes or KCI tags |
| QR-1 | — | Review agents 1-3 | — | Tag + assertion audit |
QR agents run after all fix agents complete and do not need a DB — they review files only (TV coverage, assertions, tags, fixture correctness).
### Wave 4: Research Agents (All Read-Only)
Four parallel agents, each with a distinct research scope. None need a DB since they only read code and docs.
| Agent | Scope | Key Outputs |
|---|---|---|
| Manifest | Diff upstream commits, catalog every changed file and function | Change manifest document |
| File mapping | Map changed files to existing SRS/SDS/STD docs | Impact matrix (which docs need updates) |
| New requirements | Identify new features, new rules, new config fields | New REQ candidates with proposed IDs |
| New files | Identify new app files not in code_tags.json | code_tags.json additions |
### Wave 5: Doc Update Agents (Parallel by Doc Type)
Each agent writes to a different doc set, so there are no file conflicts.
| Agent | Doc Type | Files | Notes |
|---|---|---|---|
| 1 | SRS | docusaurus/docs/srs/ | New/updated requirements |
| 2 | SDS | docusaurus/docs/sds/ | Architecture changes, new execution patterns |
| 3 | STD | docusaurus/docs/std/ | New test vectors, updated TV tables |
| 4 | Code mapping | tests/catalogue/code_tags.json | New file-to-req mappings |
After all four complete, run `python3 docusaurus/scripts/generate-unified-traceability.py --render-md` to regenerate the traceability matrix.
### Special Agent Types
**KCI/KL audit agents** are read-only. They review @KCI and @KL tagged scenarios against the new version's code to determine which known issues are now resolved. No DB needed; can run many in parallel.
**Code contamination audit** compares fetched files against the upstream tag to verify no unintended drift. Uses `gh api` to pull upstream file contents for comparison. Single agent, no DB.
**Browser test agents** for version upgrades follow the same resource pattern as new browser test creation: each needs an artisan server (unique port), a Chrome instance (unique debug port), a unique DB, and a unique Chrome profile directory. See Port Allocation above for the mapping table.
## Risk Register Template
Common risks encountered across Sessions 9-29, with proven mitigations:
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Agent context overflow | Medium | Agent fails mid-task | Keep scope narrow: 1 file per agent, limit input context to essential files only |
| DB pool exhaustion | Low | Agents queue or fail | Pre-assign DBs; max 8 test agents per wave; QR agents never get DBs |
| Concurrent file edits | Medium | Merge conflicts | Never assign same file to two agents; pre-assign output filenames |
| Fabricated TV tags | High | False traceability | QR pass catches these; 6-12% WRONG rate is normal and expected |
| Stale fixture data | Medium | Test failures | Always verify fixtures exist before running; copy from known-good base configs |
| PhpSpreadsheet corruption | Low | Config unusable | Use PHP (not Python openpyxl) for xlsx edits; keep clean base configs in input/example/ |
| Permission inheritance failure | Medium | Agent prompts for permissions, stalling | Use general-purpose subagent type; if still failing, run from main context |
| Subagent polling waste | Low | Primary context burned | Never poll — launch and wait; subagents report when done |
| Browser test flakiness | Medium | False failures | Use explicit element waits, clean Chrome profiles, PHP_CLI_SERVER_WORKERS=8 |
## Session Handoff Template
Use this at the end of each session to ensure continuity:
```markdown
## Session N Handoff — {DATE}
### Status
- Wave {N} of {TOTAL}: {STATUS}
- Agents launched: {COUNT}, Completed: {COUNT}, Failed: {COUNT}
### Counts
- Scenarios created: {N}
- Scenarios passing: {N} (KCI: {N}, KL: {N})
- Rules fully closed: {N}
- Gap TVs remaining: ~{N} (~{N} actionable)
### Files Modified
{LIST_OF_FILES}
### Issues Discovered
{LIST_OF_ISSUES_WITH_IDS_OR_NONE}
### BT Key Counter
Next available: BT-{XXXX}
### Next Steps
{WHAT_NEEDS_TO_HAPPEN_NEXT}
### Human Decisions Needed
{QUESTIONS_REQUIRING_HUMAN_INPUT_OR_NONE}
```