Guide: Creating and Editing Behat Tests
Created: 2026-02-03
Last Updated: 2026-02-12
Status: Living document - update as new patterns discovered
Note: Consolidated from guide-behat-authoring.md + GUIDE_BEHAT_TEST_CREATION.md + dev-testing-guide.md execution sections
For step definitions and parameter reference, see dev-testing-guide.md.
Quick Start Checklist
Minimum steps to create a working test:
- Create support directory: `tests/support_files/BT-XXXX/`
- Copy a working config XLSX into it (prefer v3/v30/v31 configs)
- Copy and modify a JSON run fixture
- Create feature file in `tests/exports/cucumber/v3/`
- Dry-run: `cd /shared/code/req_docs/code && ./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature" --dry-run`
- Full run (see How Test Execution Works)
- Verify assertions match actual system output
Before writing any test, verify:
| Check | Where to look | Why |
|---|---|---|
| Rule applies to your well's role | Config -> Rules Mapping sheet | Rules only fire on mapped (target, role) pairs |
| Expected error code exists | Config -> Error Codes sheet | Missing codes cause unhandled exceptions |
| Error's PREVENTS ANALYSIS flag | Config -> Error Codes sheet | Determines whether later rules can run |
| Outcome string | Config -> Combined Outcomes sheet | Well-level outcome comes from here, not the rule title |
| Well label -> role mapping | Config -> Control Labels sheet | Label determines role (PC*, NEG*, NTC, etc.) |
| Config can exercise the TV's conditions | STD decision table -> required inputs vs config capabilities | Some TVs are impossible with certain configs (see Gotcha #37) |
| Assertion distinguishes rule-ran from default | STD -> expected output vs default (no-error) output | If the expected outcome is the same as the default, the test can't prove the rule ran (see Gotcha #38) |
| Resolution codes exist for error | Config -> Error Resolutions sheet | Resolution steps fail with "Resolution is not allowed" if no codes configured (see Gotcha #41) |
How Test Execution Works
File Locations
/shared/code/req_docs/tests/
exports/cucumber/
v3/ # 55 consolidated feature files (BT-9xxx, verified passing)
legacy/ # 10 legacy files providing unique TV coverage
archive/ # Superseded/debug files (not run)
support_files/BT-XXXX/ # Fixtures per test (JSON runs, XLSX configs)
catalogue/ # Feature catalogue, cross-references
/shared/code/req_docs/new_tests/
exports/cucumber/ # NEW test files go here (tracked by main repo)
support_files/BT-XXXX/ # NEW fixtures go here (tracked by main repo)
/shared/code/req_docs/code/
features/bootstrap/ # Step definitions (FeatureContext.php, BaseFeatureContext.php)
behat.yml # Behat config (paths, suites)
tests/ vs new_tests/
The tests/ directory is a git subtree and is listed in .gitignore. New files created there will NOT be committed to the main repo.
For git tracking: Always copy/create new test files in new_tests/ (same structure as tests/).
For Behat execution: Files must be in tests/ because Behat looks up fixtures via tests/support_files/BT-XXXX/. So:
- Create files in both `tests/` (for execution) and `new_tests/` (for git)
- OR create in `tests/` first, then copy to `new_tests/` after verification
- The `new_tests/` copy is the authoritative version for git
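The dual-copy workflow above can be sketched as a small helper. This is a sketch only -- `mirror_to_new_tests` is not an existing repo script; the default paths come from this guide:

```python
import pathlib
import shutil

def mirror_to_new_tests(bt_key, feature_rel,
                        tests="/shared/code/req_docs/tests",
                        new_tests="/shared/code/req_docs/new_tests"):
    """Copy verified fixtures and the feature file from tests/ into new_tests/."""
    tests, new_tests = pathlib.Path(tests), pathlib.Path(new_tests)
    # Fixtures: whole support_files/BT-XXXX/ directory
    shutil.copytree(tests / "support_files" / bt_key,
                    new_tests / "support_files" / bt_key,
                    dirs_exist_ok=True)
    # Feature file: same relative path under new_tests/
    dst = new_tests / feature_rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(tests / feature_rel, dst)
```

Run it only after the test passes in `tests/`, so the git-tracked copy is always the verified version.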
How Fixture Lookup Works
The @TEST_BT-XXXX tag on a scenario determines the support files directory:
- Behat reads the `@TEST_BT-5177` tag from the scenario
- Strips the `TEST_` prefix -> `BT-5177`
- All file references resolve to `tests/support_files/BT-5177/`:
  - `Given The configuration "foo.xlsx" is loaded` -> loads `support_files/BT-5177/foo.xlsx`
  - `When Upload the run file "bar.json"` -> loads `support_files/BT-5177/bar.json`
Consequence: All scenarios sharing a @TEST_BT-XXXX tag share the same fixture directory. Feature file location (v3/, legacy/, any subdirectory) does not affect lookup.
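The lookup behaves roughly like this sketch (illustration only, not the actual `BaseFeatureContext` code):

```python
def fixture_dir(scenario_tags, base="tests/support_files"):
    """Resolve the fixture directory from the FIRST scenario tag."""
    key = scenario_tags[0].removeprefix("TEST_")  # "TEST_BT-5177" -> "BT-5177"
    return f"{base}/{key}/"

print(fixture_dir(["TEST_BT-5177", "TV-RULE-001"]))
# tests/support_files/BT-5177/
```

Note the first-tag dependence: if some other tag comes first, the resolved directory is wrong (this is why scenario tag order matters -- see the `@USE_SAME_CONFIG` rule under Essential Rules).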
Running Tests
cd /shared/code/req_docs/code
# Single test by feature file
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=pcrai_test_01 DB_AUDIT_DATABASE=pcrai_test_01 \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature"
# Single test by tag
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=pcrai_test_01 DB_AUDIT_DATABASE=pcrai_test_01 \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat --tags=@TEST_BT-5035
# Dry-run (parse check only, no DB needed)
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature" --dry-run
IMPORTANT environment variables:
- `RATE_LIMIT_MAX_ATTEMPTS=9999` -- always required. The app has a custom rate limiter (5 req/60s per URL+IP). Without it, tests with >2 scenarios fail with "too many request" errors.
- `DB_HOST=127.0.0.1` -- always required. The `.env` file has `DB_HOST=mysql` (a Docker hostname that does not resolve outside Docker).
- `DB_AUDIT_HOST=127.0.0.1` -- always required. Same Docker hostname issue. Without it, audit writes fail silently.
- `DB_AUDIT_DATABASE=$DB` -- always required when using the DB pool. Must match `DB_DATABASE`.
NEVER use: DB_HOST=mysql, MYSQL_ATTR_SSL_VERIFY_SERVER_CERT, /workspace/ paths.
Parallel Execution (DB Pool)
Behat's @BeforeScenario hook drops and recreates all tables, so two instances sharing the same database corrupt each other. A database pool of 10 MySQL databases (pcrai_test_01 through pcrai_test_10) enables safe parallel execution.
Setup: /shared/code/req_docs/tests/scripts/setup-test-dbs.sh (idempotent, creates all 10 DBs with grants for sail@localhost)
How it works:
- `CreatesApplication.php` respects the `DB_DATABASE` env var (falls back to `pcrai_test` if unset)
- Agents check out a DB from `/shared/code/req_docs/tests/scripts/db-pool.json` using `flock` + `jq` for atomicity
- Each Behat run uses its own DB -- no conflicts
Checkout -> Run -> Checkin:
# Checkout first available DB (select + mark in_use under ONE lock, so two
# agents cannot grab the same database)
DB=$(flock /shared/code/req_docs/tests/scripts/db-pool.lock bash -c '
  POOL=/shared/code/req_docs/tests/scripts/db-pool.json
  db=$(jq -r "[.databases[] | select(.status==\"available\")][0].name" "$POOL")
  jq --arg db "$db" --arg label "BT-XXXX" --arg ts "$(date -Iseconds)" \
     "(.databases[] | select(.name==\$db)) |= (.status=\"in_use\" | .locked_by=\$label | .locked_at=\$ts)" \
     "$POOL" > "$POOL.tmp" && mv "$POOL.tmp" "$POOL"
  echo "$db"
')
# Run test
cd /shared/code/req_docs/code
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=$DB DB_AUDIT_DATABASE=$DB \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat "../tests/exports/cucumber/XX_BT-XXXX.feature"
# Checkin (always -- even on failure); update under the same lock
DB="$DB" flock /shared/code/req_docs/tests/scripts/db-pool.lock bash -c '
  POOL=/shared/code/req_docs/tests/scripts/db-pool.json
  jq --arg db "$DB" \
     "(.databases[] | select(.name==\$db)) |= (.status=\"available\" | .locked_by=null | .locked_at=null)" \
     "$POOL" > "$POOL.tmp" && mv "$POOL.tmp" "$POOL"
'
Pool capacity: 10 databases, so up to 10 parallel test runs. If all are in use, wait and retry.
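The wait-and-retry step can be scripted. This sketch (`wait_for_db` is a hypothetical helper, stdlib only) only waits for a free slot to appear -- the actual reservation must still go through the `flock`ed checkout shown above:

```python
import json
import time

POOL = "/shared/code/req_docs/tests/scripts/db-pool.json"  # path from this guide

def wait_for_db(pool_path=POOL, interval=30, attempts=20):
    """Poll db-pool.json until some database is available, then return its name.
    This does NOT reserve the database -- pair it with the flock'ed checkout."""
    for _ in range(attempts):
        with open(pool_path) as fh:
            pool = json.load(fh)
        free = [d["name"] for d in pool["databases"] if d["status"] == "available"]
        if free:
            return free[0]
        time.sleep(interval)
    raise TimeoutError("no pool database became available")
```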
Without pool (backward compatible): DB_DATABASE=pcrai_test still works -- the single-DB fallback is unchanged.
Fresh DBs need pre-migration before first Behat run. migrate:fresh on an empty DB (no migrations table) skips db:wipe. Fix:
mysql -h 127.0.0.1 -u sail -ppassword -e "DROP DATABASE IF EXISTS pcrai_test_XX; CREATE DATABASE pcrai_test_XX;"
cd /shared/code/req_docs/code
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE=pcrai_test_XX DB_AUDIT_DATABASE=pcrai_test_XX \
DB_USERNAME=sail DB_PASSWORD=password APP_ENV=testing \
php ../tests/scripts/pre-migrate.php
Or use the helper script: /shared/code/req_docs/tests/scripts/run-legacy-test.sh pcrai_test_XX feature-file.feature
Essential Rules
@USE_SAME_CONFIG -- MUST Be Feature-Level Tag
CRITICAL: @USE_SAME_CONFIG must go on the FEATURE tag line (line 1), NOT on individual scenarios.
The code in BaseFeatureContext.php:122 only checks getFeature()->getTags() (feature-level tags). Scenario-level @USE_SAME_CONFIG tags are silently ignored -- the config reloads every scenario, causing:
- ~2-3 min wasted per extra scenario (full `migrate:fresh` + config import each time)
- PHP timeout on heavy configs (cumulative `max_execution_time` exhaustion)
- A 5-scenario test takes 14+ minutes instead of ~4 minutes
| With `@USE_SAME_CONFIG` at feature level | Without |
|---|---|
| First scenario: ~2-3 min (full migrate + import) | Every scenario: ~2-3 min each |
| Subsequent scenarios: ~10-15 seconds each | 5-scenario file: ~14 minutes |
| 5-scenario file: ~4 minutes | PHP timeout risk on heavy configs |
Correct pattern:
@REQ_BT-XXXX @USE_SAME_CONFIG
Feature: My tests (all use same config)
@TEST_BT-XXXX @TV-RULE-001
Scenario: First test
Given The configuration "my-config.xlsx" is loaded # <-- LOADED
...
@TEST_BT-XXXX @TV-RULE-002
Scenario: Second test
Given The configuration "my-config.xlsx" is loaded # <-- SKIPPED (reused)
...
Only omit @USE_SAME_CONFIG when scenarios genuinely need different configs (e.g., BT-9509 Westgard tests with per-scenario config files).
NEVER put `@USE_SAME_CONFIG` as the first scenario-level tag. `BaseFeatureContext::beforeScenario()` uses `Arr::first($event->getScenario()->getTags())` to determine the BT key for fixture lookup. If `@USE_SAME_CONFIG` is the first scenario tag, the fixture directory resolves to `support_files/USE_SAME_CONFIG/` instead of `support_files/BT-XXXX/`, causing "file not found" errors on every scenario. The file-level tag on line 1 is sufficient -- do not duplicate it on scenarios. If you must add it to a scenario for documentation purposes, ensure `@TEST_BT-XXXX` comes first:
@TEST_BT-XXXX @USE_SAME_CONFIG @TV-RULE-001 <- OK (TEST_BT first)
@USE_SAME_CONFIG @TEST_BT-XXXX @TV-RULE-001 <- BROKEN (fixture lookup fails)
Assert Control Wells BEFORE Patient Wells
When a patient well fails (e.g., CONTROL_MISSING), Behat stops at that assertion and skips remaining steps. If control well assertions come after the patient assertion, you never see whether controls passed or failed -- losing critical diagnostic information.
Always order assertions: controls first, then patients.
Then well "C11" should have "Control Passed" outcome
And well "C13" should have "Control Passed" outcome
And well "B1" should have "Detected" outcome
PhpSpreadsheet for Config Edits (with openpyxl Exceptions)
PhpSpreadsheet (PHP) is REQUIRED for config edits that modify Rules Mapping, Error Codes, Error Resolutions, or sheet structure. openpyxl delete_rows() corrupts Rules Mapping sheets (shifts cells, leaves nulls in ROLE column -> PHP crashes). openpyxl also uses inline strings instead of shared strings, which can cause subtle format issues.
openpyxl (Python) is ACCEPTABLE for:
- (a) Read-only inspection of any sheet
- (b) Modifying only cell values in QIR, Curve Control Limits, Delta CT sheets (NOT Rules Mapping)
openpyxl delete_rows() is NEVER safe on any sheet.
A reusable Westgard removal script exists:
cd /shared/code/req_docs/code
php scripts/remove-westgard-rules.php "../tests/support_files/BT-XXXX/config.xlsx"
For other config edits, write similar PHP CLI scripts using PhpSpreadsheet. The composer autoloader is at /shared/code/req_docs/code/vendor/autoload.php.
Config Preference
| Preference | Configs | Notes |
|---|---|---|
| Preferred (v3) | quest-v3.xlsx, v30/v31 variants | 0 Westgard rows, extraction-aware |
| Acceptable (v30) | Viracor v30 pp based.xlsx | V30, generally clean |
| Avoid (v2) | Viracor 2.25.0.xlsx, Quest_PP_2_22.xlsx | Westgard failures, wrong rule ordering |
V2 config issues: Missing WESTGARDS_MISSED error code, wrong LINEAR_REGRESSION_VALIDATION precedence, CF=0 treated as no multiplier, negative quantities cause MySQL overflow.
Strategy for v2 configs (when unavoidable):
- Start with minimal assertions (CT, CLS only)
- Verify standard curve pipeline before adding quantity assertions
- Check LINEAR_REGRESSION_VALIDATION precedence (must be < 38)
- Use PhpSpreadsheet for all edits
- Remove Westgard mappings unless testing Westgard behavior
Rate Limiting
Always include RATE_LIMIT_MAX_ATTEMPTS=9999. The app has a custom rate limiter (5 req/60s per URL+IP). With @USE_SAME_CONFIG, scenarios run fast enough to hit this limit. Without it, tests with >2 scenarios fail with "too many request" errors.
Creating a New Test
Step 1: Create Support Directories
# For Behat execution (required):
mkdir /shared/code/req_docs/tests/support_files/BT-XXXX/
# For git tracking (required):
mkdir -p /shared/code/req_docs/new_tests/support_files/BT-XXXX/
Name must match the @TEST_BT-XXXX tag you will use.
Step 2: Create/Copy Config XLSX
Preferred approach: Copy from a working test with similar config needs.
# Find configs for a specific rule
find /shared/code/req_docs/tests/support_files/ -name "*.xlsx" | head -20
# Copy a known-working config
cp /shared/code/req_docs/tests/support_files/BT-5134/config.xlsx \
/shared/code/req_docs/tests/support_files/BT-XXXX/
Before creating fixtures, check the config:
| Sheet | What to check |
|---|---|
| Rules Mapping | Rule applies to your intended well role |
| Control Labels | Well labels map to intended roles |
| Error Codes | Expected error code exists; check PREVENTS ANALYSIS flag |
| Combined Outcomes | Exact outcome strings for your assertions |
| Error Resolutions | RULES SKIP ON RE-ANALYSIS matches rule name (for resolution tests) |
| QIR - Quantification settings | Slope/efficiency/R2 thresholds (clear if not testing these) |
Step 3: Create JSON Run Files
Always start from an existing working fixture:
ls /shared/code/req_docs/tests/support_files/BT-5001/*.json
Key JSON structure:
{
"run_info": { "run_name": "MY_TEST.json", "thermocycler_id": "275000953" },
"targets": { "t1": { "mix_name": "NOR2", "target_name": "NOR2", "auto_baseline": true } },
"wells": { "w1": { "well_number": "a1", "label": "|T:NOR2|R:S2|", "well_uuid": "w1" } },
"observations": { "o1": { "target": "NOR2", "ct": 30.0, "dxai_cls": "Pos", "well_uuid": "w1", "readings": [...] } }
}
Critical fields:
- `well_number`: Always lowercase in JSON (`a1`), uppercase in Gherkin (`A1`)
- `label`: Must match the config's expected format (pipe-delimited, plain text, etc.)
- `well_uuid`: Every `well_uuid` in `observations` must have a matching entry in `wells` -- orphaned observations are silently ignored
- `readings`: Array of fluorescence values -- copy from a working fixture, do not fabricate
- `ct`: The CT value under test
- `dxai_cls`: Classification (`Pos`, `Neg`, `Ambiguous`) -- but see Gotcha #5: this is a hint, not final
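The two silent failure modes above (orphaned observations, wrong-case well numbers) are cheap to pre-check before the first Behat run. `lint_fixture` is a hypothetical helper, not part of the test harness:

```python
def lint_fixture(data):
    """Sanity-check a parsed run fixture dict for common silent pitfalls."""
    known = {w["well_uuid"] for w in data.get("wells", {}).values()}
    problems = []
    # Orphaned observations are silently ignored at import time
    for key, obs in data.get("observations", {}).items():
        if obs["well_uuid"] not in known:
            problems.append(f"{key}: orphaned well_uuid {obs['well_uuid']!r}")
    # JSON well numbers must be lowercase (a1, not A1)
    for key, well in data.get("wells", {}).items():
        if well["well_number"] != well["well_number"].lower():
            problems.append(f"{key}: well_number must be lowercase in JSON")
    return problems

demo = {
    "wells": {"w1": {"well_number": "a1", "well_uuid": "w1"}},
    "observations": {"o1": {"well_uuid": "w1"}, "o2": {"well_uuid": "w9"}},
}
print(lint_fixture(demo))  # ["o2: orphaned well_uuid 'w9'"]
```

Load the real fixture with `json.load(open(path))` and pass the result in; an empty list means both checks passed.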
Step 4: Create Feature File
@REQ_BT-XXXX @USE_SAME_CONFIG
Feature: Description of what's being tested
@TEST_BT-XXXX @TV-RULE-001-001
Scenario: First test -- config loads here
Given The configuration "my-config.xlsx" is loaded
When Upload the run file "test1.json"
And Open the run file "test1.json"
Then well "C11" should have "Control Passed" outcome
And well "A1" should have "Detected" outcome
@TEST_BT-XXXX @TV-RULE-001-002
Scenario: Second test -- config reused automatically
Given The configuration "my-config.xlsx" is loaded
When Upload the run file "test2.json"
And Open the run file "test2.json"
Then well "C11" should have "Control Passed" outcome
And well "A1" should have "Not Detected" outcome
Key patterns:
- `@USE_SAME_CONFIG` on the feature line (line 1), never on scenarios
- Assert control wells before patient wells
- Use tabs for indentation (matching existing files). Mixed tabs/spaces cause silent parse failures
- `Scenario Outline:` requires an `Examples:` table. If there are no placeholders, use `Scenario:` instead
Scenario Outline conversion -- when 3+ scenarios share identical step structure but differ only in data values:
# TV Tags: TV-QUANTVAL-005-001, TV-QUANTVAL-005-002, TV-QUANTVAL-005-003
@TEST_BT-9001 @TV-QUANTVAL-005
Scenario Outline: <description>
Given The configuration "config.xlsx" is loaded
When Upload the run file "<run_file>"
And Open the run file "<run_file>"
Then well "A1" should have "<outcome>" outcome
Examples:
| description | run_file | outcome | # TV Tag |
| CT below threshold | test_low.json | Detected | # TV-QUANTVAL-005-001 |
| CT above threshold | test_hi.json | Not Detected | # TV-QUANTVAL-005-002 |
| CT at exact boundary | test_bnd.json | Detected | # TV-QUANTVAL-005-003 |
- Add a `# TV Tags:` comment above the Outline listing all TV IDs
- The `# TV Tag` comment column in Examples preserves per-row traceability
- Pipe characters (`|`) in description text conflict with Gherkin table syntax -- escape or rephrase
File naming: {priority}_{BT-KEY}.feature in tests/exports/cucumber/v3/
Step 5: Verify Parsing
cd /shared/code/req_docs/code
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature" --dry-run
Step 6: Run Test
cd /shared/code/req_docs/code
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=pcrai_test_01 DB_AUDIT_DATABASE=pcrai_test_01 \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature"
Tagging Convention
| Tag | Level | Purpose | Example |
|---|---|---|---|
| `@REQ_BT-XXXX` | Feature | Links to Jira requirement | `@REQ_BT-5268` |
| `@TEST_BT-XXXX` | Scenario | Test ID + fixture directory lookup | `@TEST_BT-9001` |
| `@TV-RULE-REQ-TV` | Scenario | Test vector traceability | `@TV-QUANTVAL-005-001` |
| `@USE_SAME_CONFIG` | Feature (line 1) | All scenarios share first scenario's config | Quest features |
| `@KNOWN_CODE_ISSUE` | Scenario | Test documents expected behavior but code is incomplete | IC skip tests |
| `@KNOWN_LIMITATION` | Scenario | Test passes but cannot fully cover intended TV due to system constraints | SWCOMBOUT TV-001 |
| `@DUPLICATE_COVERAGE` | Scenario | Functionally identical to another scenario (atomic step limitation) | NEGSIGMOID TV-003-001 |
| `@MISTAGGED` | Scenario | TV tag does not match what scenario actually tests | NEGSIGMOID TV-001-006 |
| `@COMBINED_OUTCOME` | Scenario | Test involves outcomes across multiple runs/mixes | Combined outcome features |
| `@UNIQUE` | Scenario | Test uses unique/isolated test data | Isolated test data |
| `@UNIVERSAL` | Scenario | Test applies universally across configurations | Universal edge cases |
| `@EXAMPLE_TEST` | Scenario | Example/demonstration test (not in core regression) | Demo tests |
Consult the False KL Checklist before applying `@KNOWN_LIMITATION`. ~41% of KL tags in Waves 1-3 were incorrectly applied due to false assumptions about config editability, fixture engineering, and infrastructure capabilities.
Common Gherkin Steps
# Config + run file
Given The configuration "{config}.xlsx" is loaded
When Upload the run file "{file}.json"
And Open the run file "{file}.json"
# Well assertions
Then well "A1" should have "{outcome}" outcome
And well "A1" should have "{mix}" mix
And well "A1" should have "{role}" sample role
And well "A1" should have "true" is crossover
# Observation assertions
And well "A1" observation "{target}" should have "{cls}" final cls
And well "A1" observation "{target}" should have "{ct}" final ct
And well "A1" observation "{target}" should have "{qty}" quantity
# Resolution + re-analysis
When Apply resolution to well "A1" with "{resolution}"
And Re analyse the run file
# Resolution with individual curve result
When Apply resolution to well "A1" with "Set individual curve results" and "Manual classification" to observation "{TARGET}" with "{CLS}"
And Re analyse the run file
For the full step definition reference (22 steps with parameters, exceptions, and regex patterns), see dev-testing-guide.md.
Gotchas (Hard-Won Lessons)
1. Well Numbers -- Uppercase in Gherkin, Lowercase in JSON
- JSON run files use lowercase: `"well_number": "a2"`
- Gherkin steps use uppercase: `well "A2" should have "Detected" outcome`
- The system converts during import. Always use uppercase in feature files.
2. Rules Only Apply to Mapped Roles
Critical: Each rule in the config's Rules Mapping sheet is mapped to specific (target, role) combinations. If your test well's role is not in the mapping, the rule will never fire on that well, even if the data should trigger it.
How to check: Open the config XLSX -> Rules Mapping sheet -> find your rule -> check which roles are listed.
Example: DELTA_CT in v31 config maps to: NTC, PC, COVIDPPC, MPXPC, COVIDNPC, COVIDPNC, MPXNC, NEG. Patient wells are NOT included. So DELTA_CT never evaluates on patient wells.
Well labels determine roles via the Control Labels sheet:
| Label pattern | Role | Type |
|---|---|---|
| `PC*` | PC | Positive control |
| `MPXPC*` | MPXPC | Positive control |
| `MPXNC*` | MPXNC | Negative control |
| `NEG*` | NEG | Negative control |
| `NTC` | NTC | No template control |
| (anything else) | Patient | Patient |
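In sketch form (the real mapping is driven by the config's Control Labels sheet, not hard-coded; this helper and its exact matching rules are illustrative only):

```python
def role_for_label(label):
    """Map a well label to a role, mirroring the table above (sketch)."""
    for prefix, role in (("MPXPC", "MPXPC"), ("MPXNC", "MPXNC"),
                         ("PC", "PC"), ("NEG", "NEG")):
        if label.startswith(prefix):
            return role
    if label == "NTC":          # NTC is an exact match, not a prefix pattern
        return "NTC"
    return "Patient"            # anything unrecognized is a patient well

print(role_for_label("PC1"), role_for_label("SAMPLE-42"))  # PC Patient
```

The consequence for test design: a label typo silently demotes a control to a patient well, and control-only rules stop firing on it.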
3. Config Validation Cascade + Rule Precedence
Multiple rules validate in precedence order. A test targeting one rule can fail because a different rule fires first:
Common precedence chain:
- `COMBINED_OUTCOME_CONTROL` (~order 10) -- control pass/fail
- Negative control rules (BNC, BICQUAL) (~order 16)
- `STANDARD_OUTSIDE_CT_RANGE` -- CT range checks
- `BAD_GRADIENT` / `BAD_EFFICIENCY` / `BAD_R2` -- linear regression
- `DELTA_CT` (~order 46) -- CT threshold / CLS mismatch
- `COMBINED_OUTCOME` (~order 47) -- patient well outcome
- `Sample label has an invalid accession` (FILEIMPORT)
Key interactions:
- Rules with `is_allow_error_wells = false` will not run if an earlier rule already set an error on the well
- `PREVENTS ANALYSIS = NO` on your target rule's error means later rules will still evaluate and may overwrite
- To isolate a rule, remove interfering higher-precedence rule mappings for the test well's role using PhpSpreadsheet
4. Well Outcome vs Observation-Level Errors
The wellShouldHaveOutcome step checks the well-level outcome, not observation-level errors:
- Individual rules (DELTA_CT, QUANTVAL, etc.) set observation-level errors
- `COMBINED_OUTCOME` (for patient wells) or `COMBINED_OUTCOME_CONTROL` (for controls) sets the well-level outcome
- The well outcome string comes from the Combined Outcomes sheet, not the rule title
Exception: Errors with PREVENTS ANALYSIS = YES override the well outcome directly with the error message.
5. dxai_cls in JSON is a Hint, Not Final
The dxai_cls field in run file JSON is the instrument's pre-classification, but the system re-classifies based on its own logic. Setting dxai_cls: "Neg" with ct: 25 and amplification curve data will result in the system classifying as "Pos" (because CT=25 shows amplification).
The curve shape always wins. Even explicitly setting dxai_cls: "Pos" will be overridden to Neg if the readings array has a downward/flat curve shape. The system's setMachineClsCalculatedFromMachineCt() method in Observation.php determines classification from the readings, not from dxai_cls. This means fixture-level dxai_cls cannot be used to isolate classification-dependent rule behavior from the curve analysis path (learned from BT-9544 QTYWEIGHT review).
To create a genuinely Neg observation:
- Set `ct` to `null`, `dxai_ct` to `null`, `dxai_cls` to `"Neg"`
- Replace the `readings` array with flat values (e.g., 40 x 1.0)
To create a genuinely Pos observation:
- Set `ct` to a value below the target's MAX CT threshold, `dxai_ct` to the same value, `dxai_cls` to `"Pos"`
- Use realistic amplification-curve `readings` (copy from a working fixture)
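As a fixture-editing sketch (field names are from the run-file format above; `make_neg` / `make_pos` are hypothetical helpers, not repo scripts):

```python
def make_neg(obs, n_readings=40):
    """Force an observation dict to a genuinely Neg state."""
    obs.update(ct=None, dxai_ct=None, dxai_cls="Neg")
    obs["readings"] = [1.0] * n_readings  # flat curve -- no amplification
    return obs

def make_pos(obs, ct, template_readings):
    """Force a genuinely Pos state; readings must come from a working fixture."""
    obs.update(ct=ct, dxai_ct=ct, dxai_cls="Pos")
    obs["readings"] = list(template_readings)  # realistic amplification curve
    return obs
```

Apply these to entries of the fixture's `observations` dict before saving the JSON; remember the curve shape always wins over `dxai_cls`.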
6. Gradient Requires CT Variation Across Wells
Standard curve tests (QUANTVAL) use S2/S4/S6 wells. If all wells have identical CT values, the gradient is 0 (flat line), which triggers BAD_GRADIENT.
Fix: When creating fixtures with different CT values, apply an offset to the template data rather than setting all wells to the same value. Preserve the relative differences between wells.
# WRONG - all wells get CT=38, gradient=0
for obs in data["observations"].values():
    obs["ct"] = 38.0
# RIGHT - offset from template values, preserving gradient
for obs in data["observations"].values():
    if obs["ct"] is not None:
        obs["ct"] = obs["ct"] + 2.0  # template had 30, 31, 32 -> now 32, 33, 34
7. QIR Settings Interference
The QIR - Quantification settings sheet in config XLSX has MIN SLOPE / MAX SLOPE, MIN EFFICIENCY / MAX EFFICIENCY, MIN R2, and MIN CONTROLS. When testing CT range validation specifically, clear all other QIR settings to prevent interference.
Warning: openpyxl cell value assignment is acceptable for QIR sheet edits (cell values only, no row deletion), but read back and verify after save. See Essential Rules > PhpSpreadsheet for the full openpyxl policy.
8. Each Test Takes ~2 Minutes
Each scenario takes approximately 1.5-3 minutes due to:
- Database refresh (drop all tables, run all migrations)
- Config import (parse XLSX, seed database)
- Run file processing
A 9-scenario feature file takes ~16-18 minutes. With @USE_SAME_CONFIG: first scenario ~3 min, subsequent ~15 sec each. Plan accordingly.
9. Scenario Outline Requires Examples Table
Scenario Outline: MUST have an Examples: table. Without it, the Gherkin parser fails with:
Expected Step, Examples table, or end of Scenario, but got text: "Then"
If the scenario does not use placeholders (<param>), use Scenario: instead of Scenario Outline:.
10. Mixed Indentation Breaks Parsing
Gherkin requires consistent indentation within a file. Mixing tabs and spaces (e.g., \t\t Given) causes silent parsing failures. Use tabs only, matching the pattern of existing files.
11. LINEAR_REGRESSION_VALIDATION Must Run Before STDQT
LINEAR_REGRESSION_VALIDATION computes slope and intercept from standard wells and stores them on RunTarget. STDQT reads those values to compute patient quantities.
If LINEAR_REGRESSION_VALIDATION has a higher run order (precedence) than STDQT, quantities will silently be 0/null. STDQT calls cannotQuantify() which checks for slope/intercept -- if they are null (because regression has not run yet), it exits without computing.
- Correct: LINEAR_REGRESSION_VALIDATION at run order 8, STDQT at 38
- Wrong: LINEAR_REGRESSION_VALIDATION at run order 49 (only seen in openpyxl-generated configs)
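The dependency is visible in the generic standard-curve math (a sketch: this is textbook qPCR back-calculation, not the app's exact STDQT formula):

```python
def quantity_from_ct(ct, slope, intercept):
    """Standard-curve back-calculation: ct = slope * log10(qty) + intercept.
    The None check mirrors STDQT's cannotQuantify() early exit -- if the
    regression has not run yet, there is nothing to invert."""
    if slope is None or intercept is None:
        return None  # -> quantities silently end up 0/null
    return 10 ** ((ct - intercept) / slope)

print(quantity_from_ct(30.0, None, None))           # None (regression not run)
print(round(quantity_from_ct(33.0, -3.32, 36.32)))  # 10
```

So a wrong run order does not crash anything; it just produces empty quantities, which is why the failure is easy to miss.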
12. openpyxl-Generated XLSX Uses Inline Strings
Configs heavily modified by openpyxl use inline strings (t="inlineStr") instead of shared strings format (t="s" with xl/sharedStrings.xml). While PhpSpreadsheet generally handles both, subtle edge cases may arise.
Best practice: Copy an existing working .xlsx and modify only the cells you need, rather than generating a new workbook from scratch.
13. Config Swapping Can Introduce Missing Error Codes
Copying a config from one test to another can fail if the source config references error codes not registered in the application. Symptom: No Error Code Defined for code: XXXXX (Exception) at run upload time.
Fix: Either:
- Go back to the original config and fix only what is needed
- Remove the offending rule mappings from the swapped config
- Add the missing error code to the Error Codes sheet
14. Control Wells Trigger Westgard QC Cascade
V2 configs with Westgard rules mapped to PEC role will fire on control wells labeled R:LO POS (because LO POS maps to PEC via the Control Labels sheet). Without matching Westgard Limits date ranges in the test DB, this triggers WESTGARDS_MISSED.
Cascade: If WESTGARDS_MISSED has PREVENTS ANALYSIS = YES, the control well cannot complete analysis -> "Associate mix and extraction errors" rule propagates the error to patient wells.
DO NOT remove control wells to avoid this -- that triggers MINCONTROLS errors instead.
Partial fix: Add WESTGARDS_MISSED to the Error Codes sheet with PREVENTS ANALYSIS = NO and ERROR TYPE = Warning. Some controls may still fail. Prefer removing Westgard rule mappings entirely (Gotcha #15) unless specifically testing Westgard behavior.
Warning: The openpyxl code below is for Error Codes cell edits only (acceptable per policy). Do NOT use openpyxl delete_rows() on any sheet.
# openpyxl partial fix for v2 configs with Westgard rules (prefer Gotcha #15)
ws = wb["Error Codes"]
new_row = ws.max_row + 1
ws.cell(row=new_row, column=1, value="WESTGARDS_MISSED")
ws.cell(row=new_row, column=2, value="Westgard Limit Missed...")
ws.cell(row=new_row, column=3, value="Warning") # NOT "Error"
ws.cell(row=new_row, column=4, value="Well")
ws.cell(row=new_row, column=6, value="NO") # PREVENTS ANALYSIS = NO
ws.cell(row=new_row, column=7, value="NO")
ws.cell(row=new_row, column=8, value="NO")
15. Remove Westgard Rule Mappings for Test Isolation (Preferred)
V2 configs with Westgard rules cause cascade failures on control wells when no Westgard Limits date ranges exist in the test DB.
PREFERRED approach: Remove Westgard rule mappings from config entirely. Better test isolation since you are testing the target rule, not Westgard. Use remove-westgard-rules.php:
cd /shared/code/req_docs/code
php scripts/remove-westgard-rules.php "../tests/support_files/BT-XXXX/config.xlsx"
Caution: The script only checks column B for "westgard" keyword. Rules with IDs like WG* (e.g., "Check 13S after 12S", "Check 22S after 13S") need separate removal. Also clear the Westgard Limits and Westgard Events sheets entirely.
Only keep Westgard rules when specifically testing Westgard behavior.
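A broader match that also catches the WG*-prefixed rules can be sketched as follows (`is_westgard_rule` is a hypothetical helper, not the repo script; adapt the column reads to your config):

```python
def is_westgard_rule(rule_id, rule_name):
    """Match Westgard rules more broadly than a column-B keyword check:
    catch both the 'westgard' keyword and WG*-prefixed rule IDs."""
    if "westgard" in f"{rule_id} {rule_name}".lower():
        return True
    return str(rule_id).upper().startswith("WG")

print(is_westgard_rule("WG13S", "Check 13S after 12S"))  # True
print(is_westgard_rule("DELTA_CT", "Delta CT check"))    # False
```

Use it read-only (e.g., with openpyxl inspection, which the policy allows) to list candidate rows, then do the actual row removal with PhpSpreadsheet.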
16. DB Contention from Parallel Behat Runs
Running multiple Behat tests simultaneously against the same database causes @BeforeScenario hook failures:
- `Table 'migrations' doesn't exist` (first agent's DROP collides with second agent's migration)
- `Table 'users' already exists` (rebuild collision)
Fix: Use the DB pool. See How Test Execution Works > Parallel Execution.
17. MIN_CONTROLS vs MINEXTRACT: Config Must Match Extraction Settings
Two different programmatic rules handle minimum control validation:
| Rule | Lookup method | Works when |
|---|---|---|
| `MIN_CONTROLS` | Non-extraction (`!hasExtractionSettings()`) | `use_extraction_instruments=false` |
| `MINEXTRACT` | Extraction-aware (mix, date, instrument, batch) | `use_extraction_instruments=true` |
The trap: Configs with use_extraction_instruments=true (Quest EZ configs) auto-assign extraction settings to ALL wells. If such a config uses MIN_CONTROLS, the non-extraction lookup returns empty for every well -> every patient gets CONTROL_MISSING.
The error message tells you which rule ran:
- `CONTROL_MISSING`: "This well is missing the required associated controls..." -- `MIN_CONTROLS` ran
- `EXTRACTION_CONTROLS_MISSING`: "This well is missing the required associated extraction controls..." -- `MINEXTRACT` ran
18. Remove Non-Westgard Interfering Rules for Test Isolation
Other rules can also override your target rule's results. When testing a specific rule in isolation, check if higher-precedence rules fire on the same well role and overwrite the outcome.
General pattern:
- Run the test -- if the wrong error message appears, identify which rule produced it
- Check Rules Mapping sheet -- find all rules mapped to your test well's role
- Identify rules with higher run order than your target rule
- Remove those rule mappings (for the specific role, not globally) using PhpSpreadsheet
- Script: `/shared/code/req_docs/code/scripts/setup-bt9505-config.php` shows the pattern
Key insight: PREVENTS ANALYSIS=NO on your target rule's error code means later rules with IS ALLOW ERROR WELLS=false will NOT skip the well -- they will evaluate and potentially overwrite.
19. Transient "Invalid JSON was returned from the route" Errors
Config loading occasionally fails with "Invalid JSON was returned from the route" -- Laravel returns HTML error page instead of JSON API response. This is a transient infrastructure issue, not a test bug.
Causes: Passport OAuth token expiry, momentary DB connection timeout, memory pressure after multiple scenarios with re-analysis.
Fix: Simply rerun the test. If it persists across 2+ runs, investigate Laravel Passport keys and DB connectivity.
20. Heavy Configs (5000+ Rules Mapping Rows) Cause PHP Timeout
Configs with very large Rules Mapping sheets can exceed PHP's 90-second max_execution_time during well processing.
Mitigations:
- Use
@USE_SAME_CONFIGat feature level (avoids repeated config loads -- this alone fixed BT-9506) - Keep scenario count low (3-4 per feature) for heavy configs
- Consider using a lighter config variant
- Remove unused rule mappings to reduce processing
21. Import Validator Masks Rule-Engine Errors
Invalid data values (SD=0, null, text, negative) are rejected by the config import validator before the rule engine runs. Tests expecting rule-engine error codes will get import-level errors instead. Check whether the import validator catches the condition first.
22. Single-Mix Configs Cannot Test Target Mismatch Independently
Kit configs with only one observable target (e.g., NOR2) cannot test "target mismatch" independent of "role mismatch" -- changing the target to something not in the config produces the same error as changing the role. True target-mismatch tests require a multi-mix config.
23. Specimen Type Comes from Config, Not Fixture
The system determines specimen type through the config's Test Codes sheet (client code -> mix -> specimen type), NOT from the JSON fixture. The fixture only needs the C:XXXX client code in the well label.
Chain: Well label C:1408 -> Test Codes sheet -> ENTF mix -> Fecal specimen -> LOD threshold
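That lookup chain can be sketched with toy tables. The keys and values come from the example above, but the dict shapes are illustrative assumptions, not the real config loader:

```python
import re

# Toy stand-ins for the config's Test Codes sheet and mix metadata.
test_codes = {"1408": "ENTF"}       # client code -> mix
mix_specimen = {"ENTF": "Fecal"}    # mix -> specimen type

def specimen_for_label(well_label: str) -> str:
    """Resolve specimen type from the C:XXXX client code in the well label."""
    client_code = re.search(r"C:(\d+)", well_label).group(1)
    return mix_specimen[test_codes[client_code]]

specimen_for_label("S1 C:1408")  # -> "Fecal"
```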
24. Orphaned Observations on Non-Existent Wells
JSON fixtures can have observations referencing well_uuid values not declared in the wells section. These orphaned observations are silently ignored during import.
Fix: Always verify every well_uuid in observations has a matching entry in wells.
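A quick pre-flight check along these lines can catch orphans before upload (the top-level key names are assumed from the fixture layout described here):

```python
import json

def orphaned_observations(fixture: dict) -> list:
    """Return observations whose well_uuid has no matching entry in wells."""
    declared = {w["well_uuid"] for w in fixture.get("wells", [])}
    return [o for o in fixture.get("observations", [])
            if o["well_uuid"] not in declared]

fixture = json.loads("""{
  "wells":        [{"well_uuid": "u1"}],
  "observations": [{"well_uuid": "u1"}, {"well_uuid": "u2"}]
}""")
orphans = orphaned_observations(fixture)  # u2 has no declared well
```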
25. mix_results Count Comes from Config, Not Wells
For SWCOMBOUT and similar rules, mix_count is determined by the Combined Outcomes sheet in the config, not by the number of patient wells in the fixture.
26. Resolution Step Atomicity -- Cannot Isolate Skip Paths
The Behat step "Apply resolution...with 'Set individual curve results' and 'Manual classification'" atomically sets BOTH resolution code AND manual classification. Tests claiming to isolate one skip path from the other are functionally identical duplicates.
Tag duplicates with @DUPLICATE_COVERAGE rather than deleting (preserves traceability).
Behat only supports "Pos" and "Neg" for manual classification values -- "Amb" (Ambiguous) is not supported. Tests requiring Ambiguous CLS must be tagged @KNOWN_LIMITATION.
27. Vacuous Scenarios -- Tests That Pass But Test Nothing
A scenario is vacuous when it passes for the wrong reason:
- Reusing a qualitative fixture for a quantitative scope test
- Config that does not activate the rule being tested
- Assertion on a generic message that any rule could produce
Fix: Remove vacuous scenarios and document why, or create proper fixtures.
28. Extraction Settings: MIN_CONTROLS vs MINEXTRACT
Two different rules handle minimum control validation. See Gotcha #17 for full details. The key trap: configs with use_extraction_instruments=true + MIN_CONTROLS rule will always produce CONTROL_MISSING on every patient well.
How to check: Config -> Rules sheet -> find "minimum required controls" -> check column B (PROGRAMMATIC RULE NAME).
29. Heavy Configs and Timeouts (Config-Specific)
Configs with very large Rules Mapping sheets (e.g., Quest_EZ_MINCONTROLS.xlsx at 5014 rows) can exceed PHP's 90-second max_execution_time. The timeout typically hits during Collection operations in Laravel's Eloquent layer.
Symptoms: PHP Fatal error: Maximum execution time of 90 seconds exceeded in Collection.php or HasAttributes.php.
See Gotcha #20 for mitigations.
30. V2 Westgard Cleanup Needs More Than remove-westgard-rules.php
The script only checks column B for "westgard" keyword. Rules with RULE ID patterns like WG* (e.g., "Check 13S after 12S", "Check 22S after 13S") need separate removal. Also clear Westgard Limits and Westgard Events sheets entirely.
31. PICQUANT Rules Lack Null IC Guards -- ISSUE-013
PicquantRule.php and PicquantSerumRule.php do not check for null IC observation or null IC CT before processing. TV-001-017 is unreachable (FILEIMPORT catches missing obs first). TV-001-018 with null IC CT causes false inhibition instead of skip.
32. DB_AUDIT_DATABASE Must Match DB_DATABASE
When using the DB pool, pass DB_AUDIT_DATABASE=$DB alongside DB_DATABASE=$DB. Without this, a stale audit DB reference from a previous agent's config cache can cause failures in scenarios that write audit records (e.g., WDCLSCINVSIG).
33. DB_HOST and DB_AUDIT_HOST Must Be 127.0.0.1
The .env file has DB_HOST=mysql and DB_AUDIT_HOST=mysql -- Docker hostnames that do not resolve outside Docker. Always override both when running tests. Without DB_AUDIT_HOST, audit writes fail silently on scenarios that trigger the mysql_audit connection.
34. Fresh DBs Need Pre-Migration
migrate:fresh on an empty DB (no migrations table) skips db:wipe due to FreshCommand.php checking repositoryExists(). This causes the BeforeScenario hook to fail with Table 'migrations' doesn't exist.
See How Test Execution Works > Parallel Execution for the pre-migration command.
35. Rule Precedence and Interference (Detail)
Rules validate in precedence order. A test targeting one rule can fail because a higher-precedence rule fires first and overwrites the outcome.
Key interactions:
- Rules with is_allow_error_wells = false will not run if an earlier rule already set an error
- PREVENTS ANALYSIS = NO on your target rule's error means later rules will still evaluate and may overwrite
- To isolate a rule, remove interfering higher-precedence rule mappings for the test well's role using PhpSpreadsheet
BT-9505 example: COMBINED_OUTCOME_CONTROL (order 10) sets CT_BOUNDARY_HIT with PREVENTS ANALYSIS=NO. Then BNC (order 16) evaluates and overwrites the outcome. Fix: remove BNC and BICQUAL rule mappings for the test well's role.
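The interaction above can be modeled with a toy precedence loop. The flag names mirror the config sheets, but the rule dicts and outcomes are illustrative assumptions, not the real engine's API:

```python
def run_rules(rules):
    """Evaluate toy rules in run order against a single well."""
    well = {"error": None, "error_prevents_analysis": False, "outcome": "default"}
    for rule in sorted(rules, key=lambda r: r["order"]):
        if (well["error"] and well["error_prevents_analysis"]
                and not rule["is_allow_error_wells"]):
            continue  # blocked by an earlier PREVENTS ANALYSIS=YES error
        if rule.get("fires"):
            well["error"] = rule["error_code"]
            well["error_prevents_analysis"] = rule["prevents_analysis"]
            well["outcome"] = rule["outcome"]
    return well

# BT-9505-style interaction: order 10 sets CT_BOUNDARY_HIT with
# PREVENTS ANALYSIS=NO, so order-16 BNC still evaluates and overwrites.
well = run_rules([
    {"order": 10, "fires": True, "error_code": "CT_BOUNDARY_HIT",
     "prevents_analysis": False, "outcome": "boundary",
     "is_allow_error_wells": True},
    {"order": 16, "fires": True, "error_code": "BNC_ERROR",
     "prevents_analysis": False, "outcome": "bnc",
     "is_allow_error_wells": False},
])
```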
36. Missing Error Codes When Swapping Configs
Copying a config from one test to another can fail if the source config references error codes not registered in the application. Symptom: No Error Code Defined for code: XXXXX (Exception) at run upload.
Fix: Add the missing error code to the Error Codes sheet, or remove the rule mapping that produces it.
37. Config-Dependent TV Feasibility
Not every TV (test vector) can be tested with every config. Before writing a test, verify the config's structure supports the conditions the TV requires. If it doesn't, you need a different config -- or the TV must be documented as infeasible for that config and tagged appropriately.
Real examples from Wave 7-8 review:
| TV | Required Condition | Config Limitation | Resolution |
|---|---|---|---|
| TV-INDETCTS-001-003 | Zero non-IC targets on well | Nottingham config mandates E gene + S gene (2 non-IC targets always present) | Retagged to TV-001-007 (non-Pos classification). TV-001-003 needs a different config or is structurally infeasible. |
| TV-RWAC-002-004 | Fallback control with use_fallback_shared_controls=true | Test config has use_fallback_shared_controls=false | Tagged @DUPLICATE_COVERAGE; true TV-002-004 needs a different config variant. |
| TV-MINCTRL-002-005 | Control well with MINCONTROL resolution code applied | No scenario applied a resolution before checking | Retagged to TV-001-002 (role skip). TV-002-005 needs a new scenario. |
Decision framework when a TV is infeasible with your config:
- Can you use a different config? Check Config Preference (Essential Rules). Prefer v3 configs.
- Can you modify the config? Use PhpSpreadsheet. But don't break other scenarios sharing the config.
- Neither works? Retag the scenario to the TV it actually exercises. Document the infeasible TV as a coverage gap in the feature file header. Tag scenario with an appropriate qualifier if needed.
See also: Gotcha #17 (MIN_CONTROLS vs MINEXTRACT), Gotcha #22 (single-mix target mismatch).
38. Negative-Path Tests and @NEGATIVE_PATH_GUARD
When a rule only produces errors (never modifies the default outcome), a "no error" assertion is inherently weak -- it would pass even if the rule were completely disabled.
Example: ADJZIKA scenarios in BT-9539 all assert "Detected" (the default Zika outcome). Three scenarios test conditions where ADJ_Zika should NOT fire. But "Detected" is what you'd get whether ADJ_Zika evaluated and found no error, OR whether ADJ_Zika never ran at all. No Behat step exists to confirm rule evaluation occurred.
Pattern for negative-path tests:
- Tag with @NEGATIVE_PATH_GUARD to clearly signal the assertion's limitation
- Add a comment explaining what the scenario verifies and what it cannot distinguish
- Pair with positive-case tests elsewhere that prove the rule CAN fire. Reference the positive case in the comment (e.g., "Positive-case ADJ_Zika coverage in legacy BT-5155").
- If no positive case exists, the negative-path test has very low diagnostic value -- consider whether it's worth keeping.
When IS a negative-path test sufficient?
- The rule has a positive-case companion test proving it works
- The scenario is a regression guard (preventing false positives in production)
- The scenario verifies a DIFFERENT rule fires instead (e.g., asserting "Mixes missing" instead of "Detected" proves mixes-missing ran, even if the target rule didn't)
39. Resolution Steps Must Target the Specific Failing Observation
When a well has multiple observations (e.g., QHSV1 and QHSV2 on an HSV well), the resolution step must target the observation that actually triggered the error -- not just any observation on the well.
Example from BT-9535 GAP-005: Well S1 (NEC role) had two observations:
- QHSV1: upward curve, Neg cls -> passes DSIGMOIDCTRL (Neg + upward = no error)
- QHSV2: downward curve, Neg cls -> fails DSIGMOIDCTRL (Neg + downward = error)
The original test applied resolution to QHSV1 (the passing observation). The test passed because the QHSV2 error was never cleared -- but the resolution step appeared to "work" because the initial error assertion on the well had already been satisfied. The fix was changing the resolution target to QHSV2.
Rule: When writing resolution scenarios for multi-observation wells:
- Identify which observation triggers the error (check the rule's evaluation logic, not just the well-level outcome)
- Apply resolution to THAT specific observation
- After re-analysis, assert both the well outcome AND the resolved observation's final cls/ct
This is especially important when wells are remapped (e.g., well_number: "S1" maps to physical well a4), because the observation data may not match what you'd expect from the well label.
40. @USE_SAME_CONFIG Must NOT Be First Scenario-Level Tag
When @USE_SAME_CONFIG is placed on a scenario's tag line, it must NOT be the first tag. BaseFeatureContext::beforeScenario() uses Arr::first($event->getScenario()->getTags()) to determine the BT key for fixture directory lookup (support_files/BT-XXXX/). If @USE_SAME_CONFIG is the first scenario tag, the fixture path resolves to support_files/USE_SAME_CONFIG/ -- which doesn't exist -- causing "file not found" errors on every scenario.
Wrong:
@USE_SAME_CONFIG
Feature: My test
@USE_SAME_CONFIG @TEST_BT-5700 @REQ_BT-5368
Scenario Outline: Test something
Right (file-level only -- preferred):
@USE_SAME_CONFIG
Feature: My test
@TEST_BT-5700 @REQ_BT-5368
Scenario Outline: Test something
Also right (if scenario-level desired, put after @TEST):
@USE_SAME_CONFIG
Feature: My test
@TEST_BT-5700 @USE_SAME_CONFIG @REQ_BT-5368
Scenario Outline: Test something
Best practice: Put @USE_SAME_CONFIG on line 1 (feature level) only. The scenario-level tag is redundant since the code checks getFeature()->getTags().
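The failure mode reduces to a first-element lookup. A toy model (the path format and tag handling are simplified assumptions; the real code does more):

```python
def fixture_dir(scenario_tags: list) -> str:
    """Arr::first-style lookup: the FIRST scenario tag picks the key
    used for the support_files fixture path."""
    key = scenario_tags[0].lstrip("@")
    return f"support_files/{key}/"

fixture_dir(["@USE_SAME_CONFIG", "@TEST_BT-5700"])  # resolves to a non-existent dir
fixture_dir(["@TEST_BT-5700", "@USE_SAME_CONFIG"])  # resolves from the BT tag
```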
41. Config Must Have Resolution Codes for Well Resolution Tests
If your scenario resolves a well (e.g., When Apply resolution to well "A1" with "..."), the config XLSX must have resolution codes configured for the error on that well. Without them, the API returns "Resolution is not allowed for the selected well" and the scenario fails.
Two categories of resolution codes exist:
| Well State | Required Config | Where in XLSX |
|---|---|---|
| Well has a specific error code (e.g., SYSTEMIC_INHIBITON_DETECTED) | Resolution codes linked to that error code | Error Resolutions sheet -- rows where the error code column matches |
| Patient-level well (no error code, or general resolution) | Patient-level resolution codes (non-error-linked) | Error Resolutions sheet -- rows with no error code association |
Symptom: Scenario fails at the resolution step with Resolution is not allowed for the selected well. The well processes correctly, the error fires correctly, but the resolution step is rejected because the config has zero resolution codes for that error.
Fix: Open the config XLSX Error Resolutions sheet and add resolution code rows for the relevant error code. Or use a config that already has them (most v3/v30/v31 configs have resolution codes for common error codes).
Real example: 5 SYSINH scenarios failed because the config had 0 resolution codes for SYSTEMIC_INHIBITON_DETECTED and 0 patient-level resolution codes. The rule fired correctly, the error was set correctly, but the resolution step was rejected. Adding resolution codes to the Error Resolutions sheet fixed all 5.
Pre-flight check: Before writing any resolution scenario, open the config's Error Resolutions sheet and verify rows exist for the error code your scenario will resolve (this check is also listed in the "Before writing any test" verification table).
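The pre-flight check amounts to scanning the sheet for matching rows. The row shape below is an assumed simplification of the Error Resolutions sheet, not its actual column layout:

```python
def resolution_codes_for(rows, error_code):
    """Rows linked to the error code, plus patient-level rows (no error
    association). A scenario resolving this error needs at least one."""
    return [r for r in rows
            if r.get("error_code") == error_code or r.get("error_code") is None]

rows = [
    {"resolution_code": "RES-01", "error_code": "SYSTEMIC_INHIBITON_DETECTED"},
    {"resolution_code": "RES-02", "error_code": None},  # patient-level
    {"resolution_code": "RES-03", "error_code": "CT_BOUNDARY_HIT"},
]
matches = resolution_codes_for(rows, "SYSTEMIC_INHIBITON_DETECTED")
# An empty result here predicts "Resolution is not allowed for the selected well".
```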
Test-First Policy
Tests document expected behavior per SRS/STD specifications:
- Tests are NOT skipped due to code implementation gaps
- Failing tests indicate code bugs, not test errors
- Tag with @KNOWN_CODE_ISSUE when code does not implement expected behavior
- Log code issues in docusaurus/docs/traceability/known-issues.md
- See Known Issues Registry for resolution process
Known Code Issues Reference
Scenarios tagged @KNOWN_CODE_ISSUE fail due to application bugs, not test errors:
| Issue | Rule | Description |
|---|---|---|
| ISSUE-005 | NOTTSQM | CF=0 treated as no multiplier instead of producing 0 quantity |
| ISSUE-006 | NOTTSQM | Negative concentration factor causes MySQL unsigned DECIMAL overflow |
| ISSUE-009 | INHCT | INH rule does not fire IC_FAILED on inhibited controls |
| ISSUE-010 | MIXMISS | Cross-run reanalysis does not resolve mixes missing (2 scenarios) |
| ISSUE-013 | PICQUANT | No null IC observation/CT guard -- false inhibition instead of skip |
Additional @KNOWN_CODE_ISSUE scenarios exist in other feature files (BT-9513 INHCT, BT-9516 INHSERUMQUANT, BT-9517 WG, BT-9547 INHQUANT). See the Known Issues Registry for the full list and resolution status.
Code naming quirk: ArrayItemsSeeker::penultimateIndex() returns the LAST index (count - 1), not the second-to-last. This affects sigmoid evaluation in SigmoidIdentifier -- the "middle vs penultimate" comparison is actually "middle vs last." BT-9402's original @KNOWN_CODE_ISSUE was based on misunderstanding this method name (removed after investigation confirmed the code behavior is correct, just misnamed).
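A minimal Python rendering of that naming quirk, useful when reasoning about sigmoid assertions:

```python
def penultimate_index(items):
    """Mirrors the described ArrayItemsSeeker::penultimateIndex() behavior:
    despite the name, it returns the LAST index (count - 1)."""
    return len(items) - 1

penultimate_index([10, 20, 30])  # -> 2, the last index, not 1
```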
Subagent Test Creation Patterns
Overview
Subagents can autonomously create gap tests when given sufficient context. Each subagent handles ONE rule's gaps, creates fixtures + feature file, runs tests, and iterates.
Proven: NOTTSQM (6 scenarios, 6/6 pass) and POSSIGMOID (5 scenarios, 5/5 pass) were created entirely by subagents.
Parallel Behat Execution
Subagents run Behat tests in parallel using the database pool. Each subagent checks out a DB before running tests. See How Test Execution Works > Parallel Execution for the canonical checkout/run/checkin commands.
Pool capacity: 10 databases, so up to 10 parallel test runs.
Subagent Prompt Template
## Task: Create {RULE} Gap Tests ({N} new test vectors)
You are creating Behat tests for the {RULE} rule ({description}).
### READ FIRST
Read these files before starting:
1. `docusaurus/docs/guides/llm/guide-llm-behat-creation.md`
2. Existing fixture: `tests/support_files/BT-{EXISTING}/{template}.json`
3. Existing scenario in: `tests/exports/cucumber/{file}.feature` (lines X-Y)
### What to Create
- Support directory: `tests/support_files/BT-{NEW}/`
- Config: copy from {source}, verify rule mappings
- {N} JSON fixtures: clone from {template}, change {fields}
- Feature file: `tests/exports/cucumber/{priority}_BT-{NEW}.feature`
### TV Details
| Fixture | Key Change | TV ID | Expected Outcome |
|---------|-----------|-------|-----------------|
| ... | ... | ... | ... |
### Iterative Strategy
1. Create all fixtures
2. Dry-run to verify parsing
3. Run with MINIMAL assertions first (CT, CLS only)
4. Check actual output for quantities/outcomes
5. Update assertions to match actual values
6. Re-run to confirm all pass
### Running Tests (Parallel-Safe with DB Pool)
See "How Test Execution Works > Parallel Execution" in guide-llm-behat-creation.md
for the canonical DB checkout/run/checkin commands. Use paths:
- DB pool JSON: `/shared/code/req_docs/tests/scripts/db-pool.json`
- DB pool lock: `/shared/code/req_docs/tests/scripts/db-pool.lock`
- Code dir: `/shared/code/req_docs/code`
### After Tests Pass - Copy to new_tests/
cp tests/exports/cucumber/{file}.feature new_tests/exports/cucumber/
cp -r tests/support_files/BT-{NEW}/ new_tests/support_files/
### Success Criteria
- All scenarios parse (dry-run)
- All scenarios pass with correct assertions
- Files copied to new_tests/ for git tracking
- Report final results with actual values
Proven Patterns
Pattern A: Value Variation (NOTTSQM-style)
For rules that transform a single input value:
- Clone ONE working fixture
- Change only the input field (e.g., concentration_factor)
- One scenario per value, minimal assertions
- Best for: NOTTSQM, CTCUTOFF boundary tests, threshold tests
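Pattern A's fixture cloning can be sketched as below. The field names are illustrative, not the exact fixture schema:

```python
import copy

def clone_fixture(base: dict, run_name: str, value) -> dict:
    """Clone a working fixture, changing ONLY run_name and the input
    under test; everything else stays identical to the baseline."""
    fx = copy.deepcopy(base)
    fx["run_name"] = run_name
    fx["concentration_factor"] = value  # illustrative input field
    return fx

base = {"run_name": "baseline", "concentration_factor": 1.0, "wells": []}
variants = [clone_fixture(base, f"cf_{v}", v) for v in (0.5, 2.0, 10.0)]
```

Deep-copying matters: a shallow copy would share nested structures (wells, observations) across variants, so a later edit to one fixture would silently corrupt the others.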
Pattern B: Skip Condition (POSSIGMOID-style)
For rules that should NOT fire when conditions are met:
- Use existing fixture that TRIGGERS the error (baseline)
- Each scenario: trigger error -> apply skip condition -> re-analyse -> verify error gone
- Step syntax for resolution + manual classification:
When Apply resolution to well "A1" with "Set individual curve results" and "Manual classification" to observation "{TARGET}" with "{CLS}"
And Re analyse the run file
Then well "A1" should have "{new_outcome}" outcome
- Check Error Resolutions sheet in config for the RULES SKIP ON RE-ANALYSIS column
- Best for: POSSIGMOID, NEGSIGMOID, DSIGMOID, POSSIGMOIDCTRL, NEGSIGMOIDCTRL
Pattern C: Control Well Role (DELTACT-style)
For rules that only apply to specific roles:
- Check Rules Mapping sheet for which roles the rule applies to
- Use appropriate well label (PC*, MPXNC*, NEG*, etc.)
- Be aware of COMBINED_OUTCOME_CONTROL at precedence ~10
- Best for: DELTA_CT, control-specific rules
Config Modification Checklist
When copying configs for new tests:
| Sheet | What to Check | Common Fix |
|---|---|---|
| Error Codes | Missing error codes | Add the error code row your rule produces |
| Error Resolutions | RULES SKIP ON RE-ANALYSIS column | Fix to match the rule's programmatic name |
| Error Resolutions | Resolution codes for error codes under test | Add rows if scenario resolves wells (Gotcha #41) |
| Rules Mapping | Rule mapped to intended roles | Add role mappings if missing |
| Rules | LINEAR_REGRESSION_VALIDATION precedence | Set to 8 (must be < STDQT's 38) |
| Combined Outcomes | Outcome strings | Note exact strings for assertions |
| Control Labels | Well label -> role mapping | Verify fixture labels map correctly |
Subagent Checklist
Before running tests, the subagent should verify:
- Support directory created in tests/support_files/BT-XXXX/ (for Behat)
- Config copied and relevant sheets checked
- JSON fixtures: run_name updated, key fields changed, well_number lowercase
- Feature file: tabs for indentation, @REQ_BT-XXXX on Feature line 1 with @USE_SAME_CONFIG, @TEST_BT-XXXX @TV-* on each Scenario
- Dry-run passes (no parsing errors)
- Full run with minimal assertions passes
- Assertions updated based on actual system output
- Final run: all scenarios green
- Copy to new_tests/ for git tracking (tests/ is gitignored subtree)
Completed Gap Tests (Reference)
| Rule | Feature File | Scenarios | Status | BT Key |
|---|---|---|---|---|
| QUANTVAL | 14_BT-9001.feature | 9 | 9/9 pass | BT-9001 |
| DELTACT | 15_BT-9101.feature | 7 | 7/7 pass | BT-9101 |
| NOTTSQM | 16_BT-9201.feature | 6 | 5/6 pass (1 @KNOWN_CODE_ISSUE: negative CF DB overflow) | BT-9201 |
| POSSIGMOID | 17_BT-9301.feature | 5 | 5/5 pass | BT-9301 |
| NEGSIGMOID | 18_BT-9401.feature | 5 | 5/5 pass | BT-9401 |
| DSIGMOID | 19_BT-9402.feature | 7 | 6/7 pass (1 @KNOWN_CODE_ISSUE: TV-BND-001) | BT-9402 |
| CTCUTOFF | 20_BT-9403.feature | 15 | 15/15 pass | BT-9403 |
| MINCTRL | 21_BT-9404.feature | 8 | 8/8 pass (ISSUE-007 resolved: quest-v3 uses MINEXTRACT) | BT-9404 |
| ICQUALSERUM | 22_BT-9405.feature | 5 | 5/5 pass | BT-9405 |
| INHQUAL | 23_BT-9501.feature | 7 | 7/7 pass | BT-9501 |
| RQUAL | 24_BT-9502.feature | 3 | 3/3 pass | BT-9502 |
| INCONCLUSIVE | 25_BT-9503.feature | 8 | 8/8 pass | BT-9503 |
| CONTROLFAIL | 26_BT-9504.feature | 9 | 9/9 pass | BT-9504 |
| COMBOUTCTRL | 27_BT-9505.feature | 8 | 8/8 pass | BT-9505 |
| EXTCTRL | 28_BT-9506.feature | 5 | 5/5 pass | BT-9506 |
| MINCTRL resolution | 29_BT-9507.feature | 4 | 4/4 pass | BT-9507 |
| STDQT | 30_BT-9508.feature | 6 | 6/6 pass | BT-9508 |
| WG Westgard | 31_BT-9509.feature | 8 | 8/8 pass | BT-9509 |
| COMBOUT | 32_BT-9510.feature | 14 | 14/14 pass | BT-9510 |
| THRESH | 33_BT-9511.feature | 8 | 8/8 pass | BT-9511 |
| STDCURVE | 34_BT-9512.feature | 6 | 6/6 pass | BT-9512 |
| INHCT | 35_BT-9513.feature | 10 | 6/10 pass (4 @KNOWN_CODE_ISSUE ISSUE-009) | BT-9513 |
| SYSINH | 36_BT-9514.feature | 9 | 9/9 pass | BT-9514 |
| MIXMISS | 37_BT-9515.feature | 12 | 10/12 pass (2 @KNOWN_CODE_ISSUE ISSUE-010) | BT-9515 |
| INHSERUMQUANT | 38_BT-9516.feature | 11 | 10/11 pass (1 @KNOWN_CODE_ISSUE) | BT-9516 |
| WG remaining | 39_BT-9517.feature | 21 | 20/21 pass (1 @KNOWN_CODE_ISSUE) | BT-9517 |
| RQUANT | 40_BT-9518.feature | 14 | 14/14 pass | BT-9518 |
| AMB | 41_BT-9519.feature | 15 | 15/15 pass | BT-9519 |
| QUANTVAL remaining | 42_BT-9520.feature | 11 | 11/11 pass (TV-001-006 @KNOWN_LIMITATION) | BT-9520 |
| RQUANTQUAL | 43_BT-9521.feature | 8 | 8/8 pass | BT-9521 |
| MNGQTY | 44_BT-9522.feature | 7 | 7/7 pass | BT-9522 |
| WDCTC | 45_BT-9523.feature | 8 | 8/8 pass | BT-9523 |
| LINREG | 46_BT-9524.feature | 7 | 7/7 pass | BT-9524 |
| QSSC | 47_BT-9525.feature | 2 | 2/2 pass (supplemental) | BT-9525 |
| WDCLSCINVSIG | 48_BT-9526.feature | 7 | 7/7 pass | BT-9526 |
| INHQUAL remaining | 49_BT-9527.feature | 9 | 9/9 pass | BT-9527 |
| STDQT remaining | 50_BT-9528.feature | 6 | 6/6 pass | BT-9528 |
| SWCOMBOUT | 51_BT-9529.feature | 7 | 7/7 pass | BT-9529 |
| INDETCTS | 52_BT-9530.feature | 7 | 7/7 pass | BT-9530 |
| WCAF | 54_BT-9532.feature | 7 | 7/7 pass | BT-9532 |
| REPEATSAMP | 55_BT-9533.feature | 5 | 5/5 pass | BT-9533 |
| NEGSIGMOIDCTRL | 56_BT-9534.feature | 5 | 5/5 pass | BT-9534 |
| DSIGMOIDCTRL | 57_BT-9535.feature | 5 | 5/5 pass | BT-9535 |
| QSSC+RQUAL | 58_BT-9536.feature | 10 | 10/10 pass | BT-9536 |
| UNEXPFL | 59_BT-9537.feature | 5 | 5/5 pass | BT-9537 |
| CC | 60_BT-9538.feature | 5 | 5/5 pass | BT-9538 |
| POSSIGMOIDCTRL+ADJZIKA | 61_BT-9539.feature | 8 | 8/8 pass | BT-9539 |
| MINFL | 62_BT-9540.feature | 6 | 6/6 pass | BT-9540 |
| RWAC | 63_BT-9541.feature | 4 | 4/4 pass | BT-9541 |
| WDCT+WDCLSC+WDCLS | 64_BT-9542.feature | 10 | 10/10 pass | BT-9542 |
| CTCUTOFF remaining | 65_BT-9543.feature | 4 | 4/4 pass | BT-9543 |
| QTYWEIGHT | 66_BT-9544.feature | 4 | 4/4 pass | BT-9544 |
| WDCLSINVSIG | 67_BT-9545.feature | 3 | 3/3 pass | BT-9545 |
| MINCTRL edge | 68_BT-9546.feature | 3 | 3/3 pass | BT-9546 |
| INHQUANT skip | 69_BT-9547.feature | 2 | 2/2 pass (both @KNOWN_CODE_ISSUE ISSUE-013) | BT-9547 |
| ICCT | 70_BT-9548.feature | 3 | 3/3 pass (GAP-003 @KNOWN_LIMITATION) | BT-9548 |
| MWCOMBOUT | 71_BT-9549.feature | 2 | 2/2 pass (GAP-002 @KNOWN_LIMITATION) | BT-9549 |
| NEC+NECINH | 72_BT-9550.feature | 3 | 3/3 pass | BT-9550 |
| RRES | 73_BT-9551.feature | 2 | 2/2 pass | BT-9551 |
Next BT key to use: BT-9552