Guide: Creating and Editing Behat Tests
Created: 2026-02-03
Last Updated: 2026-02-12
Status: Living document - update as new patterns discovered
Note: Consolidated from guide-behat-authoring.md + GUIDE_BEHAT_TEST_CREATION.md + dev-testing-guide.md execution sections
For step definitions and parameter reference, see dev-testing-guide.md.
Quick Start Checklist
Minimum steps to create a working test:
- Create support directory: `tests/support_files/BT-XXXX/`
- Copy a working config XLSX into it (prefer v3/v30/v31 configs)
- Copy and modify a JSON run fixture
- Create feature file in `tests/exports/cucumber/v3/`
- Dry-run: `cd /shared/code/req_docs/code && ./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature" --dry-run`
- Full run (see How Test Execution Works)
- Verify assertions match actual system output
Before writing any test, verify:
| Check | Where to look | Why |
|---|---|---|
| Rule applies to your well's role | Config -> Rules Mapping sheet | Rules only fire on mapped (target, role) pairs |
| Expected error code exists | Config -> Error Codes sheet | Missing codes cause unhandled exceptions |
| Error's PREVENTS ANALYSIS flag | Config -> Error Codes sheet | Determines whether later rules can run |
| Outcome string | Config -> Combined Outcomes sheet | Well-level outcome comes from here, not the rule title |
| Well label -> role mapping | Config -> Control Labels sheet | Label determines role (PC*, NEG*, NTC, etc.) |
| Config can exercise the TV's conditions | STD decision table -> required inputs vs config capabilities | Some TVs are impossible with certain configs (see Gotcha #37) |
| Assertion distinguishes rule-ran from default | STD -> expected output vs default (no-error) output | If the expected outcome is the same as the default, the test can't prove the rule ran (see Gotcha #38) |
| Resolution codes exist for error | Config -> Error Resolutions sheet | Resolution steps fail with "Resolution is not allowed" if no codes configured (see Gotcha #41) |
How Test Execution Works
File Locations
/shared/code/req_docs/tests/
exports/cucumber/
v3/ # 55 consolidated feature files (BT-9xxx, verified passing)
legacy/ # 10 legacy files providing unique TV coverage
archive/ # Superseded/debug files (not run)
support_files/BT-XXXX/ # Fixtures per test (JSON runs, XLSX configs)
catalogue/ # Feature catalogue, cross-references
/shared/code/req_docs/new_tests/
exports/cucumber/ # NEW test files go here (tracked by main repo)
support_files/BT-XXXX/ # NEW fixtures go here (tracked by main repo)
/shared/code/req_docs/code/
features/bootstrap/ # Step definitions (FeatureContext.php, BaseFeatureContext.php)
behat.yml # Behat config (paths, suites)
tests/ vs new_tests/
The tests/ directory is a git subtree and is listed in .gitignore. New files created there will NOT be committed to the main repo.
For git tracking: Always copy/create new test files in new_tests/ (same structure as tests/).
For Behat execution: Files must be in tests/ because Behat looks up fixtures via tests/support_files/BT-XXXX/. So:
- Create files in both `tests/` (for execution) and `new_tests/` (for git)
- OR create in `tests/` first, then copy to `new_tests/` after verification
- The `new_tests/` copy is the authoritative version for git
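The dual-copy workflow above can be sketched as a small helper. This is a sketch only -- `mirror_to_new_tests` is not an existing repo script; the default paths come from this guide:

```python
import pathlib
import shutil

def mirror_to_new_tests(bt_key, feature_rel,
                        tests="/shared/code/req_docs/tests",
                        new_tests="/shared/code/req_docs/new_tests"):
    """Copy verified fixtures and the feature file from tests/ into new_tests/."""
    tests, new_tests = pathlib.Path(tests), pathlib.Path(new_tests)
    # Fixtures: whole support_files/BT-XXXX/ directory
    shutil.copytree(tests / "support_files" / bt_key,
                    new_tests / "support_files" / bt_key,
                    dirs_exist_ok=True)
    # Feature file: same relative path under new_tests/
    dst = new_tests / feature_rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(tests / feature_rel, dst)
```

Run it only after the test passes in `tests/`, so the git-tracked copy is always the verified version.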
How Fixture Lookup Works
The @TEST_BT-XXXX tag on a scenario determines the support files directory:
- Behat reads the `@TEST_BT-5177` tag from the scenario
- Strips the `TEST_` prefix -> `BT-5177`
- All file references resolve to `tests/support_files/BT-5177/`:
  - `Given The configuration "foo.xlsx" is loaded` -> loads `support_files/BT-5177/foo.xlsx`
  - `When Upload the run file "bar.json"` -> loads `support_files/BT-5177/bar.json`
Consequence: All scenarios sharing a @TEST_BT-XXXX tag share the same fixture directory. Feature file location (v3/, legacy/, any subdirectory) does not affect lookup.
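The lookup behaves roughly like this sketch (illustration only, not the actual `BaseFeatureContext` code):

```python
def fixture_dir(scenario_tags, base="tests/support_files"):
    """Resolve the fixture directory from the FIRST scenario tag."""
    key = scenario_tags[0].removeprefix("TEST_")  # "TEST_BT-5177" -> "BT-5177"
    return f"{base}/{key}/"

print(fixture_dir(["TEST_BT-5177", "TV-RULE-001"]))
# tests/support_files/BT-5177/
```

Note the first-tag dependence: if some other tag comes first, the resolved directory is wrong (this is why scenario tag order matters -- see the `@USE_SAME_CONFIG` rule under Essential Rules).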
Running Tests
cd /shared/code/req_docs/code
# Single test by feature file
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=pcrai_test_01 DB_AUDIT_DATABASE=pcrai_test_01 \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature"
# Single test by tag
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=pcrai_test_01 DB_AUDIT_DATABASE=pcrai_test_01 \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat --tags=@TEST_BT-5035
# Dry-run (parse check only, no DB needed)
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature" --dry-run
IMPORTANT environment variables:
- `RATE_LIMIT_MAX_ATTEMPTS=9999` -- always required. The app has a custom rate limiter (5 req/60s per URL+IP). Without it, tests with >2 scenarios fail with "too many request" errors.
- `DB_HOST=127.0.0.1` -- always required. The `.env` file has `DB_HOST=mysql` (a Docker hostname that does not resolve outside Docker).
- `DB_AUDIT_HOST=127.0.0.1` -- always required. Same Docker hostname issue. Without it, audit writes fail silently.
- `DB_AUDIT_DATABASE=$DB` -- always required when using the DB pool. Must match `DB_DATABASE`.
NEVER use: DB_HOST=mysql, MYSQL_ATTR_SSL_VERIFY_SERVER_CERT, /workspace/ paths.
Parallel Execution (DB Pool)
Behat's @BeforeScenario hook drops and recreates all tables, so two instances sharing the same database corrupt each other. A database pool of 10 MySQL databases (pcrai_test_01 through pcrai_test_10) enables safe parallel execution.
Setup: /shared/code/req_docs/tests/scripts/setup-test-dbs.sh (idempotent, creates all 10 DBs with grants for sail@localhost)
How it works:
- `CreatesApplication.php` respects the `DB_DATABASE` env var (falls back to `pcrai_test` if unset)
- Agents check out a DB from `/shared/code/req_docs/tests/scripts/db-pool.json` using `flock` + `jq` for atomicity
- Each Behat run uses its own DB -- no conflicts
Checkout -> Run -> Checkin:
# Checkout first available DB (select + mark in_use under ONE lock, so two
# agents cannot grab the same database)
DB=$(flock /shared/code/req_docs/tests/scripts/db-pool.lock bash -c '
  POOL=/shared/code/req_docs/tests/scripts/db-pool.json
  db=$(jq -r "[.databases[] | select(.status==\"available\")][0].name" "$POOL")
  jq --arg db "$db" --arg label "BT-XXXX" --arg ts "$(date -Iseconds)" \
     "(.databases[] | select(.name==\$db)) |= (.status=\"in_use\" | .locked_by=\$label | .locked_at=\$ts)" \
     "$POOL" > "$POOL.tmp" && mv "$POOL.tmp" "$POOL"
  echo "$db"
')
# Run test
cd /shared/code/req_docs/code
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=$DB DB_AUDIT_DATABASE=$DB \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat "../tests/exports/cucumber/XX_BT-XXXX.feature"
# Checkin (always -- even on failure); update under the same lock
DB="$DB" flock /shared/code/req_docs/tests/scripts/db-pool.lock bash -c '
  POOL=/shared/code/req_docs/tests/scripts/db-pool.json
  jq --arg db "$DB" \
     "(.databases[] | select(.name==\$db)) |= (.status=\"available\" | .locked_by=null | .locked_at=null)" \
     "$POOL" > "$POOL.tmp" && mv "$POOL.tmp" "$POOL"
'
Pool capacity: 10 databases, so up to 10 parallel test runs. If all are in use, wait and retry.
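The wait-and-retry step can be scripted. This sketch (`wait_for_db` is a hypothetical helper, stdlib only) only waits for a free slot to appear -- the actual reservation must still go through the `flock`ed checkout shown above:

```python
import json
import time

POOL = "/shared/code/req_docs/tests/scripts/db-pool.json"  # path from this guide

def wait_for_db(pool_path=POOL, interval=30, attempts=20):
    """Poll db-pool.json until some database is available, then return its name.
    This does NOT reserve the database -- pair it with the flock'ed checkout."""
    for _ in range(attempts):
        with open(pool_path) as fh:
            pool = json.load(fh)
        free = [d["name"] for d in pool["databases"] if d["status"] == "available"]
        if free:
            return free[0]
        time.sleep(interval)
    raise TimeoutError("no pool database became available")
```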
Without pool (backward compatible): DB_DATABASE=pcrai_test still works -- the single-DB fallback is unchanged.
Fresh DBs need pre-migration before first Behat run. migrate:fresh on an empty DB (no migrations table) skips db:wipe. Fix:
mysql -h 127.0.0.1 -u sail -ppassword -e "DROP DATABASE IF EXISTS pcrai_test_XX; CREATE DATABASE pcrai_test_XX;"
cd /shared/code/req_docs/code
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 DB_DATABASE=pcrai_test_XX DB_AUDIT_DATABASE=pcrai_test_XX \
DB_USERNAME=sail DB_PASSWORD=password APP_ENV=testing \
php ../tests/scripts/pre-migrate.php
Or use the helper script: /shared/code/req_docs/tests/scripts/run-legacy-test.sh pcrai_test_XX feature-file.feature
Essential Rules
@USE_SAME_CONFIG -- MUST Be Feature-Level Tag
CRITICAL: @USE_SAME_CONFIG must go on the FEATURE tag line (line 1), NOT on individual scenarios.
The code in BaseFeatureContext.php:122 only checks getFeature()->getTags() (feature-level tags). Scenario-level @USE_SAME_CONFIG tags are silently ignored -- the config reloads every scenario, causing:
- ~2-3 min wasted per extra scenario (full `migrate:fresh` + config import each time)
- PHP timeout on heavy configs (cumulative `max_execution_time` exhaustion)
- A 5-scenario test takes 14+ minutes instead of ~4 minutes
| With `@USE_SAME_CONFIG` at feature level | Without |
|---|---|
| First scenario: ~2-3 min (full migrate + import) | Every scenario: ~2-3 min each |
| Subsequent scenarios: ~10-15 seconds each | 5-scenario file: ~14 minutes |
| 5-scenario file: ~4 minutes | PHP timeout risk on heavy configs |
Correct pattern:
@REQ_BT-XXXX @USE_SAME_CONFIG
Feature: My tests (all use same config)
@TEST_BT-XXXX @TV-RULE-001
Scenario: First test
Given The configuration "my-config.xlsx" is loaded # <-- LOADED
...
@TEST_BT-XXXX @TV-RULE-002
Scenario: Second test
Given The configuration "my-config.xlsx" is loaded # <-- SKIPPED (reused)
...
Only omit @USE_SAME_CONFIG when scenarios genuinely need different configs (e.g., BT-9509 Westgard tests with per-scenario config files).
NEVER put `@USE_SAME_CONFIG` as the first scenario-level tag. `BaseFeatureContext::beforeScenario()` uses `Arr::first($event->getScenario()->getTags())` to determine the BT key for fixture lookup. If `@USE_SAME_CONFIG` is the first scenario tag, the fixture directory resolves to `support_files/USE_SAME_CONFIG/` instead of `support_files/BT-XXXX/`, causing "file not found" errors on every scenario. The file-level tag on line 1 is sufficient -- do not duplicate it on scenarios. If you must add it to a scenario for documentation purposes, ensure `@TEST_BT-XXXX` comes first:
@TEST_BT-XXXX @USE_SAME_CONFIG @TV-RULE-001 <- OK (TEST_BT first)
@USE_SAME_CONFIG @TEST_BT-XXXX @TV-RULE-001 <- BROKEN (fixture lookup fails)
Assert Control Wells BEFORE Patient Wells
When a patient well fails (e.g., CONTROL_MISSING), Behat stops at that assertion and skips remaining steps. If control well assertions come after the patient assertion, you never see whether controls passed or failed -- losing critical diagnostic information.
Always order assertions: controls first, then patients.
Then well "C11" should have "Control Passed" outcome
And well "C13" should have "Control Passed" outcome
And well "B1" should have "Detected" outcome
PhpSpreadsheet for Config Edits (with openpyxl Exceptions)
PhpSpreadsheet (PHP) is REQUIRED for config edits that modify Rules Mapping, Error Codes, Error Resolutions, or sheet structure. openpyxl delete_rows() corrupts Rules Mapping sheets (shifts cells, leaves nulls in ROLE column -> PHP crashes). openpyxl also uses inline strings instead of shared strings, which can cause subtle format issues.
openpyxl (Python) is ACCEPTABLE for:
- (a) Read-only inspection of any sheet
- (b) Modifying only cell values in QIR, Curve Control Limits, Delta CT sheets (NOT Rules Mapping)
openpyxl delete_rows() is NEVER safe on any sheet.
A reusable Westgard removal script exists:
cd /shared/code/req_docs/code
php scripts/remove-westgard-rules.php "../tests/support_files/BT-XXXX/config.xlsx"
For other config edits, write similar PHP CLI scripts using PhpSpreadsheet. The composer autoloader is at /shared/code/req_docs/code/vendor/autoload.php.
Config Preference
| Preference | Configs | Notes |
|---|---|---|
| Preferred (v3) | quest-v3.xlsx, v30/v31 variants | 0 Westgard rows, extraction-aware |
| Acceptable (v30) | Viracor v30 pp based.xlsx | V30, generally clean |
| Avoid (v2) | Viracor 2.25.0.xlsx, Quest_PP_2_22.xlsx | Westgard failures, wrong rule ordering |
V2 config issues: Missing WESTGARDS_MISSED error code, wrong LINEAR_REGRESSION_VALIDATION precedence, CF=0 treated as no multiplier, negative quantities cause MySQL overflow.
Strategy for v2 configs (when unavoidable):
- Start with minimal assertions (CT, CLS only)
- Verify standard curve pipeline before adding quantity assertions
- Check LINEAR_REGRESSION_VALIDATION precedence (must be < 38)
- Use PhpSpreadsheet for all edits
- Remove Westgard mappings unless testing Westgard behavior
Rate Limiting
Always include RATE_LIMIT_MAX_ATTEMPTS=9999. The app has a custom rate limiter (5 req/60s per URL+IP). With @USE_SAME_CONFIG, scenarios run fast enough to hit this limit. Without it, tests with >2 scenarios fail with "too many request" errors.
Creating a New Test
Step 1: Create Support Directories
# For Behat execution (required):
mkdir /shared/code/req_docs/tests/support_files/BT-XXXX/
# For git tracking (required):
mkdir -p /shared/code/req_docs/new_tests/support_files/BT-XXXX/
Name must match the @TEST_BT-XXXX tag you will use.
Step 2: Create/Copy Config XLSX
Preferred approach: Copy from a working test with similar config needs.
# Find configs for a specific rule
find /shared/code/req_docs/tests/support_files/ -name "*.xlsx" | head -20
# Copy a known-working config
cp /shared/code/req_docs/tests/support_files/BT-5134/config.xlsx \
/shared/code/req_docs/tests/support_files/BT-XXXX/
Before creating fixtures, check the config:
| Sheet | What to check |
|---|---|
| Rules Mapping | Rule applies to your intended well role |
| Control Labels | Well labels map to intended roles |
| Error Codes | Expected error code exists; check PREVENTS ANALYSIS flag |
| Combined Outcomes | Exact outcome strings for your assertions |
| Error Resolutions | RULES SKIP ON RE-ANALYSIS matches rule name (for resolution tests) |
| QIR - Quantification settings | Slope/efficiency/R2 thresholds (clear if not testing these) |
Step 3: Create JSON Run Files
Always start from an existing working fixture:
ls /shared/code/req_docs/tests/support_files/BT-5001/*.json
Key JSON structure:
{
"run_info": { "run_name": "MY_TEST.json", "thermocycler_id": "275000953" },
"targets": { "t1": { "mix_name": "NOR2", "target_name": "NOR2", "auto_baseline": true } },
"wells": { "w1": { "well_number": "a1", "label": "|T:NOR2|R:S2|", "well_uuid": "w1" } },
"observations": { "o1": { "target": "NOR2", "ct": 30.0, "dxai_cls": "Pos", "well_uuid": "w1", "readings": [...] } }
}
Critical fields:
- `well_number`: Always lowercase in JSON (`a1`), uppercase in Gherkin (`A1`)
- `label`: Must match the config's expected format (pipe-delimited, plain text, etc.)
- `well_uuid`: Every `well_uuid` in `observations` must have a matching entry in `wells` -- orphaned observations are silently ignored
- `readings`: Array of fluorescence values -- copy from a working fixture, do not fabricate
- `ct`: The CT value under test
- `dxai_cls`: Classification (`Pos`, `Neg`, `Ambiguous`) -- but see Gotcha #5: this is a hint, not final
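The two silent failure modes above (orphaned observations, wrong-case well numbers) are cheap to pre-check before the first Behat run. `lint_fixture` is a hypothetical helper, not part of the test harness:

```python
def lint_fixture(data):
    """Sanity-check a parsed run fixture dict for common silent pitfalls."""
    known = {w["well_uuid"] for w in data.get("wells", {}).values()}
    problems = []
    # Orphaned observations are silently ignored at import time
    for key, obs in data.get("observations", {}).items():
        if obs["well_uuid"] not in known:
            problems.append(f"{key}: orphaned well_uuid {obs['well_uuid']!r}")
    # JSON well numbers must be lowercase (a1, not A1)
    for key, well in data.get("wells", {}).items():
        if well["well_number"] != well["well_number"].lower():
            problems.append(f"{key}: well_number must be lowercase in JSON")
    return problems

demo = {
    "wells": {"w1": {"well_number": "a1", "well_uuid": "w1"}},
    "observations": {"o1": {"well_uuid": "w1"}, "o2": {"well_uuid": "w9"}},
}
print(lint_fixture(demo))  # ["o2: orphaned well_uuid 'w9'"]
```

Load the real fixture with `json.load(open(path))` and pass the result in; an empty list means both checks passed.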
Step 4: Create Feature File
@REQ_BT-XXXX @USE_SAME_CONFIG
Feature: Description of what's being tested
@TEST_BT-XXXX @TV-RULE-001-001
Scenario: First test -- config loads here
Given The configuration "my-config.xlsx" is loaded
When Upload the run file "test1.json"
And Open the run file "test1.json"
Then well "C11" should have "Control Passed" outcome
And well "A1" should have "Detected" outcome
@TEST_BT-XXXX @TV-RULE-001-002
Scenario: Second test -- config reused automatically
Given The configuration "my-config.xlsx" is loaded
When Upload the run file "test2.json"
And Open the run file "test2.json"
Then well "C11" should have "Control Passed" outcome
And well "A1" should have "Not Detected" outcome
Key patterns:
- `@USE_SAME_CONFIG` on the feature line (line 1), never on scenarios
- Assert control wells before patient wells
- Use tabs for indentation (matching existing files). Mixed tabs/spaces cause silent parse failures
- `Scenario Outline:` requires an `Examples:` table. If there are no placeholders, use `Scenario:` instead
Scenario Outline conversion -- when 3+ scenarios share identical step structure but differ only in data values:
# TV Tags: TV-QUANTVAL-005-001, TV-QUANTVAL-005-002, TV-QUANTVAL-005-003
@TEST_BT-9001 @TV-QUANTVAL-005
Scenario Outline: <description>
Given The configuration "config.xlsx" is loaded
When Upload the run file "<run_file>"
And Open the run file "<run_file>"
Then well "A1" should have "<outcome>" outcome
Examples:
| description | run_file | outcome | # TV Tag |
| CT below threshold | test_low.json | Detected | # TV-QUANTVAL-005-001 |
| CT above threshold | test_hi.json | Not Detected | # TV-QUANTVAL-005-002 |
| CT at exact boundary | test_bnd.json | Detected | # TV-QUANTVAL-005-003 |
- Add a `# TV Tags:` comment above the Outline listing all TV IDs
- The `# TV Tag` comment column in Examples preserves per-row traceability
- Pipe characters (`|`) in description text conflict with Gherkin table syntax -- escape or rephrase
File naming: {priority}_{BT-KEY}.feature in tests/exports/cucumber/v3/
Step 5: Verify Parsing
cd /shared/code/req_docs/code
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature" --dry-run
Step 6: Run Test
cd /shared/code/req_docs/code
RATE_LIMIT_MAX_ATTEMPTS=9999 APP_ENV=testing \
DB_HOST=127.0.0.1 DB_AUDIT_HOST=127.0.0.1 \
DB_DATABASE=pcrai_test_01 DB_AUDIT_DATABASE=pcrai_test_01 \
DB_USERNAME=sail DB_PASSWORD=password \
./vendor/bin/behat "../tests/exports/cucumber/v3/XX_BT-XXXX.feature"
Tagging Convention
| Tag | Level | Purpose | Example |
|---|---|---|---|
| `@REQ_BT-XXXX` | Feature | Links to Jira requirement | `@REQ_BT-5268` |
| `@TEST_BT-XXXX` | Scenario | Test ID + fixture directory lookup | `@TEST_BT-9001` |
| `@TV-RULE-REQ-TV` | Scenario | Test vector traceability | `@TV-QUANTVAL-005-001` |
| `@USE_SAME_CONFIG` | Feature (line 1) | All scenarios share first scenario's config | Quest features |
| `@KNOWN_CODE_ISSUE` | Scenario | Test documents expected behavior but code is incomplete | IC skip tests |
| `@KNOWN_LIMITATION` | Scenario | Test passes but cannot fully cover intended TV due to system constraints | SWCOMBOUT TV-001 |
| `@DUPLICATE_COVERAGE` | Scenario | Functionally identical to another scenario (atomic step limitation) | NEGSIGMOID TV-003-001 |
| `@MISTAGGED` | Scenario | TV tag does not match what scenario actually tests | NEGSIGMOID TV-001-006 |
| `@COMBINED_OUTCOME` | Scenario | Test involves outcomes across multiple runs/mixes | Combined outcome features |
| `@UNIQUE` | Scenario | Test uses unique/isolated test data | Isolated test data |
| `@UNIVERSAL` | Scenario | Test applies universally across configurations | Universal edge cases |
| `@EXAMPLE_TEST` | Scenario | Example/demonstration test (not in core regression) | Demo tests |
Consult the False KL Checklist before applying `@KNOWN_LIMITATION`. ~41% of KL tags in Waves 1-3 were incorrectly applied due to false assumptions about config editability, fixture engineering, and infrastructure capabilities.
Common Gherkin Steps
# Config + run file
Given The configuration "{config}.xlsx" is loaded
When Upload the run file "{file}.json"
And Open the run file "{file}.json"
# Well assertions
Then well "A1" should have "{outcome}" outcome
And well "A1" should have "{mix}" mix
And well "A1" should have "{role}" sample role
And well "A1" should have "true" is crossover
# Observation assertions
And well "A1" observation "{target}" should have "{cls}" final cls
And well "A1" observation "{target}" should have "{ct}" final ct
And well "A1" observation "{target}" should have "{qty}" quantity
# Resolution + re-analysis
When Apply resolution to well "A1" with "{resolution}"
And Re analyse the run file
# Resolution with individual curve result
When Apply resolution to well "A1" with "Set individual curve results" and "Manual classification" to observation "{TARGET}" with "{CLS}"
And Re analyse the run file
For the full step definition reference (22 steps with parameters, exceptions, and regex patterns), see dev-testing-guide.md.
Gotchas (Hard-Won Lessons)
1. Well Numbers -- Uppercase in Gherkin, Lowercase in JSON
- JSON run files use lowercase: `"well_number": "a2"`
- Gherkin steps use uppercase: `well "A2" should have "Detected" outcome`
- The system converts during import. Always use uppercase in feature files.
2. Rules Only Apply to Mapped Roles
Critical: Each rule in the config's Rules Mapping sheet is mapped to specific (target, role) combinations. If your test well's role is not in the mapping, the rule will never fire on that well, even if the data should trigger it.
How to check: Open the config XLSX -> Rules Mapping sheet -> find your rule -> check which roles are listed.
Example: DELTA_CT in v31 config maps to: NTC, PC, COVIDPPC, MPXPC, COVIDNPC, COVIDPNC, MPXNC, NEG. Patient wells are NOT included. So DELTA_CT never evaluates on patient wells.
Well labels determine roles via the Control Labels sheet:
| Label pattern | Role | Type |
|---|---|---|
| `PC*` | PC | Positive control |
| `MPXPC*` | MPXPC | Positive control |
| `MPXNC*` | MPXNC | Negative control |
| `NEG*` | NEG | Negative control |
| `NTC` | NTC | No template control |
| (anything else) | Patient | Patient |
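In sketch form (the real mapping is driven by the config's Control Labels sheet, not hard-coded; this helper and its exact matching rules are illustrative only):

```python
def role_for_label(label):
    """Map a well label to a role, mirroring the table above (sketch)."""
    for prefix, role in (("MPXPC", "MPXPC"), ("MPXNC", "MPXNC"),
                         ("PC", "PC"), ("NEG", "NEG")):
        if label.startswith(prefix):
            return role
    if label == "NTC":          # NTC is an exact match, not a prefix pattern
        return "NTC"
    return "Patient"            # anything unrecognized is a patient well

print(role_for_label("PC1"), role_for_label("SAMPLE-42"))  # PC Patient
```

The consequence for test design: a label typo silently demotes a control to a patient well, and control-only rules stop firing on it.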
3. Config Validation Cascade + Rule Precedence
Multiple rules validate in precedence order. A test targeting one rule can fail because a different rule fires first:
Common precedence chain:
- `COMBINED_OUTCOME_CONTROL` (~order 10) -- control pass/fail
- Negative control rules (BNC, BICQUAL) (~order 16)
- `STANDARD_OUTSIDE_CT_RANGE` -- CT range checks
- `BAD_GRADIENT` / `BAD_EFFICIENCY` / `BAD_R2` -- linear regression
- `DELTA_CT` (~order 46) -- CT threshold / CLS mismatch
- `COMBINED_OUTCOME` (~order 47) -- patient well outcome
- `Sample label has an invalid accession` (FILEIMPORT)
Key interactions:
- Rules with `is_allow_error_wells = false` will not run if an earlier rule already set an error on the well
- `PREVENTS ANALYSIS = NO` on your target rule's error means later rules will still evaluate and may overwrite
- To isolate a rule, remove interfering higher-precedence rule mappings for the test well's role using PhpSpreadsheet
4. Well Outcome vs Observation-Level Errors
The wellShouldHaveOutcome step checks the well-level outcome, not observation-level errors:
- Individual rules (DELTA_CT, QUANTVAL, etc.) set observation-level errors
- `COMBINED_OUTCOME` (for patient wells) or `COMBINED_OUTCOME_CONTROL` (for controls) sets the well-level outcome
- The well outcome string comes from the Combined Outcomes sheet, not the rule title
Exception: Errors with PREVENTS ANALYSIS = YES override the well outcome directly with the error message.
5. dxai_cls in JSON is a Hint, Not Final
The dxai_cls field in run file JSON is the instrument's pre-classification, but the system re-classifies based on its own logic. Setting dxai_cls: "Neg" with ct: 25 and amplification curve data will result in the system classifying as "Pos" (because CT=25 shows amplification).
The curve shape always wins. Even explicitly setting dxai_cls: "Pos" will be overridden to Neg if the readings array has a downward/flat curve shape. The system's setMachineClsCalculatedFromMachineCt() method in Observation.php determines classification from the readings, not from dxai_cls. This means fixture-level dxai_cls cannot be used to isolate classification-dependent rule behavior from the curve analysis path (learned from BT-9544 QTYWEIGHT review).
To create a genuinely Neg observation:
- Set `ct` to `null`, `dxai_ct` to `null`, `dxai_cls` to `"Neg"`
- Replace the `readings` array with flat values (e.g., 40 x 1.0)
To create a genuinely Pos observation:
- Set `ct` to a value below the target's MAX CT threshold, `dxai_ct` to the same value, `dxai_cls` to `"Pos"`
- Use realistic amplification-curve `readings` (copy from a working fixture)
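As a fixture-editing sketch (field names are from the run-file format above; `make_neg` / `make_pos` are hypothetical helpers, not repo scripts):

```python
def make_neg(obs, n_readings=40):
    """Force an observation dict to a genuinely Neg state."""
    obs.update(ct=None, dxai_ct=None, dxai_cls="Neg")
    obs["readings"] = [1.0] * n_readings  # flat curve -- no amplification
    return obs

def make_pos(obs, ct, template_readings):
    """Force a genuinely Pos state; readings must come from a working fixture."""
    obs.update(ct=ct, dxai_ct=ct, dxai_cls="Pos")
    obs["readings"] = list(template_readings)  # realistic amplification curve
    return obs
```

Apply these to entries of the fixture's `observations` dict before saving the JSON; remember the curve shape always wins over `dxai_cls`.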
6. Gradient Requires CT Variation Across Wells
Standard curve tests (QUANTVAL) use S2/S4/S6 wells. If all wells have identical CT values, the gradient is 0 (flat line), which triggers BAD_GRADIENT.
Fix: When creating fixtures with different CT values, apply an offset to the template data rather than setting all wells to the same value. Preserve the relative differences between wells.
# WRONG - all wells get CT=38, gradient=0
for obs in data["observations"].values():
    obs["ct"] = 38.0
# RIGHT - offset from template values, preserving gradient
for obs in data["observations"].values():
    if obs["ct"] is not None:
        obs["ct"] = obs["ct"] + 2.0  # template had 30, 31, 32 -> now 32, 33, 34
7. QIR Settings Interference
The QIR - Quantification settings sheet in config XLSX has MIN SLOPE / MAX SLOPE, MIN EFFICIENCY / MAX EFFICIENCY, MIN R2, and MIN CONTROLS. When testing CT range validation specifically, clear all other QIR settings to prevent interference.
Warning: openpyxl cell value assignment is acceptable for QIR sheet edits (cell values only, no row deletion), but read back and verify after save. See Essential Rules > PhpSpreadsheet for the full openpyxl policy.
8. Each Test Takes ~2 Minutes
Each scenario takes approximately 1.5-3 minutes due to:
- Database refresh (drop all tables, run all migrations)
- Config import (parse XLSX, seed database)
- Run file processing
A 9-scenario feature file takes ~16-18 minutes. With @USE_SAME_CONFIG: first scenario ~3 min, subsequent ~15 sec each. Plan accordingly.
9. Scenario Outline Requires Examples Table
Scenario Outline: MUST have an Examples: table. Without it, the Gherkin parser fails with:
Expected Step, Examples table, or end of Scenario, but got text: "Then"
If the scenario does not use placeholders (<param>), use Scenario: instead of Scenario Outline:.
10. Mixed Indentation Breaks Parsing
Gherkin requires consistent indentation within a file. Mixing tabs and spaces (e.g., \t\t Given) causes silent parsing failures. Use tabs only, matching the pattern of existing files.
11. LINEAR_REGRESSION_VALIDATION Must Run Before STDQT
LINEAR_REGRESSION_VALIDATION computes slope and intercept from standard wells and stores them on RunTarget. STDQT reads those values to compute patient quantities.
If LINEAR_REGRESSION_VALIDATION has a higher run order (precedence) than STDQT, quantities will silently be 0/null. STDQT calls cannotQuantify() which checks for slope/intercept -- if they are null (because regression has not run yet), it exits without computing.
- Correct: LINEAR_REGRESSION_VALIDATION at run order 8, STDQT at 38
- Wrong: LINEAR_REGRESSION_VALIDATION at run order 49 (only seen in openpyxl-generated configs)
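The dependency is visible in the generic standard-curve math (a sketch: this is textbook qPCR back-calculation, not the app's exact STDQT formula):

```python
def quantity_from_ct(ct, slope, intercept):
    """Standard-curve back-calculation: ct = slope * log10(qty) + intercept.
    The None check mirrors STDQT's cannotQuantify() early exit -- if the
    regression has not run yet, there is nothing to invert."""
    if slope is None or intercept is None:
        return None  # -> quantities silently end up 0/null
    return 10 ** ((ct - intercept) / slope)

print(quantity_from_ct(30.0, None, None))           # None (regression not run)
print(round(quantity_from_ct(33.0, -3.32, 36.32)))  # 10
```

So a wrong run order does not crash anything; it just produces empty quantities, which is why the failure is easy to miss.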
12. openpyxl-Generated XLSX Uses Inline Strings
Configs heavily modified by openpyxl use inline strings (t="inlineStr") instead of shared strings format (t="s" with xl/sharedStrings.xml). While PhpSpreadsheet generally handles both, subtle edge cases may arise.
Best practice: Copy an existing working .xlsx and modify only the cells you need, rather than generating a new workbook from scratch.
13. Config Swapping Can Introduce Missing Error Codes
Copying a config from one test to another can fail if the source config references error codes not registered in the application. Symptom: No Error Code Defined for code: XXXXX (Exception) at run upload time.
Fix: Either:
- Go back to the original config and fix only what is needed
- Remove the offending rule mappings from the swapped config
- Add the missing error code to the Error Codes sheet
14. Control Wells Trigger Westgard QC Cascade
V2 configs with Westgard rules mapped to PEC role will fire on control wells labeled R:LO POS (because LO POS maps to PEC via the Control Labels sheet). Without matching Westgard Limits date ranges in the test DB, this triggers WESTGARDS_MISSED.
Cascade: If WESTGARDS_MISSED has PREVENTS ANALYSIS = YES, the control well cannot complete analysis -> "Associate mix and extraction errors" rule propagates the error to patient wells.
DO NOT remove control wells to avoid this -- that triggers MINCONTROLS errors instead.
Partial fix: Add WESTGARDS_MISSED to the Error Codes sheet with PREVENTS ANALYSIS = NO and ERROR TYPE = Warning. Some controls may still fail. Prefer removing Westgard rule mappings entirely (Gotcha #15) unless specifically testing Westgard behavior.
Warning: The openpyxl code below is for Error Codes cell edits only (acceptable per policy). Do NOT use openpyxl delete_rows() on any sheet.
# openpyxl partial fix for v2 configs with Westgard rules (prefer Gotcha #15)
ws = wb["Error Codes"]
new_row = ws.max_row + 1
ws.cell(row=new_row, column=1, value="WESTGARDS_MISSED")
ws.cell(row=new_row, column=2, value="Westgard Limit Missed...")
ws.cell(row=new_row, column=3, value="Warning") # NOT "Error"
ws.cell(row=new_row, column=4, value="Well")
ws.cell(row=new_row, column=6, value="NO") # PREVENTS ANALYSIS = NO
ws.cell(row=new_row, column=7, value="NO")
ws.cell(row=new_row, column=8, value="NO")
15. Remove Westgard Rule Mappings for Test Isolation (Preferred)
V2 configs with Westgard rules cause cascade failures on control wells when no Westgard Limits date ranges exist in the test DB.
PREFERRED approach: Remove Westgard rule mappings from config entirely. Better test isolation since you are testing the target rule, not Westgard. Use remove-westgard-rules.php:
cd /shared/code/req_docs/code
php scripts/remove-westgard-rules.php "../tests/support_files/BT-XXXX/config.xlsx"
Caution: The script only checks column B for "westgard" keyword. Rules with IDs like WG* (e.g., "Check 13S after 12S", "Check 22S after 13S") need separate removal. Also clear the Westgard Limits and Westgard Events sheets entirely.
Only keep Westgard rules when specifically testing Westgard behavior.
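A broader match that also catches the WG*-prefixed rules can be sketched as follows (`is_westgard_rule` is a hypothetical helper, not the repo script; adapt the column reads to your config):

```python
def is_westgard_rule(rule_id, rule_name):
    """Match Westgard rules more broadly than a column-B keyword check:
    catch both the 'westgard' keyword and WG*-prefixed rule IDs."""
    if "westgard" in f"{rule_id} {rule_name}".lower():
        return True
    return str(rule_id).upper().startswith("WG")

print(is_westgard_rule("WG13S", "Check 13S after 12S"))  # True
print(is_westgard_rule("DELTA_CT", "Delta CT check"))    # False
```

Use it read-only (e.g., with openpyxl inspection, which the policy allows) to list candidate rows, then do the actual row removal with PhpSpreadsheet.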
16. DB Contention from Parallel Behat Runs
Running multiple Behat tests simultaneously against the same database causes @BeforeScenario hook failures:
- `Table 'migrations' doesn't exist` (first agent's DROP collides with second agent's migration)
- `Table 'users' already exists` (rebuild collision)
Fix: Use the DB pool. See How Test Execution Works > Parallel Execution.
17. MIN_CONTROLS vs MINEXTRACT: Config Must Match Extraction Settings
Two different programmatic rules handle minimum control validation:
| Rule | Lookup method | Works when |
|---|---|---|
| `MIN_CONTROLS` | Non-extraction (`!hasExtractionSettings()`) | `use_extraction_instruments=false` |
| `MINEXTRACT` | Extraction-aware (mix, date, instrument, batch) | `use_extraction_instruments=true` |
The trap: Configs with use_extraction_instruments=true (Quest EZ configs) auto-assign extraction settings to ALL wells. If such a config uses MIN_CONTROLS, the non-extraction lookup returns empty for every well -> every patient gets CONTROL_MISSING.
The error message tells you which rule ran:
- `CONTROL_MISSING`: "This well is missing the required associated controls..." -- `MIN_CONTROLS` ran
- `EXTRACTION_CONTROLS_MISSING`: "This well is missing the required associated extraction controls..." -- `MINEXTRACT` ran
18. Remove Non-Westgard Interfering Rules for Test Isolation
Other rules can also override your target rule's results. When testing a specific rule in isolation, check if higher-precedence rules fire on the same well role and overwrite the outcome.
General pattern:
- Run the test -- if the wrong error message appears, identify which rule produced it
- Check Rules Mapping sheet -- find all rules mapped to your test well's role
- Identify rules with higher run order than your target rule
- Remove those rule mappings (for the specific role, not globally) using PhpSpreadsheet
- Script: `/shared/code/req_docs/code/scripts/setup-bt9505-config.php` shows the pattern
Key insight: PREVENTS ANALYSIS=NO on your target rule's error code means later rules with IS ALLOW ERROR WELLS=false will NOT skip the well -- they will evaluate and potentially overwrite.
19. Transient "Invalid JSON was returned from the route" Errors
Config loading occasionally fails with "Invalid JSON was returned from the route" -- Laravel returns HTML error page instead of JSON API response. This is a transient infrastructure issue, not a test bug.
Causes: Passport OAuth token expiry, momentary DB connection timeout, memory pressure after multiple scenarios with re-analysis.
Fix: Simply rerun the test. If it persists across 2+ runs, investigate Laravel Passport keys and DB connectivity.
20. Heavy Configs (5000+ Rules Mapping Rows) Cause PHP Timeout
Configs with very large Rules Mapping sheets can exceed PHP's 90-second max_execution_time during well processing.
Mitigations:
- Use
@USE_SAME_CONFIGat feature level (avoids repeated config loads -- this alone fixed BT-9506) - Keep scenario count low (3-4 per feature) for heavy configs
- Consider using a lighter config variant
- Remove unused rule mappings to reduce processing
21. Import Validator Masks Rule-Engine Errors
Invalid data values (SD=0, null, text, negative) are rejected by the config import validator before the rule engine runs. Tests expecting rule-engine error codes will get import-level errors instead. Check whether the import validator catches the condition first.
22. Single-Mix Configs Cannot Test Target Mismatch Independently
Kit configs with only one observable target (e.g., NOR2) cannot test "target mismatch" independent of "role mismatch" -- changing the target to something not in the config produces the same error as changing the role. True target-mismatch tests require a multi-mix config.
23. Specimen Type Comes from Config, Not Fixture
The system determines specimen type through the config's Test Codes sheet (client code -> mix -> specimen type), NOT from the JSON fixture. The fixture only needs the C:XXXX client code in the well label.
Chain: Well label C:1408 -> Test Codes sheet -> ENTF mix -> Fecal specimen -> LOD threshold
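That lookup chain can be sketched with toy tables. The keys and values come from the example above, but the dict shapes are illustrative assumptions, not the real config loader:

```python
import re

# Toy stand-ins for the config's Test Codes sheet and mix metadata.
test_codes = {"1408": "ENTF"}       # client code -> mix
mix_specimen = {"ENTF": "Fecal"}    # mix -> specimen type

def specimen_for_label(well_label: str) -> str:
    """Resolve specimen type from the C:XXXX client code in the well label."""
    client_code = re.search(r"C:(\d+)", well_label).group(1)
    return mix_specimen[test_codes[client_code]]

specimen_for_label("S1 C:1408")  # -> "Fecal"
```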
24. Orphaned Observations on Non-Existent Wells
JSON fixtures can have observations referencing well_uuid values not declared in the wells section. These orphaned observations are silently ignored during import.
Fix: Always verify every well_uuid in observations has a matching entry in wells.
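A quick pre-flight check along these lines can catch orphans before upload (the top-level key names are assumed from the fixture layout described here):

```python
import json

def orphaned_observations(fixture: dict) -> list:
    """Return observations whose well_uuid has no matching entry in wells."""
    declared = {w["well_uuid"] for w in fixture.get("wells", [])}
    return [o for o in fixture.get("observations", [])
            if o["well_uuid"] not in declared]

fixture = json.loads("""{
  "wells":        [{"well_uuid": "u1"}],
  "observations": [{"well_uuid": "u1"}, {"well_uuid": "u2"}]
}""")
orphans = orphaned_observations(fixture)  # u2 has no declared well
```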
25. mix_results Count Comes from Config, Not Wells
For SWCOMBOUT and similar rules, mix_count is determined by the Combined Outcomes sheet in the config, not by the number of patient wells in the fixture.
26. Resolution Step Atomicity -- Cannot Isolate Skip Paths
The Behat step "Apply resolution...with 'Set individual curve results' and 'Manual classification'" atomically sets BOTH resolution code AND manual classification. Tests claiming to isolate one skip path from the other are functionally identical duplicates.
Tag duplicates with @DUPLICATE_COVERAGE rather than deleting (preserves traceability).
Behat only supports "Pos" and "Neg" for manual classification values -- "Amb" (Ambiguous) is not supported. Tests requiring Ambiguous CLS must be tagged @KNOWN_LIMITATION.
27. Vacuous Scenarios -- Tests That Pass But Test Nothing
A scenario is vacuous when it passes for the wrong reason:
- Reusing a qualitative fixture for a quantitative scope test
- Config that does not activate the rule being tested
- Assertion on a generic message that any rule could produce
Fix: Remove vacuous scenarios and document why, or create proper fixtures.
28. Extraction Settings: MIN_CONTROLS vs MINEXTRACT
Two different rules handle minimum control validation. See Gotcha #17 for full details. The key trap: configs with use_extraction_instruments=true + MIN_CONTROLS rule will always produce CONTROL_MISSING on every patient well.
How to check: Config -> Rules sheet -> find "minimum required controls" -> check column B (PROGRAMMATIC RULE NAME).
29. Heavy Configs and Timeouts (Config-Specific)
Configs with very large Rules Mapping sheets (e.g., Quest_EZ_MINCONTROLS.xlsx at 5014 rows) can exceed PHP's 90-second max_execution_time. The timeout typically hits during Collection operations in Laravel's Eloquent layer.
Symptoms: PHP Fatal error: Maximum execution time of 90 seconds exceeded in Collection.php or HasAttributes.php.
See Gotcha #20 for mitigations.
30. V2 Westgard Cleanup Needs More Than remove-westgard-rules.php
The script only checks column B for "westgard" keyword. Rules with RULE ID patterns like WG* (e.g., "Check 13S after 12S", "Check 22S after 13S") need separate removal. Also clear Westgard Limits and Westgard Events sheets entirely.
31. PICQUANT Rules Lack Null IC Guards -- ISSUE-013
PicquantRule.php and PicquantSerumRule.php do not check for null IC observation or null IC CT before processing. TV-001-017 is unreachable (FILEIMPORT catches missing obs first). TV-001-018 with null IC CT causes false inhibition instead of skip.
32. DB_AUDIT_DATABASE Must Match DB_DATABASE
When using the DB pool, pass DB_AUDIT_DATABASE=$DB alongside DB_DATABASE=$DB. Without this, a stale audit DB reference from a previous agent's config cache can cause failures in scenarios that write audit records (e.g., WDCLSCINVSIG).
33. DB_HOST and DB_AUDIT_HOST Must Be 127.0.0.1
The .env file has DB_HOST=mysql and DB_AUDIT_HOST=mysql -- Docker hostnames that do not resolve outside Docker. Always override both when running tests. Without DB_AUDIT_HOST, audit writes fail silently on scenarios that trigger the mysql_audit connection.
34. Fresh DBs Need Pre-Migration
migrate:fresh on an empty DB (no migrations table) skips db:wipe due to FreshCommand.php checking repositoryExists(). This causes the BeforeScenario hook to fail with Table 'migrations' doesn't exist.
See How Test Execution Works > Parallel Execution for the pre-migration command.
35. Rule Precedence and Interference (Detail)
Rules validate in precedence order. A test targeting one rule can fail because a higher-precedence rule fires first and overwrites the outcome.
Key interactions:
- Rules with is_allow_error_wells = false will not run if an earlier rule already set an error
- PREVENTS ANALYSIS = NO on your target rule's error means later rules will still evaluate and may overwrite
- To isolate a rule, remove interfering higher-precedence rule mappings for the test well's role using PhpSpreadsheet
BT-9505 example: COMBINED_OUTCOME_CONTROL (order 10) sets CT_BOUNDARY_HIT with PREVENTS ANALYSIS=NO. Then BNC (order 16) evaluates and overwrites the outcome. Fix: remove BNC and BICQUAL rule mappings for the test well's role.
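The interaction above can be modeled with a toy precedence loop. The flag names mirror the config sheets, but the rule dicts and outcomes are illustrative assumptions, not the real engine's API:

```python
def run_rules(rules):
    """Evaluate toy rules in run order against a single well."""
    well = {"error": None, "error_prevents_analysis": False, "outcome": "default"}
    for rule in sorted(rules, key=lambda r: r["order"]):
        if (well["error"] and well["error_prevents_analysis"]
                and not rule["is_allow_error_wells"]):
            continue  # blocked by an earlier PREVENTS ANALYSIS=YES error
        if rule.get("fires"):
            well["error"] = rule["error_code"]
            well["error_prevents_analysis"] = rule["prevents_analysis"]
            well["outcome"] = rule["outcome"]
    return well

# BT-9505-style interaction: order 10 sets CT_BOUNDARY_HIT with
# PREVENTS ANALYSIS=NO, so order-16 BNC still evaluates and overwrites.
well = run_rules([
    {"order": 10, "fires": True, "error_code": "CT_BOUNDARY_HIT",
     "prevents_analysis": False, "outcome": "boundary",
     "is_allow_error_wells": True},
    {"order": 16, "fires": True, "error_code": "BNC_ERROR",
     "prevents_analysis": False, "outcome": "bnc",
     "is_allow_error_wells": False},
])
```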
36. Missing Error Codes When Swapping Configs
Copying a config from one test to another can fail if the source config references error codes not registered in the application. Symptom: No Error Code Defined for code: XXXXX (Exception) at run upload.
Fix: Add the missing error code to the Error Codes sheet, or remove the rule mapping that produces it.
37. Config-Dependent TV Feasibility
Not every TV (test vector) can be tested with every config. Before writing a test, verify the config's structure supports the conditions the TV requires. If it doesn't, you need a different config -- or the TV must be documented as infeasible for that config and tagged appropriately.
Real examples from Wave 7-8 review:
| TV | Required Condition | Config Limitation | Resolution |
|---|---|---|---|
| TV-INDETCTS-001-003 | Zero non-IC targets on well | Nottingham config mandates E gene + S gene (2 non-IC targets always present) | Retagged to TV-001-007 (non-Pos classification). TV-001-003 needs a different config or is structurally infeasible. |
| TV-RWAC-002-004 | Fallback control with use_fallback_shared_controls=true | Test config has use_fallback_shared_controls=false | Tagged @DUPLICATE_COVERAGE; true TV-002-004 needs a different config variant. |
| TV-MINCTRL-002-005 | Control well with MINCONTROL resolution code applied | No scenario applied a resolution before checking | Retagged to TV-001-002 (role skip). TV-002-005 needs a new scenario. |
Decision framework when a TV is infeasible with your config:
- Can you use a different config? Check Config Preference (Essential Rules). Prefer v3 configs.
- Can you modify the config? Use PhpSpreadsheet. But don't break other scenarios sharing the config.
- Neither works? Retag the scenario to the TV it actually exercises. Document the infeasible TV as a coverage gap in the feature file header. Tag scenario with an appropriate qualifier if needed.
See also: Gotcha #17 (MIN_CONTROLS vs MINEXTRACT), Gotcha #22 (single-mix target mismatch).
38. Negative-Path Tests and @NEGATIVE_PATH_GUARD
When a rule only produces errors (never modifies the default outcome), a "no error" assertion is inherently weak -- it would pass even if the rule were completely disabled.
Example: ADJZIKA scenarios in BT-9539 all assert "Detected" (the default Zika outcome). Three scenarios test conditions where ADJ_Zika should NOT fire. But "Detected" is what you'd get whether ADJ_Zika evaluated and found no error, OR whether ADJ_Zika never ran at all. No Behat step exists to confirm rule evaluation occurred.
Pattern for negative-path tests:
- Tag with @NEGATIVE_PATH_GUARD to clearly signal the assertion's limitation
- Add a comment explaining what the scenario verifies and what it cannot distinguish
- Pair with positive-case tests elsewhere that prove the rule CAN fire. Reference the positive case in the comment (e.g., "Positive-case ADJ_Zika coverage in legacy BT-5155").
- If no positive case exists, the negative-path test has very low diagnostic value -- consider whether it's worth keeping.
When IS a negative-path test sufficient?
- The rule has a positive-case companion test proving it works
- The scenario is a regression guard (preventing false positives in production)
- The scenario verifies a DIFFERENT rule fires instead (e.g., asserting "Mixes missing" instead of "Detected" proves mixes-missing ran, even if the target rule didn't)
39. Resolution Steps Must Target the Specific Failing Observation
When a well has multiple observations (e.g., QHSV1 and QHSV2 on an HSV well), the resolution step must target the observation that actually triggered the error -- not just any observation on the well.
Example from BT-9535 GAP-005: Well S1 (NEC role) had two observations:
- QHSV1: upward curve, Neg cls -> passes DSIGMOIDCTRL (Neg + upward = no error)
- QHSV2: downward curve, Neg cls -> fails DSIGMOIDCTRL (Neg + downward = error)
The original test applied resolution to QHSV1 (the passing observation). The test passed because the QHSV2 error was never cleared -- but the resolution step appeared to "work" because the initial error assertion on the well had already been satisfied. The fix was changing the resolution target to QHSV2.
Rule: When writing resolution scenarios for multi-observation wells:
- Identify which observation triggers the error (check the rule's evaluation logic, not just the well-level outcome)
- Apply resolution to THAT specific observation
- After re-analysis, assert both the well outcome AND the resolved observation's final cls/ct
This is especially important when wells are remapped (e.g., well_number: "S1" maps to physical well a4), because the observation data may not match what you'd expect from the well label.
40. @USE_SAME_CONFIG Must NOT Be First Scenario-Level Tag
When @USE_SAME_CONFIG is placed on a scenario's tag line, it must NOT be the first tag. BaseFeatureContext::beforeScenario() uses Arr::first($event->getScenario()->getTags()) to determine the BT key for fixture directory lookup (support_files/BT-XXXX/). If @USE_SAME_CONFIG is the first scenario tag, the fixture path resolves to support_files/USE_SAME_CONFIG/ -- which doesn't exist -- causing "file not found" errors on every scenario.
Wrong:
@USE_SAME_CONFIG
Feature: My test
@USE_SAME_CONFIG @TEST_BT-5700 @REQ_BT-5368
Scenario Outline: Test something
Right (file-level only -- preferred):
@USE_SAME_CONFIG
Feature: My test
@TEST_BT-5700 @REQ_BT-5368
Scenario Outline: Test something
Also right (if scenario-level desired, put after @TEST):
@USE_SAME_CONFIG
Feature: My test
@TEST_BT-5700 @USE_SAME_CONFIG @REQ_BT-5368
Scenario Outline: Test something
Best practice: Put @USE_SAME_CONFIG on line 1 (feature level) only. The scenario-level tag is redundant since the code checks getFeature()->getTags().
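The failure mode reduces to a first-element lookup. A toy model (the path format and tag handling are simplified assumptions; the real code does more):

```python
def fixture_dir(scenario_tags: list) -> str:
    """Arr::first-style lookup: the FIRST scenario tag picks the key
    used for the support_files fixture path."""
    key = scenario_tags[0].lstrip("@")
    return f"support_files/{key}/"

fixture_dir(["@USE_SAME_CONFIG", "@TEST_BT-5700"])  # resolves to a non-existent dir
fixture_dir(["@TEST_BT-5700", "@USE_SAME_CONFIG"])  # resolves from the BT tag
```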
41. Config Must Have Resolution Codes for Well Resolution Tests
If your scenario resolves a well (e.g., When Apply resolution to well "A1" with "..."), the config XLSX must have resolution codes configured for the error on that well. Without them, the API returns "Resolution is not allowed for the selected well" and the scenario fails.
Two categories of resolution codes exist:
| Well State | Required Config | Where in XLSX |
|---|---|---|
| Well has a specific error code (e.g., SYSTEMIC_INHIBITON_DETECTED) | Resolution codes linked to that error code | Error Resolutions sheet -- rows where the error code column matches |
| Patient-level well (no error code, or general resolution) | Patient-level resolution codes (non-error-linked) | Error Resolutions sheet -- rows with no error code association |
Symptom: Scenario fails at the resolution step with Resolution is not allowed for the selected well. The well processes correctly, the error fires correctly, but the resolution step is rejected because the config has zero resolution codes for that error.
Fix: Open the config XLSX Error Resolutions sheet and add resolution code rows for the relevant error code. Or use a config that already has them (most v3/v30/v31 configs have resolution codes for common error codes).
Real example: 5 SYSINH scenarios failed because the config had 0 resolution codes for SYSTEMIC_INHIBITON_DETECTED and 0 patient-level resolution codes. The rule fired correctly, the error was set correctly, but the resolution step was rejected. Adding resolution codes to the Error Resolutions sheet fixed all 5.
Pre-flight check: Before writing any resolution scenario, open the config's Error Resolutions sheet and verify rows exist for the error code your scenario will resolve (this check is also listed in the "Before writing any test" verification table).
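The pre-flight check amounts to scanning the sheet for matching rows. The row shape below is an assumed simplification of the Error Resolutions sheet, not its actual column layout:

```python
def resolution_codes_for(rows, error_code):
    """Rows linked to the error code, plus patient-level rows (no error
    association). A scenario resolving this error needs at least one."""
    return [r for r in rows
            if r.get("error_code") == error_code or r.get("error_code") is None]

rows = [
    {"resolution_code": "RES-01", "error_code": "SYSTEMIC_INHIBITON_DETECTED"},
    {"resolution_code": "RES-02", "error_code": None},  # patient-level
    {"resolution_code": "RES-03", "error_code": "CT_BOUNDARY_HIT"},
]
matches = resolution_codes_for(rows, "SYSTEMIC_INHIBITON_DETECTED")
# An empty result here predicts "Resolution is not allowed for the selected well".
```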
Test-First Policy
Tests document expected behavior per SRS/STD specifications:
- Tests are NOT skipped due to code implementation gaps
- Failing tests indicate code bugs, not test errors
- Tag with @KNOWN_CODE_ISSUE when code does not implement expected behavior
- Log code issues in docusaurus/docs/traceability/known-issues.md
- See Known Issues Registry for resolution process
Known Code Issues Reference
Scenarios tagged @KNOWN_CODE_ISSUE fail due to application bugs, not test errors:
| Issue | Rule | Description |
|---|---|---|
| ISSUE-005 | NOTTSQM | CF=0 treated as no multiplier instead of producing 0 quantity |
| ISSUE-006 | NOTTSQM | Negative concentration factor causes MySQL unsigned DECIMAL overflow |
| ISSUE-009 | INHCT | INH rule does not fire IC_FAILED on inhibited controls |
| ISSUE-010 | MIXMISS | Cross-run reanalysis does not resolve mixes missing (2 scenarios) |
| ISSUE-013 | PICQUANT | No null IC observation/CT guard -- false inhibition instead of skip |
Additional @KNOWN_CODE_ISSUE scenarios exist in other feature files (BT-9513 INHCT, BT-9516 INHSERUMQUANT, BT-9517 WG, BT-9547 INHQUANT). See the Known Issues Registry for the full list and resolution status.
Code naming quirk: ArrayItemsSeeker::penultimateIndex() returns the LAST index (count - 1), not the second-to-last. This affects sigmoid evaluation in SigmoidIdentifier -- the "middle vs penultimate" comparison is actually "middle vs last." BT-9402's original @KNOWN_CODE_ISSUE was based on misunderstanding this method name (removed after investigation confirmed the code behavior is correct, just misnamed).
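A minimal Python rendering of that naming quirk, useful when reasoning about sigmoid assertions:

```python
def penultimate_index(items):
    """Mirrors the described ArrayItemsSeeker::penultimateIndex() behavior:
    despite the name, it returns the LAST index (count - 1)."""
    return len(items) - 1

penultimate_index([10, 20, 30])  # -> 2, the last index, not 1
```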
Subagent Test Creation Patterns
Overview
Subagents can autonomously create gap tests when given sufficient context. Each subagent handles ONE rule's gaps, creates fixtures + feature file, runs tests, and iterates.
Proven: NOTTSQM (6 scenarios, 6/6 pass) and POSSIGMOID (5 scenarios, 5/5 pass) were created entirely by subagents.
Parallel Behat Execution
Subagents run Behat tests in parallel using the database pool. Each subagent checks out a DB before running tests. See How Test Execution Works > Parallel Execution for the canonical checkout/run/checkin commands.
Pool capacity: 10 databases, so up to 10 parallel test runs.
Subagent Prompt Template
## Task: Create {RULE} Gap Tests ({N} new test vectors)
You are creating Behat tests for the {RULE} rule ({description}).
### READ FIRST
Read these files before starting:
1. `docusaurus/docs/guides/llm/guide-llm-behat-creation.md`
2. Existing fixture: `tests/support_files/BT-{EXISTING}/{template}.json`
3. Existing scenario in: `tests/exports/cucumber/{file}.feature` (lines X-Y)
### What to Create
- Support directory: `tests/support_files/BT-{NEW}/`
- Config: copy from {source}, verify rule mappings
- {N} JSON fixtures: clone from {template}, change {fields}
- Feature file: `tests/exports/cucumber/{priority}_BT-{NEW}.feature`
### TV Details
| Fixture | Key Change | TV ID | Expected Outcome |
|---------|-----------|-------|-----------------|
| ... | ... | ... | ... |
### Iterative Strategy
1. Create all fixtures
2. Dry-run to verify parsing
3. Run with MINIMAL assertions first (CT, CLS only)
4. Check actual output for quantities/outcomes
5. Update assertions to match actual values
6. Re-run to confirm all pass
### Running Tests (Parallel-Safe with DB Pool)
See "How Test Execution Works > Parallel Execution" in guide-llm-behat-creation.md
for the canonical DB checkout/run/checkin commands. Use paths:
- DB pool JSON: `/shared/code/req_docs/tests/scripts/db-pool.json`
- DB pool lock: `/shared/code/req_docs/tests/scripts/db-pool.lock`
- Code dir: `/shared/code/req_docs/code`
### After Tests Pass - Copy to new_tests/
cp tests/exports/cucumber/{file}.feature new_tests/exports/cucumber/
cp -r tests/support_files/BT-{NEW}/ new_tests/support_files/
### Success Criteria
- All scenarios parse (dry-run)
- All scenarios pass with correct assertions
- Files copied to new_tests/ for git tracking
- Report final results with actual values
Proven Patterns
Pattern A: Value Variation (NOTTSQM-style)
For rules that transform a single input value:
- Clone ONE working fixture
- Change only the input field (e.g., concentration_factor)
- One scenario per value, minimal assertions
- Best for: NOTTSQM, CTCUTOFF boundary tests, threshold tests
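Pattern A's fixture cloning can be sketched as below. The field names are illustrative, not the exact fixture schema:

```python
import copy

def clone_fixture(base: dict, run_name: str, value) -> dict:
    """Clone a working fixture, changing ONLY run_name and the input
    under test; everything else stays identical to the baseline."""
    fx = copy.deepcopy(base)
    fx["run_name"] = run_name
    fx["concentration_factor"] = value  # illustrative input field
    return fx

base = {"run_name": "baseline", "concentration_factor": 1.0, "wells": []}
variants = [clone_fixture(base, f"cf_{v}", v) for v in (0.5, 2.0, 10.0)]
```

Deep-copying matters: a shallow copy would share nested structures (wells, observations) across variants, so a later edit to one fixture would silently corrupt the others.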
Pattern B: Skip Condition (POSSIGMOID-style)
For rules that should NOT fire when conditions are met:
- Use existing fixture that TRIGGERS the error (baseline)
- Each scenario: trigger error -> apply skip condition -> re-analyse -> verify error gone
- Step syntax for resolution + manual classification:
When Apply resolution to well "A1" with "Set individual curve results" and "Manual classification" to observation "{TARGET}" with "{CLS}"
And Re analyse the run file
Then well "A1" should have "{new_outcome}" outcome
- Check Error Resolutions sheet in config for the RULES SKIP ON RE-ANALYSIS column
- Best for: POSSIGMOID, NEGSIGMOID, DSIGMOID, POSSIGMOIDCTRL, NEGSIGMOIDCTRL
Pattern C: Control Well Role (DELTACT-style)
For rules that only apply to specific roles:
- Check Rules Mapping sheet for which roles the rule applies to
- Use appropriate well label (PC*, MPXNC*, NEG*, etc.)
- Be aware of COMBINED_OUTCOME_CONTROL at precedence ~10
- Best for: DELTA_CT, control-specific rules
Config Modification Checklist
When copying configs for new tests:
| Sheet | What to Check | Common Fix |
|---|---|---|
| Error Codes | Missing error codes | Add the error code row your rule produces |
| Error Resolutions | RULES SKIP ON RE-ANALYSIS column | Fix to match the rule's programmatic name |
| Error Resolutions | Resolution codes for error codes under test | Add rows if scenario resolves wells (Gotcha #41) |
| Rules Mapping | Rule mapped to intended roles | Add role mappings if missing |
| Rules | LINEAR_REGRESSION_VALIDATION precedence | Set to 8 (must be < STDQT's 38) |
| Combined Outcomes | Outcome strings | Note exact strings for assertions |
| Control Labels | Well label -> role mapping | Verify fixture labels map correctly |
Subagent Checklist
Before running tests, the subagent should verify:
- Support directory created in tests/support_files/BT-XXXX/ (for Behat)
- Config copied and relevant sheets checked
- JSON fixtures: run_name updated, key fields changed, well_number lowercase
- Feature file: tabs for indentation, @REQ_BT-XXXX on Feature line 1 with @USE_SAME_CONFIG, @TEST_BT-XXXX @TV-* on each Scenario
- Dry-run passes (no parsing errors)
- Full run with minimal assertions passes
- Assertions updated based on actual system output
- Final run: all scenarios green
- Copy to new_tests/ for git tracking (tests/ is gitignored subtree)
Completed Gap Tests (Reference)
| Rule | Feature File | Scenarios | Status | BT Key |
|---|---|---|---|---|
| QUANTVAL | 14_BT-9001.feature | 9 | 9/9 pass | BT-9001 |
| DELTACT | 15_BT-9101.feature | 7 | 7/7 pass | BT-9101 |
| NOTTSQM | 16_BT-9201.feature | 6 | 5/6 pass (1 @KNOWN_CODE_ISSUE: negative CF DB overflow) | BT-9201 |
| POSSIGMOID | 17_BT-9301.feature | 5 | 5/5 pass | BT-9301 |
| NEGSIGMOID | 18_BT-9401.feature | 5 | 5/5 pass | BT-9401 |
| DSIGMOID | 19_BT-9402.feature | 7 | 6/7 pass (1 @KNOWN_CODE_ISSUE: TV-BND-001) | BT-9402 |
| CTCUTOFF | 20_BT-9403.feature | 15 | 15/15 pass | BT-9403 |
| MINCTRL | 21_BT-9404.feature | 8 | 8/8 pass (ISSUE-007 resolved: quest-v3 uses MINEXTRACT) | BT-9404 |
| ICQUALSERUM | 22_BT-9405.feature | 5 | 5/5 pass | BT-9405 |
| INHQUAL | 23_BT-9501.feature | 7 | 7/7 pass | BT-9501 |
| RQUAL | 24_BT-9502.feature | 3 | 3/3 pass | BT-9502 |
| INCONCLUSIVE | 25_BT-9503.feature | 8 | 8/8 pass | BT-9503 |
| CONTROLFAIL | 26_BT-9504.feature | 9 | 9/9 pass | BT-9504 |
| COMBOUTCTRL | 27_BT-9505.feature | 8 | 8/8 pass | BT-9505 |
| EXTCTRL | 28_BT-9506.feature | 5 | 5/5 pass | BT-9506 |
| MINCTRL resolution | 29_BT-9507.feature | 4 | 4/4 pass | BT-9507 |
| STDQT | 30_BT-9508.feature | 6 | 6/6 pass | BT-9508 |
| WG Westgard | 31_BT-9509.feature | 8 | 8/8 pass | BT-9509 |
| COMBOUT | 32_BT-9510.feature | 14 | 14/14 pass | BT-9510 |
| THRESH | 33_BT-9511.feature | 8 | 8/8 pass | BT-9511 |
| STDCURVE | 34_BT-9512.feature | 6 | 6/6 pass | BT-9512 |
| INHCT | 35_BT-9513.feature | 10 | 6/10 pass (4 @KNOWN_CODE_ISSUE ISSUE-009) | BT-9513 |
| SYSINH | 36_BT-9514.feature | 9 | 9/9 pass | BT-9514 |
| MIXMISS | 37_BT-9515.feature | 12 | 10/12 pass (2 @KNOWN_CODE_ISSUE ISSUE-010) | BT-9515 |
| INHSERUMQUANT | 38_BT-9516.feature | 11 | 10/11 pass (1 @KNOWN_CODE_ISSUE) | BT-9516 |
| WG remaining | 39_BT-9517.feature | 21 | 20/21 pass (1 @KNOWN_CODE_ISSUE) | BT-9517 |
| RQUANT | 40_BT-9518.feature | 14 | 14/14 pass | BT-9518 |
| AMB | 41_BT-9519.feature | 15 | 15/15 pass | BT-9519 |
| QUANTVAL remaining | 42_BT-9520.feature | 11 | 11/11 pass (TV-001-006 @KNOWN_LIMITATION) | BT-9520 |
| RQUANTQUAL | 43_BT-9521.feature | 8 | 8/8 pass | BT-9521 |
| MNGQTY | 44_BT-9522.feature | 7 | 7/7 pass | BT-9522 |
| WDCTC | 45_BT-9523.feature | 8 | 8/8 pass | BT-9523 |
| LINREG | 46_BT-9524.feature | 7 | 7/7 pass | BT-9524 |
| QSSC | 47_BT-9525.feature | 2 | 2/2 pass (supplemental) | BT-9525 |
| WDCLSCINVSIG | 48_BT-9526.feature | 7 | 7/7 pass | BT-9526 |
| INHQUAL remaining | 49_BT-9527.feature | 9 | 9/9 pass | BT-9527 |
| STDQT remaining | 50_BT-9528.feature | 6 | 6/6 pass | BT-9528 |
| SWCOMBOUT | 51_BT-9529.feature | 7 | 7/7 pass | BT-9529 |
| INDETCTS | 52_BT-9530.feature | 7 | 7/7 pass | BT-9530 |
| WCAF | 54_BT-9532.feature | 7 | 7/7 pass | BT-9532 |
| REPEATSAMP | 55_BT-9533.feature | 5 | 5/5 pass | BT-9533 |
| NEGSIGMOIDCTRL | 56_BT-9534.feature | 5 | 5/5 pass | BT-9534 |
| DSIGMOIDCTRL | 57_BT-9535.feature | 5 | 5/5 pass | BT-9535 |
| QSSC+RQUAL | 58_BT-9536.feature | 10 | 10/10 pass | BT-9536 |
| UNEXPFL | 59_BT-9537.feature | 5 | 5/5 pass | BT-9537 |
| CC | 60_BT-9538.feature | 5 | 5/5 pass | BT-9538 |
| POSSIGMOIDCTRL+ADJZIKA | 61_BT-9539.feature | 8 | 8/8 pass | BT-9539 |
| MINFL | 62_BT-9540.feature | 6 | 6/6 pass | BT-9540 |
| RWAC | 63_BT-9541.feature | 4 | 4/4 pass | BT-9541 |
| WDCT+WDCLSC+WDCLS | 64_BT-9542.feature | 10 | 10/10 pass | BT-9542 |
| CTCUTOFF remaining | 65_BT-9543.feature | 4 | 4/4 pass | BT-9543 |
| QTYWEIGHT | 66_BT-9544.feature | 4 | 4/4 pass | BT-9544 |
| WDCLSINVSIG | 67_BT-9545.feature | 3 | 3/3 pass | BT-9545 |
| MINCTRL edge | 68_BT-9546.feature | 3 | 3/3 pass | BT-9546 |
| INHQUANT skip | 69_BT-9547.feature | 2 | 2/2 pass (both @KNOWN_CODE_ISSUE ISSUE-013) | BT-9547 |
| ICCT | 70_BT-9548.feature | 3 | 3/3 pass (GAP-003 @KNOWN_LIMITATION) | BT-9548 |
| MWCOMBOUT | 71_BT-9549.feature | 2 | 2/2 pass (GAP-002 @KNOWN_LIMITATION) | BT-9549 |
| NEC+NECINH | 72_BT-9550.feature | 3 | 3/3 pass | BT-9550 |
| RRES | 73_BT-9551.feature | 2 | 2/2 pass | BT-9551 |
Next BT key to use: BT-9552