- tests/test_testcases.py: new ChatClient using /api/chat SSE (replaces
/api/send polling); each testcase gets its own session_id
- Registered as 'testcases' suite in run_tests.py (25 markdown testcases)
- Results post to /api/test-results for real-time /tests dashboard
- Reuses parser + assertion engine from runtime_test.py
- Usage: python tests/run_tests.py testcases/fast
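
For illustration only, a minimal sketch of how an SSE-based client could pull
event payloads off the /api/chat stream and give each testcase a fresh
session. The names iter_sse_data and new_session_id are hypothetical and not
taken from tests/test_testcases.py; the actual ChatClient and the payload
format of /api/chat are not shown in this commit.

```python
import uuid


def new_session_id():
    """Hypothetical helper: a fresh, unique session id per testcase."""
    return uuid.uuid4().hex


def iter_sse_data(lines):
    """Yield the data payload of each server-sent event.

    `lines` is any iterable of text lines (e.g. a streamed HTTP response).
    Follows the basic SSE framing rules: `data:` lines accumulate, a blank
    line dispatches the event, and lines starting with `:` are comments.
    Other fields (event:, id:, retry:) are ignored in this sketch.
    """
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith(":"):
            continue  # comment / keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # strip the single optional leading space
            buf.append(value)
    # An unterminated trailing event is dropped, as the SSE format specifies.
```

In a real client the lines would come from a streaming HTTP response; the
parser itself does not depend on any HTTP library.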
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- run_tests.py: --repeat=N runs each test N times and aggregates the runs into one result
- Stats include: runs, pass_rate, min/avg/p50/p95/max_ms
- Stats posted in result.stats field for dashboard display
- Works with all suites (engine, api, matrix, roundtrip)
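
A rough sketch of how the per-test stats listed above could be aggregated
from N repeated runs. The function names, the (passed, elapsed_ms) input
shape, the *_ms key names, and the nearest-rank percentile method are all
assumptions made for illustration; run_tests.py may compute them differently.

```python
import math
from statistics import mean


def percentile(sorted_ms, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def aggregate(runs):
    """Collapse repeated runs into one stats dict.

    `runs` is a list of (passed, elapsed_ms) tuples, one per repeat
    (a hypothetical shape chosen for this sketch).
    """
    if not runs:
        raise ValueError("need at least one run")
    times = sorted(ms for _, ms in runs)
    passed = sum(1 for ok, _ in runs if ok)
    return {
        "runs": len(runs),
        "pass_rate": passed / len(runs),
        "min_ms": times[0],
        "avg_ms": mean(times),
        "p50_ms": percentile(times, 50),
        "p95_ms": percentile(times, 95),
        "max_ms": times[-1],
    }
```

With few repeats, nearest-rank p95 collapses to the slowest run, so a larger
N is needed before p95 and max_ms tell different stories.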
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New 'matrix' suite runs the same API tests against different LLM model configs:
- Variants: gemini-flash (baseline), haiku, gpt-4o-mini
- Tests: eras_query (SQL correctness), eras_artifact (data output), social_reflex (fast path)
- Posts results as test_name[variant] to /tests dashboard
- All 9 combos passing (6/9 verified locally, ~35s for ERAS tests)
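
As a sketch of the naming scheme, the 3 x 3 matrix can be expanded into
test_name[variant] labels like this. The matrix_names helper and the two
constants are hypothetical; only the test names, variant names, and label
format come from this commit.

```python
from itertools import product

VARIANTS = ["gemini-flash", "haiku", "gpt-4o-mini"]
TESTS = ["eras_query", "eras_artifact", "social_reflex"]


def matrix_names(tests, variants):
    """Return one 'test_name[variant]' label per test/variant combination."""
    return [f"{test}[{variant}]" for test, variant in product(tests, variants)]


names = matrix_names(TESTS, VARIANTS)
```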
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New 'engine' suite in run_tests.py with tests that verify frame engine
mechanics without LLM calls. Covers graph loading, node instantiation,
edge type completeness, reflex/tool_output conditions, and frame trace
structure for reflex/expert/expert+interpreter pipelines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>