New 'matrix' suite runs the same API tests against several LLM model configs:
- Variants: gemini-flash (baseline), haiku, gpt-4o-mini
- Tests: eras_query (SQL correctness), eras_artifact (data output), social_reflex (fast path)
- Posts results as test_name[variant] to the /tests dashboard
- All 9 combos passing (6 of 9 verified locally; the ERAS tests take ~35s)
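A minimal Python sketch of how the variant x test matrix could expand into the test_name[variant] result names listed above. VARIANTS, TESTS, expand_matrix, the "default" role and the provider/model strings are hypothetical placeholders, not the suite's actual code or config.

```python
from itertools import product

# Hypothetical variant table: variant name -> per-role model overrides.
# The "default" role and the provider/model strings are placeholders,
# not the project's actual configuration.
VARIANTS: dict[str, dict[str, str]] = {
    "gemini-flash": {"default": "google/gemini-flash"},  # baseline
    "haiku": {"default": "anthropic/haiku"},
    "gpt-4o-mini": {"default": "openai/gpt-4o-mini"},
}

TESTS = ["eras_query", "eras_artifact", "social_reflex"]


def expand_matrix(tests, variants):
    """Yield (result_name, test, overrides) for every test x variant combo."""
    for test, (variant, overrides) in product(tests, variants.items()):
        yield f"{test}[{variant}]", test, overrides


if __name__ == "__main__":
    # 3 tests x 3 variants -> 9 named results
    for name, _test, overrides in expand_matrix(TESTS, VARIANTS):
        print(name, overrides)
```

Keeping the variant label in the result name lets one dashboard row per test_name show how each model config fares side by side.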
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- /api/chat accepts {"models": {"role": "provider/model"}} for per-request overrides
- runtime.handle_message passes model_overrides through to the frame engine
- All 4 graph definitions (v1-v4) now declare MODELS dicts
- test_graph_has_models expanded to verify all graphs
- 11/11 engine tests green
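A hedged sketch of the per-request override described above, assuming overrides are merged on top of a graph's MODELS defaults role by role. resolve_models, the "message" request field, the role names and the model strings are all hypothetical; the real /api/chat handler may validate or merge differently.

```python
import json
from typing import Optional


def resolve_models(graph_models: dict[str, str],
                   overrides: Optional[dict[str, str]]) -> dict[str, str]:
    """Merge per-request overrides on top of a graph's MODELS defaults.

    Hypothetical helper: overrides win for the roles they name; every other
    role keeps the graph default. Each value must look like "provider/model".
    """
    merged = dict(graph_models)  # copy so the graph's defaults are never mutated
    for role, spec in (overrides or {}).items():
        provider, sep, model = spec.partition("/")
        if not (provider and sep and model):
            raise ValueError(f"model for role {role!r} must be 'provider/model', got {spec!r}")
        merged[role] = spec
    return merged


if __name__ == "__main__":
    # Hypothetical graph defaults and request body; field and role names are illustrative.
    GRAPH_MODELS = {"expert": "google/gemini-flash", "interpreter": "google/gemini-flash"}
    body = json.loads('{"message": "hello", "models": {"interpreter": "openai/gpt-4o-mini"}}')
    print(resolve_models(GRAPH_MODELS, body.get("models")))
```

Copying the defaults before applying overrides keeps one request's choice of model from leaking into the next request on the same graph.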
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that will pass once model configuration is implemented:
- graph_has_models: graph definition includes MODELS dict
- instantiate_applies_graph_models: node.model set from graph config
- model_override_per_request: process_message accepts model_overrides
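A toy version of the three pending tests, assuming graph modules expose NODES and MODELS and that model roles map one-to-one onto node names. Node, GRAPH and instantiate are hypothetical stand-ins; the real override test goes through process_message rather than instantiate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a frame-engine node (hypothetical)."""
    name: str
    model: Optional[str] = None


# Hypothetical graph definition: node names plus a MODELS dict keyed by node name.
GRAPH = {
    "NODES": ["reflex", "expert", "interpreter"],
    "MODELS": {"expert": "google/gemini-flash", "interpreter": "google/gemini-flash"},
}


def instantiate(graph: dict, model_overrides: Optional[dict] = None) -> dict[str, Node]:
    """Build nodes, taking each model from MODELS, then from per-request overrides."""
    models = {**graph.get("MODELS", {}), **(model_overrides or {})}
    return {name: Node(name, models.get(name)) for name in graph["NODES"]}


def test_graph_has_models():
    assert isinstance(GRAPH.get("MODELS"), dict) and GRAPH["MODELS"]


def test_instantiate_applies_graph_models():
    nodes = instantiate(GRAPH)
    assert nodes["expert"].model == "google/gemini-flash"
    assert nodes["reflex"].model is None  # no model configured for this node


def test_model_override_per_request():
    nodes = instantiate(GRAPH, {"expert": "openai/gpt-4o-mini"})
    assert nodes["expert"].model == "openai/gpt-4o-mini"
    assert nodes["interpreter"].model == "google/gemini-flash"
```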
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New 'engine' suite in run_tests.py with tests that verify frame engine
mechanics without LLM calls. Covers graph loading, node instantiation,
edge type completeness, reflex/tool_output conditions, and frame trace
structure for reflex/expert/expert+interpreter pipelines.
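A sketch of the kinds of checks an LLM-free engine suite can make, assuming traces are lists of frames with a "node" key. NoLLM, Edge, HANDLED_EDGE_TYPES, EXPECTED_PATHS and the edge-type names are hypothetical; the actual run_tests.py checks may differ.

```python
from dataclasses import dataclass


class NoLLM:
    """Stub model client: any call fails loudly, so a test cannot silently hit an LLM."""

    def complete(self, *args, **kwargs):
        raise AssertionError("engine test attempted an LLM call")


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str


# Hypothetical edge-type vocabulary the engine knows how to route.
HANDLED_EDGE_TYPES = {"data", "context", "state"}


def unhandled_edge_types(edges):
    """Return edge types used by a graph that the engine has no handler for."""
    return sorted({e.kind for e in edges} - HANDLED_EDGE_TYPES)


# Hypothetical expected node order per pipeline shape named in the commit.
EXPECTED_PATHS = {
    "reflex": ["input", "reflex", "output"],
    "expert": ["input", "expert", "output"],
    "expert+interpreter": ["input", "expert", "interpreter", "output"],
}


def trace_matches(pipeline, frames):
    """Check a frame trace (list of dicts with a "node" key) against the expected path."""
    return [f["node"] for f in frames] == EXPECTED_PATHS[pipeline]
```

Injecting a client that raises, rather than one that returns canned text, makes any accidental model call fail the test instead of passing silently.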
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>