agent-runtime

Archived

Author	SHA1	Message	Date
Nico	44f6116855	feat(tests): ui suite - toolbar groups, scroll preservation, DOM keep-alive. 6 Playwright tests against nyx-test (localhost:30802, auth disabled): toolbar group counts per route (nyx=4, tests=2, home=1); toolbar survives full nav roundtrip without losing groups; scroll position preserved across navigation (keep-alive working); all visited views stay in DOM (not removed on nav). Run: NYX_URL=http://localhost:30802 python tests/run_tests.py ui. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 23:24:51 +02:00
Nico	b35610cf6f	Add node-level test suite for ErasExpertNode. 6 tests that instantiate ErasExpertNode directly (no HTTP, no pipeline). Assert SQL table selection, JOIN patterns, and response hygiene. 2 LLM calls per test vs 4+ for matrix; runs in ~22s total locally. Requires pymysql in venv and DB access (WireGuard or NodePort). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 22:12:01 +02:00
Nico	d8e832d2d4	Add --parallel=N for concurrent test execution. run_tests.py: ThreadPoolExecutor runs N tests concurrently within a suite; each testcase has its own session_id so parallel is safe; engine tests: fixed asyncio.new_event_loop() for thread safety; usage: python tests/run_tests.py testcases --parallel=3; wall time reduction: ~3x for testcases (15min → 5min with parallel=3). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 20:01:06 +02:00
Nico	c21ff08211	Unify testcases into run_tests.py: SSE client, session isolation, dashboard. tests/test_testcases.py: new ChatClient using /api/chat SSE (replaces /api/send polling), each testcase gets own session_id; registered as 'testcases' suite in run_tests.py (25 markdown testcases); results post to /api/test-results for real-time /tests dashboard; reuses parser + assertion engine from runtime_test.py; usage: python tests/run_tests.py testcases/fast. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 19:48:58 +02:00
Nico	b4031611e2	Add --repeat=N mode to test runner with timing stats (avg/p50/p95). run_tests.py: --repeat=N runs each test N times, aggregates into one result; stats include: runs, pass_rate, min/avg/p50/p95/max_ms; stats posted in result.stats field for dashboard display; works with all suites (engine, api, matrix, roundtrip). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 18:18:09 +02:00
Nico	4e679a3ad9	Add model matrix test suite: 3 tests × 3 variants = 9 combos. New 'matrix' suite runs same API tests with different LLM model configs. Variants: gemini-flash (baseline), haiku, gpt-4o-mini; Tests: eras_query (SQL correctness), eras_artifact (data output), social_reflex (fast path); posts results as test_name[variant] to /tests dashboard; all 9 combos passing (6/9 verified locally, ~35s for ERAS tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 18:12:24 +02:00
Nico	097c7f31f3	Add engine test suite: 8 tests for graph loading, conditions, frame traces. New 'engine' suite in run_tests.py with tests that verify frame engine mechanics without LLM calls. Covers graph loading, node instantiation, edge type completeness, reflex/tool_output conditions, and frame trace structure for reflex/expert/expert+interpreter pipelines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 18:01:06 +02:00
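
Commit b4031611e2 describes --repeat=N as running each test N times and aggregating the runs into one result with runs, pass_rate, and min/avg/p50/p95/max_ms. The repository code is not shown here, so the following is only a minimal sketch of how such an aggregation might look. The function names (percentile, run_repeated), the expanded key names (min_ms, avg_ms, and so on), and the nearest-rank percentile method are all assumptions made for illustration, not the project's actual implementation.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_ms: List[float], q: float) -> float:
    """Nearest-rank percentile of an already sorted list (q in 0..100).

    Nearest-rank is an assumption; the real runner may interpolate instead.
    """
    if not sorted_ms:
        raise ValueError("need at least one sample")
    rank = max(1, math.ceil(q / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def run_repeated(test_fn: Callable[[], None], repeat: int) -> Dict[str, float]:
    """Run test_fn `repeat` times and fold the runs into one stats dict.

    A run counts as passed when test_fn returns without AssertionError.
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    durations_ms: List[float] = []
    passes = 0
    for _ in range(repeat):
        start = time.perf_counter()
        try:
            test_fn()
            passes += 1
        except AssertionError:
            pass  # failed run: still timed, just not counted as a pass
        durations_ms.append((time.perf_counter() - start) * 1000)
    durations_ms.sort()
    return {
        "runs": repeat,
        "pass_rate": passes / repeat,
        "min_ms": durations_ms[0],
        "avg_ms": sum(durations_ms) / len(durations_ms),
        "p50_ms": percentile(durations_ms, 50),
        "p95_ms": percentile(durations_ms, 95),
        "max_ms": durations_ms[-1],
    }
```

A dict of this shape could then be attached to a result under a stats field, as the commit message describes for the dashboard; how the real runner posts it is not specified in the log.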

7 Commits