agent-runtime

gitea/agent-runtime

Archived


Commit Graph

Author	SHA1	Message	Date

Nico

b4031611e2

Add --repeat=N mode to test runner with timing stats (avg/p50/p95)

- run_tests.py: --repeat=N runs each test N times, aggregates into one result
- Stats include: runs, pass_rate, min/avg/p50/p95/max_ms
- Stats posted in result.stats field for dashboard display
- Works with all suites (engine, api, matrix, roundtrip)
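A minimal sketch of how N runs of one test could be collapsed into the listed stats. This is not the run_tests.py implementation. The `RunResult` shape and the `*_ms` key names are hypothetical, and the nearest-rank percentile method is an assumption, since the commit does not say how p50/p95 are computed.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class RunResult:
    """One execution of a test (hypothetical shape, not from run_tests.py)."""
    passed: bool
    duration_ms: float


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile over an already-sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no values")
    # Multiply before dividing so integer ranks stay exact in floating point.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def aggregate(runs: list[RunResult]) -> dict:
    """Collapse N runs of one test into a single stats dict (assumed key names)."""
    durations = sorted(r.duration_ms for r in runs)
    passes = sum(1 for r in runs if r.passed)
    return {
        "runs": len(runs),
        "pass_rate": passes / len(runs),
        "min_ms": durations[0],
        "avg_ms": statistics.fmean(durations),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "max_ms": durations[-1],
    }
```

With small N the p95 value is effectively the maximum; a different percentile method (for example linear interpolation) would give different numbers for the same runs.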

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-03 18:18:09 +02:00

Nico

4e679a3ad9

Add model matrix test suite: 3 tests × 3 variants = 9 combos

New 'matrix' suite runs the same API tests with different LLM model configs:
- Variants: gemini-flash (baseline), haiku, gpt-4o-mini
- Tests: eras_query (SQL correctness), eras_artifact (data output), social_reflex (fast path)
- Posts results as test_name[variant] to /tests dashboard
- All 9 combos reported passing; 6 of 9 verified locally (the ERAS tests take ~35s)
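A minimal sketch of expanding 3 tests × 3 variants into the 9 dashboard entries named `test_name[variant]`. The test and variant names come from the commit message; the per-variant config dicts and the `matrix_cases` helper are hypothetical.

```python
from itertools import product

# Variant names from the commit; the config contents are illustrative only.
VARIANTS = {
    "gemini-flash": {"model": "gemini-flash"},  # baseline
    "haiku": {"model": "haiku"},
    "gpt-4o-mini": {"model": "gpt-4o-mini"},
}
TESTS = ["eras_query", "eras_artifact", "social_reflex"]


def matrix_cases(tests, variants):
    """Yield (dashboard_name, test, config) for every test x variant pair."""
    for test, (variant, config) in product(tests, variants.items()):
        yield f"{test}[{variant}]", test, config
```

Keeping the variant in the reported name lets the same test appear as separate rows on the /tests dashboard without changing the test code itself.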

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-03 18:12:24 +02:00

Nico

097c7f31f3

Add engine test suite: 8 tests for graph loading, conditions, frame traces

New 'engine' suite in run_tests.py with tests that verify frame engine
mechanics without LLM calls. Covers graph loading, node instantiation,
edge type completeness, reflex/tool_output conditions, and frame trace
structure for reflex/expert/expert+interpreter pipelines.
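A minimal sketch of an LLM-free "edge type completeness" check, under one plausible reading: every edge in a loaded graph must carry a type from a known set. The edge dict shape, the `check_edge_types` helper, and the type names used below are hypothetical; the commit does not show the engine's graph format.

```python
def check_edge_types(edges: list[dict], known_types: set[str]) -> list[str]:
    """Return problems found; an empty list means every edge has a known type."""
    problems = []
    for i, edge in enumerate(edges):
        etype = edge.get("type")
        if etype is None:
            problems.append(f"edge {i}: missing type")
        elif etype not in known_types:
            problems.append(f"edge {i}: unknown type {etype!r}")
    return problems


# Hypothetical graph fragment and edge-type set, for illustration only.
EDGES = [
    {"from": "input", "to": "reflex", "type": "data"},
    {"from": "input", "to": "expert", "type": "control"},
]
KNOWN_TYPES = {"data", "control"}
```

Because a check like this only inspects the loaded structure, it runs in milliseconds and needs no model credentials, which matches the suite's stated goal of testing engine mechanics without LLM calls.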

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-03 18:01:06 +02:00

3 Commits