Show HN: Arch-Router – 1.5B model for LLM routing by preferences, not benchmarks https://ift.tt/2wMC6Gd

Hi HN, we're the team behind Arch (https://ift.tt/d8WOGUm), an open-source proxy for LLMs written in Rust. Today we're releasing Arch-Router (https://ift.tt/SPGxdCa), a 1.5B router model for preference-based routing, now integrated into the proxy.

As teams integrate multiple LLMs, each with different strengths, styles, or cost/latency profiles, routing the right prompt to the right model becomes a critical part of the application design. But it's still an open problem. Most routing systems fall into two camps:

- Embedding-based routers use intent classifiers: label a prompt as "support," "SQL," or "math," then route to a matching model. This works for simple tasks but breaks down in real conversations. Users shift topics mid-conversation, task boundaries blur, and product changes require retraining classifiers.
- Performance-based routers pick models based on benchmarks like MMLU or MT-Bench, or based on latency or cost curves. But benchmarks often miss what matters in production: domain-specific quality or subjective preferences like "Will legal accept this clause?"

Arch-Router takes a different approach: route by preferences written in plain language. You write rules like "contract clauses → GPT-4o" or "quick travel tips → Gemini Flash." The router maps the prompt (and conversation context) to those rules using a lightweight 1.5B autoregressive model. No retraining, no fragile if/else chains.

We built this with input from teams at Twilio and Atlassian. It handles intent drift, supports multi-turn conversations, and lets you swap in or out models with a one-line change to the routing policy.
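To make the idea of a plain-language routing policy concrete, here is a minimal Python sketch. It is not Arch's configuration format or API: the `Route` class, the prompt layout, the `choose_route` callable (a stand-in for a call to a router model), the lowercase model identifiers, and the fallback model are all assumptions made for illustration. Only the two example rules come from the post.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Route:
    name: str         # short label the router is asked to return
    description: str  # plain-language preference the router matches against
    model: str        # target LLM for prompts that match this preference


# Hypothetical policy; the two rules mirror the examples in the post.
POLICY = [
    Route("contract_clauses", "drafting or reviewing contract clauses", "gpt-4o"),
    Route("travel_tips", "quick travel tips and short itineraries", "gemini-flash"),
]
DEFAULT_MODEL = "gpt-4o"  # assumed fallback when no preference matches

Conversation = List[Tuple[str, str]]  # (role, text) pairs, oldest first


def build_router_prompt(policy: List[Route], conversation: Conversation) -> str:
    """Render the policy and the whole conversation into one text prompt."""
    routes = "\n".join(f"- {r.name}: {r.description}" for r in policy)
    turns = "\n".join(f"{role}: {text}" for role, text in conversation)
    return (
        f"Routes:\n{routes}\n\nConversation:\n{turns}\n\n"
        "Answer with the name of the best matching route, or 'other'."
    )


def resolve_model(route_name: str, policy: List[Route],
                  default: str = DEFAULT_MODEL) -> str:
    """Map the route name returned by the router to a concrete model."""
    for r in policy:
        if r.name == route_name:
            return r.model
    return default


def route(conversation: Conversation, policy: List[Route],
          choose_route: Callable[[str], str]) -> str:
    """Ask the router (any callable: prompt -> route name) and pick a model."""
    prompt = build_router_prompt(policy, conversation)
    return resolve_model(choose_route(prompt).strip(), policy)


if __name__ == "__main__":
    # Stand-in for the router model; a real setup would call the model here.
    fake_router = lambda prompt: "travel_tips"
    chat = [("user", "Any tips for a weekend in Lisbon?")]
    print(route(chat, POLICY, fake_router))
```

In this sketch, swapping the model behind a preference means editing the `model` field of one `Route` entry, and because the full conversation is rendered into the prompt, the router can take earlier turns into account when a user shifts topic.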
Full details are in our paper (https://ift.tt/cLgxMbs), but here's a snapshot:

Specs:
- 1.5B params: runs on a single GPU (or CPU for testing)
- No retraining needed: point it at any mix of LLMs
- Cost and latency aware: route heavy tasks to expensive models, light tasks to faster/cheaper ones
- Outperforms larger closed models on our conversational routing benchmarks (details in the paper)

Links:
- Arch Proxy (open source): https://ift.tt/d8WOGUm
- Model + code: https://ift.tt/SPGxdCa
- Paper: https://ift.tt/cLgxMbs

July 1, 2025 at 10:43PM
