Show HN: Arch-Router – 1.5B model for LLM routing by preferences, not benchmarks https://ift.tt/2wMC6Gd

Posted by Thar Desert Times

Hi HN, we're the team behind Arch ( https://ift.tt/d8WOGUm ), an open-source proxy for LLMs written in Rust. Today we're releasing Arch-Router ( https://ift.tt/SPGxdCa ), a 1.5B router model for preference-based routing, now integrated into the proxy.

As teams integrate multiple LLMs, each with different strengths, styles, or cost/latency profiles, routing the right prompt to the right model becomes a critical part of application design. But it's still an open problem. Most routing systems fall into two camps:

- Embedding-based routers use intent classifiers: label a prompt as “support,” “SQL,” or “math,” then route to a matching model. This works for simple tasks but breaks down in real conversations. Users shift topics mid-conversation, task boundaries blur, and product changes require retraining classifiers.
- Performance-based routers pick models based on benchmarks like MMLU or MT-Bench, or on latency or cost curves. But benchmarks often miss what matters in production: domain-specific quality or subjective preferences like “Will legal accept this clause?”

Arch-Router takes a different approach: route by preferences written in plain language. You write rules like “contract clauses → GPT-4o” or “quick travel tips → Gemini Flash.” The router maps the prompt (and conversation context) to those rules using a lightweight 1.5B autoregressive model. No retraining, no fragile if/else chains.

We built this with input from teams at Twilio and Atlassian. It handles intent drift, supports multi-turn conversations, and lets you swap models in or out with a one-line change to the routing policy.
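To make the idea of a routing policy concrete, here is a minimal Python sketch. It is not Arch's configuration format or API: `Route`, `POLICY`, `DEFAULT_MODEL`, `build_router_prompt`, and `select_model` are hypothetical names, the model identifiers are illustrative strings, and `keyword_router` is a crude word-overlap placeholder standing in for the 1.5B router model. It only shows how plain-language route descriptions can be kept separate from the model assigned to each route.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

Conversation = List[Tuple[str, str]]  # (role, text) pairs, oldest first


@dataclass(frozen=True)
class Route:
    name: str         # short identifier the router is expected to return
    description: str  # the preference, written in plain language
    model: str        # model that serves prompts matching this route


# Hypothetical policy; names and model identifiers are for illustration only.
POLICY = [
    Route("contract_clauses", "drafting or reviewing contract clauses", "gpt-4o"),
    Route("travel_tips", "quick travel tips and itineraries", "gemini-flash"),
]
DEFAULT_MODEL = "fallback-model"  # assumed fallback when no route matches


def build_router_prompt(policy: List[Route], conversation: Conversation) -> str:
    """Render the policy and conversation as text a router model could read."""
    routes = "\n".join(f"- {r.name}: {r.description}" for r in policy)
    turns = "\n".join(f"{role}: {text}" for role, text in conversation)
    return f"Routes:\n{routes}\n\nConversation:\n{turns}\n\nBest route name:"


def _words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_router(policy: List[Route], conversation: Conversation) -> str:
    """Placeholder for the router model: word overlap with the latest user turn."""
    user_turns = [text for role, text in conversation if role == "user"]
    latest = _words(user_turns[-1]) if user_turns else set()
    best_name, best_score = "other", 0
    for route in policy:
        score = len(latest & _words(route.description))
        if score > best_score:
            best_name, best_score = route.name, score
    return best_name


def select_model(
    policy: List[Route],
    conversation: Conversation,
    router: Callable[[List[Route], Conversation], str] = keyword_router,
) -> str:
    """Map the router's chosen route name to a model, with a fallback."""
    models = {route.name: route.model for route in policy}
    return models.get(router(policy, conversation), DEFAULT_MODEL)


if __name__ == "__main__":
    chat = [("user", "Any quick tips for travel in Lisbon?")]
    print(build_router_prompt(POLICY, chat))
    print(select_model(POLICY, chat))
```

In this sketch, swapping the model behind a route means editing the `model` field of a single `Route` entry; the route descriptions and the router itself stay untouched. A real deployment would pass a function that calls the router model in place of `keyword_router`.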
Full details are in our paper ( https://ift.tt/cLgxMbs ), but here's a snapshot:

Specs:
- 1.5B params: runs on a single GPU (or CPU for testing)
- No retraining needed: point it at any mix of LLMs
- Cost and latency aware: route heavy tasks to expensive models, light tasks to faster/cheaper ones
- Outperforms larger closed models on our conversational routing benchmarks (details in the paper)

Links:
- Arch Proxy (open source): https://ift.tt/d8WOGUm
- Model + code: https://ift.tt/SPGxdCa
- Paper: https://ift.tt/cLgxMbs

July 1, 2025 at 10:43PM
