Show HN: PhAIL – Real-robot benchmark for AI models https://ift.tt/arPWgLi

Posted by Thar Desert Times

I built this because I couldn't find honest numbers on how well VLA models [1] actually work on commercial tasks. I come from search ranking at Google, where you measure everything, and in robotics nobody seemed to know.

PhAIL runs four models (OpenPI/pi0.5, GR00T, ACT, SmolVLA) on bin-to-bin order picking, one of the most common warehouse operations. Same robot (Franka FR3), same objects, hundreds of blind runs. The operator doesn't know which model is running.

Best model: 64 UPH. Human teleoperating the same robot: 330. Human by hand: 1,300+.

Everything is public: every run with synced video and telemetry, the fine-tuning dataset, training scripts. The leaderboard is open for submissions.

Happy to answer questions about methodology, the models, or what we observed.

[1] Vision-Language-Action: https://ift.tt/IPlR3oc

https://phail.ai

March 31, 2026 at 09:55PM
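To put the three throughput figures quoted in the post in proportion, here is a minimal arithmetic sketch. It assumes UPH means units per hour and that one unit corresponds to one picked item; the post does not spell either out. The dictionary keys and function names are illustrative and are not taken from the PhAIL code. Because "1,300+" is a lower bound, the comparison against by-hand picking is an upper bound on the model's share.

```python
# Throughput figures quoted in the post, in UPH.
# Assumption: UPH = units per hour, one unit = one picked item.
# "1,300+" is a lower bound, so 1300 is used as a conservative stand-in.
THROUGHPUT_UPH = {
    "best_model": 64,
    "human_teleop": 330,
    "human_by_hand": 1300,
}


def relative_throughput(uph: float, baseline_uph: float) -> float:
    """Return throughput as a fraction of a baseline throughput."""
    return uph / baseline_uph


def seconds_per_unit(uph: float) -> float:
    """Average seconds spent per unit at a given hourly throughput."""
    return 3600 / uph


if __name__ == "__main__":
    best = THROUGHPUT_UPH["best_model"]
    for name, uph in THROUGHPUT_UPH.items():
        share = relative_throughput(best, uph)
        print(
            f"{name}: {uph} UPH, {seconds_per_unit(uph):.1f} s per unit, "
            f"best model reaches {share:.1%} of this rate"
        )
```

Under these assumptions the best model runs at roughly a fifth of the teleoperated rate and about a twentieth of the by-hand rate, or just under a minute per item.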
