Running YOLO on-device changes how you think about ML products

The moment a model leaves a notebook and enters a real product, everything becomes more honest.

For Open RTMS, the challenge was not just detection accuracy. It was whether the full pipeline could hold up in real time:

What becomes important

Once inference happens on-device, latency matters more. Resource usage matters more. Failure modes matter more.

You start making different decisions:

That pressure is useful. It forces you to build ML systems like products instead of demos.