The moment a model leaves a notebook and enters a real product, everything becomes more honest.
For Open RTMS, the challenge was not just detection accuracy. It was whether the full pipeline could hold up in real time:
- video capture,
- preprocessing,
- inference,
- response handling,
- and the path back to the client app.
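One way to check whether a pipeline like this holds up in real time is to time every stage separately and compare the total against a per-frame budget. The sketch below is only an illustration, not Open RTMS code: `run_pipeline`, the stage callables, and the `budget_ms` value are all hypothetical names and numbers chosen for the example.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair. The names mirror the list above;
# the functions themselves are placeholders supplied by the caller.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(
    frame: Any, stages: List[Stage], budget_ms: float
) -> Tuple[Any, Dict[str, float], bool]:
    """Run each stage in order and record how long it took in milliseconds.

    Returns the final value, the per-stage timings, and whether the
    summed stage time stayed within budget_ms.
    """
    timings: Dict[str, float] = {}
    value = frame
    for name, fn in stages:
        start = time.perf_counter()
        value = fn(value)
        timings[name] = (time.perf_counter() - start) * 1000.0
    within_budget = sum(timings.values()) <= budget_ms
    return value, timings, within_budget


if __name__ == "__main__":
    # Stand-in stages: real ones would decode video, run the model, etc.
    demo_stages: List[Stage] = [
        ("preprocessing", lambda f: [x / 255.0 for x in f]),
        ("inference", lambda f: sum(f) > 1.0),
        ("response handling", lambda detected: {"detected": detected}),
    ]
    result, timings, ok = run_pipeline([200, 180, 90], demo_stages, budget_ms=33.0)
    print(result, timings, ok)
```

Timing each stage on its own, rather than only the end-to-end call, shows which step is consuming the budget when a frame arrives late.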
What becomes important
Once inference happens on-device, latency matters more. Resource usage matters more. Failure modes matter more.
You start making different decisions:
- smaller models,
- more predictable pipelines,
- simpler APIs,
- clearer monitoring.
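As a rough illustration of what clearer monitoring can mean for latency, the sketch below keeps a rolling window of recent request times and reports a nearest-rank percentile. `LatencyMonitor`, the window size, and the sample values are assumptions made for this example and do not come from the Open RTMS codebase.

```python
from collections import deque


class LatencyMonitor:
    """Keep the most recent latency samples and report percentiles."""

    def __init__(self, window: int = 200) -> None:
        # deque with maxlen silently drops the oldest sample when full.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentile(self, p: int) -> float:
        """Nearest-rank percentile for an integer p between 1 and 100."""
        if not self.samples:
            raise ValueError("no samples recorded yet")
        if not 1 <= p <= 100:
            raise ValueError("p must be between 1 and 100")
        ordered = sorted(self.samples)
        # Integer ceiling of p * n / 100 avoids floating-point rounding.
        rank = (p * len(ordered) + 99) // 100
        return ordered[rank - 1]


if __name__ == "__main__":
    monitor = LatencyMonitor(window=100)
    for ms in (12.0, 15.0, 11.0, 40.0, 13.0):
        monitor.record(ms)
    print("p50:", monitor.percentile(50), "p95:", monitor.percentile(95))
```

Tracking a high percentile alongside the median makes occasional slow frames visible, which an average alone tends to hide.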
That pressure is useful. It forces you to build ML systems like products instead of demos.