The moment a model leaves a notebook and enters a real product, everything becomes more honest.

For Open RTMS, the challenge was not just detection accuracy. It was whether the full pipeline could hold up in real time:

  • video capture,
  • preprocessing,
  • inference,
  • response handling,
  • and the path back to the client app.

What becomes important

Once inference happens on-device, latency matters more. Resource usage matters more. Failure modes matter more.

You start making different decisions:

  • smaller models,
  • more predictable pipelines,
  • simpler APIs,
  • clearer monitoring.

That pressure is useful. It forces you to build ML systems like products instead of demos.