Learnings from an audio recognition product

Over the last couple of weeks, I helped with the development of our new audio recognition product, which recently launched publicly. It was quite interesting to learn how audio recognition actually works.

The solution I imagined

Based on my previous HMM experience, I thought recognition would be a feature-scoring process, and that development would be about generating candidate features and quickly selecting the top-1 match from a large-scale inventory (music, TV shows, live games). Apparently, I was completely wrong :)

The actual solution

I joined this project late in its development. I was amazed by the well-written, incredibly efficient C++ audio index service. There wasn’t any fancy ML algorithm or sophisticated math equation. It’s purely a fast online indexing service.

Essentially, we connected to the existing audio inventory, or to the live audio stream in the case of a live game, and performed rolling-window indexing at a granularity of seconds.
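
To make the idea concrete, here is a minimal sketch of rolling-window indexing. It is not the actual service, and everything in it is an assumption for illustration: the audio is taken to be already reduced to one integer fingerprint per second (the hypothetical Fingerprint type), and the names RollingWindowIndex, Posting, Match, Add and Query are made up. Each window of N consecutive seconds is hashed and stored with its source and offset. A query clip is sliced the same way, and every hit votes for a (source, start second) pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumption: one integer fingerprint summarizes one second of audio.
using Fingerprint = std::uint32_t;

struct Posting {
    std::string source_id;   // track, show, or live stream identifier
    std::size_t offset_sec;  // second at which the window starts in the source
};

struct Match {
    std::string source_id;
    long long start_sec;  // estimated position of the clip's first second
    int votes;            // number of windows that agreed on this alignment
};

class RollingWindowIndex {
public:
    explicit RollingWindowIndex(std::size_t window_sec) : window_(window_sec) {
        assert(window_sec > 0);
    }

    // Index every window of `window_` consecutive seconds, stepping by one second.
    void Add(const std::string& source_id, const std::vector<Fingerprint>& seconds) {
        for (std::size_t i = 0; i + window_ <= seconds.size(); ++i) {
            index_[HashWindow(seconds, i)].push_back({source_id, i});
        }
    }

    // Look up every window of the clip and vote for (source, aligned start second).
    std::optional<Match> Query(const std::vector<Fingerprint>& clip) const {
        std::map<std::pair<std::string, long long>, int> votes;
        for (std::size_t i = 0; i + window_ <= clip.size(); ++i) {
            auto it = index_.find(HashWindow(clip, i));
            if (it == index_.end()) continue;
            for (const Posting& p : it->second) {
                long long start = static_cast<long long>(p.offset_sec) -
                                  static_cast<long long>(i);
                ++votes[std::make_pair(p.source_id, start)];
            }
        }
        std::optional<Match> best;
        for (const auto& [key, count] : votes) {
            if (!best || count > best->votes) {
                best = Match{key.first, key.second, count};
            }
        }
        return best;
    }

private:
    // FNV-1a-style mixing over whole fingerprints. Hash collisions are not
    // re-verified against the raw window in this sketch.
    std::uint64_t HashWindow(const std::vector<Fingerprint>& s, std::size_t start) const {
        std::uint64_t h = 14695981039346656037ULL;
        for (std::size_t k = 0; k < window_; ++k) {
            h ^= s[start + k];
            h *= 1099511628211ULL;
        }
        return h;
    }

    std::size_t window_;
    std::unordered_map<std::uint64_t, std::vector<Posting>> index_;
};
```

With a 3-second window, indexing a source whose per-second fingerprints are 1 through 8 and querying the clip {4, 5, 6, 7} yields two matching windows that both point at start second 3. The lookup is just hashing and a table probe, with no model involved, which is roughly why this style of service can be so fast.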

This project taught me how powerful a well-written C++ service can be. It avoided building unnecessary ML models, and it turned out to be the more accurate approach :)