I’m working on a project where users reply, among other things, with non-verbal sounds like:

* **Agreeing:** “hm-hmm”, “mhm”
* **Disagreeing:** “mm-mm”, “uh-uh”
* **Undecided/Thinking:** “hmmmm”, “mmm…”

I tested **OpenAI Whisper** and **GPT-4o transcribe**. Both work okay for yes/no, but:

* They sometimes confuse yes and no.
* They are especially unreliable with the undecided/thinking sounds (“hmmmm”).
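
To make the task concrete: downstream, this is a three-way mapping from the transcript to a label. Here is a minimal sketch of that step, assuming the transcriber returns plain text. The `classify_reply` name and the token lists are hypothetical and built only from the examples above.

```python
import re

# Hypothetical token lists, taken from the examples in this post.
AGREE = {"hm-hmm", "mhm"}
DISAGREE = {"mm-mm", "uh-uh"}


def classify_reply(transcript: str) -> str:
    """Map the transcript of a non-verbal reply to a coarse label."""
    # Normalize: lowercase, then strip whitespace, quotes and end punctuation.
    text = transcript.lower().strip()
    text = text.strip(" .…!?\"“”'")
    if text in AGREE:
        return "agree"
    if text in DISAGREE:
        return "disagree"
    # A single drawn-out hum ("hmmmm", "mmm") is treated as undecided.
    if re.fullmatch(r"h?m{3,}", text):
        return "undecided"
    # Anything else (words, empty output, other noises) is left unlabeled.
    return "unknown"


if __name__ == "__main__":
    for sample in ["Mhm.", "uh-uh", "hmmmm", "mmm…", "yes"]:
        print(sample, "->", classify_reply(sample))
```

The written forms differ by only a character or two, so a small transcription slip is enough to flip the label.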

Before I go deeper into custom training:

👉 **Does anyone know of models, APIs, or setups that handle this kind of sound reliably?**

👉 **Has anyone tried this before and has learnings to share?**

Thanks!
