I’m working on a project where users reply with, among other things, non-verbal sounds like:
* **Agreeing:** “hm-hmm”, “mhm”
* **Disagreeing:** “mm-mm”, “uh-uh”
* **Undecided/Thinking:** “hmmmm”, “mmm…”
I tested **OpenAI Whisper** and **GPT-4o transcribe**. Both work okay for yes/no, but:
* They sometimes confuse yes and no.
* Especially unreliable with the undecided/thinking sounds (“hmmmm”).
Before I go deeper into custom training:
👉 **Does anyone know of models, APIs, or setups that handle this kind of sound reliably?**
👉 **Has anyone tried this before and can share what they learned?**
Thanks!