I’m experimenting with a multi-stage voice pipeline: something that takes raw audio input and processes it through multiple NLP layers (emotion, tone, and intent). The idea is to understand not just *what* is being said, but the deeper nuances behind it.
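Roughly the shape I have in mind, as a minimal sketch using Hugging Face `transformers`. The specific models and the transcribe-then-classify split are placeholder assumptions, not decisions:

```python
from transformers import pipeline

# Stage 1: speech-to-text (raw audio -> transcript)
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

# Stage 2: emotion classification on the transcript
emotion = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

# Stage 3: zero-shot intent tagging against caller-supplied labels
intent = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def analyze(audio_path: str, intent_labels: list[str]) -> dict:
    transcript = asr(audio_path)["text"]
    return {
        "transcript": transcript,
        "emotion": emotion(transcript)[0],           # top label + score
        "intent": intent(transcript, intent_labels), # ranked candidate labels
    }

result = analyze("call.wav", ["complaint", "question", "request"])
```

The obvious catch with transcribing first is that you throw away prosody, which is where a lot of tone and emotion actually lives; an audio-first variant would classify emotion directly on the waveform (e.g. a wav2vec 2.0-based audio-classification model) rather than on the text. That trade-off is a big part of what I want to compare notes on.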
I’m being intentionally vague for now, but would love to hear from folks who’ve worked on:
* Audio-first NLP workflows
* Transformer models beyond standard text applications
* Challenges with emotional/contextual understanding from speech
Not a research paper request — just curious to connect with anyone who’s walked this path before.
DMs are open if that’s easier.