PyTorch: Real-time Audio-visual Speech Recognition

Real-time Audio-visual Speech Recognition

https://pytorch.org/blog/real-time-speech-rec/

Audio-Visual Speech Recognition (AV-ASR, or AVSR) is the task of transcribing speech into text using both audio and visual streams. It has recently attracted considerable research attention because combining the two modalities makes recognition more robust to noise. However, most work to date has focused on AV-ASR models for non-streaming recognition, and studies of streaming AV-ASR remain scarce.