We need someone to help us build a real-time transcription interface for our fine-tuned Whisper model. The model will be hosted on our VM with an A100 GPU. The solution should either integrate with our React frontend via a REST API or provide a basic UI built with Streamlit or Gradio.
Main features:
- Interface to transcribe audio in real time.
- Integrate voice activity detection (VAD) and speaker diarization using Silero VAD and Pyannote, respectively.
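
To make the expected behaviour a bit more concrete, here is a rough sketch of how VAD-gated streaming might work: incoming audio frames are buffered while speech is detected and handed to the transcriber once a pause is long enough. Everything named here (`StreamingSegmenter`, `energy_vad`, `max_silence_frames`) is hypothetical and only for illustration. The energy threshold is a simple stand-in for Silero VAD, and `transcribe` is a placeholder for a call to our fine-tuned Whisper model. We are open to other designs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# One fixed-size chunk of mono PCM samples, assumed to be floats in [-1, 1].
Frame = Sequence[float]


def energy_vad(frame: Frame, threshold: float = 0.01) -> bool:
    """Stand-in speech detector (NOT Silero VAD): mean squared amplitude check."""
    if not frame:
        return False
    return sum(s * s for s in frame) / len(frame) > threshold


@dataclass
class StreamingSegmenter:
    """Buffers speech frames and flushes them to a transcriber after a pause."""

    # Placeholder for the model call; receives the buffered samples.
    transcribe: Callable[[List[float]], str]
    # Pluggable detector; a real deployment would wrap a proper VAD model here.
    is_speech: Callable[[Frame], bool] = energy_vad
    # Number of consecutive non-speech frames that ends an utterance.
    max_silence_frames: int = 3
    _buffer: List[float] = field(default_factory=list, init=False)
    _silence: int = field(default=0, init=False)

    def push(self, frame: Frame) -> Optional[str]:
        """Feed one frame; returns a transcript when an utterance ends."""
        if self.is_speech(frame):
            self._buffer.extend(frame)
            self._silence = 0
            return None
        if not self._buffer:
            return None  # leading silence, nothing to transcribe yet
        # Non-speech frames are counted but not stored in the buffer.
        self._silence += 1
        if self._silence >= self.max_silence_frames:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Transcribe whatever is buffered and reset the state."""
        if not self._buffer:
            return None
        audio, self._buffer, self._silence = self._buffer, [], 0
        return self.transcribe(audio)
```

In this sketch, a REST or WebSocket handler would call `push` for each incoming audio chunk and forward any returned text to the frontend. Speaker diarization could then be applied to each flushed segment, but that part is left out here.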