Real-world voice activity detection over long-form audio, powered by Whisper encoder refinements. This repo presents the project's architectural refinements, covering encoder-only, ...