Features
What you can do with WhisperApp — a walkthrough of each feature
Everything Works Offline
Transcription, speaker diarization, editing, and LLM analysis all run locally without an internet connection; only downloads (models, engine updates, and URL media) need one. Your audio data never leaves your machine, so confidential meeting recordings and interviews can be processed with peace of mind.
File Transcription
Drag & drop audio or video files and convert them to accurate text with one click. Handle meeting recordings, interviews, lectures, and more.
- Auto-selects the best of 4 backends: CUDA / OpenVINO / Vulkan / CPU
- Parallel batch processing of multiple files
- Loop detection & watchdog for automatic error recovery
- Open completed tasks directly in the editor or LLM
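Speech recognition models can occasionally get stuck repeating the same phrase. As a rough illustration of what loop detection might look like, the sketch below flags a run of identical consecutive segments. It is not WhisperApp's actual implementation; the function name `find_loop` and the `min_repeats` threshold are assumptions made for this example.

```python
def find_loop(segments, min_repeats=3):
    """Return the index where a run of identical segments starts, or None.

    `segments` is a list of transcribed text strings in time order.
    `min_repeats` is a hypothetical threshold, not a WhisperApp setting.
    """
    run_start = 0
    for i in range(1, len(segments) + 1):
        # A run ends at the end of the list or when the text changes.
        at_end = i == len(segments)
        if at_end or segments[i].strip().lower() != segments[run_start].strip().lower():
            if i - run_start >= min_repeats:
                return run_start
            run_start = i
    return None
```

A watchdog could call a check like this on recent output and restart decoding from the returned index when it is not `None`.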
Real-time Transcription
Transcribe microphone or system audio in real time while recording. No need to take notes during meetings and interviews: the text appears as you speak.
- Simultaneous recording + transcription — get both the audio file and text
- System audio capture (whole system / specific app) for real-time recognition (Pro)
- Level meter for real-time input monitoring
Speaker Diarization (Pro)
Automatically identify who said what in multi-speaker audio. Essential for creating meeting minutes and interview transcripts.
- High-accuracy speaker separation powered by a dedicated engine
- Auto-detect or manually specify speaker count (2–10) per file
- 24-color palette for visual speaker identification in the editor and audio bar
- Add, rename, and delete speaker tags, or batch-change speakers across segments
Editor
A dedicated editing tool for efficiently finishing transcription results. Works with or without speaker diarization. Features per-segment audio playback, a fully keyboard-driven workflow, and auto-recovery. Dramatically speeds up proofreading for meeting minutes and interview records.
- Fully keyboard-driven (E=edit, D=split, M=merge, T=timing, S=speaker, Space=play, arrows=navigate)
- Auto-recovery: editing state is restored even after a crash or unexpected app close
- Speaker color-coded audio bar — click to seek, visually grasp the overall structure
- Variable speed playback (0.5x–2.0x), with speed-adjusted audio pre-generated in the background
- Multi-tab support — edit multiple files with independent state simultaneously
- Undo/redo for all 11 operation types (text edit, speaker change, split, merge, delete, insert, timing, etc.)
- Overlap detection during timestamp adjustment — auto-suggests resolution strategies
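To make the overlap check concrete, here is a minimal sketch of how overlapping segments could be found after a timestamp edit, together with one possible resolution (trimming the earlier segment). The `Segment` class and both function names are hypothetical and are not taken from WhisperApp.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def find_overlaps(segments):
    """Return (i, j) index pairs where segment j starts before segment i ends."""
    # Compare neighbours in start-time order, but report original indices.
    order = sorted(range(len(segments)), key=lambda k: segments[k].start)
    overlaps = []
    for a, b in zip(order, order[1:]):
        if segments[b].start < segments[a].end:
            overlaps.append((a, b))
    return overlaps


def trim_previous(segments, i, j):
    """One possible resolution: end segment i where segment j begins."""
    segments[i].end = segments[j].start
```

Other strategies, such as pushing the later segment's start forward or splitting the difference, follow the same pattern.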
3-Mode Recording & Download
Capture audio via microphone, PC system audio, or YouTube/URL download. Recordings are automatically added to the transcription queue for a seamless record-to-text workflow.
Microphone Recording
Select your device and start recording. Save in 6 formats: WAV/FLAC/MP3/AAC/OGG/OPUS. Can be used alongside real-time transcription.
System Audio Capture (Pro)
Capture audio output from your PC. Record the entire system audio, or specify a particular application (Zoom, Teams, etc.) to capture only that app's audio.
YouTube/URL Download (Pro)
Paste a YouTube or other video URL to download. Supports audio-only extraction or video with quality selection. Downloads are auto-added to the transcription queue.
- Auto-add recordings to the file list after completion
- Level meter and recording timer for real-time monitoring
Local LLM (AI Analysis & Summarization) (Pro)
Local AI chat running entirely on your machine. Load transcription results and ask "Summarize this" or "What are the key points?" — the AI analyzes the content locally. Confidential data stays safe.
- Streaming responses with real-time AI output
- Create, edit, and manage custom prompt templates
- Save, rename, restore, and export conversation history
- Adjust context size and control the LLM server from the GUI
Video Subtitle Generation (Pro)
Add subtitles to videos using transcription results. Supports both hardcoded (burned-in) and soft subtitles (as a track). Subtitle files can also be used for YouTube uploads.
- Hard sub: burned into the video (subtitles always visible)
- Soft sub: embedded as a track (can be toggled on/off)
- Customize font, size, color, and position
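For readers curious how transcription segments map onto a subtitle file, the sketch below writes them in SubRip (.srt) form. The page does not say which subtitle formats WhisperApp exports, so SRT is an assumption here, and `srt_timestamp` and `to_srt` are illustrative names only.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for n, (start, end, text) in enumerate(segments, start=1):
        # Each cue: index, time range, text, then a blank separator line.
        blocks.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

A file produced this way can be muxed as a soft subtitle track or burned in by a video tool; that step is outside this sketch.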
Smartphone Integration (Pro)
Pair with WhisperApp for Android (free) to record on your phone and leverage your PC's GPU for fast, high-accuracy transcription. Perfect for field recordings processed at your desk. (Currently in development — coming in a future update)
- Local Wi-Fi communication — no internet needed
- WebSocket for real-time progress updates on your phone
- Leverage PC GPU (CUDA/OpenVINO/Vulkan) from your smartphone
Model Management & ModelHub
Freely choose and manage AI models for speech recognition and LLM. Beginners get recommended models, while advanced users can add any model from HuggingFace. Auto-detected GPU/VRAM info helps you check hardware compatibility before downloading.
- Recommended models: optimal quantization per model size, marked with a badge for easy selection
- Custom models: search HuggingFace or enter a direct URL to add any model (fine-tuned, etc.)
- GPU/VRAM auto-detection: hardware info automatically detected and displayed in status bar
- LLM models include VRAM/RAM requirements for easy compatibility checking
- Extensive model library: 6 ASR models with quantization variants, plus 12 or more LLM series
- Up to 3 concurrent downloads, with full download management and deletion
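A cap on concurrent downloads is commonly implemented with a semaphore. The sketch below shows the idea with Python's `asyncio`; it is an assumption about the general technique, not a description of WhisperApp's downloader. `download_all` and the `fetch` callback are hypothetical names.

```python
import asyncio

MAX_CONCURRENT = 3  # matches the limit stated in the list above


async def download_all(names, fetch):
    """Run fetch(name) for every name, at most MAX_CONCURRENT at a time.

    `fetch` is any coroutine function supplied by the caller; results are
    returned in the same order as `names`.
    """
    limit = asyncio.Semaphore(MAX_CONCURRENT)

    async def worker(name):
        # Waits here while three other downloads are already running.
        async with limit:
            return await fetch(name)

    return await asyncio.gather(*(worker(n) for n in names))
```

Queued items simply wait for a free slot, so adding a fourth model does not slow down the three already in progress.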
Automatic Engine Updates
Check and install updates for transcription, LLM, audio processing, and other engines — all from within the app. The right build for your GPU is selected automatically.
- View all engine statuses from the Settings → Update tab
- One-click update or install for each engine
- Auto-check on startup with optional auto-install
- Auto-selects builds matching your GPU (CUDA/OpenVINO/Vulkan/CPU)
Smart Backend Optimization
Automatically selects the optimal backend for your hardware. Detects the power source in real time, balancing performance and battery life on laptops. No configuration is required.
On AC Power
GPU-first priority for maximum performance. Fully utilizes CPU resources for the fastest processing.
On Battery
NPU-first power-saving mode. Reduces CPU resource usage to maximize battery life.
- Auto-detects NVIDIA GPU (CUDA), Intel GPU/NPU (OpenVINO), and Vulkan-compatible GPUs to select the optimal backend
- Real-time power source detection, automatically switching between performance and power-saving modes
- Automatic fallback to another backend on GPU errors, so processing can continue
- Choose from 4 profiles: Performance / Balanced / Power Saving / Auto
- Manual backend selection also available for advanced users
- Each engine has different supported backends — the app automatically determines the best combination for each
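The selection logic described above can be pictured roughly as follows. This is a sketch under stated assumptions: the exact priority orders, the backend identifiers, and the function names are invented for illustration, and only the broad behavior (GPU-first on AC power, NPU-first on battery, per-engine support, fallback on error) comes from the descriptions above.

```python
# Hypothetical priority orders. OpenVINO stands in for the Intel GPU/NPU path.
AC_PRIORITY = ["cuda", "vulkan", "openvino", "cpu"]
BATTERY_PRIORITY = ["openvino", "vulkan", "cuda", "cpu"]


def candidate_backends(available, engine_supported, on_battery):
    """Order the backends that are both detected and supported by the engine."""
    priority = BATTERY_PRIORITY if on_battery else AC_PRIORITY
    return [b for b in priority if b in available and b in engine_supported]


def run_with_fallback(backends, run):
    """Try each backend in order and return (backend, result) for the first success."""
    last_error = None
    for backend in backends:
        try:
            return backend, run(backend)
        except RuntimeError as err:
            # Remember the failure and move on to the next candidate.
            last_error = err
    raise RuntimeError("no backend succeeded") from last_error
```

Keeping the priority lists as plain data makes it easy to express profiles such as Performance or Power Saving as alternative orderings.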
Download Now
High-accuracy transcription, speaker diarization, real-time recognition, and local LLM chat