Features

What you can do with WhisperApp — a walkthrough of each feature

Everything Works Offline

Every feature in WhisperApp runs without internet. Your audio data never leaves your machine, ensuring complete privacy. Confidential meeting recordings and interviews can be processed with peace of mind.

No data sent to servers
Fully local processing
Offline license verification

File Transcription

Drag & drop audio or video files and convert them to accurate text with one click. Handle meeting recordings, interviews, lectures, and more.

1. Drag & drop audio/video files onto the window (folders work too)

2. Select a Whisper model (tiny to large-v3-turbo) and language (10 languages + auto-detect)

3. Choose an output format (TXT/SRT/VTT/JSON/CSV/LRC) and click Run

4. Track progress in real time. Right-click completed files to open results instantly

Auto-selects the best of 4 backends: CUDA / OpenVINO / Vulkan / CPU
Batch parallel processing for multiple files simultaneously
Loop detection & watchdog for automatic error recovery
Open completed tasks directly in the editor or LLM
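
To make the output formats concrete, here is a minimal sketch of how timed segments map to SRT, one of the formats listed above. The Segment structure and the function names are assumptions made for illustration; they are not WhisperApp's internal API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

VTT is closely related: it starts with a WEBVTT header line and uses a period instead of a comma before the milliseconds.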

Real-time Transcription

Transcribe microphone or system audio in real-time while recording. No need to take notes during meetings and interviews — the text appears as you speak.

1. Open the Recorder window and select microphone or system audio

2. Turn on "Real-time transcription" and set the model and language

3. Start recording; spoken words appear in the segment list in real time

4. Click "Copy All" to copy the text to the clipboard, or export as TXT/SRT

Simultaneous recording + transcription — get both the audio file and text
System audio capture (whole system / specific app) for real-time recognition (Pro)
Level meter for real-time input monitoring
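
As an illustration of what a level meter typically displays, the sketch below computes the RMS level of a block of 16-bit PCM samples in dBFS. It assumes signed 16-bit input, and the function name is hypothetical; the page does not describe how WhisperApp's meter is implemented.

```python
import math


def level_dbfs(samples: list[int], full_scale: int = 32768) -> float:
    """Return the RMS level of 16-bit PCM samples in dBFS (0 dBFS = full scale)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    rms = math.sqrt(mean_square)
    return 20 * math.log10(rms / full_scale)
```

A constant signal at half of full scale reads about -6 dBFS, which is a convenient sanity check when wiring a meter to an input device.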

Speaker Diarization (Pro)

Automatically identify who said what in multi-speaker audio. Essential for creating meeting minutes and interview transcripts.

1. Enable the "Speaker Diarization" checkbox on the transcription screen

2. Set the speaker count (auto-detect, or manually specify from 2 to 10)

3. Run transcription; speaker tags are automatically assigned to each segment

4. Open results in the editor to refine speaker tags and merge/split as needed

High-accuracy speaker separation powered by a dedicated engine
Auto-detect or manually specify speaker count (2–10) per file
24-color palette for visual speaker identification in the editor and audio bar
Add, rename, and delete speaker tags, and batch-change speakers across segments
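
The page does not say how speaker tags are matched to transcript segments. One common approach, sketched here as an assumption, is to give each segment the speaker whose diarization turns overlap it the most. The tuple layouts and function names are hypothetical.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="Unknown"):
    """Tag each (start, end, text) segment with the speaker whose turns overlap it most.

    `turns` is a list of (start, end, speaker) tuples from a diarization step.
    """
    tagged = []
    for start, end, text in segments:
        totals: dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Segments that no turn covers keep a placeholder tag for manual review.
        speaker = max(totals, key=totals.get) if totals else unknown
        tagged.append((speaker, start, end, text))
    return tagged
```

Segments that straddle a speaker change are the ones most likely to need the manual split and re-tag tools described in step 4.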

Editor

A dedicated editing tool for efficiently finishing transcription results. Works with or without speaker diarization. Features per-segment audio playback, a fully keyboard-driven workflow, and auto-recovery. Dramatically speeds up proofreading for meeting minutes and interview records.

1. After transcription, right-click → "Edit" to open the editor (supports 6 formats: SRT/VTT/JSON/CSV/LRC/TXT)

2. Select a segment and play audio with Shift+Space while correcting text (press E to start editing instantly)

3. Split (D), merge (M), adjust timestamps (T), or change speaker tags (S) as needed

4. Save with Ctrl+S. All operations support undo (Ctrl+Z), and editing state is auto-saved continuously

Fully keyboard-driven (E=edit, D=split, M=merge, T=timing, S=speaker, Space=play, arrows=navigate)
Auto-recovery: editing state is restored even after a crash or unexpected app close
Speaker color-coded audio bar — click to seek, visually grasp the overall structure
Variable speed playback (0.5x–2.0x): speed-adjusted audio files are pre-generated in the background
Multi-tab support — edit multiple files with independent state simultaneously
Undo/redo for all 11 operation types (text edit, speaker change, split, merge, delete, insert, timing, etc.)
Overlap detection during timestamp adjustment — auto-suggests resolution strategies
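
To show what overlap detection involves, the sketch below flags any segment that starts before the previous one ends and proposes two possible fixes. The resolution names and the (start, end) tuple layout are assumptions; the page does not list the strategies the editor actually suggests.

```python
def find_overlaps(segments):
    """Return (index, overlap_seconds) for each segment that starts before the previous one ends.

    `segments` is a list of (start, end) tuples sorted by start time.
    """
    problems = []
    for i in range(1, len(segments)):
        prev_end = segments[i - 1][1]
        start = segments[i][0]
        if start < prev_end:
            problems.append((i, prev_end - start))
    return problems


def suggest_resolutions(segments, index):
    """Offer hypothetical ways to resolve an overlap between index-1 and index."""
    prev_start, prev_end = segments[index - 1]
    start, end = segments[index]
    options = {}
    if start > prev_start:
        # Cut the previous segment short so it ends where the current one begins.
        options["trim_previous"] = [(prev_start, start), (start, end)]
    if prev_end < end:
        # Push the current segment's start back to where the previous one ends.
        options["delay_current"] = [(prev_start, prev_end), (prev_end, end)]
    return options
```

An option is only offered when it leaves both segments with a positive duration, so a segment fully contained in its predecessor gets fewer suggestions.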

3-Mode Recording & Download

Capture audio via microphone, PC system audio, or YouTube/URL download. Recordings are automatically added to the transcription queue for a seamless record-to-text workflow.

Microphone Recording

Select your device and start recording. Save in 6 formats: WAV/FLAC/MP3/AAC/OGG/OPUS. Can be used alongside real-time transcription.

System Audio Capture (Pro)

Capture audio output from your PC. Record the entire system audio, or specify a particular application (Zoom, Teams, etc.) to capture only that app's audio.

YouTube/URL Download (Pro)

Paste a YouTube or other video URL to download. Supports audio-only extraction or video with quality selection. Downloads are auto-added to the transcription queue.

Auto-add recordings to the file list after completion
Level meter and recording timer for real-time monitoring

Local LLM (AI Analysis & Summarization) (Pro)

Local AI chat running entirely on your machine. Load transcription results and ask "Summarize this" or "What are the key points?" — the AI analyzes the content locally. Confidential data stays safe.

1. Open the Summarize window and add transcription files (or open directly from the completed list)

2. Select a prompt style (summary, meeting minutes, translation, or create your own)

3. Click "Summarize" or type any question in the chat and send

4. Review AI responses. Conversations are auto-saved to history for later retrieval and export

Streaming responses with real-time AI output
Create, edit, and manage custom prompt templates
Save, rename, restore, and export conversation history
Adjust context size and control the LLM server from the GUI

Video Subtitle Generation (Pro)

Add subtitles to videos using transcription results. Supports both hardcoded (burned-in) and soft subtitles (as a track). Subtitle files can also be used for YouTube uploads.

1. Open the file in the editor and correct text as needed

2. Click the "Subtitle" button and select the target video file

3. Choose burn-in (hard sub) or embed (soft sub) and configure style

4. Click "Run" to generate the subtitled video

Hard sub: burned into the video (subtitles always visible)
Soft sub: embedded as a track (can be toggled on/off)
Customize font, size, color, and position
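
The difference between hard and soft subtitles is easiest to see as commands. The page does not name the tool WhisperApp uses, so the sketch below assumes a separately installed ffmpeg and only builds the argument lists; the function names are hypothetical.

```python
def hard_sub_command(video: str, subtitles: str, output: str) -> list[str]:
    """Build an ffmpeg command that burns subtitles into the picture (re-encodes video)."""
    # Paths containing characters such as ':' or '\' need escaping inside
    # the filter argument; this sketch assumes simple relative paths.
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={subtitles}",
            "-c:a", "copy", output]


def soft_sub_command(video: str, subtitles: str, output: str) -> list[str]:
    """Build an ffmpeg command that adds subtitles as a toggleable track in an MP4."""
    # Video and audio are copied untouched; only the subtitle stream is
    # converted to mov_text, the text subtitle codec used in MP4 files.
    return ["ffmpeg", "-i", video, "-i", subtitles,
            "-c", "copy", "-c:s", "mov_text", output]
```

Either list can be passed to subprocess.run(cmd, check=True). Burning in requires re-encoding the video, so it is slower than embedding a track, which copies the existing streams.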

Smartphone Integration (Pro)

Pair with WhisperApp for Android (free) to record on your phone and leverage your PC's GPU for fast, high-accuracy transcription. Perfect for field recordings processed at your desk. (Currently in development — coming in a future update)

1. Start the API server from PC settings

2. Display the QR code and scan it with WhisperApp for Android

3. Record on your phone and send to PC for transcription

4. View results on your phone in real time

Local Wi-Fi communication — no internet needed
WebSocket for real-time progress updates on your phone
Leverage PC GPU (CUDA/OpenVINO/Vulkan) from your smartphone

Model Management & ModelHub

Freely choose and manage AI models for speech recognition and LLM. Beginners get recommended models, while advanced users can add any model from HuggingFace. Auto-detected GPU/VRAM info helps you check hardware compatibility before downloading.

1. Launch ModelHub (the bundled model manager) and select the ASR or LLM tab

2. Pick a model with the "Recommended" badge, or choose a quantization variant (Q4/Q5/Q8/F16) for your preferred quality-size balance

3. Click "Download"; progress, speed, and estimated time are shown in real time

4. Launch WhisperApp; downloaded models are auto-detected and available in the dropdown

Recommended models: optimal quantization per model size, marked with a badge for easy selection
Custom models: search HuggingFace or enter a direct URL to add any model (fine-tuned, etc.)
GPU/VRAM auto-detection: hardware info is detected automatically and displayed in the status bar
LLM models include VRAM/RAM requirements for easy compatibility checking
Extensive model library: 6 ASR models with quantization variants and more than 12 LLM series
Up to 3 concurrent downloads, with full download management and deletion
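
A cap on concurrent downloads is commonly implemented with a semaphore. The sketch below shows the idea with asyncio; download_all and the fetch callback are hypothetical names, and nothing here reflects how ModelHub is actually written.

```python
import asyncio


async def download_all(urls, fetch, max_concurrent=3):
    """Run `fetch(url)` for every URL with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url):
        # Wait for a free slot before starting; the slot is released
        # when the download finishes or raises.
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(url) for url in urls))
```

Queued downloads start as soon as one of the three slots frees up, so a single slow file does not block the rest of the queue.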

Automatic Engine Updates

Check and install updates for transcription, LLM, audio processing, and other engines — all from within the app. The right build for your GPU is selected automatically.

View all engine statuses from the Settings → Update tab
One-click update or install for each engine
Auto-check on startup with optional auto-install
Auto-selects builds matching your GPU (CUDA/OpenVINO/Vulkan/CPU)

Smart Backend Optimization

Automatically selects the optimal GPU backend for your hardware. Detects the power source in real time, balancing performance and battery life on laptops. No configuration is required.

On AC Power

GPU-first priority for maximum performance. Fully utilizes CPU resources for the fastest processing.

On Battery

NPU-first power-saving mode. Reduces CPU resource usage to maximize battery life.

Auto-detects NVIDIA GPU (CUDA), Intel GPU/NPU (OpenVINO), and Vulkan-compatible GPUs to select the optimal backend
Real-time power source detection, automatically switching between performance and power-saving modes
Automatic fallback to another backend on GPU errors, so processing continues without manual intervention
Choose from 4 profiles: Performance / Balanced / Power Saving / Auto
Manual backend selection also available for advanced users
Each engine has different supported backends — the app automatically determines the best combination for each
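
The selection logic described above can be sketched as a priority list per power state with fallback on failure. The page only states GPU-first on AC power and NPU-first on battery; the exact ordering below, the backend identifiers, and the function name are assumptions.

```python
AC_PRIORITY = ["cuda", "openvino", "vulkan", "cpu"]       # assumed order
BATTERY_PRIORITY = ["openvino", "vulkan", "cuda", "cpu"]  # assumed order


def choose_backend(available: set[str], on_battery: bool,
                   failed: frozenset[str] = frozenset()) -> str:
    """Pick the first usable backend for the current power state.

    `failed` holds backends that already raised an error, so a retry
    falls through to the next candidate. CPU is the last resort.
    """
    priority = BATTERY_PRIORITY if on_battery else AC_PRIORITY
    for backend in priority:
        if backend in available and backend not in failed:
            return backend
    return "cpu"
```

The power state itself could come from a library such as psutil, whose sensors_battery() call reports whether the machine is plugged in; that choice is also an assumption rather than something the page documents.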

Download Now

High-accuracy transcription, speaker diarization, real-time recognition, and local LLM chat
