Currently preparing for launch — sales coming soon

Features

What you can do with WhisperApp — a walkthrough of each feature

Everything Works Offline

Every feature in WhisperApp runs without internet. Your audio data never leaves your machine, ensuring complete privacy. Confidential meeting recordings and interviews can be processed with peace of mind.

  • No data sent to servers
  • Fully local processing
  • Offline license verification

File Transcription

Drag & drop audio or video files and convert them to accurate text with one click. Handle meeting recordings, interviews, lectures, and more.

1. Drag & drop audio/video files onto the window (folders work too)
2. Select a Whisper model (tiny to large-v3-turbo) and language (10 languages + auto-detect)
3. Choose an output format (TXT/SRT/VTT/JSON/CSV/LRC) and click Run
4. Track progress in real-time. Right-click completed files to open results instantly
  • Auto-selects the best of 4 backends: CUDA / OpenVINO / Vulkan / CPU
  • Parallel batch processing of multiple files
  • Loop detection & watchdog for automatic error recovery
  • Open completed tasks directly in the editor or LLM
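
For readers unfamiliar with the subtitle formats listed in step 3, here is a minimal Python sketch of how timestamped segments map onto SRT text. It is illustrative only and does not reproduce WhisperApp's own exporter; the `Segment` type and the `srt_timestamp` and `to_srt` functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed segment (hypothetical structure for illustration)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as SRT cues: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Segment(0.0, 1.5, "Hello"), Segment(1.5, 3.0, "World")]))
```

VTT follows the same cue structure but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.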

Real-time Transcription

Transcribe microphone or system audio in real-time while recording. No need to take notes during meetings and interviews — the text appears as you speak.

1. Open the Recorder window and select microphone or system audio
2. Turn on "Real-time transcription" and set the model and language
3. Start recording; spoken words appear in the segment list in real-time
4. Click "Copy All" to copy the text to the clipboard, or export as TXT/SRT
  • Simultaneous recording + transcription — get both the audio file and text
  • System audio capture (whole system / specific app) for real-time recognition (Pro)
  • Level meter for real-time input monitoring

Speaker Diarization (Pro)

Automatically identify who said what in multi-speaker audio. Essential for creating meeting minutes and interview transcripts.

1. Enable the "Speaker Diarization" checkbox on the transcription screen
2. Set speaker count (auto-detect, or manually specify from 2 to 10)
3. Run transcription; speaker tags are automatically assigned to each segment
4. Open results in the editor to refine speaker tags and merge/split as needed
  • High-accuracy speaker separation powered by a dedicated engine
  • Auto-detect or manually specify speaker count (2–10) per file
  • 24-color palette for visual speaker identification in the editor and audio bar
  • Add, rename, delete speaker tags, and batch-change speakers across segments

Editor

A dedicated editing tool for efficiently finishing transcription results. Works with or without speaker diarization. Features per-segment audio playback, a fully keyboard-driven workflow, and auto-recovery. Dramatically speeds up proofreading for meeting minutes and interview records.

1. After transcription, right-click → "Edit" to open the editor (supports 6 formats: SRT/VTT/JSON/CSV/LRC/TXT)
2. Select a segment and play audio with Shift+Space while correcting text (press E to start editing instantly)
3. Split (D), merge (M), adjust timestamps (T), or change speaker tags (S) as needed
4. Save with Ctrl+S. All operations support undo (Ctrl+Z), and editing state is auto-saved continuously
  • Fully keyboard-driven (E=edit, D=split, M=merge, T=timing, S=speaker, Space=play, arrows=navigate)
  • Auto-recovery: editing state is restored even after a crash or unexpected app close
  • Speaker color-coded audio bar — click to seek, visually grasp the overall structure
  • Variable speed playback (0.5x–2.0x); speed-adjusted audio files are pre-generated in the background
  • Multi-tab support — edit multiple files with independent state simultaneously
  • Undo/redo for all 11 operation types (text edit, speaker change, split, merge, delete, insert, timing, etc.)
  • Overlap detection during timestamp adjustment — auto-suggests resolution strategies
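
As a rough illustration of what overlap detection during timestamp adjustment involves, the Python sketch below checks adjacent segments for overlapping time ranges and applies one possible fix. The function names and the "trim the earlier segment" strategy are assumptions made for this example; the strategies the editor actually suggests are not described here.

```python
def find_overlaps(segments):
    """Return (i, i + 1, overlap_seconds) for each adjacent overlapping pair.

    `segments` is a list of (start, end) tuples sorted by start time.
    """
    overlaps = []
    for i in range(len(segments) - 1):
        end_current = segments[i][1]
        start_next = segments[i + 1][0]
        if end_current > start_next:
            overlaps.append((i, i + 1, end_current - start_next))
    return overlaps


def trim_previous(segments, i):
    """One possible resolution (assumed): end segment i where segment i + 1 begins."""
    fixed = list(segments)
    start, _ = fixed[i]
    fixed[i] = (start, fixed[i + 1][0])
    return fixed


if __name__ == "__main__":
    segs = [(0.0, 2.5), (2.0, 4.0), (4.0, 5.0)]
    for i, j, amount in find_overlaps(segs):
        print(f"segments {i} and {j} overlap by {amount:.2f}s")
    print(trim_previous(segs, 0))
```

Other plausible strategies, such as shifting the later segment's start or splitting the difference, would follow the same pattern.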

3-Mode Recording & Download

Capture audio via microphone, PC system audio, or YouTube/URL download. Recordings are automatically added to the transcription queue for a seamless record-to-text workflow.

Microphone Recording

Select your device and start recording. Save in 6 formats: WAV/FLAC/MP3/AAC/OGG/OPUS. Can be used alongside real-time transcription.

System Audio Capture (Pro)

Capture audio output from your PC. Record the entire system audio, or specify a particular application (Zoom, Teams, etc.) to capture only that app's audio.

YouTube/URL Download (Pro)

Paste a YouTube or other video URL to download. Supports audio-only extraction or video with quality selection. Downloads are auto-added to the transcription queue.

  • Auto-add recordings to the file list after completion
  • Level meter and recording timer for real-time monitoring

Local LLM (AI Analysis & Summarization) (Pro)

Local AI chat running entirely on your machine. Load transcription results and ask "Summarize this" or "What are the key points?" — the AI analyzes the content locally. Confidential data stays safe.

1. Open the Summarize window and add transcription files (or open directly from the completed list)
2. Select a prompt style (summary, meeting minutes, translation, or create your own)
3. Click "Summarize" or type any question in the chat and send
4. Review AI responses. Conversations are auto-saved to history for later retrieval and export
  • Streaming responses with real-time AI output
  • Create, edit, and manage custom prompt templates
  • Save, rename, restore, and export conversation history
  • Adjust context size and control the LLM server from the GUI

Video Subtitle Generation (Pro)

Add subtitles to videos using transcription results. Supports both hardcoded (burned-in) and soft subtitles (as a track). Subtitle files can also be used for YouTube uploads.

1. Open the file in the editor and correct text as needed
2. Click the "Subtitle" button and select the target video file
3. Choose burn-in (hard sub) or embed (soft sub) and configure style
4. Click "Run" to generate the subtitled video
  • Hard sub: burned into the video (subtitles always visible)
  • Soft sub: embedded as a track (can be toggled on/off)
  • Customize font, size, color, and position

Smartphone Integration (Pro)

Pair with WhisperApp for Android (free) to record on your phone and leverage your PC's GPU for fast, high-accuracy transcription. Perfect for field recordings processed at your desk. (Currently in development — coming in a future update)

1. Start the API server from PC settings
2. Display the QR code and scan it with WhisperApp for Android
3. Record on your phone and send to PC for transcription
4. View results on your phone in real-time
  • Local Wi-Fi communication — no internet needed
  • WebSocket for real-time progress updates on your phone
  • Leverage PC GPU (CUDA/OpenVINO/Vulkan) from your smartphone

Model Management & ModelHub

Freely choose and manage AI models for speech recognition and LLM. Beginners get recommended models, while advanced users can add any model from HuggingFace. Auto-detected GPU/VRAM info helps you check hardware compatibility before downloading.

1. Launch ModelHub (the bundled model manager) and select the ASR or LLM tab
2. Pick a model with the "Recommended" badge, or choose a quantization variant (Q4/Q5/Q8/F16) for your preferred quality-size balance
3. Click "Download"; progress, speed, and estimated time are shown in real-time
4. Launch WhisperApp; downloaded models are auto-detected and available in the dropdown
  • Recommended models: optimal quantization per model size, marked with a badge for easy selection
  • Custom models: search HuggingFace or enter a direct URL to add any model (fine-tuned, etc.)
  • GPU/VRAM auto-detection: hardware info automatically detected and displayed in status bar
  • LLM models include VRAM/RAM requirements for easy compatibility checking
  • Extensive model library: 6 ASR models with quantization variants, plus more than 12 LLM series
  • Up to 3 concurrent downloads, with full download management and deletion

Automatic Engine Updates

Check and install updates for transcription, LLM, audio processing, and other engines — all from within the app. The right build for your GPU is selected automatically.

  • View all engine statuses from the Settings → Update tab
  • One-click update or install for each engine
  • Auto-check on startup with optional auto-install
  • Auto-selects builds matching your GPU (CUDA/OpenVINO/Vulkan/CPU)

Smart Backend Optimization

Automatically selects the optimal GPU backend for your hardware. Detects the power source in real-time, balancing performance and battery life on laptops. No configuration is required.

On AC Power

GPU-first priority for maximum performance. Fully utilizes CPU resources for the fastest processing.

On Battery

NPU-first power-saving mode. Reduces CPU resource usage to maximize battery life.

  • Auto-detects NVIDIA GPU (CUDA), Intel GPU/NPU (OpenVINO), and Vulkan-compatible GPUs to select the optimal backend
  • Real-time power source detection, automatically switching between performance and power-saving modes
  • Automatic fallback to another backend on GPU errors, so processing continues without manual intervention
  • Choose from 4 profiles: Performance / Balanced / Power Saving / Auto
  • Manual backend selection also available for advanced users
  • Each engine has different supported backends — the app automatically determines the best combination for each
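
The selection and fallback behavior described above can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not WhisperApp's actual logic: the priority orders, the `candidate_backends` and `run_with_fallback` names, and the use of `RuntimeError` to signal a backend failure are all hypothetical.

```python
# Assumed priority orders: GPU-first on AC power; on battery, OpenVINO
# (which covers Intel NPUs) is tried first. The real ordering may differ.
AC_PRIORITY = ["cuda", "openvino", "vulkan", "cpu"]
BATTERY_PRIORITY = ["openvino", "vulkan", "cuda", "cpu"]


def candidate_backends(available, on_battery):
    """Return the available backends in priority order for the power state."""
    order = BATTERY_PRIORITY if on_battery else AC_PRIORITY
    return [name for name in order if name in available]


def run_with_fallback(task, available, on_battery):
    """Try each candidate backend in turn; fall back to the next on error."""
    errors = {}
    for backend in candidate_backends(available, on_battery):
        try:
            return backend, task(backend)
        except RuntimeError as exc:  # assumed failure signal for this sketch
            errors[backend] = str(exc)
    raise RuntimeError(f"all backends failed: {errors}")


if __name__ == "__main__":
    def transcribe(backend):
        # Stand-in for a real job; pretend the CUDA backend hits a GPU error.
        if backend == "cuda":
            raise RuntimeError("simulated GPU error")
        return f"transcribed on {backend}"

    print(run_with_fallback(transcribe, {"cuda", "cpu"}, on_battery=False))
```

In this model, a manual backend choice would simply replace the priority list with a single entry.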

Download Now

High-accuracy transcription, speaker diarization, real-time recognition, and local LLM chat
