Development Roadmap
What's next for WhisperApp
Last updated: February 27, 2026
Phase 1 — Desktop App
Windows desktop application
High-accuracy transcription and AI assistant with fully local processing. All processing happens on your PC — no audio data is ever sent externally.
Released
- High-accuracy transcription (multi-language support; available languages depend on the model)
- Real-time transcription
- Speaker diarization (automatic speaker identification)
- AI assistant (local LLM chat, summarization, translation)
- Video subtitle generation (SRT / VTT formats)
- Smart backend optimization (automatic GPU / NPU / CPU selection)
- Multiple inference engine support (whisper.cpp / sherpa-onnx)
- Bring your own models — Use any Whisper or LLM model you want. Not sure which to pick? Recommended presets get you started instantly.
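As an illustration of the subtitle export listed above, here is a minimal sketch of how timed transcription segments map onto the SRT and VTT formats. The `Segment` type and the function names are hypothetical and are not WhisperApp's actual API. The format details themselves are standard: SRT numbers each cue and uses a comma before the milliseconds, while VTT starts with a `WEBVTT` header and uses a period.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcription segment: times in seconds plus the text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments):
    """SRT: numbered cues, comma before milliseconds, blank line between cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """VTT: 'WEBVTT' header, period before milliseconds, cue numbers optional."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        blocks.append(f"{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)
```

The same segment list feeds both writers, so the only differences between the two outputs are the header, the cue numbering, and the millisecond separator.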
In Development
- Mobile app (Android) — Control and review transcriptions from your smartphone by connecting to your PC
- LLM connection expansion — Connect to other local LLM servers or external APIs in addition to the built-in engine
- Transcription history — Browse and reuse past recognition results anytime
- LLM transcription correction — Automatically correct misrecognitions and fix improperly segmented text using a local LLM
- Automatic document generation — AI formats transcription results into meeting minutes and reports, exportable as docx and other formats
- LLM web search — The AI assistant can reference information from the internet in its responses
- Dictionary — Pre-register proper nouns and technical terms to improve transcription accuracy
Phase 2 — CLI & Multi-OS Support
Command-line interface and cross-platform deployment
Access all WhisperApp features from the command line. A UI-independent core engine that integrates into any environment and workflow.
- Complete separation of UI and logic — Rebuild business logic as a standalone core library
- Batch processing — Process large numbers of audio files via scripts
- Pipeline integration — Embed into existing workflows and automation scripts
- Linux / macOS support — Cross-platform operation enabled by core engine separation
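To make the batch processing item concrete, here is a rough sketch of a script that could drive the planned CLI over a folder of recordings. The CLI is not yet released, so everything here is an assumption: the command form (`whisperapp transcribe <file> --diarize`) is taken from this roadmap's own usage example and may change, and the helper names and the extension list are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: only the two formats that appear in the roadmap's examples.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def build_commands(directory, diarize=False):
    """Return one argv list per audio file found directly in `directory`."""
    commands = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            cmd = ["whisperapp", "transcribe", str(path)]
            if diarize:
                cmd.append("--diarize")
            commands.append(cmd)
    return commands


def run_batch(directory, diarize=False):
    """Run each command in turn and return the files whose run failed."""
    failed = []
    for cmd in build_commands(directory, diarize):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd[2])
    return failed
```

Separating `build_commands` from `run_batch` lets a script print or log the planned commands before anything is executed. `run_batch` raises `FileNotFoundError` if no `whisperapp` executable is on the PATH.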
CLI Usage Example
$ whisperapp transcribe audio.mp3 --diarize --speakers 3
$ whisperapp export meeting.wav --format docx --summarize
Phase 3 — Server Edition / Enterprise Solutions
Enterprise environment deployment
Runs as a headless server for shared use across teams and organizations. Fully contained within your internal network, ensuring confidential audio data never leaves your premises.
Server Features
- Admin Web UI — Intuitive browser-based management (similar to router or NAS admin panels)
- Multi-user & job queue — Efficiently process requests from multiple users
- Multi-GPU scheduling — Automatic resource allocation and load balancing across GPUs
- LLM deployment modes — Choose between always-on / idle-stop / on-demand to optimize resource usage
- RAG capabilities — AI references internal documents from specified directories for informed responses
- CLI admin tool (whisperapp-admin) — Server configuration and monitoring from the command line
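The three LLM deployment modes can be pictured as a policy that decides whether a model stays in memory. The sketch below is one possible reading of the mode names, not the planned implementation: the `DeploymentMode` enum, the `should_stay_loaded` function, and the 300-second default timeout are all hypothetical.

```python
from enum import Enum


class DeploymentMode(Enum):
    ALWAYS_ON = "always-on"
    IDLE_STOP = "idle-stop"
    ON_DEMAND = "on-demand"


def should_stay_loaded(mode, seconds_since_last_request, active_requests,
                       idle_timeout=300.0):
    """Decide whether the LLM should remain loaded (assumed semantics).

    - always-on: never unload.
    - idle-stop: unload after `idle_timeout` seconds without requests.
    - on-demand: load per request and unload as soon as no work remains.
    """
    if active_requests > 0:
        return True  # never unload while requests are in flight
    if mode is DeploymentMode.ALWAYS_ON:
        return True
    if mode is DeploymentMode.IDLE_STOP:
        return seconds_since_last_request < idle_timeout
    return False  # ON_DEMAND
```

Under this reading, always-on trades memory for the lowest latency, on-demand frees memory at the cost of a model load on every request, and idle-stop sits between the two.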
Enterprise Deployment
- On-premises server deployment support
- Pre-installed server — Ready-to-use dedicated hardware with minimal network configuration
- Fully air-gapped operation — Data never leaves your internal network
Phase 4 — Cloud Service
SaaS offering
An AI transcription platform accessible from any device, with no server setup required. It will use major AI providers' APIs to keep infrastructure costs low, with the aim of pricing below competing services.
- Cloud-hosted version — Start immediately with a subscription, no server setup or GPU required
- Multi-client support — Web browser / mobile app / lightweight desktop client
- Managed service — Cloud solution with full operational support for enterprises where server deployment is challenging
- GPU cloud environment — Cloud servers for freely running any Whisper model or local LLM
This roadmap reflects our current development direction. Features and timelines are subject to change without notice. Listed features are not guaranteed to be implemented.