Development Roadmap

What's next for WhisperApp

Last updated: February 27, 2026

Phase 1 — Desktop App

Windows desktop application

Current

High-accuracy transcription and AI assistant with fully local processing. All processing happens on your PC — no audio data is ever sent externally.

Released

High-accuracy transcription (multi-language support; available languages depend on the model)
Real-time transcription
Speaker diarization (automatic speaker identification)
AI assistant (local LLM chat, summarization, translation)
Video subtitle generation (SRT / VTT formats)
Smart backend optimization (automatic GPU / NPU / CPU selection)
Multiple inference engine support (whisper.cpp / sherpa-onnx)
Bring your own models — Use any Whisper model or LLM you want. Not sure which to pick? Recommended presets get you started instantly.

In Development

Mobile app (Android) — Control and review transcriptions from your smartphone by connecting to your PC
LLM connection expansion — Connect to other local LLM servers or external APIs in addition to the built-in engine
Transcription history — Browse and reuse past recognition results anytime
LLM transcription correction — Automatically correct misrecognitions and fix improperly segmented text using a local LLM
Automatic document generation — AI formats transcription results into meeting minutes and reports, with export to docx and other formats
LLM web search — AI assistant can reference internet information in responses
Dictionary — Pre-register proper nouns and technical terms to improve transcription accuracy

Phase 2 — CLI & Multi-OS Support

Command-line interface and cross-platform deployment

Access all WhisperApp features from the command line. A UI-independent core engine that integrates into any environment and workflow.

Complete separation of UI and logic — Rebuild business logic as a standalone core library
Batch processing — Process large numbers of audio files via scripts
Pipeline integration — Embed into existing workflows and automation scripts
Linux / macOS support — Cross-platform operation enabled by core engine separation

CLI Usage Example

$ whisperapp transcribe audio.mp3 --diarize --speakers 3
$ whisperapp export meeting.wav --format docx --summarize
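
As a rough illustration of batch processing via scripts, the sketch below builds one transcription command per audio file in a folder. The `whisperapp transcribe` command and the `--diarize` flag are taken from the usage example above; because the CLI is still planned, the final syntax may differ. The function names, the extension list, and the dry-run default are hypothetical and exist only for this example.

```python
import subprocess
from pathlib import Path

# Hypothetical: extensions to pick up. The roadmap does not list supported formats.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def build_batch_commands(folder, extra_args=("--diarize",)):
    """Return one `whisperapp transcribe` argument list per audio file in `folder`."""
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    return [["whisperapp", "transcribe", str(p), *extra_args] for p in files]


def run_batch(folder, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_batch_commands(folder):
        print(" ".join(cmd))
        if not dry_run:
            # Assumes the planned `whisperapp` CLI is installed and on PATH.
            subprocess.run(cmd, check=True)
```

Calling `run_batch("recordings")` only prints the commands it would run, which makes it easy to review a large batch before passing `dry_run=False`.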

Phase 3 — Server Edition / Enterprise Solutions

Enterprise environment deployment

Runs as a headless server for shared use across teams and organizations. Fully contained within your internal network, ensuring confidential audio data never leaves your premises.

Server Features

Admin Web UI — Intuitive browser-based management (similar to router or NAS admin panels)
Multi-user & job queue — Efficiently process requests from multiple users
Multi-GPU scheduling — Automatic resource allocation and load balancing across GPUs
LLM deployment modes — Choose between always-on / idle-stop / on-demand to optimize resource usage
RAG capabilities — AI references internal documents from specified directories for informed responses
CLI admin tool (whisperapp-admin) — Server configuration and monitoring from the command line

Enterprise Deployment

On-premises server deployment support
Pre-installed server — Ready-to-use dedicated hardware with minimal network configuration
Fully air-gapped operation — Data never leaves your internal network

Phase 4 — Cloud Service

SaaS offering

An AI transcription platform accessible from any device, with no server setup required. It will leverage major AI provider APIs to minimize infrastructure costs and is planned to be priced more affordably than competing services.

Cloud-hosted version — Start immediately with a subscription, no server setup or GPU required
Multi-client support — Web browser / mobile app / lightweight desktop client
Managed service — Cloud solution with full operational support for enterprises where server deployment is challenging
GPU cloud environment — Cloud servers for freely running any Whisper model or local LLM

This roadmap reflects our current development direction. Features and timelines are subject to change without notice. Listed features are not guaranteed to be implemented.

Download Now

High-accuracy transcription, speaker diarization, real-time recognition, and local LLM chat
