Whisper Model Comparison: Speed, Accuracy, and VRAM from tiny to large-v3-turbo

WhisperApp Team · Published: March 3, 2026 · Reading time: 3 min

OpenAI Whisper offers multiple model sizes from tiny to large-v3-turbo. To answer the common question "which model should I use?", here's a thorough comparison of each model's characteristics.

Model Lineup and Specifications

| Model | Parameters | Model Size | VRAM Required | Relative Speed |
| --- | --- | --- | --- | --- |
| tiny | 39M | 75MB | ~1GB | 32x |
| base | 74M | 142MB | ~1GB | 16x |
| small | 244M | 466MB | ~2GB | 6x |
| medium | 769M | 1.5GB | ~5GB | 2x |
| large-v2 | 1550M | 2.9GB | ~10GB | 1x |
| large-v3 | 1550M | 2.9GB | ~10GB | 1x |
| large-v3-turbo | 809M | 1.6GB | ~6GB | 6x |

Relative speed is approximate, with large-v2 as baseline (1x).
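For programmatic model selection, the table above can be captured in a small Python dict. The figures below are the approximate values from the table, not exact measurements:

```python
# Approximate Whisper model specs, taken from the comparison table above.
# vram_gb is the rough VRAM requirement; rel_speed is relative to large-v2 (1x).
MODEL_SPECS = {
    "tiny":           {"params_m": 39,   "vram_gb": 1,  "rel_speed": 32},
    "base":           {"params_m": 74,   "vram_gb": 1,  "rel_speed": 16},
    "small":          {"params_m": 244,  "vram_gb": 2,  "rel_speed": 6},
    "medium":         {"params_m": 769,  "vram_gb": 5,  "rel_speed": 2},
    "large-v2":       {"params_m": 1550, "vram_gb": 10, "rel_speed": 1},
    "large-v3":       {"params_m": 1550, "vram_gb": 10, "rel_speed": 1},
    "large-v3-turbo": {"params_m": 809,  "vram_gb": 6,  "rel_speed": 6},
}

# Example: list every model that fits in a 6GB GPU.
fits_6gb = [name for name, spec in MODEL_SPECS.items() if spec["vram_gb"] <= 6]
```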

Model Characteristics

tiny / base

Use case: Testing, prototypes, real-time on low-spec PCs

The lightest and fastest models, but with the lowest accuracy; errors are noticeable in some languages. Use them for testing or when speed matters more than accuracy.

small

Use case: Everyday notes, casual transcription

A balanced model that runs on GPUs with ~2GB VRAM. Provides sufficient accuracy for everyday use.

medium

Use case: Business documents, meeting minutes (medium quality)

Delivers practical accuracy for most use cases. Runs on mid-range GPUs with ~5GB VRAM.

large-v2 / large-v3

Use case: High-quality transcription, accuracy-critical scenarios

The largest models offering the highest accuracy. large-v3 improves multilingual performance over v2. Requires 10GB+ VRAM GPUs (RTX 3080 or higher recommended).

large-v3-turbo

Use case: Optimal for nearly all scenarios

Released in October 2024, large-v3-turbo is a pruned, fine-tuned version of large-v3 that reduces the decoder from 32 layers to 4:

  • Accuracy: Nearly identical to large-v3 (less than 1% WER difference)
  • Speed: ~6x faster than large-v3
  • VRAM: ~6GB (60% of large-v3)
  • Size: 1.6GB (55% of large-v3)
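The size and VRAM percentages quoted above follow directly from the figures in the spec table; a quick arithmetic check:

```python
# Figures from the model spec table above (GB).
turbo_size_gb, large_v3_size_gb = 1.6, 2.9
turbo_vram_gb, large_v3_vram_gb = 6, 10

size_ratio = turbo_size_gb / large_v3_size_gb   # ~0.55 -> "55% of large-v3"
vram_ratio = turbo_vram_gb / large_v3_vram_gb   # 0.60  -> "60% of large-v3"
print(f"size: {size_ratio:.0%}, VRAM: {vram_ratio:.0%}")
```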
Recommended Models by Use Case

| Use Case | Recommended | Reason |
| --- | --- | --- |
| Meeting minutes / business | large-v3-turbo | High accuracy + fast |
| Video subtitles | large-v3-turbo / large-v3 | Accuracy-first |
| Real-time transcription | large-v3-turbo / small | Speed-accuracy balance |
| Batch processing many files | large-v3-turbo | Fast processing saves time |
| GPU with 4GB or less VRAM | small | Largest model that fits |
| CPU only (no GPU) | tiny / base | Minimize processing time |
| Testing / development | tiny | Fastest results |

GPU VRAM Guide

| GPU | VRAM | Recommended Model |
| --- | --- | --- |
| GTX 1650 / RTX 3050 | 4GB | small |
| RTX 3060 / RTX 4060 | 8GB | large-v3-turbo |
| RTX 3070 / RTX 4070 | 8-12GB | large-v3-turbo / large-v3 |
| RTX 3080 / RTX 4080+ | 10-16GB | large-v3 |
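The VRAM guide above implies a simple selection rule. Here is a minimal sketch of that logic; the thresholds are my reading of the table, not WhisperApp's actual detection heuristic:

```python
def recommend_model(vram_gb: float) -> str:
    """Pick a Whisper model by available GPU VRAM, per the guide above."""
    if vram_gb >= 10:
        return "large-v3"        # enough headroom for the full large model
    if vram_gb >= 6:
        return "large-v3-turbo"  # meets the ~6GB requirement
    if vram_gb >= 2:
        return "small"           # balanced model for ~2-4GB cards
    return "base"                # very low VRAM; tiny is the other option
```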

WhisperApp auto-detects your GPU and selects the optimal backend (CUDA, OpenVINO, Vulkan). Model downloads are available with one click from within the app.

Conclusion

When in doubt, choose large-v3-turbo. It offers the best accuracy-speed balance for virtually every use case.

Use small when VRAM is limited, and large-v3 when maximum accuracy is essential. Matching the model to your needs ensures the best results every time.

Turn speech into text.

WhisperApp runs high-accuracy AI transcription locally on your PC. Transcribe meetings, interviews, and videos while keeping your data private.

7-day free trial — no credit card required
