Whisper Model Comparison: Speed, Accuracy, and VRAM from tiny to large-v3-turbo

WhisperApp Team · Published: March 3, 2026 · Reading time: 3 min

OpenAI Whisper offers multiple model sizes from tiny to large-v3-turbo. To answer the common question "which model should I use?", here's a thorough comparison of each model's characteristics.

Model Lineup and Specifications

| Model | Parameters | Model Size | VRAM Required | Relative Speed |
| --- | --- | --- | --- | --- |
| tiny | 39M | 75MB | ~1GB | 32x |
| base | 74M | 142MB | ~1GB | 16x |
| small | 244M | 466MB | ~2GB | 6x |
| medium | 769M | 1.5GB | ~5GB | 2x |
| large-v2 | 1550M | 2.9GB | ~10GB | 1x |
| large-v3 | 1550M | 2.9GB | ~10GB | 1x |
| large-v3-turbo | 809M | 1.6GB | ~6GB | 6x |

Relative speed is approximate, with large-v2 as baseline (1x).
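For programmatic model selection, the table above can be captured in a small Python dict. The figures below are the approximate values from the table, not exact measurements:

```python
# Approximate Whisper model specs, taken from the comparison table above.
# vram_gb is the rough VRAM requirement; rel_speed is relative to large-v2 (1x).
MODEL_SPECS = {
    "tiny":           {"params_m": 39,   "vram_gb": 1,  "rel_speed": 32},
    "base":           {"params_m": 74,   "vram_gb": 1,  "rel_speed": 16},
    "small":          {"params_m": 244,  "vram_gb": 2,  "rel_speed": 6},
    "medium":         {"params_m": 769,  "vram_gb": 5,  "rel_speed": 2},
    "large-v2":       {"params_m": 1550, "vram_gb": 10, "rel_speed": 1},
    "large-v3":       {"params_m": 1550, "vram_gb": 10, "rel_speed": 1},
    "large-v3-turbo": {"params_m": 809,  "vram_gb": 6,  "rel_speed": 6},
}

# Example: list every model that fits in a 6GB GPU.
fits_6gb = [name for name, spec in MODEL_SPECS.items() if spec["vram_gb"] <= 6]
```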

Model Characteristics

tiny / base

Use case: Testing, prototypes, real-time on low-spec PCs

The lightest and fastest models, but with the lowest accuracy; errors are noticeable in some languages. Use them for testing or when speed matters more than accuracy.

small

Use case: Everyday notes, casual transcription

A balanced model that runs on GPUs with ~2GB VRAM. Provides sufficient accuracy for everyday use.

medium

Use case: Business documents, meeting minutes (medium quality)

Delivers practical accuracy for most use cases. Runs on mid-range GPUs with ~5GB VRAM.

large-v2 / large-v3

Use case: High-quality transcription, accuracy-critical scenarios

The largest models offering the highest accuracy. large-v3 improves multilingual performance over v2. Requires 10GB+ VRAM GPUs (RTX 3080 or higher recommended).

large-v3-turbo

Use case: Optimal for nearly all scenarios

Released in October 2024, large-v3-turbo is a pruned, fine-tuned version of large-v3 that reduces the decoder from 32 layers to 4:

  • Accuracy: Nearly identical to large-v3 (less than 1% WER difference)
  • Speed: ~6x faster than large-v3
  • VRAM: ~6GB (60% of large-v3)
  • Size: 1.6GB (55% of large-v3)
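The size and VRAM percentages quoted above follow directly from the figures in the spec table; a quick arithmetic check:

```python
# Figures from the model spec table above (GB).
turbo_size_gb, large_v3_size_gb = 1.6, 2.9
turbo_vram_gb, large_v3_vram_gb = 6, 10

size_ratio = turbo_size_gb / large_v3_size_gb   # ~0.55 -> "55% of large-v3"
vram_ratio = turbo_vram_gb / large_v3_vram_gb   # 0.60  -> "60% of large-v3"
print(f"size: {size_ratio:.0%}, VRAM: {vram_ratio:.0%}")
```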
Recommended Models by Use Case

| Use Case | Recommended | Reason |
| --- | --- | --- |
| Meeting minutes / business | large-v3-turbo | High accuracy + fast |
| Video subtitles | large-v3-turbo / large-v3 | Accuracy-first |
| Real-time transcription | large-v3-turbo / small | Speed-accuracy balance |
| Batch processing many files | large-v3-turbo | Fast processing saves time |
| GPU with 4GB or less VRAM | small | Largest model that fits |
| CPU only (no GPU) | tiny / base | Minimize processing time |
| Testing / development | tiny | Fastest results |

GPU VRAM Guide

| GPU | VRAM | Recommended Model |
| --- | --- | --- |
| GTX 1650 / RTX 3050 | 4GB | small |
| RTX 3060 / RTX 4060 | 8GB | large-v3-turbo |
| RTX 3070 / RTX 4070 | 8-12GB | large-v3-turbo / large-v3 |
| RTX 3080 / RTX 4080+ | 10-16GB | large-v3 |
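The VRAM guide above implies a simple selection rule. Here is a minimal sketch of that logic; the thresholds are my reading of the table, not WhisperApp's actual detection heuristic:

```python
def recommend_model(vram_gb: float) -> str:
    """Pick a Whisper model by available GPU VRAM, per the guide above."""
    if vram_gb >= 10:
        return "large-v3"        # enough headroom for the full large model
    if vram_gb >= 6:
        return "large-v3-turbo"  # meets the ~6GB requirement
    if vram_gb >= 2:
        return "small"           # balanced model for ~2-4GB cards
    return "base"                # very low VRAM; tiny is the other option
```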

WhisperApp auto-detects your GPU and selects the optimal backend (CUDA, OpenVINO, Vulkan). Model downloads are available with one click from within the app.

Conclusion

When in doubt, choose large-v3-turbo. It offers the best accuracy-speed balance for virtually every use case.

Use small when VRAM is limited, and large-v3 when maximum accuracy is essential. Matching the model to your needs ensures the best results every time.

Turn speech into text.

WhisperApp runs high-accuracy AI transcription locally on your PC. Transcribe meetings, interviews, and videos while keeping your data private.

7-day free trial — no credit card required
