Quick start

First, clone the repository:

git clone https://github.com/ggerganov/whisper.cpp.git

Then download one of the Whisper models converted to ggml format. For example:

sh ./models/download-ggml-model.sh base.en

Now build the main example and transcribe an audio file like this:

# build the main example
make

# transcribe an audio file
./main -f samples/jfk.wav
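
When the transcription step is driven from a script, the same command line can be assembled programmatically. The sketch below is only an illustration: the function name build_main_command is hypothetical, and it uses only flags that appear in the ./main help text (-m, -f, -l, -t, -osrt). It builds the argument list and does not run anything by itself.

```python
import shlex
from typing import List, Optional


def build_main_command(
    wav_path: str,
    model_path: str = "models/ggml-base.en.bin",
    language: Optional[str] = None,
    threads: Optional[int] = None,
    output_srt: bool = False,
) -> List[str]:
    """Assemble an argument list for ./main (hypothetical helper)."""
    cmd = ["./main", "-m", model_path, "-f", wav_path]
    if language is not None:
        cmd += ["-l", language]        # spoken language, or "auto"
    if threads is not None:
        cmd += ["-t", str(threads)]    # number of computation threads
    if output_srt:
        cmd.append("-osrt")            # also write the result to an .srt file
    return cmd


if __name__ == "__main__":
    cmd = build_main_command("samples/jfk.wav", threads=4, output_srt=True)
    print(shlex.join(cmd))
    # To actually run it (needs a built ./main and a downloaded model):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.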

For a quick demo, simply run 'make base.en':

$ make base.en

cc -I. -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main -framework Accelerate
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d N,      --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -ojf,      --output-json-full  [false  ] include more information in the JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
  -ls,       --log-score         [false  ] log best decoder scores of tokens
  -ng,       --no-gpu            [false  ] disable GPU

sh ./models/download-ggml-model.sh base.en
Downloading ggml model base.en ...
ggml-base.en.bin 100%[========================>] 141.11M 6.34MB/s in 24s
Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
You can now use it like this:

$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav

===============================================

Running base.en on all samples in ./samples ...

[+] Running base.en on samples/jfk.wav ... (run 'ffplay samples/jfk.wav' to listen)

whisper_init_from_file: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem required = 215.00 MB (+ 6.00 MB per decoder)
whisper_model_load: kv self size = 5.25 MB
whisper_model_load: kv cross size = 17.58 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.60 MB
whisper_model_load: model size = 140.54 MB

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: load time = 113.81 ms
whisper_print_timings: mel time = 15.40 ms
whisper_print_timings: sample time = 11.58 ms / 27 runs ( 0.43 ms per run)
whisper_print_timings: encode time = 266.60 ms / 1 runs ( 266.60 ms per run)
whisper_print_timings: decode time = 66.11 ms / 27 runs ( 2.45 ms per run)
whisper_print_timings: total time = 476.31 ms

The command downloads the base.en model converted to the custom ggml format and runs inference on all .wav samples in the samples folder.
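
The transcript lines in that output can be picked out of the surrounding log lines with a regular expression. The following is a minimal sketch, assuming the [hh:mm:ss.mmm --> hh:mm:ss.mmm] text layout shown in the sample run; the names Segment and parse_segments are hypothetical and are not part of whisper.cpp.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    start_s: float
    end_s: float
    text: str


# Matches lines such as "[00:00:00.000 --> 00:00:11.000]   some text".
SEGMENT_RE = re.compile(
    r"^\[(\d+):(\d{2}):(\d{2})\.(\d{3}) --> (\d+):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_segments(output: str) -> List[Segment]:
    """Extract timestamped segments from captured output; skip other lines."""
    segments = []
    for line in output.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match:
            g = match.groups()
            segments.append(
                Segment(_to_seconds(*g[0:4]), _to_seconds(*g[4:8]), g[8].strip())
            )
    return segments
```

For machine-readable results, the -oj or -ocsv options listed in the help text are likely a more robust choice than scraping the console output.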

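For detailed usage instructions, run: ./main -h

Note that the main example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool. For example, you can use ffmpeg like this: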

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
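
Before passing a file to ./main it can be useful to check whether it already is a 16-bit WAV. The sketch below uses Python's standard wave module for that check and rebuilds the ffmpeg call shown above as an argument list. The helper names is_16bit_wav and ffmpeg_convert_command are hypothetical, and the check only looks at the sample width, not at the sample rate or channel count.

```python
import os
import tempfile
import wave


def is_16bit_wav(path: str) -> bool:
    """Return True if `path` is a WAV file with 16-bit samples."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getsampwidth() == 2  # 2 bytes per sample = 16 bits
    except (wave.Error, EOFError):
        return False  # not a WAV file the wave module can read


def ffmpeg_convert_command(src: str, dst: str) -> list:
    """Build the ffmpeg call from the text: 16 kHz, mono, 16-bit PCM."""
    return ["ffmpeg", "-i", src, "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", dst]


if __name__ == "__main__":
    # Self-check: write one second of 16-bit mono silence and inspect it.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "silence.wav")
        with wave.open(demo, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(16000)
            out.writeframes(b"\x00\x00" * 16000)
        print(is_16bit_wav(demo))
    print(" ".join(ffmpeg_convert_command("input.mp3", "output.wav")))
```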
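
More audio samples

To get more audio samples to try, simply run:

make samples

This downloads a few more audio files from Wikipedia and converts them to 16-bit WAV format with ffmpeg.

You can download and run the other models as follows: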

make tiny.en
make tiny
make base.en
make base
make small.en
make small
make medium.en
make medium
make large-v1
make large-v2
make large-v3
make large-v3-turbo
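
Memory usage

| Model  | Disk    | Mem     |
|--------|---------|---------|
| tiny   | 75 MiB  | ~273 MB |
| base   | 142 MiB | ~388 MB |
| small  | 466 MiB | ~852 MB |
| medium | 1.5 GiB | ~2.1 GB |
| large  | 2.9 GiB | ~3.9 GB |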
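
The memory figures above can be used to pick a model for a given machine. The sketch below is one possible way to do that: the function name largest_model_that_fits is hypothetical, the values are copied from the table, and the GB entries are converted with the assumption that 1 GB = 1000 MB.

```python
# Approximate runtime memory per model in MB, copied from the table above.
# Assumption: "~2.1 GB" and "~3.9 GB" are taken as 2100 MB and 3900 MB.
MODEL_MEMORY_MB = {
    "tiny": 273,
    "base": 388,
    "small": 852,
    "medium": 2100,
    "large": 3900,
}


def largest_model_that_fits(budget_mb: float):
    """Return the biggest model whose listed memory fits the budget, or None."""
    fitting = [(mem, name) for name, mem in MODEL_MEMORY_MB.items()
               if mem <= budget_mb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for budget in (200, 1000, 4096):
        print(budget, largest_model_that_fits(budget))
```

The listed numbers are approximate, so leaving some headroom below the real amount of free memory is a sensible precaution.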
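
Quantization

whisper.cpp supports integer quantization of the Whisper ggml models. Quantized models require less memory and disk space and, depending on the hardware, can be processed more efficiently.

Here are the steps for creating and using a quantized model:

# quantize a model with Q5_0 method

make quantize
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0

# run the examples as usual, specifying the quantized model file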

./main -m models/ggml-base.en-q5_0.bin ./samples/gb0.wav
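
To get a feeling for how much space Q5_0 can save, the arithmetic can be sketched as follows. This assumes the ggml Q5_0 block layout as I understand it (32 weights per block, stored as a 2-byte fp16 scale, 4 bytes of packed high bits and 16 bytes of packed low nibbles) and assumes the source model is stored entirely in f16. Both are assumptions rather than statements from this document, and the function names are hypothetical.

```python
# Assumed Q5_0 block layout (not stated in this document):
Q5_0_BLOCK_WEIGHTS = 32          # weights per quantization block
Q5_0_BLOCK_BYTES = 2 + 4 + 16    # fp16 scale + high bits + low 4-bit nibbles


def bits_per_weight_q5_0() -> float:
    """Average storage cost of one weight under the assumed layout."""
    return Q5_0_BLOCK_BYTES * 8 / Q5_0_BLOCK_WEIGHTS


def estimate_q5_0_size_mib(f16_size_mib: float) -> float:
    """Optimistic estimate: treats every tensor as f16 and as quantized."""
    return f16_size_mib * bits_per_weight_q5_0() / 16.0


if __name__ == "__main__":
    print(bits_per_weight_q5_0())
    print(estimate_q5_0_size_mib(142))  # 142 MiB is the base model size on disk
```

Real quantized files will typically be somewhat larger than this estimate, since some tensors may be left unquantized.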
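
Core ML support

On Apple Silicon devices, encoder inference can be executed on the Apple Neural Engine (ANE) via Core ML. This can result in a significant speed-up, more than 3x faster than CPU-only execution. Here are the instructions for generating a Core ML model and using it with whisper.cpp:

Install the Python dependencies needed to create the Core ML model:

pip install ane_transformers
pip install openai-whisper
pip install coremltools

To ensure coremltools operates correctly, confirm that Xcode is installed and run xcode-select --install to install the command-line tools.
Python 3.10 is recommended.
macOS Sonoma (version 14) or newer is recommended, as older versions of macOS might experience issues with transcription hallucination.
[Optional] It is recommended to use a Python version management system, such as conda, for this step:
To create an environment, use: conda create -n py310-whisper python=3.10 -y
To activate the environment, use: conda activate py310-whisper

Generate a Core ML model. For example, to generate a base.en model, use: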

./models/generate-coreml-model.sh base.en

This will generate the folder models/ggml-base.en-encoder.mlmodelc.
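
A script can check that the generated encoder folder is in place before launching a transcription. The sketch below derives the folder name from the ggml model path, following the naming shown in the example (ggml-base.en.bin and ggml-base.en-encoder.mlmodelc). The helper names are hypothetical, and the mapping is only assumed to hold for plain model files such as the one in the example.

```python
from pathlib import Path


def coreml_encoder_dir(model_path: str) -> Path:
    """Map models/ggml-base.en.bin -> models/ggml-base.en-encoder.mlmodelc."""
    model = Path(model_path)
    if model.suffix != ".bin":
        raise ValueError(f"expected a .bin model file, got {model.name}")
    return model.with_name(model.stem + "-encoder.mlmodelc")


def has_coreml_encoder(model_path: str) -> bool:
    """True if the Core ML encoder folder exists next to the ggml model."""
    return coreml_encoder_dir(model_path).is_dir()


if __name__ == "__main__":
    print(coreml_encoder_dir("models/ggml-base.en.bin"))
    print(has_coreml_encoder("models/ggml-base.en.bin"))
```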

Build whisper.cpp with Core ML support:

# using Makefile

make clean
WHISPER_COREML=1 make -j

# using CMake

cmake -B build -DWHISPER_COREML=1
cmake --build build -j --config Release

Run the examples as usual. For example:

$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav

...

whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | COREML = 1 |

...

The first run on a device is slow, because the ANE service compiles the Core ML model to a device-specific format. Subsequent runs are faster.
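
One way to confirm that a build really has Core ML enabled is to look for COREML = 1 in the system_info line of the log. The sketch below parses such a line into a dictionary. The function name parse_system_info is hypothetical, and the parser assumes the "KEY = VALUE |" layout shown in the sample output.

```python
def parse_system_info(line: str) -> dict:
    """Turn a 'system_info: KEY = VALUE | ...' log line into a dict of strings."""
    prefix = "system_info:"
    if not line.startswith(prefix):
        raise ValueError("not a system_info line")
    features = {}
    for field in line[len(prefix):].split("|"):
        if "=" not in field:
            continue  # skip the empty field after the trailing '|'
        key, value = field.split("=", 1)
        features[key.strip()] = value.strip()
    return features


if __name__ == "__main__":
    sample = "system_info: n_threads = 4 / 10 | NEON = 1 | BLAS = 1 | COREML = 1 |"
    info = parse_system_info(sample)
    print(info.get("COREML") == "1")
```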

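For more information about the Core ML implementation, please refer to PR #566.

OpenVINO support

On platforms that support OpenVINO, encoder inference can be executed on OpenVINO-supported devices, including x86 CPUs and Intel GPUs (integrated and discrete).

This can result in a significant speed-up in encoder performance. Here are the instructions for generating the OpenVINO model and using it with whisper.cpp:

First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended.

Windows: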

cd models
python -m venv openvino_conv_env
openvino_conv_env\Scripts\activate
python -m pip install --upgrade pip
pip install -r requirements-openvino.txt

Linux and macOS:

cd models
python3 -m venv openvino_conv_env
source openvino_conv_env/bin/activate
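python -m pip install --upgrade pip
pip install -r requirements-openvino.txt

Generate an OpenVINO encoder model. For example, to generate a base.en model, use:

python convert-whisper-to-openvino.py --model base.en

This will produce the IR model files ggml-base.en-encoder-openvino.xml/.bin. It is recommended to move these files to the same folder as the ggml models, as that is the default location the OpenVINO extension searches at runtime.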
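
The relocation step can be scripted. The following is a minimal sketch, assuming the two IR files follow the naming shown above (the model file stem plus -encoder-openvino.xml and -encoder-openvino.bin). The function name move_openvino_ir and its generated_dir parameter are hypothetical.

```python
import shutil
from pathlib import Path


def move_openvino_ir(model_path: str, generated_dir: str = ".") -> list:
    """Move <stem>-encoder-openvino.xml/.bin from generated_dir next to the model."""
    model = Path(model_path)
    base = model.stem + "-encoder-openvino"  # e.g. ggml-base.en-encoder-openvino
    moved = []
    for ext in (".xml", ".bin"):
        src = Path(generated_dir) / (base + ext)
        dst = model.parent / src.name
        # Skip files that are missing or already in the target folder.
        if src.exists() and src.resolve() != dst.resolve():
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved


if __name__ == "__main__":
    for path in move_openvino_ir("models/ggml-base.en.bin", generated_dir="."):
        print("moved", path)
```

If the conversion script was run from inside the models folder, the files may already be in the right place and the function simply returns an empty list.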

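Build whisper.cpp with OpenVINO support:

Download the OpenVINO package from the release page. The recommended version is 2023.0.0.

After downloading and extracting the package on your development system, set up the required environment by sourcing the setupvars script. For example:

Linux:

source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh

Windows (cmd):

C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat

Then build the project using cmake:

cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release

Run the examples as usual. For example: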

$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav

...

whisper_ctx_init_openvino_encoder: loading OpenVINO model from 'models/ggml-base.en-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = GPU, cache_dir = models/ggml-base.en-encoder-openvino-cache
whisper_ctx_init_openvino_encoder: OpenVINO model loaded

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 1 |

...

The first run on an OpenVINO device is slow, because the OpenVINO framework compiles the IR (Intermediate Representation) model to a device-specific "blob". This device-specific blob is cached for the next run.

For more information about the OpenVINO implementation, please refer to PR #1037.

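NVIDIA GPU support

With NVIDIA cards, the processing of the models is done efficiently on the GPU via cuBLAS and custom CUDA kernels. First, make sure you have installed CUDA: https://developer.nvidia.com/cuda-downloads

Now build whisper.cpp with CUDA support: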

make clean
GGML_CUDA=1 make -j

BLAS CPU support via OpenBLAS

Encoder processing can be accelerated on the CPU via OpenBLAS. First, make sure you have installed OpenBLAS: https://www.openblas.net/

Now build whisper.cpp with OpenBLAS support:

make clean
GGML_OPENBLAS=1 make -j
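
The CUDA and OpenBLAS builds above differ only in the variable that is set before calling make. A build script could capture that with a small lookup table, as in the sketch below. The names BACKEND_MAKE_VARS and make_invocation are hypothetical, and only the two variables shown in this document are included.

```python
import os

# Backend name -> Makefile variable, taken from the build commands above.
BACKEND_MAKE_VARS = {
    "cuda": "GGML_CUDA",
    "openblas": "GGML_OPENBLAS",
}


def make_invocation(backend: str, base_env=None):
    """Return (argv, env) equivalent to '<VAR>=1 make -j' for the backend."""
    try:
        var = BACKEND_MAKE_VARS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    # Copy the environment so the caller's mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env[var] = "1"
    return ["make", "-j"], env


if __name__ == "__main__":
    argv, env = make_invocation("openblas", base_env={})
    print(argv, env)
    # To run the build: import subprocess; subprocess.run(argv, env=env, check=True)
```

As in the commands above, a make clean beforehand avoids mixing object files from a previous configuration.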
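
BLAS CPU support via Intel MKL

Encoder processing can be accelerated on the CPU via the BLAS-compatible interface of Intel's Math Kernel Library (MKL). First, make sure you have installed Intel's MKL runtime and development packages: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html

Now build whisper.cpp with Intel MKL BLAS support: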

source /opt/intel/oneapi/setvars.sh
mkdir build
cd build
cmake -DWHISPER_MKL=ON ..
WHISPER_MKL=1 make -j
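
Ascend NPU support

Ascend NPU provides inference acceleration via CANN and AI cores.

First, check whether your Ascend NPU device is supported:

Verified devices

| Ascend NPU    | Status  |
|---------------|---------|
| Atlas 300T A2 | Support |

Then make sure you have installed the CANN toolkit. The latest version of CANN is recommended.

Now build whisper.cpp with CANN support: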

mkdir build
cd build
cmake .. -D GGML_CANN=on
make -j

Run the inference examples as usual, for example:

./build/bin/main -f samples/jfk.wav -m models/ggml-base.en.bin -t 8
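
Notes:

If you have trouble with your Ascend NPU device, please create an issue with the [CANN] prefix/tag.
If your Ascend NPU device runs successfully, please help update the Verified devices table.

Docker

Prerequisites

Docker must be installed and running on your system.
Create a folder to store big models and intermediate files (e.g. /whisper/models).

Images

We have two Docker images available for this project:

ghcr.io/ggerganov/whisper.cpp:main: This image includes the main executable file as well as curl and ffmpeg. (platforms: linux/amd64, linux/arm64)
ghcr.io/ggerganov/whisper.cpp:main-cuda: Same as main but compiled with CUDA support. (platforms: linux/amd64)

Usage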

# download model and persist it in a local folder

docker run -it --rm \
  -v path/to/models:/models \
  whisper.cpp:main "./models/download-ggml-model.sh base /models"

# transcribe an audio file

docker run -it --rm \
  -v path/to/models:/models \
  -v path/to/audios:/audios \
  whisper.cpp:main "./main -m /models/ggml-base.bin -f /audios/jfk.wav"

# transcribe an audio file in samples folder

docker run -it --rm \
  -v path/to/models:/models \
  whisper.cpp:main "./main -m /models/ggml-base.bin -f ./samples/jfk.wav"
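
The three docker run calls share the same shape: fixed flags, a list of volume mounts, the image name, and one quoted command string. When they are issued from a script, that shape can be captured in a small helper. The sketch below is illustrative only; the function name docker_run_command is hypothetical, and the paths are the same placeholders used in the commands above.

```python
import shlex
from typing import Dict, List


def docker_run_command(image: str, inner_command: str,
                       volumes: Dict[str, str]) -> List[str]:
    """Build 'docker run -it --rm -v host:container ... image "inner command"'."""
    cmd = ["docker", "run", "-it", "--rm"]
    for host_path, container_path in volumes.items():
        cmd += ["-v", f"{host_path}:{container_path}"]
    # The inner command stays a single argument, like the quoted string above.
    cmd += [image, inner_command]
    return cmd


if __name__ == "__main__":
    cmd = docker_run_command(
        "whisper.cpp:main",
        "./main -m /models/ggml-base.bin -f /audios/jfk.wav",
        {"path/to/models": "/models", "path/to/audios": "/audios"},
    )
    print(shlex.join(cmd))
```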

Author: Jeebiz  Created: 2025-11-04 13:55
Last edited by: Jeebiz  Updated: 2025-11-11 17:29