Introduction to llama.cpp

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware, locally and in the cloud.
- Plain C/C++ implementation without any dependencies
- Apple silicon is a first-class citizen - optimized via the ARM NEON, Accelerate and Metal frameworks
- AVX, AVX2, AVX512 and AMX support for x86 architectures
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (AMD GPUs supported via HIP, Moore Threads MTT GPUs via MUSA)
- Vulkan and SYCL backend support
- CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity (see the sketch below)
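As a concrete illustration of the hybrid offload, here is a minimal sketch that loads a quantized GGUF model through the C API and offloads part of its layers to the GPU. It assumes the llama.h API as of late 2024 (`llama_load_model_from_file`, `llama_new_context_with_model`); newer releases may rename these, so check the current header. The model path is a placeholder.

```c
// Minimal sketch: load a quantized GGUF model and split layers between GPU and CPU.
// API names follow llama.h circa late 2024 and may differ in newer releases.
#include "llama.h"
#include <stdio.h>

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    // Hybrid CPU+GPU inference: offload as many layers as fit in VRAM;
    // the remaining layers run on the CPU.
    mparams.n_gpu_layers = 20;

    // "model-q4_k_m.gguf" is a placeholder for any quantized GGUF file.
    struct llama_model *model = llama_load_model_from_file("model-q4_k_m.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    struct llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048; // context window in tokens

    struct llama_context *ctx = llama_new_context_with_model(model, cparams);
    if (ctx == NULL) {
        fprintf(stderr, "failed to create context\n");
        llama_free_model(model);
        return 1;
    }

    // ... tokenize the prompt, call llama_decode(), and sample tokens here ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```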
 
Supported models:
Typically finetunes of the base models below are also supported.
- LLaMA 🦙
- LLaMA 2 🦙🦙
- LLaMA 3 🦙🦙🦙
- Mistral 7B
- Mixtral MoE
- DBRX
- Falcon
- Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2
- Vigogne (French)
- BERT
- Koala
- Baichuan 1 & 2 + derivations
- Aquila 1 & 2
- Starcoder models
- Refact
- MPT
- Bloom
- Yi models
- StableLM models
- Deepseek models
- Qwen models
- PLaMo-13B
- Phi models
- GPT-2
- Orion 14B
- InternLM2
- CodeShell
- Gemma
- Mamba
- Grok-1
- Xverse
- Command-R models
- SEA-LION
- GritLM-7B + GritLM-8x7B
- OLMo
- OLMo 2
- OLMoE
- Granite models
- GPT-NeoX + Pythia
- Snowflake-Arctic MoE
- Smaug
- Poro 34B
- Bitnet b1.58 models
- Flan T5
- OpenELM models
- ChatGLM3-6b + ChatGLM4-9b
- SmolLM
- EXAONE-3.0-7.8B-Instruct
- FalconMamba Models
- Jais
- Bielik-11B-v2.3
- RWKV-6
 
(Instructions for adding support for more models: HOWTO-add-model.md)
Multimodal models:
- LLaVA 1.5 models, LLaVA 1.6 models
- BakLLaVA
- Obsidian
- ShareGPT4V
- MobileVLM 1.7B/3B models
- Yi-VL
- Mini CPM
- Moondream
- Bunny
 
Bindings:
- Python: abetlen/llama-cpp-python
- Go: go-skynet/go-llama.cpp
- Node.js: withcatai/node-llama-cpp
- JS/TS (llama.cpp server client): lgrammel/modelfusion
- JS/TS (Programmable Prompt Engine CLI): offline-ai/cli
- JavaScript/Wasm (works in browser): tangledgroup/llama-cpp-wasm
- Typescript/Wasm (nicer API, available on npm): ngxson/wllama
- Ruby: yoshoku/llama_cpp.rb
- Rust (more features): edgenai/llama_cpp-rs
- Rust (nicer API): mdrokz/rust-llama.cpp
- Rust (more direct bindings): utilityai/llama-cpp-rs
- C#/.NET: SciSharp/LLamaSharp
- C#/VB.NET (more features - community license): LM-Kit.NET
- Scala 3: donderom/llm4s
- Clojure: phronmophobic/llama.clj
- React Native: mybigday/llama.rn
- Java: kherud/java-llama.cpp
- Zig: deins/llama.cpp.zig
- Flutter/Dart: netdur/llama_cpp_dart
- Flutter: xuegao-tzx/Fllama
- PHP (API bindings and features built on top of llama.cpp): distantmagic/resonance (more info)
- Guile Scheme: guile_llama_cpp
- Swift: srgtuszy/llama-cpp-swift
- Swift: ShenghaiWang/SwiftLlama
 
UIs:
Unless otherwise stated, these projects are open-source with permissive licensing:
- MindWorkAI/AI-Studio (FSL-1.1-MIT)
- iohub/collama
- janhq/jan (AGPL)
- nat/openplayground
- Faraday (proprietary)
- LMStudio (proprietary)
- Layla (proprietary)
- ramalama (MIT)
- LocalAI (MIT)
- LostRuins/koboldcpp (AGPL)
- Mozilla-Ocho/llamafile
- nomic-ai/gpt4all
- ollama/ollama
- oobabooga/text-generation-webui (AGPL)
- psugihara/FreeChat
- cztomsik/ava (MIT)
- ptsochantaris/emeltal
- pythops/tenere (AGPL)
- RAGNA Desktop (proprietary)
- RecurseChat (proprietary)
- semperai/amica
- withcatai/catai
- Mobile-Artificial-Intelligence/maid (MIT)
- Msty (proprietary)
- LLMFarm (MIT)
- KanTV (Apache v2.0 or later)
- Dot (GPL)
- MindMac (proprietary)
- KodiBot (GPL)
- eva (MIT)
- AI Sublime Text plugin (MIT)
- AIKit (MIT)
- LARS - The LLM & Advanced Referencing Solution (AGPL)
- LLMUnity (MIT)
- Llama Assistant (GPL)
- PocketPal AI - An iOS and Android App (MIT)
 
(to have a project listed here, it should clearly state that it depends on llama.cpp)
Tools:
- akx/ggify – download PyTorch models from the HuggingFace Hub and convert them to GGML
- akx/ollama-dl – download models from the Ollama library for direct use with llama.cpp
- crashr/gppm – launch llama.cpp instances on NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
- gpustack/gguf-parser – review/check GGUF files and estimate their memory usage (a rough worked example follows this list)
- Styled Lines (proprietary licensed, async wrapper around the inference part for Unity3d game development, with pre-built mobile and web platform wrappers and a model example)
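For a sense of the estimate such a parser produces, a rough rule of thumb (an approximation, not gpustack/gguf-parser's actual method): weight memory ≈ parameter count × bits per weight / 8. A 7B-parameter model at ~4.5 bits per weight (a typical 4-bit quantization including overhead) therefore needs about 7×10⁹ × 4.5 / 8 ≈ 3.9 GB for the weights alone; the KV cache and compute buffers add more on top, growing with context length.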
 
Infrastructure:
- Paddler - stateful load balancer custom-tailored for llama.cpp
- GPUStack - manage GPU clusters for running LLMs
- llama_cpp_canister - llama.cpp as a smart contract on the Internet Computer, using WebAssembly
 
Games:
- Lucy's Labyrinth - a simple maze game in which agents controlled by an AI model will try to trick you.
 
Author: Jeebiz  Created: 2024-11-28 16:06
Last edited by: Jeebiz  Updated: 2024-11-28 16:22