Introduction to llama.cpp
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware, locally and in the cloud.
- Plain C/C++ implementation without any dependencies
- Apple silicon is a first-class citizen: optimized via the ARM NEON, Accelerate, and Metal frameworks
- AVX, AVX2, AVX512, and AMX support for x86 architectures
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (AMD GPUs supported via HIP, Moore Threads MTT GPUs via MUSA)
- Vulkan and SYCL backend support
- CPU+GPU hybrid inference to partially accelerate models larger than total VRAM capacity (see the sketch after this list)
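To make the hybrid CPU+GPU setup concrete, here is a minimal sketch using the abetlen/llama-cpp-python binding (listed under Bindings below). The model path and layer count are placeholder assumptions to adjust for your hardware; the llama-cli binary exposes the same knob as `-ngl`/`--n-gpu-layers`.

```python
# Minimal sketch: hybrid CPU+GPU inference via the abetlen/llama-cpp-python
# binding (pip install llama-cpp-python). The model path and n_gpu_layers
# value are placeholders; pick them to fit your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",  # any quantized GGUF model
    n_gpu_layers=20,  # layers offloaded to the GPU; the rest stay on the CPU
    n_ctx=2048,       # context window size
)

out = llm("Building a website can be done in 10 simple steps:", max_tokens=64)
print(out["choices"][0]["text"])
```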
Supported models:
Fine-tunes of the base models below are typically supported as well.
- LLaMA 🦙
- LLaMA 2 🦙🦙
- LLaMA 3 🦙🦙🦙
- Mistral 7B
- Mixtral MoE
- DBRX
- Falcon
- Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2
- Vigogne (French)
- BERT
- Koala
- Baichuan 1 & 2 + derivations
- Aquila 1 & 2
- Starcoder models
- Refact
- MPT
- Bloom
- Yi models
- StableLM models
- Deepseek models
- Qwen models
- PLaMo-13B
- Phi models
- GPT-2
- Orion 14B
- InternLM2
- CodeShell
- Gemma
- Mamba
- Grok-1
- Xverse
- Command-R models
- SEA-LION
- GritLM-7B + GritLM-8x7B
- OLMo
- OLMo 2
- OLMoE
- Granite models
- GPT-NeoX + Pythia
- Snowflake-Arctic MoE
- Smaug
- Poro 34B
- Bitnet b1.58 models
- Flan T5
- OpenELM models
- ChatGLM3-6b + ChatGLM4-9b
- SmolLM
- EXAONE-3.0-7.8B-Instruct
- FalconMamba Models
- Jais
- Bielik-11B-v2.3
- RWKV-6
(instructions for adding support for more models: HOWTO-add-model.md)
Multimodal models:
- LLaVA 1.5 models, LLaVA 1.6 models
- BakLLaVA
- Obsidian
- ShareGPT4V
- MobileVLM 1.7B/3B models
- Yi-VL
- Mini CPM
- Moondream
- Bunny
Bindings:
- Python: abetlen/llama-cpp-python (see the chat example after this list)
- Go: go-skynet/go-llama.cpp
- Node.js: withcatai/node-llama-cpp
- JS/TS (llama.cpp server client): lgrammel/modelfusion
- JS/TS (Programmable Prompt Engine CLI): offline-ai/cli
- JavaScript/Wasm (works in browser): tangledgroup/llama-cpp-wasm
- Typescript/Wasm (nicer API, available on npm): ngxson/wllama
- Ruby: yoshoku/llama_cpp.rb
- Rust (more features): edgenai/llama_cpp-rs
- Rust (nicer API): mdrokz/rust-llama.cpp
- Rust (more direct bindings): utilityai/llama-cpp-rs
- C#/.NET: SciSharp/LLamaSharp
- C#/VB.NET (more features - community license): LM-Kit.NET
- Scala 3: donderom/llm4s
- Clojure: phronmophobic/llama.clj
- React Native: mybigday/llama.rn
- Java: kherud/java-llama.cpp
- Zig: deins/llama.cpp.zig
- Flutter/Dart: netdur/llama_cpp_dart
- Flutter: xuegao-tzx/Fllama
- PHP (API bindings and features built on top of llama.cpp): distantmagic/resonance (more info)
- Guile Scheme: guile_llama_cpp
- Swift: srgtuszy/llama-cpp-swift
- Swift: ShenghaiWang/SwiftLlama
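As a taste of what these bindings typically expose, the sketch below uses the Python binding's OpenAI-style chat API. The model path is a placeholder, and the response shape follows llama-cpp-python's documented format.

```python
# Usage sketch for abetlen/llama-cpp-python's chat API;
# the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

# create_chat_completion applies the model's chat template and returns an
# OpenAI-style response dictionary.
resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is llama.cpp?"},
    ],
    max_tokens=128,
)
print(resp["choices"][0]["message"]["content"])
```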
UIs:
Unless otherwise noted, these projects are open source with permissive licensing:
- MindWorkAI/AI-Studio (FSL-1.1-MIT)
- iohub/collama
- janhq/jan (AGPL)
- nat/openplayground
- Faraday (proprietary)
- LMStudio (proprietary)
- Layla (proprietary)
- ramalama (MIT)
- LocalAI (MIT)
- LostRuins/koboldcpp (AGPL)
- Mozilla-Ocho/llamafile
- nomic-ai/gpt4all
- ollama/ollama
- oobabooga/text-generation-webui (AGPL)
- psugihara/FreeChat
- cztomsik/ava (MIT)
- ptsochantaris/emeltal
- pythops/tenere (AGPL)
- RAGNA Desktop (proprietary)
- RecurseChat (proprietary)
- semperai/amica
- withcatai/catai
- Mobile-Artificial-Intelligence/maid (MIT)
- Msty (proprietary)
- LLMFarm (MIT)
- KanTV (Apache-2.0 or later)
- Dot (GPL)
- MindMac (proprietary)
- KodiBot (GPL)
- eva (MIT)
- AI Sublime Text plugin (MIT)
- AIKit (MIT)
- LARS - The LLM & Advanced Referencing Solution (AGPL)
- LLMUnity (MIT)
- Llama Assistant (GPL)
- PocketPal AI - An iOS and Android App (MIT)
(to have a project listed here, it should clearly state that it depends on llama.cpp)
Tools:
- akx/ggify – download PyTorch models from the HuggingFace Hub and convert them to GGML
- akx/ollama-dl – download models from the Ollama library for direct use with llama.cpp
- crashr/gppm – launch llama.cpp instances on NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
- gpustack/gguf-parser - view/inspect GGUF files and estimate memory usage (see the header-reading sketch after this list)
- Styled Lines (proprietary license, async wrapper of the inference part for Unity3d game development, with prebuilt mobile and web platform wrappers and a model example)
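To illustrate what tools like gpustack/gguf-parser work with, here is a minimal sketch that reads the fixed GGUF header fields as defined by the GGUF specification in the llama.cpp repository. The file name is a placeholder; a real parser would continue into the metadata key/value pairs and tensor descriptions.

```python
# Minimal sketch of reading a GGUF header. A GGUF file starts with the magic
# bytes "GGUF", followed by a little-endian uint32 version, a uint64 tensor
# count, and a uint64 metadata key/value count.
import struct

with open("model.gguf", "rb") as f:  # placeholder file name
    magic = f.read(4)
    assert magic == b"GGUF", "not a GGUF file"
    version, = struct.unpack("<I", f.read(4))
    n_tensors, n_kv = struct.unpack("<QQ", f.read(16))

print(f"GGUF v{version}: {n_tensors} tensors, {n_kv} metadata key/value pairs")
```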
Infrastructure:
- Paddler - stateful load balancer custom-tailored for llama.cpp
- GPUStack - manage GPU clusters for running LLMs
- llama_cpp_canister - llama.cpp as a smart contract on the Internet Computer, using WebAssembly
Games:
- Lucy's Labyrinth - a simple maze game where agents controlled by an AI model will try to trick you.