OpenLLM

An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLM with ease.

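Introduction

OpenLLM is an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications. With OpenLLM, you can run inference on any open-source LLM, deploy it in the cloud or on-premises, and build powerful AI applications.

Key features include: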

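🚂 State-of-the-art LLMs: Integrated support for a wide range of open-source LLMs and model runtimes, including but not limited to Llama 2, StableLM, Falcon, Dolly, Flan-T5, ChatGLM, and StarCoder.
🔥 Flexible APIs: Serve LLMs over a RESTful API or gRPC with a single command. You can interact with the model using a Web UI, the CLI, Python/JavaScript clients, or any HTTP client of your choice.
⛓️ Freedom to build: First-class support for LangChain, BentoML, LlamaIndex, OpenAI endpoints, and Hugging Face lets you create your own AI applications by composing LLMs with other models and services.
🎯 Streamlined deployment: Automatically generate LLM server Docker images, or deploy as serverless endpoints via ☁️ BentoCloud, which manages GPU resources, scales according to traffic, and keeps costs under control.
🤖️ Bring your own LLM: Fine-tune any LLM to suit your needs. You can load LoRA layers to fine-tune models for higher accuracy and performance on specific tasks. A unified fine-tuning API for models (LLM.tuning()) is coming soon.
⚡ Quantization: Run inference with lower compute and memory costs using quantization techniques such as LLM.int8, SpQR (int4), AWQ, GPTQ, and SqueezeLLM.
📡 Streaming: Token streaming is supported through server-sent events (SSE). You can use the /v1/generate_stream endpoint to stream responses from LLMs.
🔄 Continuous batching: Continuous batching is supported via vLLM for higher total throughput.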

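OpenLLM is designed for AI application developers who are building production-ready applications based on LLMs. It provides a comprehensive set of tools and features for fine-tuning, serving, deploying, and monitoring these models, which simplifies the end-to-end deployment workflow for LLMs.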
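To make the token streaming feature listed above more concrete, here is a minimal sketch of how a client might read a server-sent event stream. It handles only the generic SSE framing (data: fields terminated by a blank line). The JSON shape of each event, including the "text" key, is a hypothetical placeholder and not something this page specifies for /v1/generate_stream.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event.

    Only basic SSE framing is handled: "data:" fields are collected until
    a blank line ends the event. Other fields (event, id, retry) and
    comment lines starting with ":" are ignored.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # One optional leading space after the colon is stripped.
            buffer.append(value[1:] if value.startswith(" ") else value)


# Hypothetical event bodies: the {"text": ...} shape is an assumption.
sample = [
    'data: {"text": "Hello"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"text": " world"}\n',
    "\n",
]
chunks = [json.loads(payload)["text"] for payload in iter_sse_data(sample)]
print("".join(chunks))  # prints "Hello world"
```

In practice the lines would come from an HTTP response body read line by line rather than from a list. The parser itself does not depend on where the lines originate.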

Quick start

The following instructions explain how to get started with OpenLLM locally.

Prerequisites

You have installed Python 3.8 (or later) and pip. We highly recommend using a virtual environment to prevent package conflicts.
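A quick way to check both prerequisites from Python itself is sketched below. The helper names are illustrative, and the virtual environment check only detects venv-style environments (it compares sys.prefix with sys.base_prefix), so Conda or other environment managers may not be recognized.

```python
import sys


def meets_minimum_python(version_info=sys.version_info, minimum=(3, 8)) -> bool:
    """Return True if the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def in_virtual_environment() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if not meets_minimum_python():
    print("Python 3.8 or later is required")
if not in_virtual_environment():
    print("Tip: consider activating a virtual environment first")
```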

Install OpenLLM

Install OpenLLM with pip as follows:

pip install openllm

To verify the installation, run:

$ openllm -h

Usage: openllm [OPTIONS] COMMAND [ARGS]...

   ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
  ██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
  ██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
  ██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
  ╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
   ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝.

  An open platform for operating large language models in production.
  Fine-tune, serve, deploy, and monitor any LLMs with ease.

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  build       Package a given models into a BentoLLM.
  import      Setup LLM interactively.
  models      List all supported models.
  prune       Remove all saved models, (and optionally bentos) built with OpenLLM locally.
  query       Query a LLM interactively, from a terminal.
  start       Start a LLMServer for any supported LLM.
  start-grpc  Start a gRPC LLMServer for any supported LLM.

Extensions:
  build-base-container  Base image builder for BentoLLM.
  dive-bentos           Dive into a BentoLLM.
  get-containerfile     Return Containerfile of any given Bento.
  get-prompt            Get the default prompt used by OpenLLM.
  list-bentos           List available bentos built by OpenLLM.
  list-models           This is equivalent to openllm models...
  playground            OpenLLM Playground.
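The same verification can be done from a script, for example in a setup check. The sketch below uses only the standard library to look up the installed distribution version and the location of the openllm executable. The helper name installed_version is illustrative and not part of OpenLLM.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("openllm")
cli_path = shutil.which("openllm")
print("openllm package version:", version or "not installed")
print("openllm executable:", cli_path or "not found on PATH")
```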

Start an LLM server

OpenLLM lets you quickly spin up an LLM server using openllm start. For example, to start a phi-2 server, run the following command:

TRUST_REMOTE_CODE=True openllm start microsoft/phi-2

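This starts the server at http://0.0.0.0:3000/. If the model has not been registered before, OpenLLM downloads it to the BentoML local model store. To view your local models, run bentoml models list.

To interact with the server, you can visit the web UI at http://0.0.0.0:3000/ or send a request using curl. You can also use OpenLLM's built-in Python client to interact with the server: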
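Because the first start may spend some time downloading the model, a script that launches the server might want to wait until the port accepts connections. The following is a minimal sketch of such a check using a plain TCP probe. The function name wait_for_port is illustrative. An open port only shows that the process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the server started above, wait_for_port("localhost", 3000) would be a typical call. Note that 0.0.0.0 is a bind address, so clients normally connect through localhost or the machine's own address.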

import openllm

client = openllm.client.HTTPClient('http://localhost:3000')
client.query('Explain to me the difference between "further" and "farther"')

Alternatively, use the openllm query command to query the model:

export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
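Since the server also accepts requests from any HTTP client, a standard-library version might look like the sketch below. The /v1/generate path and the {"prompt": ...} request body are assumptions made for illustration. They are not confirmed by this page, so check the API schema shown in the server's web UI before relying on them. The helper names are hypothetical as well.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request carrying the prompt as JSON.

    ASSUMPTION: the "/v1/generate" path and the {"prompt": ...} body are
    illustrative; consult the running server's API docs for the real schema.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/generate",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_generate_request(
    "http://localhost:3000",
    'Explain to me the difference between "further" and "farther"',
)
print(request.get_method(), request.full_url)
```

Calling send(request) performs the actual network call and requires a running server. It is kept separate so that the request can be inspected or logged first.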

OpenLLM seamlessly supports many models and their variants. You can specify different variants of the model to be served. For example:

openllm start –

Author: Jeebiz  Created: 2024-02-18 22:29
Last edited by: Jeebiz  Updated: 2025-11-11 17:29