Quick Start

# Clone the repository
git clone https://github.com/microsoft/visual-chatgpt.git

# Enter the directory
cd visual-chatgpt

# Create a new conda environment named visgpt
conda create -n visgpt python=3.8

# Activate the new environment
conda activate visgpt

# Install the required dependencies
pip install -r requirements.txt

# Set your OpenAI API key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}

# Set your OpenAI API key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

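# Start Visual ChatGPT!
# You can control the GPU/CPU assignment with "--load". Its value specifies which
# Visual Foundation Models to use and which device each one is loaded onto.
# A model and its device are separated by an underscore '_'; different models are separated by a comma ','.
# The available Visual Foundation Models are listed in the table below.
# For example, to load ImageCaptioning onto cpu and Text2Image onto cuda:0,
# you can use: "ImageCaptioning_cpu,Text2Image_cuda:0"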

# Recommended for CPU users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu

# Recommended for 1 Tesla T4 15GB (Google Colab)
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"

# Recommended for 4 Tesla V100 32GB
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    Image2Seg_cpu,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
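
To make the "--load" format concrete, here is a minimal Python sketch of how such a value could be split into a model-to-device mapping. The function name parse_load_spec is hypothetical and chosen for illustration; this is not necessarily how visual_chatgpt.py parses the argument. It assumes model names contain no underscore, and it strips whitespace so that a value wrapped across several lines (as in the 4 V100 example) still parses.

```python
def parse_load_spec(spec):
    """Split a value like "ImageCaptioning_cpu,Text2Image_cuda:0" into {model: device}.

    Hypothetical helper for illustration only.
    """
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()  # tolerate spaces and line breaks between entries
        if not entry:
            continue  # ignore a trailing comma or an empty piece
        # The first underscore separates the model name from the device.
        model, sep, device = entry.partition("_")
        if not sep or not model or not device:
            raise ValueError(f"expected Model_device, got {entry!r}")
        mapping[model.strip()] = device.strip()
    return mapping


if __name__ == "__main__":
    print(parse_load_spec("ImageCaptioning_cpu,Text2Image_cuda:0"))
    # {'ImageCaptioning': 'cpu', 'Text2Image': 'cuda:0'}
```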

GPU Memory Usage

Here we list the GPU memory usage of each visual foundation model, so you can choose which ones to load:

Foundation Model	GPU Memory (MB)
ImageEditing	3981
InstructPix2Pix	2827
Text2Image	3385
ImageCaptioning	1209
Image2Canny	0
CannyText2Image	3531
Image2Line	0
LineText2Image	3529
Image2Hed	0
HedText2Image	3529
Image2Scribble	0
ScribbleText2Image	3531
Image2Pose	0
PoseText2Image	3529
Image2Seg	919
SegText2Image	3529
Image2Depth	0
DepthText2Image	3531
Image2Normal	0
NormalText2Image	3529
VisualQuestionAnswering	1495
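
The figures above can be used to estimate how much GPU memory a given "--load" value needs on each device. The following is a rough sketch, not part of the project: the function name gpu_memory_by_device is hypothetical, the numbers are copied from the table, and it assumes that a model placed on cpu uses no GPU memory and that usage on a shared device simply adds up. Actual usage at runtime may be higher.

```python
# GPU memory per model in MB, copied from the table above.
GPU_MEMORY_MB = {
    "ImageEditing": 3981, "InstructPix2Pix": 2827, "Text2Image": 3385,
    "ImageCaptioning": 1209, "Image2Canny": 0, "CannyText2Image": 3531,
    "Image2Line": 0, "LineText2Image": 3529, "Image2Hed": 0,
    "HedText2Image": 3529, "Image2Scribble": 0, "ScribbleText2Image": 3531,
    "Image2Pose": 0, "PoseText2Image": 3529, "Image2Seg": 919,
    "SegText2Image": 3529, "Image2Depth": 0, "DepthText2Image": 3531,
    "Image2Normal": 0, "NormalText2Image": 3529,
    "VisualQuestionAnswering": 1495,
}


def gpu_memory_by_device(spec):
    """Sum the table values per GPU for a "--load" style string (hypothetical helper)."""
    totals = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        model, _, device = entry.partition("_")
        if model not in GPU_MEMORY_MB:
            raise KeyError(f"unknown model: {model!r}")
        if device == "cpu":
            continue  # assumption: models on cpu take no GPU memory
        totals[device] = totals.get(device, 0) + GPU_MEMORY_MB[model]
    return totals


if __name__ == "__main__":
    # The single Tesla T4 configuration from the quick start.
    print(gpu_memory_by_device("ImageCaptioning_cuda:0,Text2Image_cuda:0"))
    # {'cuda:0': 4594}
```

By this estimate the single T4 configuration needs about 4.6 GB for the models themselves, which leaves headroom on a 15 GB card.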

Acknowledgements

We thank the following projects for being open source:

Hugging Face
LangChain
Stable Diffusion
ControlNet
InstructPix2Pix
CLIPSeg
BLIP

Author: Jeebiz. Created: 2023-12-12 12:21
Last edited by: Jeebiz. Updated: 2025-05-12 09:20