LLaMA-Factory - Powered by MinDoc

使用 LLaMA-Factory 进行训练

https://github.com/hiyouga/LLaMA-Factory/blob/main/README_zh.md

在接下来的章节中，我们将介绍如何使用 LLaMA-Factory 来微调 Hunyuan 模型。

先决条件
验证以下依赖项的安装：

LLaMA-Factory: 请遵循官方安装指南
DeepSpeed (可选): 请遵循官方安装指南
Transformer 库: 使用配套分支（Hunyuan 提交的代码正在审核中）
pip install git+@4970b23cedaf745f963779b4eae68da281e8c6ca"target="_blank"">https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
数据准备
我们需要准备一个自定义数据集：

将您的数据组织成 json 格式，并放置在 LLaMA-Factory 中的 data 目录下。当前实现使用 sharegpt 数据集格式，需要以下结构：
[
{
“messages”: [
{
“role”: “system”,
“content”: “System prompt (optional)”
},
{
“role”: “user”,
“content”: “Human instruction”
},
{
“role”: “assistant”,
“content”: “Model response”
}
]
}
]
有关详细信息，请参阅前面提到的数据格式部分。

在 data/dataset_info.json 文件中按照以下格式定义您的数据集：
“dataset_name”: {
“file_name”: “dataset.json”,
“formatting”: “sharegpt”,
“columns”: {
“messages”: “messages”
},
“tags”: {
“role_tag”: “role”,
“content_tag”: “content”,
“user_tag”: “user”,
“assistant_tag”: “assistant”,
“system_tag”: “system”
}
}
训练执行
将 train/llama_factory_support/example_configs 目录下的所有文件复制到 LLaMA-Factory 中的 example/hunyuan 目录。
修改配置文件 hunyuan_full.yaml 中的模型路径和数据集名称。根据需要调整其他配置：

model

model_name_or_path: [!!!add the model path here!!!]

dataset

dataset: [!!!add the dataset name here!!!]
执行训练命令： *单节点训练注意：设置环境变量 DISABLE_VERSION_CHECK 为 1 以避免版本冲突。
export DISABLE_VERSION_CHECK=1
llamafactory-cli train examples/hunyuan/hunyuan_full.yaml
*多节点训练在每个节点上执行以下命令。根据您的环境配置 NNODES、NODE_RANK、MASTER_ADDR 和 MASTER_PORT：
export DISABLE_VERSION_CHECK=1
FORCE_TORCHRUN=1 NNODES=${NNODES} NODE_RANK=${NODE_RANK} MASTER_ADDR=${MASTER_ADDR} MASTER_PORT=${MASTER_PORT}
llamafactory-cli train examples/hunyuan/hunyuan_full.yaml

作者：Ddd4j 创建时间：2025-08-04 23:08
最后编辑：Ddd4j 更新时间：2026-02-27 09:37