Skywork-13B-Math-8bits

Skywork-13B-Math is a model specifically trained to strengthen mathematical ability. Among models at the 13B scale, it scores first on the GSM8K benchmark and also performs strongly on the MATH and CMATH datasets, placing it in the top tier of 13B models.
Skywork-13B-Math-8bits is the 8-bit quantized version of Skywork-13B-Math, which allows users to deploy and run inference on consumer-grade GPUs.

Hugging Face: https://huggingface.co/Skywork/Skywork-13B-Math-8bits
GitHub: https://github.com/SkyworkAI/Skywork

Skywork-13B-Math Evaluation (Results)

Skywork-13B-Math further strengthens mathematical ability relative to the Base model. We evaluated it on the mainstream math benchmarks GSM8K, MATH, and CMATH. The results show that among 13B-scale models, our model scores first on GSM8K and CMATH, and also ranks among the leaders on MATH.

Requirements

  • Python 3.8 or later
  • PyTorch 2.0 or later
  • CUDA 11.4 or later is recommended (a quick version check is sketched below).
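
As a quick sanity check, the following snippet (an illustrative sketch, not part of the official repository) prints the versions the requirements above refer to:

import sys
import torch

# Print interpreter, PyTorch, and CUDA versions to compare against the requirements.
print("Python:", sys.version.split()[0])             # 3.8+ required
print("PyTorch:", torch.__version__)                 # 2.0+ required
print("CUDA available:", torch.cuda.is_available())
print("CUDA (build) version:", torch.version.cuda)   # 11.4+ recommended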

For the Skywork-13B-Base, Skywork-13B-Chat, and Skywork-13B-Math models, run the script below to install the Python dependencies.

pip install -r requirements.txt


Quickstart

We have open-sourced the model weights, configuration files, tokenizer, and related assets on Hugging Face and ModelScope.
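
For example, the weights can be fetched programmatically with the huggingface_hub library (a minimal sketch; the repo id matches the Hugging Face link above, while the local directory name is an arbitrary choice):

from huggingface_hub import snapshot_download

# Download the model repository (weights, config, tokenizer) to a local directory.
local_dir = snapshot_download(
    repo_id="Skywork/Skywork-13B-Math-8bits",
    local_dir="./Skywork-13B-Math-8bits",  # arbitrary local path
)
print("Model files downloaded to:", local_dir)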

Hugging Face Model Demonstration

Math Model Inference
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer_path = ""   # local path or repo id of the tokenizer
checkpoint_path = ""  # local path or repo id of the model weights

tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_path, use_fast=False, trust_remote_code=True, padding_side='left')

model = AutoModelForCausalLM.from_pretrained(
    checkpoint_path, device_map="auto", trust_remote_code=True).eval()
tokenizer.add_tokens(["[USER]", "[BOT]", "[SEP]"])

def special_encode(prompt, tokenizer):
    """Encode a user prompt into the [USER]...[SEP][BOT] dialogue format."""
    raw_str = "[USER]%s[SEP][BOT]" % prompt.strip().replace("\r", "")
    eos_id = tokenizer.eos_token_id
    bos_id = tokenizer.bos_token_id
    sep_id = tokenizer.encode("[SEP]")[-1]
    res_id = [eos_id, bos_id]
    arr = raw_str.split("[SEP]")
    for elem_idx in range(len(arr)):
        elem = arr[elem_idx]
        # Drop the BOS token that the tokenizer prepends to each segment.
        elem_id = tokenizer.encode(elem)[1:]
        res_id += elem_id
        if elem_idx < len(arr) - 1:
            res_id.append(sep_id)

    return res_id

def extract_res(response):
    """Strip role markers and special tokens, keeping only the model's answer."""
    if "[BOT]" in response:
        response = response.split("[BOT]")[1]
    if "<s>" in response:
        response = response.split("<s>")[-1]
    if "</s>" in response:
        response = response.split("</s>")[0]
    if "[SEP]" in response:
        response = response.split("[SEP]")[0]
    return response


if __name__ == '__main__':
    # Prompt (in Chinese): "Xiao Wang wants to dilute 150 kg of pesticide with a 20%
    # active-ingredient concentration into a 5% solution. How many kilograms of water
    # must be added?"
    text = "小王要将150千克含药量20%的农药稀释成含药量5%的药水.需要加水多少千克?"
    text_token_ids = torch.tensor(special_encode(
        text, tokenizer)).to(model.device).reshape(1, -1)
    response = model.generate(text_token_ids, do_sample=False, max_length=512)
    response_text = tokenizer.decode(response.cpu()[0], skip_special_tokens=True)

    response_text = extract_res(response_text)
    print(response_text)
    """Skywork-13B-Math Response (in Chinese):
    首先,我们需要计算出150千克含药量20%的农药中含有多少千克的药。\n\n150千克 * 20% = 30千克\n\n然后,我们需要计算出要得到含药量5%的药水,需要多少千克的药水。\n\n30千克 / 5% = 600千克\n\n最后,我们需要计算出需要加多少千克的水。\n\n600千克 - 150千克 = 450千克\n\n所以答案是,小王需要加450千克的水。
    """
The English GSM8K example below reuses the identical setup and the same special_encode / extract_res helpers; only the prompt in the __main__ block changes:
if __name__ == '__main__':
    text="Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"
    text_token_ids = torch.tensor(special_encode(
        text, tokenizer)).to(model.device).reshape(1, -1)
    response = model.generate(text_token_ids, do_sample=False, max_length=512)
    response_text = tokenizer.decode(response.cpu()[0], skip_special_tokens=True)
    response_text = extract_res(response_text)
    print(response_text)
    """Skywork-13B-Math Response:
    First, we need to find out how many eggs Janet has left after eating for breakfast and baking for her friends. \n\nShe has 16 eggs per day, eats 3 for breakfast and uses 4 for baking. So, 16 - 3 - 4 = 9 eggs are left for selling at the farmers' market.\n\nSince she sells each egg for $2, she makes 9 * 2 = $<<9*2=18>>18 every day at the farmers' market.\n\nSo, the answer is $18.
    """

Quantization

Int8 Quantization

Skywork uses the mainstream 8-bit quantization method BitsAndBytes. Quantization with this method is essentially lossless in performance, and it is already integrated into the transformers library. Based on BitsAndBytes, we provide two options: online quantization and an offline pre-quantized 8-bit model.

Below we provide examples of how to use the int8 quantized model. Before you begin, install the BitsAndBytes library and its required dependencies; see the BitsAndBytes repository for installation details.
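
A typical installation might look like the following (assuming the standard PyPI package names; see the BitsAndBytes repository for authoritative instructions):

pip install bitsandbytes accelerate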

Online Quantization

model = AutoModelForCausalLM.from_pretrained(
    "skywork-13B-Base", torch_dtype=torch.bfloat16,
    load_in_8bit=True, trust_remote_code=True).eval()
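
Recent versions of transformers prefer an explicit BitsAndBytesConfig over the bare load_in_8bit flag; an equivalent sketch under that assumption:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Online int8 quantization via the explicit quantization config.
model = AutoModelForCausalLM.from_pretrained(
    "skywork-13B-Base",
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    trust_remote_code=True,
).eval()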

Offline Quantization

model = AutoModelForCausalLM.from_pretrained(
    "skywork-13B-Base-8bits", device_map="auto",
    torch_dtype=torch.bfloat16, trust_remote_code=True).eval()
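
Since this page documents Skywork-13B-Math-8bits, loading the pre-quantized Math model should work the same way with the repository id from the Hugging Face link above swapped in (a sketch under that assumption):

from transformers import AutoModelForCausalLM
import torch

# Load the offline 8-bit Skywork-13B-Math checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "Skywork/Skywork-13B-Math-8bits",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()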

Evaluation

We tested the quantized model on standard benchmark datasets; the results are as follows:

Precision   C-Eval   MMLU   CMMLU
bf16        60.6     61.8   62.1
8bits       58.5     61.8   61.0

GPU Memory Usage (GB)

Precision   Skywork-13B
bf16        25.91
8bits       13.57
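
Peak usage on your own hardware can be checked with PyTorch's allocator statistics (an illustrative sketch; actual figures vary with sequence length and generation settings):

import torch

# Reset the allocator's peak counter, run inference, then report peak memory in GB.
torch.cuda.reset_peak_memory_stats()
# ... run model.generate(...) here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.2f} GB")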