Fine-tuning Qwen on Multi-turn Conversation Data

Peng Xia

Training on multi-turn data teaches a model to track context across consecutive turns and to keep the dialogue coherent and its role consistent. Compared with direct generation (single-turn Q&A), multi-turn training lets the model handle complex conversational scenarios and interact more naturally. Direct generation only sees a single question and answer, so it cannot capture the conversation history.

Below, qwen2.5-3b-instruct is fine-tuned on a mental health counseling dataset in the messages format.

Environment Setup

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download Qwen/Qwen2.5-3B-Instruct --local-dir DC/qwen2.5-3B-ins --resume-download
import os, json, torch
from torch.utils.data import Dataset
from typing import Dict, Optional, List
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["CUDA_VISIBLE_DEVICES"] = "1"
model_name_or_path = "DC/qwen2.5-3B-ins"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
print(model.dtype)

It is best to specify torch_dtype="auto"; otherwise the model is loaded in fp32 precision.
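
As a quick sanity check, you can compare the memory footprint of the two loading modes (a minimal sketch; loading the model twice here is purely for illustration, and the footprint numbers are approximate):

# get_memory_footprint() reports the bytes used by parameters and buffers.
model_fp32 = AutoModelForCausalLM.from_pretrained(model_name_or_path)
print(model_fp32.dtype, model_fp32.get_memory_footprint() / 1e9, "GB")  # torch.float32, ~12 GB

model_auto = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype="auto")
print(model_auto.dtype, model_auto.get_memory_footprint() / 1e9, "GB")  # torch.bfloat16, ~6 GB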

Inference Test: Direct Generation vs. Chat Generation

# Direct generation
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7, top_p=0.9)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)
# Chat generation
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]
# Use tokenizer.apply_chat_template to render the messages; this only works for models that ship a chat template.
# It wraps each message in role markers. add_generation_prompt appends the assistant's opening marker, which the
# model does not need to generate itself; adding it directly signals to the model that it is now its turn to speak.
inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(inputs, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(output_text)
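
To see exactly what add_generation_prompt adds, render the same messages with and without it and compare the tails (a minimal sketch reusing the tokenizer loaded above):

# Without the generation prompt the string ends at the last <|im_end|>;
# with it, the assistant's opening marker is appended.
without_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
with_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(repr(with_prompt[len(without_prompt):]))  # '<|im_start|>assistant\n'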

Direct generation:

What is the capital of France? The capital of France is Paris. Paris is a beautiful city known for its art, fashion, food, and culture. It is also home to many famous landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum


Chat-format generation:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>

The difference is the special role markers: they make explicit which content belongs to the human and which to the model, and interleaving human and model messages turns generation into a dialogue.
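
Building on these markers, a multi-turn conversation is just a loop that feeds each generated reply back into messages as context (a minimal sketch; the user turns here are hypothetical):

messages = [{"role": "system", "content": "You are a helpful assistant."}]
user_turns = ["What is the capital of France?", "And what about Germany?"]  # hypothetical turns

for user_turn in user_turns:
    messages.append({"role": "user", "content": user_turn})
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    # Decode only the newly generated tokens, i.e. the assistant's reply.
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})  # the reply becomes context for the next turn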

Loading the Dataset

from datasets import load_dataset

ds = load_dataset("Amod/mental_health_counseling_conversations")
print(ds)
print(json.dumps(ds["train"][0], indent=2, ensure_ascii=False))
DatasetDict({
    train: Dataset({
        features: ['Context', 'Response'],
        num_rows: 3512
    })
})
{
  "Context": "I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.\n I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.\n How can I change my feeling of being worthless to everyone?",
  "Response": "If everyone thinks you're worthless, then maybe you need to find new people to hang out with.Seriously, the social context in which a person lives is a big influence in self-esteem.Otherwise, you can go round and round trying to understand why you're not worthless, then go back to the same crowd and be knocked down again.There are many inspirational messages you can find in social media.  Maybe read some of the ones which state that no person is worthless, and that everyone has a good purpose to their life.Also, since our culture is so saturated with the belief that if someone doesn't feel good about themselves that this is somehow terrible.Bad feelings are part of living.  They are the motivation to remove ourselves from situations and relationships which do us more harm than good.Bad feelings do feel terrible.   Your feeling of worthlessness may be good in the sense of motivating you to find out that you are much better than your feelings today."
}

Formatting the Data into OpenAI-style messages

Each record in the raw mental health counseling dataset (ds) is converted into conversation format (message_format_ds), with one set of user-assistant messages per record, ready for subsequent chat-model training or inference.

def process_into_chat_format(example):
    messages = []
    messages.append({
        "role": "system",
        "content": "You are a mental health counselor to help those who are suffering from a number of disorders including anxiety or depression."
    })
    messages.append({
        "role": "user",
        "content": example["Context"].replace("\xa0", " ")
    })
    messages.append({
        "role": "assistant",
        "content": example["Response"].replace("\xa0", " ")
    })
    return {"conversations": messages}

message_format_ds = ds.map(
    process_into_chat_format,
    remove_columns=["Context", "Response"]
)
message_format_ds = message_format_ds['train'].train_test_split(test_size=0.1, seed=42)

print(message_format_ds)
print(json.dumps(message_format_ds['train'][0], indent=2, ensure_ascii=False))
DatasetDict({
    train: Dataset({
        features: ['conversations'],
        num_rows: 3160
    })
    test: Dataset({
        features: ['conversations'],
        num_rows: 352
    })
})
{
  "conversations": [
    {
      "content": "You are a mental health counselor to help those who are suffering from a number of disorders including anxiety or depression.",
      "role": "system"
    },
    {
      "content": "I just took a job that requires me to travel far away from home. My family and I really need this job.\n People keep telling me I have \"anxiety\" and I'm terrified of having an anxiety attack on the road. This is all new to me. What can I do?",
      "role": "user"
    },
    {
      "content": "It is ok to have anxiety. Please don't be anxious about being anxious.If you feel anxiety coming over you, then pull off the road to a safe place. Concentrate on centering yourself and to breath slowly. Take some sips of water. Sit still. The anxiety should pass in about twenty minutes.If it does not pass, then continue calming yourself until you feel safe enough to drive to your hotel. You can always explain to your supervisor that you were taking care of a medical problem, because anxiety is a medical problem.",
      "role": "assistant"
    }
  ]
}

Tokenizing the Conversations

tokenizer.apply_chat_template is convenient at inference time, but when preparing training data, the content of each role and the special tokens must be handled separately. The function below is written against the Qwen template.

def preprocess_openai_messages_qwen_format(
    messages: List[Dict[str, str]],
    tokenizer: AutoTokenizer,
    max_length: int = 2048
) -> Dict[str, List[int]]:
    """
    Convert a conversation into Qwen-format input features (input_ids, labels,
    attention_mask) for fine-tuning. Compared with non-chat data, we must follow
    the chat format and set labels only on the assistant content. A decode helper
    is provided below to check that the preprocessing is correct.
    """
    input_ids = []
    labels = []

    for msg in messages:
        role = msg["role"]
        content = msg["content"]

        # 1. <|im_start|>{role}\n -> never trained on
        prefix = f"<|im_start|>{role}\n"
        prefix_ids = tokenizer(prefix, add_special_tokens=False)["input_ids"]
        input_ids.extend(prefix_ids)
        labels.extend([-100] * len(prefix_ids))

        # 2. content -> trained on only for the assistant
        content_ids = tokenizer(content, add_special_tokens=False)["input_ids"]
        input_ids.extend(content_ids)
        if role == "assistant":
            labels.extend(content_ids)
        else:
            labels.extend([-100] * len(content_ids))

        # 3. <|im_end|> -> trained on only for the assistant
        suffix = "<|im_end|>"
        suffix_ids = tokenizer(suffix, add_special_tokens=False)["input_ids"]
        input_ids.extend(suffix_ids)
        if role == "assistant":
            labels.extend(suffix_ids)
        else:
            labels.extend([-100] * len(suffix_ids))

        # 4. trailing newline between messages
        input_ids.extend(tokenizer('\n', add_special_tokens=False)["input_ids"])
        labels.append(-100)

    assert len(input_ids) == len(labels), "Input IDs and labels must have the same length."
    # Truncate
    input_ids = input_ids[:max_length]
    labels = labels[:max_length]
    attention_mask = [1] * len(input_ids)

    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": attention_mask
    }

def decode_labels(labels: List[int], tokenizer: AutoTokenizer) -> List[str]:
    # Decode each contiguous run of non--100 tokens in labels separately.
    segments = []
    current = []
    for t in labels:
        if t != -100:
            current.append(t)
        else:
            if current:
                segments.append(tokenizer.decode(current, skip_special_tokens=False))
            current = []
    if current:
        segments.append(tokenizer.decode(current, skip_special_tokens=False))
    return segments

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is the capital of Germany?"},
    {"role": "assistant", "content": "The capital of Germany is Berlin."}
]

sample = preprocess_openai_messages_qwen_format(messages, tokenizer)

print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False))

# Print the decoded input_ids
print("Decoded input:\n{}".format(tokenizer.decode(sample["input_ids"], skip_special_tokens=False)))

# Print the decoded labels (only the tokens that are trained on)
print("Decoded labels:\n{}".format(decode_labels(sample["labels"], tokenizer)))

tokenizer.apply_chat_template:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
<|im_start|>user
What is the capital of Germany?<|im_end|>
<|im_start|>assistant
The capital of Germany is Berlin.<|im_end|>

Decoded input:

Decoded input:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
<|im_start|>user
What is the capital of Germany?<|im_end|>
<|im_start|>assistant
The capital of Germany is Berlin.<|im_end|>

Decoded labels:
['The capital of France is Paris.<|im_end|>', 'The capital of Germany is Berlin.<|im_end|>']

In labels, the only positions that are not -100 are the assistant content and the <|im_end|> that closes it.

def wrapped_preprocess(example, tokenizer, max_length=2048):
    # Called once per example (batched=False); example["conversations"] is a single conversation.
    return preprocess_openai_messages_qwen_format(example["conversations"], tokenizer, max_length)

input_ds = message_format_ds.map(
    wrapped_preprocess,
    remove_columns=["conversations"],
    desc="Processing training dataset",
    fn_kwargs={"tokenizer": tokenizer, "max_length": 2048}  # max_length=8192 OOMs: two samples are extremely long, while most are under 1k tokens
)

This cell is optional: static bucketing, i.e. sorting so that samples of similar length land in the same batch, which reduces padding. All results below were produced without it.

input_ds = input_ds.map(lambda x: {"length": len(x["input_ids"])}, desc="Calculating input length")
input_ds = input_ds.sort("length")  # sort!
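
A dynamic alternative that avoids globally sorting the dataset is the Trainer's built-in length grouping, which batches length-similar samples while still shuffling (a sketch, not used below; length_column_name points at the "length" column computed above):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./tmp",
    group_by_length=True,          # use a length-grouped sampler
    length_column_name="length",   # precomputed length column avoids re-tokenizing
)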
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    return_tensors="pt",
    padding=True,
)

samples = [input_ds['train'][i] for i in range(3)]
batch = data_collator(samples)
for key, value in batch.items():
    print(f"{key}: {value.shape}")

input_ids: torch.Size([3, 516])
attention_mask: torch.Size([3, 516])
labels: torch.Size([3, 516])
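
Note that DataCollatorForSeq2Seq pads labels with -100 (its default label_pad_token_id) rather than the pad token, so padded positions are also excluded from the loss. A quick check on the batch above:

# Masked positions (-100) cover non-assistant tokens plus label padding.
masked_fraction = (batch["labels"] == -100).float().mean().item()
print(f"fraction of masked label positions: {masked_fraction:.2%}")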

Checking Batch Lengths

Guard against overly long samples, which can cause a sudden OOM.

import torch
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from tqdm import tqdm

def plot_batch_lengths(dataset, data_collator, batch_size=1, title="Batch Token Lengths"):
    dataloader = DataLoader(
        dataset,
        batch_size=batch_size,
        collate_fn=data_collator
    )

    batch_lengths = []
    for batch in tqdm(dataloader, desc="Analyzing batches"):
        input_ids = batch["input_ids"]
        # For a batch of several samples, take the longest (padded) length
        if isinstance(input_ids, torch.Tensor):
            length = input_ids.shape[1]
        else:
            # Guard against List[List[int]]
            length = max(len(seq) for seq in input_ids)
        batch_lengths.append(length)

    # Plot
    plt.figure(figsize=(12, 4))
    plt.plot(batch_lengths, marker='o', markersize=2, linewidth=0.8)
    plt.xlabel("Batch Index (Step)")
    plt.ylabel("Token Length")
    plt.title(title)
    plt.grid(True)
    plt.tight_layout()
    plt.show()

    return batch_lengths

# Usage
batch_lengths = plot_batch_lengths(
    dataset=input_ds['train'],
    data_collator=data_collator,
    batch_size=1,
    title="Token Length per Batch in Training Dataset"
)

max_length=8192 turned out to be too optimistic for a 4090: a few samples are very long, and around step 56 one of them triggers a sudden OOM. The final setting is max_length=1024.
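
Besides truncating, one can simply drop the outliers up front so no assistant reply gets cut mid-sentence (a sketch; the 1024 threshold mirrors the final max_length):

# Keep only samples that fit the training budget; this avoids truncated labels.
input_ds = input_ds.filter(lambda x: len(x["input_ids"]) <= 1024, desc="Dropping over-long samples")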

peft

Model Definition

from peft import (
    LoraConfig,
    TaskType,
    get_peft_model,
)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=['q_proj', 'v_proj'],
    r=16,
    lora_alpha=16
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

trainable params: 3,686,400 || all params: 3,089,625,088 || trainable%: 0.1193
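
To decide which target_modules to pass, it helps to list the model's Linear submodule names first (a sketch; run it on the base model before get_peft_model, otherwise the injected LoRA wrappers show up too):

import torch.nn as nn

# Distinct Linear layer names; for Qwen2.5 this includes q_proj, k_proj, v_proj,
# o_proj (attention) and gate_proj, up_proj, down_proj (MLP).
linear_names = {name.split(".")[-1] for name, module in model.named_modules()
                if isinstance(module, nn.Linear)}
print(sorted(linear_names))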

Training

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora-conversation-2",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    per_device_eval_batch_size=4,
    num_train_epochs=2,
    learning_rate=2e-4,
    weight_decay=0.01,
    logging_steps=10,
    save_steps=100,
    eval_strategy="steps",
    eval_steps=10,
    save_total_limit=1,
    load_best_model_at_end=False,
    report_to='none'
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=input_ds['train'],
    eval_dataset=input_ds['test'],
    data_collator=data_collator,
)

trainer.train()
trainer.evaluate()

{'eval_loss': 2.4432363510131836,
 'eval_runtime': 9.3492,
 'eval_samples_per_second': 37.65,
 'eval_steps_per_second': 9.413,
 'epoch': 1.9822784810126581}
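
Since eval_loss is the mean cross-entropy over the labeled (assistant) tokens, exp(eval_loss) gives a rough perplexity on the held-out replies:

import math
print(math.exp(2.4432))  # ≈ 11.5, perplexity over assistant tokens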

test

model_name_or_path = "DC/qwen2.5-3B-ins"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
example = message_format_ds['test'][0]['conversations']
example_i = example[:-1]  # drop the final assistant message
example_o = example[-1]   # the final message is the assistant's reference reply
inputs = tokenizer.apply_chat_template(example_i, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(inputs, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
print("Output:\n", output_text)
from peft import PeftModel, PeftConfig
peft_model_id = "lora-conversation-2/checkpoint-196"  # path where the LoRA adapter was saved
config = PeftConfig.from_pretrained(peft_model_id)

model_name_or_path = "DC/qwen2.5-3B-ins"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(base_model, peft_model_id)

outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7, top_p=0.9)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
print("Output:\n", output_text)

Reference response (from the dataset)

Instead of fighting about trust, is it possible for you and your wife to talk with other about areas which upset each of you?Whenever feelings are hurt, knowing what exactly is problematic and being heard and understood by the partner, goes a long way to building trust.These type of discussions are hard to have, and especially for the first time. A lot of emotions arise and often people lose their conversation focus from this.If you and your wife have a tough time opening up to each other, consider scheduling time with a couples therapist.Just by each of you committing time to invest in the relationship will show faith in the relationship, and this may contribute to restoring trust.


Before fine-tuning

It sounds like you’re experiencing significant distress related to feelings of betrayal and loss of trust, which can be deeply painful and challenging to navigate. It’s important to address these emotions and the situation with care and consideration for both your own emotional well-being and that of your wife.

Here are some steps you might consider:

  1. Communication: Have an open, honest conversation with your wife about how you feel. It’s crucial to express your concerns without placing blame. Use “I” statements to avoid making her defensive. For example, say something like, “I’ve been feeling really hurt by the recent changes in our relationship dynamics.”

  2. Listening: Allow your wife to express her side of the story. Sometimes, people act out of fear or insecurity. Listening can provide you with insights into why she may have acted as she did and show her that you respect her perspective.

  3. Seek Support: Consider speaking with a therapist or counselor who can help facilitate communication between you and your wife. They can offer strategies for rebuilding trust and understanding.

  4. Establish Boundaries: Set clear boundaries regarding privacy and communication with friends and romantic partners. This can help prevent similar situations from arising in the future.

  5. Self-Care: Engage in activities that promote your mental and emotional health. This could include exercise, meditation, or hobbies that bring you joy and relaxation.

  6. Professional Help: If the situation feels overwhelming, professional guidance can be invaluable. A psychologist or psychiatrist can provide tools and techniques to help manage your anxiety and depression, which are common responses to betrayal and loss of trust.

  7. Time: Give yourself and your wife time to heal. Healing takes time, and it’s essential not to rush this process.

Remember, the goal is to strengthen your relationship, not just to survive the current crisis. Trust can be rebuilt over time with patience, honesty, and commitment from both of you.

How does this resonate with you, and what specific areas do you need more assistance with?<|im_end|>


After fine-tuning

It sounds like you are in the middle of a “trust gap” between your spouse and yourself. You both are in different places emotionally regarding the issue of trust. It is a good idea for you to start by talking with your wife about what you have experienced and how it has affected you. She may not be aware of your feelings and concerns. You may also want to discuss your thoughts and feelings with someone else outside of your marriage, such as a trusted family member, friend, or therapist. Having an objective listener can help you sort through your feelings and thoughts regarding this situation. <|im_end|>

The responses are not necessarily better in absolute quality, but the overall style is much closer to the dataset.
