
Training Guide

LEGIONHETO supports three training methods: SFT, DPO, and ORPO.

Supervised Fine-Tuning (SFT)

Standard instruction fine-tuning:

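from legionheto import LegionHetoModel, SFTTrainer

# Load the base model and attach LoRA adapters (rank r=16, scaling alpha=32)
model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
model.setup_lora(r=16, alpha=32)

# `dataset` is your instruction dataset, prepared beforehand (not shown here)
trainer = SFTTrainer(
    model=model,
    dataset=dataset,
    output_dir="./output",
    num_train_epochs=3,
    learning_rate=2e-4,
)

trainer.train()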
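`model.setup_lora(r=16, alpha=32)` appears in all three examples. Assuming it follows standard LoRA, each adapted weight matrix W stays frozen and a trainable low-rank product B·A of rank r is added to it, scaled by alpha / r, so these settings give a scale of 32 / 16 = 2. The plain-Python sketch below shows that forward pass on toy matrices; `matvec` and `lora_forward` are hypothetical helpers for illustration, not LEGIONHETO internals.

```python
# Minimal LoRA forward-pass sketch. Hypothetical helpers, not LEGIONHETO internals.
# y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained.


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, r, alpha):
    """Frozen weight W plus the low-rank update B @ A, scaled by alpha / r."""
    scale = alpha / r
    base = matvec(W, x)                  # frozen path: W x
    update = matvec(B, matvec(A, x))     # trainable path: B (A x)
    return [b + scale * u for b, u in zip(base, update)]


# Toy shapes: W is 2x3; rank r = 1, so A is 1x3 and B is 2x1.
# alpha / r = 2 / 1 = 2, the same ratio as r=16, alpha=32.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B = [[0.5],
     [0.0]]
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B, x, r=1, alpha=2))  # [7.0, 2.0]
```

In standard LoRA, B is initialized to zeros, so the adapted model starts out behaving exactly like the base model. The savings come from the rank: a 4096 × 4096 projection (Llama-2-7B's hidden size) has 16,777,216 weights, while a rank-16 adapter on it trains 16 × (4096 + 4096) = 131,072, under 1% as many. Which projections LEGIONHETO adapts is not specified here.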

Direct Preference Optimization (DPO)

Train on preference data, where each prompt has a chosen and a rejected response, against a fixed reference model:

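from legionheto import LegionHetoModel, DPOTrainer

# Policy model (trained) and reference model (kept fixed) load the same base checkpoint
model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
ref_model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
# Only the policy model gets LoRA adapters
model.setup_lora(r=16, alpha=32)

# `preference_dataset`: prompts paired with a chosen and a rejected response (prepared beforehand)
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    dataset=preference_dataset,
    output_dir="./dpo_output",
    beta=0.1,  # higher beta keeps the policy closer to ref_model
)

trainer.train()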
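Assuming `DPOTrainer` implements the standard DPO objective, `beta` scales the difference between how much the policy (`model`) and the fixed `ref_model` prefer the chosen response over the rejected one. The sketch below computes that per-pair loss from summed sequence log-probabilities; `log_sigmoid` and `dpo_loss` are hypothetical names, and a real trainer would obtain the log-probabilities from the two models and average the loss over a batch.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is a summed sequence log-probability log p(y | x):
    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    chosen_logratio = policy_chosen - ref_chosen        # policy shift on the chosen response
    rejected_logratio = policy_rejected - ref_rejected  # policy shift on the rejected response
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Policy identical to the reference: margin 0, loss = log 2 (about 0.693).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy raised the chosen response's log-prob by 2 nats: loss falls to about 0.598.
print(dpo_loss(-8.0, -12.0, -10.0, -12.0))
```

Because the loss depends only on log-ratios against `ref_model`, the reference model must stay fixed during training; otherwise the margin would have no stable anchor.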

Odds Ratio Preference Optimization (ORPO)

Combines SFT and preference optimization in a single training stage, with no reference model:

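from legionheto import LegionHetoModel, ORPOTrainer

model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
model.setup_lora(r=16, alpha=32)

# ORPO needs no ref_model; `dataset` must contain preference pairs
# (a prompt with a chosen and a rejected response), prepared beforehand
trainer = ORPOTrainer(
    model=model,
    dataset=dataset,
    output_dir="./orpo_output",
    beta=0.1,  # assumed: weight of the odds-ratio term, as in common ORPO implementations
)

trainer.train()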
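As published, ORPO adds an odds-ratio penalty to the ordinary SFT loss: it rewards raising the odds of the chosen response relative to the rejected one, where odds(y) = P(y) / (1 - P(y)) and P(y) is the length-normalized (per-token average) likelihood. Both terms come from the model being trained, which is why no reference model appears. The sketch below assumes `beta` weights the odds-ratio term, as in common ORPO implementations; `log_odds` and `orpo_loss` are hypothetical names.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_odds(mean_logp):
    """log(p / (1 - p)) for p = exp(mean_logp); requires mean_logp < 0."""
    return mean_logp - math.log1p(-math.exp(mean_logp))


def orpo_loss(chosen_token_logps, rejected_token_logps, beta=0.1):
    """ORPO loss for one pair, from per-token log-probs of each response."""
    # Length-normalized log-likelihoods, as in the ORPO formulation
    chosen_mean = sum(chosen_token_logps) / len(chosen_token_logps)
    rejected_mean = sum(rejected_token_logps) / len(rejected_token_logps)
    sft_loss = -chosen_mean  # ordinary NLL on the chosen response
    # Odds-ratio term: push odds(chosen) above odds(rejected)
    or_loss = -log_sigmoid(log_odds(chosen_mean) - log_odds(rejected_mean))
    return sft_loss + beta * or_loss  # beta assumed to weight the OR term


print(orpo_loss([-0.5, -0.5], [-2.0, -2.0]))  # chosen already more likely: small OR penalty
print(orpo_loss([-2.0, -2.0], [-0.5, -0.5]))  # rejected more likely: larger loss
```

When the chosen and rejected responses are equally likely, the odds-ratio term equals log 2, so the loss is the SFT loss plus `beta` × log 2.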

Configuration Options

All trainers support these common parameters:

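Parameter                     Default  Description
num_train_epochs              3        Number of full passes over the training dataset
per_device_train_batch_size   4        Training examples per batch on each device
gradient_accumulation_steps   4        Batches whose gradients are accumulated before each optimizer step
learning_rate                 2e-4     Peak learning rate, reached after warmup
warmup_steps                  100      Steps over which the learning rate ramps up to learning_rate
max_grad_norm                 0.3      Gradient-clipping threshold (maximum gradient norm)
weight_decay                  0.01     Weight-decay coefficient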
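These parameters interact. Each optimizer step accumulates gradients over per_device_train_batch_size × gradient_accumulation_steps = 4 × 4 = 16 examples per device, multiplied by the number of devices under data parallelism. The sketch below works through that arithmetic, assuming (as in Hugging Face Transformers) that warmup is linear and counted in optimizer steps, and applies the usual clip-by-global-norm rule for `max_grad_norm`. LEGIONHETO's actual scheduler and clipping code are not documented here, and all function names are hypothetical.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples contributing to one optimizer step under data parallelism."""
    return per_device_batch * grad_accum_steps * num_devices


def total_optimizer_steps(num_examples, per_device_batch, grad_accum_steps,
                          num_devices=1, num_epochs=3):
    """Approximate optimizer steps; trainers differ on a partial final batch."""
    per_step = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    return math.ceil(num_examples / per_step) * num_epochs


def warmup_lr(step, peak_lr=2e-4, warmup_steps=100):
    """Assumed linear warmup from 0 to peak_lr; any later decay is omitted."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


def clip_by_global_norm(grads, max_norm=0.3):
    """Rescale gradients so their combined L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        return [g * (max_norm / total) for g in grads]
    return list(grads)


print(effective_batch_size(4, 4))            # 16 examples per optimizer step on one device
print(total_optimizer_steps(10_000, 4, 4))   # 1875 steps over 3 epochs
print(clip_by_global_norm([3.0, 4.0]))       # norm 5.0 rescaled to 0.3
```

For example, 10,000 training examples on one device give 625 optimizer steps per epoch and 1,875 over the default 3 epochs, so the default 100 warmup steps cover roughly the first 5% of training.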