Training Guide
LEGIONHETO supports three training methods: SFT, DPO, and ORPO.
Supervised Fine-Tuning (SFT)
Standard instruction fine-tuning, shown here with LoRA adapters:
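```python
from legionheto import LegionHetoModel, SFTTrainer

# Load the base model and attach LoRA adapters (rank 16, alpha 32)
model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
model.setup_lora(r=16, alpha=32)

# `dataset` is an instruction-tuning dataset you have already loaded
trainer = SFTTrainer(
    model=model,
    dataset=dataset,
    output_dir="./output",
    num_train_epochs=3,
    learning_rate=2e-4,
)
trainer.train()
```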
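SFT minimizes the ordinary next-token cross-entropy on the training examples. Many instruction-tuning setups additionally mask out the prompt tokens so that only the response contributes to the loss; this guide does not say whether `SFTTrainer` does so, so the following is only a minimal, framework-free sketch of that masked loss. The function name `masked_sft_loss` and the toy probabilities are illustrative, not part of LEGIONHETO.

```python
import math
from typing import Sequence


def masked_sft_loss(token_logprobs: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over the tokens where loss_mask == 1.

    token_logprobs[i] is log p(token_i | previous tokens) under the model.
    loss_mask[i] is 1 for response tokens and 0 for prompt tokens.
    """
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have the same length")
    selected = [lp for lp, m in zip(token_logprobs, loss_mask) if m]
    if not selected:
        raise ValueError("loss_mask selects no tokens")
    return -sum(selected) / len(selected)


# Toy example: 3 prompt tokens (masked out) followed by 2 response tokens.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.7), math.log(0.5), math.log(0.25)]
mask = [0, 0, 0, 1, 1]
print(round(masked_sft_loss(logprobs, mask), 4))  # 1.0397
```

Only the last two tokens count here, so the loss is the average of -log 0.5 and -log 0.25; the confident prompt tokens do not pull it down.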
Direct Preference Optimization (DPO)
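Train directly on preference data, where each example pairs a preferred (chosen) response with a rejected one for the same prompt. DPO also uses a reference model that stays fixed during training: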
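```python
from legionheto import LegionHetoModel, DPOTrainer

# Policy model (the one being trained)
model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
# Reference model: a separate copy of the same base model; DPO keeps it fixed
ref_model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
# Attach LoRA adapters to the policy model only
model.setup_lora(r=16, alpha=32)

# `preference_dataset` holds chosen/rejected response pairs, loaded beforehand
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    dataset=preference_dataset,
    output_dir="./dpo_output",
    beta=0.1,  # strength of the penalty for drifting away from ref_model
)
trainer.train()
```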
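To make the roles of `beta` and `ref_model` concrete, here is a minimal sketch of the standard per-pair DPO loss in plain Python, rather than LEGIONHETO's internals (which this guide does not show). The inputs are assumed to be summed token log-probabilities of the chosen and rejected responses under the trained model and the reference model; the function names are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response given the prompt.
    """
    # How much more (log) likely each response became relative to the reference model
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the preference margin; the loss is -log sigmoid(margin)
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Before training the policy equals the reference, so the margin is 0 and the loss is log 2.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
# Raising the chosen response's likelihood relative to the reference lowers the loss.
print(round(dpo_loss(-10.0, -15.0, -12.0, -15.0), 4))
```

Because the loss depends only on differences from the reference model, a larger `beta` makes the same shift in log-probabilities count for more, while a smaller `beta` lets the policy move further from `ref_model` before the loss saturates.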
Odds Ratio Preference Optimization (ORPO)
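Combined SFT and preference optimization in a single loss, with no reference model. ORPO trains on the same kind of chosen/rejected preference pairs as DPO: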
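```python
from legionheto import LegionHetoModel, ORPOTrainer

# Only one model is needed: ORPO does not use a reference model
model = LegionHetoModel("meta-llama/Llama-2-7b-hf")
model.setup_lora(r=16, alpha=32)

# ORPO expects preference pairs (chosen/rejected), like DPO
trainer = ORPOTrainer(
    model=model,
    dataset=preference_dataset,
    output_dir="./orpo_output",
    beta=0.1,  # weight of the odds-ratio term relative to the SFT loss
)
trainer.train()
```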
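ORPO adds an odds-ratio penalty to the ordinary SFT loss on the chosen response, which is why it needs no reference model. The sketch below computes that objective for a single pair in plain Python. It assumes the length-normalized likelihood from the original ORPO formulation, P(y|x) = exp(mean per-token log-probability), and it assumes that the trainer's `beta` plays the role of the paper's weight λ (passed here as `lam`). The function names are illustrative and not LEGIONHETO API.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_odds(mean_token_logp: float) -> float:
    """log(P / (1 - P)) where P = exp(mean per-token log-probability)."""
    # log(1 - P) is computed as log1p(-exp(logp)); requires P < 1, i.e. mean_token_logp < 0.
    return mean_token_logp - math.log1p(-math.exp(mean_token_logp))


def orpo_loss(chosen_mean_logp: float, rejected_mean_logp: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus lam times the odds-ratio loss."""
    sft_term = -chosen_mean_logp  # mean token NLL of the chosen response
    or_term = -log_sigmoid(log_odds(chosen_mean_logp) - log_odds(rejected_mean_logp))
    return sft_term + lam * or_term


# Chosen response: P = 0.5 (odds 1). Rejected response: P = 0.2 (odds 0.25).
print(round(orpo_loss(math.log(0.5), math.log(0.2)), 4))  # 0.7155
```

In this example the SFT term contributes log 2 and the odds-ratio term contributes 0.1 × log 1.25; with `lam=0.0` the objective reduces to plain SFT on the chosen responses.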
Configuration Options
All trainers support these common parameters:
| Parameter | Default | Description |
|---|---|---|
| num_train_epochs | 3 | Number of passes over the training dataset |
| per_device_train_batch_size | 4 | Training batch size per device |
| gradient_accumulation_steps | 4 | Number of batches whose gradients are accumulated before each optimizer step |
| learning_rate | 2e-4 | Optimizer learning rate |
| warmup_steps | 100 | Number of optimizer steps over which the learning rate is warmed up |
| max_grad_norm | 0.3 | Maximum gradient norm used for gradient clipping |
| weight_decay | 0.01 | Weight decay coefficient |
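With the defaults above, each optimizer step sees per_device_train_batch_size × gradient_accumulation_steps × (number of devices) examples, i.e. 16 per device. The sketch below works out the effective batch size and total number of optimizer steps, and shows global-norm gradient clipping with `max_grad_norm = 0.3`. It rests on several assumptions: the device count and dataset size are hypothetical inputs, a partial final accumulation window is counted as a full step (rounding up), and clipping follows the common global-norm rescaling rule (as in PyTorch's `clip_grad_norm_`). LEGIONHETO's exact behavior is not documented here.

```python
import math
from typing import List


def effective_batch_size(per_device_batch: int = 4, grad_accum_steps: int = 4, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def total_optimizer_steps(num_examples: int, num_epochs: int = 3, **batch_kwargs: int) -> int:
    """Assumes a final partial accumulation window still triggers a step (rounds up)."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(**batch_kwargs))
    return steps_per_epoch * num_epochs


def clip_by_global_norm(grads: List[float], max_norm: float = 0.3) -> List[float]:
    """Rescale all gradients together if their combined L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# Hypothetical run: 10,000 examples on 2 devices with the default settings.
print(effective_batch_size(num_devices=2))  # 32
print(total_optimizer_steps(10_000, num_devices=2))  # 939
# Gradients with norm 0.5 are scaled down to norm 0.3, keeping their direction.
print([round(g, 4) for g in clip_by_global_norm([0.3, 0.4])])  # [0.18, 0.24]
```

In this hypothetical run, the default `warmup_steps=100` covers roughly the first 11% of the 939 optimizer steps; on a much smaller dataset the same 100 steps would take up a correspondingly larger share of training.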