Export Guide

LEGIONHETO can export a model in three forms: a GGUF file for llama.cpp, a standalone LoRA adapter, or a fully merged model.

GGUF Export

Export to the GGUF format used by llama.cpp:

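from legionheto import export_to_gguf

# `model` and `tokenizer` are assumed to be already loaded.
export_to_gguf(
    model=model,
    tokenizer=tokenizer,
    output_path="./model.gguf",  # destination GGUF file
    quantization="Q4_K_M",       # quantization type, see the list below
)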

Supported quantization types (the Q4_K_M value used in this guide's examples is a llama.cpp variant of Q4_K):

Q4_0, Q4_1
Q5_0, Q5_1
Q8_0
Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q8_K
F16, F32
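
Checking a quantization name before starting a long export can catch typos early. The sketch below is not part of LEGIONHETO: `SUPPORTED_QUANT_TYPES` and `normalize_quantization` are hypothetical names, the set is copied from the list above, and the handling of `_S`/`_M`/`_L` suffixes (as in Q4_K_M) is an assumption about how variant names relate to the listed K-quant types.

```python
import re

# Names copied from the list of supported quantization types above.
SUPPORTED_QUANT_TYPES = {
    "Q4_0", "Q4_1",
    "Q5_0", "Q5_1",
    "Q8_0",
    "Q2_K", "Q3_K", "Q4_K", "Q5_K", "Q6_K", "Q8_K",
    "F16", "F32",
}

# Assumption: a trailing _S, _M, or _L (as in "Q4_K_M") selects a
# variant of one of the listed K-quant types.
_K_VARIANT = re.compile(r"^(Q\d_K)_[SML]$")


def normalize_quantization(name: str) -> str:
    """Return the upper-cased name, or raise ValueError if it is not listed."""
    candidate = name.strip().upper()
    match = _K_VARIANT.match(candidate)
    family = match.group(1) if match else candidate
    if family not in SUPPORTED_QUANT_TYPES:
        raise ValueError(f"unsupported quantization type: {name!r}")
    return candidate
```

The returned string could then be passed as the `quantization` argument of `export_to_gguf`.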

CLI Export

The equivalent export from the command line, reading the model from ./output:

legionheto export \
    --model ./output \
    --output ./model.gguf \
    --format gguf \
    --quantization Q4_K_M
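
When the export runs as part of a script, the command above can be assembled and launched from Python. This is a minimal sketch: `build_export_command` and `run_export` are hypothetical helpers, and the only flags used are the ones shown in the command above.

```python
import subprocess
from typing import List


def build_export_command(
    model_dir: str,
    output_path: str,
    fmt: str = "gguf",
    quantization: str = "Q4_K_M",
) -> List[str]:
    """Build the argument list for the `legionheto export` command shown above."""
    return [
        "legionheto", "export",
        "--model", model_dir,
        "--output", output_path,
        "--format", fmt,
        "--quantization", quantization,
    ]


def run_export(model_dir: str, output_path: str, **options: str) -> None:
    """Run the export and raise CalledProcessError on a non-zero exit code."""
    command = build_export_command(model_dir, output_path, **options)
    subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with paths that contain spaces.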

Adapter Export

Export only the LoRA adapter weights, without the base model:

model.save_adapter("./adapter")

Merged Model Export

Merge the LoRA weights into the base model and export the full model:

model.merge_and_unload()
model.save_pretrained("./merged")
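
For intuition about what merging does to the weights, the standard LoRA formulation folds each low-rank update into its base matrix as W' = W + (alpha / r) * B A. The sketch below illustrates that arithmetic on a toy 2x2 matrix in plain Python. It shows the general technique, not LEGIONHETO's implementation, and `matmul` and `merge_lora` are hypothetical helpers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # base weight, out x in = 2 x 2
lora_a = [[1.0, 2.0]]              # A, r x in = 1 x 2
lora_b = [[0.5], [0.25]]           # B, out x r = 2 x 1

merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved as an ordinary checkpoint.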