utils.data.sft


Data handling utilities specific to supervised fine-tuning (SFT).

Functions

Name              Description
----------------  ----------------------------------------------------------------
prepare_datasets  Prepare training and evaluation datasets based on configuration.

prepare_datasets

utils.data.sft.prepare_datasets(
    cfg,
    tokenizer,
    processor=None,
    preprocess_iterable=False,
)

Prepare the training and evaluation datasets specified by the axolotl configuration (cfg).

Parameters

Name                 Type                   Description                                        Default
-------------------  ---------------------  -------------------------------------------------  --------
cfg                  DictDefault            Dictionary mapping axolotl config keys to values.  required
tokenizer            PreTrainedTokenizer    Tokenizer to use for processing text.              required
processor            ProcessorMixin | None  Optional processor for multimodal datasets.        None
preprocess_iterable  bool                   Whether to use iterable preprocessing.             False

Returns

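Type                                                                          Description
----------------------------------------------------------------------------  ---------------------------------------------------------------
tuple[IterableDataset | Dataset, Dataset | None, int, list[Prompter | None]]  Tuple of (train_dataset, eval_dataset, total_steps, prompters).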
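A minimal sketch of how a caller might consume the documented return value. It assumes only the tuple layout listed under Returns. The helper describe_prepared and the stand-in values are hypothetical, and plain Python objects substitute for real datasets so the sketch runs without axolotl or Hugging Face datasets installed. Because preprocess_iterable=True can produce an IterableDataset, the sketch does not assume the training dataset supports len(). It also treats eval_dataset and individual prompters as possibly None, as the return type allows.

```python
from typing import Any, List, Optional, Tuple

# Shape of the documented return value. Element types are left as Any so this
# sketch runs without axolotl or Hugging Face `datasets` (an assumption for
# illustration; the real elements are Dataset/IterableDataset/Prompter objects).
PreparedDatasets = Tuple[Any, Optional[Any], int, List[Optional[Any]]]


def describe_prepared(result: PreparedDatasets) -> dict:
    """Hypothetical helper: unpack the 4-tuple and summarize what was returned."""
    train_dataset, eval_dataset, total_steps, prompters = result

    # An iterable (streamed) train dataset may not implement len(), so check first.
    train_size = len(train_dataset) if hasattr(train_dataset, "__len__") else None

    return {
        "train_size": train_size,
        # eval_dataset is typed Dataset | None, so None means no evaluation split.
        "has_eval": eval_dataset is not None,
        "total_steps": total_steps,
        # Entries in the prompters list are typed Prompter | None.
        "num_prompters": sum(p is not None for p in prompters),
    }


if __name__ == "__main__":
    # Stand-in values shaped like the documented tuple (not real datasets).
    in_memory = (["a", "b", "c"], None, 10, [None, "prompter"])
    streamed = (iter(["a", "b"]), ["x"], 4, ["prompter"])
    print(describe_prepared(in_memory))
    print(describe_prepared(streamed))
```

Checking for __len__ rather than the concrete class keeps the helper independent of which dataset type the preprocess_iterable setting produced.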