utils.data.sft
Data handling specific to SFT.
Functions
Name | Description |
---|---|
prepare_datasets | Prepare training and evaluation datasets based on configuration. |
prepare_datasets
utils.data.sft.prepare_datasets(
cfg,
tokenizer,
processor=None,
preprocess_iterable=False,
)
Prepare training and evaluation datasets based on configuration.
Parameters
Name | Type | Description | Default |
---|---|---|---|
cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
tokenizer | PreTrainedTokenizer | Tokenizer to use for processing text. | required |
processor | ProcessorMixin \| None | Optional processor for multimodal datasets. | None |
preprocess_iterable | bool | Whether to use iterable preprocessing. | False |
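The cfg argument is an axolotl DictDefault. The sketch below uses a minimal stand-in called ConfigStandIn, which is hypothetical and not axolotl's own class. It assumes the behaviour that matters when reading a config: keys can be read as attributes, and missing keys read as None instead of raising. The config keys and the dataset path are placeholders chosen for illustration.

```python
from typing import Any


class ConfigStandIn(dict):
    """Hypothetical stand-in for axolotl's DictDefault (not the real class).

    Assumed behaviour: keys are readable as attributes, and missing keys
    yield None rather than raising KeyError or AttributeError.
    """

    def __missing__(self, key: str) -> None:
        # dict.__getitem__ calls __missing__ for absent keys in subclasses.
        return None

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails.
        # Leave dunder lookups alone so copy/pickle protocols behave normally.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]

    def __setattr__(self, name: str, value: Any) -> None:
        self[name] = value


# Illustrative config; key names and the dataset path are placeholders.
cfg = ConfigStandIn(
    datasets=[{"path": "my_org/my_sft_data", "type": "alpaca"}],
    sequence_len=2048,
    micro_batch_size=2,
)

print(cfg.sequence_len)   # 2048
print(cfg.val_set_size)   # None: missing keys default to None
```

With a real DictDefault and tokenizer, the documented call is prepare_datasets(cfg, tokenizer), leaving processor=None and preprocess_iterable=False unless a multimodal processor or iterable preprocessing is needed.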
Returns
Type | Description |
---|---|
tuple[IterableDataset \| Dataset, Dataset \| None, int, list[Prompter \| None]] | Tuple of (train_dataset, eval_dataset, total_steps, prompters). |
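Per the return type, the train dataset may be iterable (and so may not expose a length), the eval dataset may be None, and individual prompter entries may be None. Callers therefore usually branch on all three. The sketch below is illustrative only: summarize_prepared is a hypothetical helper, and plain Python lists and generators stand in for Dataset and IterableDataset so the example runs without axolotl or the datasets library installed.

```python
from collections.abc import Sized
from typing import Any, Optional


def summarize_prepared(
    result: tuple[Any, Optional[Any], int, list[Any]],
) -> dict[str, Any]:
    """Hypothetical helper that unpacks the 4-tuple returned by prepare_datasets."""
    train_dataset, eval_dataset, total_steps, prompters = result

    # A map-style dataset has a length; an iterable (streamed) one may not.
    train_len = len(train_dataset) if isinstance(train_dataset, Sized) else None

    # eval_dataset is None when no evaluation dataset is returned.
    eval_len = len(eval_dataset) if eval_dataset is not None else None

    return {
        "train_len": train_len,
        "eval_len": eval_len,
        "total_steps": total_steps,
        # Prompter entries may be None, so count only the populated ones.
        "num_prompters": sum(p is not None for p in prompters),
    }


# Stand-ins: a list plays a map-style Dataset, a generator an IterableDataset.
map_style = (["a", "b", "c"], ["d"], 3, ["prompter", None])
streamed = ((x for x in ["a", "b"]), None, 10, [None])

print(summarize_prepared(map_style))
print(summarize_prepared(streamed))
```

Checking for a length rather than for a specific class keeps the helper agnostic to which concrete dataset types are returned.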