Dataset Managers
Modules:
Name | Description |
---|---|
aci_bench | |
huggingface | |
Classes:
Name | Description |
---|---|
AciBenchDatasetManager | A dataset manager for the ACI Bench dataset. |
HuggingFaceDatasetManager | A dataset manager for Hugging Face datasets. |
AciBenchDatasetManager
Bases: FileBasedDatasetManager
A dataset manager for the ACI Bench dataset.
Methods:
Name | Description |
---|---|
__init__ | Initializes a new AciBenchDatasetManager. |
can_handle | Checks if the DatasetManager can handle the given dataset. |
create | Creates a new dataset manager for the specified dataset. |
is_retrieved | Checks if the dataset at the specific version is already downloaded. |
load | Loads the dataset as a HuggingFace dataset. |
remove | Deletes the dataset at the specific version from disk. |
retrieve | Downloads and preprocesses a dataset. |
unload | Unloads the dataset from memory. |
Attributes:
Name | Type | Description |
---|---|---|
dataset_path | Path | The top-level directory for storing this dataset. |
main_data_path | Path | The path for storing the preprocessed dataset files for a specific version. |
record | DatasetRecord | Returns a record identifying the dataset. |
version_path | Path | The directory for storing a specific version of this dataset. |
Source code in evalsense/datasets/managers/aci_bench.py
dataset_path
property
The top-level directory for storing this dataset.
Returns:
Type | Description |
---|---|
Path | The dataset directory. |
main_data_path
property
The path for storing the preprocessed dataset files for a specific version.
Returns:
Type | Description |
---|---|
Path | The main dataset directory. |
record
property
Returns a record identifying the dataset.
Returns:
Type | Description |
---|---|
DatasetRecord | The dataset record. |
version_path
property
The directory for storing a specific version of this dataset.
Returns:
Type | Description |
---|---|
Path | The dataset version directory. |
__init__
__init__(
version: str | None = _DEFAULT_VERSION,
splits: list[str] | None = None,
data_dir: str | None = None,
**kwargs,
)
Initializes a new AciBenchDatasetManager.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
version | str \| None | The dataset version to retrieve. | _DEFAULT_VERSION |
splits | list[str] \| None | The dataset splits to retrieve. | None |
data_dir | str \| None | The top-level directory for storing all datasets. Defaults to "datasets" in the user cache directory. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Source code in evalsense/datasets/managers/aci_bench.py
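The attribute and parameter descriptions imply a three-level directory layout under `data_dir`: one directory per dataset, one per version inside it, and a location for the preprocessed files inside that. The sketch below models one plausible arrangement with `pathlib`. The function names `default_data_dir` and `dataset_paths`, the dataset name `aci-bench`, the `main` sub-directory, and the use of `XDG_CACHE_HOME` (falling back to `~/.cache`) as the user cache directory are all assumptions for illustration, not EvalSense's actual implementation.

```python
from __future__ import annotations

import os
from pathlib import Path


def default_data_dir() -> Path:
    """Hypothetical stand-in for "datasets" in the user cache directory.

    Assumption: $XDG_CACHE_HOME if set, else ~/.cache. The real library
    may resolve the cache directory differently on each platform.
    """
    cache_root = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_root) / "datasets"


def dataset_paths(name: str, version: str, data_dir: str | None = None) -> dict[str, Path]:
    """Hypothetical layout: <data_dir>/<name>/<version>/main."""
    root = Path(data_dir) if data_dir is not None else default_data_dir()
    dataset_path = root / name              # top-level directory for this dataset
    version_path = dataset_path / version   # one directory per dataset version
    main_data_path = version_path / "main"  # preprocessed files (sub-directory name assumed)
    return {
        "dataset_path": dataset_path,
        "version_path": version_path,
        "main_data_path": main_data_path,
    }


paths = dataset_paths("aci-bench", "v1", data_dir="/tmp/evalsense-datasets")
for key, value in paths.items():
    print(f"{key}: {value}")
```

In a layout like this, several versions of the same dataset can coexist under one `dataset_path`, and deleting one version directory leaves the others in place.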
can_handle
classmethod
Checks if the DatasetManager can handle the given dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
Returns:
Type | Description |
---|---|
bool | True if the manager can handle the dataset, False otherwise. |
Source code in evalsense/datasets/managers/aci_bench.py
create
classmethod
create(
name: str,
splits: list[str],
version: str | None = None,
data_dir: str | None = None,
**kwargs: dict,
) -> DatasetManager
Creates a new dataset manager for the specified dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
splits | list[str] | The dataset splits to retrieve. | required |
version | str \| None | The dataset version to retrieve. | None |
data_dir | str \| None | The top-level directory for storing all datasets. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
DatasetManager | The created dataset manager. |
Source code in evalsense/datasets/dataset_manager.py
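`create` is documented on the base `DatasetManager` class, while each subclass supplies its own `can_handle`. One common way to combine the two is a factory that asks each registered manager class whether it can handle a name and instantiates the first match. The following is a minimal sketch of that pattern only: the subclass registry, `ToyAciBenchManager`, and the `aci-bench` dataset name are hypothetical, the real `create` may select managers differently, and the real constructors do not all share one signature (for example, `AciBenchDatasetManager.__init__` takes no `name`).

```python
from __future__ import annotations

from typing import Any


class DatasetManager:
    """Hypothetical base class illustrating a can_handle/create factory."""

    _registry: list[type[DatasetManager]] = []

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        DatasetManager._registry.append(cls)  # every subclass registers itself

    def __init__(self, name: str, splits: list[str] | None = None,
                 version: str | None = None, data_dir: str | None = None,
                 **kwargs: Any) -> None:
        self.name = name
        self.splits = splits
        self.version = version
        self.data_dir = data_dir

    @classmethod
    def can_handle(cls, name: str) -> bool:
        return False  # the base class handles nothing by itself

    @classmethod
    def create(cls, name: str, splits: list[str], version: str | None = None,
               data_dir: str | None = None, **kwargs: Any) -> DatasetManager:
        # Instantiate the first registered manager that claims this dataset name.
        for manager_cls in DatasetManager._registry:
            if manager_cls.can_handle(name):
                return manager_cls(name=name, splits=splits, version=version,
                                   data_dir=data_dir, **kwargs)
        raise ValueError(f"No dataset manager can handle {name!r}")


class ToyAciBenchManager(DatasetManager):
    @classmethod
    def can_handle(cls, name: str) -> bool:
        return name == "aci-bench"  # hypothetical dataset name


manager = DatasetManager.create("aci-bench", splits=["test"])
print(type(manager).__name__)  # ToyAciBenchManager
```

Keeping the name check inside each subclass means a new manager only has to define `can_handle`; the factory itself does not change.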
is_retrieved
Checks if the dataset at the specific version is already downloaded.
Returns:
Type | Description |
---|---|
bool | True if the dataset exists locally, False otherwise. |
load
load(
*,
retrieve: bool = True,
cache: bool = True,
force_retrieve: bool = False,
load_as_dict: bool = False,
) -> Dataset | DatasetDict
Loads the dataset as a HuggingFace dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
retrieve | bool | Whether to retrieve the dataset if it does not exist locally. Defaults to True. | True |
cache | bool | Whether to cache the dataset in memory. Defaults to True. | True |
force_retrieve | bool | Whether to force retrieving and reloading the dataset even if it is already cached. Overrides the … | False |
load_as_dict | bool | Whether to load the dataset with multiple splits as a DatasetDict. If False (the default), the selected dataset splits are concatenated into a single dataset. | False |
Returns:
Type | Description |
---|---|
Dataset \| DatasetDict | The loaded dataset. |
Source code in evalsense/datasets/dataset_manager.py
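The interaction between the `load` flags can be modelled with an in-memory toy. In this sketch, which is hypothetical throughout, `ToyManager` stands in for a dataset manager and lists of dicts stand in for Hugging Face datasets. The `load_as_dict` behaviour follows the table above (a per-split mapping versus the splits concatenated into one dataset). Two behaviours are assumptions because the documentation does not spell them out: `force_retrieve=True` re-retrieves and bypasses the in-memory cache, and `retrieve=False` on a dataset that is not on disk raises `FileNotFoundError`.

```python
from __future__ import annotations

from typing import Any

Row = dict[str, Any]


class ToyManager:
    """Hypothetical in-memory stand-in for a dataset manager's load() logic."""

    def __init__(self, remote: dict[str, list[Row]]) -> None:
        self._remote = remote                              # plays the role of the remote source
        self._local: dict[str, list[Row]] | None = None   # plays the role of files on disk
        self._cached: dict[str, list[Row]] | None = None  # plays the role of the in-memory cache
        self.retrieve_calls = 0

    def is_retrieved(self) -> bool:
        return self._local is not None

    def retrieve(self) -> None:
        self.retrieve_calls += 1
        self._local = {split: list(rows) for split, rows in self._remote.items()}

    def unload(self) -> None:
        self._cached = None

    def load(self, *, retrieve: bool = True, cache: bool = True,
             force_retrieve: bool = False, load_as_dict: bool = False):
        if force_retrieve:
            # Assumption: force_retrieve re-retrieves and discards any cached copy.
            self.retrieve()
            self._cached = None
        if self._cached is not None:
            splits = self._cached
        else:
            if not self.is_retrieved():
                if not retrieve:
                    # Assumption: the real manager's error type may differ.
                    raise FileNotFoundError("dataset is not available locally")
                self.retrieve()
            splits = self._local
            if cache:
                self._cached = splits
        if load_as_dict:
            return dict(splits)  # one entry per split, like a DatasetDict
        # Otherwise the selected splits are concatenated into a single dataset.
        return [row for rows in splits.values() for row in rows]


manager = ToyManager({"train": [{"id": 1}, {"id": 2}], "test": [{"id": 3}]})
print(len(manager.load()))                      # 3
print(sorted(manager.load(load_as_dict=True)))  # ['test', 'train']
```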
remove
Deletes the dataset at the specific version from disk.
retrieve
Downloads and preprocesses a dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | dict | Additional keyword arguments. | {} |
Source code in evalsense/datasets/dataset_manager.py
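`retrieve`, `is_retrieved`, and `remove` together manage the on-disk copy of one dataset version. The hypothetical `ToyFileManager` below sketches that lifecycle inside a temporary directory. Writing a stub JSON file replaces the real download and preprocessing step, and the `<name>/<version>/main` layout is assumed rather than taken from EvalSense.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path


class ToyFileManager:
    """Hypothetical file-based manager: retrieve writes files, remove deletes them."""

    def __init__(self, data_dir: Path, name: str, version: str) -> None:
        self.version_path = data_dir / name / version
        self.main_data_path = self.version_path / "main"  # assumed sub-directory name

    def is_retrieved(self) -> bool:
        return self.main_data_path.exists()

    def retrieve(self, **kwargs) -> None:
        # A real manager would download and preprocess here; we write a stub file.
        self.main_data_path.mkdir(parents=True, exist_ok=True)
        (self.main_data_path / "train.json").write_text(json.dumps([{"id": 1}]))

    def remove(self) -> None:
        # Delete only this version's directory, leaving other versions untouched.
        if self.version_path.exists():
            shutil.rmtree(self.version_path)


with tempfile.TemporaryDirectory() as tmp:
    manager = ToyFileManager(Path(tmp), "toy-dataset", "v1")
    print(manager.is_retrieved())  # False
    manager.retrieve()
    print(manager.is_retrieved())  # True
    manager.remove()
    print(manager.is_retrieved())  # False
```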
HuggingFaceDatasetManager
Bases: DatasetManager
A dataset manager for Hugging Face datasets.
Methods:
Name | Description |
---|---|
__init__ | Initializes a new HuggingFaceDatasetManager. |
can_handle | Checks if the DatasetManager can handle the given dataset. |
create | Creates a new dataset manager for the specified dataset. |
is_retrieved | Checks if the dataset at the specific version is already downloaded. |
load | Loads the dataset as a HuggingFace dataset. |
remove | Deletes the dataset at the specific version from disk. |
retrieve | Downloads and preprocesses a dataset. |
unload | Unloads the dataset from memory. |
Attributes:
Name | Type | Description |
---|---|---|
dataset_path | Path | The top-level directory for storing this dataset. |
main_data_path | Path | The path for storing the preprocessed dataset files for a specific version. |
record | DatasetRecord | Returns a record identifying the dataset. |
version_path | Path | The directory for storing a specific version of this dataset. |
Source code in evalsense/datasets/managers/huggingface.py
dataset_path
property
The top-level directory for storing this dataset.
Returns:
Type | Description |
---|---|
Path | The dataset directory. |
main_data_path
property
The path for storing the preprocessed dataset files for a specific version.
Returns:
Type | Description |
---|---|
Path | The main dataset directory. |
record
property
Returns a record identifying the dataset.
Returns:
Type | Description |
---|---|
DatasetRecord | The dataset record. |
version_path
property
The directory for storing a specific version of this dataset.
Returns:
Type | Description |
---|---|
Path | The dataset version directory. |
__init__
__init__(
name: str,
version: str = "main",
splits: list[str] | None = None,
data_dir: str | None = None,
**kwargs,
)
Initializes a new HuggingFaceDatasetManager.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
version | str | The dataset version to retrieve. | 'main' |
splits | list[str] \| None | The dataset splits to retrieve. | None |
data_dir | str \| None | The top-level directory for storing all datasets. Defaults to "datasets" in the user cache directory. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Source code in evalsense/datasets/managers/huggingface.py
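The default `version="main"` matches the default branch name of a Hugging Face Hub repository, which suggests the version is used as a Hub revision (a branch, tag, or commit hash). The helper below is a hypothetical sketch of how the constructor arguments could map onto keyword arguments of `datasets.load_dataset`, which does accept `path`, `revision`, and `split`; whether EvalSense builds the call this way is an assumption, and `to_load_dataset_kwargs` is not part of its API. Joining split names with `+` uses the `datasets` split syntax for concatenating splits.

```python
from __future__ import annotations

from typing import Any


def to_load_dataset_kwargs(
    name: str,
    version: str = "main",
    splits: list[str] | None = None,
) -> dict[str, Any]:
    """Hypothetical mapping from manager arguments to load_dataset() kwargs."""
    kwargs: dict[str, Any] = {"path": name, "revision": version}
    if splits:
        # "train+test" asks the datasets library for the concatenated splits.
        kwargs["split"] = "+".join(splits)
    return kwargs


print(to_load_dataset_kwargs("owner/dataset", splits=["train", "test"]))
# {'path': 'owner/dataset', 'revision': 'main', 'split': 'train+test'}
```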
can_handle
classmethod
Checks if the DatasetManager can handle the given dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
Returns:
Type | Description |
---|---|
bool | True if the manager can handle the dataset, False otherwise. |
Source code in evalsense/datasets/managers/huggingface.py
create
classmethod
create(
name: str,
splits: list[str],
version: str | None = None,
data_dir: str | None = None,
**kwargs: dict,
) -> DatasetManager
Creates a new dataset manager for the specified dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of the dataset. |
required |
splits
|
list[str]
|
The dataset splits to retrieve. |
required |
version
|
str | None
|
The dataset version to retrieve. |
None
|
data_dir
|
str | None
|
The top-level directory for storing all datasets. |
None
|
**kwargs
|
dict
|
Additional keyword arguments. |
{}
|
Returns:
Type | Description |
---|---|
DatasetManager | The created dataset manager. |
Source code in evalsense/datasets/dataset_manager.py
is_retrieved
Checks if the dataset at the specific version is already downloaded.
Returns:
Type | Description |
---|---|
bool | True if the dataset exists locally, False otherwise. |
load
load(
*,
retrieve: bool = True,
cache: bool = True,
force_retrieve: bool = False,
load_as_dict: bool = False,
) -> Dataset | DatasetDict
Loads the dataset as a HuggingFace dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
retrieve | bool | Whether to retrieve the dataset if it does not exist locally. Defaults to True. | True |
cache | bool | Whether to cache the dataset in memory. Defaults to True. | True |
force_retrieve | bool | Whether to force retrieving and reloading the dataset even if it is already cached. Overrides the … | False |
load_as_dict | bool | Whether to load the dataset with multiple splits as a DatasetDict. If False (the default), the selected dataset splits are concatenated into a single dataset. | False |
Returns:
Type | Description |
---|---|
Dataset \| DatasetDict | The loaded dataset. |
Source code in evalsense/datasets/dataset_manager.py
remove
Deletes the dataset at the specific version from disk.
retrieve
Downloads and preprocesses a dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | dict | Additional keyword arguments. | {} |