Dataset Managers
Modules:
Name | Description |
---|---|
aci_bench | |
Classes:
Name | Description |
---|---|
AciBenchDatasetManager | A dataset manager for the ACI Bench dataset. |
AciBenchDatasetManager
Bases: DatasetManager
A dataset manager for the ACI Bench dataset.
Methods:
Name | Description |
---|---|
__init__ | Initializes a new AciBenchDatasetManager. |
can_handle | Checks if the DatasetManager can handle the given dataset. |
get | Downloads and preprocesses a dataset. |
is_retrieved | Checks if the dataset at the specific version is already downloaded. |
load | Loads the dataset as a HuggingFace dataset. |
load_dict | Loads the dataset as a HuggingFace dataset dictionary. |
remove | Deletes the dataset at the specific version from disk. |
unload | Unloads the dataset from memory. |
unload_dict | Unloads the dataset dictionary from memory. |
Attributes:
Name | Type | Description |
---|---|---|
dataset_path | Path | The top-level directory for storing this dataset. |
main_data_path | Path | The path for storing the preprocessed dataset files for a specific version. |
record | DatasetRecord | Returns a record identifying the dataset. |
version_path | Path | The directory for storing a specific version of this dataset. |
Source code in evalsense/datasets/managers/aci_bench.py
dataset_path
property
The top-level directory for storing this dataset.
Returns:
Type | Description |
---|---|
Path | The dataset directory. |
main_data_path
property
The path for storing the preprocessed dataset files for a specific version.
Returns:
Type | Description |
---|---|
Path | The main dataset directory. |
record
property
Returns a record identifying the dataset.
Returns:
Type | Description |
---|---|
DatasetRecord | The dataset record. |
version_path
property
The directory for storing a specific version of this dataset.
Returns:
Type | Description |
---|---|
Path | The dataset version directory. |
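The three path properties suggest a nested layout: one directory per dataset, one per version inside it, and a location for preprocessed files inside that. The sketch below only illustrates that reading. The directory names (`aci_bench`, `main`), the cache root, and the nesting order are all hypothetical and are not confirmed by this reference.

```python
from pathlib import Path

# Hypothetical values: neither the directory names nor the nesting order
# are confirmed by the reference above.
data_dir = Path("/tmp/evalsense-cache/datasets")  # assumed root for all datasets
version = "5d3cd4d8a25b4ebb5b2b87c3923a7b2b7150e33d"  # default version from __init__

dataset_path = data_dir / "aci_bench"   # assumed: one directory per dataset
version_path = dataset_path / version   # assumed: one subdirectory per version
main_data_path = version_path / "main"  # assumed: preprocessed files live here

print(main_data_path)
```

On a real manager instance, read `dataset_path`, `version_path`, and `main_data_path` directly instead of rebuilding them by hand.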
__init__
__init__(
version: str = "5d3cd4d8a25b4ebb5b2b87c3923a7b2b7150e33d",
splits: list[str] | None = None,
data_dir: str | None = None,
**kwargs,
)
Initializes a new AciBenchDatasetManager.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
version | str | The dataset version to retrieve. | '5d3cd4d8a25b4ebb5b2b87c3923a7b2b7150e33d' |
splits | list[str] | The dataset splits to retrieve. | None |
data_dir | str | The top-level directory for storing all datasets. Defaults to "datasets" in the user cache directory. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Source code in evalsense/datasets/managers/aci_bench.py
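A minimal construction sketch, assuming the class can be imported from the module path implied by the source file location above. The split name `"train"` is an assumption used for illustration, and `make_manager` is a hypothetical helper, not part of the library.

```python
def make_manager(data_dir=None):
    """Builds a manager restricted to a single split (requires evalsense)."""
    # Import path inferred from the source file location shown above;
    # imported lazily so this sketch can be defined without the package.
    from evalsense.datasets.managers.aci_bench import AciBenchDatasetManager

    return AciBenchDatasetManager(
        splits=["train"],   # assumed split name; check the dataset for the real ones
        data_dir=data_dir,  # None falls back to "datasets" in the user cache directory
    )
```

Omitting `version` uses the default listed in the parameter table.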
can_handle
classmethod
Checks if the DatasetManager can handle the given dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
Returns:
Type | Description |
---|---|
bool | True if the manager can handle the dataset, False otherwise. |
Source code in evalsense/datasets/managers/aci_bench.py
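Because `can_handle` is a classmethod, a caller can ask each manager class whether it supports a dataset name before creating an instance. The sketch below illustrates that dispatch pattern with stand-in classes. The class names, the `handled_name` attribute, the exact-match rule, and the dataset names are all hypothetical and do not describe how the library matches names.

```python
class DatasetManagerSketch:
    """Hypothetical stand-in for the DatasetManager base class."""

    handled_name = ""

    @classmethod
    def can_handle(cls, name: str) -> bool:
        # Assumed matching rule: exact comparison against one known name.
        return name == cls.handled_name


class AciBenchSketch(DatasetManagerSketch):
    handled_name = "ACI-Bench"  # assumed dataset name


class OtherSketch(DatasetManagerSketch):
    handled_name = "Other"


def find_manager(name, candidates):
    """Returns the first manager class that claims the dataset, or None."""
    for candidate in candidates:
        if candidate.can_handle(name):
            return candidate
    return None


selected = find_manager("ACI-Bench", [OtherSketch, AciBenchSketch])
print(selected.__name__)  # AciBenchSketch
```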
get
Downloads and preprocesses a dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | dict | Additional keyword arguments. | {} |
Source code in evalsense/datasets/dataset_manager.py
is_retrieved
Checks if the dataset at the specific version is already downloaded.
Returns:
Type | Description |
---|---|
bool | True if the dataset exists locally, False otherwise. |
load
Loads the dataset as a HuggingFace dataset.
If multiple splits are specified, they are concatenated into a single dataset. See the load_dict method if you wish to load the dataset as a DatasetDict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
retrieve | bool | Whether to retrieve the dataset if it does not exist locally. Defaults to True. | True |
cache | bool | Whether to cache the dataset in memory. Defaults to True. | True |
force_retrieve | bool | Whether to force retrieving and reloading the dataset even if it is already cached. Overrides the | False |
Returns:
Type | Description |
---|---|
Dataset | The loaded dataset. |
Source code in evalsense/datasets/dataset_manager.py
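The difference between a single concatenated dataset and a per-split dictionary can be illustrated without the library. The sketch below uses plain Python lists and dicts as stand-ins for HuggingFace `Dataset` and `DatasetDict` objects. The split names, the records, and both helper functions are made up for illustration.

```python
# Plain-Python stand-in for the split handling; the records are made up.
splits = {
    "train": [{"id": 1}, {"id": 2}],
    "valid": [{"id": 3}],
}


def load_dict_sketch(split_data):
    """Keeps one entry per split, like a dataset dictionary."""
    return dict(split_data)


def load_sketch(split_data):
    """Concatenates all splits, in order, into a single sequence."""
    merged = []
    for records in split_data.values():
        merged.extend(records)
    return merged


print(len(load_sketch(splits)))          # 3
print(sorted(load_dict_sketch(splits)))  # ['train', 'valid']
```

Concatenation discards the split boundaries, so the dictionary form is the better fit when splits must be evaluated separately.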
load_dict
load_dict(
retrieve: bool = True,
cache: bool = True,
force_retrieve: bool = False,
) -> DatasetDict
Loads the dataset as a HuggingFace dataset dictionary.
See the load method if you wish to concatenate the splits into a single dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
retrieve | bool | Whether to retrieve the dataset if it does not exist locally. Defaults to True. | True |
cache | bool | Whether to cache the dataset in memory. Defaults to True. | True |
force_retrieve | bool | Whether to force retrieving and reloading the dataset even if it is already cached. Overrides the | False |
Returns:
Type | Description |
---|---|
DatasetDict | The loaded dataset dictionary. |
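A short usage sketch that passes the three documented flags to `load` and `load_dict`. It assumes the class can be imported from the module path implied by the source file location, and that `load` accepts the same keyword arguments listed in its parameter table. `load_aci_bench` and its `as_dict` and `refresh` parameters are hypothetical names introduced here.

```python
def load_aci_bench(as_dict=False, refresh=False):
    """Loads ACI Bench as one dataset or as a per-split dictionary."""
    # Imported lazily so this sketch can be defined without the package.
    from evalsense.datasets.managers.aci_bench import AciBenchDatasetManager

    manager = AciBenchDatasetManager()
    if as_dict:
        # One entry per split.
        return manager.load_dict(retrieve=True, cache=True, force_retrieve=refresh)
    # All requested splits concatenated into a single dataset.
    return manager.load(retrieve=True, cache=True, force_retrieve=refresh)
```

Calling `load_aci_bench(refresh=True)` requests a fresh retrieval even when a cached copy exists.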