Dataset Managers
Modules:
| Name | Description | 
|---|---|
| aci_bench | |
| huggingface | |
Classes:
| Name | Description | 
|---|---|
| AciBenchDatasetManager | A dataset manager for the ACI Bench dataset. |
| HuggingFaceDatasetManager | A dataset manager for Hugging Face datasets. |
AciBenchDatasetManager
Bases: FileBasedDatasetManager
A dataset manager for the ACI Bench dataset.
Methods:
| Name | Description | 
|---|---|
| __init__ | Initializes a new AciBenchDatasetManager. |
| can_handle | Checks if the DatasetManager can handle the given dataset. |
| create | Creates a new dataset manager for the specified dataset. |
| is_retrieved | Checks if the dataset at the specified version is already downloaded. |
| load | Loads the dataset as a Hugging Face dataset. |
| remove | Deletes the dataset at the specified version from disk. |
| retrieve | Downloads and preprocesses a dataset. |
| unload | Unloads the dataset from memory. |
Attributes:
| Name | Type | Description | 
|---|---|---|
| dataset_path | Path | The top-level directory for storing this dataset. |
| main_data_path | Path | The path for storing the preprocessed dataset files for a specific version. |
| record | DatasetRecord | Returns a record identifying the dataset. |
| version_path | Path | The directory for storing a specific version of this dataset. |
Source code in evalsense/datasets/managers/aci_bench.py
                
dataset_path (property)
The top-level directory for storing this dataset.
Returns:
| Type | Description |
|---|---|
| Path | The dataset directory. |
main_data_path (property)
The path for storing the preprocessed dataset files for a specific version.
Returns:
| Type | Description |
|---|---|
| Path | The main dataset directory. |
record (property)
Returns a record identifying the dataset.
Returns:
| Type | Description |
|---|---|
| DatasetRecord | The dataset record. |
version_path (property)
The directory for storing a specific version of this dataset.
Returns:
| Type | Description |
|---|---|
| Path | The dataset version directory. |
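The three path properties suggest a nested on-disk layout. The following is a minimal sketch of one plausible arrangement using only `pathlib`. The nesting order, the `main` subdirectory name, the `v1` version string, and the helper `sketch_layout` are all assumptions made for illustration; they are not taken from the library's implementation.

```python
from pathlib import Path


def sketch_layout(data_dir: Path, name: str, version: str) -> dict[str, Path]:
    """Hypothetical layout: <data_dir>/<name>/<version>/main."""
    dataset_path = data_dir / name          # top-level directory for this dataset
    version_path = dataset_path / version   # one directory per dataset version
    main_data_path = version_path / "main"  # preprocessed files (name assumed)
    return {
        "dataset_path": dataset_path,
        "version_path": version_path,
        "main_data_path": main_data_path,
    }


layout = sketch_layout(Path("datasets"), "aci_bench", "v1")
for key, path in layout.items():
    print(f"{key}: {path}")
```

Under this assumed layout, removing `version_path` would delete one version while leaving other versions under `dataset_path` untouched.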
          
__init__
__init__(
    version: str | None = _DEFAULT_VERSION,
    splits: list[str] | None = None,
    data_dir: str | None = None,
    **kwargs,
)
Initializes a new AciBenchDatasetManager.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| version | str \| None | The dataset version to retrieve. | _DEFAULT_VERSION |
| splits | list[str] \| None | The dataset splits to retrieve. | None |
| data_dir | str \| None | The top-level directory for storing all datasets. Defaults to "datasets" in the user cache directory. | None |
| **kwargs | dict | Additional keyword arguments. | {} |
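As a usage illustration, the constructor can be wrapped in a small helper. This is a sketch that assumes evalsense is installed; the import path is inferred from the source file location shown on this page, and the split name `"train"` and the helper name `load_aci_bench` are hypothetical.

```python
def load_aci_bench(data_dir=None):
    """Construct the manager and load the selected splits.

    Assumes evalsense is installed; the split name "train" is a guess.
    """
    # Import path inferred from evalsense/datasets/managers/aci_bench.py.
    from evalsense.datasets.managers.aci_bench import AciBenchDatasetManager

    # Omitting `version` falls back to the class default (_DEFAULT_VERSION).
    manager = AciBenchDatasetManager(splits=["train"], data_dir=data_dir)
    # With the default retrieve=True, load() retrieves the dataset first
    # if it is not available locally.
    return manager.load()
```

The import is placed inside the function so that defining the helper does not require evalsense to be importable.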
          
Source code in evalsense/datasets/managers/aci_bench.py
              
can_handle (classmethod)
Checks if the DatasetManager can handle the given dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the dataset. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the manager can handle the dataset, False otherwise. |
Source code in evalsense/datasets/managers/aci_bench.py
              
create (classmethod)
create(
    name: str,
    splits: list[str],
    version: str | None = None,
    data_dir: str | None = None,
    **kwargs: dict,
) -> DatasetManager
Creates a new dataset manager for the specified dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| name | str | The name of the dataset. | required |
| splits | list[str] | The dataset splits to retrieve. | required |
| version | str \| None | The dataset version to retrieve. | None |
| data_dir | str \| None | The top-level directory for storing all datasets. | None |
| **kwargs | dict | Additional keyword arguments. | {} |
Returns:
| Type | Description |
|---|---|
| DatasetManager | The created dataset manager. |
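The `can_handle` and `create` classmethods lend themselves to a dispatch pattern: try each manager class in turn and build the first one that accepts the dataset name. How evalsense itself selects a manager is not shown on this page, so the sketch below uses hypothetical stand-in classes (`ToyManager`, `ToyAciBench`, `ToyFallback`), a hypothetical helper `pick_manager`, and an assumed dataset name.

```python
class ToyManager:
    """Hypothetical base class mirroring the documented classmethods."""

    handled_names: set[str] = set()

    def __init__(self, name, splits, version=None, data_dir=None, **kwargs):
        self.name, self.splits = name, splits
        self.version, self.data_dir = version, data_dir

    @classmethod
    def can_handle(cls, name: str) -> bool:
        return name in cls.handled_names

    @classmethod
    def create(cls, name, splits, version=None, data_dir=None, **kwargs):
        return cls(name, splits, version=version, data_dir=data_dir, **kwargs)


class ToyAciBench(ToyManager):
    handled_names = {"ACI-Bench"}  # dataset name assumed for illustration


class ToyFallback(ToyManager):
    @classmethod
    def can_handle(cls, name: str) -> bool:
        return True  # accepts any name, so it must be tried last


def pick_manager(name, splits, candidates=(ToyAciBench, ToyFallback)):
    """Return a manager from the first candidate class that accepts `name`."""
    for manager_cls in candidates:
        if manager_cls.can_handle(name):
            return manager_cls.create(name, splits)
    raise ValueError(f"No manager can handle dataset {name!r}")


print(type(pick_manager("ACI-Bench", ["train"])).__name__)
print(type(pick_manager("some/other-dataset", ["train"])).__name__)
```

Ordering matters in this sketch: a catch-all manager placed first would shadow every more specific one.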
          
Source code in evalsense/datasets/dataset_manager.py
is_retrieved
Checks if the dataset at the specified version is already downloaded.
Returns:
| Type | Description |
|---|---|
| bool | True if the dataset exists locally, False otherwise. |
load
load(
    *,
    retrieve: bool = True,
    cache: bool = True,
    force_retrieve: bool = False,
    load_as_dict: bool = False,
) -> Dataset | DatasetDict
Loads the dataset as a Hugging Face dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| retrieve | bool | Whether to retrieve the dataset if it does not exist locally. Defaults to True. | True |
| cache | bool | Whether to cache the dataset in memory. Defaults to True. | True |
| force_retrieve | bool | Whether to force retrieving and reloading the dataset even if it is already cached. Overrides the ... | False |
| load_as_dict | bool | Whether to load the dataset with multiple splits as a DatasetDict. If False (the default), the selected dataset splits are concatenated into a single dataset. | False |
Returns:
| Type | Description |
|---|---|
| Dataset \| DatasetDict | The loaded dataset. |
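The interaction between the four `load` flags can be easier to follow in a toy model. The sketch below is a hypothetical stand-in (`ToyLoader`) built on plain lists and dicts; it is not the evalsense implementation, and the error raised when `retrieve=False` and nothing is on disk is an assumption about reasonable behaviour rather than documented fact.

```python
class ToyLoader:
    """Hypothetical stand-in that mimics the documented load() flags."""

    def __init__(self, splits):
        self.splits = splits
        self._on_disk = None    # simulated local copy
        self._in_memory = None  # simulated in-memory cache
        self.retrieve_calls = 0

    def is_retrieved(self):
        return self._on_disk is not None

    def retrieve(self):
        # Simulate a download that yields two rows per split.
        self.retrieve_calls += 1
        self._on_disk = {s: [f"{s}-{i}" for i in range(2)] for s in ("train", "test")}

    def load(self, *, retrieve=True, cache=True, force_retrieve=False, load_as_dict=False):
        if force_retrieve or self._in_memory is None:
            # force_retrieve re-downloads even when a copy already exists.
            if force_retrieve or (retrieve and not self.is_retrieved()):
                self.retrieve()
            if not self.is_retrieved():
                # Assumed behaviour: fail when nothing is available locally.
                raise FileNotFoundError("dataset missing and retrieve=False")
            data = {s: self._on_disk[s] for s in self.splits}
            if cache:
                self._in_memory = data
        else:
            data = self._in_memory
        if load_as_dict:
            return dict(data)  # one entry per split
        # Default: concatenate the selected splits into a single sequence.
        return [row for s in self.splits for row in data[s]]


loader = ToyLoader(["train", "test"])
print(loader.load())
print(loader.load(load_as_dict=True))
print(loader.retrieve_calls)
```

In this model a second `load()` is served from the in-memory cache, while `load(force_retrieve=True)` triggers another simulated download.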
          
Source code in evalsense/datasets/dataset_manager.py
remove
Deletes the dataset at the specified version from disk.
retrieve
Downloads and preprocesses a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | dict | Additional keyword arguments. | {} |
Source code in evalsense/datasets/dataset_manager.py
HuggingFaceDatasetManager
Bases: DatasetManager
A dataset manager for Hugging Face datasets.
Methods:
| Name | Description | 
|---|---|
| __init__ | Initializes a new HuggingFaceDatasetManager. |
| can_handle | Checks if the DatasetManager can handle the given dataset. |
| create | Creates a new dataset manager for the specified dataset. |
| is_retrieved | Checks if the dataset at the specified version is already downloaded. |
| load | Loads the dataset as a Hugging Face dataset. |
| remove | Deletes the dataset at the specified version from disk. |
| retrieve | Downloads and preprocesses a dataset. |
| unload | Unloads the dataset from memory. |
Attributes:
| Name | Type | Description | 
|---|---|---|
| dataset_path | Path | The top-level directory for storing this dataset. |
| main_data_path | Path | The path for storing the preprocessed dataset files for a specific version. |
| record | DatasetRecord | Returns a record identifying the dataset. |
| version_path | Path | The directory for storing a specific version of this dataset. |
Source code in evalsense/datasets/managers/huggingface.py
                
dataset_path (property)
The top-level directory for storing this dataset.
Returns:
| Type | Description |
|---|---|
| Path | The dataset directory. |
main_data_path (property)
The path for storing the preprocessed dataset files for a specific version.
Returns:
| Type | Description |
|---|---|
| Path | The main dataset directory. |
record (property)
Returns a record identifying the dataset.
Returns:
| Type | Description |
|---|---|
| DatasetRecord | The dataset record. |
version_path (property)
The directory for storing a specific version of this dataset.
Returns:
| Type | Description |
|---|---|
| Path | The dataset version directory. |
__init__
__init__(
    name: str,
    version: str = "main",
    splits: list[str] | None = None,
    data_dir: str | None = None,
    **kwargs,
)
Initializes a new HuggingFaceDatasetManager.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| name | str | The name of the dataset. | required |
| version | str | The dataset version to retrieve. | 'main' |
| splits | list[str] \| None | The dataset splits to retrieve. | None |
| data_dir | str \| None | The top-level directory for storing all datasets. Defaults to "datasets" in the user cache directory. | None |
| **kwargs | dict | Additional keyword arguments. | {} |
Source code in evalsense/datasets/managers/huggingface.py
              
can_handle (classmethod)
Checks if the DatasetManager can handle the given dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the dataset. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the manager can handle the dataset, False otherwise. |
Source code in evalsense/datasets/managers/huggingface.py
              
create (classmethod)
create(
    name: str,
    splits: list[str],
    version: str | None = None,
    data_dir: str | None = None,
    **kwargs: dict,
) -> DatasetManager
Creates a new dataset manager for the specified dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| name | str | The name of the dataset. | required |
| splits | list[str] | The dataset splits to retrieve. | required |
| version | str \| None | The dataset version to retrieve. | None |
| data_dir | str \| None | The top-level directory for storing all datasets. | None |
| **kwargs | dict | Additional keyword arguments. | {} |
Returns:
| Type | Description |
|---|---|
| DatasetManager | The created dataset manager. |
Source code in evalsense/datasets/dataset_manager.py
is_retrieved
Checks if the dataset at the specified version is already downloaded.
Returns:
| Type | Description |
|---|---|
| bool | True if the dataset exists locally, False otherwise. |
load
load(
    *,
    retrieve: bool = True,
    cache: bool = True,
    force_retrieve: bool = False,
    load_as_dict: bool = False,
) -> Dataset | DatasetDict
Loads the dataset as a Hugging Face dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| retrieve | bool | Whether to retrieve the dataset if it does not exist locally. Defaults to True. | True |
| cache | bool | Whether to cache the dataset in memory. Defaults to True. | True |
| force_retrieve | bool | Whether to force retrieving and reloading the dataset even if it is already cached. Overrides the ... | False |
| load_as_dict | bool | Whether to load the dataset with multiple splits as a DatasetDict. If False (the default), the selected dataset splits are concatenated into a single dataset. | False |
Returns:
| Type | Description |
|---|---|
| Dataset \| DatasetDict | The loaded dataset. |
Source code in evalsense/datasets/dataset_manager.py
remove
Deletes the dataset at the specified version from disk.
retrieve
Downloads and preprocesses a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | dict | Additional keyword arguments. | {} |
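The documented methods can be combined into an explicit retrieve-then-load sequence. The sketch below assumes evalsense is installed and that `is_retrieved()` and `retrieve()` can be called without arguments, which this page suggests but does not state outright. The import path is inferred from the source file location, and the helper name `load_hf_splits` is hypothetical.

```python
def load_hf_splits(name: str, splits: list[str]):
    """Retrieve a Hugging Face dataset if needed and load it per split.

    Assumes evalsense is installed; `name` is supplied by the caller.
    """
    # Import path inferred from evalsense/datasets/managers/huggingface.py.
    from evalsense.datasets.managers.huggingface import HuggingFaceDatasetManager

    manager = HuggingFaceDatasetManager(name, version="main", splits=splits)
    if not manager.is_retrieved():
        manager.retrieve()
    # The dataset is on disk at this point, so retrieval is switched off.
    # load_as_dict=True keeps the splits separate instead of concatenating them.
    return manager.load(retrieve=False, load_as_dict=True)
```

Calling `load()` with its defaults would achieve the same retrieval implicitly; the explicit form is useful when download and loading should happen at different times.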