metatransformer
MetaTransformer
The MetaTransformer is responsible for transforming the input dataset into a format that can be used by the model module, and for transforming that module's output back into the original format of the input dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The raw input DataFrame. | required |
metadata | Optional[MetaData] | Optionally, a | None |
missingness_strategy | Optional[str] | The missingness strategy to use. Defaults to augmenting missing values in the data, see the missingness strategies for more information. | 'augment' |
impute_value | Optional[Any] | Only used when | None |
After calling MetaTransformer.apply(), the following attributes and methods will be available:
Attributes:
Name | Type | Description |
---|---|---|
typed_dataset | DataFrame | The dataset with the dtypes applied. |
post_missingness_strategy_dataset | DataFrame | The dataset with the missingness strategies applied. |
transformed_dataset | DataFrame | The transformed dataset. |
single_column_indices | list[int] | The indices of the columns that were transformed into a single column. |
multi_column_indices | list[list[int]] | The indices of the columns that were transformed into multiple columns. |
Methods:
- get_typed_dataset(): Returns the typed dataset.
- get_prepared_dataset(): Returns the dataset with the missingness strategies applied.
- get_transformed_dataset(): Returns the transformed dataset.
- get_multi_and_single_column_indices(): Returns the indices of the columns that were transformed into one or multiple column(s).
- get_sdv_metadata(): Returns the metadata in the correct format for SDMetrics.
- save_metadata(): Saves the metadata to a file.
- save_constraint_graphs(): Saves the constraint graphs to a file.
Note that mt.apply is a helper function that runs mt.apply_dtypes, mt.apply_missingness_strategy and mt.transform in sequence.
This is the recommended way to use the MetaTransformer to ensure that it is fully instantiated for use downstream.
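
The sequencing described here can be pictured with a small stand-alone sketch. It uses neither nhssynth nor pandas: `ToyPipeline`, its list-of-dicts data and the body of every step are hypothetical, and only the idea of one helper running three steps in order while keeping each intermediate result as an attribute is taken from the text.

```python
class ToyPipeline:
    """Hypothetical stand-in, not the nhssynth MetaTransformer."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per record

    def apply_dtypes(self):
        # Step 1: cast "age" to int, leaving missing entries as None.
        self.typed_dataset = [
            {"age": None if r["age"] is None else int(r["age"])} for r in self.rows
        ]
        return self.typed_dataset

    def apply_missingness_strategy(self):
        # Step 2 (toy choice): replace missing ages with a fixed value of 0.
        self.post_missingness_strategy_dataset = [
            {"age": 0 if r["age"] is None else r["age"]} for r in self.typed_dataset
        ]
        return self.post_missingness_strategy_dataset

    def transform(self):
        # Step 3 (toy choice): rescale ages by the largest observed age.
        top = max(r["age"] for r in self.post_missingness_strategy_dataset) or 1
        self.transformed_dataset = [
            {"age": r["age"] / top} for r in self.post_missingness_strategy_dataset
        ]
        return self.transformed_dataset

    def apply(self):
        # Helper: run the three steps in order and return the final result.
        self.apply_dtypes()
        self.apply_missingness_strategy()
        return self.transform()


pipeline = ToyPipeline([{"age": "40"}, {"age": None}, {"age": "20"}])
result = pipeline.apply()
```

Because each step stores its output, every intermediate dataset remains inspectable after a single call to `apply()`, which is the property the recommendation above relies on.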
Source code in src/nhssynth/modules/dataloader/metatransformer.py
apply()
Applies the various steps of the MetaTransformer to a passed DataFrame.
Returns:
Type | Description |
---|---|
DataFrame | The transformed dataset. |
apply_dtypes(data)
Applies dtypes from the metadata to dataset.
Returns:
Type | Description |
---|---|
DataFrame | The dataset with the dtypes applied. |
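
As a rough illustration of applying dtypes from metadata, the sketch below casts each column according to a per-column type name. The function `cast_columns`, the `dtypes` mapping and its string type names are all assumptions made for this example; the layout of the library's MetaData object is not described here.

```python
def cast_columns(rows, dtypes):
    """Toy dtype step: cast each value using a per-column type name (hypothetical layout)."""
    casters = {"int": int, "float": float, "str": str}
    return [
        {
            col: None if value is None else casters[dtypes[col]](value)
            for col, value in row.items()
        }
        for row in rows
    ]


raw = [{"age": "40", "height": "1.7"}, {"age": None, "height": "1.65"}]
typed = cast_columns(raw, {"age": "int", "height": "float"})
```

Missing values are deliberately left as `None` at this stage so that a later missingness step can deal with them.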
apply_missingness_strategy()
Resolves missingness in the dataset via the MetaTransformer's global missingness strategy or column-wise missingness strategies. In the case of the AugmentMissingnessStrategy, the missingness is not resolved; instead, a new column or value is added for later transformation.
Returns:
Type | Description |
---|---|
DataFrame | The dataset with the missingness strategies applied. |
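
The idea of augmenting rather than resolving missingness can be sketched as follows. This is not the library's AugmentMissingnessStrategy, whose implementation is not shown here; the function name, the `_missing` suffix and the placeholder value are assumptions chosen for illustration.

```python
def augment_missingness(rows, column, placeholder=0.0):
    """Toy 'augment' step: keep every row and record missingness in a new column."""
    flag = f"{column}_missing"  # hypothetical naming scheme
    augmented = []
    for row in rows:
        is_missing = row[column] is None
        new_row = dict(row)  # copy so the input rows stay untouched
        new_row[flag] = int(is_missing)
        if is_missing:
            # The placeholder is arbitrary; the flag column carries the information.
            new_row[column] = placeholder
        augmented.append(new_row)
    return augmented


rows = [{"weight": 71.5}, {"weight": None}]
augmented = augment_missingness(rows, "weight")
```

No rows are dropped and no value is guessed, so a downstream model can learn the pattern of missingness and reproduce it.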
drop_columns()
Drops columns from the dataset that are not in the MetaData.
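
A minimal sketch of this filtering, assuming the metadata can be treated as a mapping keyed by column name (an assumption; `drop_unlisted_columns` is a hypothetical helper, not part of nhssynth):

```python
def drop_unlisted_columns(rows, metadata):
    """Toy version: keep only the columns that have an entry in `metadata`."""
    return [
        {col: value for col, value in row.items() if col in metadata}
        for row in rows
    ]


records = [{"age": 40, "notes": "free text"}, {"age": 20, "notes": ""}]
kept = drop_unlisted_columns(records, metadata={"age": {}})
```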
from_dict(dataset, metadata, **kwargs)
classmethod
Instantiates a MetaTransformer from a metadata dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The raw input DataFrame. | required |
metadata | dict | A dictionary of raw metadata. | required |
Returns:
Type | Description |
---|---|
Self | A MetaTransformer object. |
from_path(dataset, metadata_path, **kwargs)
classmethod
Instantiates a MetaTransformer from a metadata file via a provided path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The raw input DataFrame. | required |
metadata_path | str | The path to the metadata file. | required |
Returns:
Type | Description |
---|---|
Self | A MetaTransformer object. |
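
The two classmethods follow the familiar alternative-constructor pattern, sketched below with a hypothetical `ToyTransformer`. The metadata file format is not stated here, so JSON is used purely as a stand-in; the metadata layout and the forwarding of `**kwargs` to the constructor are likewise assumptions.

```python
import json
import os
import tempfile


class ToyTransformer:
    """Hypothetical stand-in used only to show the constructor pattern."""

    def __init__(self, dataset, metadata=None, **kwargs):
        self.dataset = dataset
        self.metadata = metadata
        self.options = kwargs

    @classmethod
    def from_dict(cls, dataset, metadata, **kwargs):
        # Build an instance from an already-loaded metadata dictionary.
        return cls(dataset, metadata=dict(metadata), **kwargs)

    @classmethod
    def from_path(cls, dataset, metadata_path, **kwargs):
        # Load the file, then reuse from_dict. JSON is an assumed format.
        with open(metadata_path) as handle:
            return cls.from_dict(dataset, json.load(handle), **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.json")
    with open(path, "w") as handle:
        json.dump({"age": {"dtype": "int"}}, handle)
    toy = ToyTransformer.from_path([{"age": 40}], path, missingness_strategy="augment")
```

Routing `from_path` through `from_dict` keeps a single code path for validating metadata, whichever way it is supplied.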
get_multi_and_single_column_indices()
Returns the indices of the columns that were transformed into one or multiple column(s).
Returns:
Type | Description |
---|---|
tuple[list[int], list[int]] | A tuple containing the indices of the single and multi columns. |
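
To make the distinction between single and multi column indices concrete, the sketch below one-hot encodes categorical columns and passes other columns through, recording where each lands in the output. One-hot encoding is an assumed example of a transformer that yields several columns; `transform_with_indices` and its data layout are hypothetical.

```python
def transform_with_indices(rows, categorical):
    """Toy transform that records output positions of single and multi columns."""
    columns = list(rows[0])
    # Sorted distinct values of each categorical column define its one-hot outputs.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}

    header, single_column_indices, multi_column_indices = [], [], []
    for col in columns:
        if col in levels:
            start = len(header)
            header.extend(f"{col}_{level}" for level in levels[col])
            # One group of output positions per expanded input column.
            multi_column_indices.append(list(range(start, len(header))))
        else:
            single_column_indices.append(len(header))
            header.append(col)

    table = []
    for row in rows:
        out = []
        for col in columns:
            if col in levels:
                out.extend(int(row[col] == level) for level in levels[col])
            else:
                out.append(row[col])
        table.append(out)
    return header, table, single_column_indices, multi_column_indices


rows = [{"age": 40, "sex": "F"}, {"age": 20, "sex": "M"}]
header, table, single, multi = transform_with_indices(rows, categorical=["sex"])
```

In this sketch `single` is a flat list of positions while `multi` holds one list of positions per expanded column, mirroring the `list[int]` and `list[list[int]]` attribute types listed earlier.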
get_sdv_metadata()
Calls the MetaData method to reformat its contents into the correct format for use with SDMetrics.
Returns:
Type | Description |
---|---|
dict[str, dict[str, Any]] | The metadata in the correct format for SDMetrics. |
inverse_apply(dataset)
Reverses the transformation applied by the MetaTransformer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The transformed dataset. | required |
Returns:
Type | Description |
---|---|
DataFrame | The original dataset. |
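
Reversal depends on remembering what the forward pass did. The sketch below shows that round trip for a single numeric column using min-max scaling; `ToyScaler` is hypothetical and the choice of scaling is an assumption, since the transformers used by the library are not described here.

```python
class ToyScaler:
    """Hypothetical invertible transform: min-max scaling of one numeric column."""

    def apply(self, values):
        # Remember the parameters needed to undo the transformation later.
        self.low, self.high = min(values), max(values)
        span = (self.high - self.low) or 1  # avoid dividing by zero
        return [(v - self.low) / span for v in values]

    def inverse_apply(self, scaled):
        # Map values from the transformed space back to the original scale.
        span = (self.high - self.low) or 1
        return [s * span + self.low for s in scaled]


scaler = ToyScaler()
scaled = scaler.apply([20, 30, 40])
restored = scaler.inverse_apply(scaled)
```

The same stored parameters can be applied to new values in the transformed space, such as a model's output, to express them in the original format.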
transform()
Prepares the dataset by applying each of the columns' transformers and recording the indices of the single and multi columns.
Returns:
Type | Description |
---|---|
DataFrame | The transformed dataset. |