categorical
OHECategoricalTransformer
Bases: ColumnTransformer
A transformer to one-hot encode categorical features via sklearn's OneHotEncoder.
Essentially wraps the `fit_transform` and `inverse_transform` methods of `OneHotEncoder` to comply with the `ColumnTransformer` interface.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `drop` | `Optional[Union[list, str]]` | str or list of str, to pass to | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `missing_value` | `Any` | The value used to fill missing values in the data. |
After applying the transformer, the following attributes will be populated:
Attributes:
| Name | Type | Description |
|---|---|---|
| `original_column_name` | | The name of the original column. |
| `new_column_names` | | The names of the columns generated by the transformer. |
Source code in src/nhssynth/modules/dataloader/transformers/categorical.py
apply(data, constraint_adherence, missing_value=None)
One-hot encodes the input data using the `fit_transform` method of scikit-learn's `OneHotEncoder`.
This method transforms the input data (data) into one-hot encoded format. If a missing_value is provided, missing values are replaced
with the specified value before the transformation is applied. The transformation is further filtered based on the provided
constraint_adherence Series, which determines which rows are included in the transformation process (only rows where the value in
constraint_adherence is 1 are retained).
The resulting transformed data includes the one-hot encoded columns for the original data, with the constraint adherence values
appended as an additional column. If a column labeled 0 is created (which may happen in certain transformations), it is dropped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Series` | The input column of data to be transformed. The data is expected to be a single column to which the transformation will be applied. | required |
| `constraint_adherence` | `Optional[Series]` | A Series indicating whether each row should be included in the transformation. Only rows where the value is 1 are retained. | required |
| `missing_value` | `Optional[Any]` | The value used to replace missing values before the transformation is applied. | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A DataFrame containing the transformed data: the one-hot encoded columns based on the original data, with each column corresponding to a unique category, plus the constraint adherence values appended as an additional column. |
Notes
- If `missing_value` is provided, missing values in the `data` column are replaced with the specified value before the transformation.
- Only rows where `constraint_adherence == 1` will be included in the transformed data.
- Any columns with the header `0` (which can occur in some transformations) will be dropped.
- The method ensures that the transformed data maintains the original index (`semi_index`) and properly aligns with the input data.