Miscellaneous
Generic functions for manipulating Python files and objects.
get_n_unique_values(table, use_col='subject_id')
Compute the number of unique values in a particular column of a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame \| LazyFrame | Table. | required |
use_col | str | Column to use. Defaults to "subject_id". | 'subject_id' |
Returns:
Name | Type | Description |
---|---|---|
int | int | Number of unique values. |
Source code in src/utils/functions.py
impute_from_df(impute_to, impute_from, use_col=None, key_col=None)
Imputes values from one dataframe to another.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
impute_to | DataFrame \| LazyFrame | Table to impute values into. | required |
impute_from | DataFrame | Table to impute values from. | required |
use_col | str | Column containing values to impute. Defaults to None. | None |
key_col | str | Column to use to identify matching rows. Defaults to None. | None |
Returns:
Type | Description |
---|---|
DataFrame \| LazyFrame | pl.DataFrame \| pl.LazyFrame: Table with imputed values. |
Source code in src/utils/functions.py
load_pickle(filepath)
Load a pickled object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to pickle (.pkl) file. | required |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | Loaded object. |
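Loading a pickled object typically takes only the standard-library `pickle` module. The sketch below is an assumption about what `load_pickle` does, not its actual source; `load_pickle_sketch` and the example object are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any


def load_pickle_sketch(filepath: str) -> Any:
    """Hypothetical stand-in for load_pickle."""
    # Pickle is a binary format, so the file is opened in "rb" mode.
    with open(filepath, "rb") as f:
        return pickle.load(f)


# Round-trip a small object through a temporary .pkl file.
example = {"subject_1": {"age": 34, "labs": [1.2, 3.4]}}
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "example.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(example, f)
    restored = load_pickle_sketch(pkl_path)
```

Because `pickle.load` can execute arbitrary code during deserialisation, it should only be used on files from a trusted source.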
preview_data(filepath)
Prints a single example from a data dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to .pkl file containing data dictionary. | required |
Source code in src/utils/functions.py
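One way to preview a single entry, assuming the pickle holds a plain `dict` keyed by example identifier (the layout of the real data dictionary is not documented here). `preview_sketch` is a hypothetical name, and returning the key is an addition made only so the behaviour is easy to check; the documented function just prints.

```python
import os
import pickle
import pprint
import tempfile


def preview_sketch(filepath: str):
    """Hypothetical stand-in for preview_data (assumes the pickle holds a dict)."""
    with open(filepath, "rb") as f:
        data = pickle.load(f)
    # Take the first key in insertion order and print only that entry.
    first_key = next(iter(data))
    print(f"Example key: {first_key}")
    pprint.pprint(data[first_key])
    return first_key


# Illustrative data dictionary written to a temporary .pkl file.
data_dict = {"subject_1": {"age": 34}, "subject_2": {"age": 51}}
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "data.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(data_dict, f)
    shown_key = preview_sketch(pkl_path)
```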
read_from_txt(filepath, as_type='str')
Read from a line-separated txt file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to text file. | required |
Returns:
Name | Type | Description |
---|---|---|
list | list | List containing data. |
Source code in src/utils/functions.py
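The `as_type` argument is not described in the parameter table. The sketch below assumes it names the Python type each line is converted to; the `_CASTERS` mapping, the function name `read_lines_sketch`, and the skipping of blank lines are all assumptions for illustration.

```python
import os
import tempfile

# Assumed mapping from the as_type string to a converter function.
_CASTERS = {"str": str, "int": int, "float": float}


def read_lines_sketch(filepath: str, as_type: str = "str") -> list:
    """Hypothetical stand-in for read_from_txt (assumed as_type semantics)."""
    cast = _CASTERS[as_type]
    with open(filepath, encoding="utf-8") as f:
        # Strip surrounding whitespace and skip blank lines.
        return [cast(line.strip()) for line in f if line.strip()]


# Write a small line-separated file, then read it back two ways.
with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "ids.txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("101\n102\n\n103\n")
    as_strings = read_lines_sketch(txt_path)
    as_ints = read_lines_sketch(txt_path, as_type="int")
```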
scale_numeric_features(table, numeric_cols=None, over=None)
Applies min/max scaling to numeric columns and rounds to 1 decimal place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Table. | required |
numeric_cols | list | List of columns to apply to. Defaults to None. | None |
over | str | Column to group by before computing min/max. Defaults to None. | None |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Updated table. |