Miscellaneous
Generic functions for manipulating Python files and objects.
contains_both_ltc_types(ltc_set)
Helper function for detecting physical-mental multimorbidity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ltc_set | set | Set containing LTC codes. | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if both physical and mental LTC types are present, False otherwise. |
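A minimal sketch of how such a check could work, assuming LTC codes are classified by membership in two reference sets. `PHYSICAL_LTCS`, `MENTAL_LTCS`, and the codes inside them are hypothetical placeholders; the classification used in src/utils/functions.py may differ.

```python
# Hypothetical reference sets; the real project may classify LTC codes differently.
PHYSICAL_LTCS = {"diabetes", "copd", "ckd"}
MENTAL_LTCS = {"depression", "anxiety"}


def contains_both_ltc_types(ltc_set: set) -> bool:
    """Return True if the set holds at least one physical and one mental LTC."""
    has_physical = not ltc_set.isdisjoint(PHYSICAL_LTCS)
    has_mental = not ltc_set.isdisjoint(MENTAL_LTCS)
    return has_physical and has_mental


mixed = contains_both_ltc_types({"diabetes", "depression"})
physical_only = contains_both_ltc_types({"diabetes", "copd"})
```

Here `mixed` evaluates to True and `physical_only` to False, since the second set contains no code from the mental group.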
Source code in src/utils/functions.py
get_demographics_summary(ed_pts)
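Summarises sensitive attributes and outcome prevalence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ed_pts | DataFrame | Demographics data (named demographics in the docstring). | required |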
Returns:
Type | Description |
---|---|
None | pl.DataFrame: Summary table. |
get_final_episodes(stays)
Extracts the final ED episode with hospitalisation for creating a unique patient cohort.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stays | DataFrame | Stays data. | required |
Returns:
Type | Description |
---|---|
DataFrame | Patient-level data. |
get_n_unique_values(table, use_col='subject_id')
Computes the number of unique values in a given column of a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame \| LazyFrame | Table. | required |
use_col | str | Column to use. Defaults to "subject_id". | 'subject_id' |
Returns:
Name | Type | Description |
---|---|---|
int | int | Number of unique values. |
get_train_split_summary(train, val, test, outcome='in_hosp_death', output_path='../outputs/exp_data', cont_cols=None, nn_cols=None, disp_dict=None, cat_cols=None, verbose=True)
Helper function that prints a statistical summary of the train-validation-test split.
impute_from_df(impute_to, impute_from, use_col=None, key_col=None)
Imputes values from one dataframe to another.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
impute_to | DataFrame \| LazyFrame | Table to impute values into. | required |
impute_from | DataFrame | Table to impute values from. | required |
use_col | str | Column containing values to impute. Defaults to None. | None |
key_col | str | Column used to identify matching rows. Defaults to None. | None |
Returns:
Type | Description |
---|---|
DataFrame \| LazyFrame | Table with imputed values. |
load_pickle(filepath)
Load a pickled object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to pickle (.pkl) file. | required |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | Loaded object. |
preview_data(filepath)
Prints a single example from the data dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to .pkl file containing data dictionary. | required |
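A minimal sketch of such a preview, assuming the pickle holds a non-empty dict and that showing its first key-value pair is an acceptable "single example". The toy dictionary and its keys are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def preview_data(filepath):
    """Load a pickled dictionary and print its first key-value pair."""
    with open(filepath, "rb") as f:
        data_dict = pickle.load(f)
    # Assumption: the pickle holds a non-empty dict; show the first entry only.
    example_key = next(iter(data_dict))
    print(example_key, data_dict[example_key])


# Write a toy dictionary so the sketch runs end to end.
toy_path = str(Path(tempfile.mkdtemp()) / "toy.pkl")
with open(toy_path, "wb") as f:
    pickle.dump({101: {"age": 70}, 102: {"age": 64}}, f)

preview_data(toy_path)
```

Only load pickle files from trusted sources, since unpickling can execute arbitrary code.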
read_from_txt(filepath, as_type='str')
Reads from a line-separated txt file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to text file. | required |
Returns:
Name | Type | Description |
---|---|---|
list | list | List containing data. |
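The `as_type` parameter is not documented above, so the sketch below assumes it names a type to cast each line to. The `casters` mapping and the accepted values ("str", "int", "float") are hypothetical.

```python
import tempfile
from pathlib import Path


def read_from_txt(filepath, as_type="str"):
    """Read non-empty lines from a text file, casting each one to as_type."""
    # Hypothetical mapping; the values accepted by the real function are not documented.
    casters = {"str": str, "int": int, "float": float}
    cast = casters[as_type]
    with open(filepath, encoding="utf-8") as f:
        return [cast(line.strip()) for line in f if line.strip()]


# Write a toy file so the sketch runs end to end.
ids_path = str(Path(tempfile.mkdtemp()) / "ids.txt")
Path(ids_path).write_text("101\n102\n103\n", encoding="utf-8")

ids_as_str = read_from_txt(ids_path)
ids_as_int = read_from_txt(ids_path, as_type="int")
```

Blank lines are skipped in this sketch, which avoids cast errors on trailing newlines.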
read_icd_mapping(map_path)
Reads ICD-9 to ICD-10 mapping file for chronic conditions.
rename_fields(col)
save_pickle(target, filepath, fname='mm_feat.pkl')
Saves a dictionary as a pickled object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to pickle (.pkl) file. | required |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | Loaded object. |
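A round-trip sketch under stated assumptions: `filepath` is treated as a directory and `fname` as the file name inside it, which the documentation above does not confirm. Returning the full path is a convenience of this sketch, not documented behaviour, and the `load_pickle` shown here is a stand-in for the documented function.

```python
import os
import pickle
import tempfile


def save_pickle(target, filepath, fname="mm_feat.pkl"):
    """Pickle target to <filepath>/<fname> and return the full path."""
    # Assumption: filepath is a directory and fname is the file name inside it.
    os.makedirs(filepath, exist_ok=True)
    full_path = os.path.join(filepath, fname)
    with open(full_path, "wb") as f:
        pickle.dump(target, f)
    return full_path


def load_pickle(filepath):
    """Load a pickled object from filepath."""
    with open(filepath, "rb") as f:
        return pickle.load(f)


# Toy feature dictionary for illustration only.
features = {101: [0.2, 0.7], 102: [0.5, 0.1]}
saved_to = save_pickle(features, tempfile.mkdtemp())
restored = load_pickle(saved_to)
```

After the round trip, `restored` compares equal to `features`.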
scale_numeric_features(table, numeric_cols=None, over=None)
Applies min/max scaling to numeric columns and rounds to 1 d.p.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | DataFrame | Table. | required |
numeric_cols | list | List of columns to apply to. Defaults to None. | None |
over | str | Column to group by before computing min/max. Defaults to None. | None |
Returns:
Type | Description |
---|---|
DataFrame | Updated table. |