Preprocessing
Functions for preprocessing and cleaning extracted data.
add_time_elapsed_to_events(events, starttime, remove_charttime=False)
Adds an 'elapsed' column containing the time elapsed since starttime.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
events | DataFrame | Events table. | required |
starttime | Datetime | Reference start time. | required |
remove_charttime | bool | Whether to remove charttime column. Defaults to False. | False |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Updated events table. |
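
The elapsed-time computation described for add_time_elapsed_to_events can be sketched in plain Python. This is an illustration of the idea, not the library's implementation: the real function operates on a pl.DataFrame, and the helper name `add_elapsed`, the row layout, and the choice of hours as the unit are all assumptions, since the source does not state the unit.

```python
from datetime import datetime


def add_elapsed(rows, starttime, remove_charttime=False):
    """Return copies of rows with an 'elapsed' value measured from starttime."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        # Assumption: elapsed time is expressed in hours (unit not given in the source).
        new_row["elapsed"] = (row["charttime"] - starttime).total_seconds() / 3600
        if remove_charttime:
            del new_row["charttime"]
        out.append(new_row)
    return out


# Hypothetical events, stored as a list of dicts instead of a DataFrame.
events = [
    {"label": "heart_rate", "value": 82, "charttime": datetime(2024, 1, 1, 9, 30)},
    {"label": "heart_rate", "value": 90, "charttime": datetime(2024, 1, 1, 12, 0)},
]
result = add_elapsed(events, datetime(2024, 1, 1, 8, 0), remove_charttime=True)
```

With the documented signature, the equivalent call would be `add_time_elapsed_to_events(events, starttime, remove_charttime=True)`.
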
Source code in src/utils/preprocessing.py
clean_events(events)
Maps non-integer values to None and removes outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
events | DataFrame | Events table. | required |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Cleaned events table. |
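
The two steps named for clean_events (mapping unparseable values to None, then removing outliers) might look like the following plain-Python sketch. The real function works on a pl.DataFrame. Everything specific here is an assumption: the `BOUNDS` table, the helper names, the row layout, and the reading of "non-integer" as "cannot be parsed by `int()`". The source does not say how outliers are defined.

```python
# Hypothetical plausible ranges per label; the real bounds are not given in the source.
BOUNDS = {"heart_rate": (0, 300)}


def to_int_or_none(value):
    """Parse value as an integer, returning None when that is not possible."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def clean(rows, bounds=BOUNDS):
    """Map unparseable values to None and drop rows outside the label's range."""
    cleaned = []
    for row in rows:
        value = to_int_or_none(row["value"])
        low, high = bounds.get(row["label"], (float("-inf"), float("inf")))
        if value is not None and not (low <= value <= high):
            continue  # outlier: drop the row entirely
        cleaned.append({**row, "value": value})
    return cleaned


events = [
    {"label": "heart_rate", "value": "82"},
    {"label": "heart_rate", "value": "ERROR"},
    {"label": "heart_rate", "value": "9999"},
]
result = clean(events)
```

In this sketch, rows whose value becomes None are kept and only out-of-range rows are dropped; whether the actual function behaves the same way is not stated in the source.
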
Source code in src/utils/preprocessing.py
convert_events_to_timeseries(events)
Converts long-form events to wide-form time-series.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
events | DataFrame | Long-form events. | required |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Wide-form time-series of shape (timestamp, features). |
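
To make the long-to-wide conversion in convert_events_to_timeseries concrete, here is a plain-Python sketch of a pivot: one output row per timestamp and one column per feature. It is not the library's code, which operates on a pl.DataFrame. The column names `elapsed`, `label`, and `value`, the helper name `long_to_wide`, and the rule that a later value overwrites an earlier one are assumptions.

```python
def long_to_wide(rows):
    """Pivot (timestamp, label, value) rows into one row per timestamp."""
    labels = sorted({row["label"] for row in rows})
    wide = {}
    for row in rows:
        # One record per timestamp, with every feature initialised to None.
        record = wide.setdefault(row["elapsed"], {label: None for label in labels})
        # Assumption: for duplicate (timestamp, label) pairs the last value wins.
        record[row["label"]] = row["value"]
    return [{"elapsed": t, **wide[t]} for t in sorted(wide)]


events = [
    {"elapsed": 0.0, "label": "heart_rate", "value": 82},
    {"elapsed": 0.0, "label": "sbp", "value": 120},
    {"elapsed": 1.0, "label": "heart_rate", "value": 90},
]
result = long_to_wide(events)
```

Features that were not measured at a given timestamp stay as None, which is why the wide table has the shape (timestamp, features) regardless of how sparse the events are.
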
Source code in src/utils/preprocessing.py
encode_categorical_features(stays)
Groups categorical features and applies one-hot encoding to them.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stays | DataFrame | Stays data. | required |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Transformed stays data. |
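
One-hot encoding, the second step named for encode_categorical_features, replaces a categorical column with one 0/1 indicator column per category. The plain-Python sketch below shows only that step. The helper name `one_hot`, the `column_category` naming scheme, and the example rows are assumptions, and the real function operates on a pl.DataFrame.

```python
def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator field per observed category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Hypothetical stays with a single categorical feature.
stays = [
    {"hadm_id": 1, "insurance": "Medicare"},
    {"hadm_id": 2, "insurance": "Private"},
]
result = one_hot(stays, "insurance")
```

Grouping rare or equivalent values into a small set of categories before encoding keeps the number of indicator columns manageable.
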
Source code in src/utils/preprocessing.py
process_text_to_embeddings(notes)
Generates a dictionary of embeddings from Bio+Discharge ClinicalBERT, using the mean vector of the word embeddings. Model: https://huggingface.co/emilyalsentzer/Bio_Discharge_Summary_BERT
Parameters:
Name | Type | Description | Default |
---|---|---|---|
notes | DataFrame | Dataframe containing notes data. | required |
Returns:
Name | Type | Description |
---|---|---|
dict | dict | Dictionary containing hadm_id as keys and average word embeddings as values. |
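
The "mean vector" that process_text_to_embeddings stores per hadm_id is an element-wise average over word-level vectors. The sketch below shows only that pooling step on tiny made-up vectors; it does not load the ClinicalBERT model. The helper name `mean_vector` and the two-dimensional example vectors are assumptions for illustration.

```python
def mean_vector(token_vectors):
    """Average a list of equal-length vectors element by element."""
    count = len(token_vectors)
    # zip(*...) groups the values of each dimension across all tokens.
    return [sum(dimension) / count for dimension in zip(*token_vectors)]


# Hypothetical word vectors per admission; a real model would return
# much longer vectors, one per token of the note.
token_vectors_by_hadm = {
    100: [[1.0, 2.0], [3.0, 4.0]],
    200: [[0.0, 0.0], [2.0, 2.0], [4.0, 1.0]],
}

# Same layout as the documented return value: hadm_id -> averaged embedding.
embeddings = {
    hadm_id: mean_vector(vectors)
    for hadm_id, vectors in token_vectors_by_hadm.items()
}
```

Averaging yields a fixed-length vector per admission no matter how long the note is, which makes the result easy to join with tabular features.
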
Source code in src/utils/preprocessing.py
transform_gender(data)
Maps gender values to predefined categories.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Data to apply to. | required |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Updated data. |
Source code in src/utils/preprocessing.py
transform_insurance(data)
Maps insurance status values to predefined categories.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Data to apply to. | required |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Updated data. |
Source code in src/utils/preprocessing.py
transform_marital(data)
Maps marital status values to predefined categories.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Data to apply to. | required |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Updated data. |
Source code in src/utils/preprocessing.py
transform_race(data)
Maps race values to predefined categories.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Data to apply to. | required |
Returns:
Type | Description |
---|---|
DataFrame | pl.DataFrame: Updated data. |
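
transform_gender, transform_insurance, transform_marital, and transform_race are all documented as mapping raw values to predefined categories. The plain-Python sketch below shows that pattern for a marital-status column. The mapping table, the "Unknown" fallback, and the helper name `map_category` are hypothetical, because the source does not list the actual categories, and the real functions operate on a pl.DataFrame.

```python
# Hypothetical mapping; the actual predefined categories are not listed in the source.
MARITAL_MAP = {
    "MARRIED": "Married",
    "SINGLE": "Single",
    "DIVORCED": "Other",
    "WIDOWED": "Other",
}


def map_category(rows, column, mapping, default="Unknown"):
    """Return copies of rows with `column` replaced by its mapped category."""
    # Values missing from the mapping (including None) fall back to `default`.
    return [{**row, column: mapping.get(row[column], default)} for row in rows]


data = [
    {"hadm_id": 1, "marital_status": "MARRIED"},
    {"hadm_id": 2, "marital_status": "WIDOWED"},
    {"hadm_id": 3, "marital_status": None},
]
result = map_category(data, "marital_status", MARITAL_MAP)
```

Keeping the mapping in a single dictionary per feature makes the category definitions easy to review and change without touching the transformation logic.
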