joins
Functions for processing the datasets for the pipeline.
join_provider_codes_lookup(master_devices, provider_codes_lookup)
This function is a wrapper around the join_datasets function. It joins on the 'der_provider_code' column in the master_devices dataset and the 'org_code' column in the provider_codes_lookup dataset. The join type is a 'many_to_one' left join.
join_datasets by defaults checks the merge health to ensure that there is not a 'left_only' or 'right_only' merge. If there are any 'left_only' or 'right_only' merges, a MergeWarning is raised. The merge column is dropped from the merged DataFrame by default.
! WARNING: This function is highly coupled to the join_datasets function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
master_devices
|
DataFrame
|
The master_devices table. Must have the "der_provider_code" column. |
required |
provider_codes_lookup
|
DataFrame
|
The provider_codes_lookup table. Must have the "org_code" column. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
The master_devices table with the provider_codes_lookup table joined on. |
Source code in devices_rap/joins.py
join_device_taxonomy(master_devices, device_taxonomy)
This function is a wrapper around the join_datasets function. It joins on the 'upd_high_level_device_type' column in the master_devices dataset and the 'dev_code' column in the device_taxonomy dataset. The join type is a 'many_to_one' left join.
join_datasets by defaults checks the merge health to ensure that there is not a 'left_only' or 'right_only' merge. If there are any 'left_only' or 'right_only' merges, a MergeWarning is raised. The merge column is dropped from the merged DataFrame by default.
! WARNING: This function is highly coupled to the join_datasets function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
master_devices
|
DataFrame
|
The master_devices table. Must have the "upd_high_level_device_type" column. |
required |
device_taxonomy
|
DataFrame
|
The device_taxonomy table. Must have the "device_type" column. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
The master_devices table with the device_taxonomy table joined on. |
Source code in devices_rap/joins.py
join_exceptions(master_devices, exceptions, strict_validate=False)
This function is a wrapper around the join_datasets function. It joins on the 'upd_high_level_device_type' and 'der_provider_code' columns in the master_devices dataset and the 'dev_code' and 'provider_code' column in the exceptions data. The join type is a 'many_to_many' left join (if strict_validate is false).
join_datasets by defaults checks the merge health to ensure that there is not a 'left_only' or 'right_only' merge. If there are any 'left_only' or 'right_only' merges, a MergeWarning is raised. The merge column is dropped from the merged DataFrame by default.
The strict_validate setting should be used when we are more confident in the exceptions data and want to validate a 'many-to-one' merge instead of a 'many-to-many' merge.
! WARNING: This function is highly coupled to the join_datasets function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
master_devices
|
DataFrame
|
The master_devices table. Must have the ["upd_high_level_device_type", "der_provider_code"] columns. |
required |
exceptions
|
DataFrame
|
The exceptions table. Must have the ["dev_code", "provider_code"] columns. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
The master_devices table with the exceptions table joined on. |
Source code in devices_rap/joins.py
join_mini_provider_codes_lookup(master_devices, provider_codes_lookup)
This function is a wrapper around the join_provider_codes_lookup function. It reduces the provider_codes_lookup table down to only join on the required columns. The required columns are: - org_code (which is only used to join and then dropped) - current_name_in_proper_case
! WARNING: This function is highly coupled to the join_datasets function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
master_devices
|
DataFrame
|
The master_devices table. Must have the "upd_high_level_device_type" column. |
required |
provider_codes_lookup
|
DataFrame
|
The provider_codes_lookup table. Must have the "org_code" and "current_name_in_proper_case" columns. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
The master_devices table with the mini_provider_codes_lookup table joined on. |
Raises:
Type | Description |
---|---|
ColumnsNotFoundError
|
If the required columns are not present in the provider_codes_lookup dataset. |
Source code in devices_rap/joins.py
join_mini_device_taxonomy(master_devices, device_taxonomy)
This function is a wrapper around the join_device_taxonomy function. It reduces the device_taxonomy table down to only join on the required columns. The required columns are: - dev_code (which is only used to join and then dropped) - description_in_title_case
! WARNING: This function is highly coupled to the join_datasets function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
master_devices
|
DataFrame
|
The master_devices table. Must have the "upd_high_level_device_type" column. |
required |
device_taxonomy
|
DataFrame
|
The device_taxonomy table. Must have the "dev_code" and "description_in_title_case" columns. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
The master_devices table with the mini_device_taxonomy table joined on. |
Raises:
Type | Description |
---|---|
ColumnsNotFoundError
|
If the required columns are not present in the device_taxonomy dataset. |
Source code in devices_rap/joins.py
join_mini_exceptions(master_devices, exceptions, include_exception_notes=False)
This function is a wrapper around the join_exceptions function. It reduces the exceptions table down to only join on the required columns. The required columns are: - provider_code (which is only used to join and then dropped) - dev_code (which is only used to join and then dropped) - handover_date_zcm - handover_date_vcm - exception_notes (if include_exception_notes is True)
! WARNING: This function is highly coupled to the join_exceptions function.
Source code in devices_rap/joins.py
join_mini_tables(summary_table, provider_codes_lookup, device_taxonomy, exceptions, include_exception_notes=True)
This function is a wrapper around the join_mini_provider_codes_lookup, join_mini_device_taxonomy, and join_mini_exceptions functions. It joins the mini tables onto a summary_table