Amber Report Pipeline Process Maps
This page provides a visual overview of the Amber Report Pipeline, detailing the end-to-end data processing flow from raw data ingestion to final report generation.
Main Pipeline Flow Summary
The Amber Report Pipeline processes the data in several stages to produce the regional reports. These stages are:
- Configuration: Load the pipeline configuration and dataset paths.
- Data Loading: Load datasets from CSV files using information from the configuration.
- Data Cleaning: Normalise the column names and clean the datasets.
- Data Joining: Join datasets to create a master devices table.
- Post-Processing: Cleanse the master devices table, resolving inconsistencies and missing values.
- Pivoting: Create summary and detailed pivot tables.
- Joining Additional Information: Join on key columns from the lookup table to restore data lost during pivoting.
- Preparing Outputs: Cut the data into regions and prepare it based on the output instructions.
- Output: Generate the final output in the configured formats (Excel, Excel Zip, Pickle).
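The Pivoting and Joining Additional Information stages can be illustrated with a minimal sketch. It uses plain Python rather than whatever table library the pipeline actually relies on, and every column name and value (`provider_code`, `device_category`, `provider_name`, `region`) is a hypothetical stand-in, not taken from the real datasets. The point is only to show why a join is needed after pivoting: the pivot keeps the grouping keys and the count, so other columns have to be joined back from a lookup.

```python
from collections import Counter

# Hypothetical cleaned master devices records (column names are illustrative only).
master_devices = [
    {"provider_code": "P01", "device_category": "Pump", "region": "North"},
    {"provider_code": "P01", "device_category": "Pump", "region": "North"},
    {"provider_code": "P02", "device_category": "Monitor", "region": "South"},
]

# Hypothetical lookup keyed on provider code.
provider_codes_lookup = {
    "P01": {"provider_name": "Provider One", "region": "North"},
    "P02": {"provider_name": "Provider Two", "region": "South"},
}


def pivot_summary(records):
    """Count devices per (provider_code, device_category).

    Only the grouping keys and the count survive; all other columns are lost.
    """
    counts = Counter((r["provider_code"], r["device_category"]) for r in records)
    return [
        {"provider_code": code, "device_category": category, "device_count": n}
        for (code, category), n in sorted(counts.items())
    ]


def join_additional_information(pivot_rows, lookup):
    """Left-join the lookup columns back onto the pivot rows by provider_code."""
    return [{**row, **lookup.get(row["provider_code"], {})} for row in pivot_rows]


summary = join_additional_information(
    pivot_summary(master_devices), provider_codes_lookup
)
for row in summary:
    print(row)
```

In this sketch the `region` column disappears in `pivot_summary` and is restored by the join, which is what later allows the data to be cut into regions.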
These steps are visualised in the following diagram:
---
config:
  layout: elk
  elk:
    mergeEdges: true
    nodePlacementStrategy: LINEAR_SEGMENTS
  flowchart:
    curve: linear
---
flowchart TD
subgraph amber_report_pipeline["Amber Report Pipeline"]
config[["Pipeline Config"]]
load_devices_datasets[["Load Datasets"]]
clean_datasets[["Clean Datasets"]]
loaded_datasets["loaded_datasets"]
join_datasets[["Join Datasets"]]
master_devices_table["Master Devices Table"]
cleanse_master_joined_dataset[["Post-Process Master Devices Table"]]
pivot_datasets[["Pivot Datasets into Summary and Detailed Tables"]]
join_mini_tables[["Join Additional Information"]]
dropna["Drop Missing Regions"]
uncut_datasets["uncut_datasets"]
prepare_outputs[["Cut into regions and prepare based on output instructions"]]
output_data[["Output Data in Configured Way (Excel, Excel Zip, Pickle)"]]
end
subgraph uncut_datasets["Uncut All Region Data"]
summary_data["Device Category Summary Pivot"]
detailed_data["Device Detailed Pivot"]
master_devices_table_drop_na["Master Devices"]
end
subgraph output_storage["Outputs"]
excel_files["Regional Excel Data Packs"]
zip_excel_files["ZIP File of all Region Excel Data Packs"]
pickle["Pickle file containing collection of formatted output tables"]
end
subgraph loaded_datasets["Loaded Datasets"]
master_devices["Master Devices"]
provider_codes_lookup["Provider Codes Lookup"]
device_taxonomy["Device Taxonomy"]
exceptions["Exceptions"]
end
subgraph local_storage["Locally Stored Data and Config"]
csvs["Datasets CSV Files"]
amber_report_excel_config["amber_report_excel_config.yaml"]
end
loaded_datasets --> join_datasets
join_datasets --> master_devices_table
config --> load_devices_datasets
load_devices_datasets --> clean_datasets
clean_datasets --> loaded_datasets
master_devices_table --> cleanse_master_joined_dataset & dropna
cleanse_master_joined_dataset --> pivot_datasets
pivot_datasets --> join_mini_tables
prepare_outputs --> output_data
output_data --> output_storage
amber_report_excel_config --> config
csvs --> load_devices_datasets
dropna --> master_devices_table_drop_na
join_mini_tables --> detailed_data & summary_data
summary_data --> prepare_outputs
detailed_data --> prepare_outputs
master_devices_table_drop_na --> prepare_outputs
master_devices_table@{ shape: internal-storage}
summary_data@{ shape: internal-storage}
detailed_data@{ shape: internal-storage}
master_devices_table_drop_na@{ shape: internal-storage}
excel_files@{ shape: docs}
zip_excel_files@{ shape: stored-data}
pickle@{ shape: stored-data}
master_devices@{ shape: internal-storage}
provider_codes_lookup@{ shape: internal-storage}
device_taxonomy@{ shape: internal-storage}
exceptions@{ shape: internal-storage}
csvs@{ shape: docs}
amber_report_excel_config@{ shape: stored-data}
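The "Drop Missing Regions" node and the "Cut into regions" step in the diagram above can be sketched as follows. This is a minimal, standard-library illustration only: the function names mirror the diagram labels, but their signatures, the `region` key, and the sample records are assumptions rather than the pipeline's actual API.

```python
from collections import defaultdict


def drop_missing_regions(records, region_key="region"):
    """Keep only records that have a non-empty region value."""
    return [r for r in records if r.get(region_key) not in (None, "")]


def cut_into_regions(records, region_key="region"):
    """Group records into one list per region."""
    cuts = defaultdict(list)
    for record in records:
        cuts[record[region_key]].append(record)
    return dict(cuts)


# Hypothetical master devices rows; device 2 has no region and is dropped.
master = [
    {"device_id": 1, "region": "North"},
    {"device_id": 2, "region": None},
    {"device_id": 3, "region": "South"},
    {"device_id": 4, "region": "North"},
]

regional_cuts = cut_into_regions(drop_missing_regions(master))
print({region: len(rows) for region, rows in regional_cuts.items()})
# prints {'North': 2, 'South': 1}
```

Dropping rows without a region before cutting means that every remaining record lands in exactly one regional data pack.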
Detailed Pipeline Steps
This diagram provides a more detailed view of the Amber Report Pipeline, showing each step in the process and how they interact with each other:
---
config:
  layout: elk
  elk:
    mergeEdges: true
    nodePlacementStrategy: LINEAR_SEGMENTS
  flowchart:
    curve: linear
---
flowchart TD
subgraph uncut_datasets["Uncut All Region Data"]
summary_data["Device Category Summary Pivot"]
detailed_data["Device Detailed Pivot"]
master_devices_table_drop_na["Master Devices"]
end
subgraph output_data["Output the Data"]
output_formats{"Which output formats have been configured?"}
create_excel_reports[["Create Excel Reports"]]
excel_zip{"Should the Excel outputs be packaged in a ZIP file?"}
create_excel_zip_reports[["Package Excel Reports in ZIP File"]]
create_pickle[["Create Pickle file"]]
end
subgraph amber_report_pipeline["Amber Report Pipeline"]
config["config"]
load_devices_datasets[["Load Datasets"]]
clean_datasets["clean_datasets"]
loaded_datasets["loaded_datasets"]
join_datasets["join_datasets"]
master_devices_table["Master Devices Table"]
cleanse_master_joined_dataset[["Post-Process Master Devices Table"]]
pivot_datasets["pivot_datasets"]
join_mini_tables["join_mini_tables"]
dropna["Drop Missing Regions"]
uncut_datasets
prepare_outputs
output_data
end
subgraph local_storage["Locally Stored Data and Config"]
master_devices_csv["master_devices.csv"]
provider_codes_lookup_csv["provider_codes_lookup.csv"]
device_taxonomy_csv["device_taxonomy.csv"]
csv4["exceptions.csv"]
amber_report_excel_config["amber_report_excel_config.yaml"]
end
subgraph output_storage["Outputs"]
excel_files["Regional Excel Data Packs"]
zip_excel_files["ZIP File of all Region Excel Data Packs"]
pickle["Pickle file containing collection of formatted output tables"]
end
subgraph config["Config"]
define_dataset_paths["Define the Dataset Paths"]
check_paths["Check Dataset Paths Exist"]
load_amber_report_excel_config["Load the Output Format Config"]
create_output_directory["Create the Output Directory"]
dataset_config["Dataset Config"]
amber_report_output_instructions["Amber Report Output Instructions"]
end
subgraph loaded_datasets["Loaded Datasets"]
master_devices["Master Devices"]
provider_codes_lookup["Provider Codes Lookup"]
device_taxonomy["Device Taxonomy"]
exceptions["Exceptions"]
end
subgraph join_datasets["Join"]
join_provider_codes_lookup[["Join Provider Codes Lookup"]]
join_device_taxonomy[["Join Device Taxonomy"]]
join_exceptions[["Join Exceptions"]]
end
subgraph clean_datasets["Clean"]
batch_normalise_column_names[["Normalise Datasets Column Names"]]
cleanse_master_data[["Clean Master Data"]]
cleanse_device_taxonomy[["Clean Device Taxonomy"]]
cleanse_exceptions[["Clean Exceptions"]]
end
subgraph pivot_datasets["Pivot"]
create_device_category_summary_table[["Create Device Category Summary Pivot Table"]]
create_device_summary_table[["Create Device Detailed Pivot Table"]]
end
subgraph join_mini_tables["Join on additional information"]
join_mini_provider_codes_lookup[["Join Mini Provider Codes Lookup"]]
join_mini_device_taxonomy[["Join Mini Device Taxonomy"]]
join_mini_exceptions[["Join Mini Exceptions"]]
end
subgraph prepare_outputs["Prepare Datasets for Output"]
create_regional_table_cuts[["Cut data into regions"]]
interpret_output_instructions[["Process Data Based on Output Instructions"]]
end
load_devices_datasets --> batch_normalise_column_names
batch_normalise_column_names --> cleanse_master_data & cleanse_device_taxonomy & cleanse_exceptions & provider_codes_lookup
cleanse_master_data --> master_devices
cleanse_device_taxonomy --> device_taxonomy
cleanse_exceptions --> exceptions
master_devices --> join_provider_codes_lookup
provider_codes_lookup --> join_provider_codes_lookup & join_mini_provider_codes_lookup
join_provider_codes_lookup --> join_device_taxonomy
device_taxonomy --> join_device_taxonomy & join_mini_device_taxonomy
exceptions --> join_exceptions & join_mini_exceptions
join_device_taxonomy --> join_exceptions
join_exceptions --> master_devices_table
master_devices_table --> cleanse_master_joined_dataset
cleanse_master_joined_dataset --> create_device_category_summary_table & create_device_summary_table & dropna
dropna --> master_devices_table_drop_na
summary_data --> create_regional_table_cuts
detailed_data --> create_regional_table_cuts
master_devices_table_drop_na --> create_regional_table_cuts
create_regional_table_cuts --> interpret_output_instructions
output_formats -- Excel --> create_excel_reports
output_formats -- Pickle --> create_pickle
create_excel_reports --> excel_zip
excel_zip -- Yes --> create_excel_zip_reports
excel_zip -- No --> excel_files
create_pickle --> pickle
create_excel_zip_reports --> zip_excel_files & excel_files
define_dataset_paths --> check_paths
load_amber_report_excel_config --> amber_report_output_instructions
check_paths --> dataset_config
dataset_config --> load_devices_datasets
amber_report_excel_config --> load_amber_report_excel_config
csv4 --> load_devices_datasets
amber_report_output_instructions --> interpret_output_instructions
interpret_output_instructions --> output_formats
device_taxonomy_csv --> load_devices_datasets
master_devices_csv --> load_devices_datasets
provider_codes_lookup_csv --> load_devices_datasets
create_output_directory --> output_formats
join_mini_provider_codes_lookup --> join_mini_device_taxonomy
join_mini_device_taxonomy --> join_mini_exceptions
create_device_summary_table --> join_mini_provider_codes_lookup
create_device_category_summary_table --> join_mini_provider_codes_lookup
join_mini_exceptions --> detailed_data & summary_data
summary_data@{ shape: internal-storage}
detailed_data@{ shape: internal-storage}
master_devices_table_drop_na@{ shape: internal-storage}
master_devices_table@{ shape: internal-storage}
master_devices_csv@{ shape: stored-data}
provider_codes_lookup_csv@{ shape: stored-data}
device_taxonomy_csv@{ shape: stored-data}
csv4@{ shape: stored-data}
amber_report_excel_config@{ shape: stored-data}
master_devices@{ shape: internal-storage}
provider_codes_lookup@{ shape: internal-storage}
device_taxonomy@{ shape: internal-storage}
exceptions@{ shape: internal-storage}
excel_files@{ shape: docs}
zip_excel_files@{ shape: stored-data}
pickle@{ shape: stored-data}
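Two parts of the detailed diagram lend themselves to a short sketch: the Config subgraph (define the dataset paths, check they exist, create the output directory) and the decision nodes in the "Output the Data" subgraph. The example below is a hedged illustration using only the standard library. The dataset file names come from the diagram, but the function names `build_dataset_config` and `plan_outputs`, the format keys `"excel"` and `"pickle"`, and the `zip_excel` flag are assumptions; the real keys in `amber_report_excel_config.yaml` may differ.

```python
import tempfile
from pathlib import Path

# Dataset names as shown in the "Locally Stored Data and Config" subgraph.
DATASETS = ["master_devices", "provider_codes_lookup", "device_taxonomy", "exceptions"]


def build_dataset_config(data_dir, dataset_names):
    """Define the dataset paths and check that each one exists."""
    paths = {name: Path(data_dir) / f"{name}.csv" for name in dataset_names}
    missing = [str(path) for path in paths.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing dataset files: {missing}")
    return paths


def plan_outputs(formats, zip_excel):
    """Return the artefacts to produce, following the diagram's decision nodes.

    The format keys and the zip_excel flag are hypothetical.
    """
    artefacts = []
    if "excel" in formats:
        artefacts.append("excel_files")  # Regional Excel Data Packs
        if zip_excel:
            artefacts.append("zip_excel_files")  # ZIP of all regional packs
    if "pickle" in formats:
        artefacts.append("pickle")
    return artefacts


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder CSVs so the existence check passes.
    for name in DATASETS:
        (Path(tmp) / f"{name}.csv").write_text("id\n")
    output_dir = Path(tmp) / "outputs"
    output_dir.mkdir(parents=True, exist_ok=True)  # "Create the Output Directory"
    dataset_paths = build_dataset_config(tmp, DATASETS)

print(sorted(dataset_paths))
print(plan_outputs({"excel", "pickle"}, zip_excel=True))
```

Failing fast on missing dataset paths, before any loading starts, keeps path errors separate from data errors in the later stages.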
Note
If you want further details on the steps of the Amber Report Pipeline, we recommend consulting the codebase or the API Reference documentation.