interpret_output_instructions
This module provides functions to process worksheet data based on provided configurations. The functions include filtering data, adding subtotals, handling datetime columns, ordering columns, renaming columns, rounding data, and processing worksheets and regions.
Functions:
Name | Returns | Description |
---|---|---|
filter_data | pd.DataFrame | Filter the worksheet data based on the provided filters. |
add_subtotals | pd.DataFrame | Adds subtotal rows to a pivoted DataFrame based on specified columns. |
handle_datetime_columns | Tuple[pd.DataFrame, Dict[str, str]] | Handle the datetime columns in the worksheet data. |
order_columns | pd.DataFrame | Order the columns in the worksheet data based on the provided column order. |
rename_columns | pd.DataFrame | Rename the columns in the worksheet data based on the provided column mapping. |
round_data | | Wrapper around the DataFrame.round method to round the data in the worksheet to the specified number of decimal places. |
process_worksheet | | Process the worksheet data based on the provided configuration. |
process_region | Dict[str, pd.DataFrame] | Process the output instructions for the specified region. |
interpret_output_instructions | Dict[str, Dict[str, pd.DataFrame]] | Interpret the output instructions for each region. |
filter_data(worksheet_data, worksheet_filters)
Filter the worksheet data based on the provided filters. The function will filter the data based on the columns and values provided in the worksheet_filters dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
worksheet_data | DataFrame | The data to filter | required |
worksheet_filters | dict | The filters to apply to the data | required |
Returns:
Type | Description |
---|---|
DataFrame | The filtered data |
Examples:
>>> worksheet_data = pd.DataFrame({
... "col1": ["A", "B", "C", "D", "E"],
... "col2": [1, 2, 3, 4, 5],
... "col3": [10, 20, 30, 40, 50]
... })
>>> worksheet_filters = {"col1": ["A", "B", "C"]}
>>> filter_data(worksheet_data, worksheet_filters)
  col1  col2  col3
0    A     1    10
1    B     2    20
2    C     3    30
Source code in devices_rap/interpret_output_instructions.py
add_subtotals(worksheet_data, subtotal_columns, sort_columns=None)
Adds subtotal rows to a pivoted DataFrame based on specified columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
worksheet_data | DataFrame | The pivoted DataFrame to which subtotals will be added. | required |
subtotal_columns | List[str] | List of columns to group by and add subtotals for. | required |
sort_columns | Optional[List[str]] | List of columns to sort the DataFrame by after adding subtotals, by default None. | None |
Returns:
Type | Description |
---|---|
DataFrame | The DataFrame with added subtotal rows. |
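The source of add_subtotals is not reproduced on this page, so the following is only a minimal sketch of one way subtotal rows could be added with pandas. The function name add_subtotals_sketch, the "Subtotal" label, and the sample columns (region, device, cost) are assumptions made for illustration and may differ from what devices_rap does.

```python
from typing import List, Optional

import pandas as pd


def add_subtotals_sketch(
    worksheet_data: pd.DataFrame,
    subtotal_columns: List[str],
    sort_columns: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in, not the devices_rap implementation."""
    # Sum the numeric columns within each group of subtotal_columns.
    subtotals = worksheet_data.groupby(subtotal_columns, as_index=False).sum(
        numeric_only=True
    )
    # Assumption: the remaining text columns are labelled "Subtotal" on new rows.
    for column in worksheet_data.columns:
        if column not in subtotals.columns:
            subtotals[column] = "Subtotal"
    combined = pd.concat(
        [worksheet_data, subtotals[worksheet_data.columns]], ignore_index=True
    )
    if sort_columns is not None:
        # A stable sort keeps each subtotal row after the rows it summarises.
        combined = combined.sort_values(sort_columns, kind="stable")
    return combined.reset_index(drop=True)


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {
        "region": ["North", "North", "South"],
        "device": ["A", "B", "A"],
        "cost": [1.0, 2.0, 3.0],
    }
)
result = add_subtotals_sketch(sample, ["region"], sort_columns=["region"])
```

Here each region's detail rows are followed by one row holding the summed cost for that region.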
handle_datetime_columns(worksheet_data, worksheet_columns)
Handle the datetime columns in the worksheet data. The function will check if the datetime columns are present in the column_order list and if so, replace the "datetime_columns" element with the actual datetime columns. The function will then convert the datetime columns to the specified datetime format.
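The placeholder replacement described above could look roughly like the sketch below. It rests on several assumptions not confirmed by this page: that worksheet_columns maps column names to display names, that the datetime columns are column labels of type pd.Timestamp (as produced by pivoting on a date), and that the target format is "%Y-%m". The function name expand_datetime_placeholder is hypothetical.

```python
from typing import Dict, Tuple

import pandas as pd

PLACEHOLDER = "datetime_columns"  # placeholder name quoted in the description


def expand_datetime_placeholder(
    worksheet_data: pd.DataFrame,
    worksheet_columns: Dict[str, str],
    datetime_format: str = "%Y-%m",  # assumed format, not taken from the source
) -> Tuple[pd.DataFrame, Dict[str, str]]:
    """Hypothetical stand-in, not the devices_rap implementation."""
    # Assumption: datetime columns are the labels that are Timestamps.
    datetime_labels = [
        label for label in worksheet_data.columns if isinstance(label, pd.Timestamp)
    ]
    formatted = {label: label.strftime(datetime_format) for label in datetime_labels}
    data = worksheet_data.rename(columns=formatted)

    expanded: Dict[str, str] = {}
    for column, new_name in worksheet_columns.items():
        if column == PLACEHOLDER:
            # Swap the placeholder for the actual, formatted datetime columns.
            expanded.update({name: name for name in formatted.values()})
        else:
            expanded[column] = new_name
    return data, expanded


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {
        "device": ["A"],
        pd.Timestamp("2024-01-01"): [1],
        pd.Timestamp("2024-02-01"): [2],
    }
)
data, columns = expand_datetime_placeholder(
    sample, {"device": "Device", "datetime_columns": ""}
)
```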
order_columns(worksheet_data, worksheet_columns)
Order the columns in the worksheet data based on the provided column order. The function will reindex the columns in the worksheet data based on the order provided in the worksheet_columns dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
worksheet_data | DataFrame | The dataset whose columns are to be arranged in the order specified in the worksheet_columns dictionary | required |
worksheet_columns | dict | The columns to order the dataset by | required |
Returns:
Type | Description |
---|---|
DataFrame | The dataset with the columns ordered as specified in the worksheet_columns dictionary |
Raises:
Type | Description |
---|---|
ColumnsNotFoundError | If the columns specified in the worksheet_columns dictionary are not found in the dataset |
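A minimal sketch of the reindex-with-validation behaviour described above. The ColumnsNotFoundError class here is a local stand-in for the project's exception, and it is assumed that the keys of worksheet_columns give the desired column order; the function name order_columns_sketch is hypothetical.

```python
import pandas as pd


class ColumnsNotFoundError(Exception):
    """Local stand-in for the project's exception of the same name."""


def order_columns_sketch(
    worksheet_data: pd.DataFrame, worksheet_columns: dict
) -> pd.DataFrame:
    """Hypothetical stand-in, not the devices_rap implementation."""
    # Fail early instead of letting reindex silently create empty columns.
    missing = [c for c in worksheet_columns if c not in worksheet_data.columns]
    if missing:
        raise ColumnsNotFoundError(f"Columns not found in dataset: {missing}")
    # Assumption: the dictionary's key order is the desired column order.
    return worksheet_data.reindex(columns=list(worksheet_columns))


# Made-up sample data for illustration only.
sample = pd.DataFrame({"b": [1], "a": [2], "c": [3]})
ordered = order_columns_sketch(sample, {"a": "A", "b": "B"})
```

The explicit check matters because DataFrame.reindex does not raise on unknown labels; it adds them as columns filled with missing values.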
rename_columns(worksheet_data, worksheet_columns)
Rename the columns in the worksheet data based on the provided column mapping. Acts as a wrapper around the DataFrame.rename method but with error handling to raise a ColumnsNotFoundError if the columns specified in the worksheet_columns dictionary are not found in the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
worksheet_data | DataFrame | The dataset whose columns are to be renamed | required |
worksheet_columns | dict | The columns to rename, mapped to their new column names | required |
Returns:
Type | Description |
---|---|
DataFrame | The dataset with the columns renamed as specified in the worksheet_columns dictionary |
Raises:
Type | Description |
---|---|
ColumnsNotFoundError | If the columns specified in the worksheet_columns dictionary are not found in the dataset |
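One way to wrap DataFrame.rename with this error handling is sketched below. It relies on the errors="raise" option of DataFrame.rename, which raises a KeyError for unknown labels; whether devices_rap uses that option or checks the columns itself is not shown on this page. ColumnsNotFoundError is again a local stand-in and rename_columns_sketch is a hypothetical name.

```python
import pandas as pd


class ColumnsNotFoundError(Exception):
    """Local stand-in for the project's exception of the same name."""


def rename_columns_sketch(
    worksheet_data: pd.DataFrame, worksheet_columns: dict
) -> pd.DataFrame:
    """Hypothetical stand-in, not the devices_rap implementation."""
    try:
        # errors="raise" makes pandas raise KeyError for unknown column names.
        return worksheet_data.rename(columns=worksheet_columns, errors="raise")
    except KeyError as error:
        raise ColumnsNotFoundError(str(error)) from error


# Made-up sample data for illustration only.
sample = pd.DataFrame({"col1": ["A"], "col2": [1]})
renamed = rename_columns_sketch(sample, {"col1": "Name", "col2": "Count"})
```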
round_data(worksheet_data, decimals)
Wrapper around the DataFrame.round method to round the data in the worksheet to the specified number of decimal places.
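For reference, this is how the underlying DataFrame.round call behaves on made-up data. Whether round_data accepts only an integer or also the per-column dictionary form that DataFrame.round supports is not stated on this page, so the second call is an assumption about usage.

```python
import pandas as pd

# Made-up sample data for illustration only.
sample = pd.DataFrame({"device": ["A", "B"], "cost": [1.23456, 2.5]})

# An integer rounds every numeric column; non-numeric columns are left as-is.
rounded = sample.round(2)

# DataFrame.round also accepts a per-column mapping of decimal places.
rounded_per_column = sample.round({"cost": 1})
```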
process_worksheet(worksheet_config, datasets)
Process the worksheet data based on the provided configuration. The function will filter the data based on the provided filters, handle the datetime columns, order the columns, rename the columns, and round the data to two decimal places.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
worksheet_config | dict | The configuration for the worksheet including the type of dataset to use, the filters to apply, and the columns to process | required |
datasets | dict | The datasets to use for processing the worksheet. | required |
process_region(region, datasets, instructions)
Process the output instructions for the specified region. The function processes each worksheet in the instructions and returns a dictionary of the processed worksheets, keyed by worksheet name, with the processed data as the value. The result is ready for writing to an Excel file.
interpret_output_instructions(pipeline_config, region_cuts)
Interpret the output instructions for each region. The function processes the output instructions for each region and returns a dictionary keyed by region name, with that region's dictionary of processed worksheets as the value. The result is ready for writing to an Excel file.
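The nested return shape, region name to worksheet name to DataFrame, can be illustrated with the sketch below. The shapes assumed for the inputs (region_cuts as a mapping of region name to that region's datasets, and instructions as a mapping of worksheet name to a config with a "dataset" key) are guesses for illustration; the real pipeline_config structure is not shown on this page, and both function names are hypothetical.

```python
from typing import Dict

import pandas as pd


def process_region_sketch(
    datasets: Dict[str, pd.DataFrame], instructions: Dict[str, dict]
) -> Dict[str, pd.DataFrame]:
    """Hypothetical stand-in: one processed DataFrame per worksheet name."""
    return {
        worksheet_name: datasets[config["dataset"]].round(2)
        for worksheet_name, config in instructions.items()
    }


def interpret_sketch(
    instructions: Dict[str, dict],
    region_cuts: Dict[str, Dict[str, pd.DataFrame]],
) -> Dict[str, Dict[str, pd.DataFrame]]:
    """Hypothetical stand-in: one dictionary of worksheets per region."""
    return {
        region: process_region_sketch(datasets, instructions)
        for region, datasets in region_cuts.items()
    }


# Made-up sample inputs for illustration only.
instructions = {"Summary": {"dataset": "summary"}}
region_cuts = {
    "North": {"summary": pd.DataFrame({"cost": [1.234]})},
    "South": {"summary": pd.DataFrame({"cost": [5.678]})},
}
output = interpret_sketch(instructions, region_cuts)
```

Each inner dictionary maps naturally onto one Excel workbook per region, with one sheet per worksheet name.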