Result Analysers
Modules:

Name | Description
---|---
meta_result_analyser |
metric_correlation_analyser |
tabular_analyser |
Classes:

Name | Description
---|---
CorrelationResults | Class to hold correlation analysis results.
MetaResultAnalyser | An analyser for conducting a meta-evaluation of different evaluation methods.
MetricCorrelationAnalyser | An analyser calculating and visualizing correlations between different evaluation metrics.
TabularResultAnalyser | An analyser summarising evaluation results in a tabular format.
CorrelationResults
dataclass
Class to hold correlation analysis results.
Source code in evalsense/workflow/analysers/metric_correlation_analyser.py
MetaResultAnalyser
Bases: ResultAnalyser[T]
An analyser for conducting a meta-evaluation of different evaluation methods.
The analyser computes the Spearman rank correlation between the rankings specified by the meta tiers and the scores returned by the evaluation methods. The meta tiers can either be sourced from human annotations or be based on progressive perturbations for automatic meta-evaluation.
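For intuition, the correlation the analyser reports can be reproduced directly with scipy. The sketch below uses made-up tiers and scores purely for illustration; it is not the analyser's internal code:

```python
# Illustrative Spearman rank correlation between meta-evaluation tiers and
# metric scores (made-up data; not the analyser's internal implementation).
from scipy.stats import spearmanr

# Hypothetical tiers for five outputs (higher tier = better output when
# lower_tier_is_better=False) and the scores one evaluation method assigned.
meta_tiers = [0, 1, 1, 2, 3]
metric_scores = [0.21, 0.35, 0.30, 0.62, 0.88]

correlation, p_value = spearmanr(meta_tiers, metric_scores)
print(f"Spearman rho: {correlation:.3f} (p-value: {p_value:.3f})")
```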
Methods:

Name | Description
---|---
__call__ | Analyses the results from perturbation-based meta-evaluation experiments.
Source code in evalsense/workflow/analysers/meta_result_analyser.py
__call__

```python
__call__(
    project: Project,
    meta_tier_field: str = "perturbation_type_tier",
    lower_tier_is_better: bool = False,
    metric_labels: dict[str, str] | None = None,
    **kwargs: dict[str, Any],
) -> T
```
Analyses the results from perturbation-based meta-evaluation experiments.
Parameters:

Name | Type | Description | Default
---|---|---|---
project | Project | The project holding the meta-evaluation data to analyse. | required
meta_tier_field | str | The field name that indicates the meta-evaluation tier to specify the expected score ranking. | 'perturbation_type_tier'
lower_tier_is_better | bool | If True, lower perturbation tiers correspond to better outputs. If False, higher tiers are better. Defaults to False. | False
metric_labels | dict[str, str] \| None | A dictionary mapping metric names to their labels in the output table. If None, no aliasing is performed. Defaults to None. | None
**kwargs | dict[str, Any] | Additional keyword arguments. | {}
Returns:

Name | Type | Description
---|---|---
T | T | The analysed results in the specified output format.
Source code in evalsense/workflow/analysers/meta_result_analyser.py
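For orientation, a hedged usage sketch follows. The import paths, Project construction, and metric names are assumptions rather than part of this reference; only the call signature above is documented:

```python
# Hedged sketch: import paths, the Project constructor, and the metric names
# are assumptions for illustration.
from evalsense.workflow import Project  # assumed import path
from evalsense.workflow.analysers.meta_result_analyser import (
    MetaResultAnalyser,  # module path taken from the "Source code in" note above
)

# Assumes a project already populated by a perturbation-based meta-evaluation run.
project = Project(name="summarisation-meta-eval")  # hypothetical project name

analyser = MetaResultAnalyser()  # constructor options are not shown in this reference
results = analyser(
    project,
    meta_tier_field="perturbation_type_tier",
    lower_tier_is_better=False,
    metric_labels={"rouge_l": "ROUGE-L", "g_eval": "G-Eval"},  # hypothetical metric names
)
print(results)
```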
MetricCorrelationAnalyser
Bases: ResultAnalyser[T]
An analyser calculating and visualizing correlations between different evaluation metrics.
This class analyzes the correlation between scores returned for individual samples by pairs of different evaluation methods, and produces a correlation matrix plot.
Methods:

Name | Description
---|---
__call__ | Calculates Spearman or Pearson correlations between evaluation metrics.
__init__ | Initializes the metric correlation analyser.
Source code in evalsense/workflow/analysers/metric_correlation_analyser.py
__call__

```python
__call__(
    project: Project,
    corr_method: Literal["spearman", "pearson"] = "spearman",
    return_plot: bool = True,
    figsize: tuple[int, int] = (12, 10),
    metric_labels: dict[str, str] | None = None,
    method_filter_fun: Callable[[str], bool] = lambda _: True,
    **kwargs: dict,
) -> T
```
Calculates Spearman or Pearson correlations between evaluation metrics.
Parameters:

Name | Type | Description | Default
---|---|---|---
project | Project | The project holding the evaluation data to analyse. | required
corr_method | Literal['spearman', 'pearson'] | The correlation method to use. Can be "spearman" or "pearson". Defaults to "spearman". | 'spearman'
return_plot | bool | Whether to generate and return a visualization of the correlation matrix. Defaults to True. | True
figsize | tuple[int, int] | Figure size for the correlation matrix plot. Defaults to (12, 10). | (12, 10)
metric_labels | dict[str, str] \| None | A dictionary mapping metric names to their labels in the figure. If None, no aliasing is performed. Defaults to None. | None
method_filter_fun | Callable[[str], bool] | A function to filter the evaluation methods, taking the method name as input and returning True if the method should be included in the analysis. Operates on original method names before label translation. Defaults to a function that always returns True. | lambda _: True
**kwargs | dict | Additional arguments for the analysis. | {}
Returns:

Name | Type | Description
---|---|---
T | T | The correlation results containing the correlation matrix and optionally a visualization.
Source code in evalsense/workflow/analysers/metric_correlation_analyser.py
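A hedged usage sketch, analogous to the one above; the project name, metric label, and filter logic are illustrative assumptions:

```python
# Hedged sketch: import paths, the Project constructor, and the method names
# referenced in the labels and filter are assumptions for illustration.
from evalsense.workflow import Project  # assumed import path
from evalsense.workflow.analysers.metric_correlation_analyser import (
    MetricCorrelationAnalyser,
)

project = Project(name="summarisation-eval")  # hypothetical project name

analyser = MetricCorrelationAnalyser()  # defaults: name and output_format="polars"
results = analyser(
    project,
    corr_method="spearman",
    return_plot=True,
    figsize=(12, 10),
    metric_labels={"bertscore": "BERTScore"},  # hypothetical metric name
    # Keep only non-baseline methods (illustrative naming convention).
    method_filter_fun=lambda name: not name.startswith("baseline_"),
)
# results holds the correlation matrix and, with return_plot=True, a plot of it;
# the exact CorrelationResults attribute names are not listed in this reference.
```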
__init__

```python
__init__(
    name: str = "MetricCorrelationAnalyser",
    output_format: Literal["polars", "pandas", "numpy"] = "polars",
)
```
Initializes the metric correlation analyser.
Parameters:

Name | Type | Description | Default
---|---|---|---
name | str | The name of the metric correlation analyser. | 'MetricCorrelationAnalyser'
output_format | Literal['polars', 'pandas', 'numpy'] | The output format of the correlation matrix. Can be "polars", "pandas", or "numpy". Defaults to "polars". | 'polars'
Source code in evalsense/workflow/analysers/metric_correlation_analyser.py
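A hedged construction sketch using only the documented options; the custom name is a hypothetical label:

```python
from evalsense.workflow.analysers.metric_correlation_analyser import (
    MetricCorrelationAnalyser,
)

# Return the correlation matrix as a pandas DataFrame instead of the default
# polars one; the custom name below is a hypothetical label for this analyser.
analyser = MetricCorrelationAnalyser(
    name="PandasMetricCorrelationAnalyser",
    output_format="pandas",
)
```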
TabularResultAnalyser
Bases: ResultAnalyser[T]
An analyser summarising evaluation results in a tabular format.
This class is generic in T to provide better type hints when returning different output types. It is the responsibility of the client code to ensure that the specified output_format is compatible with the type T.
For example, a correct use of this class could look as follows:
```python
import polars as pl

analyser = TabularResultAnalyser[pl.DataFrame](
    output_format="polars",
)
```
Methods:

Name | Description
---|---
__call__ | Analyses the evaluation results.
__init__ | Initializes the tabular result analyser.
Source code in evalsense/workflow/analysers/tabular_analyser.py
__call__
Analyses the evaluation results.
Parameters:

Name | Type | Description | Default
---|---|---|---
project | Project | The project holding the evaluation data to analyse. | required
**kwargs | dict | Additional arguments for the analysis. | {}
Returns:

Name | Type | Description
---|---|---
T | T | The analysed results in the specified output format.
Source code in evalsense/workflow/analysers/tabular_analyser.py
__init__
```python
__init__(
    name: str = "TabularResultAnalyser",
    output_format: Literal["polars", "pandas"] = "polars",
)
```
Initializes the tabular result analyser.
Parameters:

Name | Type | Description | Default
---|---|---|---
name | str | The name of the tabular result analyser. | 'TabularResultAnalyser'
output_format | Literal['polars', 'pandas'] | The output format of the result. Can be "polars" or "pandas". Defaults to "polars". | 'polars'
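Putting the constructor and call together, a hedged end-to-end sketch (import paths and project name are assumptions):

```python
import polars as pl

from evalsense.workflow import Project  # assumed import path
from evalsense.workflow.analysers.tabular_analyser import TabularResultAnalyser

project = Project(name="summarisation-eval")  # hypothetical project name

# Subscripting with pl.DataFrame matches output_format="polars", as in the
# example above; keeping the two consistent is the caller's responsibility.
analyser = TabularResultAnalyser[pl.DataFrame](output_format="polars")
summary_table = analyser(project)
print(summary_table)
```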