Configurators

Modules:

evaluator_configurator
evaluators

Classes:

ConfiguratorInput: A typed dictionary for the input fields of the configurator widget.
EvaluatorConfigurator: A protocol for configuring evaluators.
EvaluatorConfiguratorRegistry: A registry for evaluator configurators.

Functions:

configurator: Decorator to register an evaluator configurator.

ConfiguratorInput

Bases: TypedDict

A typed dictionary for the input fields of the configurator widget.

Attributes:

input_name (str): The name of the input field.
component (Block): The Gradio component for the input field.
parser (Callable | None): A callable used to parse the input value, or None to use the raw value as-is.

Source code in evalsense/webui/configurators/evaluator_configurator.py
class ConfiguratorInput(TypedDict):
    """A typed dictionary for the input fields of the configurator widget.

    Attributes:
        input_name (str): The name of the input field.
        component (Block): The Gradio component for the input field.
        parser (Callable | None): A callable to parse the input value, or None to use the raw value.
    """

    input_name: str
    component: Block
    parser: Callable | None
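
For illustration, a minimal sketch of a single ConfiguratorInput entry, as input_widget might return it. The "threshold" field and its parser are hypothetical and not used by the built-in configurators; a parser of None leaves the widget value unchanged.

import gradio as gr

threshold_input: ConfiguratorInput = {
    "input_name": "threshold",  # hypothetical evaluator argument
    "component": gr.Number(label="Threshold", value=0.5),
    "parser": float,  # cast the raw widget value before it reaches the evaluator
}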

EvaluatorConfigurator

Bases: Protocol

A protocol for configuring evaluators.

Attributes:

name (str): The string ID of the evaluator. A class attribute.

Methods:

create: Create a configurator for the specified evaluator.
input_widget: Constructs the configurator widget.
instantiate_evaluator: Instantiates the evaluator according to the specified configuration.

Source code in evalsense/webui/configurators/evaluator_configurator.py
class EvaluatorConfigurator(Protocol):
    """A protocol for configuring evaluators.

    Attributes:
        name (str): The string ID of the evaluator. A class attribute.
    """

    name: str

    @classmethod
    def create(cls, name: str) -> "EvaluatorConfigurator":
        """Create a configurator for the specified evaluator.

        Args:
            name (str): The name of the evaluator for which the configurator
                should be created.

        Returns:
            EvaluatorConfigurator: The created evaluator configurator instance.
        """
        configurator = EvaluatorConfiguratorRegistry.get(name)
        return configurator()

    @abstractmethod
    def input_widget(self) -> list[ConfiguratorInput]:
        """Constructs the configurator widget.

        Returns:
            list[ConfiguratorInput]: The input fields for the configurator widget.
        """
        ...

    @abstractmethod
    def instantiate_evaluator(self, **kwargs) -> Evaluator:
        """
        Instantiates the evaluator according to the specified configuration.

        Args:
            **kwargs (dict): The keyword arguments specifying evaluator configuration.

        Returns:
            Evaluator: The instantiated evaluator.
        """
        ...
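
To show how the pieces fit together, here is a minimal sketch of a custom configurator satisfying this protocol. ExactMatchConfigurator and get_exact_match_evaluator are hypothetical names used purely for illustration, not part of EvalSense.

import gradio as gr

@configurator
class ExactMatchConfigurator(EvaluatorConfigurator):
    """Hypothetical configurator for an exact-match evaluator."""

    name = "Exact Match"

    @override
    def input_widget(self) -> list[ConfiguratorInput]:
        return [
            {
                "input_name": "name",
                "component": gr.Textbox(label="Metric Name", value="Exact Match"),
                "parser": None,
            },
        ]

    @override
    def instantiate_evaluator(self, **kwargs) -> Evaluator:
        # Stand-in for a real evaluator factory function.
        return get_exact_match_evaluator(**kwargs)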

create classmethod

create(name: str) -> EvaluatorConfigurator

Create a configurator for the specified evaluator.

Parameters:

name (str): The name of the evaluator for which the configurator should be created. Required.

Returns:

EvaluatorConfigurator: The created evaluator configurator instance.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@classmethod
def create(cls, name: str) -> "EvaluatorConfigurator":
    """Create a configurator for the specified evaluator.

    Args:
        name (str): The name of the evaluator for which the configurator
            should be created.

    Returns:
        EvaluatorConfigurator: The created evaluator configurator instance.
    """
    configurator = EvaluatorConfiguratorRegistry.get(name)
    return configurator()
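
For example, to obtain a configurator for the BLEU evaluator documented under Evaluators below:

configurator = EvaluatorConfigurator.create("BLEU")
inputs = configurator.input_widget()  # ConfiguratorInput entries for the UI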

input_widget abstractmethod

input_widget() -> list[ConfiguratorInput]

Constructs the configurator widget.

Returns:

list[ConfiguratorInput]: The input fields for the configurator widget.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@abstractmethod
def input_widget(self) -> list[ConfiguratorInput]:
    """Constructs the configurator widget.

    Returns:
        list[ConfiguratorInput]: The input fields for the configurator widget.
    """
    ...

instantiate_evaluator abstractmethod

instantiate_evaluator(**kwargs) -> Evaluator

Instantiates the evaluator according to the specified configuration.

Parameters:

**kwargs (dict): The keyword arguments specifying evaluator configuration. Default: {}.

Returns:

Evaluator: The instantiated evaluator.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@abstractmethod
def instantiate_evaluator(self, **kwargs) -> Evaluator:
    """
    Instantiates the evaluator according to the specified configuration.

    Args:
        **kwargs (dict): The keyword arguments specifying evaluator configuration.

    Returns:
        Evaluator: The instantiated evaluator.
    """
    ...

EvaluatorConfiguratorRegistry

A registry for evaluator configurators.

Methods:

get: Get an evaluator configurator by name.
register: Register a new evaluator configurator.

Source code in evalsense/webui/configurators/evaluator_configurator.py
class EvaluatorConfiguratorRegistry:
    """A registry for evaluator configurators."""

    registry: dict[str, Type["EvaluatorConfigurator"]] = {}

    @classmethod
    def register(cls, configurator: Type["EvaluatorConfigurator"]):
        """Register a new evaluator configurator.

        Args:
            configurator (Type["EvaluatorConfigurator"]): The evaluator configurator to register.
        """
        cls.registry[configurator.name] = configurator

    @classmethod
    def get(cls, name: str) -> Type["EvaluatorConfigurator"]:
        """Get an evaluator configurator by name.

        Args:
            name (str): The name of the evaluator configurator to retrieve.

        Returns:
            Type["EvaluatorConfigurator"]: The requested evaluator configurator.
        """
        if name not in cls.registry:
            raise ValueError(f"No configurator for {name} has been registered.")
        return cls.registry[name]
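
A short usage sketch, assuming a hypothetical MyMetricConfigurator class whose name attribute is "My Metric" (in practice, registration is handled by the configurator decorator below):

EvaluatorConfiguratorRegistry.register(MyMetricConfigurator)

configurator_cls = EvaluatorConfiguratorRegistry.get("My Metric")
instance = configurator_cls()

EvaluatorConfiguratorRegistry.get("Unknown Metric")  # raises ValueError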

get classmethod

get(name: str) -> Type[EvaluatorConfigurator]

Get an evaluator configurator by name.

Parameters:

name (str): The name of the evaluator configurator to retrieve. Required.

Returns:

Type[EvaluatorConfigurator]: The requested evaluator configurator.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@classmethod
def get(cls, name: str) -> Type["EvaluatorConfigurator"]:
    """Get an evaluator configurator by name.

    Args:
        name (str): The name of the evaluator configurator to retrieve.

    Returns:
        Type["EvaluatorConfigurator"]: The requested evaluator configurator.
    """
    if name not in cls.registry:
        raise ValueError(f"No configurator for {name} has been registered.")
    return cls.registry[name]

register classmethod

register(configurator: Type[EvaluatorConfigurator])

Register a new evaluator configurator.

Parameters:

configurator (Type[EvaluatorConfigurator]): The evaluator configurator to register. Required.
Source code in evalsense/webui/configurators/evaluator_configurator.py
@classmethod
def register(cls, configurator: Type["EvaluatorConfigurator"]):
    """Register a new evaluator configurator.

    Args:
        configurator (Type["EvaluatorConfigurator"]): The evaluator configurator to register.
    """
    cls.registry[configurator.name] = configurator

configurator

configurator(
    configurator: Type[EvaluatorConfigurator],
) -> Type[EvaluatorConfigurator]

Decorator to register an evaluator configurator.

Parameters:

configurator (Type[EvaluatorConfigurator]): The evaluator configurator to register. Required.

Returns:

Type[EvaluatorConfigurator]: The registered evaluator configurator.

Source code in evalsense/webui/configurators/evaluator_configurator.py
def configurator(
    configurator: Type["EvaluatorConfigurator"],
) -> Type["EvaluatorConfigurator"]:
    """Decorator to register an evaluator configurator.

    Args:
        configurator (Type["EvaluatorConfigurator"]): The evaluator configurator to register.

    Returns:
        Type["EvaluatorConfigurator"]: The registered evaluator configurator.
    """
    EvaluatorConfiguratorRegistry.register(configurator)
    return configurator
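
Usage sketch, again with a hypothetical configurator class. The decorator registers the class in EvaluatorConfiguratorRegistry and returns it unchanged:

@configurator
class MyMetricConfigurator(EvaluatorConfigurator):
    name = "My Metric"
    ...

assert EvaluatorConfiguratorRegistry.get("My Metric") is MyMetricConfigurator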

Evaluators

Module evalsense.webui.configurators.evaluators.

Modules:

bertscore
bleu
g_eval
rouge

Classes:

BertScoreConfigurator: Configurator for the BERTScore evaluator.
BleuConfigurator: Configurator for the BLEU evaluator.
GEvalConfigurator: Configurator for the G-Eval evaluator.
RougeConfigurator: Configurator for the ROUGE evaluator.

BertScoreConfigurator

Bases: EvaluatorConfigurator

Configurator for the BERTScore evaluator.

Methods:

create: Create a configurator for the specified evaluator.
input_widget: Constructs the input widget for BERTScore.
instantiate_evaluator: Instantiates the BERTScore evaluator according to the specified configuration.

Source code in evalsense/webui/configurators/evaluators/bertscore.py
@configurator
class BertScoreConfigurator(EvaluatorConfigurator):
    """Configurator for the BERTScore evaluator."""

    name = "BERTScore"

    @override
    def input_widget(self) -> list[ConfiguratorInput]:
        """Constructs the input widget for BERTScore.

        Returns:
            list[ConfiguratorInput]: The input fields for the configurator widget.
        """
        return [
            {
                "input_name": "name",
                "component": gr.Textbox(
                    label="Metric Name",
                    info="The name of the metric to show in the results.",
                    value="BERTScore",
                ),
                "parser": None,
            },
            {
                "input_name": "model_type",
                "component": gr.Textbox(
                    label="Model Type",
                    info="The type of BERT model to use. See [this Google sheet](https://docs.google.com/spreadsheets/d/1RKOVpselB98Nnh_EOC4A2BYn8_201tmPODpNWu4w7xI/view) for available models.",
                    value="microsoft/deberta-xlarge-mnli",
                ),
                "parser": None,
            },
            {
                "input_name": "lang",
                "component": gr.Textbox(
                    label="Language",
                    info="The language of the text to evaluate.",
                    value="en",
                ),
                "parser": None,
            },
            {
                "input_name": "num_layers",
                "component": gr.Textbox(
                    label="Layer Number",
                    info="The layer of representations to use. When empty, defaults to the best layer according to the WMT16 correlation data.",
                    value="",
                ),
                "parser": empty_is_none_parser_for(int),
            },
            {
                "input_name": "device",
                "component": gr.Textbox(
                    label="Device",
                    info="The device to use for computing the contextual embeddings. If this argument is left empty, the model will be loaded on `cuda:0` if available.",
                    value="",
                ),
                "parser": empty_is_none_parser_for(str),
            },
        ]

    @override
    def instantiate_evaluator(self, **kwargs) -> Evaluator:
        """
        Instantiates the BERTScore evaluator according to the specified configuration.

        Args:
            **kwargs (dict): The keyword arguments specifying evaluator configuration.

        Returns:
            Evaluator: The instantiated evaluator.
        """
        return get_bertscore_evaluator(**kwargs)
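
Putting the pieces together, a sketch of how this configurator might be driven end to end. The raw_values mapping stands in for whatever the web UI collects from the Gradio components; the exact wiring is an assumption made for illustration.

configurator = EvaluatorConfigurator.create("BERTScore")
inputs = configurator.input_widget()

# Hypothetical raw widget values, keyed by input name.
raw_values = {
    "name": "BERTScore",
    "model_type": "microsoft/deberta-xlarge-mnli",
    "lang": "en",
    "num_layers": "",  # empty string is parsed to None
    "device": "",
}

# Apply each field's parser (if any) before instantiating the evaluator.
config = {
    field["input_name"]: (
        field["parser"](raw_values[field["input_name"]])
        if field["parser"] is not None
        else raw_values[field["input_name"]]
    )
    for field in inputs
}
evaluator = configurator.instantiate_evaluator(**config)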

create classmethod

create(name: str) -> EvaluatorConfigurator

Create a configurator for the specified evaluator.

Parameters:

name (str): The name of the evaluator for which the configurator should be created. Required.

Returns:

EvaluatorConfigurator: The created evaluator configurator instance.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@classmethod
def create(cls, name: str) -> "EvaluatorConfigurator":
    """Create a configurator for the specified evaluator.

    Args:
        name (str): The name of the evaluator for which the configurator
            should be created.

    Returns:
        EvaluatorConfigurator: The created evaluator configurator instance.
    """
    configurator = EvaluatorConfiguratorRegistry.get(name)
    return configurator()

input_widget

input_widget() -> list[ConfiguratorInput]

Constructs the input widget for BERTScore.

Returns:

list[ConfiguratorInput]: The input fields for the configurator widget.

Source code in evalsense/webui/configurators/evaluators/bertscore.py
@override
def input_widget(self) -> list[ConfiguratorInput]:
    """Constructs the input widget for BERTScore.

    Returns:
        list[ConfiguratorInput]: The input fields for the configurator widget.
    """
    return [
        {
            "input_name": "name",
            "component": gr.Textbox(
                label="Metric Name",
                info="The name of the metric to show in the results.",
                value="BERTScore",
            ),
            "parser": None,
        },
        {
            "input_name": "model_type",
            "component": gr.Textbox(
                label="Model Type",
                info="The type of BERT model to use. See [this Google sheet](https://docs.google.com/spreadsheets/d/1RKOVpselB98Nnh_EOC4A2BYn8_201tmPODpNWu4w7xI/view) for available models.",
                value="microsoft/deberta-xlarge-mnli",
            ),
            "parser": None,
        },
        {
            "input_name": "lang",
            "component": gr.Textbox(
                label="Language",
                info="The language of the text to evaluate.",
                value="en",
            ),
            "parser": None,
        },
        {
            "input_name": "num_layers",
            "component": gr.Textbox(
                label="Layer Number",
                info="The layer of representations to use. When empty, defaults to the best layer according to the WMT16 correlation data.",
                value="",
            ),
            "parser": empty_is_none_parser_for(int),
        },
        {
            "input_name": "device",
            "component": gr.Textbox(
                label="Device",
                info="The device to use for computing the contextual embeddings. If this argument is left empty, the model will be loaded on `cuda:0` if available.",
                value="",
            ),
            "parser": empty_is_none_parser_for(str),
        },
    ]

instantiate_evaluator

instantiate_evaluator(**kwargs) -> Evaluator

Instantiates the BERTScore evaluator according to the specified configuration.

Parameters:

**kwargs (dict): The keyword arguments specifying evaluator configuration. Default: {}.

Returns:

Evaluator: The instantiated evaluator.

Source code in evalsense/webui/configurators/evaluators/bertscore.py
@override
def instantiate_evaluator(self, **kwargs) -> Evaluator:
    """
    Instantiates the BERTScore evaluator according to the specified configuration.

    Args:
        **kwargs (dict): The keyword arguments specifying evaluator configuration.

    Returns:
        Evaluator: The instantiated evaluator.
    """
    return get_bertscore_evaluator(**kwargs)

BleuConfigurator

Bases: EvaluatorConfigurator

Configurator for the BLEU evaluator.

Methods:

create: Create a configurator for the specified evaluator.
input_widget: Constructs the input widget for BLEU.
instantiate_evaluator: Instantiates the BLEU evaluator according to the specified configuration.

Source code in evalsense/webui/configurators/evaluators/bleu.py
@configurator
class BleuConfigurator(EvaluatorConfigurator):
    """Configurator for the BLEU evaluator."""

    name = "BLEU"

    @override
    def input_widget(self) -> list[ConfiguratorInput]:
        """Constructs the input widget for BLEU.

        Returns:
            list[ConfiguratorInput]: The input fields for the configurator widget.
        """
        return [
            {
                "input_name": "name",
                "component": gr.Textbox(
                    label="Metric Name",
                    info="The name of the metric to show in the results.",
                    value="BLEU",
                ),
                "parser": None,
            },
            {
                "input_name": "scorer_name",
                "component": gr.Textbox(
                    label="Scorer Name",
                    info="The name of the internal scorer to show in the results.",
                    value="BLEU Precision",
                ),
                "parser": None,
            },
        ]

    @override
    def instantiate_evaluator(self, **kwargs) -> Evaluator:
        """
        Instantiates the BLEU evaluator according to the specified configuration.

        Args:
            **kwargs (dict): The keyword arguments specifying evaluator configuration.

        Returns:
            Evaluator: The instantiated evaluator.
        """
        return get_bleu_evaluator(**kwargs)
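
With the default field values above, the resulting call reduces to (a minimal sketch):

bleu_evaluator = BleuConfigurator().instantiate_evaluator(
    name="BLEU",
    scorer_name="BLEU Precision",
)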

create classmethod

create(name: str) -> EvaluatorConfigurator

Create a configurator for the specified evaluator.

Parameters:

name (str): The name of the evaluator for which the configurator should be created. Required.

Returns:

EvaluatorConfigurator: The created evaluator configurator instance.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@classmethod
def create(cls, name: str) -> "EvaluatorConfigurator":
    """Create a configurator for the specified evaluator.

    Args:
        name (str): The name of the evaluator for which the configurator
            should be created.

    Returns:
        EvaluatorConfigurator: The created evaluator configurator instance.
    """
    configurator = EvaluatorConfiguratorRegistry.get(name)
    return configurator()

input_widget

input_widget() -> list[ConfiguratorInput]

Constructs the input widget for BLEU.

Returns:

list[ConfiguratorInput]: The input fields for the configurator widget.

Source code in evalsense/webui/configurators/evaluators/bleu.py
@override
def input_widget(self) -> list[ConfiguratorInput]:
    """Constructs the input widget for BLEU.

    Returns:
        list[ConfiguratorInput]: The input fields for the configurator widget.
    """
    return [
        {
            "input_name": "name",
            "component": gr.Textbox(
                label="Metric Name",
                info="The name of the metric to show in the results.",
                value="BLEU",
            ),
            "parser": None,
        },
        {
            "input_name": "scorer_name",
            "component": gr.Textbox(
                label="Scorer Name",
                info="The name of the internal scorer to show in the results.",
                value="BLEU Precision",
            ),
            "parser": None,
        },
    ]

instantiate_evaluator

instantiate_evaluator(**kwargs) -> Evaluator

Instantiates the BLEU evaluator according to the specified configuration.

Parameters:

**kwargs (dict): The keyword arguments specifying evaluator configuration. Default: {}.

Returns:

Evaluator: The instantiated evaluator.

Source code in evalsense/webui/configurators/evaluators/bleu.py
@override
def instantiate_evaluator(self, **kwargs) -> Evaluator:
    """
    Instantiates the BLEU evaluator according to the specified configuration.

    Args:
        **kwargs (dict): The keyword arguments specifying evaluator configuration.

    Returns:
        Evaluator: The instantiated evaluator.
    """
    return get_bleu_evaluator(**kwargs)

GEvalConfigurator

Bases: EvaluatorConfigurator

Configurator for the G-Eval evaluator.

Methods:

create: Create a configurator for the specified evaluator.
input_widget: Constructs the input widget for G-Eval.
instantiate_evaluator: Instantiates the G-Eval evaluator according to the specified configuration.

Source code in evalsense/webui/configurators/evaluators/g_eval.py
@configurator
class GEvalConfigurator(EvaluatorConfigurator):
    """Configurator for the G-Eval evaluator."""

    name = "G-Eval"

    @override
    def input_widget(self) -> list[ConfiguratorInput]:
        """Constructs the input widget for G-Eval.

        Returns:
            list[ConfiguratorInput]: The input fields for the configurator widget.
        """
        return [
            {
                "input_name": "name",
                "component": gr.Textbox(
                    label="Metric Name",
                    info="The name of the metric to show in the results.",
                    value="G-Eval",
                ),
                "parser": None,
            },
            {
                "input_name": "quality_name",
                "component": gr.Textbox(
                    label="Quality Name",
                    info="The name of the quality to be evaluated by G-Eval.",
                    value="Unknown",
                ),
                "parser": None,
            },
            {
                "input_name": "prompt_template",
                "component": gr.TextArea(
                    label="Prompt Template",
                    info="The prompt template to use for evaluation. The supplied template should be a Python f-string with `{prediction}` and (optionally) `{reference}` as placeholders, as well as any additional placeholders for entries in Inspect AI sample/task state metadata. The template should instruct the judge model to respond with a numerical score between the specified `min_score` and `max_score`.",
                    max_lines=15,
                ),
                "parser": None,
            },
            {
                "input_name": "model_name",
                "component": gr.Textbox(
                    label="Model Name",
                    info="The name of the model to use as a judge following the [Inspect AI naming conventions](https://inspect.aisi.org.uk/models.html).",
                ),
                "parser": None,
            },
            {
                "input_name": "model_args",
                "component": gr.Textbox(
                    label="Model Arguments",
                    info="The arguments to pass to the model during evaluation, formatted as a Python dictionary. These will be passed to the [`get_model`](https://inspect.aisi.org.uk/reference/inspect_ai.model.html#get_model) function when creating the model.",
                ),
                "parser": dict_parser,
            },
            {
                "input_name": "generation_args",
                "component": gr.Textbox(
                    label="Generation Arguments",
                    info="The arguments to pass to the model during generation, formatted as a Python dictionary. See [`GenerateConfigArgs`](https://inspect.aisi.org.uk/reference/inspect_ai.model.html#generateconfigargs) Inspect AI documentation for valid values.",
                ),
                "parser": dict_parser,
            },
            {
                "input_name": "min_score",
                "component": gr.Number(
                    label="Min Score",
                    info="The minimum score on the G-Eval rating scale.",
                    value=1,
                ),
                "parser": int,
            },
            {
                "input_name": "max_score",
                "component": gr.Number(
                    label="Max Score",
                    info="The maximum score on the G-Eval rating scale.",
                    value=5,
                ),
                "parser": int,
            },
            {
                "input_name": "logprobs",
                "component": gr.Checkbox(
                    label="Log Probs",
                    info="Whether to use log probabilities of the generated tokens to compute a weighted evaluation score.",
                    value=True,
                ),
                "parser": None,
            },
            {
                "input_name": "top_logprobs",
                "component": gr.Number(
                    label="Top Log Probs",
                    info="The number of top log probabilities to consider for each generated token.",
                    value=20,
                ),
                "parser": int,
            },
            {
                "input_name": "normalise",
                "component": gr.Checkbox(
                    label="Normalise",
                    info="Whether to normalise the evaluation scores to be between 0 and 1.",
                    value=True,
                ),
                "parser": None,
            },
        ]

    @override
    def instantiate_evaluator(self, **kwargs) -> Evaluator:
        """
        Instantiates the G-Eval evaluator according to the specified configuration.

        Args:
            **kwargs (dict): The keyword arguments specifying evaluator configuration.

        Returns:
            Evaluator: The instantiated evaluator.
        """
        model_name = kwargs.pop("model_name")
        model_args = kwargs.pop("model_args", {})
        generation_args = kwargs.pop("generation_args", {})
        model_config = ModelConfig(
            model=model_name,
            model_args=model_args,
            generation_args=GenerateConfigArgs(**generation_args),
        )

        return get_g_eval_evaluator(**kwargs, model_config=model_config)
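
As a sketch of a full G-Eval configuration, with parsed widget values passed as keyword arguments. The judge model name and prompt are placeholders; any model supported by Inspect AI can be used.

g_eval_evaluator = GEvalConfigurator().instantiate_evaluator(
    name="G-Eval",
    quality_name="Coherence",
    prompt_template=(
        "Rate the coherence of the following summary on a scale "
        "from 1 to 5. Respond only with the number.\n\n"
        "Summary: {prediction}"
    ),
    model_name="openai/gpt-4o-mini",  # placeholder judge model
    model_args={},
    generation_args={"temperature": 0.0},
    min_score=1,
    max_score=5,
    logprobs=True,
    top_logprobs=20,
    normalise=True,
)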

create classmethod

create(name: str) -> EvaluatorConfigurator

Create a configurator for the specified evaluator.

Parameters:

name (str): The name of the evaluator for which the configurator should be created. Required.

Returns:

EvaluatorConfigurator: The created evaluator configurator instance.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@classmethod
def create(cls, name: str) -> "EvaluatorConfigurator":
    """Create a configurator for the specified evaluator.

    Args:
        name (str): The name of the evaluator for which the configurator
            should be created.

    Returns:
        EvaluatorConfigurator: The created evaluator configurator instance.
    """
    configurator = EvaluatorConfiguratorRegistry.get(name)
    return configurator()

input_widget

input_widget() -> list[ConfiguratorInput]

Constructs the input widget for G-Eval.

Returns:

list[ConfiguratorInput]: The input fields for the configurator widget.

Source code in evalsense/webui/configurators/evaluators/g_eval.py
@override
def input_widget(self) -> list[ConfiguratorInput]:
    """Constructs the input widget for G-Eval.

    Returns:
        list[ConfiguratorInput]: The input fields for the configurator widget.
    """
    return [
        {
            "input_name": "name",
            "component": gr.Textbox(
                label="Metric Name",
                info="The name of the metric to show in the results.",
                value="G-Eval",
            ),
            "parser": None,
        },
        {
            "input_name": "quality_name",
            "component": gr.Textbox(
                label="Quality Name",
                info="The name of the quality to be evaluated by G-Eval.",
                value="Unknown",
            ),
            "parser": None,
        },
        {
            "input_name": "prompt_template",
            "component": gr.TextArea(
                label="Prompt Template",
                info="The prompt template to use for evaluation. The supplied template should be a Python f-string with `{prediction}` and (optionally) `{reference}` as placeholders, as well as any additional placeholders for entries in Inspect AI sample/task state metadata. The template should instruct the judge model to respond with a numerical score between the specified `min_score` and `max_score`.",
                max_lines=15,
            ),
            "parser": None,
        },
        {
            "input_name": "model_name",
            "component": gr.Textbox(
                label="Model Name",
                info="The name of the model to use as a judge following the [Inspect AI naming conventions](https://inspect.aisi.org.uk/models.html).",
            ),
            "parser": None,
        },
        {
            "input_name": "model_args",
            "component": gr.Textbox(
                label="Model Arguments",
                info="The arguments to pass to the model during evaluation, formatted as a Python dictionary. These will be passed to the [`get_model`](https://inspect.aisi.org.uk/reference/inspect_ai.model.html#get_model) function when creating the model.",
            ),
            "parser": dict_parser,
        },
        {
            "input_name": "generation_args",
            "component": gr.Textbox(
                label="Generation Arguments",
                info="The arguments to pass to the model during generation, formatted as a Python dictionary. See [`GenerateConfigArgs`](https://inspect.aisi.org.uk/reference/inspect_ai.model.html#generateconfigargs) Inspect AI documentation for valid values.",
            ),
            "parser": dict_parser,
        },
        {
            "input_name": "min_score",
            "component": gr.Number(
                label="Min Score",
                info="The minimum score on the G-Eval rating scale.",
                value=1,
            ),
            "parser": int,
        },
        {
            "input_name": "max_score",
            "component": gr.Number(
                label="Max Score",
                info="The maximum score on the G-Eval rating scale.",
                value=5,
            ),
            "parser": int,
        },
        {
            "input_name": "logprobs",
            "component": gr.Checkbox(
                label="Log Probs",
                info="Whether to use log probabilities of the generated tokens to compute a weighted evaluation score.",
                value=True,
            ),
            "parser": None,
        },
        {
            "input_name": "top_logprobs",
            "component": gr.Number(
                label="Top Log Probs",
                info="The number of top log probabilities to consider for each generated token.",
                value=20,
            ),
            "parser": int,
        },
        {
            "input_name": "normalise",
            "component": gr.Checkbox(
                label="Normalise",
                info="Whether to normalise the evaluation scores to be between 0 and 1.",
                value=True,
            ),
            "parser": None,
        },
    ]

instantiate_evaluator

instantiate_evaluator(**kwargs) -> Evaluator

Instantiates the G-Eval evaluator according to the specified configuration.

Parameters:

**kwargs (dict): The keyword arguments specifying evaluator configuration. Default: {}.

Returns:

Evaluator: The instantiated evaluator.

Source code in evalsense/webui/configurators/evaluators/g_eval.py
@override
def instantiate_evaluator(self, **kwargs) -> Evaluator:
    """
    Instantiates the G-Eval evaluator according to the specified configuration.

    Args:
        **kwargs (dict): The keyword arguments specifying evaluator configuration.

    Returns:
        Evaluator: The instantiated evaluator.
    """
    model_name = kwargs.pop("model_name")
    model_args = kwargs.pop("model_args", {})
    generation_args = kwargs.pop("generation_args", {})
    model_config = ModelConfig(
        model=model_name,
        model_args=model_args,
        generation_args=GenerateConfigArgs(**generation_args),
    )

    return get_g_eval_evaluator(**kwargs, model_config=model_config)

RougeConfigurator

Bases: EvaluatorConfigurator

Configurator for the ROUGE evaluator.

Methods:

create: Create a configurator for the specified evaluator.
input_widget: Constructs the input widget for ROUGE.
instantiate_evaluator: Instantiates the ROUGE evaluator according to the specified configuration.

Source code in evalsense/webui/configurators/evaluators/rouge.py
@configurator
class RougeConfigurator(EvaluatorConfigurator):
    """Configurator for the ROUGE evaluator."""

    name = "ROUGE"

    @override
    def input_widget(self) -> list[ConfiguratorInput]:
        """Constructs the input widget for ROUGE.

        Returns:
            list[ConfiguratorInput]: The input fields for the configurator widget.
        """
        return [
            {
                "input_name": "name",
                "component": gr.Textbox(
                    label="Metric Name",
                    info="The name of the metric to show in the results.",
                    value="ROUGE",
                ),
                "parser": None,
            },
        ]

    @override
    def instantiate_evaluator(self, **kwargs) -> Evaluator:
        """
        Instantiates the ROUGE evaluator according to the specified configuration.

        Args:
            **kwargs (dict): The keyword arguments specifying evaluator configuration.

        Returns:
            Evaluator: The instantiated evaluator.
        """
        return get_rouge_evaluator(**kwargs)

create classmethod

create(name: str) -> EvaluatorConfigurator

Create a configurator for the specified evaluator.

Parameters:

name (str): The name of the evaluator for which the configurator should be created. Required.

Returns:

EvaluatorConfigurator: The created evaluator configurator instance.

Source code in evalsense/webui/configurators/evaluator_configurator.py
@classmethod
def create(cls, name: str) -> "EvaluatorConfigurator":
    """Create a configurator for the specified evaluator.

    Args:
        name (str): The name of the evaluator for which the configurator
            should be created.

    Returns:
        EvaluatorConfigurator: The created evaluator configurator instance.
    """
    configurator = EvaluatorConfiguratorRegistry.get(name)
    return configurator()

input_widget

input_widget() -> list[ConfiguratorInput]

Constructs the input widget for ROUGE.

Returns:

list[ConfiguratorInput]: The input fields for the configurator widget.

Source code in evalsense/webui/configurators/evaluators/rouge.py
@override
def input_widget(self) -> list[ConfiguratorInput]:
    """Constructs the input widget for ROUGE.

    Returns:
        list[ConfiguratorInput]: The input fields for the configurator widget.
    """
    return [
        {
            "input_name": "name",
            "component": gr.Textbox(
                label="Metric Name",
                info="The name of the metric to show in the results.",
                value="ROUGE",
            ),
            "parser": None,
        },
    ]

instantiate_evaluator

instantiate_evaluator(**kwargs) -> Evaluator

Instantiates the ROUGE evaluator according to the specified configuration.

Parameters:

**kwargs (dict): The keyword arguments specifying evaluator configuration. Default: {}.

Returns:

Evaluator: The instantiated evaluator.

Source code in evalsense/webui/configurators/evaluators/rouge.py
@override
def instantiate_evaluator(self, **kwargs) -> Evaluator:
    """
    Instantiates the ROUGE evaluator according to the specified configuration.

    Args:
        **kwargs (dict): The keyword arguments specifying evaluator configuration.

    Returns:
        Evaluator: The instantiated evaluator.
    """
    return get_rouge_evaluator(**kwargs)