EvalSense

Comprehensive guidance and tooling for evaluating large language models (LLMs)

LLM Evaluation Library

Python library for systematic evaluation of large language models on open-ended generation tasks.

Interactive Guide

Interactive guide helping you select the right evaluation methods for your use case.

Evaluation Method Catalogue

Extensive catalogue of evaluation methods, including descriptions, supported tasks, and more.