EvalSense
Comprehensive guidance and tooling for evaluating large language models (LLMs)
LLM Evaluation Library
A Python library for the systematic evaluation of LLMs on open-ended generation tasks.
Interactive Guide
Interactive guide to help you select the right evaluation methods for your use case.
Evaluation Method Catalogue
Extensive catalogue of evaluation methods, including descriptions, supported tasks, and more.