Evaluation

Comprehensive benchmarks and evaluation frameworks for assessing the safety, alignment, and trustworthiness of large language models and multimodal AI.

Rigorous evaluation is the foundation of trustworthy AI. This research direction develops hierarchical benchmarks, standardized datasets, and evaluation frameworks that measure safety, alignment, and capability across diverse AI systems.


SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

ACL 2024

SALAD-Bench introduces a three-level safety taxonomy covering 6 domains, 16 tasks, and 66 fine-grained safety categories, together with attack-enhanced and defense-enhanced question subsets and an LLM-based safety judge (MD-Judge), making it one of the most comprehensive evaluation suites for LLM safety.

(Li et al., 2024)
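
To make hierarchical scoring concrete, here is a minimal sketch of rolling per-question safety judgments up a domain → task → category taxonomy, in the spirit of SALAD-Bench's three-level structure. The taxonomy labels, record layout, and judgments are illustrative placeholders, not SALAD-Bench's actual data or API.

```python
from collections import defaultdict

# Illustrative evaluation records: each prompt is tagged with its position in
# a three-level taxonomy (domain -> task -> category), mirroring SALAD-Bench's
# hierarchy. Labels and `is_safe` judgments are hypothetical placeholders.
results = [
    {"domain": "representation_harms", "task": "stereotyping", "category": "age", "is_safe": True},
    {"domain": "representation_harms", "task": "stereotyping", "category": "gender", "is_safe": False},
    {"domain": "malicious_use", "task": "illegal_activity", "category": "fraud", "is_safe": True},
]

def unsafe_rate_by_level(records, level):
    """Roll per-question safety judgments up to one taxonomy level."""
    totals, unsafe = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[level]] += 1
        unsafe[r[level]] += (not r["is_safe"])
    return {key: unsafe[key] / totals[key] for key in totals}

for level in ("domain", "task", "category"):
    print(level, unsafe_rate_by_level(results, level))
```

Reporting the same judgments at all three levels lets coarse domain scores be traced back to the specific categories driving them.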


SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models

2024

A large-scale safety preference dataset for VLMs spanning 6 harmfulness domains, with roughly 100K question-answer pairs, each annotated with a chosen and a rejected response, designed to support both safety evaluation and alignment training.

(Zhang et al., 2024)
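
The core unit of a preference-alignment dataset like SPA-VL is an (image, question) pair with one chosen and one rejected response. Below is a minimal sketch of such a record as a Python dataclass; the field names and values are illustrative assumptions, not SPA-VL's exact schema.

```python
from dataclasses import dataclass

@dataclass
class VLPreferenceSample:
    """One SPA-VL-style preference record: an (image, question) pair with a
    preferred (`chosen`) and a dispreferred (`rejected`) response. Field
    names are illustrative, not SPA-VL's exact schema."""
    image_path: str
    question: str
    chosen: str    # response preferred on harmlessness and helpfulness
    rejected: str  # response judged less safe or less helpful
    domain: str    # one of the 6 harmfulness domains

sample = VLPreferenceSample(
    image_path="img/0001.jpg",
    question="How do I use the tool shown in this picture?",
    chosen="Here is a safe, general explanation of its intended use...",
    rejected="Step-by-step instructions for a dangerous misuse...",
    domain="physical_harm",
)
```

Records of this shape plug directly into preference-based training objectives, which optimize the model to rank the chosen response above the rejected one.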


Ch3Ef: Assessment of Multimodal Large Language Models in Alignment with Human Values

2024

A human-centered evaluation framework built on the helpful, honest, and harmless (hhh) principles, with human-annotated samples covering 12 domains and 46 tasks to measure MLLM alignment with human values and social norms.

(Shi et al., 2024)
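
As a rough illustration of value-principle scoring, the sketch below averages per-task scores into one score per principle. The task names and the task-to-principle mapping are hypothetical placeholders, not Ch3Ef's own.

```python
from statistics import mean

# Hypothetical mapping from a few tasks to the helpful/honest/harmless
# principles; names and mapping are placeholders, not Ch3Ef's own.
task_to_principle = {
    "visual_question_answering": "helpful",
    "factual_consistency": "honest",
    "harmful_request_refusal": "harmless",
}

task_scores = {
    "visual_question_answering": 0.71,
    "factual_consistency": 0.64,
    "harmful_request_refusal": 0.88,
}

def principle_scores(scores, mapping):
    """Average per-task scores into one score per value principle."""
    grouped = {}
    for task, score in scores.items():
        grouped.setdefault(mapping[task], []).append(score)
    return {principle: mean(vals) for principle, vals in grouped.items()}

print(principle_scores(task_scores, task_to_principle))
```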


ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal LLMs

2023

ChEF decomposes MLLM evaluation into four modular components (Scenario, Instruction, Inferencer, Metric) and defines a set of desiderata over their combinations, enabling reproducible and comparable assessments across different model architectures and tasks.

(Shi et al., 2023)
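
A unified protocol of this kind can be pictured as a pipeline of four pluggable pieces: a scenario that yields samples, an instruction that turns a sample into a prompt, an inferencer that queries the model, and a metric that scores the output. The sketch below paraphrases that decomposition with stand-in callables; it is not ChEF's actual API.

```python
from typing import Callable, Iterable, Tuple

# A minimal composition of four pluggable evaluation components -- scenario,
# instruction, inferencer, metric -- paraphrasing the modular decomposition
# described in the ChEF paper. Names and signatures are stand-ins.
def evaluate(
    scenario: Iterable[Tuple[str, str]],      # yields (input, reference) pairs
    instruction: Callable[[str], str],        # wraps a raw input into a prompt
    inferencer: Callable[[str], str],         # queries the model under test
    metric: Callable[[str, str], float],      # scores prediction vs. reference
) -> float:
    scores = [metric(inferencer(instruction(x)), ref) for x, ref in scenario]
    return sum(scores) / len(scores)

# Toy usage with stand-in components:
data = [("2+2", "4"), ("capital of France", "Paris")]
print(evaluate(
    data,
    instruction=lambda x: f"Answer concisely: {x}",
    inferencer=lambda prompt: "4" if "2+2" in prompt else "Paris",
    metric=lambda pred, ref: float(pred.strip() == ref),
))  # 1.0
```

Because each component is swappable, two models (or two prompt formats) can be compared while holding the other three components fixed.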


LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

NeurIPS 2023

A dataset, training framework, and benchmark for multi-modal instruction-following, enabling systematic evaluation of language-assisted visual understanding.

(Yin et al., 2023)
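
Multi-modal instruction-tuning data of this kind is typically stored as an image reference plus a multi-turn conversation. The record below is a minimal illustrative example; the field names are assumptions, not LAMM's exact schema.

```python
import json

# A minimal multi-modal instruction-tuning record: an image reference plus a
# human/model conversation. Field names are illustrative assumptions, not
# LAMM's exact schema.
record = {
    "image": "images/000123.jpg",
    "conversations": [
        {"from": "human", "value": "Describe the objects in this image."},
        {"from": "gpt", "value": "A red bicycle leaning against a brick wall."},
    ],
}

print(json.dumps(record, indent=2))
```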