Evaluation
Comprehensive benchmarks and evaluation frameworks for assessing the safety, alignment, and trustworthiness of large language models and multimodal AI.
Rigorous evaluation is the foundation of trustworthy AI. This research direction develops hierarchical benchmarks, standardized datasets, and evaluation frameworks that measure safety, alignment, and capability across diverse AI systems.
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
ACL 2024
SALAD-Bench introduces a multi-level safety taxonomy covering 6 domains, 16 tasks, and 66 specific safety categories, providing the most comprehensive evaluation suite for LLM safety to date.
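A multi-level taxonomy like this implies per-node scoring at each depth. The following is a minimal, hypothetical sketch of aggregating safety judgments up a three-level hierarchy (domain → task → category); the function, field names, and example labels are illustrative assumptions, not SALAD-Bench's actual schema.

```python
from collections import defaultdict

def aggregate(results):
    """Aggregate per-response safety judgments at every taxonomy level.

    results: list of (domain, task, category, is_safe) tuples,
    where is_safe is a boolean verdict for one model response.
    """
    by_level = {
        "domain": defaultdict(list),
        "task": defaultdict(list),
        "category": defaultdict(list),
    }
    for domain, task, category, is_safe in results:
        by_level["domain"][domain].append(is_safe)
        by_level["task"][(domain, task)].append(is_safe)
        by_level["category"][(domain, task, category)].append(is_safe)
    # Safety rate = fraction of responses judged safe at each node.
    return {
        level: {key: sum(v) / len(v) for key, v in groups.items()}
        for level, groups in by_level.items()
    }

# Toy run with made-up domain/task/category names.
results = [
    ("Representation Harms", "stereotyping", "gender", True),
    ("Representation Harms", "stereotyping", "gender", False),
    ("Malicious Use", "cyberattack", "malware", True),
]
scores = aggregate(results)  # e.g. domain-level rate for the first domain is 0.5
```

Reporting a rate per node lets a hierarchical benchmark localize weaknesses (a model can look safe at the domain level while failing one specific category).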
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
2024
A large-scale preference-alignment dataset for VLMs spanning 6 harmfulness domains, with 100K question-answer pairs annotated with chosen/rejected responses, designed to advance both safety evaluation and alignment training.
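Preference data of this kind pairs each query with a preferred and a dispreferred answer. Below is a hypothetical sketch of what one such record might look like; the class and field names are assumptions for illustration, not SPA-VL's published schema.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One image-grounded preference record (illustrative fields)."""
    image_path: str   # visual context for the question
    question: str     # potentially harmful or benign query
    chosen: str       # safer / preferred response
    rejected: str     # less safe / dispreferred response
    domain: str       # one of the harmfulness domains

pair = PreferencePair(
    image_path="images/000001.jpg",
    question="How is this device assembled?",
    chosen="I can't help with assembling hazardous devices.",
    rejected="First, attach the casing to the trigger mechanism...",
    domain="Malicious Use",
)
```

Records in this shape feed directly into preference-optimization objectives (e.g. DPO-style training), which is what makes such a dataset usable for both evaluation and alignment training.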
CH3EF: Assessment of Multimodal Large Language Models in Alignment with Human Values
2024
A human-centered evaluation framework that assesses MLLMs on the helpful, honest, and harmless (hhh) principles, spanning 12 domains and 46 tasks to measure alignment with human values and social norms.
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal LLMs
2023
ChEF decouples evaluation into modular components (scenario, instruction, inferencer, and metric), providing a unified protocol that enables reproducible and comparable assessments across different model architectures and tasks.
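A decoupled protocol of this kind can be sketched as four swappable callables. This is a minimal illustration of the idea under assumed names; it is not ChEF's actual API.

```python
from typing import Callable, Iterable

def evaluate(
    scenario: Iterable[dict],              # samples with "input" and "label"
    instruction: Callable[[dict], str],    # turns a sample into a prompt
    inferencer: Callable[[str], str],      # queries the model for an answer
    metric: Callable[[str, str], float],   # scores prediction against label
) -> float:
    """Run the pipeline sample-by-sample and return the mean metric score."""
    scores = [metric(inferencer(instruction(s)), s["label"]) for s in scenario]
    return sum(scores) / len(scores)

# Toy usage with a stub model that always answers "4".
scenario = [{"input": "2+2?", "label": "4"}]
acc = evaluate(
    scenario,
    instruction=lambda s: f"Q: {s['input']} A:",
    inferencer=lambda prompt: "4",
    metric=lambda pred, gold: float(pred == gold),
)
```

Because each component varies independently, two models (or two prompting strategies) can be compared while holding the other three components fixed, which is what makes the assessment standardized.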
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NeurIPS 2023
A dataset, training framework, and benchmark for multimodal instruction following, enabling systematic evaluation of language-assisted visual understanding.