Frontier AI Risk Analysis
Systematic analysis of trustworthiness, safety risks, and behavioral dynamics in frontier AI systems, including large language models and multimodal models.
Understanding the risks posed by frontier AI requires systematic investigation of how models acquire and express potentially dangerous behaviors across training, deployment, and interaction. This research direction characterizes these risks at scale, tracing when safety-relevant behaviors emerge during training and benchmarking how frontier models behave across modalities.
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
ACL 2024
We trace how trustworthiness properties—including truthfulness, calibration, robustness, and fairness—evolve throughout the pre-training process, revealing when and how safety-relevant behaviors emerge.
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Technical Report 2024
A large-scale assessment of frontier multimodal models across the text, image, video, and audio modalities, benchmarking the generalizability, trustworthiness, and causal reasoning of GPT-4V, Gemini, and other state-of-the-art systems.