Frontier AI Risk Analysis
Systematic analysis of trustworthiness, safety risks, and behavioral dynamics in frontier AI systems, including large language models and multimodal models.
Understanding the risks posed by frontier AI requires systematic investigation of how models acquire and express potentially dangerous behaviors across training, deployment, and interaction. This research direction characterizes these risks at scale, tracing when safety-relevant behaviors emerge during training and benchmarking how frontier models behave across modalities.
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
ACL 2024
We trace how trustworthiness properties—including truthfulness, calibration, robustness, and fairness—evolve throughout the pre-training process, revealing when and how safety-relevant behaviors emerge.
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Technical Report 2024
A large-scale assessment of frontier multimodal models across the text, image, video, and audio modalities, benchmarking the generalizability, trustworthiness, and causal reasoning of GPT-4V, Gemini, and other state-of-the-art systems.