Publications
Please visit my Google Scholar profile to check out my up-to-date publication list.
# indicates equal contributions; * indicates corresponding authors.
2024
- SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language ModelArxiv, 2024
-
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code CompletionACL, 2024
- Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language ModelsACL, 2024
- SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language ModelsACL, 2024
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four ModalitiesTechnicle Report, 2024
- PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System SafetyACL, 2024
2023
- ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language ModelsArxiv, 2023
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and BenchmarkNeurIPS, 2023
2022
- 1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)CVPR, 2022
- ERGO: Event Relational Graph Transformer for Document-level Event Causality IdentificationArxiv, 2022
- Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and SupervisionArxiv, 2022
- X-Learner: Learning Cross Sources and Tasks for Universal Visual RepresentationECCV, 2022
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training ParadigmIn International Conference on Learning Representations, Mar 2022
- MMEKG: Multi-modal Event Knowledge Graph towards Universal Representation across ModalitiesIn Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, May 2022
- Prompt for Extraction? PAIE: Prompting Argument Interaction for Event Argument ExtractionIn Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022
- ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action RecognitionNeurIPS, May 2022
- Few-shot Forgery Detection via Guided Adversarial InterpolationArxiv, May 2022
- Task-Balanced Distillation for Object DetectionPR, May 2022
- RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-trainingIn Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Jul 2022
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative SamplesICIC, Jul 2022
- Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine SynergyArxiv, Jul 2022
- Benchmarking Omni-Vision Representation through the Lens of Visual RealmsECCV, Jul 2022
- Robust Face Anti-Spoofing with Dual Probabilistic ModelingArxiv, Jul 2022
2021
- Actor-Context-Actor Relation Network for Spatio-Temporal Action LocalizationIn 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021
- ForgeryNet - Face Forgery Analysis Challenge 2021: Methods and ResultsCoRR, Jun 2021
- ForgeryNet: A Versatile Benchmark for Comprehensive Forgery AnalysisIn 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021
- A Simple Long-Tailed Recognition Baseline via Vision-Language ModelCoRR, Jun 2021
- Few-Shot Domain Expansion for Face Anti-SpoofingCoRR, Jun 2021
- BlockQNN: Efficient Block-Wise Neural Network Architecture GenerationIEEE Transactions on Pattern Analysis and Machine Intelligence, Jul 2021Conference Name: IEEE Transactions on Pattern Analysis and Machine Intelligence
2020
- 1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020CoRR, Jul 2020
- High-Quality Video Generation from Static Structural AnnotationsInternational Journal of Computer Vision, Nov 2020
- Morphing and Sampling Network for Dense Point Cloud CompletionIn Proceedings of the AAAI Conference on Artificial Intelligence, Apr 2020Number: 07
- CelebA-Spoof: Large-Scale Face Anti-spoofing Dataset with Rich AnnotationsIn Computer Vision – ECCV 2020, Apr 2020
- Learning Connectivity of Neural Networks from a Topological PerspectiveIn Computer Vision – ECCV 2020, Apr 2020
- Powering One-Shot Topological NAS with Stabilized Share-Parameter ProxyIn Computer Vision – ECCV 2020, Apr 2020
- Thinking in Frequency: Face Forgery Detection by Mining Frequency-Aware CluesIn Computer Vision – ECCV 2020, Apr 2020
- PV-NAS: Practical Neural Architecture Search for Video RecognitionCoRR, Apr 2020
- PV-NAS: Practical Neural Architecture Search for Video RecognitionCoRR, Apr 2020
2019
- Unsupervised Bi-directional Flow-based Video Generation from one SnapshotCoRR, Apr 2019
- Improving Referring Expression Grounding With Cross-Modal Attention-Guided ErasingIn 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2019
- Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image SynthesisIn Advances in Neural Information Processing Systems, Jun 2019
- Video Generation From Single Semantic Label MapIn 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2019
- CAMP: Cross-Modal Adaptive Message Passing for Text-Image RetrievalIn 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2019
- Context and Attribute Grounded Dense CaptioningIn 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2019
- Semantics Disentangling for Text-To-Image GenerationIn 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2019
2018
- Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised DetectionIn Proceedings of the 26th ACM international conference on Multimedia, Jun 2018
- Localization Guided Learning for Pedestrian Attribute RecognitionIn British Machine Vision Conference 2018, BMVC 2018, Sep 2018
- Exploring Disentangled Feature Representation Beyond Face IdentificationIn 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2018
- Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language AssociationIn Computer Vision – ECCV 2018, Jun 2018
- Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled DataIn Computer Vision – ECCV 2018, Jun 2018
- Transductive Centroid Projection for Semi-supervised Large-Scale RecognitionIn Computer Vision – ECCV 2018, Jun 2018
- Zoom-Net: Mining Deep Feature Interactions for Visual Relationship RecognitionIn Computer Vision – ECCV 2018, Jun 2018
- Avatar-Net: Multi-scale Zero-Shot Style Transfer by Feature DecorationIn 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2018
- Practical Block-Wise Neural Network Architecture GenerationIn 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2018
2017
- Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identificationIn 2017 IEEE International Conference on Computer Vision (ICCV), Jun 2017
- HydraPlus-Net: Attentive Deep Features for Pedestrian AnalysisIn 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017
- Crowded Scene Understanding by Deeply Learned Volumetric SlicesIEEE Transactions on Circuits and Systems for Video Technology, Mar 2017
- Learning Scene-Independent Group Descriptors for Crowd UnderstandingIEEE Transactions on Circuits and Systems for Video Technology, Jun 2017
- Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and FusionIn 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017
2016
- Slicing Convolutional Neural Network for Crowd Video UnderstandingIn 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016ISSN: 1063-6919
2015
- Deeply learned attributes for crowded scene understandingIn 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2015
2014
- Scene-Independent Group Profiling in CrowdIn 2014 IEEE Conference on Computer Vision and Pattern Recognition, Jun 2014