Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage Zhi Gao*, Bofei Zhang*, Pengxiang Li*, Xiaojian Ma, Yue Fan, Tao Yuan, Yuwei Wu✉, Yunde Jia, Song-Chun Zhu, Qing Li✉ ICLR 2025
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge Yuntao Du*, Kailin Jiang*, Zhi Gao, Chenrui Shi, Zilong Zheng✉, Siyuan Qi, and Qing Li✉ ICLR 2025
Robust Data Clustering with Outliers via Transformed Tensor Low-Rank Representation Tong Wu International Conference on Artificial Intelligence and Statistics 2024
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Pengxiang Li*, Zhi Gao*, Bofei Zhang*, Tao Yuan, Yuwei Wu†, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li† NeurIPS Datasets and Benchmarks 2024
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma†, Yitao Liang† NeurIPS 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Haozhe Zhao*, Xiaojian Ma*, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li†, Baobao Chang† NeurIPS Datasets and Benchmarks 2024