FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Pengxiang Li*, Zhi Gao*, Bofei Zhang*, Tao Yuan, Yuwei Wu†, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li† NeurIPS Datasets and Benchmarks 2024
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma†, Yitao Liang† NeurIPS 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Haozhe Zhao*, Xiaojian Ma*, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li†, Baobao Chang† NeurIPS Datasets and Benchmarks 2024
An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding Tong Wu, Yanpeng Zhao, Zilong Zheng† NeurIPS 2024
PhyRecon: Physically Plausible Neural Scene Reconstruction Junfeng Ni*, Yixin Chen*, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang NeurIPS 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning Yuxuan Wang, Alan Yuille, Zhuowan Li✉, Zilong Zheng✉ COLM 2024