FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Pengxiang Li*, Zhi Gao*, Bofei Zhang*, Tao Yuan, Yuwei Wu†, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li† NeurIPS Datasets and Benchmarks 2024
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma†, Yitao Liang† NeurIPS 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Haozhe Zhao*, Xiaojian Ma*, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li†, Baobao Chang† NeurIPS Datasets and Benchmarks 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding Yue Fan*, Xiaojian Ma*†, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li ECCV 2024
End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li ICML 2024
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update Zhi Gao, Yuntao Du, Xintong Zhang, Xiaojian Ma, Wenjuan Han, Song-Chun Zhu, Qing Li CVPR 2024