Paper Index

NEP: Autoregressive Image Editing via Next Editing Token Prediction

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation

Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding