Course Schedule
Paper reading list and presenters
- Jan. 27, Tue
- Course Overview (Slides)
- Chen
- How to Read a CS Research Paper by Philip Fong
- How to do research by Bill Freeman
- How to write a good paper by Bill Freeman
- Novelty in Science by Michael Black
- How to speak (video) by Patrick Winston
- Jan. 29, Thu
- Deep Learning Recap (Slides)
- Chen
- Feb. 3, Tue
- The Unreasonable Effectiveness of Data (Slides)
- Chen
- Feb. 5, Thu
- Visual Concepts (Slides)
- Chen
- Feb. 5, Thu
- Due: Presentation signup sheet
- Feb. 10, Tue
- Overview of Multimodal LLMs (Slides)
- Zitian Tang
- Feb. 12, Thu
- Generative AI for Robot Learning (Slides)
- Zilai Zeng
- Feb. 19, Thu
- Flow Matching and Normalizing Flows (Slides)
- Dr. Calvin Luo
- Feb. 19, Thu
- Mini Project
- Due on March 12
- Mini Project Handout
- Submission Form
- Feb. 24, Tue
- Teaching Video Models to Understand Physics Control (Slides)
- Nate Gillman
- Feb. 26, Thu
- “Emergent” Abilities in Large Pre-trained Models (Slides)
- Andrew, Daniel, Taj, and Woody
- Emergent Abilities of Large Language Models
- Are Emergent Abilities of Large Language Models a Mirage?
- Feb. 26, Thu
- Final Project Proposal
- Due on March 10
- Submission Form
- Mar. 3, Tue
- Few-shot and In-context Learning (Slides)
- Athulith, Benjamin, Kenneth, Vanessa, and Zheyu
- Reading Survey
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Function Vectors in Large Language Models
- Mar. 5, Thu
- The World After Transformers (1) (Slides)
- Armaan, Asher, Chaitanya, Jiayi, and Manan
- Reading Survey
- Vision Transformers Need Registers
- Efficiently Modeling Long Sequences with Structured State Spaces
- Mar. 10, Tue
- Quo Vadis, Computer Vision? (Slides)
- Akash, Benjamin, Faisai, and Lyfey
- Reading Survey
- VGGT: Visual Geometry Grounded Transformer
- V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
- Mar. 12, Thu
- Visual Understanding vs. Generation (Slides)
- Alexander, Evan, Ruthwik, Om, and Yinghua
- Reading Survey
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
- Diffusion Transformers with Representation Autoencoders
- Mar. 17, Tue
- Final Project Idea Pitch (1)
- Slide deck
- Mar. 19, Thu
- Final Project Idea Pitch (2)
- Slide deck
- Mar. 31, Tue
- Video Generation Meets the Laws of Physics (Slides)
- Aashish, Gary, Xiaoyan, Xijie, and Yuqiao
- Reading Survey
- WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
- Video models are zero-shot learners and reasoners
- Apr. 2, Thu
- World Models (Slides)
- Eric, Ethan, Ioanna, Lihao, and Mark
- Reading Survey
- Genie: Generative Interactive Environments
- Mastering Diverse Domains through World Models
- Apr. 3, Fri
- Invited talk: Learning World Models and Agents for High-Cost Environments
- Prof. Sherry Yang
- Apr. 7, Tue
- Videos, Language, and Robots (Slides)
- Arin, Chandradithya, Enyan, Peiyan, and Peter
- Reading Survey
- Learning to Play Minecraft with Video PreTraining (VPT)
- π*0.6: a VLA That Learns From Experience
- Apr. 9, Thu
- Invited talk: Assessing Adaptive World Models in Machines with Novel Games
- Lance Ying
- Apr. 14, Tue
- Abstract Reasoning with LLMs (Slides)
- Apoorv Khandelwal
- Apr. 16, Thu
- The World After Transformers (2) (Slides)
- Akul, Harshit, Heejeong, Ronit, and Shravya
- Reading Survey
- Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
- Nested Learning: The Illusion of Deep Learning Architectures
- May 8, Fri
- Final project presentations (Lubrano 1 to 4 pm) (Slides)
- May 11, Mon
- Due: Project submission (Form)