Course Schedule
Paper reading list and presenters
- Jan. 27, Thu
- Course Overview (Slides)
- Chen Sun
- Feb. 1, Tue
- Recap: CNNs and Transformers (Slides)
- Chen Sun
- Presentation signup sheet
- Paper nomination form
- Feb. 3, Thu
- Overview: Self- and Cross-modal Learning (Slides)
- Chen Sun
- (Background) How to Read a CS Research Paper by Philip Fong
- (Background) How to do research by Bill Freeman
- (Background) How to write a good paper by Bill Freeman
- (Background) Novelty in Science by Michael Black
- (Background) Self-supervised learning: The dark matter of intelligence by Yann LeCun and Ishan Misra
- Feb. 4, Fri
- Due Presentation signup
- Feb. 8, Tue
- The Unreasonable Effectiveness of Data (Reading Survey / Questions / Slides)
- Jorge, Koyena, Yipu
- Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
- A ConvNet for the 2020s
- (Background) The Unreasonable Effectiveness of Data
- (Background) Training data-efficient image transformers & distillation through attention
- (Background) Exploring Randomly Wired Neural Networks for Image Recognition
- (Background) NAS evaluation is frustratingly hard
- (Background) The Bitter Lesson
- Feb. 10, Thu
- Semi-supervised Learning (Reading Survey / Questions / Slides)
- Cheng-You, Vivek
- Mean teachers are better role models
- MixMatch: A Holistic Approach to Semi-Supervised Learning
- (Background) Semi-Supervised Classification with Graph Convolutional Networks
- (Background) Inductive Representation Learning on Large Graphs
- (Background) Transfer Learning in a Transductive Setting
- Feb. 15, Tue
- Transfer Learning (Reading Survey / Questions / Slides)
- Changcheng, Gabriel, Kangping
- Big Transfer (BiT): General Visual Representation Learning
- Rethinking Pre-training and Self-training
- (Background) A Survey on Transfer Learning
- (Background) Transfusion: Understanding Transfer Learning for Medical Imaging
- (Background) Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks
- (Background) A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
- (Background) Rethinking ImageNet Pre-training
- Feb. 17, Thu
- Few-shot Learning (Reading Survey / Questions / Slides)
- Anessa, Reza, Yong
- Matching Networks for One Shot Learning
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- (Background) Prototypical Networks for Few-shot Learning
- (Background) Learning to Learn (Blog)
- (Background) Meta-Learning: Learning to Learn Fast (Blog)
- (Background) A Closer Look at Few-shot Classification
- (Background) Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?
- Feb. 22, Tue
- University holiday, no class
- Feb. 24, Thu
- Multitask Learning (Reading Survey / Questions / Slides)
- Amir, Hyuk, Jinwoo, Leonard
- Taskonomy: Disentangling Task Transfer Learning
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Sections 1, 2, and 4)
- (Background) UberNet: Training a Universal Convolutional Neural Network
- (Background) On the Opportunities and Risks of Foundation Models
- (Background) ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
- Mar. 1, Tue
- AI Safety (Reading Survey / Questions / Slides)
- Anna, Mason, Will Yang
- Concrete Problems in AI Safety
- (Background) Deep reinforcement learning from human preferences
- (Background) AI safety via debate
- (Background) Avoiding Side Effects By Considering Future Tasks
- (Background) Objective Robustness in Deep Reinforcement Learning
- Mar. 3, Thu
- Transformer and its variants (Reading Survey / Questions / Slides 1 / 2)
- George Zerveas, Kai
- Big Bird: Transformers for Longer Sequences
- Synthesizer: Rethinking Self-Attention in Transformer Models
- (Background) Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- (Background) MLP-Mixer: An all-MLP Architecture for Vision
- (Background) Linformer: Self-Attention with Linear Complexity
- (Background) Highly accurate protein structure prediction with AlphaFold
- Mar. 8, Tue
- Vision Transformers (1) (Reading Survey / Questions / Slides)
- Chace, Justin, Shijie
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- On the Relationship between Self-Attention and Convolutional Layers
- (Background) ViViT: A Video Vision Transformer
- (Background) VideoBERT: A Joint Model for Video and Language Representation Learning
- (Background) Video Action Transformer Network
- Mar. 10, Thu
- Vision Transformers (2) (Reading Survey / Questions / Slides)
- Avi, George Hu, Peilin
- End-to-End Object Detection with Transformers
- Perceiver: General Perception with Iterative Attention
- (Background) TrackFormer: Multi-Object Tracking with Transformers
- (Background) MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
- (Background) Perceiver IO: A General Architecture for Structured Inputs & Outputs
- (Background) Episodic Transformer for Vision-and-Language Navigation
- Mar. 11, Fri
- Due Final project signup
- Mar. 14, Mon
- Due Mid-term feedback
- Mar. 15, Tue
- Self-supervised Learning for NLP (Reading Survey / Questions / Slides)
- Catherine, William Jurayj, William Rudman
- Language Models are Few-Shot Learners
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
- (Background) REALM: Retrieval-Augmented Language Model Pre-Training
- (Background) SpanBERT: Improving Pre-training by Representing and Predicting Spans
- (Background) RoBERTa: A Robustly Optimized BERT Pretraining Approach
- (Background) Human Language Understanding & Reasoning
- (Background) Do Large Language Models Understand Us?
- Mar. 17, Thu
- Self-supervised Learning for Images (Reading Survey / Questions / Slides)
- Sijie, Tian, Vadim
- BEiT: BERT Pre-Training of Image Transformers
- Representation Learning with Contrastive Predictive Coding
- (Background) Dimensionality Reduction by Learning an Invariant Mapping
- (Background) Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
- (Background) Masked Autoencoders Are Scalable Vision Learners
- (Background) Deep Clustering for Unsupervised Learning of Visual Features
- (Background) Towards the Generalization of Contrastive Self-Supervised Learning
- (Background) Bootstrap your own latent: A new approach to self-supervised Learning
- Mar. 22, Tue
- Invited talk: Learning Structured Models of the World
- Thomas Kipf
- (Background) Object-Centric Learning with Slot Attention
- (Background) Conditional Object-Centric Learning from Video
- Mar. 24, Thu
- Project proposal (Master deck)
- Mar. 29, Tue
- Spring break
- Mar. 31, Thu
- Spring break
- Apr. 5, Tue
- Self-supervised Learning for Videos (Reading Survey / Slides)
- Bader, Ce, Trevor
- Time-Contrastive Networks: Self-Supervised Learning from Video
- Learning image representations tied to ego-motion
- (Background) Simulation as an engine of physical scene understanding
- (Background) Learning correspondence from the cycle-consistency of time
- (Background) Learning Temporal Dynamics from Cycles in Narrated Video
- Apr. 7, Thu
- Representation Learning for RL (Reading Survey / Slides)
- Aditya, Calvin, Haotian
- CURL: Contrastive Unsupervised Representations for Reinforcement Learning
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- (Background) R3M: A Universal Visual Representation for Robot Manipulation
- (Background) Understanding the World Through Action
- (Background) Learning Latent Plans from Play
- (Background) Learning Latent Dynamics for Planning from Pixels
- (Background) Control-Aware Representations for Model-based Reinforcement Learning
- (Background) Shaping Belief States with Generative Environment Models for RL
- (Background) Goal-Aware Prediction: Learning to Model What Matters
- Apr. 8, Fri
- Due Project proposal
- Apr. 12, Tue
- Invited talk: Multimodal Learning (Slides)
- Arsha Nagrani
- (Background) Attention Bottlenecks for Multimodal Fusion
- (Background) Speech2Action: Cross-modal Supervision for Action Recognition
- Apr. 14, Thu
- 3D Computer Vision (Reading Survey / Slides)
- Arman, Jiahao, Mikhail, Rao
- MarrNet: 3D Shape Reconstruction via 2.5D Sketches
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
- (Background) Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
- (Background) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
- (Background) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
- Apr. 19, Tue
- Generative Modeling (Reading Survey / Slides)
- Michal, Nate, Yuanhao
- Neural Discrete Representation Learning
- Diffusion Models Beat GANs on Image Synthesis
- (Background) PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows
- (Background) Variational Graph Auto-Encoders
- (Background) What are Diffusion Models? (Blog)
- (Background) Zero-Shot Text-to-Image Generation
- (Background) Denoising Diffusion Probabilistic Models
- Apr. 21, Thu
- Data and model bias (Reading Survey / Slides)
- Arun, Ghulam, Kunal, Pinar
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList
- Equality of Opportunity in Supervised Learning
- (Background) Measuring and Reducing Gendered Correlations in Pre-trained Models
- (Background) Comparing Human and Machine Bias in Face Recognition
- Apr. 26, Tue
- Model interpretability (Reading Survey / Slides)
- Amanda, Usha, Zachary
- Do Vision Transformers See Like Convolutional Neural Networks?
- Acquisition of Chess Knowledge in AlphaZero
- (Background) BERT rediscovers the classical NLP pipeline
- (Background) A Primer in BERTology: What We Know About How BERT Works
- Apr. 28, Thu
- Future Prediction, Causality (Reading Survey / Slides)
- Alexander, Heejun, Peisen, Tiancheng
- Attention over learned object embeddings enables complex visual reasoning
- PHYRE: A New Benchmark for Physical Reasoning
- (Background) Machine Theory of Mind
- (Background) Shaking the foundations: delusions in sequence models for interaction and control
- Apr. 29, Fri
- Due Presentation slot signup
- May 3, Tue
- Final project office hours
- May 5, Thu
- Final project office hours
- May 10, Tue
- Final project presentations (Slides)
- May 12, Thu
- Final project presentations (Slides)
- May 13, Fri
- Due Project submission
- May 22, Sun
- Due Post-semester feedback
- Other
- Student Nominated Readings
- (Physics-informed ML) Physics-informed neural networks
- (Operator Learning) Learning nonlinear operators via DeepONet
- (Biologically-Inspired Learning) Training Spiking Neural Networks Using Lessons From Deep Learning