STAT547U: Topics in Deep Learning Theory

Jan-Feb 2025

This course takes place in the traditional, ancestral and unceded territory of the xʷməθkʷəy̓əm (Musqueam). It is important to reflect upon this fact as we study, work, and grow on this land (https://indigenous.ubc.ca).

Course Overview

Description. The success of neural networks defies classical statistical learning theory. While a detailed analysis of neural networks remains theoretically intractable, recent analyses of high-dimensional linear regression and infinite-width neural networks have shed light on

  1. why neural networks optimize well, despite the inherent non-convexity of training, and
  2. why neural networks generalize well, despite their overparameterized nature.

This course will be a self-contained exploration of recent theoretical results in the theory of neural networks and high-dimensional regression. Students will learn from lectures that simplify complex mathematics and recent research papers that prove theorems in full technical detail. Students will gain familiarity with existing research, common problem setups and simplifying assumptions, and common techniques used to prove theoretical results.

Learning objectives. If we're successful, you will:

  • articulate the limitations of classical learning theory methods,
  • understand why neural networks optimize well despite non-convexity,
  • understand why high-dimensional linear models generalize despite overparameterization,
  • understand how to approximate neural networks with linear models,
  • know how to set up high-dimensional asymptotic proofs,
  • be fluent with the basic mathematical techniques underpinning neural network theory (linear algebra, probability theory, etc.),
  • become familiar with advanced mathematical techniques necessary for proofs (chaining arguments, random matrix theory, etc.), and
  • gain confidence in reading theoretical machine learning papers.

Prerequisites. If you are worried that you won't satisfy the following prerequisites, please speak to me after the first class:

  • one upper-division course in statistics or machine learning (e.g. STAT 460 or CS 540),
  • one course in analysis (e.g. MATH 320),
  • fluency with linear algebra, and
  • (recommended) one upper-division course in probability (e.g. STAT 547C).

Values. This course will be a safe and inclusive learning environment for all. If you are treated unfairly or disrespectfully, whether by another student or by me, please (1) contact me directly (if you feel comfortable), (2) submit an anonymous feedback form, or (3) work with the UBC ombudsperson office. I also aim to accommodate neurodiversity, mental health, and access needs within the course. Please contact me for special accommodation requests or take advantage of UBC's student resources.

Instructor: Geoff Pleiss

About me. I am an assistant professor in the stats department. My research generally encompasses machine learning, with an emphasis on uncertainty quantification and decision making. I work on neural networks (from both theoretical and methodological perspectives), Bayesian optimization, and spatiotemporal modelling with Gaussian processes.

Office hours. See Canvas for details. I'll also stick around to chat for ~10 minutes after class.

How to contact me. Send me a message through Canvas, the UBC Stats Slack (preferred, if you have access), or email (only if you absolutely need to). Please note that I will not respond on weekends or after 6 PM on weekdays.

Learning Activities and Assessment

Neural network theory is a challenging subject that cannot be learned passively. If you plan to just sit through lectures, you will likely not come away with an understanding of the material.

You will learn through the following activities:

Lecture

Class is where we will cover most of the technical content in this course. I will present the material in self-contained lectures. Again, you cannot learn the material simply by sitting through lectures. You are expected to read material before class (see below) and to actively participate with questions and comments.

Readings and written summaries

Most lectures will have required readings that you are expected to complete before class. These readings will be a mix of book chapters and some recent (accessible) research and review papers.

To further facilitate your understanding, you are required to write a half-page summary of each reading (to be uploaded to Canvas). Each summary should include exactly 4 paragraphs:

  1. Setting (problem formulation, assumptions, etc.)
  2. The Goal and Main Result (what the authors are trying to prove, what other papers have already proved, etc.)
  3. Description of Mathematical Techniques (what types of math are used to arrive at each theoretical result)
  4. Points of Confusion (what you were not able to understand)

You should spend roughly 2 hours on each reading and 1 hour on each written summary.

Diagnostic assignment

There will be a single problem set for this course, which is due on Canvas at the beginning of the second week of class. This problem set aims to gauge your fluency with linear algebra and probability theory while also setting up the material for the first few lectures. If you give this diagnostic a reasonable effort, you will be well-prepared for the mathematics in this course. If you struggle with it, you will struggle with the rest of the material.

Final paper reading assignment

The final assessment for this course involves thoroughly reading a modern research paper on neural network theory, followed by a 30-minute oral examination to assess your understanding.

You will select a paper from a predetermined list. All of the papers are quite challenging, with advanced mathematical techniques that will not be covered in class. You will need to spend significant time on the paper (read: 2+ weeks), reading it multiple times and researching the background mathematics necessary to understand it.

During the oral examination, I will ask you about the following aspects of the paper:

  • the problem being solved,
  • related work (what has been studied before),
  • the problem setting,
  • the main theoretical results,
  • sketches of the proof, and
  • limitations of the analysis.

To ensure you're on track, there will be a written intermediate check-in assignment, in which you are expected to write a 1-page summary of the paper you have chosen. It should follow the same format as the written summaries of the readings, with the same 4 paragraphs. Each paragraph will be graded on a 2.5-point scale, for a total of 10 points. (The rubric will be the same as the written summaries of the readings, with everything scaled by 1.25.)

    Assessment

    The grades and weightings target the following scale:

    • An A- represents a cursory understanding of NN theory (e.g. comprehending the lecture material but not the readings).
    • An A represents an in-depth understanding of NN theory (e.g. comprehending the lecture material and readings to a great extent).
    • An A+ represents that you are ready to perform research on NN theory (e.g. going above and beyond the course materials).

    Category | Contribution | Notes
    Class participation | 10% | 5%: attending class regularly; 5%: engaging during lecture. (Let me know if you need to miss class.)
    Diagnostic assignment | 15% | See description above. It will be worth a total of 60 points, scaled to be worth 15% of your final grade.
    Reading summaries | 40% | See description above. There are 6 written summaries, each worth 8 points (2 points per paragraph), for a total of 48 points. Your grade will be min(40, total_points). Each paragraph will be graded on the following scale: (1 point) demonstrates that you skimmed the reading, (2 points) demonstrates that you read the paper/chapter thoroughly.
    Final paper reading assignment | 35% | See description above. The intermediate check-in will be 10 points, and the oral examination will be 25 points.

    Rubrics for all assignments will be available on Canvas.
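
As a rough illustration of how the weightings above combine, here is a minimal sketch of the arithmetic. The function name and argument names are hypothetical, and it assumes that attendance and engagement are each scored out of 5 and that one point on the final paper components equals one percentage point; the official rubrics on Canvas take precedence.

```python
def final_grade(attendance, engagement, diagnostic_points,
                summary_points, check_in, oral_exam):
    """Combine component scores into a course percentage (0 to 100).

    Hypothetical helper: the names and the 0-5 participation scales
    are assumptions, not part of the official rubric.
    """
    # Class participation: 5% attendance + 5% engagement.
    participation = attendance + engagement
    # Diagnostic assignment: 60 points scaled to 15% of the grade.
    diagnostic = 15 * diagnostic_points / 60
    # Reading summaries: 6 summaries x 8 points = 48 possible, capped at 40.
    summaries = min(40, sum(summary_points))
    # Final paper: check-in (10 points) + oral examination (25 points).
    final_paper = check_in + oral_exam
    return participation + diagnostic + summaries + final_paper


# Example: 44 summary points are capped at 40.
print(final_grade(5, 5, 48, [8, 8, 7, 6, 8, 7], 9, 22))  # 93.0
```

Because the reading summaries are capped at 40 of a possible 48 points, a few lost points across the summaries need not lower the final grade.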

    Concessions

    Please contact me as soon as possible if you will have trouble completing an assignment on time and/or participating in class. I understand the effects of life events, health circumstances (mental or physical), and scheduling, and I will work with you to best structure your learning/assessment. Please also refer to UBC's policies on academic concessions.

    Resources

    Canvas

    We will use Canvas for 1) announcements, 2) discussions, and 3) uploading assignments.

    Readings (required and optional)

    Please see the schedule for required readings and a bibliography of additional papers related to each lecture.

    Other References

    Policies and Values

    UBC provides resources to support student learning and maintaining healthy lifestyles but recognizes that crises sometimes arise. There are additional resources to access, including those for survivors of sexual violence. UBC values respect for the person and ideas of all academic community members. Harassment and discrimination are not tolerated, nor is suppression of academic freedom. UBC provides appropriate accommodation for students with disabilities and religious, spiritual and cultural observances. UBC values academic honesty, and students are expected to acknowledge the ideas generated by others and to uphold the highest academic standards in all of their actions. Details of the policies and how to access support are available here.