About

Table of contents

  1. What’s this course?
  2. Prerequisites
    1. Linear Algebra
    2. Multivariable Calculus
    3. Probability Theory and Statistics
  3. Logistics
  4. Resources
  5. Assessment
  6. Additional Information

What’s this course?

This course is a graduate-level introduction to machine learning. We try to present machine learning as a story where many algorithmic techniques drop out of a common statistical learning framework.

The course covers a wide variety of topics in machine learning and statistical modeling. While mathematical methods and theoretical aspects will be covered, the primary goal is to provide students with the tools and principles needed to solve the data science problems found in practice. This course will serve as a foundation of knowledge on which more advanced courses and further independent study can build. A tentative syllabus for the course can be found here.

A key “subplot” of the course is how the proliferation of data has historically driven machine learning progress and how it has obviated considerations of bias-variance “tradeoffs”. As more and more data has become available to train models, we have been able to relax our inductive biases more and more. Another goal of this course is to make clear how the proliferation of large-scale data has naturally led us to our current, modern techniques.

If you’re a student, all content, logistics, and materials will be available on this page for ease of access (check here first instead of Brightspace). All course announcements will be on Ed Discussion. Please see Course Content for the lecture material and problem sets.

Acknowledgment: Much of the material of this course (especially Weeks 1-11) is adapted from DS-GA 1003, developed originally by David S. Rosenberg and later adapted by He He, Tal Linzen, Mengye Ren, and others.

Feedback? This is the first time Sam and Nick are teaching this particular course. If there’s anything we can do to help you learn better, do not hesitate to contact us through Ed or email, or use our Anonymous Feedback Form.

Prerequisites

This course requires some basic introductory knowledge of machine learning (ML) at the level of understanding the basic ML pipeline from a “black-box” perspective (i.e., train-test split, cross-validation, evaluating a model, etc.). This should be familiar if students have taken DS-GA 1001 and DS-GA 1002; if there are concerns, please email the instructors.

  • Solid mathematical background. Equivalent to a one-semester undergraduate course in each of linear algebra, multivariable calculus, and statistics.
  • Programming background. Ability to program in Python is required for most assignments.
  • DS-GA 1001: Introduction to Data Science (or equivalent)
  • DS-GA 1002: Probability and Statistics for Data Science (or equivalent)
  • Recommended, but not required: At least one advanced, proof-based mathematics course.

Machine learning is a confluence of different subjects, but the three most important foundational mathematical subjects for understanding machine learning are: linear algebra, multivariable calculus, and probability. A free recommended resource for refreshing these subjects is Mathematics for Machine Learning by Deisenroth, Faisal, and Ong.

Over the past two summers at Columbia, Sam also designed a course meant to give students a deeper understanding of these prerequisites (assuming they have already taken them and would like to progress to graduate-level machine learning). The entire course can be found at this page: Math for ML, and lecture videos can be found in “Video Recordings” on this page.

Several other resources for brushing up on these subjects are:

Linear Algebra

Multivariable Calculus

Probability Theory and Statistics

Logistics

  • Lecture Time: Tuesdays 2:45-4:45PM (in-person)
    • Lecture Location: 36 E 8th St (Cantor Film Ctr) Room 200
  • Lab Time: Thursdays 7:10-8PM (in-person)
    • Lab Location: 238 Thompson St (GCASL) Room C95
  • Instructor Office Hours:
    • Sam: Tuesdays 5:00pm - 6:00pm (after class in CDS 242); Wednesdays 1:00pm - 2:00pm (CDS 242)
    • Nick: Wednesdays 3:00pm - 4:00pm (CDS 617)
    • For section leader office hours, check the Staff page of the site
    • For changes in office hours, please keep an eye out on Ed and the Calendar page of the site.
  • Announcements and Discussion: All course announcements and discussion will be handled on Ed Discussion. Instead of emailing the instructors and instructional staff, please post your questions on Ed. Please check your email and Ed Discussion frequently to keep up to date with any changes to class logistics.

Resources

The course does not have any official or required textbooks. All slides will be published before each lecture in Course Content so students can follow along during lecture. However, we recommend the following optional resources:

We will sometimes post recommended optional reading from these textbooks or a free online source in the Course Content page of the site.

Assessment

There will be a total of 6-7 biweekly homework assignments. We will also have a midterm exam, as well as a final project. The grading breakdown is as follows:

  • Homeworks: 20%
  • Midterm Exam: 35%
  • Final Project: 35%
  • Lab Attendance: 10%

Homeworks. There will be a total of 6-7 biweekly homework assignments with a mix of theoretical problems and coding problems. For details about the homework assignments, please see the Homework page of this site.

Midterm Policy. The in-class midterm will be held during the usual lecture slot on Tuesday, March 10, 2026, 2:45pm - 4:45pm ET. Please make sure you are available in-person on this day!

Due to resource constraints, we will not be able to offer make-up midterms for those who cannot attend lecture on that day, so missing the midterm will result in a zero for the midterm. If you must miss the midterm due to an unexpected emergency or extenuating circumstance (and do not want to continue the course with a grade of zero on the midterm), you should bring this up with the instructors as soon as possible so we can try to arrange a grade of “Incomplete.”

Lab Attendance. Although lecture attendance is optional (though strongly recommended), lab attendance is an easy part of your final grade. For each lab you attend, you will get 1 point; the lab attendance portion of your grade is then simply the number of labs you attended divided by 10. There are more than 10 labs in the course, so you can miss some of them and still get full credit. Attending more than 10 labs will give you a slight boost to your grade: the 10% of your grade dedicated to lab attendance is weighted by X/10, where X is the number of labs you attended, and this ratio exceeds 1 when X > 10. We will be tallying lab attendance in-person through a QR code – it is your responsibility to make sure that you sign in at each lab session you attend.
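As a rough illustration of how the grading breakdown and the lab attendance rule combine, here is a minimal Python sketch. The function and variable names are hypothetical, and it assumes that homework, midterm, and project scores are each expressed as a fraction between 0 and 1 and that the lab portion is not capped (as the description above suggests). The instructors' actual grade computation may differ in its details.

```python
# Weights taken from the grading breakdown on this page.
WEIGHTS = {"homework": 0.20, "midterm": 0.35, "project": 0.35, "lab": 0.10}

# Number of labs that corresponds to full lab credit.
LABS_FOR_FULL_CREDIT = 10


def lab_score(labs_attended: int) -> float:
    """Lab attendance fraction: labs attended divided by 10.

    Assumption: no cap is applied, so the value exceeds 1.0
    when more than 10 labs are attended.
    """
    if labs_attended < 0:
        raise ValueError("labs_attended must be non-negative")
    return labs_attended / LABS_FOR_FULL_CREDIT


def course_grade(homework: float, midterm: float, project: float,
                 labs_attended: int) -> float:
    """Weighted course grade in percent.

    homework, midterm, and project are fractions in [0, 1]
    (an assumption made for this sketch).
    """
    total = (
        WEIGHTS["homework"] * homework
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["project"] * project
        + WEIGHTS["lab"] * lab_score(labs_attended)
    )
    return 100 * total
```

For example, under these assumptions `course_grade(0.9, 0.8, 0.85, labs_attended=12)` comes to roughly 87.75, of which 12 percentage points come from lab attendance rather than the usual maximum of 10.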

Final Project. Details about the final project can be found on the Project page.

Additional Information

Community Guidelines. Students, staff, and instructors are all expected to abide by the CDS community member statement.

Moses Center for Disabilities. If you are a student with a disability who is requesting accommodations, please contact New York University’s Moses Center for Students with Disabilities (CSD) at 212-998-4980 or mosescsd@nyu.edu. You must be registered with CSD to receive accommodations. Information about the Moses Center can be found here.