Course Description

In this class, students will learn the fundamental machine learning (ML) and reinforcement learning (RL) techniques required to train multi-agent systems to accomplish autonomous tasks in complex environments. Foundations include reinforcement learning, dynamical systems, control, neural networks, state estimation, and partially observed Markov decision processes (POMDPs). Core methods include Deep Q Networks (DQN), actor-critic methods, and derivative-free methods. Multi-agent reinforcement learning topics include independent learners, action-dependent baselines, MADDPG, QMIX, shared policies, multi-headed policies, feudal reinforcement learning, switching policies, and adversarial training. Students will have the opportunity to implement these techniques on Flow, a multi-agent simulation platform that integrates RL libraries and SUMO (a state-of-the-art microsimulation software) on AWS EC2. Students may alternatively use their own platforms or platforms of their choice, in which case they are responsible for the implementation. The class will teach applications of the ML/RL methods in the context of urban mobility and mixed autonomy, i.e., the insertion of self-driving vehicles into human-driven traffic. The class will therefore also include an introduction to traffic modeling, so that students can perform meaningful simulations on benchmark cases as well as on concrete models calibrated with field data.

Course Instructors

Alexandre Bayen

Eugene Vinitsky

Aboudy Kreidieh

Yashar Zeiynali Farid

Cathy Wu

Prerequisites for this class

  • Proficiency in Python

    All class assignments will be in Python (using NumPy and TensorFlow, and optionally Keras). There is a tutorial here for those who aren't as familiar with Python. If you have a lot of programming experience but in a different language (e.g. C/C++/MATLAB/JavaScript), you will probably be fine.

  • College Calculus, Linear Algebra

    You should be comfortable taking derivatives and understanding matrix vector operations and notation.

  • Basic Probability and Statistics

    You should know the basics of probability: Gaussian distributions, mean, standard deviation, etc.

Not required but helpful

  • Foundations of Machine Learning

    We will be formulating cost functions, taking derivatives and performing optimization with gradient descent. Either UC Berkeley CS 189/289 or Stanford CS 229 covers this background. Some optimization tricks will be more intuitive with some knowledge of convex optimization.

  • Artificial Intelligence

    We will be covering advanced search methods. UC Berkeley CS 188 covers the background. We will not assume knowledge of Markov Decision Processes and exact reinforcement learning methods.

Learning Outcomes

By the end of the class students should be able to:

  • Define the key features of RL that distinguish it from artificial intelligence and non-interactive ML (as assessed by homework).
  • Given an application problem (e.g. from transportation, computer vision, or robotics), decide if it should be formulated as an RL problem; if yes, be able to define it formally (in terms of the state space, action space, dynamics and reward model), state what algorithm (from class) is best suited for addressing it, and justify your answer (as assessed by the project).
  • Implement in code common RL algorithms such as imitation learning (as assessed by the homework).
  • Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by homework).
  • Identify key problems in vehicle transportation that are worth future study.
  • Understand challenges in multi-agent RL and be able to formulate research solutions to them.

Class Time and Location

Fall Semester (August 23 - December ??, 2018)
Lecture: Tuesday, Thursday 3:30-5:00pm
Location: 531 Cory Hall

Course Schedule / Syllabus (Including Due Dates)

See the Course Schedule page.



There is no official textbook for the class but a number of the supporting readings will come from:

  • Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. This is available for free here, and reading references will point to the January 1, 2018 draft available here.

Additional references that may be useful are listed below:

  • Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [link]
  • Traffic Flow Dynamics, Martin Treiber and Arne Kesting. [link]
  • Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds. [link]
  • Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig. [link]

Grade Breakdown

  • Assignment 1: 15%
  • Assignment 2: 15%
  • Assignment 3: 15%
  • Assignment 4: 15%
  • Course Project: 40%
    • Proposal: 1%
    • Milestone: 8%
    • Poster Presentation: 10%
    • Paper: 21%
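
As a rough illustration of how these weights combine, the sketch below computes a final percentage from per-component scores. The weights are copied from the breakdown above; the function name `final_grade`, the component keys, and the convention that each score is a fraction between 0 and 1 are assumptions made for this example, and extra credit is not modeled.

```python
# Weights (percent of the final grade) copied from the grade breakdown.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "assignment_1": 15,
    "assignment_2": 15,
    "assignment_3": 15,
    "assignment_4": 15,
    "proposal": 1,
    "milestone": 8,
    "poster": 10,
    "paper": 21,
}


def final_grade(scores):
    """Return the weighted final grade in percent.

    `scores` maps each component name to the fraction of credit earned
    on it (0.0 to 1.0). Every component in WEIGHTS must be present.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example: full credit on all assignments, half credit on each project part.
example = {name: 1.0 for name in WEIGHTS}
for part in ("proposal", "milestone", "poster", "paper"):
    example[part] = 0.5
print(final_grade(example))  # 60 from assignments + 20 from the project
```

The four project components sum to the 40% listed for the course project, so the weights as a whole sum to 100.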

Late Day Policy

  • You can use 6 late days.
  • A late day extends the deadline by 24 hours.
  • You are allowed up to 2 late days per assignment. If you hand an assignment in after 48 hours, it will be worth at most 50% of the full credit. No credit will be given to assignments handed in after 72 hours. Contact us if you think you have an extremely rare circumstance for which we should make an exception. This policy is to ensure that feedback can be given in a timely manner.
  • You can use late days on the project proposal (up to 2) and milestone (up to 2). No late days are allowed for the poster presentation and final report. Any late days on the project writeup will decrease the potential score on the project by 25% of the full credit. To use a late day on the project proposal or milestone, it is allowable to pool late days between team members: in other words, one can use any single team member’s late day (e.g. team member A can use her late day, and team member B can use his late day, and that yields 2 total late days for the project proposal).
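
The assignment thresholds above can be read as a simple lookup, sketched below. This is one interpretation and not an official grading script: the function names are hypothetical, the 48-hour and 72-hour boundaries are assumed to be inclusive, and the sketch assumes the student still has late days left to cover the first 48 hours (the policy above does not say what happens otherwise). It covers assignments only, not the project deliverables.

```python
import math


def late_days_needed(hours_late):
    """Number of 24-hour late days needed to cover a delay of `hours_late`."""
    if hours_late < 0:
        raise ValueError("hours_late must be non-negative")
    return math.ceil(hours_late / 24)


def assignment_credit_cap(hours_late):
    """Maximum fraction of full credit for an assignment handed in late.

    Assumed reading of the policy:
      up to 48 hours late  -> full credit possible (covered by late days)
      48 to 72 hours late  -> at most 50% of full credit
      more than 72 hours   -> no credit
    """
    if hours_late < 0:
        raise ValueError("hours_late must be non-negative")
    if hours_late <= 48:
        return 1.0
    if hours_late <= 72:
        return 0.5
    return 0.0


# Example: an assignment handed in 30 hours after the deadline.
print(late_days_needed(30), assignment_credit_cap(30))
```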

Homework submissions

Regrading Requests

  • If you think that the course staff made a quantifiable error in grading your assignment or exam, then you are welcome to submit a regrade request. If you wish to do so, you must come in person to one of the graders for the assignment or exam question; the owners will be clearly stated on the assignment webpage or in the exam feedback. In considering whether to make a request, we encourage you to consider that even if the grading may seem overly strict to you, we are applying the same rubric to all students for fairness, so the strictness of the grading is not a suitable justification for a regrade request. Regrade requests will only be accepted for three days after assignments are returned.
  • Note that while doing a regrade we may review your entire assignment, not just the part you bring to our attention (i.e. we may find errors in your work that we missed before).

Office Hours

All office hours will be held in McLaughlin 109 at TBD


Attendance is not required but is encouraged. Lectures are not recorded. Sometimes we may do in-class exercises or discussions, and these are harder to do, and to benefit from, by yourself.


We believe students often learn an enormous amount from each other as well as from us, the course staff. Therefore, to facilitate discussion and peer learning, we request that you use Piazza for all questions related to lectures, homework and projects. When discussing solutions on Piazza, take care not to post words, code, or math that directly leads to solutions.

You will be awarded up to 2% extra credit if you answer other students' questions in a substantial and helpful way on Piazza.


Announcements will be posted via email.

Class Website