Course Description
In this class, students will learn the fundamental machine learning (ML) and reinforcement learning (RL) techniques required to train multiagent systems to accomplish autonomous tasks in complex environments. Foundations include reinforcement learning, dynamical systems, control, neural networks, state estimation, and partially observed Markov decision processes (POMDPs). Core methods include Deep Q Networks (DQN), actor-critic methods, and derivative-free methods. Multiagent reinforcement learning topics include independent learners, action-dependent baselines, MADDPG, QMIX, shared policies, multi-headed policies, feudal reinforcement learning, switching policies, and adversarial training. Students will have the opportunity to implement the techniques learned on a multiagent simulation platform, called Flow, which integrates RL libraries and SUMO (a state-of-the-art microsimulation software) on AWS EC2. Students may alternatively implement the techniques on platforms of their choice (in which case they are responsible for the implementation). The class will teach applications of the ML/RL methods in the context of urban mobility and mixed autonomy, i.e., the insertion of self-driving vehicles into human-driven traffic. Thus the class also includes an introduction to traffic modeling to enable students to perform meaningful simulations, on benchmark cases as well as concrete models calibrated with field data.
Course Instructors
Alexandre Bayen 
Eugene Vinitsky 
Aboudy Kreidieh 
Yashar Zeiynali Farid 
Cathy Wu 
Prerequisites for this class

Proficiency in Python
All class assignments will be in Python (using numpy and TensorFlow, and optionally Keras). There is a tutorial here for those who aren't as familiar with Python. If you have a lot of programming experience, but in a different language (e.g. C/C++/Matlab/Javascript), you will probably be fine.
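As a rough, informal gauge of the expected proficiency (this snippet is illustrative, not course material), vectorized numpy operations like the following should feel routine:

```python
import numpy as np

# Expected level: vectorized operations without explicit Python loops.
x = np.arange(6).reshape(2, 3)   # 2x3 matrix: [[0, 1, 2], [3, 4, 5]]
w = np.array([1.0, 0.5, -1.0])   # weight vector
y = x @ w                        # matrix-vector product, shape (2,)
print(y)
```

If reshaping arrays and using the `@` (matrix multiplication) operator is unfamiliar, the Python/numpy tutorial linked above is a good place to start.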

College Calculus, Linear Algebra
You should be comfortable taking derivatives and understanding matrix-vector operations and notation.

Basic Probability and Statistics
You should know the basics of probability: Gaussian distributions, mean, standard deviation, etc.
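For instance (a minimal self-check, not course material), you should be able to predict what the following numpy snippet prints, at least approximately:

```python
import numpy as np

# Draw samples from a standard Gaussian and inspect the empirical moments.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)
print(samples.mean())  # should be close to 0
print(samples.std())   # should be close to 1
```

Knowing why the empirical mean and standard deviation land near the distribution's parameters (and how the error shrinks with sample size) is the level of familiarity we assume.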
Not required but helpful

Foundations of Machine Learning
We will be formulating cost functions, taking derivatives and performing optimization with gradient descent. Either UC Berkeley CS 189/289 or Stanford CS 229 covers this background. Some optimization tricks will be more intuitive with some knowledge of convex optimization.
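To make the workflow above concrete, here is a minimal sketch (the data and parameter values are hypothetical) of formulating a least-squares cost, taking its derivative, and minimizing it with gradient descent:

```python
import numpy as np

# Minimize the cost f(w) = (1/n) * ||X w - y||^2 by gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # design matrix (n=50 samples, d=3 features)
true_w = np.array([2.0, -1.0, 0.5])    # target parameters (for illustration)
y = X @ true_w                         # noiseless observations

w = np.zeros(3)                        # initial guess
lr = 0.1                               # step size (learning rate)
for _ in range(500):
    grad = (2.0 / len(y)) * X.T @ (X @ w - y)  # gradient of the cost w.r.t. w
    w -= lr * grad                             # descent step

print(w)  # converges toward true_w
```

The same pattern, with a neural-network policy in place of the linear model and an automatic-differentiation library computing the gradient, underlies the training loops used throughout the class.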
Artificial Intelligence
We will be covering advanced search methods. UC Berkeley CS188 covers the background. We will not assume knowledge of Markov Decision Processes and exact reinforcement learning methods.
Learning Outcomes
By the end of the class students should be able to:
 Define the key features of RL that distinguish it from artificial intelligence and non-interactive ML (as assessed by homework).
 Given an application problem (e.g. from transportation, computer vision, robotics, etc.), decide if it should be formulated as an RL problem; if so, define it formally (in terms of the state space, action space, dynamics, and reward model), state which algorithm (from class) is best suited to address it, and justify the answer (as assessed by the project).
 Implement in code common RL algorithms such as imitation learning (as assessed by the homework).
 Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by homework).
 Identify key problems in vehicle transportation that are worth future study.
 Understand challenges in multiagent RL and be able to formulate research solutions to them.
Class Time and Location
Fall Semester (August 23 - December ??, 2018)
Lecture: Tuesday, Thursday 3:30-5:00pm
Location: 531 Cory Hall
Course Schedule / Syllabus (Including Due Dates)
See the Course Schedule page.
Piazza
Textbooks
There is no official textbook for the class but a number of the supporting readings will come from:
 Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. This is available for free here and references will refer to the January 1 2018 draft available here.
Some other additional references that may be useful are listed below:
 Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [link]
 Traffic Flow Dynamics, Martin Treiber and Arne Kesting. [link]
 Reinforcement Learning: StateoftheArt, Marco Wiering and Martijn van Otterlo, Eds. [link]
 Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig. [link]
Grade Breakdown
 Assignment 1: 15%
 Assignment 2: 15%
 Assignment 3: 15%
 Assignment 4: 15%
 Course Project: 40%
 Proposal: 1%
 Milestone: 8%
 Poster Presentation: 10%
 Paper: 21%
Late Day Policy
 You can use 6 late days.
 A late day extends the deadline by 24 hours.
 You are allowed up to 2 late days per assignment. If you hand an assignment in after 48 hours, it will be worth at most 50% of the full credit. No credit will be given to assignments handed in after 72 hours — contact us if you think you have an extremely rare circumstance for which we should make an exception. This policy is to ensure that feedback can be given in a timely manner.
 You can use late days on the project proposal (up to 2) and milestone (up to 2). No late days are allowed for the poster presentation and final report. Any late days on the project writeup will decrease the potential score on the project by 25% of the full credit. To use a late day on the project proposal or milestone, teams may pool late days between members: in other words, any single team member's late day may be used (e.g. team member A can use her late day, and team member B can use his late day, yielding 2 total late days for the project proposal).
Homework submissions
Regrading Requests
 If you think that the course staff made a quantifiable error in grading your assignment or exam, then you are welcome to submit a regrade request. If you wish to do so, you must come in person to one of the graders for the assignment or exam question; the owners will be clearly stated on the assignment webpage or in the exam feedback. In considering whether to make a request, keep in mind that even if the grading may seem overly strict to you, we apply the same rubric to all students for fairness, so the strictness of the grading is not a suitable justification for a regrade request. Regrade requests will only be accepted for three days after assignments are returned.
 Note that when processing a regrade request we may review your entire assignment, not just the part you bring to our attention (i.e. we may find errors in your work that we missed before).
Office Hours
All office hours will be held in McLaughlin 109 at TBD
Attendance
Attendance is not required but is encouraged. Lectures are not recorded. Sometimes we may do in-class exercises or discussions, which are harder to benefit from on your own.
Communication
We believe students often learn an enormous amount from each other as well as from us, the course staff. Therefore, to facilitate discussion and peer learning, we request that you please use Piazza for all questions related to lectures, homework, and projects. When discussing solutions on Piazza, take care not to post words, code, or math that directly lead to solutions.
You can earn up to 2% extra credit by answering other students' questions on Piazza in a substantial and helpful way.