ESE 680, Fall 2019 – Schedule and Course Materials

Course Materials

There is no textbook for the course. Our discussions will be guided by papers, monographs, and lecture notes that are available online. The following incomplete list will grow:

[Recht19] A Tour of Reinforcement Learning: The View from Continuous Control, Recht 2019
[Viberg95] Subspace-based Methods for the Identification of Linear Time-invariant Systems, Viberg 1995
[MatniAndTu19] A Tutorial on Concentration Bounds for System Identification, Matni and Tu, 2019
[Rigollet] Lecture Notes on High-Dimensional Statistics, Rigollet
[DeanEtAl17] On the Sample Complexity of the Linear Quadratic Regulator, Dean, Mania, Matni, Recht, and Tu, 2017
[Wasserman] CMU Stats 705, Lecture 13: The Bootstrap, Wasserman
[SarkarAndRakhlin19] Near optimal finite time identification of arbitrary linear dynamical systems, Sarkar and Rakhlin, 2019
[SimchowitzEtAl18] Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification, Simchowitz, Mania, Tu, Jordan, and Recht, 2018
[OymakAndOzay19] Non-asymptotic Identification of LTI Systems from a Single Trajectory, Oymak and Ozay, 2019
[TsiamisAndPappas19] Finite Sample Analysis of Stochastic System Identification, Tsiamis and Pappas, 2019
[DFT] Feedback Control Theory, Doyle, Francis, and Tannenbaum
[Lall] Stanford Engr210a, Lecture 17: LFTs and robustness, Lall
[LessardRechtPackard16] Analysis and design of optimization algorithms via integral quadratic constraints, Lessard, Recht, and Packard, 2016
[Jonsson] Lecture Notes on Integral Quadratic Constraints, Jonsson
[LeongDoyle16] Understanding Robust Control Theory Via Stick Balancing, Leong and Doyle, 2016
[LeongDoyle17] Effects of Delays, Poles, and Zeros on Time Domain Waterbed Tradeoffs and Oscillations, Leong and Doyle, 2017
[SmithDoyle92] Model Validation: A Connection between Robust Control and System Identification, Smith and Doyle, 1992
[PoollaEtAl94] A Time-Domain Approach to Model Validation, Poolla, Khargonekar, Tikku, Krause, and Nagpal, 1994
[Prajna05] Barrier Certificates for nonlinear model validation
[AndersonEtAl19] System Level Synthesis, Anderson, Doyle, Low, and Matni, 2019
[MatniEtAl19] From self-tuning regulators to reinforcement learning and back again, Matni, Proutiere, Rantzer, and Tu, 2019
[DannLattimoreBrunskill17] Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, Dann, Lattimore, and Brunskill, 2017
[AbbasiYadkoriSzepesvari11] Regret Bounds for the Adaptive Control of Linear Quadratic Systems, Abbasi-Yadkori and Szepesvari, 2011
[AbeilleLazaric18] Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems, Abeille and Lazaric, 2018
[BousquetElisseeff02] Stability and Generalization, Bousquet and Elisseeff, 2002
[HardtRechtSinger16] Train faster, generalize better: Stability of stochastic gradient descent, Hardt, Recht, and Singer, 2016
[BartlettMendelson02] Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, Bartlett and Mendelson, 2002
[SrebroSridharanTewari10] Smoothness, Low Noise, and Fast Rates, Srebro, Sridharan, and Tewari, 2010
[SuttonBarto] Reinforcement Learning, Sutton and Barto, 2017
[BradtkeYdstieBarto94] Adaptive linear quadratic control using policy iteration, Bradtke, Ydstie, and Barto, 1994
[DuEtAl2019] Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, Du, Kakade, Wang, and Yang, 2019
[FazelGeKakadeMesbahi2019] Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, Fazel, Ge, Kakade, and Mesbahi, 2019
[TuRecht2018] The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint, Tu and Recht, 2018
[AmesEtAl19] Control Barrier Functions: Theory and Applications, Ames, Coogan, Egerstedt, Notomista, Sreenath, and Tabuada, 2019
[BerkenkampEtAl17] Safe Model-based Reinforcement Learning with Stability Guarantees, Berkenkamp, Turchetta, Schoellig, and Krause, 2017
[FazlyabRobeyMorariPappas19] Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks, Fazlyab, Robey, Hassani, Morari, and Pappas, 2019
[FazlyabMorariPappas19a] Probabilistic Verification and Reachability Analysis of Neural Networks via Semidefinite Programming, Fazlyab, Morari, and Pappas, 2019
[FazlyabMorariPappas19b] Safety Verification and Robustness Analysis of Neural Networks via Quadratic Constraints and Semidefinite Programming, Fazlyab, Morari, and Pappas, 2019

Additional Resources

[Ljung] System Identification: Theory for the User, Ljung (survey paper)
[Vershynin] High-Dimensional Probability, Vershynin
Sanjay Lall's Engr210a Robust Control course at Stanford
[ZhouDoyleGlover] Robust and Optimal Control, Zhou, Doyle, and Glover
[ZhouDoyle] Essentials of Robust Control, Zhou and Doyle
[DullerudPaganini] A course in robust control: a convex approach, Dullerud and Paganini
[RussoEtAl17] A Tutorial on Thompson Sampling, Russo et al, 2017
[Ioannou] Robust Adaptive Control, Ioannou, 1995
[ShalevSchwartzAndBenDavid] Understanding Machine Learning: from Theory to Algorithms
[BousquetEtAl] Introduction to Statistical Learning Theory
[CuckerSmale01] On the Mathematical Foundations of Learning
MIT's 9.520 Statistical Learning Theory and Applications
[Bertsekas] Reinforcement Learning and Optimal Control
[BertsekasTsitsiklis] Neuro-dynamic Programming
[KrauthTuRecht2019] Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator, Krauth, Tu, and Recht, 2019
[GaussianProcesses] Gaussian Processes for Machine Learning, Rasmussen and Williams, 2006

Schedule (subject to change)

Logistics

Lecture 1: Aug 27
- Welcome, logistics, and overview (Slides)
  - Reading: Sections 1-3 of Recht19
  - Background Poll
  - Sign up sheet

System Identification

Lecture 2: Aug 29
- System identification 1: Identification of Linear-Time-Invariant systems
  - Reading: Viberg95
  - Methods/Algorithms: Subspace methods, Ho-Kalman
  - Scribe: Duc Nguyen, lecture notes

Lecture 3: Sep 03
- System identification 2: concentration bounds and finite-data guarantees, full state iid case
  - Reading: Sections 1-3 of MatniAndTu19, Ch1 of Rigollet Lecture Notes
  - Methods/Algorithms: Ordinary Least Squares error analysis
  - Scribe: Zongyu Dai, lecture notes
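As a rough illustration of the ordinary least squares estimator analyzed in this lecture, here is a minimal sketch for a scalar system x+ = a x + b u + w with independently drawn (x, u) samples. The scalar setting, the function names, and all numerical values are assumptions made for this example and are not taken from the readings.

```python
import random


def simulate_iid_transitions(a, b, sigma_w, n, rng):
    """Draw n independent transitions (x, u, x_next) of x_next = a*x + b*u + w."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # state sampled independently each time
        u = rng.gauss(0.0, 1.0)      # exploratory input
        w = rng.gauss(0.0, sigma_w)  # process noise
        data.append((x, u, a * x + b * u + w))
    return data


def ols_estimate(data):
    """Solve the 2x2 normal equations for the least squares estimate of (a, b)."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate_iid_transitions(a=0.9, b=0.5, sigma_w=0.1, n=2000, rng=rng)
    a_hat, b_hat = ols_estimate(data)
    print("a_hat =", a_hat, "b_hat =", b_hat)
```

The estimation error in this toy setting shrinks roughly like sigma_w / sqrt(n), which is the kind of rate the concentration bounds in the readings make precise.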

Lecture 4: Sep 05
- System identification 3: finite-data guarantees (cont'd) & data-dependent bounds
  - Reading: Section 5 of MatniAndTu19, Prop 2.4 and Section 2.3 of DeanEtAl17, Wasserman
  - Methods/Algorithms: Ordinary Least Squares error analysis, The Bootstrap
  - Scribe: Kuk Jang, lecture notes
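To make the idea of a data-dependent bound concrete, the following is a minimal sketch of a nonparametric (pairs) bootstrap that estimates an error radius for a scalar least squares slope. It is only one simple bootstrap variant and may differ from the procedure in the readings; the data model, function names, and parameter values are assumptions made for this example.

```python
import math
import random


def make_data(a_true, sigma_w, n, rng):
    """Generate n pairs (x, y) with y = a_true * x + w."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        pairs.append((x, a_true * x + rng.gauss(0.0, sigma_w)))
    return pairs


def ols_slope(pairs):
    """Least squares estimate of a in y = a * x + w."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxy / sxx


def bootstrap_error_radius(pairs, num_resamples, quantile, rng):
    """Return (a_hat, eps), where eps is the empirical quantile of
    |a_boot - a_hat| over resampled data sets."""
    a_hat = ols_slope(pairs)
    n = len(pairs)
    deviations = []
    for _ in range(num_resamples):
        resample = rng.choices(pairs, k=n)  # sample pairs with replacement
        deviations.append(abs(ols_slope(resample) - a_hat))
    deviations.sort()
    idx = min(num_resamples - 1, math.ceil(quantile * num_resamples) - 1)
    return a_hat, deviations[idx]


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = make_data(a_true=0.9, sigma_w=0.5, n=500, rng=rng)
    a_hat, eps = bootstrap_error_radius(pairs, 200, 0.95, rng)
    print("a_hat =", a_hat, "bootstrap radius =", eps)
```

The radius eps serves as a heuristic estimate of how far a_hat may be from the true parameter; unlike the concentration bounds, it is computed from the observed data alone.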

Lecture 5: Sep 10
- System identification 4: student led discussion, full state single trajectory case
  - Paper: Choose one of SarkarAndRakhlin19, SimchowitzEtAl18
  - Presenter: Bernadette K. Bucher

Lecture 6: Sep 12
- System identification 5: student led discussion, partial state single trajectory case
  - Paper: Choose one of OymakAndOzay19, TsiamisAndPappas19
  - Presenter: Alex Robey

Additional Resources:
- [Ljung] System Identification: Theory for the User, Ljung (survey paper)
- [Vershynin] High-Dimensional Probability, Vershynin

Control of Uncertain Systems

Lecture 7: Sep 17
- Control of Uncertain Systems 1: introduction to optimal/robust control, modeling uncertainty, small gain theorem
  - Reading: DFT Chapter 2, Lecture 17 of Lall
  - Methods/Algorithms: LQR, H-inf and L1 optimal control, LFT representation of uncertainty, small gain theorem
  - Scribe: Christopher Hsu, lecture notes

Lecture 8: Sep 19
- Control of Uncertain Systems 2: small gain theorem, a very brief introduction to the structured singular value (mu) and integral quadratic constraints (IQCs)
  - Reading: Chapter 8 of [DullerudPaganini], Sections 3.1-3.3 of LessardRechtPackard16, (optional reading: Jonsson IQC Lecture Notes)
  - Methods/Algorithms: small gain theorem, structured singular value, KYP Lemma, IQCs
  - Scribe: Kendall Queen, lecture notes

Lecture 9: Sep 24
- Control of Uncertain Systems 3: student led presentation on fundamental limits of robust control
  - Paper: Both LeongDoyle16 and LeongDoyle17
  - Presenter: Anton Xue

Lecture 10: Sep 26
- Control of Uncertain Systems 4: student led presentation on model (in)validation from a robust control perspective
  - Paper: either SmithDoyle92 or PoollaEtAl94 and Prajna05
  - Presenter: Laura Jarin-Lipschitz

Additional Resources:
- Sanjay Lall's Engr210a course at Stanford
- [ZhouDoyleGlover] Robust and Optimal Control, Zhou, Doyle, and Glover
- [ZhouDoyle] Essentials of Robust Control, Zhou and Doyle

Model-based control of learned systems

Lecture 11: Oct 01
- Model-based control of learned systems 1: interpretable robust control with System Level Synthesis and end-to-end bounds for learning to control an unknown linear dynamical system
  - Reading: Sections 2, 3, 4.0 and 4.5 of AndersonEtAl19, Sections 3 and 4 of DeanEtAl17
  - Methods/Algorithms: system level synthesis, quantitative performance bounds, end-to-end bounds
  - Scribe: Haimin Hu, lecture notes

Lecture 12: Oct 03
- Model-based control of learned systems 2: PAC, regret, and beyond, and what do we know about learning to control the linear quadratic regulator
  - Reading: Section 3 of MatniEtAl19, Sections 1 and 2 of DannLattimoreBrunskill17
  - Methods/Algorithms: episodic and single-trajectory PAC and Regret bounds
  - Scribe: Klayton Wittler, lecture notes

Lecture 13: Oct 08
- Model-based control of learned systems 3: student led presentation on Optimism in the Face of Uncertainty (OFU) for LQR
  - Paper: AbbasiYadkoriSzepesvari11
  - Presenter: Shaoru Chen

Lecture 14: Oct 15
- Model-based control of learned systems 4: student led presentation on Thompson Sampling for LQR
  - Paper: AbeilleLazaric18
  - Presenter: Rebecca Li

Additional Resources:
- [RussoEtAl17] A Tutorial on Thompson Sampling, Russo et al, 2017
- [Ioannou] Robust Adaptive Control, Ioannou, 1995

Learning Theory

Lecture 15: Oct 17
- Learning theory 1: Empirical Risk Minimization and Uniform Convergence
  - Reading: Chapters 2-4 of ShalevSchwartzAndBenDavid
  - Methods/Algorithms: ERM; uniform convergence for bounded loss functions over finite hypothesis classes, and for bounded Lipschitz loss functions over compact hypothesis classes
  - Scribe: Shuo Li, lecture notes

Lecture 16: Oct 22
- Learning theory 2: Algorithmic Stability and Stochastic Gradient Descent
  - Reading: BousquetElisseeff02, HardtRechtSinger16
  - Methods/Algorithms: Generalization through Algorithmic Stability, Stability and generalization of SGD
  - Scribe: Jialin Mao, lecture notes

Lecture 17: Oct 24
- Learning theory 3: student led presentation on Rademacher and Gaussian Complexities for Risk Bounds
  - Paper: BartlettMendelson02
  - Presenter: Alexandre Amice

Lecture 18: Oct 29
- Learning theory 4: student led presentation on Smoothness, Low Noise, and Fast Rates
  - Paper: SrebroSridharanTewari10
  - Presenter: Han Wang

Additional Resources:
- [BousquetEtAl] Introduction to Statistical Learning Theory
- [CuckerSmale01] On the Mathematical Foundations of Learning
- MIT's 9.520 Statistical Learning Theory and Applications

Model Free Methods

Lecture 19: Oct 31
- Model free methods 1
  - Reading: SuttonBarto Chapter 6.1-6.5, BradtkeYdstieBarto94, Optional: KrauthTuRecht2019
  - Methods/Algorithms: TD-learning, Q-learning, policy iteration
  - Scribe: Raphael Van Hoffelen, lecture notes
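For intuition about policy iteration in the LQR setting, here is a minimal sketch for a scalar system x+ = a x + b u with stage cost q x^2 + r u^2. It assumes the model (a, b) is known, so it is the model-based form of the iteration; the model-free methods in the readings instead estimate the Q-function from data. The function names and numerical values are assumptions made for this example.

```python
def evaluate_policy(a, b, q, r, k):
    """Cost-to-go coefficient p of the policy u = -k*x, i.e. V(x) = p*x^2.
    Solves the scalar Lyapunov equation p = q + r*k^2 + (a - b*k)^2 * p."""
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        raise ValueError("policy is not stabilizing")
    return (q + r * k * k) / (1.0 - a_cl * a_cl)


def improve_policy(a, b, r, p):
    """Greedy gain minimizing q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u."""
    return a * b * p / (r + b * b * p)


def policy_iteration(a, b, q, r, k0, num_iters):
    """Alternate policy evaluation and policy improvement from a stabilizing k0."""
    k = k0
    for _ in range(num_iters):
        p = evaluate_policy(a, b, q, r, k)
        k = improve_policy(a, b, r, p)
    return k, evaluate_policy(a, b, q, r, k)


def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time algebraic Riccati equation."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


if __name__ == "__main__":
    k, p = policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0, num_iters=20)
    print("gain k =", k, "cost coefficient p =", p)
    print("Riccati residual =", riccati_residual(1.2, 1.0, 1.0, 1.0, p))
```

Starting from a stabilizing gain, the iterates remain stabilizing and the cost coefficient converges to the solution of the Riccati equation, which the residual check confirms numerically.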

Lecture 20: Nov 05
- Model free methods 2: student led presentation on whether good representations are sufficient for sample-efficient reinforcement learning
  - Paper: DuEtAl2019
  - Presenter: Karl Schmeckpeper

Lecture 21: Nov 07
- Model free methods 3
  - Reading: Sections 3.3, 4.2, 5 of Recht19, FazelGeKakadeMesbahi2019, Optional: a great talk by Maryam Fazel on gradient based methods for linear control from L4DC 2019.
  - Methods/Algorithms: REINFORCE, Policy Gradient, Random Search
  - Scribe: Walker Gosrich, lecture notes

Lecture 22: Nov 12
- Model free methods 4: student led presentation on the gap between model-free and model-based methods for LQR
  - Reading: TuRecht2018
  - Presenter: Maria-Elisabeth Tzes

Additional Resources:
- [Bertsekas] Reinforcement Learning and Optimal Control
- [BertsekasTsitsiklis] Neuro-dynamic Programming
- [KrauthTuRecht2019] Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator, Krauth, Tu, and Recht, 2019

Safe Learning and Control

Lecture 23: Nov 14
- Safe learning and control 1
  - Reading: Basic Lyapunov Theory I, Basic Lyapunov Theory II, Control Lyapunov Functions
  - Methods/Algorithms: Lyapunov functions, Control Lyapunov functions, Global Asymptotic Stability, Global Exponential Stability
  - Scribe: Siddharth Singh

Lecture 24: Nov 19
- Safe learning and control 2
  - Reading: Estimation, Regression using Gaussian Processes
  - Methods/Algorithms: Gaussian Processes
  - Scribe: Alena Rodionova, lecture notes

Lecture 25: Nov 21
- Safe learning and control 3: student led presentation on control barrier functions
  - Paper: AmesEtAl19
  - Presenter: Hyunjoo Oh

Lecture 26: Nov 26
- Safe learning and control 4: student led presentation on safe reinforcement learning with stability guarantees
  - Reading: BerkenkampEtAl17
  - Presenter: Mengyuan Li

Lecture 26*: Nov 27
- Bonus lecture on safety and verification of Deep Neural Nets
  - Reading: FazlyabRobeyMorariPappas19, FazlyabMorariPappas19a, FazlyabMorariPappas19b
  - Presenter: Mahyar Fazlyab

Final Project Presentations

Lecture 27: Dec 03
- Final project presentations 1

Lecture 28: Dec 05
- Final project presentations 2