
E.1 Mathematical Objects and Linear Algebra

Chapter 3 introduced the Bellman equation $V(s) = R(s) + \gamma\sum_{s'}P(s'|s,a)V(s')$, which describes the value of a single state. In actual computation, however, three problems appear in sequence: how to express the equations for all states at once, how to approximate values when the state space is too large, and how to keep the iterative process stable. Module E.1 shows the linear algebra tools that answer each problem and how those tools build on one another.

[Figure: two-state Bellman equation diagram]
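
To make the "all states at once" idea concrete, here is a minimal sketch in Python with NumPy. It assumes a fixed policy, so that the transition probabilities collapse into a single matrix `P`. The two-state numbers for `P`, `r`, and `gamma` are invented for illustration and are not taken from the course.

```python
import numpy as np

# Hypothetical two-state example under a fixed policy (all numbers are illustrative).
gamma = 0.9
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])   # P[i, j] = probability of moving from state i to state j
r = np.array([1.0, 0.0])     # r[i] = expected immediate reward in state i

# Writing V(s) = R(s) + gamma * sum_{s'} P(s'|s) V(s') for every state at once
# gives v = r + gamma * P v, which rearranges to (I - gamma * P) v = r.
A = np.eye(2) - gamma * P
v = np.linalg.solve(A, r)    # solve the linear system rather than forming the inverse

# Sanity check: the solution should satisfy the Bellman equation in every state.
residual = np.max(np.abs(v - (r + gamma * P @ v)))
print("v =", v)
print("max Bellman residual =", residual)
```

Using `np.linalg.solve` instead of computing `(I - gamma P)^-1` explicitly is a common numerical choice; both express the same closed-form solution.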

Content Overview

| Problem | Difficulty | Mathematical tool introduced | Key formula | Link to Chapter 3 |
|---|---|---|---|---|
| Too many equations | 1000 states = 1000 equations | Vectors, matrices, linear systems | $v = (I - \gamma P)^{-1} r$ | Mathematical core of DP |
| State space too large | Too many states for a value table | Dot products, norms, function approximation | $\hat{v}(s) = w^\top x(s)$ | Mathematical core of DQN |
| Training stability | Training may diverge, explode, or drift | Eigenvalues, weighted norms, trust regions | $\rho(\gamma P) \le \gamma < 1$; $\Delta\theta^\top F \Delta\theta \le \delta$ | Mathematical core of PPO |
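
The formulas for the second and third problems can also be evaluated numerically. The sketch below is only an illustration: the three-state matrix `P`, the feature matrix `X`, the weights `w`, the matrix `F` (standing in for a Fisher information matrix), and the step `delta_theta` are all hypothetical values chosen for the example.

```python
import numpy as np

gamma = 0.9

# Hypothetical 3-state transition matrix under a fixed policy (each row sums to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

# Linear function approximation: v_hat(s) = w^T x(s).
# Each row of X is the feature vector x(s) of one state; values are made up.
X = np.array([[1.0, 0.0],
              [1.0, 0.5],
              [1.0, 1.0]])
w = np.array([2.0, -1.0])
v_hat = X @ w                       # one dot product per state

# Spectral radius of gamma * P: the largest eigenvalue magnitude.
# For a row-stochastic P it does not exceed gamma, so it stays below 1.
rho = np.max(np.abs(np.linalg.eigvals(gamma * P)))

# Trust-region style check: delta_theta^T F delta_theta <= delta.
# F is an assumed symmetric positive definite matrix, not a real Fisher estimate.
F = np.array([[2.0, 0.5],
              [0.5, 1.0]])
delta_theta = np.array([0.05, -0.02])
delta = 0.01
quad = delta_theta @ F @ delta_theta
within_trust_region = quad <= delta

print("v_hat =", v_hat)
print("spectral radius of gamma*P =", rho)
print("quadratic form =", quad, "within trust region:", within_trust_region)
```

Two weights describe the values of three states here; with many more states than features, the same dot product replaces a value table that would be too large to store.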

Reading Path

| Article | Question it answers | Corresponding problem |
|---|---|---|
| E.1.1 Scalars, Vectors, and Matrices | How do we represent states, values, and transitions? | Too many equations (basics) |
| E.1.2 Matrix Form of the Bellman Equation | Can 1000 Bellman equations be compressed into one? | Too many equations |
| E.1.3 Dot Products, Norms, and Function Approximation | What if there are too many states to store? How do we measure update size? | State space too large |
| E.1.4 Convergence, Eigenvalues, and Trust Regions | Will training explode? How can parameters be updated safely? | Training stability |
| E.1.5 Formula Review and Exercises | Revisit Chapter 3 from this perspective | Full review |

Read E.1.1 through E.1.4 in order, then use E.1.5 for review and practice. If a concept is already familiar, you can jump directly to the corresponding article.

Modern Reinforcement Learning: A Practical Course