E.1 Mathematical Objects and Linear Algebra

Chapter 3 introduced the Bellman equation $V(s) = R(s) + \gamma\sum_{s'}P(s'|s,a)V(s')$ , which describes the value of a single state. In actual computation, however, three problems appear in sequence: how to express the equations for all states at once, how to approximate values when the state space is too large, and how to keep the iterative process stable. Module E.1 shows the linear algebra tools that answer each problem and how those tools build on one another.

Two-state Bellman equation diagram

Content Overview

Problem	Difficulty	Mathematical tool introduced	Key formula	Link to Chapter 3
Too many equations	1000 states = 1000 equations	Vectors, matrices, linear systems	v = (I - gamma P)^-1 r	Mathematical core of DP
State space too large	Too many states for a value table	Dot products, norms, function approximation	v_hat(s) = w^T x(s)	Mathematical core of DQN
Training stability	Training may diverge, explode, or drift	Eigenvalues, weighted norms, trust regions	rho(gamma P) <= gamma < 1, Delta theta^T F Delta theta <= delta	Mathematical core of PPO

Reading Path

Article	Question it answers	Corresponding problem
E.1.1 Scalars, Vectors, and Matrices	How do we represent states, values, and transitions?	Too many equations, basics
E.1.2 Matrix Form of the Bellman Equation	Can 1000 Bellman equations be compressed into one?	Too many equations
E.1.3 Dot Products, Norms, and Function Approximation	What if there are too many states to store? How do we measure update size?	State space too large
E.1.4 Convergence, Eigenvalues, and Trust Regions	Will training explode? How can parameters be updated safely?	Training stability
E.1.5 Formula Review and Exercises	Revisit Chapter 3 from this perspective	Full review

Read E.1.1 through E.1.4 in order, then use E.1.5 for review and practice. If a concept is already familiar, you can jump directly to the corresponding article.

1. CartPole Balancing

2. DPO Preference Tuning

3. MDP and Value Functions

4. Deep Q-Networks

5. Policy-Based Methods

6. Actor-Critic

7. PPO

8. The RLHF Pipeline

9. Post-Training Alignment

10. Agentic RL

11. VLM Reinforcement Learning

12. Future Trends

B. RL Engineering Practice

C. Code Cheatsheet

E. Math Foundations for RL

E.1 Linear Algebra

E.2 Probability & Estimation

E.3 Calculus & Optimization

E.4 Information Theory

E.1 Mathematical Objects and Linear Algebra

Content Overview

Reading Path

E.1 Linear Algebra

E.2 Probability & Estimation

E.3 Calculus & Optimization

E.4 Information Theory

E.1 Mathematical Objects and Linear Algebra ​

Content Overview ​

Reading Path ​

E.1 Mathematical Objects and Linear Algebra

Content Overview

Reading Path