E.2 Probability, Expectation, and Random Estimation

Data in reinforcement learning comes from random interaction: the policy may choose actions at random, and the environment may return random rewards and next states. Probability theory is the language for describing and reasoning about this randomness. This section builds the tools in the order probability theory itself develops them: first sample spaces, events, and random variables; then probability, conditional probability, expectation, and variance; and finally Monte Carlo estimation, importance sampling, and the Bellman expectation equation.
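As a concrete preview of where this section is heading, here is a minimal Python sketch of estimating an expected reward from random interaction. It assumes a hypothetical one-step problem with two actions. The action probabilities and reward distributions are made up for illustration, and the names `TARGET_POLICY`, `BEHAVIOR_POLICY`, `REWARD_DIST` and the helper functions are illustrative, not part of any library. The sketch computes the exact expectation by averaging first over actions and then over rewards. It then forms a Monte Carlo estimate with an incremental sample mean, and finally an importance-sampling estimate that reuses data collected under a different policy.

```python
import random

# Hypothetical one-step problem: every probability and reward below is an illustrative assumption.
ACTIONS = ["left", "right"]
TARGET_POLICY = {"left": 0.2, "right": 0.8}    # pi(a): the policy we want to evaluate
BEHAVIOR_POLICY = {"left": 0.5, "right": 0.5}  # b(a): the policy that generated the data
REWARD_DIST = {                                # p(r | a) as (reward, probability) pairs
    "left": [(0.0, 0.5), (2.0, 0.5)],
    "right": [(1.0, 0.9), (-1.0, 0.1)],
}


def exact_expected_reward(policy):
    """E[R] = sum_a pi(a) * sum_r p(r | a) * r: average over actions, then over rewards."""
    return sum(policy[a] * sum(p * r for r, p in REWARD_DIST[a]) for a in ACTIONS)


def draw(pairs, rng):
    """Sample one value from a list of (value, probability) pairs."""
    values = [v for v, _ in pairs]
    weights = [p for _, p in pairs]
    return rng.choices(values, weights=weights, k=1)[0]


def monte_carlo_estimate(policy, n, rng):
    """Sample mean of rewards, updated incrementally: m_k = m_{k-1} + (r_k - m_{k-1}) / k."""
    mean = 0.0
    for k in range(1, n + 1):
        a = draw(list(policy.items()), rng)  # the policy picks an action at random
        r = draw(REWARD_DIST[a], rng)        # the environment returns a random reward
        mean += (r - mean) / k
    return mean


def importance_sampling_estimate(target, behavior, n, rng):
    """Estimate E_pi[R] from actions drawn under b, reweighting each reward by pi(a) / b(a)."""
    total = 0.0
    for _ in range(n):
        a = draw(list(behavior.items()), rng)
        r = draw(REWARD_DIST[a], rng)
        total += (target[a] / behavior[a]) * r
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:", exact_expected_reward(TARGET_POLICY))  # approximately 0.84
    print("monte carlo:", monte_carlo_estimate(TARGET_POLICY, 100_000, rng))
    print("importance sampling:",
          importance_sampling_estimate(TARGET_POLICY, BEHAVIOR_POLICY, 100_000, rng))
```

With enough samples, both estimates should land close to the exact value of about 0.84. In this toy example the importance-sampling estimate has the higher variance, because the weights pi(a)/b(a) stretch the rewards of the action the target policy favors.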

Random trajectories and expected value

Roadmap

Article	Mathematical path	Role in reinforcement learning
E.2.1 Probability, Conditional Probability, and Expectation	sample space -> event -> random variable -> probability -> expectation	Describe stochastic policies and stochastic environments
E.2.2 Random Variables, Returns, and State Values	random return -> conditional expectation -> variance	Define value functions and the stability of learning signals
E.2.3 Variance, Monte Carlo, and Sample Averages	sample mean -> incremental average -> importance sampling	Estimate unknown expectations from data
E.2.4 Trajectory Probability, Baselines, and GAE	trajectory probability -> baseline invariance -> accumulated TD errors	Connect policy gradients with advantage estimation
E.2.5 Bellman Expectation Equation	take expectations over actions, rewards, and next states layer by layer	Derive the full Bellman expectation equation
E.2.6 Summary, Formulas, and Exercises	formula review -> common pitfalls -> exercises	Review and check understanding
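The last content row of the roadmap, taking expectations over actions, rewards, and next states layer by layer, can be previewed with a small sketch. It assumes a hypothetical two-state, two-action MDP with made-up numbers, and it assumes the reward is deterministic given (s, a, s'). The function names are illustrative. The equation being checked is V(s) = sum_a pi(a|s) sum_s' P(s'|s,a) [r(s,a,s') + gamma V(s')]. The sketch solves it two ways: by repeatedly applying the layered expectation, and by treating it as the linear system (I - gamma P_pi) V = r_pi.

```python
# Hypothetical two-state, two-action MDP; every number here is an illustrative assumption.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
POLICY = {0: [0.7, 0.3], 1: [0.4, 0.6]}  # pi(a | s), indexed by action
TRANSITIONS = {                          # P(s' | s, a), indexed by next state
    (0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
    (1, 0): [0.5, 0.5], (1, 1): [0.0, 1.0],
}
REWARDS = {                              # r(s, a, s'), assumed deterministic given the triple
    (0, 0): [1.0, 0.0], (0, 1): [0.0, 2.0],
    (1, 0): [-1.0, 1.0], (1, 1): [0.0, 0.5],
}


def bellman_backup(v):
    """V(s) <- sum_a pi(a|s) sum_s' P(s'|s,a) [r(s,a,s') + gamma * V(s')]."""
    new_v = []
    for s in STATES:
        total = 0.0
        for a in ACTIONS:          # outer layer: expectation over actions
            inner = 0.0
            for s_next in STATES:  # inner layer: expectation over next state (and its reward)
                p = TRANSITIONS[(s, a)][s_next]
                inner += p * (REWARDS[(s, a)][s_next] + GAMMA * v[s_next])
            total += POLICY[s][a] * inner
        new_v.append(total)
    return new_v


def iterative_evaluation(tol=1e-10, max_iters=10_000):
    """Repeat the backup until V stops changing; it is a gamma-contraction, so this converges."""
    v = [0.0] * len(STATES)
    for _ in range(max_iters):
        new_v = bellman_backup(v)
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def direct_solution():
    """Solve (I - gamma * P_pi) V = r_pi exactly for two states with Cramer's rule."""
    r_pi, p_pi = [], []
    for s in STATES:
        r_pi.append(sum(
            POLICY[s][a] * sum(TRANSITIONS[(s, a)][t] * REWARDS[(s, a)][t] for t in STATES)
            for a in ACTIONS))
        p_pi.append([sum(POLICY[s][a] * TRANSITIONS[(s, a)][t] for a in ACTIONS)
                     for t in STATES])
    m00, m01 = 1 - GAMMA * p_pi[0][0], -GAMMA * p_pi[0][1]
    m10, m11 = -GAMMA * p_pi[1][0], 1 - GAMMA * p_pi[1][1]
    det = m00 * m11 - m01 * m10
    return [(r_pi[0] * m11 - m01 * r_pi[1]) / det,
            (m00 * r_pi[1] - m10 * r_pi[0]) / det]


if __name__ == "__main__":
    print("iterative:", iterative_evaluation())
    print("direct:   ", direct_solution())
```

Both functions should agree up to the iteration tolerance. The direct solve works because averaging over actions and next states turns the Bellman expectation equation into a linear system in V. The iterative version applies the same layered average repeatedly until it reaches the fixed point.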
