Future Trends

From Chapter 1 (CartPole) to Chapter 9 (GRPO), we walked through the core arc of modern reinforcement learning:

Q-learning and DQN: learning from trial and error,
Policy gradients: optimizing behavior directly,
PPO: stable post-training for large models,
GRPO: using verifiable rewards to drive reasoning,
Agentic RL: moving from single-turn answers to multi-turn, tool-using interaction.

But the RL story is not finished. In 2025-2026, several shifts have become increasingly clear:

RL is moving into the physical world (embodied intelligence).
RL is no longer only training-time optimization (it increasingly interacts with test-time search and planning).
RL is no longer only single-agent (multi-agent collaboration and self-play are re-emerging as central drivers).

This chapter does not attempt to cover every frontier direction; that would not be realistic. Instead, we pick representative themes that connect directly back to the concepts you already learned in earlier chapters. The goal is to help you recognize recurring structure: the same foundations reappear under new labels.

Section	Core question
Embodied Intelligence	How does RL enter the physical world (perception, action, safety)?
Model-Based RL	Can a world model reduce real-world trial-and-error via planning and imagination?
Self-Play	Can models improve beyond human data by competing with themselves?
Multi-Agent RL for LLMs	How do role-specialized agents learn to collaborate and coordinate?
Offline RL	If you cannot explore online, how do you learn from historical data safely?
Scaling Trends	Where is the ceiling: training-time scaling, test-time scaling, process rewards?

We begin with the first step of RL entering the physical world: Embodied Intelligence.
