Machine Learning Could Enable Automated Microreactor Control
A reinforcement learning framework might pave the way for autonomous nuclear reactors in remote or mobile applications.
University of Michigan researchers have developed a machine learning method that could bring autonomous control to nuclear microreactors. These compact systems deliver carbon-free power to remote locations and critical infrastructure.
The study explores how deep reinforcement learning, a trial-and-error AI training strategy, can adjust the power output of a novel microreactor design through real-time control of internal mechanisms.
The approach could enable smarter, more efficient reactors that don't need constant human oversight—a critical step toward practical deployment in harsh or inaccessible environments.
A mobile nuclear microreactor. Image used courtesy of Idaho National Laboratory
Why Microreactors Need Smarter Control
Microreactors like HolosGen’s Holos-Quad design are emerging as a flexible, scalable alternative to traditional nuclear plants. These sealed, high-temperature gas-cooled systems can fit in a standard shipping container, run for years without refueling, and be airlifted to off-grid sites. But their small power output (typically up to 20 MW thermal) makes operational efficiency a key factor for economic viability.
Up until now, staffing costs and control complexity have been major challenges. That’s where AI comes in.
Researchers used machine learning models in the Holos-Quad microreactor. Image used courtesy of University of Michigan/HolosGen LLC
Traditionally, reactor control has relied on proportional-integral-derivative (PID) controllers, which are simple, stable, and well-understood. But PID requires careful tuning and struggles to respond optimally in complex, nonlinear, or uncertain conditions. Enter reinforcement learning, which learns control policies by interacting with a simulated environment and improving decisions over time.
The reinforcement learning enables real-time control, according to Majdi Radaideh, the research article’s senior author.
Drums, Not Rods
Unlike conventional reactors using axial control rods, the Holos-Quad uses rotating neutron-absorbing drums around its core. Rotating a drum inward decreases reactor power by absorbing more neutrons; rotating it outward increases power. The challenge is synchronizing eight drums in response to dynamic load demands.
To handle this, the research team built a physics-based simulation of the reactor’s core, incorporating thermal feedback and xenon poisoning, factors that influence neutron population and reactor behavior over time. Using this environment, they trained multiple reinforcement learning agents to control the drums under changing power demands.
The novelty, however, lies in the multi-agent reinforcement learning approach. A dedicated agent controls each drum, but all agents learn from the shared reactor state. This structure mirrors the physical symmetry of the reactor itself, allowing the model to multiply its training efficiency.
Axial slice of the Holos-Quad design with control drums out (left) and control drums inserted (right). Coloring delineates objects and assemblies. Image used courtesy of Tunkle et al.
Smarter, Faster, More Efficient
The reinforcement learning framework was tested in three configurations:
- Single-agent RL, where one agent controlled all eight drums.
- Multi-agent RL, with one agent per drum.
- Traditional PID control as a baseline.
All controllers were tested on multiple power-following scenarios, including short transients with rapid power ramps, extended runs where xenon buildup plays a significant role, and noisy environments simulating imperfect sensor data.
In short-term scenarios, the RL agent cut tracking error by more than 50% compared to PID, with comparable or lower control effort. While PID slightly outperformed in very long-duration transients, RL maintained power output within a 1% error margin, even though it was only trained on short-term profiles.
And the multi-agent RL configuration stood out. It reached similar performance with at least 2x faster training than the single-agent RL system and achieved up to 150% lower actuation effort than PID, meaning less wear and greater efficiency.
Real-World Readiness and What’s Next
Despite the promising results, the RL framework is still confined to simulation. Before it can be applied in a real-world microreactor, the team said it needs to be validated against high-fidelity multiphysics models and eventually tested in experimental systems.
The long-term vision is a forward digital twin, a self-learning control loop that can monitor the physical system, predict its behavior, and adapt real-time control strategies.



