Publications
Publications in reversed chronological order.
2025
- Hierarchical Message-Passing Policies for Multi-Agent Reinforcement LearningTommaso Marzi, Cesare Alippi, and Andrea CiniArxiv Preprint, 2025
Decentralized Multi-Agent Reinforcement Learning (MARL) methods allow for learning scalable multi-agent policies, but suffer from partial observability and induced non-stationarity. These challenges can be addressed by introducing mechanisms that facilitate coordination and high-level planning. Specifically, coordination and temporal abstraction can be achieved through communication (e.g., message passing) and Hierarchical Reinforcement Learning (HRL) approaches to decision-making. However, optimization issues limit the applicability of hierarchical policies to multi-agent systems. As such, the combination of these approaches has not been fully explored. To fill this void, we propose a novel and effective methodology for learning multi-agent hierarchies of message-passing policies. We adopt the feudal HRL framework and rely on a hierarchical graph structure for planning and coordination among agents. Agents at lower levels in the hierarchy receive goals from the upper levels and exchange messages with neighboring agents at the same level. To learn hierarchical multi-agent policies, we design a novel reward-assignment method based on training the lower-level policies to maximize the advantage function associated with the upper levels. Results on relevant benchmarks show that our method performs favorably compared to the state of the art.
2024
- Feudal Graph Reinforcement LearningTommaso Marzi, Arshjot Singh Khehra, Andrea Cini, and Cesare AlippiTransactions on Machine Learning Research, 2024
Graph-based representations and message-passing modular policies constitute prominent approaches to tackling composable control problems in reinforcement learning (RL). However, as shown by recent graph deep learning literature, such local message-passing operators can create information bottlenecks and hinder global coordination. The issue becomes more serious in tasks requiring high-level planning. In this work, we propose a novel methodology, named Feudal Graph Reinforcement Learning (FGRL), that addresses such challenges by relying on hierarchical RL and a pyramidal message-passing architecture. In particular, FGRL defines a hierarchy of policies where high-level commands are propagated from the top of the hierarchy down through a layered graph structure. The bottom layers mimic the morphology of the physical system, while the upper layers correspond to higher-order sub-modules. The resulting agents are then characterized by a committee of policies where actions at a certain level set goals for the level below, thus implementing a hierarchical decision-making structure that can naturally implement task decomposition. We evaluate the proposed framework on a graph clustering problem and MuJoCo locomotion tasks; simulation results show that FGRL compares favorably against relevant baselines. Furthermore, an in-depth analysis of the command propagation mechanism provides evidence that the introduced message-passing scheme favors learning hierarchical decision-making policies.
2023
- Random Walk Approximation for Stochastic Processes on GraphsStefano Polizzi, Tommaso Marzi, Tommaso Matteuzzi, Gastone Castellani, and Armando BazzaniEntropy, 2023
We introduce the Random Walk Approximation (RWA), a new method to approximate the stationary solution of master equations describing stochastic processes taking place on graphs. Our approximation can be used for all processes governed by non-linear master equations without long-range interactions and with a conserved number of entities, which are typical in biological systems, such as gene regulatory or chemical reaction networks, where no exact solution exists. For linear systems, the RWA becomes the exact result obtained from the maximum entropy principle. The RWA allows having a simple analytical, even though approximated, form of the solution, which is global and easier to deal with than the standard System Size Expansion (SSE). Here, we give some theoretically sufficient conditions for the validity of the RWA and estimate the order of error calculated by the approximation with respect to the number of particles. We compare RWA with SSE for two examples, a toy model and the more realistic dual phosphorylation cycle, governed by the same underlying process. Both approximations are compared with the exact integration of the master equation, showing for the RWA good performances of the same order or better than the SSE, even in regions where sufficient conditions are not met.