Adaptive Runtime Validation of a Responsive Neurostimulation Device in a Body Area Network Using Hierarchical Colored Timed Petri Nets and Hierarchical Reinforcement Learning

 

Safaa Majid Fakhry Al-Sherify, Negar Majma*, Asaad Noori Hashim Al-Shareefi, Zohreh Fotouhi

DOI: 10.5281/zenodo.18042162

Abstract:

Recent advances in computing, networking, and medical sensing technologies have enabled the design of brain neural stimulation devices for the detection, monitoring, and treatment of epileptic seizures. However, online validation of these devices under critical conditions remains a major challenge from both safety and dependability perspectives. In this paper, we propose an adaptive runtime validation framework for a brain neural stimulation device that combines Hierarchical Timed Colored Petri Nets (HTCPN), fuzzy logic, and hierarchical reinforcement learning. The execution structure of the device and the transitions between brain states are formalized using an HTCPN model, while a fuzzy system maps continuous brain and physiological measurements to three discrete risk levels (S1–S3), corresponding to normal activity, probable epileptic activity, and seizure onset, which are then used by the hierarchical reinforcement learning agent. On top of this model, we design a two-layer reinforcement learning agent. At the upper level, a Manager uses the fuzzy output and physiological sensors (heart rate, temperature, and blood pressure) to choose among three actions: no intervention, delegating the decision to the stimulation layer (Worker), or inhibiting stimulation. At the lower level, the Worker generates the stimulation pattern whenever it is activated. The reward function is based on driving physiological variables toward safe ranges and on agreement or disagreement with expert decisions. For training, starting from 10 expert-defined reference scenarios, approximately 1,500 augmented scenarios are generated by adding Gaussian noise to brain and physiological signals, systematically varying physiological profiles, and interpolating between same-class scenarios; the hierarchical agent is then trained on this augmented set and subsequently evaluated on the original 10 reference scenarios.
Simulation results show that the proposed hierarchical agent learns a stable policy that, in critical scenarios, delegates stimulation control to the Worker and achieves positive and clinically meaningful cumulative rewards. A comparison with a simple Q-learning baseline demonstrates that embedding a hierarchical structure on top of the HTCPN–fuzzy model leads to more structured and effective decision-making for adaptive runtime validation of the neural stimulation device within a Body Area Network environment.
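
The scenario augmentation step summarized in the abstract can be illustrated with a minimal sketch. It covers only two of the three operations mentioned (Gaussian noise and interpolation between same-class scenarios); the systematic variation of physiological profiles is omitted. The `Scenario` layout, the function names, the noise level `sigma`, and the 50/50 mix between the two operations are assumptions made for illustration and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    label: str             # expert-assigned class, e.g. "S1", "S2", "S3"
    features: List[float]  # brain and physiological values (hypothetical layout)


def add_gaussian_noise(s: Scenario, sigma: float, rng: random.Random) -> Scenario:
    """Return a copy of s with zero-mean Gaussian noise added to each feature."""
    return Scenario(s.label, [x + rng.gauss(0.0, sigma) for x in s.features])


def interpolate(a: Scenario, b: Scenario, alpha: float) -> Scenario:
    """Linearly blend two scenarios of the same class (alpha in [0, 1])."""
    if a.label != b.label:
        raise ValueError("interpolation is only defined for same-class scenarios")
    mixed = [(1.0 - alpha) * x + alpha * y for x, y in zip(a.features, b.features)]
    return Scenario(a.label, mixed)


def augment(references: List[Scenario], n_target: int,
            sigma: float = 0.05, seed: int = 0) -> List[Scenario]:
    """Generate n_target augmented scenarios from the reference scenarios."""
    rng = random.Random(seed)  # fixed seed so the augmented set is reproducible
    augmented: List[Scenario] = []
    while len(augmented) < n_target:
        base = rng.choice(references)
        peers = [r for r in references if r.label == base.label and r is not base]
        # Assumed policy: interpolate half of the time when a same-class peer
        # exists, otherwise perturb the base scenario with Gaussian noise.
        if peers and rng.random() < 0.5:
            augmented.append(interpolate(base, rng.choice(peers), rng.random()))
        else:
            augmented.append(add_gaussian_noise(base, sigma, rng))
    return augmented
```

Under these assumptions, calling `augment(references, 1500)` on the reference scenarios yields a training set of 1,500 scenarios, each inheriting the class label of the reference scenario (or pair of same-class scenarios) it was derived from, while the references themselves remain untouched for evaluation.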

Keywords:

Responsive Neurostimulation (RNS) device, Body Area Network (BAN), Hierarchical Timed Colored Petri Nets (HTCPN), Fuzzy logic, Hierarchical reinforcement learning, Data augmentation for medical scenarios