MyArxiv
Robotics 63
MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Vision-language-action models (VLAs) have shown generalization capabilities in robotic manipulation tasks by inheriting from vision-language models (VLMs) and learning action generation. Most VLA models focus on interpreting vision and language to generate actions, whereas robots must also perceive and interact within the spatial-physical world. This gap highlights the need for a comprehensive understanding of robotic-specific multisensory information, which is crucial for achieving complex and contact-rich control. To this end, we introduce a multisensory language-action (MLA) model that collaboratively perceives heterogeneous sensory modalities and predicts future multisensory objectives to facilitate physical world modeling. Specifically, to enhance perceptual representations, we propose an encoder-free multimodal alignment scheme that innovatively repurposes the large language model itself as a perception module, directly interpreting multimodal cues by aligning 2D images, 3D point clouds, and tactile tokens through positional correspondence. To further enhance MLA's understanding of physical dynamics, we design a future multisensory generation post-training strategy that enables MLA to reason about semantic, geometric, and interaction information, providing more robust conditions for action generation. In evaluation, MLA outperforms previous state-of-the-art 2D and 3D VLA methods by 12% and 24% on complex, contact-rich real-world tasks, respectively, while also demonstrating improved generalization to unseen configurations. Project website: https://sites.google.com/view/open-mla
Benchmarking Egocentric Visual-Inertial SLAM at City Scale ICCV 2025
Precise 6-DoF simultaneous localization and mapping (SLAM) from onboard sensors is critical for wearable devices capturing egocentric data, which exhibits specific challenges, such as a wider diversity of motions and viewpoints, prevalent dynamic visual content, or long sessions affected by time-varying sensor calibration. While recent progress on SLAM has been swift, academic research is still driven by benchmarks that do not reflect these challenges or do not offer sufficiently accurate ground truth poses. In this paper, we introduce a new dataset and benchmark for visual-inertial SLAM with egocentric, multi-modal data. We record hours and kilometers of trajectories through a city center with glasses-like devices equipped with various sensors. We leverage surveying tools to obtain control points as indirect pose annotations that are metric, centimeter-accurate, and available at city scale. This makes it possible to evaluate extreme trajectories that involve walking at night or traveling in a vehicle. We show that state-of-the-art systems developed by academia are not robust to these challenges and we identify components that are responsible for this. In addition, we design tracks with different levels of difficulty to ease in-depth analysis and evaluation of less mature approaches. The dataset and benchmark are available at https://www.lamaria.ethz.ch.
comment: ICCV 2025
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
☆ TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. It outperformed previous methods, and even the manually designed dense environment reward, in both final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable path to rich reward signals from diverse video sources.
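The frame-wise temporal-distance objective is simple to prototype. Below is a minimal PyTorch sketch, not the authors' code: the architecture, pair-sampling scheme, and names (TimeDistanceModel, train_step) are illustrative assumptions; only the core idea of regressing normalized time gaps between frame pairs follows the abstract.

```python
import torch
import torch.nn as nn

class TimeDistanceModel(nn.Module):
    """Predicts the normalized temporal distance between two frame embeddings."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, z_i, z_j):
        return self.net(torch.cat([z_i, z_j], dim=-1)).squeeze(-1)

def train_step(model, opt, z_frames, horizon=32, batch=64):
    """Sample frame pairs from one video and regress their normalized time gap."""
    T = z_frames.shape[0]
    i = torch.randint(0, T - 1, (batch,))
    j = torch.clamp(i + torch.randint(1, horizon, (batch,)), max=T - 1)
    target = (j - i).float() / horizon
    loss = ((model(z_frames[i], z_frames[j]) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At RL time, the predicted distance between the current observation and the next one (or a goal frame) can then serve as the step-wise proxy reward.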
☆ Graphite: A GPU-Accelerated Mixed-Precision Graph Optimization Framework
We present Graphite, a GPU-accelerated nonlinear graph optimization framework. It provides a CUDA C++ interface to enable the sharing of code between a realtime application, such as a SLAM system, and its optimization tasks. The framework supports techniques to reduce memory usage, including in-place optimization, support for multiple floating point types and mixed-precision modes, and dynamically computed Jacobians. We evaluate Graphite on well-known bundle adjustment problems and find that it achieves similar performance to MegBA, a solver specialized for bundle adjustment, while maintaining generality and using less memory. We also apply Graphite to global visual-inertial bundle adjustment on maps generated from stereo-inertial SLAM datasets, and observe speed ups of up to 59x compared to a CPU baseline. Our results indicate that our solver enables faster large-scale optimization on both desktop and resource-constrained devices.
The Trajectory Bundle Method: Unifying Sequential-Convex Programming and Sampling-Based Trajectory Optimization
We present a unified framework for solving trajectory optimization problems in a derivative-free manner through the use of sequential convex programming. Traditionally, nonconvex optimization problems are solved by forming and solving a sequence of convex optimization problems, where the cost and constraint functions are approximated locally through Taylor series expansions. This presents a challenge for functions where differentiation is expensive or unavailable. In this work, we present a derivative-free approach to form these convex approximations by computing samples of the dynamics, cost, and constraint functions and letting the solver interpolate between them. Our framework includes sample-based trajectory optimization techniques like model-predictive path integral (MPPI) control as a special case and generalizes them to enable features like multiple shooting and general equality and inequality constraints that are traditionally associated with derivative-based sequential convex programming methods. The resulting framework is simple, flexible, and capable of solving a wide variety of practical motion planning and control problems.
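The sample-then-interpolate step admits a compact sketch. Assuming a scalar cost or constraint function f and illustrative names, a derivative-free affine surrogate can be fit by least squares over nearby samples and used inside the convex subproblem where a Taylor expansion would normally appear; this illustrates the general idea, not the paper's solver.

```python
import numpy as np

def affine_fit(f, x0, radius=1e-2, n_samples=16, rng=None):
    """Derivative-free affine surrogate f(x) ~ a + B @ (x - x0), fit by
    least squares over sampled perturbations; no gradients of f required."""
    rng = np.random.default_rng() if rng is None else rng
    dX = radius * rng.standard_normal((n_samples, x0.size))  # perturbations
    F = np.array([f(x0 + d) for d in dX])                    # sampled values
    A = np.hstack([np.ones((n_samples, 1)), dX])             # [1, dx] design
    coef, *_ = np.linalg.lstsq(A, F, rcond=None)
    return coef[0], coef[1:]   # offset a and pseudo-gradient B
```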
☆ Radio-based Multi-Robot Odometry and Relative Localization
Radio-based methods such as Ultra-Wideband (UWB) and RAdio Detection And Ranging (radar), which have traditionally seen limited adoption in robotics, are experiencing a boost in popularity thanks to their robustness to harsh environmental conditions and cluttered environments. This work proposes a multi-robot UGV-UAV localization system that leverages the two technologies with inexpensive and readily-available sensors, such as Inertial Measurement Units (IMUs) and wheel encoders, to estimate the relative position of an aerial robot with respect to a ground robot. The first stage of the system pipeline includes a nonlinear optimization framework to trilaterate the location of the aerial platform based on UWB range data, and a radar pre-processing module with loosely coupled ego-motion estimation which has been adapted for a multi-robot scenario. Then, the pre-processed radar data as well as the relative transformation are fed to a pose-graph optimization framework with odometry and inter-robot constraints. The system, implemented for the Robot Operating System (ROS 2) with the Ceres optimizer, has been validated in Software-in-the-Loop (SITL) simulations and on a real-world dataset. The proposed relative localization module outperforms state-of-the-art closed-form methods, which are less robust to noise. Our SITL environment includes a custom Gazebo plugin for generating realistic UWB measurements modeled after real data. Conveniently, the proposed factor graph formulation makes the system readily extensible to full Simultaneous Localization And Mapping (SLAM). Finally, all the code and experimental data are publicly available to support reproducibility and to serve as a common open dataset for benchmarking.
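The first-stage trilateration is a small robust least-squares problem. A minimal SciPy sketch with assumed names and a Huber loss for outlier ranges, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import least_squares

def trilaterate(anchors, ranges, x0):
    """Estimate a 3D position from UWB ranges to known anchors.
    anchors: (N, 3) positions, ranges: (N,) measured distances, x0: initial guess."""
    def residual(p):
        return np.linalg.norm(anchors - p, axis=1) - ranges
    return least_squares(residual, x0, loss="huber", f_scale=0.1).x
```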
☆ OceanGym: A Benchmark Environment for Underwater Embodied Agents
We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility and dynamic ocean currents, making effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.
comment: Work in progress
☆ Memory-Efficient 2D/3D Shape Assembly of Robot Swarms
Mean-shift-based approaches have recently emerged as the most effective methods for robot swarm shape assembly tasks. These methods rely on image-based representations of target shapes to compute local density gradients and perform mean-shift exploration, which constitute their core mechanism. However, such image representations incur substantial memory overhead, which can become prohibitive for high-resolution or 3D shapes. To overcome this limitation, we propose a memory-efficient tree map representation that hierarchically encodes user-specified shapes and is applicable to both 2D and 3D scenarios. Building on this representation, we design a behavior-based distributed controller that enables assignment-free shape assembly. Comparative 2D and 3D simulations against a state-of-the-art mean-shift algorithm demonstrate one to two orders of magnitude lower memory usage and two to three times faster shape entry while maintaining comparable uniformity. Finally, we validate the framework through physical experiments with 6 to 7 UAVs, confirming its real-world practicality.
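The memory saving comes from storing uniform regions as single leaves. A minimal 2D quadtree sketch under illustrative assumptions (the inside(x, y) shape test, the crude 4-sample uniformity check, and all names are ours; the paper's tree map and its 3D analogue differ in detail):

```python
def build_tree(inside, x0, y0, size, depth=0, max_depth=8):
    """Hierarchically encode a 2D shape: cells that look uniformly inside or
    outside become single leaves; only mixed cells are subdivided."""
    probes = [(x0 + dx * size, y0 + dy * size)
              for dx in (0.25, 0.75) for dy in (0.25, 0.75)]
    flags = [inside(x, y) for x, y in probes]
    if depth == max_depth or all(flags) or not any(flags):
        return all(flags)                       # leaf: True=inside, False=outside
    half = size / 2
    return [build_tree(inside, x0 + i * half, y0 + j * half,
                       half, depth + 1, max_depth)
            for i in (0, 1) for j in (0, 1)]    # four children
```

Only mixed cells recurse, so storage grows with the shape boundary rather than with the full grid resolution, which is the effect the abstract reports.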
☆ Learning from Hallucinating Critical Points for Navigation in Dynamic Environments
Generating large and diverse obstacle datasets to learn motion planning in environments with dynamic obstacles is challenging due to the vast space of possible obstacle trajectories. Inspired by hallucination-based data synthesis approaches, we propose Learning from Hallucinating Critical Points (LfH-CP), a self-supervised framework for creating rich dynamic obstacle datasets based on existing optimal motion plans without requiring expensive expert demonstrations or trial-and-error exploration. LfH-CP factorizes hallucination into two stages: first identifying when and where obstacles must appear in order to result in an optimal motion plan, i.e., the critical points, and then procedurally generating diverse trajectories that pass through these points while avoiding collisions. This factorization avoids generative failures such as mode collapse and ensures coverage of diverse dynamic behaviors. We further introduce a diversity metric to quantify dataset richness and show that LfH-CP produces substantially more varied training data than existing baselines. Experiments in simulation demonstrate that planners trained on LfH-CP datasets achieve higher success rates than a prior hallucination method.
☆ Analytic Conditions for Differentiable Collision Detection in Trajectory Optimization IROS
Optimization-based methods are widely used for computing fast, diverse solutions for complex tasks such as collision-free movement or planning in the presence of contacts. However, most of these methods require enforcing non-penetration constraints between objects, resulting in a non-trivial and computationally expensive problem. This makes the use of optimization-based methods for planning and control challenging. In this paper, we present a method to efficiently enforce non-penetration of sets while performing optimization over their configuration, which is directly applicable to problems like collision-aware trajectory optimization. We introduce novel differentiable conditions with analytic expressions to achieve this. To enforce non-collision between non-smooth bodies using these conditions, we introduce a method to approximate polytopes as smooth semi-algebraic sets. We present several numerical experiments to demonstrate the performance of the proposed method and compare the performance with other baseline methods recently proposed in the literature.
comment: 8 pages, 8 figures. Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
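One common way to realize a smooth stand-in for a polytope is a log-sum-exp softening of the facet maximum. The sketch below shows that generic trick only; note that log-sum-exp is smooth but not semi-algebraic, so the paper's construction necessarily differs.

```python
import numpy as np

def smooth_polytope(A, b, rho=50.0):
    """Smooth membership function for {x : A x <= b}: log-sum-exp
    approximates max_i(a_i^T x - b_i) while staying differentiable."""
    def phi(x):
        z = rho * (A @ x - b)
        m = z.max()
        return (m + np.log(np.exp(z - m).sum())) / rho  # stabilized LSE
    return phi  # phi(x) <= 0 marks (approximately) the inside

# As rho grows, phi(x) approaches the exact signed facet gap,
# at the price of stiffer gradients in the optimizer.
```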
☆ Unwinding Rotations Reduces VR Sickness in Nonsimulated Immersive Telepresence
Immersive telepresence, when a user views the video stream of a $360^\circ$ camera in a remote environment using a Head Mounted Display (HMD), has great potential to improve the sense of being in a remote environment. In most cases of immersive robotic telepresence, the camera is mounted on a mobile robot which increases the portion of the environment that the remote user can explore. However, robot motions can induce unpleasant symptoms associated with Virtual Reality (VR) sickness, degrading the overall user experience. Previous research has shown that unwinding the rotations of the robot, that is, decoupling the rotations that the camera undergoes due to robot motions from what is seen by the user, can increase user comfort and reduce VR sickness. However, that work considered a virtual environment and a simulated robot. In this work, to test whether the same hypotheses hold when the video stream from a real camera is used, we carried out a user study $(n=36)$ in which the unwinding rotations method was compared against coupled rotations in a task completed through a panoramic camera mounted on a robotic arm. Furthermore, within an inspection task which involved translations and rotations in three dimensions, we tested whether unwinding the robot rotations impacted the performance of users. The results show that the users found the unwinding rotations method to be more comfortable and preferable, and that a reduced level of VR sickness can be achieved without a significant impact on task performance.
comment: 24th IEEE International Symposium on Mixed and Augmented Reality (ISMAR)
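Mechanically, unwinding is one rotation composition per rendered frame. A minimal SciPy sketch with assumed conventions (quaternions in [x, y, z, w] order, both inputs expressed in the world frame; names are illustrative):

```python
from scipy.spatial.transform import Rotation as R

def unwound_sampling_rotation(q_world_cam, q_world_hmd):
    """Rotation used to sample the 360-degree panorama so that camera (robot)
    rotations cancel: the camera's world rotation is undone before sampling,
    leaving only the user's head motion to pan the view."""
    return R.from_quat(q_world_cam).inv() * R.from_quat(q_world_hmd)
```

In the coupled condition one would instead sample with the HMD rotation alone, so camera rotations sweep the user's view and can induce sickness.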
☆ Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new Forward-Backward algorithm with Generic Acceleration constraints, which achieves both high accuracy and low computation time. FBGA runs forward and backward passes that maximize the velocity profile in short, discretized path segments while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source C++ implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
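FBGA generalizes the classic two-pass velocity profile. The sketch below is the textbook version with constant box acceleration limits, for orientation only; FBGA's contribution is replacing these closed-form updates with per-segment maximization under generic, possibly non-convex constraints.

```python
import numpy as np

def forward_backward_profile(ds, v_max, a_acc, a_dec):
    """Two-pass time-optimal speed profile along a discretized path.
    ds: (n-1,) segment lengths; v_max: (n,) pointwise speed limits
    (e.g. from curvature); a_acc/a_dec: accel/braking limits."""
    v = v_max.copy()
    for i in range(len(v) - 1):            # forward pass: acceleration limit
        v[i + 1] = min(v[i + 1], np.sqrt(v[i] ** 2 + 2 * a_acc * ds[i]))
    for i in range(len(v) - 2, -1, -1):    # backward pass: braking limit
        v[i] = min(v[i], np.sqrt(v[i + 1] ** 2 + 2 * a_dec * ds[i]))
    return v
```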
☆ SDA-PLANNER: State-Dependency Aware Adaptive Planner for Embodied Task Planning
Embodied task planning requires agents to produce executable actions in a closed-loop manner within the environment. With the progressively improving capabilities of LLMs in task decomposition, planning, and generalization, current embodied task planning methods adopt LLM-based architectures. However, existing LLM-based planners remain limited in three respects: fixed planning paradigms, lack of action sequence constraints, and error-agnostic execution. In this work, we propose SDA-PLANNER, which enables an adaptive planning paradigm with state-dependency-aware and error-aware mechanisms for comprehensive embodied task planning. Specifically, SDA-PLANNER introduces a State-Dependency Graph to explicitly model action preconditions and effects, guiding dynamic plan revision. To handle execution errors, it employs an error-adaptive replanning strategy consisting of Error Backtrack and Diagnosis and Adaptive Action SubTree Generation, which locally reconstructs the affected portion of the plan based on the current environment state. Experiments demonstrate that SDA-PLANNER consistently outperforms baselines in success rate and goal completion, particularly under diverse error conditions.
☆ Kinodynamic Motion Planning for Mobile Robot Navigation across Inconsistent World Models RSS
Mobile ground robots lacking prior knowledge of an environment must rely on sensor data to develop a model of their surroundings. In these scenarios, consistent identification of obstacles and terrain features can be difficult due to noise and algorithmic shortcomings, which makes it hard for motion planning systems to generate safe motions. One particular difficulty to overcome is when regions of the cost map switch between being marked as obstacles and free space through successive planning cycles. One potential solution to this, which we refer to as Valid in Every Hypothesis (VEH), is for the planning system to plan motions that are guaranteed to be safe through a history of world models. Another approach is to track a history of world models, and adjust node costs according to the potential penalty of needing to reroute around previously hazardous areas. This work discusses three major iterations on this idea. The first iteration, called PEH, invokes a sub-search for every node expansion that crosses through a divergence point in the world models. The second and third iterations, called GEH and GEGRH respectively, defer the sub-search until after an edge expands into the goal region. GEGRH uses an additional step to revise the graph based on divergent nodes in each world. Initial results showed that, although PEH and GEH find more optimistic solutions than VEH, they are unable to generate solutions in less than one second, which exceeds our requirements for field deployment. Analysis of results from a field experiment in an unstructured, off-road environment on a Clearpath Robotics Warthog UGV indicates that GEGRH finds lower-cost trajectories and has faster average planning times than VEH. Compared to single-hypothesis (SH) search, where only the latest world model is considered, GEGRH generates more conservative plans with a small increase in average planning time.
comment: Presented at the Robotics: Science and Systems (RSS) 2025 Workshop on Resilient Off-road Autonomous Robotics (ROAR)
☆ LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search
Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
☆ Anomaly detection for generic failure monitoring in robotic assembly, screwing and manipulation
Out-of-distribution states in robot manipulation often lead to unpredictable robot behavior or task failure, limiting success rates and increasing the risk of damage. Anomaly detection (AD) can identify deviations from expected patterns in data, which can be used to trigger failsafe behaviors and recovery strategies. Prior work has applied data-driven AD to time series data in specific robotic tasks, but its transferability across control strategies and task types has not been shown. Leveraging time series data, such as force/torque signals, allows robot-environment interactions to be captured directly, which is crucial for manipulation and online failure detection. Their broad availability, high sampling rates, and low dimensionality enable high temporal resolution and efficient processing. As robotic tasks can have widely varying signal characteristics and requirements, AD methods that can be applied in the same way to a wide range of tasks are needed, ideally with good data efficiency. We examine three industrial robotic tasks, each presenting several anomalies. Test scenarios in robotic cabling, screwing, and sanding are built, and multimodal time series data is gathered. Several autoencoder-based methods are compared, evaluating generalization across tasks and control methods (diffusion policy, position, and impedance control). This allows us to validate the integration of AD in complex tasks involving tight tolerances and variation from both the robot and its environment. Additionally, we evaluate data efficiency, detection latency, and the task characteristics that support robust detection. The results indicate reliable detection, with AUROC exceeding 0.93, of failures in the cabling and screwing tasks, such as incorrect or misaligned parts and obstructed targets. In the sanding task, only severe failures were reliably detected, while more subtle failure types remained undetected.
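For concreteness, a minimal PyTorch sketch of the autoencoder-based detection principle compared here; the architecture and names (WindowAE, anomaly_score) are illustrative, not the paper's models.

```python
import torch
import torch.nn as nn

class WindowAE(nn.Module):
    """Autoencoder over fixed-length multimodal windows, e.g. force/torque."""
    def __init__(self, window=64, channels=6, latent=16):
        super().__init__()
        d = window * channels
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(d, 128), nn.ReLU(),
                                 nn.Linear(128, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                 nn.Linear(128, d))

    def forward(self, x):                     # x: (B, window, channels)
        return self.dec(self.enc(x)).view_as(x)

def anomaly_score(model, x):
    """Per-window reconstruction error; flag anomalies above a threshold
    calibrated on nominal (failure-free) data."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2))
```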
☆ ISyHand: A Dexterous Multi-finger Robot Hand with an Articulated Palm
The rapid increase in the development of humanoid robots and customized manufacturing solutions has brought dexterous manipulation to the forefront of modern robotics. Over the past decade, several expensive dexterous hands have come to market, but advances in hardware design, particularly in servo motors and 3D printing, have recently facilitated an explosion of cheaper open-source hands. Most hands are anthropomorphic to allow use of standard human tools, and attempts to increase dexterity often sacrifice anthropomorphism. We introduce the open-source ISyHand (pronounced easy-hand), a highly dexterous, low-cost, easy-to-manufacture, on-joint servo-driven robot hand. Our hand uses off-the-shelf Dynamixel motors, fasteners, and 3D-printed parts, can be assembled within four hours, and has a total material cost of about 1,300 USD. The ISyHand's unique articulated-palm design increases overall dexterity with only a modest sacrifice in anthropomorphism. To demonstrate the utility of the articulated palm, we use reinforcement learning in simulation to train the hand to perform a classical in-hand manipulation task: cube reorientation. Our novel, systematic experiments show that the simulated ISyHand outperforms the two most comparable hands in early training phases, that all three perform similarly well after policy convergence, and that the ISyHand significantly outperforms a fixed-palm version of its own design. Additionally, we deploy a policy trained on cube reorientation on the real hand, demonstrating its ability to perform real-world dexterous manipulation.
comment: Accepted at IEEE Humanoids 2025
☆ Terrain-Aware LiDAR-Inertial Odometry for Legged-Wheel Robots Based on Radial Basis Function Approximation
Accurate odometry is essential for legged-wheel robots operating in unstructured terrains such as bumpy roads and staircases. Existing methods often suffer from pose drift because they ignore terrain geometry. We propose a terrain-aware LiDAR-inertial odometry (LIO) framework that approximates the terrain using Radial Basis Functions (RBF) whose centers are adaptively selected and whose weights are recursively updated. The resulting smooth terrain manifold enables "soft constraints" that regularize the odometry optimization and mitigate the $z$-axis pose drift under abrupt elevation changes during the robot's maneuvers. To ensure the LIO's real-time performance, we further evaluate the RBF-related terms and calculate the inverse of the sparse kernel matrix with GPU parallelization. Experiments on unstructured terrains demonstrate that our method achieves higher localization accuracy than state-of-the-art baselines, especially in scenarios with continuous height changes or with sparse features when abrupt height changes occur.
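The terrain model itself reduces to kernel regression. A minimal NumPy sketch with fixed centers and a ridge-regularized batch fit (the paper's adaptive center selection and recursive weight updates are not reproduced):

```python
import numpy as np

def rbf_features(X, C, sigma=0.5):
    """Gaussian RBF features between queries X (N, 2) and centers C (M, 2)."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_terrain(points_xy, z, centers, lam=1e-3):
    """Ridge-regularized weights for z ~ Phi @ w; the smooth surface then
    supplies soft height constraints for the odometry optimization."""
    Phi = rbf_features(points_xy, centers)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ z)

def terrain_height(query_xy, centers, w):
    return rbf_features(query_xy, centers) @ w
```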
☆ Side Scan Sonar-based SLAM for Autonomous Algae Farm Monitoring
The transition of seaweed farming to an alternative food source on an industrial scale relies on automating its processes through smart farming, as in land agriculture. Key to this process are autonomous underwater vehicles (AUVs), given their capacity to automate crop and structural inspections. However, the current bottleneck for their deployment is ensuring safe navigation within farms, which requires an accurate, online estimate of the AUV pose and a map of the infrastructure. To enable this, we propose an efficient side scan sonar-based (SSS) simultaneous localization and mapping (SLAM) framework that exploits the geometry of kelp farms by modeling structural ropes in the back-end as sequences of individual landmarks from each SSS ping detection, instead of combining detections into elongated representations. Our method outperforms state-of-the-art solutions in hardware-in-the-loop (HIL) experiments on a real AUV survey in a kelp farm. The framework and dataset can be found at https://github.com/julRusVal/sss_farm_slam.
☆ Autonomous Multi-Robot Infrastructure for AI-Enabled Healthcare Delivery and Diagnostics
This research presents a multi-robot system for inpatient care, designed using swarm intelligence principles and incorporating wearable health sensors, RF-based communication, and AI-driven decision support. Within a simulated hospital environment, the system adopts a leader-follower swarm configuration to perform patient monitoring, medicine delivery, and emergency assistance. Due to ethical constraints, live patient trials were not conducted; instead, validation was carried out through controlled self-testing with wearable sensors. The Leader Robot acquires key physiological parameters, including temperature, SpO2, heart rate, and fall detection, and coordinates other robots when required. The Assistant Robot patrols corridors for medicine delivery, while a robotic arm provides direct drug administration. The swarm-inspired leader-follower strategy enhanced communication reliability and ensured continuous monitoring, including automated email alerts to healthcare staff. The system hardware was implemented using Arduino, Raspberry Pi, NRF24L01 RF modules, and a HuskyLens AI camera. Experimental evaluation showed an overall sensor accuracy above 94%, a 92% task-level success rate, and a 96% communication reliability rate, demonstrating system robustness. Furthermore, the AI-enabled decision support was able to provide early warnings of abnormal health conditions, highlighting the potential of the system as a cost-effective solution for hospital automation and patient safety.
comment: 11 pages, 5 figures, MSc dissertation submission draft, prepared for conference/journal consideration
☆ Evolutionary Continuous Adaptive RL-Powered Co-Design for Humanoid Chin-Up Performance
Humanoid robots have seen significant advancements in both design and control, with a growing emphasis on integrating these aspects to enhance overall performance. Traditionally, robot design has followed a sequential process, where control algorithms are developed after the hardware is finalized. However, this can be myopic and prevent robots from fully exploiting their hardware capabilities. Recent approaches advocate for co-design, optimizing both design and control in parallel to maximize robotic capabilities. This paper presents the Evolutionary Continuous Adaptive RL-based Co-Design (EA-CoRL) framework, which combines reinforcement learning (RL) with evolutionary strategies to enable continuous adaptation of the control policy to the hardware. EA-CoRL comprises two key components: Design Evolution, which explores the hardware choices using an evolutionary algorithm to identify efficient configurations, and Policy Continuous Adaptation, which fine-tunes a task-specific control policy across evolving designs to maximize performance rewards. We evaluate EA-CoRL by co-designing the actuators (gear ratios) and control policy of the RH5 humanoid for a highly dynamic chin-up task, previously infeasible due to actuator limitations. Comparative results against state-of-the-art RL-based co-design methods show that EA-CoRL achieves a higher fitness score and broader design space exploration, highlighting the critical role of continuous policy adaptation in robot co-design.
☆ Conflict-Based Search and Prioritized Planning for Multi-Agent Path Finding Among Movable Obstacles
This paper investigates Multi-Agent Path Finding Among Movable Obstacles (M-PAMO), which seeks collision-free paths for multiple agents from their start to goal locations among static and movable obstacles. M-PAMO arises in logistics and warehouses, where mobile robots operate among unexpected movable objects. Although Multi-Agent Path Finding (MAPF) and single-agent Path planning Among Movable Obstacles (PAMO) have both been studied, M-PAMO remains under-explored. Movable obstacles lead to new fundamental challenges, as the state space, which includes both agents and movable obstacles, grows exponentially with the number of agents and movable obstacles. In particular, movable obstacles often couple agents closely together in space and time. This paper makes a first attempt to address M-PAMO by adapting and fusing the popular Conflict-Based Search (CBS) and Prioritized Planning (PP) for MAPF with PAMO*, a recent single-agent PAMO planner. We compare their performance with up to 20 agents and hundreds of movable obstacles, and show the pros and cons of these approaches.
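For readers unfamiliar with the PP baseline being adapted, its core loop is tiny. A generic sketch with illustrative names; plan_single stands in for any single-agent space-time planner (here, a PAMO*-style planner aware of movable obstacles):

```python
def prioritized_planning(agents, plan_single):
    """Plan agents one by one in priority order, treating earlier agents'
    space-time paths as dynamic obstacles for later ones."""
    reserved, paths = [], {}
    for agent in agents:                    # fixed priority order
        path = plan_single(agent, reserved) # respects all reservations
        if path is None:
            return None                     # this priority ordering failed
        paths[agent] = path
        reserved.append(path)
    return paths
```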
☆ Towards Human Engagement with Realistic AI Combat Pilots
We present a system that enables real-time interaction between human users and agents trained to control fighter jets in simulated 3D air combat scenarios. The agents are trained in a dedicated environment using Multi-Agent Reinforcement Learning. A communication link is developed to allow seamless deployment of trained agents into VR-Forces, a widely used defense simulation tool for realistic tactical scenarios. This integration allows mixed simulations where human-controlled entities engage with intelligent agents exhibiting distinct combat behaviors. Our interaction model creates new opportunities for human-agent teaming, immersive training, and the exploration of innovative tactics in defense contexts.
comment: 13th International Conference on Human-Agent Interaction (HAI) 2025
☆ On the Conic Complementarity of Planar Contacts
We present a unifying theoretical result that connects two foundational principles in robotics: the Signorini law for point contacts, which underpins many simulation methods for preventing object interpenetration, and the center of pressure (also known as the zero-moment point), a key concept used in, for instance, optimization-based locomotion control. Our contribution is the planar Signorini condition, a conic complementarity formulation that models general planar contacts between rigid bodies. We prove that this formulation is equivalent to enforcing the punctual Signorini law across an entire contact surface, thereby bridging the gap between discrete and continuous contact models. A geometric interpretation reveals that the framework naturally captures three physical regimes (sticking, separating, and tilting) within a unified complementarity structure. This leads to a principled extension of the classical center of pressure, which we refer to as the extended center of pressure. By establishing this connection, our work provides a mathematically consistent and computationally tractable foundation for handling planar contacts, with implications for both the accurate simulation of contact dynamics and the design of advanced control and optimization algorithms in locomotion and manipulation.
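For reference, the punctual Signorini law that the paper lifts to a conic, surface-level condition can be written as the scalar complementarity below, with gap \phi(q) and normal force \lambda_n; the planar extension replaces this scalar pair with a cone-constrained contact wrench.

```latex
0 \le \phi(q) \;\perp\; \lambda_n \ge 0
\quad\Longleftrightarrow\quad
\phi(q) \ge 0, \qquad \lambda_n \ge 0, \qquad \phi(q)\,\lambda_n = 0
```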
☆ Emotionally Expressive Robots: Implications for Children's Behavior toward Robots
The growing development of robots with artificial emotional expressiveness raises important questions about their persuasive potential in children's behavior. While research highlights the pragmatic value of emotional expressiveness in human social communication, the extent to which robotic expressiveness can or should influence empathic responses in children is grounds for debate. In a pilot study with 22 children (aged 7-11) we begin to explore the ways in which different levels of embodied expressiveness (body only, face only, body and face) of two basic emotions (happiness and sadness) displayed by an anthropomorphic robot (QTRobot) might modify children's behavior in a child-robot cooperative turn-taking game. We observed that children aligned their behavior to the robot's inferred emotional state. However, higher levels of expressiveness did not result in increased alignment. The preliminary results reported here provide a starting point for reflecting on robotic expressiveness and its role in shaping children's social-emotional behavior toward robots as social peers in the near future.
☆ S$^3$E: Self-Supervised State Estimation for Radar-Inertial System
Millimeter-wave radar for state estimation is gaining significant attention for its affordability and reliability in harsh conditions. Existing localization solutions typically rely on post-processed radar point clouds as landmark points. Nonetheless, the inherent sparsity of radar point clouds, ghost points from multi-path effects, and limited angle resolution in single-chirp radar severely degrade state estimation performance. To address these issues, we propose S$^3$E, a Self-Supervised State Estimator that employs more richly informative radar signal spectra to bypass sparse points and fuses complementary inertial information to achieve accurate localization. S$^3$E fully explores the association between exteroceptive radar and proprioceptive inertial sensors to achieve complementary benefits. To deal with limited angle resolution, we introduce a novel cross-fusion technique that enhances spatial structure information by exploiting subtle rotational shift correlations across heterogeneous data. The experimental results demonstrate our method achieves robust and accurate performance without relying on localization ground truth supervision. To the best of our knowledge, this is the first attempt to achieve state estimation by fusing radar spectra and inertial data in a complementary self-supervised manner.
☆ MUVLA: Learning to Explore Object Navigation via Map Understanding
In this paper, we present MUVLA, a Map Understanding Vision-Language-Action model tailored for object navigation. It leverages semantic map abstractions to unify and structure historical information, encoding spatial context in a compact and consistent form. MUVLA takes the current and historical observations, as well as the semantic map, as inputs and predicts the action sequence based on the description of the goal object. Furthermore, it amplifies supervision through reward-guided return modeling based on dense short-horizon progress signals, enabling the model to develop a detailed understanding of action value for reward maximization. MUVLA employs a three-stage training pipeline: learning map-level spatial understanding, imitating behaviors from mixed-quality demonstrations, and reward amplification. This strategy allows MUVLA to unify diverse demonstrations into a robust spatial representation and generate more rational exploration strategies. Experiments on the HM3D and Gibson benchmarks demonstrate that MUVLA generalizes well and learns effective exploration behaviors even from low-quality or partially successful trajectories.
☆ Towards Intuitive Human-Robot Interaction through Embodied Gesture-Driven Control with Woven Tactile Skins
This paper presents a novel human-robot interaction (HRI) framework that enables intuitive gesture-driven control through a capacitance-based woven tactile skin. Unlike conventional interfaces that rely on panels or handheld devices, the woven tactile skin integrates seamlessly with curved robot surfaces, enabling embodied interaction and narrowing the gap between human intent and robot response. Its woven design combines fabric-like flexibility with structural stability and dense multi-channel sensing through the interlaced conductive threads. Building on this capability, we define a gesture-action mapping of 14 single- and multi-touch gestures that covers representative robot commands, including task-space motion and auxiliary functions. A lightweight convolution-transformer model designed for real-time gesture recognition achieves near-100% accuracy, outperforming prior baseline approaches. Experiments on robot arm tasks, including pick-and-place and pouring, demonstrate that our system reduces task completion time by up to 57% compared with keyboard panels and teach pendants. Overall, our proposed framework demonstrates a practical pathway toward more natural and efficient embodied HRI.
☆ State Estimation for Compliant and Morphologically Adaptive Robots ICRA 2026
Locomotion robots with active or passive compliance can show robustness to uncertain scenarios, which can be promising for agricultural, research and environmental industries. However, state estimation for these robots is challenging due to the lack of rigid-body assumptions and kinematic changes from morphing. We propose a method to estimate typical rigid-body states alongside compliance-related states, such as soft robot shape in different morphologies and locomotion modes. Our neural network-based state estimator uses a history of states and a mechanism to directly influence unreliable sensors. We test our framework on the GOAT platform, a robot capable of passive compliance and active morphing for extreme outdoor terrain. The network is trained on motion capture data in a novel compliance-centric frame that accounts for morphing-related states. Our method predicts shape-related measurements within 4.2% of the robot's size, velocities within 6.3% and 2.4% of the top linear and angular speeds, respectively, and orientation within 1.5 degrees. We also demonstrate a 300% increase in travel range during a motor malfunction when using our estimator for closed-loop autonomous outdoor operation.
comment: 8 pages, 10 figures, 1 table, submitted to ICRA 2026
☆ Preemptive Spatiotemporal Trajectory Adjustment for Heterogeneous Vehicles in Highway Merging Zones
Addressing drivers' perception lag and the low utilization of space-time resources in expressway ramp merging areas, and building on a preemptive spatiotemporal trajectory adjustment system, we quantitatively analyze, from the perspective of coordinating spatiotemporal resources, the appropriate safe space-time gap for trajectory pre-preparation. The minimum safety gap required for ramp vehicles to merge into the mainline is analyzed by introducing dual positioning error and spatiotemporal trajectory tracking error. A merging control strategy for autonomous heterogeneous vehicles is proposed that integrates vehicle type, driving intention, and safe spatiotemporal distance. The merging strategies of ramp target vehicles and mainline cooperative vehicles under different vehicle types are systematically described. A variety of traffic flow and speed scenarios are simulated in full combination. By comparing time-position-speed diagrams, the vehicle operation characteristics and dynamic differences during merging are qualitatively analyzed, and average speed and average delay are used as evaluation indices to quantitatively assess the performance advantages of the preemptive cooperative merging control strategy. The results show that the maximum average delay improvement rates for mainline and ramp vehicles are 90.24% and 74.24%, respectively. The proposed strategy effectively avoids potential vehicle conflicts and emergency braking, improves driving safety in the merging area, and shows significant advantages in driving stability and overall traffic efficiency.
☆ Reinforced Embodied Planning with Verifiable Reward for Real-World Robotic Manipulation
Enabling robots to execute long-horizon manipulation tasks from free-form language instructions remains a fundamental challenge in embodied AI. While vision-language models (VLMs) have shown promise as high-level planners, their deployment in the real world is hindered by two gaps: (i) the scarcity of large-scale, sequential manipulation data that couples natural language with multi-step action plans, and (ii) the absence of dense, interpretable rewards for fine-tuning VLMs on planning objectives. To address these issues, we propose REVER, a framework that empowers VLMs to generate and validate long-horizon manipulation plans from natural language instructions in real-world scenarios. Under REVER we train and release RoboFarseer, a VLM incentivized to emit chain-of-thought that performs temporal and spatial reasoning, ensuring physically plausible and logically coherent plans. To obtain training data, we leverage the Universal Manipulation Interface framework to capture hardware-agnostic demonstrations of atomic skills. An automated annotation engine converts each demonstration into a vision-instruction-plan triplet. We introduce a verifiable reward that scores the generated plan by its ordered bipartite matching overlap with the ground-truth skill sequence. At run time, the fine-tuned VLM functions both as a planner and as a monitor, verifying step-wise completion. RoboFarseer matches or exceeds the performance of proprietary models that are orders of magnitude larger, while on open-ended planning it surpasses the best baseline by more than 40%. In real-world, long-horizon tasks, the complete system boosts overall success by roughly 60% compared with the same low-level controller without the planner. We will open-source both the dataset and the trained model upon publication.
☆ Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies NeurIPS 2025
Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning framework that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception-action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE, and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, our DP-AG offers a promising step toward bridging biological adaptability and artificial policy learning.
comment: 42 pages, 17 figures, 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
☆ SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Training expressive flow-based policies with off-policy reinforcement learning is notoriously unstable due to gradient pathologies in the multi-step action sampling process. We trace this instability to a fundamental connection: the flow rollout is algebraically equivalent to a residual recurrent computation, making it susceptible to the same vanishing and exploding gradients as RNNs. To address this, we reparameterize the velocity network using principles from modern sequential models, introducing two stable architectures: Flow-G, which incorporates a gated velocity, and Flow-T, which utilizes a decoded velocity. We then develop a practical SAC-based algorithm, enabled by a noise-augmented rollout, that facilitates direct end-to-end training of these policies. Our approach supports both from-scratch and offline-to-online learning and achieves state-of-the-art performance on continuous control and robotic manipulation benchmarks, eliminating the need for common workarounds like policy distillation or surrogate objectives.
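The claimed RNN equivalence is visible directly in the rollout. A minimal PyTorch sketch with an assumed velocity-network signature v_net(features, obs): each Euler step is a residual update, so backpropagating through K steps behaves like backpropagating through a K-step residual RNN.

```python
import torch

def flow_rollout(v_net, obs, a0, K=8):
    """Euler rollout of a flow policy: a <- a + v(a, k/K, obs) / K.
    Algebraically a residual recurrence over K steps."""
    a = a0
    for k in range(K):
        t = torch.full_like(a[..., :1], k / K)            # scalar time feature
        a = a + v_net(torch.cat([a, t], dim=-1), obs) / K
    return a
```

By analogy with modern sequential models, Flow-G's gated velocity and Flow-T's decoded velocity reparameterize this recurrence to tame its gradients.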
☆ Best of Sim and Real: Decoupled Visuomotor Manipulation via Learning Control in Simulation and Perception in Real
Sim-to-real transfer remains a fundamental challenge in robot manipulation due to the entanglement of perception and control in end-to-end learning. We present a decoupled framework that learns each component where it is most reliable: control policies are trained in simulation with privileged state to master spatial layouts and manipulation dynamics, while perception is adapted only at deployment to bridge real observations to the frozen control policy. Our key insight is that control strategies and action patterns are universal across environments and can be learned in simulation through systematic randomization, while perception is inherently domain-specific and must be learned where visual observations are authentic. Unlike existing end-to-end approaches that require extensive real-world data, our method achieves strong performance with only 10-20 real demonstrations by reducing the complex sim-to-real problem to a structured perception alignment task. We validate our approach on tabletop manipulation tasks, demonstrating superior data efficiency and out-of-distribution generalization compared to end-to-end baselines. The learned policies successfully handle object positions and scales beyond the training distribution, confirming that decoupling perception from control fundamentally improves sim-to-real transfer.
comment: 10 pages, 6 figures
☆ TacRefineNet: Tactile-Only Grasp Refinement Between Arbitrary In-Hand Object Poses
Despite progress in both traditional dexterous grasping pipelines and recent Vision-Language-Action (VLA) approaches, the grasp execution stage remains prone to pose inaccuracies, especially in long-horizon tasks, which undermines overall performance. To address this "last-mile" challenge, we propose TacRefineNet, a tactile-only framework that achieves fine in-hand pose refinement of known objects in arbitrary target poses using multi-finger fingertip sensing. Our method iteratively adjusts the end-effector pose based on tactile feedback, aligning the object to the desired configuration. We design a multi-branch policy network that fuses tactile inputs from multiple fingers along with proprioception to predict precise control updates. To train this policy, we combine large-scale simulated data from a physics-based tactile model in MuJoCo with real-world data collected from a physical system. Comparative experiments show that pretraining on simulated data and fine-tuning with a small amount of real data significantly improves performance over simulation-only training. Extensive real-world experiments validate the effectiveness of the method, achieving millimeter-level grasp accuracy using only tactile input. To our knowledge, this is the first method to enable arbitrary in-hand pose refinement via multi-finger tactile sensing alone. Project website is available at https://sites.google.com/view/tacrefinenet
comment: 9 pages, 9 figures
☆ Boundary-to-Region Supervision for Offline Safe Reinforcement Learning NeurIPS 2025
Offline safe reinforcement learning aims to learn policies that satisfy predefined safety constraints from static datasets. Existing sequence-model-based methods condition action generation on symmetric input tokens for return-to-go and cost-to-go, neglecting their intrinsic asymmetry: return-to-go (RTG) serves as a flexible performance target, while cost-to-go (CTG) should represent a rigid safety boundary. This symmetric conditioning leads to unreliable constraint satisfaction, especially when encountering out-of-distribution cost trajectories. To address this, we propose Boundary-to-Region (B2R), a framework that enables asymmetric conditioning through cost signal realignment. B2R redefines CTG as a boundary constraint under a fixed safety budget, unifying the cost distribution of all feasible trajectories while preserving reward structures. Combined with rotary positional embeddings, it enhances exploration within the safe region. Experimental results show that B2R satisfies safety constraints in 35 out of 38 safety-critical tasks while achieving superior reward performance over baseline methods. This work highlights the limitations of symmetric token conditioning and establishes a new theoretical and practical approach for applying sequence models to safe RL. Our code is available at https://github.com/HuikangSu/B2R.
comment: NeurIPS 2025
☆ VLA Model Post-Training via Action-Chunked PPO and Self Behavior Cloning
Reinforcement learning (RL) is a promising avenue for post-training vision-language-action (VLA) models, but practical deployment is hindered by sparse rewards and unstable training. This work mitigates these challenges by introducing action-chunked proximal policy optimization (PPO) combined with behavior cloning on self-collected demonstrations. Aggregating consecutive actions into chunks improves the temporal consistency of the policy and the density of informative feedback. In addition, an auxiliary behavior cloning loss is applied with a dynamically updated demonstration buffer that continually collects high-quality task trials during training. The relative weight between the action-chunked PPO objective and the self-behavior-cloning auxiliary loss is adapted online to stabilize the post-training process. Experiments on the MetaWorld benchmark indicate improved performance over supervised fine-tuning, achieving a high success rate (0.93) and few steps to success (42.17). These results demonstrate the viability of RL for VLA post-training and help lay the groundwork for downstream VLA applications.
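The two objectives combine in a few lines. A minimal PyTorch sketch with illustrative names; the chunk-level PPO ratio and the schedule for the BC weight are assumptions about form, not the paper's exact objective.

```python
import torch

def chunk_log_prob(dist, action_chunk):
    """Score H consecutive actions as one decision: summing per-step
    log-probs makes the PPO ratio act at the chunk level."""
    return dist.log_prob(action_chunk).sum(dim=(-2, -1))  # (B, H, A) -> (B,)

def ppo_bc_loss(ratio, advantage, bc_log_prob, beta, clip=0.2):
    """Clipped PPO surrogate on action chunks plus a behavior-cloning term
    on self-collected successful trials, weighted by an online-adapted beta."""
    surr = torch.minimum(ratio * advantage,
                         ratio.clamp(1 - clip, 1 + clip) * advantage)
    return -(surr.mean() + beta * bc_log_prob.mean())
```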
☆ OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
Embodied navigation presents a core challenge for intelligent robots, requiring the comprehension of visual environments, natural language instructions, and autonomous exploration. Existing models often fall short in offering a unified solution across diverse navigation paradigms, resulting in low success rates and limited generalization. We introduce OmniNav, a unified framework addressing instruct-goal, object-goal, point-goal navigation, and frontier-based exploration within a single architecture. Our approach features a lightweight, low-latency policy that accurately predicts continuous-space waypoints (coordinates and orientations). This policy surpasses action-chunk methods in precision and supports real-world deployment at control frequencies up to 5 Hz. Architecturally, OmniNav employs a fast-slow system design: a fast module generates waypoints using short-horizon visual context and subtasks, while a slow module performs deliberative planning with long-horizon observations and candidate frontiers to select subsequent subgoals and subtasks. This collaboration enhances path efficiency and maintains trajectory coherence, particularly in exploration and memory-intensive scenarios. Crucially, we identify that the primary bottleneck isn't merely navigation policy learning, but a robust understanding of general instructions and objects. To boost generalization, OmniNav integrates large-scale, general-purpose training datasets, including those for image captioning and visual recognition, into a joint multi-task regimen. This significantly improves success rates and robustness. Extensive experiments confirm OmniNav's state-of-the-art performance across various navigation benchmarks, with real-world deployment further validating its efficacy. OmniNav provides practical insights for embodied navigation, charting a scalable path towards versatile, highly generalizable robotic intelligence.
☆ Hierarchical Diffusion Motion Planning with Task-Conditioned Uncertainty-Aware Priors
We propose a novel hierarchical diffusion planner that embeds task and motion structure directly in the noise model. Unlike standard diffusion-based planners that use zero-mean, isotropic Gaussian noise, we employ a family of task-conditioned structured Gaussians whose means and covariances are derived from Gaussian Process Motion Planning (GPMP): sparse, task-centric key states or their associated timings (or both) are treated as noisy observations to produce a prior instance. We first generalize the standard diffusion process to biased, non-isotropic corruption with closed-form forward and posterior expressions. Building on this, our hierarchy separates prior instantiation from trajectory denoising: the upper level instantiates a task-conditioned structured Gaussian (mean and covariance), and the lower level denoises the full trajectory under that fixed prior. Experiments on Maze2D goal-reaching and KUKA block stacking show improved success rates, smoother trajectories, and stronger task alignment compared to isotropic baselines. Ablation studies indicate that explicitly structuring the corruption process offers benefits beyond simply conditioning the neural network. Overall, our method concentrates the prior's probability mass near feasible, smooth, and semantically meaningful trajectories while maintaining tractability. Our project page is available at https://hta-diffusion.github.io.
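One closed form consistent with a biased, non-isotropic forward process is sketched below, assuming the chain converges to N(mu, Sigma) with Sigma = L L^T; the paper's exact parameterization may differ.

```python
import torch

def structured_forward_sample(x0, mu, L, alpha_bar_t):
    """Sample x_t from a biased, non-isotropic forward process.

    The mean interpolates from x0 toward mu and the noise is shaped by L,
    so the chain converges to N(mu, L @ L.T) instead of N(0, I):
        x_t = sqrt(abar_t) x0 + (1 - sqrt(abar_t)) mu + sqrt(1 - abar_t) L eps
    (A sketch of one consistent closed form, not the paper's exact one.)
    """
    abar = torch.as_tensor(alpha_bar_t)
    eps = torch.randn_like(x0)
    return (torch.sqrt(abar) * x0
            + (1 - torch.sqrt(abar)) * mu
            + torch.sqrt(1 - abar) * (L @ eps))

d = 6  # trajectory flattened to a d-dimensional vector
x0, mu, L = torch.randn(d), torch.zeros(d), torch.eye(d)
xt = structured_forward_sample(x0, mu, L, alpha_bar_t=0.5)
```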
dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought
Vision-Language-Action (VLA) models are emerging as a next-generation paradigm for robotics. We introduce dVLA, a diffusion-based VLA that leverages a multimodal chain-of-thought to unify visual perception, language reasoning, and robotic control in a single system. dVLA jointly optimizes perception, language understanding, and action under a single diffusion objective, enabling stronger cross-modal reasoning and better generalization to novel instructions and objects. For practical deployment, we mitigate inference latency by incorporating two acceleration strategies, a prefix attention mask and KV caching, yielding substantial speedups at test-time inference. We evaluate dVLA in both simulation and the real world: on the LIBERO benchmark, it achieves state-of-the-art performance with a 96.4% average success rate, consistently surpassing both discrete and continuous action policies; on a real Franka robot, it succeeds across a diverse task suite, including a challenging bin-picking task that requires multi-step planning, demonstrating robust real-world performance. Together, these results underscore the promise of unified diffusion frameworks for practical, high-performance VLA robotics.
comment: technical report
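KV caching, one of the two acceleration strategies named above, is a generic transformer technique; a minimal single-head sketch follows (how dVLA wires it into its diffusion backbone is not shown here):

```python
import torch
import torch.nn.functional as F

def attend_with_kv_cache(q, new_k, new_v, cache):
    """Single-step attention with a KV cache.

    Instead of recomputing keys/values for the whole prefix at every step,
    previously computed K and V are stored and only the new token's K, V
    are appended.
    """
    if cache["k"] is not None:
        k = torch.cat([cache["k"], new_k], dim=0)
        v = torch.cat([cache["v"], new_v], dim=0)
    else:
        k, v = new_k, new_v
    cache["k"], cache["v"] = k, v                     # update the cache
    scores = (q @ k.T) / k.shape[-1] ** 0.5           # (1, T)
    return F.softmax(scores, dim=-1) @ v              # (1, d)

cache = {"k": None, "v": None}
d = 8
for _ in range(4):  # four steps, each adding one token to the prefix
    q, k, v = torch.randn(1, d), torch.randn(1, d), torch.randn(1, d)
    out = attend_with_kv_cache(q, k, v, cache)
```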
☆ Field Calibration of Hyperspectral Cameras for Terrain Inference
Intra-class terrain differences such as water content directly influence a vehicle's ability to traverse terrain, yet RGB vision systems may fail to distinguish these properties. Evaluating a terrain's spectral content beyond red-green-blue wavelengths into the near-infrared spectrum provides useful information for intra-class identification. However, accurate analysis of this spectral information is highly dependent on ambient illumination. We demonstrate a system architecture to collect and register multi-wavelength, hyperspectral images from a mobile robot and describe an approach to reflectance-calibrate cameras under varying illumination conditions. To showcase the practical applications of our system, HYPER DRIVE, we demonstrate the ability to calculate vegetative health indices and soil moisture content from a mobile robot platform.
comment: Accepted to IEEE Robotics & Automation Letters
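The health indices mentioned above build on standard formulas; below is a minimal sketch using the textbook flat-field reflectance calibration and NDVI (the paper's field procedure for varying ambient illumination is more involved):

```python
import numpy as np

def calibrate_reflectance(raw, dark, white):
    """Flat-field reflectance calibration with dark and white references.

    This is the textbook formula: scale each band by the response of a
    known white reference after subtracting the dark current.
    """
    return np.clip((raw - dark) / np.maximum(white - dark, 1e-6), 0.0, 1.0)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, a standard health index."""
    return (nir - red) / np.maximum(nir + red, 1e-6)
```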
♻ ☆ Visual-auditory Extrinsic Contact Estimation
Robust manipulation often hinges on a robot's ability to perceive extrinsic contacts: contacts between a grasped object and its surrounding environment. However, these contacts are difficult to observe through vision alone due to occlusions, limited resolution, and ambiguous near-contact states. In this paper, we propose a visual-auditory method for extrinsic contact estimation that integrates global scene information from vision with local contact cues obtained through active audio sensing. Our approach equips a robotic gripper with contact microphones and conduction speakers, enabling the system to emit and receive acoustic signals through the grasped object to detect external contacts. We train our perception pipeline entirely in simulation and zero-shot transfer it to the real world. To bridge the sim-to-real gap, we introduce a real-to-sim audio hallucination technique, injecting real-world audio samples into simulated scenes with ground-truth contact labels. The resulting multimodal model accurately estimates both the location and size of extrinsic contacts across a range of cluttered and occluded scenarios. We further demonstrate that explicit contact prediction significantly improves policy learning for downstream contact-rich manipulation tasks.
comment: 8 pages, 7 figures
♻ ☆ AuDeRe: Automated Strategy Decision and Realization in Robot Planning and Control via LLMs
Recent advancements in large language models (LLMs) have shown significant promise in various domains, especially robotics. However, most prior LLM-based work in robotic applications either directly predicts waypoints or applies LLMs within fixed tool integration frameworks, offering limited flexibility in exploring and configuring solutions best suited to different tasks. In this work, we propose a framework that leverages LLMs to select appropriate planning and control strategies based on task descriptions, environmental constraints, and system dynamics. These strategies are then executed by calling the available comprehensive planning and control APIs. Our approach employs iterative LLM-based reasoning with performance feedback to refine the algorithm selection. We validate our approach through extensive experiments across tasks of varying complexity, from simple tracking to complex planning scenarios involving spatiotemporal constraints. The results demonstrate that using LLMs to determine planning and control strategies from natural language descriptions significantly enhances robotic autonomy while reducing the need for extensive manual tuning and expert knowledge. Furthermore, our framework maintains generalizability across different tasks and notably outperforms baseline methods that rely on LLMs for direct trajectory, control sequence, or code generation.
comment: 8 pages, 14 figures, submitted to the 2026 American Control Conference
♻ ☆ Robot Conga: A Leader-Follower Walking Approach to Sequential Path Following in Multi-Agent Systems
Coordinated path following in multi-agent systems is a key challenge in robotics, with applications in automated logistics, surveillance, and collaborative exploration. Traditional formation control techniques often rely on time-parameterized trajectories and path integrals, which can result in synchronization issues and rigid behavior. In this work, we address the problem of sequential path following, where agents maintain fixed spatial separation along a common trajectory, guided by a leader under centralized control. We introduce Robot Conga, a leader-follower control strategy that updates each agent's desired state based on the leader's spatial displacement rather than time, assuming access to a global position reference, an assumption valid in indoor environments equipped with motion capture, vision-based tracking, or UWB localization systems. The algorithm was validated in simulation using both TurtleBot3 and quadruped (Laikago) robots. Results demonstrate accurate trajectory tracking, stable inter-agent spacing, and fast convergence, with all agents aligning within 250 time steps (approx. 0.25 seconds) in the quadruped case, and almost instantaneously in the TurtleBot3 implementation.
comment: 6 Pages, 8 Figures. Both authors have contributed equally
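A minimal sketch of the core idea, indexing follower targets by the leader's traveled arc length rather than by time; the interpolation scheme below is one straightforward realization, not necessarily the paper's:

```python
import numpy as np

def follower_target(leader_path: np.ndarray, spacing: float, rank: int) -> np.ndarray:
    """Desired position for follower `rank`: a point on the leader's
    traveled path, a fixed arc length behind the leader.

    leader_path: (N, 2) leader positions logged so far (global reference).
    spacing:     desired inter-agent arc-length separation.
    rank:        1 for the first follower, 2 for the second, ...
    """
    seg = np.linalg.norm(np.diff(leader_path, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))       # arc length at each sample
    target_s = max(s[-1] - rank * spacing, 0.0)       # stay rank*spacing behind
    x = np.interp(target_s, s, leader_path[:, 0])
    y = np.interp(target_s, s, leader_path[:, 1])
    return np.array([x, y])
```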
♻ ☆ APPLE: Toward General Active Perception via Reinforcement Learning
Active perception is a fundamental skill that enables humans to deal with uncertainty in our inherently partially observable environment. For senses such as touch, where the information is sparse and local, active perception becomes crucial. In recent years, active perception has emerged as an important research domain in robotics. However, current methods are often bound to specific tasks or make strong assumptions, which limit their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning), a novel framework that leverages reinforcement learning (RL) to address a range of different active perception problems. APPLE jointly trains a transformer-based perception module and decision-making policy with a unified optimization objective, learning how to actively gather information. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks. These findings underscore the potential of APPLE as a versatile and general framework for advancing active perception in robotics.
comment: 16 pages, 13 figures. Under review
♻ ☆ Find the Fruit: Zero-Shot Sim2Real RL for Occlusion-Aware Plant Manipulation
Autonomous harvesting in the open presents a complex manipulation problem. In most scenarios, an autonomous system has to deal with significant occlusion and must interact in the presence of large structural uncertainty (every plant is different). Perceptual and modeling uncertainty make designing reliable manipulation controllers for harvesting challenging, resulting in poor performance during deployment. We present a sim2real reinforcement learning (RL) framework for occlusion-aware plant manipulation, where a policy is learned entirely in simulation to reposition stems and leaves to reveal target fruit(s). In our proposed approach, we decouple high-level kinematic planning from low-level compliant control, which simplifies the sim2real transfer. This decomposition allows the learned policy to generalize across multiple plants with different stiffness and morphology. In experiments with multiple real-world plant setups, our system achieves up to 86.7% success in exposing target fruits, demonstrating robustness to occlusion variation and structural uncertainty.
comment: 9 Pages, 3 Figures, 1 Table
♻ ☆ DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion
We introduce DreamControl, a novel methodology for learning autonomous whole-body humanoid skills. DreamControl leverages the strengths of diffusion models and Reinforcement Learning (RL): our core innovation is the use of a diffusion prior trained on human motion data, which subsequently guides an RL policy in simulation to complete specific tasks of interest (e.g., opening a drawer or picking up an object). We demonstrate that this human motion-informed prior allows RL to discover solutions unattainable by direct RL, and that diffusion models inherently promote natural looking motions, aiding in sim-to-real transfer. We validate DreamControl's effectiveness on a Unitree G1 robot across a diverse set of challenging tasks involving simultaneous lower and upper body control and object interaction. Project website at https://genrobo.github.io/DreamControl/
comment: https://genrobo.github.io/DreamControl/ (under submission)
♻ ☆ Multi Layered Autonomy and AI Ecologies in Robotic Art Installations
This paper presents Symbiosis of Agents, a large-scale installation by Baoyang Chen (baoyangchen.com) that embeds AI-driven robots in an immersive, mirror-lined arena, probing the tension between machine agency and artistic authorship. Drawing on early cybernetics, rule-based conceptual art, and seminal robotic works, it orchestrates fluid exchanges among robotic arms, quadruped machines, their environment, and the public. A three-tier faith system pilots the ecology: micro-level adaptive tactics, meso-level narrative drives, and a macro-level prime directive. This hierarchy lets behaviors evolve organically in response to environmental cues and even a viewer's breath, turning spectators into co-authors of the unfolding drama. Framed by a speculative terraforming scenario that recalls the historical exploitation of marginalized labor, the piece asks who bears responsibility in AI-mediated futures. Choreographed motion, AI-generated scripts, reactive lighting, and drifting fog cast the robots as collaborators rather than tools, forging a living, emergent artwork. Exhibited internationally, Symbiosis of Agents shows how cybernetic feedback, robotic experimentation, and conceptual rule-making can converge to redefine agency, authorship, and ethics in contemporary art.
♻ ☆ Ocean Diviner: A Diffusion-Augmented Reinforcement Learning Framework for AUV Robust Control in Underwater Tasks
Autonomous Underwater Vehicles (AUVs) are essential for marine exploration, yet their control remains highly challenging due to nonlinear dynamics and uncertain environmental disturbances. This paper presents a diffusion-augmented Reinforcement Learning (RL) framework for robust AUV control, aiming to improve the AUV's adaptability in dynamic underwater environments. The proposed framework integrates two core innovations: (1) A diffusion-based action generation framework that produces physically feasible and high-quality actions, enhanced by a high-dimensional state encoding mechanism combining current observations with historical states and actions through a novel diffusion U-Net architecture, significantly improving long-horizon planning capacity for robust control. (2) A sample-efficient hybrid learning architecture that synergizes diffusion-guided exploration with RL policy optimization, where the diffusion model generates diverse candidate actions and the RL critic selects the optimal action, achieving higher exploration efficiency and policy stability in dynamic underwater environments. Extensive simulation experiments validate the framework's superior robustness and flexibility, outperforming conventional control methods in challenging marine conditions and offering enhanced adaptability and reliability for AUV operations. Finally, we will soon release the code publicly to support future research in this area.
comment: Jingzehua Xu, Guanwen Xie and Weiyi Liu contributed equally to this work
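The hybrid action-selection step described in (2) can be sketched as follows, with `diffusion_policy` and `critic` as placeholder interfaces standing in for the paper's components:

```python
import torch

def select_action(diffusion_policy, critic, state, num_candidates: int = 16):
    """Diffusion-guided exploration with critic-based selection.

    `diffusion_policy(state, n)` is assumed to return n candidate actions
    of shape (n, act_dim), and `critic(state, action)` a scalar Q-value.
    The diffusion model proposes diverse candidates; the critic picks the
    one with the highest estimated value.
    """
    candidates = diffusion_policy(state, num_candidates)
    q_values = torch.stack([critic(state, a) for a in candidates])
    return candidates[torch.argmax(q_values)]
```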
♻ ☆ LiDAR-BIND-T: Improved and Temporally Consistent Sensor Modality Translation and Fusion for Robotic Applications
This paper extends LiDAR-BIND, a modular multi-modal fusion framework that binds heterogeneous sensors (radar, sonar) to a LiDAR-defined latent space, with mechanisms that explicitly enforce temporal consistency. We introduce three contributions: (i) temporal embedding similarity that aligns consecutive latent representations, (ii) a motion-aligned transformation loss that matches displacement between predictions and ground truth LiDAR, and (iii) windowed temporal fusion using a specialised temporal module. We further update the model architecture to better preserve spatial structure. Evaluations on radar/sonar-to-LiDAR translation demonstrate improved temporal and spatial coherence, yielding lower absolute trajectory error and better occupancy map accuracy in Cartographer-based SLAM (Simultaneous Localisation and Mapping). We propose metrics based on the Fréchet Video Motion Distance (FVMD) and a correlation-peak distance, providing practical temporal quality indicators for evaluating SLAM performance. The proposed temporal LiDAR-BIND, or LiDAR-BIND-T, maintains modular modality fusion while substantially enhancing temporal stability, resulting in improved robustness and performance for downstream SLAM.
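Contribution (i) can be illustrated with a simple cosine-similarity loss between consecutive latents; this is one natural instantiation, and the paper's exact loss may differ:

```python
import torch
import torch.nn.functional as F

def temporal_embedding_similarity_loss(latents: torch.Tensor) -> torch.Tensor:
    """Encourage consecutive latent representations to stay aligned.

    latents: (T, D) sequence of per-frame embeddings. The loss penalizes
    low cosine similarity between each pair of neighboring frames.
    """
    prev, nxt = latents[:-1], latents[1:]
    cos = F.cosine_similarity(prev, nxt, dim=-1)   # (T-1,)
    return (1.0 - cos).mean()
```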
♻ ☆ AgriCruiser: An Open Source Agriculture Robot for Over-the-row Navigation
We present the AgriCruiser, an open-source over-the-row agricultural robot developed for low-cost deployment and rapid adaptation across diverse crops and row layouts. The chassis provides an adjustable track width of 1.42 m to 1.57 m, along with a ground clearance of 0.94 m. The AgriCruiser achieves compact pivot turns with radii of 0.71 m to 0.79 m, enabling efficient headland maneuvers. The platform is designed for the integration of additional subsystems, and in this study, a precision spraying system was implemented to assess its effectiveness in weed management. In twelve flax plots, a single robotic spray pass reduced total weed populations (pigweed and Venice mallow) by 24- to 42-fold compared to manual weeding in four flax plots, while also causing less crop damage. Mobility experiments conducted on concrete, asphalt, gravel, grass, and both wet and dry soil confirmed reliable traversal consistent with torque sizing. The complete chassis can be constructed from commodity T-slot extrusion with minimal machining, resulting in a bill of materials costing approximately $5,000 - $6,000, which enables replication and customization. These results demonstrate that low-cost, reconfigurable over-the-row robots can achieve effective weed management with reduced crop damage and labor requirements, while providing a versatile foundation for phenotyping, sensing, and other agricultural applications. Design files and implementation details are released to accelerate research and adoption of modular agricultural robotics.
comment: GitHub: https://github.com/structuresComp/agri-cruiser
♻ ☆ Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduces high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring their effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
comment: Preprint Under Review
♻ ☆ Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models
It is crucial to efficiently execute instructions such as "Find an apple and a banana" or "Get ready for a field trip," which require searching for multiple objects or understanding context-dependent commands. This study addresses the challenging problem of determining which robot should be assigned to which part of a task when each robot possesses different situational on-site knowledge, specifically spatial concepts learned from the area designated to it by the user. We propose a task planning framework that leverages large language models (LLMs) and spatial concepts to decompose natural language instructions into subtasks and allocate them to multiple robots. We designed a novel few-shot prompting strategy that enables LLMs to infer required objects from ambiguous commands and decompose them into appropriate subtasks. In our experiments, the proposed method achieved 47/50 successful assignments, outperforming random (28/50) and commonsense-based assignment (26/50). Furthermore, we conducted qualitative evaluations using two actual mobile manipulators. The results demonstrated that our framework could handle instructions, including those involving ad hoc categories such as "Get ready for a field trip," by successfully performing task decomposition, assignment, sequential planning, and execution.
comment: Submitted to AROB-ISBC 2026 (Journal Track option)
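A rough sketch of how such a few-shot decomposition prompt might be assembled; the example subtask format and area descriptions are illustrative, not the paper's actual prompt:

```python
FEW_SHOT_EXAMPLES = """\
Instruction: Find an apple and a banana.
Subtasks: [("find", "apple"), ("find", "banana")]
Instruction: Get ready for a field trip.
Subtasks: [("find", "water bottle"), ("find", "snack"), ("find", "backpack")]
"""

def build_prompt(instruction: str, robot_areas: dict) -> str:
    """Assemble a few-shot prompt for LLM-based task decomposition and
    allocation. robot_areas maps robot names to descriptions of the area
    (spatial knowledge) each robot was assigned by the user.
    """
    areas = "\n".join(f"- {robot}: knows {area}" for robot, area in robot_areas.items())
    return (
        "Decompose the instruction into subtasks and assign each subtask to\n"
        "the robot whose designated area most likely contains the object.\n\n"
        f"{FEW_SHOT_EXAMPLES}\n"
        f"Robots and their areas:\n{areas}\n\n"
        f"Instruction: {instruction}\nSubtasks and assignments:"
    )

print(build_prompt("Get ready for a field trip.",
                   {"robot_1": "the kitchen", "robot_2": "the hallway closet"}))
```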
♻ ☆ SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which differ substantially from real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
comment: 24 pages, 6 figures
♻ ☆ ADPro: a Test-time Adaptive Diffusion Policy via Manifold-constrained Denoising and Task-aware Initialization for Robotic Manipulation
Diffusion policies have recently emerged as a powerful class of visuomotor controllers for robot manipulation, offering stable training and expressive multi-modal action modeling. However, existing approaches typically treat action generation as an unconstrained denoising process, ignoring valuable a priori knowledge about geometry and control structure. In this work, we propose the Adaptive Diffusion Policy (ADP), a test-time adaptation method that introduces two key inductive biases into the diffusion process. First, we embed a geometric manifold constraint that aligns denoising updates with task-relevant subspaces, leveraging the fact that the relative pose between the end-effector and target scene provides a natural gradient direction, and guiding denoising along the geodesic path of the manipulation manifold. Second, to reduce unnecessary exploration and accelerate convergence, we propose an analytically guided initialization: rather than sampling from an uninformative prior, we compute a rough registration between the gripper and target scenes to propose a structured initial noisy action. ADP is compatible with pre-trained diffusion policies and requires no retraining, enabling test-time adaptation that tailors the policy to specific tasks, thereby enhancing generalization across novel tasks and environments. Experiments on RLBench, CALVIN, and real-world datasets show that ADPro, an implementation of ADP, improves success rates, generalization, and sampling efficiency, achieving up to 25% faster execution and a 9-percentage-point gain in success rate over strong diffusion baselines.
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving
Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. This alignment facilitates cross-modal representation learning, enhancing the semantic consistency between 2D and 3D data. We introduce several key innovations: (i) VFM-driven superpixel generation for detailed semantic representation, (ii) a VFM-assisted contrastive learning strategy to align multimodal features, (iii) superpoint temporal consistency to maintain stable representations across time, and (iv) multi-source data pretraining to generalize across various LiDAR configurations. Our approach achieves substantial gains over state-of-the-art methods in linear probing and fine-tuning for LiDAR-based segmentation and object detection. Extensive experiments on 11 large-scale multi-sensor datasets highlight our superior performance, demonstrating adaptability, efficiency, and robustness in real-world autonomous driving scenarios.
comment: IEEE TPAMI 2025; 17 pages, 9 figures, 11 tables; Project Page at https://ldkong.com/LargeAD
♻ ☆ OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata NeurIPS 2025
Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. Its paired structure enables fair benchmarking of existing solutions by decoupling image retrieval from feature matching, allowing isolated evaluation of localization and calibration performance. Through comprehensive evaluation, we examine the impact of domain shifts, data resolutions, and covisibility on localization accuracy. Finally, we introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%. The dataset and code are available at: https://deepscenario.github.io/OrthoLoC.
comment: Accepted at NeurIPS 2025
♻ ☆ Simulated Annealing for Multi-Robot Ergodic Information Acquisition Using Graph-Based Discretization
One of the goals of active information acquisition using multi-robot teams is to keep the relative uncertainty in each region at the same level to maintain identical acquisition quality (e.g., consistent target detection) in all the regions. To achieve this goal, ergodic coverage can be used to assign the number of samples according to the quality of observation, i.e., sampling noise levels. However, the noise levels are unknown to the robots. Although this noise can be estimated from samples, the estimates are unreliable at first and can generate fluctuating values. The main contribution of this paper is to use simulated annealing to generate the target sampling distribution, starting from uniform and gradually shifting to an estimated optimal distribution, by varying the coldness parameter of a Boltzmann distribution with the estimated sampling entropy as energy. Simulation results show a substantial improvement of both transient and asymptotic entropy compared to both uniform and direct-ergodic searches. Finally, a demonstration is performed with a TurtleBot swarm system to validate the physical applicability of the algorithm.
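A minimal sketch of the annealed Boltzmann target distribution described above, with estimated sampling entropy as energy; the geometric cooling schedule is a typical choice, not necessarily the paper's:

```python
import numpy as np

def target_distribution(entropy_estimates: np.ndarray, temperature: float) -> np.ndarray:
    """Boltzmann target sampling distribution over regions.

    With estimated sampling entropy as the energy, a high temperature
    yields a near-uniform distribution; cooling gradually shifts mass
    toward the estimated optimal distribution.
    """
    logits = -entropy_estimates / max(temperature, 1e-8)
    p = np.exp(logits - logits.max())   # subtract max for numerical stability
    return p / p.sum()

T = 10.0
for step in range(100):
    energies = np.array([1.2, 0.7, 2.1])   # placeholder entropy estimates
    p = target_distribution(energies, T)    # near-uniform early, peaked late
    T *= 0.97                               # geometric cooling
```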
♻ ☆ Towards autonomous photogrammetric forest inventory using a lightweight under-canopy robotic drone
Drones are increasingly used in forestry to capture high-resolution remote sensing data, supporting enhanced monitoring, assessment, and decision-making processes. While operations above the forest canopy are already highly automated, flying inside forests remains challenging, primarily relying on manual piloting. In dense forests, relying on the Global Navigation Satellite System (GNSS) for localization is not feasible. In addition, the drone must autonomously adjust its flight path to avoid collisions. Recently, advancements in robotics have enabled autonomous drone flights in GNSS-denied obstacle-rich areas. In this article, a step towards autonomous forest data collection is taken by building a prototype of a robotic under-canopy drone utilizing state-of-the-art open source methods and validating its performance for data collection inside forests. Specifically, the study focused on camera-based autonomous flight under the forest canopy and photogrammetric post-processing of the data collected with the low-cost onboard stereo camera. The autonomous flight capability of the prototype was evaluated through multiple test flights in boreal forests. The tree parameter estimation capability was studied by performing diameter at breast height (DBH) estimation. The prototype successfully carried out flights in selected challenging forest environments, and the experiments showed promising performance in forest 3D modeling with a miniaturized stereoscopic photogrammetric system. The DBH estimation achieved a root mean square error (RMSE) of 3.33 - 3.97 cm (10.69 - 12.98 %) across all trees. For trees with a DBH less than 30 cm, the RMSE was 1.16 - 2.56 cm (5.74 - 12.47 %). The results provide valuable insights into autonomous under-canopy forest mapping and highlight the critical next steps for advancing lightweight robotic drone systems for mapping complex forest environments.
comment: 35 pages, 11 Figures
♻ ☆ Track Any Motions under Any Disturbances
A foundational humanoid motion tracker is expected to be able to track diverse, highly dynamic, and contact-rich motions. More importantly, it needs to operate stably in real-world scenarios against various dynamics disturbances, including terrains, external forces, and physical property changes for general practical use. To achieve this goal, we propose Any2Track (Track Any motions under Any disturbances), a two-stage RL framework to track various motions under multiple disturbances in the real world. Any2Track reformulates dynamics adaptability as an additional capability on top of basic action execution and consists of two key components: AnyTracker and AnyAdapter. AnyTracker is a general motion tracker with a series of careful designs to track various motions within a single policy. AnyAdapter is a history-informed adaptation module that endows the tracker with online dynamics adaptability to overcome the sim2real gap and multiple real-world disturbances. We deploy Any2Track on Unitree G1 hardware and achieve a successful sim2real transfer in a zero-shot manner. Any2Track performs exceptionally well in tracking various motions under multiple real-world disturbances.
♻ ☆ Long-Horizon Visual Imitation Learning via Plan and Code Reflection
Learning from long-horizon demonstrations with complex action sequences presents significant challenges for visual imitation learning, particularly in understanding temporal relationships of actions and spatial relationships between objects. In this paper, we propose a new agent framework that incorporates two dedicated reflection modules to enhance both plan and code generation. The plan generation module produces an initial action sequence, which is then verified by the plan reflection module to ensure temporal coherence and spatial alignment with the demonstration video. The code generation module translates the plan into executable code, while the code reflection module verifies and refines the generated code to ensure correctness and consistency with the generated plan. These two reflection modules jointly enable the agent to detect and correct errors in both the plan generation and code generation, improving performance in tasks with intricate temporal and spatial dependencies. To support systematic evaluation, we introduce LongVILBench, a benchmark comprising 300 human demonstrations with action sequences of up to 18 steps. LongVILBench emphasizes temporal and spatial complexity across multiple task types. Experimental results demonstrate that existing methods perform poorly on this benchmark, whereas our new framework establishes a strong baseline for long-horizon visual imitation learning.
comment: 9 pages, 4 figures
♻ ☆ WorldGym: World Model as An Environment for Policy Evaluation
Evaluating robot control policies is difficult: real-world testing is costly, and handcrafted simulators require manual effort to improve in realism and generality. We propose a world-model-based policy evaluation environment (WorldGym), an autoregressive, action-conditioned video generation model which serves as a proxy for real-world environments. Policies are evaluated via Monte Carlo rollouts in the world model, with a vision-language model providing rewards. We evaluate a set of VLA-based real-robot policies in the world model using only initial frames from real robots, and show that policy success rates within the world model highly correlate with real-world success rates. Moreover, we show that WorldGym is able to preserve relative policy rankings across different policy versions, sizes, and training checkpoints. Due to requiring only a single start frame as input, the world model further enables efficient evaluation of robot policies' generalization ability on novel tasks and environments. We find that modern VLA-based robot policies still struggle to distinguish object shapes and can become distracted by adversarial facades of objects. While generating highly realistic object interaction remains challenging, WorldGym faithfully emulates robot motions and offers a practical starting point for safe and reproducible policy evaluation before deployment.
comment: https://world-model-eval.github.io
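The evaluation reduces to Monte Carlo rollouts inside the world model with a VLM scoring success; a sketch with placeholder interfaces (`world_model.step`, `policy.act`, and `reward_vlm` are assumptions, not the released API):

```python
def evaluate_policy(world_model, policy, reward_vlm, start_frame, task,
                    num_rollouts: int = 32, horizon: int = 50) -> float:
    """Monte Carlo policy evaluation inside an action-conditioned video
    world model, with a vision-language model judging each rollout."""
    successes = 0
    for _ in range(num_rollouts):
        frame, frames = start_frame, [start_frame]
        for _ in range(horizon):
            action = policy.act(frame, task)          # policy sees current frame
            frame = world_model.step(frame, action)   # model predicts next frame
            frames.append(frame)
        successes += int(reward_vlm(frames, task))    # VLM judges the rollout
    return successes / num_rollouts
```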
Computer Vision and Pattern Recognition 150
☆ Stitch: Training-Free Position Control in Multimodal Diffusion Transformers
Text-to-Image (T2I) generation models have advanced rapidly in recent years, but accurately capturing spatial relationships like "above" or "to the right of" poses a persistent challenge. Earlier methods improved spatial relationship following with external position control. However, as architectures evolved to enhance image quality, these techniques became incompatible with modern models. We propose Stitch, a training-free method for incorporating external position control into Multi-Modal Diffusion Transformers (MMDiT) via automatically-generated bounding boxes. Stitch produces images that are both spatially accurate and visually appealing by generating individual objects within designated bounding boxes and seamlessly stitching them together. We find that targeted attention heads capture the information necessary to isolate and cut out individual objects mid-generation, without needing to fully complete the image. We evaluate Stitch on PosEval, our benchmark for position-based T2I generation. Featuring five new tasks that extend the concept of Position beyond the basic GenEval task, PosEval demonstrates that even top models still have significant room for improvement in position-based generation. Tested on Qwen-Image, FLUX, and SD3.5, Stitch consistently enhances base models, even improving FLUX by 218% on GenEval's Position task and by 206% on PosEval. Stitch achieves state-of-the-art results with Qwen-Image on PosEval, improving over previous models by 54%, all accomplished while integrating position control into leading models training-free. Code is available at https://github.com/ExplainableML/Stitch.
comment: Preprint
☆ TTT3R: 3D Reconstruction as Test-Time Training
Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit the 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates that balances retaining historical information against adapting to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code available at https://rover-xingyu.github.io/TTT3R
comment: Page: https://rover-xingyu.github.io/TTT3R Code: https://github.com/Inception3D/TTT3R
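A minimal sketch of a confidence-weighted memory update in the spirit of TTT3R; a convex blend is the simplest closed-form rule of this kind, and the paper's exact derivation may differ:

```python
import torch

def ttt_memory_update(state: torch.Tensor,
                      update: torch.Tensor,
                      confidence: torch.Tensor) -> torch.Tensor:
    """Online memory update with a per-step learning rate.

    `confidence` in [0, 1] is assumed to come from the alignment between
    the memory state and the incoming observation: high confidence favors
    adapting to the new observation, low confidence favors retention.
    """
    lr = confidence.clamp(0.0, 1.0)
    return (1.0 - lr) * state + lr * update
```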
☆ Query-Kontext: An Unified Multimodal Model for Image Generation and Editing
Unified Multimodal Models (UMMs) have demonstrated remarkable performance in text-to-image generation (T2I) and editing (TI2I), whether instantiated as assembled unified frameworks that couple a powerful vision-language model (VLM) with a diffusion-based generator, or as naive Unified Multimodal Models with an early fusion of understanding and generation modalities. We contend that in current unified frameworks, the crucial capability of multimodal generative reasoning, which encompasses instruction understanding, grounding, and image referring for identity preservation and faithful reconstruction, is intrinsically entangled with high-fidelity synthesis. In this work, we introduce Query-Kontext, a novel approach that bridges the VLM and diffusion model via a multimodal "kontext" composed of semantic cues and coarse-grained image conditions encoded from multimodal inputs. This design delegates the complex ability of multimodal generative reasoning to the powerful VLM while reserving the diffusion model's role for high-quality visual synthesis. To achieve this, we propose a three-stage progressive training strategy. First, we connect the VLM to a lightweight diffusion head via multimodal kontext tokens to unleash the VLM's generative reasoning ability. Second, we scale this head to a large, pre-trained diffusion model to enhance visual detail and realism. Finally, we introduce a low-level image encoder to improve image fidelity and perform instruction tuning on downstream tasks. Furthermore, we build a comprehensive data pipeline integrating real, synthetic, and open-source datasets, covering diverse multimodal reference-to-image scenarios, including image generation, instruction-driven editing, customized generation, and multi-subject composition. Experiments show that our approach matches strong unified baselines and even outperforms task-specific state-of-the-art methods in several cases.
comment: 23 pages, 10 figures
☆ Learning Generalizable Shape Completion with SIM(3) Equivariance NeurIPS 2025
3D shape completion methods typically assume scans are pre-aligned to a canonical frame. This leaks pose and scale cues that networks may exploit to memorize absolute positions rather than inferring intrinsic geometry. When such alignment is absent in real data, performance collapses. We argue that robust generalization demands architectural equivariance to the similarity group, SIM(3), so the model remains agnostic to pose and scale. Following this principle, we introduce the first SIM(3)-equivariant shape completion network, whose modular layers successively canonicalize features, reason over similarity-invariant geometry, and restore the original frame. Under a de-biased evaluation protocol that removes the hidden cues, our model outperforms both equivariant and augmentation baselines on the PCN benchmark. It also sets new cross-domain records on real driving and indoor scans, lowering minimal matching distance on KITTI by 17% and Chamfer distance $\ell_1$ on OmniObject3D by 14%. Perhaps surprisingly, our model under the stricter protocol still outperforms competitors evaluated under their biased settings. These results establish full SIM(3) equivariance as an effective route to truly generalizable shape completion. Project page: https://sime-completion.github.io.
comment: NeurIPS 2025
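The canonicalize-process-restore pattern can be illustrated for the translation and scale components of SIM(3); rotation canonicalization (the SO(3) part) requires equivariant features and is omitted in this sketch:

```python
import torch

def sim3_canonicalize(points: torch.Tensor):
    """Normalize a point cloud to a canonical translation and scale, and
    return the parameters needed to restore the original frame.

    points: (N, 3). Centering removes translation; dividing by the mean
    distance to the centroid removes scale.
    """
    centroid = points.mean(dim=0)
    centered = points - centroid
    scale = centered.norm(dim=1).mean().clamp_min(1e-8)
    return centered / scale, (centroid, scale)

def sim3_restore(points: torch.Tensor, frame) -> torch.Tensor:
    """Map canonical-frame points back to the original frame."""
    centroid, scale = frame
    return points * scale + centroid

pts = torch.randn(1024, 3) * 5.0 + torch.tensor([10.0, 0.0, -3.0])
canon, frame = sim3_canonicalize(pts)
# ... a completion network would operate on `canon` here ...
restored = sim3_restore(canon, frame)   # recovers the input frame
```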
☆ Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Large Language Models (LLMs), despite being trained on text alone, surprisingly develop rich visual priors. These priors allow latent visual capabilities to be unlocked for vision tasks with a relatively small amount of multimodal data, and in some cases, to perform visual tasks without ever having seen an image. Through systematic analysis, we reveal that visual priors-the implicit, emergent knowledge about the visual world acquired during language pre-training-are composed of separable perception and reasoning priors with unique scaling trends and origins. We show that an LLM's latent visual reasoning ability is predominantly developed by pre-training on reasoning-centric data (e.g., code, math, academia) and scales progressively. This reasoning prior acquired from language pre-training is transferable and universally applicable to visual reasoning. In contrast, a perception prior emerges more diffusely from broad corpora, and perception ability is more sensitive to the vision encoder and visual instruction tuning data. In parallel, text describing the visual world proves crucial, though its performance impact saturates rapidly. Leveraging these insights, we propose a data-centric recipe for pre-training vision-aware LLMs and verify it in 1T token scale pre-training. Our findings are grounded in over 100 controlled experiments consuming 500,000 GPU-hours, spanning the full MLLM construction pipeline-from LLM pre-training to visual alignment and supervised multimodal fine-tuning-across five model scales, a wide range of data categories and mixtures, and multiple adaptation setups. Along with our main findings, we propose and investigate several hypotheses, and introduce the Multi-Level Existence Bench (MLE-Bench). Together, this work provides a new way of deliberately cultivating visual priors from language pre-training, paving the way for the next generation of multimodal LLMs.
comment: Project page: https://junlinhan.github.io/projects/lsbs/
☆ HART: Human Aligned Reconstruction Transformer
We introduce HART, a unified framework for sparse-view human reconstruction. Given a small set of uncalibrated RGB images of a person as input, it outputs a watertight clothed mesh, the aligned SMPL-X body mesh, and a Gaussian-splat representation for photorealistic novel-view rendering. Prior methods for clothed human reconstruction either optimize parametric templates, which overlook loose garments and human-object interactions, or train implicit functions under simplified camera assumptions, limiting applicability in real scenes. In contrast, HART predicts per-pixel 3D point maps, normals, and body correspondences, and employs an occlusion-aware Poisson reconstruction to recover complete geometry, even in self-occluded regions. These predictions also align with a parametric SMPL-X body model, ensuring that reconstructed geometry remains consistent with human structure while capturing loose clothing and interactions. These human-aligned meshes initialize Gaussian splats to further enable sparse-view rendering. While trained on only 2.3K synthetic scans, HART achieves state-of-the-art results: Chamfer Distance improves by 18-23 percent for clothed-mesh reconstruction, PA-V2V drops by 6-27 percent for SMPL-X estimation, LPIPS decreases by 15-27 percent for novel-view synthesis on a wide range of datasets. These results suggest that feed-forward transformers can serve as a scalable model for robust human reconstruction in real-world settings. Code and models will be released.
comment: Project page: https://xiyichen.github.io/hart
☆ DA$^2$: Depth Anything in Any Direction
Panorama has a full FoV (360$^\circ\times$180$^\circ$), offering a more complete visual description than perspective images. Thanks to this characteristic, panoramic depth estimation is gaining increasing traction in 3D vision. However, due to the scarcity of panoramic data, previous methods are often restricted to in-domain settings, leading to poor zero-shot generalization. Furthermore, due to the spherical distortions inherent in panoramas, many approaches rely on perspective splitting (e.g., cubemaps), which leads to suboptimal efficiency. To address these challenges, we propose DA$^2$: Depth Anything in Any Direction, an accurate, zero-shot generalizable, and fully end-to-end panoramic depth estimator. Specifically, for scaling up panoramic data, we introduce a data curation engine for generating high-quality panoramic depth data from perspective imagery, and create $\sim$543K panoramic RGB-depth pairs, bringing the total to $\sim$607K. To further mitigate the spherical distortions, we present SphereViT, which explicitly leverages spherical coordinates to enforce spherical geometric consistency in panoramic image features, yielding improved performance. A comprehensive benchmark on multiple datasets clearly demonstrates DA$^{2}$'s SoTA performance, with an average 38% improvement on AbsRel over the strongest zero-shot baseline. Surprisingly, DA$^{2}$ even outperforms prior in-domain methods, highlighting its superior zero-shot generalization. Moreover, as an end-to-end solution, DA$^{2}$ exhibits much higher efficiency over fusion-based approaches. Both the code and the curated panoramic data will be released. Project page: https://depth-any-in-any-dir.github.io/.
comment: Work primarily done during an internship at Tencent Hunyuan. Project page: https://depth-any-in-any-dir.github.io/
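One plausible form of the spherical positional features SphereViT might leverage is sketched below for an equirectangular panorama; the abstract does not specify the exact embedding, so this is an assumption:

```python
import torch

def spherical_embedding(h: int, w: int, dim: int) -> torch.Tensor:
    """Per-pixel spherical positional features for an equirectangular
    panorama. Each pixel maps to (longitude, latitude) on the sphere,
    encoded with sin/cos at several frequencies. `dim` must be a
    multiple of 4 (two angles, sin and cos each).
    """
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    lon = (u.float() / w - 0.5) * 2 * torch.pi   # [-pi, pi)
    lat = (0.5 - v.float() / h) * torch.pi       # [pi/2, -pi/2]
    freqs = 2 ** torch.arange(dim // 4).float()  # one band per frequency
    feats = [f(ang[..., None] * freqs)
             for ang in (lon, lat) for f in (torch.sin, torch.cos)]
    return torch.cat(feats, dim=-1)              # (h, w, dim)

emb = spherical_embedding(h=256, w=512, dim=64)
```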
☆ Hy-Facial: Hybrid Feature Extraction by Dimensionality Reduction Methods for Enhanced Facial Expression Classification
Facial expression classification remains a challenging task due to the high dimensionality and inherent complexity of facial image data. This paper presents Hy-Facial, a hybrid feature extraction framework that integrates both deep learning and traditional image processing techniques, complemented by a systematic investigation of dimensionality reduction strategies. The proposed method fuses deep features extracted from the Visual Geometry Group 19-layer network (VGG19) with handcrafted local descriptors from the scale-invariant feature transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) algorithms to obtain rich and diverse image representations. To mitigate feature redundancy and reduce computational complexity, we conduct a comprehensive evaluation of dimensionality reduction and feature extraction techniques. Among these, UMAP is identified as the most effective, preserving both the local and global structure of the high-dimensional feature space. The Hy-Facial pipeline integrates VGG19, SIFT, and ORB for feature extraction, followed by K-means clustering and UMAP for dimensionality reduction, resulting in a classification accuracy of 83.3% on the facial expression recognition (FER) dataset. These findings underscore the pivotal role of dimensionality reduction not only as a pre-processing step but as an essential component in improving feature quality and overall classification performance.
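A rough sketch of the fusion-then-reduction pipeline, assuming mean-pooled SIFT/ORB descriptors stand in for the paper's K-means-based aggregation and that deep VGG19 features are precomputed:

```python
import cv2           # pip install opencv-python (>= 4.4 for SIFT_create)
import numpy as np
import umap          # pip install umap-learn

def handcrafted_descriptor(gray: np.ndarray) -> np.ndarray:
    """Pool SIFT and ORB descriptors into fixed-length vectors and
    concatenate them. Mean-pooling is a simplification of the paper's
    clustering-based aggregation."""
    _, sift_desc = cv2.SIFT_create().detectAndCompute(gray, None)
    _, orb_desc = cv2.ORB_create().detectAndCompute(gray, None)
    sift_vec = sift_desc.mean(axis=0) if sift_desc is not None else np.zeros(128)
    orb_vec = orb_desc.mean(axis=0) if orb_desc is not None else np.zeros(32)
    return np.concatenate([sift_vec, orb_vec])

def reduce_features(deep_feats: np.ndarray, hand_feats: np.ndarray) -> np.ndarray:
    """Fuse deep (e.g., VGG19) and handcrafted features, then reduce with
    UMAP, the reduction the paper found most effective."""
    fused = np.concatenate([deep_feats, hand_feats], axis=1)  # (n, d1 + d2)
    return umap.UMAP(n_components=32).fit_transform(fused)
```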
☆ Video Object Segmentation-Aware Audio Generation
Existing multimodal audio generation models often lack precise user control, which limits their applicability in professional Foley workflows. In particular, these models focus on the entire video and do not provide precise methods for prioritizing a specific object within a scene, generating unnecessary background sounds, or focusing on the wrong objects. To address this gap, we introduce the novel task of video object segmentation-aware audio generation, which explicitly conditions sound synthesis on object-level segmentation maps. We present SAGANet, a new multimodal generative model that enables controllable audio generation by leveraging visual segmentation masks along with video and textual cues. Our model provides users with fine-grained and visually localized control over audio generation. To support this task and further research on segmentation-aware Foley, we propose Segmented Music Solos, a benchmark dataset of musical instrument performance videos with segmentation information. Our method demonstrates substantial improvements over current state-of-the-art methods and sets a new standard for controllable, high-fidelity Foley synthesis. Code, samples, and Segmented Music Solos are available at https://saganet.notion.site
comment: Preprint version. The Version of Record is published in DAGM GCPR 2025 proceedings with Springer Lecture Notes in Computer Science (LNCS). Updated results and resources are available at the project page: https://saganet.notion.site
☆ DiffCamera: Arbitrary Refocusing on Images
The depth-of-field (DoF) effect, which introduces aesthetically pleasing blur, enhances photographic quality but is fixed and difficult to modify once the image has been created. This becomes problematic when the applied blur is undesirable (e.g., the subject is out of focus). To address this, we propose DiffCamera, a model that enables flexible refocusing of a created image conditioned on an arbitrary new focus point and a blur level. Specifically, we design a diffusion transformer framework for refocusing learning. However, the training requires pairs of data with different focus planes and bokeh levels in the same scene, which are hard to acquire. To overcome this limitation, we develop a simulation-based pipeline to generate large-scale image pairs with varying focus planes and bokeh levels. With the simulated data, we find that training with only a vanilla diffusion objective often leads to incorrect DoF behaviors due to the complexity of the task. This requires a stronger constraint during training. Inspired by the photographic principle that photos of different focus planes can be linearly blended into a multi-focus image, we propose a stacking constraint during training to enforce precise DoF manipulation. This constraint enhances model training by imposing physically grounded refocusing behavior: refocused results must align faithfully with the scene structure and camera conditions so that they can be combined into the correct multi-focus image. We also construct a benchmark to evaluate the effectiveness of our refocusing model. Extensive experiments demonstrate that DiffCamera supports stable refocusing across a wide range of scenes, providing unprecedented control over DoF adjustments for photography and generative AI applications.
☆ Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
Recent text-only models demonstrate remarkable mathematical reasoning capabilities. Extending these to visual domains requires vision-language models to translate images into text descriptions. However, current models, trained to produce captions for human readers, often omit the precise details that reasoning systems require. This creates an interface mismatch: reasoners often fail not due to reasoning limitations but because they lack access to critical visual information. We propose Adaptive-Clarification Reinforcement Learning (AC-RL), which teaches vision models what information reasoners need through interaction. Our key insight is that clarification requests during training reveal information gaps; by penalizing success that requires clarification, we create pressure for comprehensive initial captions that enable the reasoner to solve the problem in a single pass. AC-RL improves average accuracy by 4.4 points over pretrained baselines across seven visual mathematical reasoning benchmarks, and analysis shows it would cut clarification requests by up to 39% if those were allowed. By treating clarification as a form of implicit supervision, AC-RL demonstrates that vision-language interfaces can be effectively learned through interaction alone, without requiring explicit annotations.
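The clarification penalty can be sketched as simple reward shaping; the linear penalty schedule below is an assumption, as the abstract states only that success requiring clarification is penalized:

```python
def ac_rl_reward(task_solved: bool, num_clarifications: int,
                 penalty: float = 0.5) -> float:
    """Reward shaping in the spirit of AC-RL.

    Success achieved only after clarification requests is worth less than
    single-pass success, which pressures the captioner to produce
    comprehensive initial captions.
    """
    if not task_solved:
        return 0.0
    return max(0.0, 1.0 - penalty * num_clarifications)

assert ac_rl_reward(True, 0) == 1.0   # single-pass success: full reward
assert ac_rl_reward(True, 1) == 0.5   # success after one clarification
assert ac_rl_reward(False, 0) == 0.0  # failure earns nothing
```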
☆ Autoproof: Automated Segmentation Proofreading for Connectomics
Producing connectomes from electron microscopy (EM) images has historically required a great deal of human proofreading effort. This manual annotation cost is the current bottleneck in scaling EM connectomics, for example, in making larger connectome reconstructions feasible, or in enabling comparative connectomics where multiple related reconstructions are produced. In this work, we propose using the available ground-truth data generated by this manual annotation effort to learn a machine learning model to automate or optimize parts of the required proofreading workflows. We validate our approach on a recent complete reconstruction of the Drosophila male central nervous system. We first show our method would allow for obtaining 90% of the value of a guided proofreading workflow while reducing the required cost by 80%. We then demonstrate a second application for automatically merging many segmentation fragments to proofread neurons. Our system is able to automatically attach 200 thousand fragments, equivalent to four proofreader-years of manual work, increasing the connectivity completion rate of the connectome by 1.3 percentage points.
☆ Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation NeurIPS 2025
Recent advances in video generation have enabled high-fidelity video synthesis from user-provided prompts. However, existing models and benchmarks fail to capture the complexity and requirements of professional video generation. Towards that goal, we introduce Stable Cinemetrics, a structured evaluation framework that formalizes filmmaking controls into four disentangled, hierarchical taxonomies: Setup, Event, Lighting, and Camera. Together, these taxonomies define 76 fine-grained control nodes grounded in industry practices. Using these taxonomies, we construct a benchmark of prompts aligned with professional use cases and develop an automated pipeline for prompt categorization and question generation, enabling independent evaluation of each control dimension. We conduct a large-scale human study spanning 10+ models and 20K videos, annotated by a pool of 80+ film professionals. Our analyses, both coarse and fine-grained, reveal that even the strongest current models exhibit significant gaps, particularly in Events and Camera-related controls. To enable scalable evaluation, we train an automatic evaluator, a vision-language model aligned with expert annotations that outperforms existing zero-shot baselines. SCINE is the first approach to situate professional video generation within the landscape of video generative models, introducing taxonomies centered around cinematic controls and supporting them with structured evaluation pipelines and detailed analyses to guide future research.
comment: NeurIPS 2025. Project Page : https://stable-cinemetrics.github.io/
☆ Automated and Scalable SEM Image Analysis of Perovskite Solar Cell Materials via a Deep Segmentation Framework
Scanning Electron Microscopy (SEM) is indispensable for characterizing the microstructure of thin films during perovskite solar cell fabrication. Accurate identification and quantification of lead iodide and perovskite phases are critical because residual lead iodide strongly influences crystallization pathways and defect formation, while the morphology of perovskite grains governs carrier transport and device stability. Yet current SEM image analysis is still largely manual, limiting throughput and consistency. Here, we present an automated deep learning-based framework for SEM image segmentation that enables precise and efficient identification of lead iodide, perovskite and defect domains across diverse morphologies. Built upon an improved YOLOv8x architecture, our model, named PerovSegNet, incorporates two novel modules: (i) Adaptive Shuffle Dilated Convolution Block, which enhances multi-scale and fine-grained feature extraction through group convolutions and channel mixing; and (ii) Separable Adaptive Downsampling module, which jointly preserves fine-scale textures and large-scale structures for more robust boundary recognition. Trained on an augmented dataset of 10,994 SEM images, PerovSegNet achieves a mean Average Precision of 87.25% with 265.4 Giga Floating Point Operations, outperforming the baseline YOLOv8x-seg by 4.08%, while reducing model size and computational load by 24.43% and 25.22%, respectively. Beyond segmentation, the framework provides quantitative grain-level metrics, such as lead iodide/perovskite area and count, which can serve as reliable indicators of crystallization efficiency and microstructural quality. These capabilities establish PerovSegNet as a scalable tool for real-time process monitoring and data-driven optimization of perovskite thin-film fabrication. The source code is available at: https://github.com/wlyyj/PerovSegNet/tree/master.
☆ Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Developing autonomous agents that effectively interact with Graphical User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Using techniques optimized for small models, we build our 3B Ferret-UI Lite agent by curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool use, and applying reinforcement learning with designed rewards. Ferret-UI Lite achieves competitive performance with other small-scale GUI agents. In GUI grounding, Ferret-UI Lite attains scores of $91.6\%$, $53.3\%$, and $61.2\%$ on the ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI navigation, Ferret-UI Lite achieves success rates of $28.0\%$ on AndroidWorld and $19.8\%$ on OSWorld. We share our methods and lessons learned from developing compact, on-device GUI agents.
☆ OceanGym: A Benchmark Environment for Underwater Embodied Agents
We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility and dynamic ocean currents, which make effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.
comment: Work in progress
☆ GastroViT: A Vision Transformer Based Ensemble Learning Approach for Gastrointestinal Disease Classification with Grad CAM & SHAP Visualization
The gastrointestinal (GI) tract of humans can exhibit a wide variety of aberrant mucosal findings, ranging from mild irritations to extremely fatal illnesses. Prompt identification of gastrointestinal disorders greatly contributes to arresting the progression of the illness and improving therapeutic outcomes. This paper presents an ensemble of pre-trained vision transformers (ViTs) for accurately classifying endoscopic images of the GI tract to categorize gastrointestinal problems and illnesses. ViTs, attention-based neural networks, have revolutionized image recognition by leveraging the transformative power of the transformer architecture, achieving state-of-the-art (SOTA) performance across various visual tasks. The proposed model was evaluated on the publicly available HyperKvasir dataset with 10,662 images of 23 different GI diseases for the purpose of identifying GI tract diseases. We propose an ensemble method that combines the predictions of two pre-trained models, MobileViT_XS and MobileViT_V2_200, which achieved accuracies of 90.57% and 90.48%, respectively. The ensemble model, GastroViT, outperforms all the individual models, with an average precision, recall, F1 score, and accuracy of 69%, 63%, 64%, and 91.98%, respectively, in the first test, which involves 23 classes. It achieves this with only 20 million (M) parameters, without data augmentation, and despite the highly imbalanced dataset. For the second test, with 16 classes, the scores are even higher, with an average precision, recall, F1 score, and accuracy of 87%, 86%, 87%, and 92.70%, respectively. Additionally, the incorporation of explainable AI (XAI) methods such as Grad-CAM (Gradient Weighted Class Activation Mapping) and SHAP (Shapley Additive Explanations) enhances model interpretability, providing valuable insights for reliable GI diagnosis in real-world settings.
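For illustration, a minimal sketch of one common way to ensemble the two ViTs' predictions by averaging softmax probabilities; the paper's exact combination rule is not specified here, so treat this as an assumption.

```python
import torch

def ensemble_predict(models, images):
    """Average softmax probabilities across ensemble members.

    `models` are the two pre-trained ViTs (e.g., MobileViT_XS and
    MobileViT_V2_200); averaging their class probabilities is one common
    ensembling choice -- the paper's exact combination rule may differ.
    """
    probs = None
    with torch.no_grad():
        for model in models:
            p = torch.softmax(model(images), dim=-1)
            probs = p if probs is None else probs + p
    probs /= len(models)
    return probs.argmax(dim=-1), probs
```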
☆ DEPTHOR++: Robust Depth Enhancement from a Real-World Lightweight dToF and RGB Guidance
Depth enhancement, which converts raw dToF signals into dense depth maps using RGB guidance, is crucial for improving depth perception in high-precision tasks such as 3D reconstruction and SLAM. However, existing methods often assume ideal dToF inputs and perfect dToF-RGB alignment, overlooking calibration errors and anomalies, thus limiting real-world applicability. This work systematically analyzes the noise characteristics of real-world lightweight dToF sensors and proposes a practical and novel depth completion framework, DEPTHOR++, which enhances robustness to noisy dToF inputs from three key aspects. First, we introduce a simulation method based on synthetic datasets to generate realistic training samples for robust model training. Second, we propose a learnable-parameter-free anomaly detection mechanism to identify and remove erroneous dToF measurements, preventing misleading propagation during completion. Third, we design a depth completion network tailored to noisy dToF inputs, which integrates RGB images and pre-trained monocular depth estimation priors to improve depth recovery in challenging regions. On the ZJU-L5 dataset and real-world samples, our training strategy significantly boosts existing depth completion models, with our model achieving state-of-the-art performance, improving RMSE and Rel by 22% and 11% on average. On the Mirror3D-NYU dataset, by incorporating the anomaly detection method, our model improves upon the previous SOTA by 37% in mirror regions. On the Hammer dataset, using simulated low-cost dToF data from RealSense L515, our method surpasses the L515 measurements with an average gain of 22%, demonstrating its potential to enable low-cost sensors to outperform higher-end devices. Qualitative results across diverse real-world datasets further validate the effectiveness and generalizability of our approach.
comment: 15 pages, 16 figures
☆ Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation
The rapid advancement of large language models (LLMs) has significantly expanded the capabilities of artificial intelligence across various domains. However, their massive scale and high computational costs render them unsuitable for direct deployment in resource-constrained edge environments. This creates a critical need for high-performance small models that can operate efficiently at the edge. Yet, after pre-training alone, these smaller models often fail to meet the performance requirements of complex tasks. To bridge this gap, we introduce a systematic post-training pipeline that efficiently enhances small model accuracy. Our post-training pipeline consists of curriculum-based supervised fine-tuning (SFT) and offline on-policy knowledge distillation. The resulting instruction-tuned model achieves state-of-the-art performance among billion-parameter models, demonstrating strong generalization under strict hardware constraints while maintaining competitive accuracy across a variety of tasks. This work provides a practical and efficient solution for developing high-performance language models on Ascend edge devices.
comment: 7
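As a sketch of the distillation component, the following shows a standard temperature-scaled KL distillation loss; the paper's offline on-policy variant scores student-generated sequences with the teacher, but the per-token objective typically reduces to a term of this form (an assumption about their exact loss).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Temperature-scaled KL divergence between teacher and student.

    A standard formulation of logit distillation; the offline on-policy
    variant described above scores student-generated sequences with the
    teacher, but the per-token loss is typically a KL term like this one.
    """
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```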
☆ Contrastive Diffusion Guidance for Spatial Inverse Problems
We consider the inverse problem of reconstructing the spatial layout of a place, a home floorplan for example, from a user's movements inside that layout. Direct inversion is ill-posed since many floorplans can explain the same movement trajectories. We adopt a diffusion-based posterior sampler to generate layouts consistent with the measurements. While generative inverse solvers are an area of active research, we find that the forward operator in our problem poses new challenges. The path-planning process inside a floorplan is a non-invertible, non-differentiable function, and causes instability while optimizing using the likelihood score. We break away from existing approaches and reformulate the likelihood score in a smoother embedding space. The embedding space is trained with a contrastive loss which brings compatible floorplans and trajectories close to each other, while pushing mismatched pairs far apart. We show that a surrogate form of the likelihood score in this embedding space is a valid approximation of the true likelihood score, making it possible to steer the denoising process towards the posterior. Across extensive experiments, our model CoGuide produces more consistent floorplans from trajectories, and is more robust than differentiable-planner baselines and guided-diffusion methods.
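A hedged sketch of the surrogate guidance signal: the gradient of a contrastive-embedding similarity with respect to the denoised floorplan estimate, which can be added to the denoising update. The encoder names are hypothetical placeholders for the contrastively trained networks.

```python
import torch

def surrogate_likelihood_grad(x0_hat, trajectory, plan_encoder, traj_encoder):
    """Gradient of a contrastive-embedding similarity w.r.t. the denoised
    floorplan estimate, used to steer the sampler toward the posterior.

    `plan_encoder` / `traj_encoder` stand in for the contrastively trained
    embedding networks (hypothetical names; not the authors' API).
    """
    x0_hat = x0_hat.detach().requires_grad_(True)
    z_plan = plan_encoder(x0_hat)
    z_traj = traj_encoder(trajectory)
    sim = torch.cosine_similarity(z_plan, z_traj, dim=-1).sum()
    (grad,) = torch.autograd.grad(sim, x0_hat)
    return grad  # added to the denoising update, scaled by a guidance weight
```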
☆ CBAM Integrated Attention Driven Model For Betel Leaf Diseases Classification With Explainable AI
Betel leaf is an important crop because of its economic advantages and widespread use. Betel vines are susceptible to a number of illnesses commonly referred to as betel leaf diseases. Plant diseases are the largest threat to food security, and they are challenging to identify in time to prevent potential financial damage. Artificial intelligence can make a significant impact on the betel leaf industry by forecasting disease and thereby supporting yield. This paper presents a lightweight CBAM-CNN model with just 2.13 million parameters (8.13 MB), incorporating CBAM (Convolutional Block Attention Module) to improve feature emphasis without depending on heavy pre-trained networks. The integrated attention mechanism improves the model's capacity to discern minute variations among leaf disease classes by allowing it to adaptively focus on significant spatial and channel-wise information. To ensure class balance and diversity for efficient model training and validation, this work makes use of an enriched dataset of 10,185 images divided into three categories: Healthy Leaf, Leaf Rot, and Leaf Spot. The proposed model achieved a precision of 97%, recall of 94%, F1 score of 95%, and accuracy of 95.58% on the test set, demonstrating strong and balanced classification performance and outperforming traditional pre-trained CNN models. The model's focus regions were visualized and interpreted using Grad-CAM (Gradient-weighted Class Activation Mapping), an explainable AI technique.
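For reference, a compact PyTorch sketch of the standard CBAM formulation (channel attention followed by spatial attention); the paper's exact placement of the module inside its CNN may differ.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention (standard formulation; the placement inside the
    paper's CNN may differ)."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: conv over channel-wise avg and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```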
☆ Zero-Shot Decentralized Federated Learning
CLIP has revolutionized zero-shot learning by enabling task generalization without fine-tuning. While prompting techniques like CoOp and CoCoOp enhance CLIP's adaptability, their effectiveness in Federated Learning (FL) remains an open challenge. Existing federated prompt learning approaches, such as FedCoOp and FedTPG, improve performance but face generalization issues, high communication costs, and reliance on a central server, limiting scalability and privacy. We propose Zero-shot Decentralized Federated Learning (ZeroDFL), a fully decentralized framework that enables zero-shot adaptation across distributed clients without a central coordinator. ZeroDFL employs an iterative prompt-sharing mechanism, allowing clients to optimize and exchange textual prompts to enhance generalization while drastically reducing communication overhead. We validate ZeroDFL on nine diverse image classification datasets, demonstrating that it consistently outperforms--or remains on par with--state-of-the-art federated prompt learning methods. More importantly, ZeroDFL achieves this performance in a fully decentralized setting while reducing communication overhead by 118x compared to FedTPG. These results highlight that our approach not only enhances generalization in federated zero-shot learning but also improves scalability, efficiency, and privacy preservation--paving the way for decentralized adaptation of large vision-language models in real-world applications.
comment: Accepted at International Joint Conference on Neural Networks (IJCNN) 2025. Code available at https://github.com/perceivelab/ZeroDFL
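An illustrative sketch of a decentralized prompt-exchange round, in which each client averages its learned prompt vectors with its neighbors'; ZeroDFL's actual exchange and optimization protocol may differ, so this is an assumption about the mechanism.

```python
import torch

def gossip_round(client_prompts, neighbors):
    """One decentralized exchange round: each client averages its learned
    text-prompt vectors with those of its neighbors (a gossip-style update
    sketched for illustration; the paper's protocol may differ).

    client_prompts: list of [n_ctx, dim] prompt tensors, one per client.
    neighbors: dict mapping client index -> list of neighbor indices.
    """
    updated = []
    for i, prompt in enumerate(client_prompts):
        group = [prompt] + [client_prompts[j] for j in neighbors[i]]
        updated.append(torch.stack(group).mean(dim=0))
    return updated
```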
☆ Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
Indoor scene classification is a critical task in computer vision, with wide-ranging applications, from robotics to sensitive content analysis such as child sexual abuse imagery (CSAI) classification. The problem is particularly challenging due to the intricate relationships between objects and complex spatial layouts. In this work, we propose the Attention over Scene Graphs for Sensitive Content Analysis (ASGRA), a novel framework that operates on structured graph representations instead of raw pixels. By first converting images into Scene Graphs and then employing a Graph Attention Network for inference, ASGRA directly models the interactions between a scene's components. This approach offers two key benefits: (i) inherent explainability via object and relationship identification, and (ii) privacy preservation, enabling model training without direct access to sensitive images. On Places8, we achieve 81.27% balanced accuracy, surpassing image-based methods. Real-world CSAI evaluation with law enforcement yields 74.27% balanced accuracy. Our results establish structured scene representations as a robust paradigm for indoor scene classification and CSAI classification. Code is publicly available at https://github.com/tutuzeraa/ASGRA.
comment: British Machine Vision Conference (BMVC 2025), in the From Scene Understanding to Human Modeling Workshop
☆ Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
We present Stylos, a single-forward 3D Gaussian framework for 3D style transfer that operates on unposed content, from a single image to a multi-view collection, conditioned on a separate reference style image. Stylos synthesizes a stylized 3D Gaussian scene without per-scene optimization or precomputed poses, achieving geometry-aware, view-consistent stylization that generalizes to unseen categories, scenes, and styles. At its core, Stylos adopts a Transformer backbone with two pathways: geometry predictions retain self-attention to preserve geometric fidelity, while style is injected via global cross-attention to enforce visual consistency across views. With the addition of a voxel-based 3D style loss that aligns aggregated scene features to style statistics, Stylos enforces view-consistent stylization while preserving geometry. Experiments across multiple datasets demonstrate that Stylos delivers high-quality zero-shot stylization, highlighting the effectiveness of global style-content coupling, the proposed 3D style loss, and the scalability of our framework from single view to large-scale multi-view settings.
☆ Multi-View Camera System for Variant-Aware Autonomous Vehicle Inspection and Defect Detection
Ensuring that every vehicle leaving a modern production line is built to the correct \emph{variant} specification and is free from visible defects is an increasingly complex challenge. We present the \textbf{Automated Vehicle Inspection (AVI)} platform, an end-to-end, \emph{multi-view} perception system that couples deep-learning detectors with a semantic rule engine to deliver \emph{variant-aware} quality control in real time. Eleven synchronized cameras capture a full 360{\deg} sweep of each vehicle; task-specific views are then routed to specialised modules: YOLOv8 for part detection, EfficientNet for ICE/EV classification, Gemini-1.5 Flash for mascot OCR, and YOLOv8-Seg for scratch-and-dent segmentation. A view-aware fusion layer standardises evidence, while a VIN-conditioned rule engine compares detected features against the expected manifest, producing an interpretable pass/fail report in \(\approx\! 300\,\text{ms}\). On a mixed dataset of Original Equipment Manufacturer (OEM) vehicles spanning four distinct models, plus public scratch/dent images, AVI achieves \textbf{93\%} verification accuracy and \textbf{86\%} defect-detection recall, and sustains \(\mathbf{3.3}\) vehicles/min, surpassing single-view and no-segmentation baselines by large margins. To our knowledge, this is the first publicly reported system that unifies multi-camera feature validation with defect detection in a deployable automotive setting in industry.
☆ Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models
Diffusion models achieve high-quality image generation but face deployment challenges due to their high computational requirements. Although 8-bit outlier-aware post-training quantization (PTQ) matches full-precision performance, extending PTQ to 4 bits remains challenging. Larger step sizes in 4-bit quantization amplify rounding errors in dense, low-magnitude activations, leading to the loss of fine-grained textures. We hypothesize that not only outliers but also small activations are critical for texture fidelity. To this end, we propose Quantization via Residual Truncation and Zero Suppression (QuaRTZ), a 4-bit PTQ scheme for diffusion models. QuaRTZ applies 8-bit min-max quantization for outlier handling and compresses to 4 bits via leading-zero suppression to retain LSBs, thereby preserving texture details. Our approach reduces rounding errors and improves quantization efficiency by balancing outlier preservation and LSB precision. Both theoretical derivations and empirical evaluations demonstrate the generalizability of QuaRTZ across diverse activation distributions. Notably, 4-bit QuaRTZ achieves an FID of 6.98 on FLUX.1-schnell, outperforming SVDQuant, which requires auxiliary FP16 branches.
☆ PRISM: Progressive Rain removal with Integrated State-space Modeling
Image deraining is an essential vision technique that removes rain streaks and water droplets, enhancing clarity for critical vision tasks like autonomous driving. However, current single-scale models struggle with fine-grained recovery and global consistency. To address this challenge, we propose Progressive Rain removal with Integrated State-space Modeling (PRISM), a progressive three-stage framework: Coarse Extraction Network (CENet), Frequency Fusion Network (SFNet), and Refine Network (RNet). Specifically, CENet and SFNet utilize a novel Hybrid Attention UNet (HA-UNet) for multi-scale feature aggregation by combining channel attention with windowed spatial transformers. Moreover, we propose Hybrid Domain Mamba (HDMamba) for SFNet to jointly model spatial semantics and wavelet domain characteristics. Finally, RNet recovers the fine-grained structures via an original-resolution subnetwork. Our model learns high-frequency rain characteristics while preserving structural details and maintaining global context, leading to improved image quality. Our method achieves competitive results on multiple datasets against recent deraining methods.
comment: Preprint. Submitted to an IEEE conference and currently under review. Copyright 2025 IEEE; personal use permitted; all other uses require permission
☆ Image-Difficulty-Aware Evaluation of Super-Resolution Models
Image super-resolution models are commonly evaluated by scores averaged over benchmark test sets. Such averages fail to reflect performance on images of varying difficulty, and they mask the fact that some models generate artifacts on certain difficult images. We propose difficulty-aware performance evaluation procedures to better differentiate between single-image super-resolution (SISR) models that produce visually different results on some images but yield close average performance scores over the entire test set. In particular, we propose two image-difficulty measures, the high-frequency index and the rotation-invariant edge index, to predict those test images where one model would yield significantly better visual results than another, and an evaluation method where these visual differences are reflected in objective measures. Experimental results demonstrate the effectiveness of the proposed image-difficulty measures and evaluation methodology.
comment: Accepted to and presented at ICIP 2025 Workshops
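One plausible realization of the high-frequency index, computed as the fraction of spectral energy above a radial frequency cutoff; the paper's exact definition may differ.

```python
import numpy as np

def high_frequency_index(gray: np.ndarray, cutoff: float = 0.5) -> float:
    """Fraction of spectral energy above a radial frequency cutoff
    (normalized so 1.0 is the Nyquist frequency). One plausible
    realization of a high-frequency difficulty measure; the exact
    definition in the paper may differ.
    """
    spec = np.abs(np.fft.fft2(gray)) ** 2
    fy = np.fft.fftfreq(gray.shape[0])[:, None]   # cycles/pixel, vertical
    fx = np.fft.fftfreq(gray.shape[1])[None, :]   # cycles/pixel, horizontal
    radius = np.sqrt(fy ** 2 + fx ** 2) / 0.5     # normalize by Nyquist
    return float(spec[radius > cutoff].sum() / spec.sum())
```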
☆ MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Image-to-video generation has made remarkable progress with the advancements in diffusion models, yet generating videos with realistic motion remains highly challenging. This difficulty arises from the complexity of accurately modeling motion, which involves capturing physical constraints, object interactions, and domain-specific dynamics that are not easily generalized across diverse scenarios. To address this, we propose MotionRAG, a retrieval-augmented framework that enhances motion realism by adapting motion priors from relevant reference videos through Context-Aware Motion Adaptation (CAMA). The key technical innovations include: (i) a retrieval-based pipeline extracting high-level motion features using a video encoder and specialized resamplers to distill semantic motion representations; (ii) an in-context learning approach for motion adaptation implemented through a causal transformer architecture; (iii) an attention-based motion injection adapter that seamlessly integrates transferred motion features into pretrained video diffusion models. Extensive experiments demonstrate that our method achieves significant improvements across multiple domains and various base models, all with negligible computational overhead during inference. Furthermore, our modular design enables zero-shot generalization to new domains by simply updating the retrieval database without retraining any components. This research enhances the core capability of video generation systems by enabling the effective retrieval and transfer of motion priors, facilitating the synthesis of realistic motion dynamics.
☆ PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer NeurIPS 2025
Video anomaly detection (VAD) is a critical yet challenging task due to the complex and diverse nature of real-world scenarios. Previous methods typically rely on domain-specific training data and manual adjustments when applying to new scenarios and unseen anomaly types, suffering from high labor costs and limited generalization. Therefore, we aim to achieve generalist VAD, i.e., automatically handle any scene and any anomaly types without training data or human involvement. In this work, we propose PANDA, an agentic AI engineer based on MLLMs. Specifically, we achieve PANDA by comprehensively devising four key capabilities: (1) self-adaptive scene-aware strategy planning, (2) goal-driven heuristic reasoning, (3) tool-augmented self-reflection, and (4) self-improving chain-of-memory. Concretely, we develop a self-adaptive scene-aware RAG mechanism, enabling PANDA to retrieve anomaly-specific knowledge for anomaly detection strategy planning. Next, we introduce a latent anomaly-guided heuristic prompt strategy to enhance reasoning precision. Furthermore, PANDA employs a progressive reflection mechanism alongside a suite of context-aware tools to iteratively refine decision-making in complex scenarios. Finally, a chain-of-memory mechanism enables PANDA to leverage historical experiences for continual performance improvement. Extensive experiments demonstrate that PANDA achieves state-of-the-art performance in multi-scenario, open-set, and complex scenario settings without training and manual involvement, validating its generalizable and robust anomaly detection capability. Code is released at https://github.com/showlab/PANDA.
comment: Accepted by NeurIPS 2025
☆ MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
Multimodal retrieval is becoming a crucial component of modern AI applications, yet its evaluation lags behind the demands of more realistic and challenging scenarios. Existing benchmarks primarily probe surface-level semantic correspondence (e.g., object-text matching) while failing to assess the deeper reasoning required to capture complex relationships between visual and textual information. To address this gap, we introduce MR$^2$-Bench, a reasoning-intensive benchmark for multimodal retrieval. MR$^2$-Bench presents the following critical values: 1) all tasks are reasoning-driven, going beyond shallow matching to effectively assess models' capacity for logical, spatial, and causal inference; 2) it features diverse multimodal data, such as natural images, diagrams, and visual puzzles, enabling comprehensive evaluation across content types; 3) it supports complex queries and documents containing multiple images and covers diverse retrieval scenarios, more accurately reflecting real-world applications. Our benchmark contains 1,309 curated queries, derived either from manual collection and annotation or from selective consolidation of public datasets. Despite achieving strong results on existing benchmarks, current state-of-the-art models still struggle on MR$^2$-Bench: for example, the leading Seed1.6-Embedding model attains a Recall@1 of 77.78 on MMEB, but only 9.91 on MR$^2$-Bench. This substantial performance gap highlights both the increased challenge posed by our benchmark and the pressing need for further advances in reasoning-intensive multimodal retrieval. The dataset and evaluation code will be made publicly available at https://github.com/VectorSpaceLab/MR2-Bench.
☆ Go with Your Gut: Scaling Confidence for Autoregressive Image Generation
Test-time scaling (TTS) has demonstrated remarkable success in enhancing large language models, yet its application to next-token prediction (NTP) autoregressive (AR) image generation remains largely uncharted. Existing TTS approaches for visual AR (VAR), which rely on frequent partial decoding and external reward models, are ill-suited for NTP-based image generation due to the inherent incompleteness of intermediate decoding results. To bridge this gap, we introduce ScalingAR, the first TTS framework specifically designed for NTP-based AR image generation that eliminates the need for early decoding or auxiliary rewards. ScalingAR leverages token entropy as a novel signal in visual token generation and operates at two complementary scaling levels: (i) Profile Level, which streams a calibrated confidence state by fusing intrinsic and conditional signals; and (ii) Policy Level, which utilizes this state to adaptively terminate low-confidence trajectories and dynamically schedule guidance for phase-appropriate conditioning strength. Experiments on both general and compositional benchmarks show that ScalingAR (1) improves base models by 12.5% on GenEval and 15.2% on TIIF-Bench, (2) efficiently reduces visual token consumption by 62.0% while outperforming baselines, and (3) successfully enhances robustness, mitigating performance drops by 26.0% in challenging scenarios.
comment: Code: https://github.com/EnVision-Research/ScalingAR
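A simplified sketch of using token entropy as a confidence signal to terminate low-confidence trajectories; ScalingAR's calibrated fusion of intrinsic and conditional signals is richer than this, so the decay/threshold scheme below is an assumption.

```python
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-trajectory entropy of the next-token distribution, [batch]."""
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)

def prune_trajectories(running_conf, logits, decay: float = 0.9,
                       threshold: float = 1.5):
    """Update a smoothed confidence state from token entropy and flag
    trajectories to keep (illustrative; the paper fuses intrinsic and
    conditional signals into a calibrated state, which this simplifies)."""
    conf = decay * running_conf + (1 - decay) * token_entropy(logits)
    keep = conf < threshold  # low entropy -> confident -> keep sampling
    return conf, keep
```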
☆ SDA-PLANNER: State-Dependency Aware Adaptive Planner for Embodied Task Planning
Embodied task planning requires agents to produce executable actions in a closed-loop manner within the environment. With the progressively improving capabilities of LLMs in task decomposition, planning, and generalization, current embodied task planning methods adopt LLM-based architectures. However, existing LLM-based planners remain limited in three aspects: fixed planning paradigms, lack of action-sequence constraints, and error-agnostic design. In this work, we propose SDA-PLANNER, enabling an adaptive planning paradigm with state-dependency-aware and error-aware mechanisms for comprehensive embodied task planning. Specifically, SDA-PLANNER introduces a State-Dependency Graph to explicitly model action preconditions and effects, guiding dynamic revision. To handle execution errors, it employs an error-adaptive replanning strategy consisting of Error Backtrack and Diagnosis and Adaptive Action SubTree Generation, which locally reconstructs the affected portion of the plan based on the current environment state. Experiments demonstrate that SDA-PLANNER consistently outperforms baselines in success rate and goal completion, particularly under diverse error conditions.
☆ TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos
Identifying key moments in long videos is essential for downstream understanding and reasoning tasks. In this paper, we introduce a new problem, Task-oriented Temporal Grounding (ToTG), which aims to localize time intervals containing the necessary information based on a task's natural description. Along with the definition, we also present ToTG Bench, a comprehensive benchmark for evaluating performance on ToTG. ToTG is particularly challenging for traditional approaches due to their limited generalizability and difficulty in handling long videos. To address these challenges, we propose TimeScope, a novel framework built upon progressive reasoning. TimeScope first identifies a coarse-grained temporal scope in the long video that likely contains the key moments, and then refines this scope through fine-grained moment partitioning. Additionally, we curate a high-quality dataset, namely ToTG Pile, to enhance TimeScope's ability to perform progressive temporal grounding effectively. Extensive experiments demonstrate that TimeScope consistently outperforms both existing temporal-grounding methods and popular MLLMs across various settings, highlighting its effectiveness in addressing this new challenging problem.
☆ EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Recently, we have witnessed great progress in image editing with natural language instructions. Several closed-source models like GPT-Image-1, Seedream, and Google-Nano-Banana have shown highly promising progress. However, the open-source models are still lagging. The main bottleneck is the lack of a reliable reward model to scale up high-quality synthetic training data. To address this critical bottleneck, we built \mname, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol. \mname demonstrates superior alignment with human preferences in instruction-guided image editing tasks. Experiments show that \mname achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, ImagenHub, and our new \benchname, outperforming a wide range of VLM-as-judge models. Furthermore, we use \mname to select a high-quality subset from the existing noisy ShareGPT-4o-Image dataset. We train Step1X-Edit on the selected subset, which shows significant improvement over training on the full set. This demonstrates \mname's ability to serve as a reward model to scale up high-quality training data for image editing. Furthermore, its strong alignment suggests potential for advanced applications like reinforcement learning-based post-training and test-time scaling of image editing models. \mname with its training dataset will be released to help the community build more high-quality image editing training datasets.
comment: Work in progress. Project Page: https://tiger-ai-lab.github.io/EditReward
☆ SQUARE: Semantic Query-Augmented Fusion and Efficient Batch Reranking for Training-free Zero-Shot Composed Image Retrieval
Composed Image Retrieval (CIR) aims to retrieve target images that preserve the visual content of a reference image while incorporating user-specified textual modifications. Training-free zero-shot CIR (ZS-CIR) approaches, which require no task-specific training or labeled data, are highly desirable, yet accurately capturing user intent remains challenging. In this paper, we present SQUARE, a novel two-stage training-free framework that leverages Multimodal Large Language Models (MLLMs) to enhance ZS-CIR. In the Semantic Query-Augmented Fusion (SQAF) stage, we enrich the query embedding derived from a vision-language model (VLM) such as CLIP with MLLM-generated captions of the target image. These captions provide high-level semantic guidance, enabling the query to better capture the user's intent and improve global retrieval quality. In the Efficient Batch Reranking (EBR) stage, top-ranked candidates are presented as an image grid with visual marks to the MLLM, which performs joint visual-semantic reasoning across all candidates. Our reranking strategy operates in a single pass and yields more accurate rankings. Experiments show that SQUARE, with its simplicity and effectiveness, delivers strong performance on four standard CIR benchmarks. Notably, it maintains high performance even with lightweight pre-trained models, demonstrating its potential applicability.
comment: 20 pages, 9 figures
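A minimal sketch of the SQAF idea, fusing the VLM query embedding with the embedding of an MLLM-generated target caption; the convex-combination weighting is an assumption about the fusion rule.

```python
import torch

def fuse_query(clip_query_emb: torch.Tensor,
               caption_emb: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """Semantic query-augmented fusion, sketched as a convex combination of
    the VLM query embedding and the embedding of an MLLM-generated target
    caption (the weighting scheme is an assumption; the paper's exact fusion
    rule may differ). Both inputs are assumed L2-normalized CLIP-space vectors.
    """
    fused = alpha * clip_query_emb + (1 - alpha) * caption_emb
    return fused / fused.norm(dim=-1, keepdim=True)

# Retrieval then ranks gallery images by cosine similarity, e.g.:
# scores = gallery_embs @ fuse_query(query_emb, caption_emb)
```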
☆ Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
We introduce a novel formulation for continuous space-time video super-resolution. Instead of decoupling the representation of a video sequence into separate spatial and temporal components and relying on brittle, explicit frame warping for motion compensation, we encode video as a continuous, spatio-temporally coherent 3D Video Fourier Field (VFF). That representation offers three key advantages: (1) it enables cheap, flexible sampling at arbitrary locations in space and time; (2) it is able to simultaneously capture fine spatial detail and smooth temporal dynamics; and (3) it offers the possibility to include an analytical, Gaussian point spread function in the sampling to ensure aliasing-free reconstruction at arbitrary scale. The coefficients of the proposed, Fourier-like sinusoidal basis are predicted with a neural encoder with a large spatio-temporal receptive field, conditioned on the low-resolution input video. Through extensive experiments, we show that our joint modeling substantially improves both spatial and temporal super-resolution and sets a new state of the art for multiple benchmarks: across a wide range of upscaling factors, it delivers sharper and temporally more consistent reconstructions than existing baselines, while being computationally more efficient. Project page: https://v3vsr.github.io.
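To illustrate the representation, a sketch of sampling a 3D Fourier field at arbitrary space-time queries; the shapes and parameterization (per-basis frequencies, amplitudes, phases) are assumptions about the VFF form.

```python
import math
import torch

def sample_vff(coords, freqs, amps, phases):
    """Evaluate a 3D Video Fourier Field at arbitrary (x, y, t) queries.

    coords: [N, 3] query points; freqs: [K, 3] per-basis frequencies;
    amps, phases: [K, C] coefficients predicted by the conditioning encoder.
    Shapes and parameterization are illustrative assumptions about the VFF.
    Returns [N, C] field values (e.g., RGB with C = 3).
    """
    angle = 2 * math.pi * coords @ freqs.T                        # [N, K]
    basis = torch.sin(angle.unsqueeze(-1) + phases.unsqueeze(0))  # [N, K, C]
    return (basis * amps.unsqueeze(0)).sum(dim=1)                 # [N, C]
```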
☆ FLOWER: A Flow-Matching Solver for Inverse Problems
We introduce Flower, a solver for inverse problems. It leverages a pre-trained flow model to produce reconstructions that are consistent with the observed measurements. Flower operates through an iterative procedure over three steps: (i) a flow-consistent destination estimation, where the velocity network predicts a denoised target; (ii) a refinement step that projects the estimated destination onto a feasible set defined by the forward operator; and (iii) a time-progression step that re-projects the refined destination along the flow trajectory. We provide a theoretical analysis that demonstrates how Flower approximates Bayesian posterior sampling, thereby unifying perspectives from plug-and-play methods and generative inverse solvers. On the practical side, Flower achieves state-of-the-art reconstruction quality while using nearly identical hyperparameters across various inverse problems.
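A sketch of one iteration of the three-step procedure, assuming the common linear flow path x_t = (1 - t)·noise + t·data and a gradient-step projection for the refinement; the paper's exact operators may differ.

```python
def flower_step(x_t, t, dt, velocity, A, y, step_size: float = 1.0):
    """One FLOWER-style iteration (a sketch under the linear-path assumption
    x_t = (1 - t) * noise + t * data; the paper's refinement and
    re-projection operators may differ).

    velocity: the pre-trained flow network v(x, t); A: measurement matrix;
    y: observations. Works with torch or numpy arrays.
    """
    # (i) flow-consistent destination estimate (predicted clean sample)
    x1 = x_t + (1.0 - t) * velocity(x_t, t)
    # (ii) refinement: gradient step toward the feasible set {x : A x = y}
    x1 = x1 - step_size * A.T @ (A @ x1 - y)
    # (iii) time progression: re-project the refined destination onto the
    # flow trajectory at time t + dt (implied noise endpoint held fixed)
    return x_t + dt * (x1 - x_t) / (1.0 - t)
```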
☆ Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
Driven by the growing need for Oriented Object Detection (OOD), learning from point annotations under a weakly-supervised framework has emerged as a promising alternative to costly and laborious manual labeling. In this paper, we discuss two deficiencies of existing point-supervised methods: inefficient utilization and poor quality of pseudo labels. Therefore, we present Point2RBox-v3. At the core are two principles: 1) Progressive Label Assignment (PLA). It dynamically estimates instance sizes in a coarse yet intelligent manner at different stages of the training process, enabling the use of label assignment methods. 2) Prior-Guided Dynamic Mask Loss (PGDM-Loss). It enhances the Voronoi Watershed Loss from Point2RBox-v2, overcoming the watershed algorithm's poor performance in sparse scenes and SAM's poor performance in dense scenes. To our knowledge, Point2RBox-v3 is the first model to employ dynamic pseudo labels for label assignment, and it creatively complements the advantages of the SAM model with the watershed algorithm, achieving excellent performance in both sparse and dense scenes. Our solution gives competitive performance, especially in scenarios with large variations in object size or sparse object occurrences: 66.09%/56.86%/41.28%/46.40%/19.60%/45.96% on DOTA-v1.0/DOTA-v1.5/DOTA-v2.0/DIOR/STAR/RSAR.
comment: 19 pages, 5 figures, 6 tables
☆ ProfVLM: A Lightweight Video-Language Model for Multi-View Proficiency Estimation
Existing approaches to skill proficiency estimation often rely on black-box video classifiers, ignoring multi-view context and lacking explainability. We present ProfVLM, a compact vision-language model that reformulates this task as generative reasoning: it jointly predicts skill level and generates expert-like feedback from egocentric and exocentric videos. Central to our method is an AttentiveGatedProjector that dynamically fuses multi-view features, projected from a frozen TimeSformer backbone into a language model tuned for feedback generation. Trained on EgoExo4D with expert commentaries, ProfVLM surpasses state-of-the-art methods while using up to 20x fewer parameters and reducing training time by up to 60%. Our approach not only achieves superior accuracy across diverse activities, but also outputs natural language critiques aligned with performance, offering transparent reasoning. These results highlight generative vision-language modeling as a powerful new direction for skill assessment.
☆ Cat: Post-training quantization error reduction via cluster-based affine transformation
Post-Training Quantization (PTQ) reduces the memory footprint and computational overhead of deep neural networks by converting full-precision (FP) values into quantized and compressed data types. While PTQ is more cost-efficient than Quantization-Aware Training (QAT), it is highly susceptible to accuracy degradation under a low-bit quantization (LQ) regime (e.g., 2-bit). Affine transformation is a classical technique used to reduce the discrepancy between the information processed by a quantized model and that processed by its full-precision counterpart; however, we find that using plain affine transformation, which applies a uniform affine parameter set for all outputs, worsens the results in low-bit PTQ. To address this, we propose Cluster-based Affine Transformation (CAT), an error-reduction framework that employs cluster-specific parameters to align LQ outputs with FP counterparts. CAT refines LQ outputs with only a negligible number of additional parameters, without requiring fine-tuning of the model or quantization parameters. We further introduce a novel PTQ framework integrated with CAT. Experiments on ImageNet-1K show that this framework consistently outperforms prior PTQ methods across diverse architectures and LQ settings, achieving up to 53.18% Top-1 accuracy on W2A2 ResNet-18. Moreover, CAT enhances existing PTQ baselines by more than 3% when used as a plug-in. We plan to release our implementation alongside the publication of this paper.
comment: 29 pages, 20 figures
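A minimal sketch of the CAT idea: cluster calibration outputs, then fit per-cluster scale/shift parameters by least squares against full-precision outputs; the clustering features and solver used in the paper may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_cat(lq_out, fp_out, n_clusters: int = 8):
    """Fit cluster-specific affine corrections mapping low-bit-quantized
    outputs onto their full-precision counterparts (a sketch of the idea;
    the paper's clustering features and solver may differ).

    lq_out, fp_out: [N, D] matched calibration outputs.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(lq_out)
    params = []
    for k in range(n_clusters):
        m = km.labels_ == k
        if not m.any():
            params.append((1.0, 0.0))  # identity for empty clusters
            continue
        x, target = lq_out[m].ravel(), fp_out[m].ravel()
        X = np.stack([x, np.ones_like(x)], axis=1)
        scale, shift = np.linalg.lstsq(X, target, rcond=None)[0]
        params.append((scale, shift))
    return km, params

def apply_cat(km, params, lq_out):
    """Refine new low-bit outputs with their cluster's affine parameters."""
    labels = km.predict(lq_out)
    out = lq_out.astype(np.float64).copy()
    for k, (scale, shift) in enumerate(params):
        out[labels == k] = scale * lq_out[labels == k] + shift
    return out
```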
☆ PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection
The rapid rise of synthetic media has made deepfake detection a critical challenge for online safety and trust. Progress remains constrained by the scarcity of large, high-quality datasets. Although multimodal large language models (LLMs) exhibit strong reasoning capabilities, their performance on deepfake detection is poor, often producing explanations that are misaligned with visual evidence or hallucinatory. To address this limitation, we introduce a reasoning-annotated dataset for deepfake detection and propose Paragraph-level Relative Policy Optimization (PRPO), a reinforcement learning algorithm that aligns LLM reasoning with image content at the paragraph level. Experiments show that PRPO improves detection accuracy by a wide margin and achieves the highest reasoning score of 4.55/5.0. Ablation studies further demonstrate that PRPO significantly outperforms GRPO under test-time conditions. These results underscore the importance of grounding multimodal reasoning in visual evidence to enable more reliable and interpretable deepfake detection.
☆ ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
Long-horizon embodied planning is challenging because the world does not change only through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic causal-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.
comment: 41 pages. The last two authors contributed equally in co-advising
☆ Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA
Latent Action Models (LAMs) enable Vision-Language-Action (VLA) systems to learn semantic action representations from large-scale unannotated data. Yet, we identify two bottlenecks of LAMs: 1) the commonly adopted end-to-end trained image encoder suffers from poor spatial understanding; 2) LAMs can be fragile when input frames are distant, leading to limited temporal perception. Such factors inevitably hinder stable and clear action modeling. To this end, we propose Farsighted-LAM, a latent action framework with geometry-aware spatial encoding and multi-scale temporal modeling, capturing structural priors and dynamic motion patterns from consecutive frames. We further propose SSM-VLA, an end-to-end VLA framework built upon Farsighted-LAM, which integrates structured perception with a visual Chain-of-Thought module to explicitly reason about environmental dynamics, enhancing decision consistency and interpretability. We validate SSM-VLA on multiple VLA tasks in both simulation and real-world settings, and achieve state-of-the-art performance. Our results demonstrate that our strategy of combining geometry-aware modeling, temporal coherence, and explicit reasoning is effective in enhancing the robustness and generalizability of embodied intelligence.
☆ Interpret, prune and distill Donut: towards lightweight VLMs for VQA on document
Recent advances in Visually-rich Document Understanding rely on large Vision-Language Models like Donut, which perform document-level Visual Question Answering without Optical Character Recognition. Despite their effectiveness, these models are too costly for real-time or resource-constrained applications. We investigate model compression through knowledge distillation, training compact student models from a larger teacher. We leverage mechanistic interpretability to drive student architecture design within this framework. By analyzing internal computations, we identify essential subcomponents to retain, while having a clear view of which subcomponents should be approximated, skipped, or reparametrized based on their function. This approach yields Donut-MINT (Mechanistic Interpretability-based Network Trimming), a pruned Donut variant that reduces inference time and memory usage while maintaining strong performance on DocVQA, a standard benchmark for document Visual Question Answering. Our method reframes compression as circuit discovery, bridging interpretability research and practical Vision-Language Model deployment.
comment: Accepted at Workshop on Machine Learning in Document Analysis and Recognition (ICDAR WML 2025), Wuhan, China
☆ 3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
Creating personalized 3D animations with precise control and realistic head motions remains challenging for current speech-driven 3D facial animation methods. Editing these animations is especially complex and time-consuming, requires precise control, and is typically handled by highly skilled animators. Most existing works focus on controlling the style or emotion of the synthesized animation and cannot edit or regenerate parts of an input animation. They also overlook the fact that multiple plausible lip and head movements can match the same audio input. To address these challenges, we present 3DiFACE, a novel method for holistic speech-driven 3D facial animation. Our approach produces diverse plausible lip and head motions for a single audio input and allows for editing via keyframing and interpolation. Specifically, we propose a fully-convolutional diffusion model that can leverage the viseme-level diversity in our training corpus. Additionally, we employ speaking-style personalization and a novel sparsely-guided motion diffusion to enable precise control and editing. Through quantitative and qualitative evaluations, we demonstrate that our method is capable of generating and editing diverse holistic 3D facial animations given a single audio input, with control between high fidelity and diversity. Code and models are available here: https://balamuruganthambiraja.github.io/3DiFACE
☆ IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance ICCV 2025
Ensuring precise multimodal alignment between diffusion-generated images and input prompts has been a long-standing challenge. Earlier works finetune diffusion weight using high-quality preference data, which tends to be limited and difficult to scale up. Recent editing-based methods further refine local regions of generated images but may compromise overall image quality. In this work, we propose Implicit Multimodal Guidance (IMG), a novel re-generation-based multimodal alignment framework that requires no extra data or editing operations. Specifically, given a generated image and its prompt, IMG a) utilizes a multimodal large language model (MLLM) to identify misalignments; b) introduces an Implicit Aligner that manipulates diffusion conditioning features to reduce misalignments and enable re-generation; and c) formulates the re-alignment goal into a trainable objective, namely Iteratively Updated Preference Objective. Extensive qualitative and quantitative evaluations on SDXL, SDXL-DPO, and FLUX show that IMG outperforms existing alignment methods. Furthermore, IMG acts as a flexible plug-and-play adapter, seamlessly enhancing prior finetuning-based alignment methods. Our code will be available at https://github.com/SHI-Labs/IMG-Multimodal-Diffusion-Alignment.
comment: ICCV 2025
☆ Generalized Fine-Grained Category Discovery with Multi-Granularity Conceptual Experts
Generalized Category Discovery (GCD) is an open-world problem that clusters unlabeled data by leveraging knowledge from partially labeled categories. A key challenge is that unlabeled data may contain both known and novel categories. Existing approaches suffer from two main limitations. First, they fail to exploit multi-granularity conceptual information in visual data, which limits representation quality. Second, most assume that the number of unlabeled categories is known during training, which is impractical in real-world scenarios. To address these issues, we propose a Multi-Granularity Conceptual Experts (MGCE) framework that adaptively mines visual concepts and integrates multi-granularity knowledge for accurate category discovery. MGCE consists of two modules: (1) Dynamic Conceptual Contrastive Learning (DCCL), which alternates between concept mining and dual-level representation learning to jointly optimize feature learning and category discovery; and (2) Multi-Granularity Experts Collaborative Learning (MECL), which extends the single-expert paradigm by introducing additional experts at different granularities and by employing a concept alignment matrix for effective cross-expert collaboration. Importantly, MGCE can automatically estimate the number of categories in unlabeled data, making it suitable for practical open-world settings. Extensive experiments on nine fine-grained visual recognition benchmarks demonstrate that MGCE achieves state-of-the-art results, particularly in novel-class accuracy. Notably, even without prior knowledge of category numbers, MGCE outperforms parametric approaches that require knowing the exact number of categories, with an average improvement of 3.6\%. Code is available at https://github.com/HaiyangZheng/MGCE.
☆ An Experimental Study on Generating Plausible Textual Explanations for Video Summarization
In this paper, we present our experimental study on generating plausible textual explanations for the outcomes of video summarization. For the needs of this study, we extend an existing framework for multigranular explanation of video summarization by integrating a SOTA Large Multimodal Model (LLaVA-OneVision) and prompting it to produce natural language descriptions of the obtained visual explanations. We then focus on one of the most desired characteristics of explainable AI, the plausibility of the obtained explanations, which concerns their alignment with human reasoning and expectations. Using the extended framework, we propose an approach for evaluating the plausibility of visual explanations by quantifying the semantic overlap between their textual descriptions and the textual descriptions of the corresponding video summaries, with the help of two methods for creating sentence embeddings (SBERT, SimCSE). Based on the extended framework and the proposed plausibility evaluation approach, we conduct an experimental study using a SOTA method (CA-SUM) and two datasets (SumMe, TVSum) for video summarization, to examine whether the more faithful explanations are also the more plausible ones, and to identify the most appropriate approach for generating plausible textual explanations for video summarization.
comment: IEEE CBMI 2025. This is the authors' accepted version. The final publication is available at https://ieeexplore.ieee.org/
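The semantic-overlap measurement can be sketched directly with sentence embeddings; the model name below is one common SBERT checkpoint chosen for illustration.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Quantify plausibility as the semantic overlap between the textual
# description of a visual explanation and that of the video summary.
# The checkpoint is illustrative; the paper evaluates SBERT and SimCSE.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_overlap(explanation_text: str, summary_text: str) -> float:
    emb = model.encode([explanation_text, summary_text],
                       normalize_embeddings=True)
    return float(np.dot(emb[0], emb[1]))  # cosine similarity of unit vectors
```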
☆ Beyond Pixels: Efficient Dataset Distillation via Sparse Gaussian Representation
Dataset distillation has emerged as a promising paradigm that synthesizes compact, informative datasets capable of retaining the knowledge of large-scale counterparts, thereby addressing the substantial computational and storage burdens of modern model training. Conventional approaches typically rely on dense pixel-level representations, which introduce redundancy and are difficult to scale up. In this work, we propose GSDD, a novel and efficient sparse representation for dataset distillation based on 2D Gaussians. Instead of representing all pixels equally, GSDD encodes critical discriminative information in a distilled image using only a small number of Gaussian primitives. This sparse representation could improve dataset diversity under the same storage budget, enhancing coverage of difficult samples and boosting distillation performance. To ensure both efficiency and scalability, we adapt CUDA-based splatting operators for parallel inference and training, enabling high-quality rendering with minimal computational and memory overhead. Our method is simple yet effective, broadly applicable to different distillation pipelines, and highly scalable. Experiments show that GSDD achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet subsets, while keeping encoding and decoding costs low. Our code is available at https://github.com/j-cyoung/GSDatasetDistillation.
comment: 19 pages; Code is available on https://github.com/j-cyoung/GSDatasetDistillation
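For intuition, a sketch of rendering a distilled image from sparse 2D Gaussian primitives; this axis-aligned NumPy version is illustrative only -- GSDD uses CUDA splatting with general covariances.

```python
import numpy as np

def render_gaussians(means, sigmas, colors, h: int, w: int) -> np.ndarray:
    """Render an image from sparse 2D Gaussian primitives.

    means: [K, 2] centers in pixel coords; sigmas: [K, 2] per-axis std devs
    (axis-aligned for simplicity); colors: [K, 3] per-primitive RGB
    amplitudes. All shapes are illustrative assumptions.
    """
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)
    img = np.zeros((h, w, 3), dtype=np.float32)
    for (cy, cx), (sy, sx), rgb in zip(means, sigmas, colors):
        weight = np.exp(-0.5 * (((yy - cy) / sy) ** 2 + ((xx - cx) / sx) ** 2))
        img += weight[..., None] * rgb  # accumulate weighted color splats
    return img.clip(0.0, 1.0)
```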
☆ TSalV360: A Method and Dataset for Text-driven Saliency Detection in 360-Degrees Videos
In this paper, we deal with the task of text-driven saliency detection in 360-degrees videos. For this, we introduce the TSV360 dataset, which includes 16,000 triplets of ERP frames, textual descriptions of salient objects/events in these frames, and the associated ground-truth saliency maps. We then extend and adapt a SOTA visual-based approach for 360-degrees video saliency detection, and develop the TSalV360 method that takes into account a user-provided text description of the desired objects and/or events. This method leverages a SOTA vision-language model for data representation and integrates a similarity estimation module and a viewport spatio-temporal cross-attention mechanism, to discover dependencies between the different data modalities. Quantitative and qualitative evaluations using the TSV360 dataset showed the competitiveness of TSalV360 compared to a SOTA visual-based approach and demonstrated its ability to perform customized text-driven saliency detection in 360-degrees videos.
comment: IEEE CBMI 2025. This is the authors' accepted version. The final publication is available at https://ieeexplore.ieee.org/
☆ Optimizing Indoor Environmental Quality in Smart Buildings Using Deep Learning
Ensuring optimal Indoor Environmental Quality (IEQ) is vital for occupant health and productivity, yet it often comes at a high energy cost in conventional Heating, Ventilation, and Air Conditioning (HVAC) systems. This paper proposes a deep-learning-driven approach to proactively manage IEQ parameters, specifically CO2 concentration, temperature, and humidity, while balancing building energy efficiency. Leveraging the ROBOD dataset collected from a net-zero energy academic building, we benchmark three architectures--Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and a hybrid Convolutional Neural Network LSTM (CNN-LSTM)--to forecast IEQ variables across various time horizons. Our results show that GRU achieves the best short-term prediction accuracy with lower computational overhead, whereas CNN-LSTM excels at extracting dominant features for extended forecasting windows. Meanwhile, LSTM offers robust long-range temporal modeling. The comparative analysis highlights that prediction reliability depends on data resolution, sensor placement, and fluctuating occupancy conditions. These findings provide actionable insights for intelligent Building Management Systems (BMS) to implement predictive HVAC control, thereby reducing energy consumption and enhancing occupant comfort in real-world building operations.
comment: 10 pages, 4 figures, 1 table. Accepted and presented at the 5th International Conference on Digital Technologies and Applications (ICDTA 2025), April 17-18, 2025, Al Akhawayn University, Ifrane, Morocco
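As a rough illustration of the models benchmarked here, the following PyTorch sketch defines a GRU forecaster over windows of CO2, temperature, and humidity readings. Layer sizes, window length, and horizon are assumptions for illustration, not the paper's configuration.

    import torch
    import torch.nn as nn

    class GRUForecaster(nn.Module):
        """Predict the next IEQ readings (CO2, temperature, humidity)
        from a window of past sensor readings."""
        def __init__(self, n_features=3, hidden=64, horizon=1):
            super().__init__()
            self.gru = nn.GRU(n_features, hidden, num_layers=2, batch_first=True)
            self.head = nn.Linear(hidden, n_features * horizon)
            self.horizon, self.n_features = horizon, n_features

        def forward(self, x):                  # x: (batch, window, n_features)
            out, _ = self.gru(x)
            y = self.head(out[:, -1])          # last hidden state only
            return y.view(-1, self.horizon, self.n_features)

    model = GRUForecaster()
    window = torch.randn(16, 48, 3)            # e.g. 48 past time steps
    pred = model(window)                       # (16, 1, 3)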
☆ AttriGen: Automated Multi-Attribute Annotation for Blood Cell Datasets
We introduce AttriGen, a novel framework for automated, fine-grained multi-attribute annotation in computer vision, with a particular focus on cell microscopy, where multi-attribute classification remains underrepresented compared to traditional cell type categorization. Using two complementary datasets, the Peripheral Blood Cell (PBC) dataset containing eight distinct cell types and the WBC Attribute Dataset (WBCAtt) containing their corresponding 11 morphological attributes, we propose a dual-model architecture that combines a CNN for cell type classification with a Vision Transformer (ViT) for multi-attribute classification, achieving a new benchmark of 94.62% accuracy. Our experiments demonstrate that AttriGen significantly enhances model interpretability and offers substantial time and cost efficiency relative to conventional full-scale human annotation. Our framework thus establishes a new paradigm that can be extended to other computer vision classification tasks by effectively automating the expansion of multi-attribute labels.
comment: 6 pages, 4 figures, 3 tables. Accepted at the 12th International Conference on Wireless Networks and Mobile Communications 2025 (WINCOM 2025)
☆ Neighbor-aware informal settlement mapping with graph convolutional networks
Mapping informal settlements is crucial for addressing challenges related to urban planning, public health, and infrastructure in rapidly growing cities. Geospatial machine learning has emerged as a key tool for detecting and mapping these areas from remote sensing data. However, existing approaches often treat spatial units independently, neglecting the relational structure of the urban fabric. We propose a graph-based framework that explicitly incorporates local geographical context into the classification process. Each spatial unit (cell) is embedded in a graph structure along with its adjacent neighbors, and a lightweight Graph Convolutional Network (GCN) is trained to classify whether the central cell belongs to an informal settlement. Experiments are conducted on a case study in Rio de Janeiro using spatial cross-validation across five distinct zones, ensuring robustness and generalizability across heterogeneous urban landscapes. Our method outperforms standard baselines, improving Kappa coefficient by 17 points over individual cell classification. We also show that graph-based modeling surpasses simple feature concatenation of neighboring cells, demonstrating the benefit of encoding spatial structure for urban scene understanding.
comment: 10 pages, 3 figures, 2 tables. Accepted at the ECML PKDD 2025 Workshop on Machine Learning for Earth Observation
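The neighbor-aware idea can be sketched in a few lines of PyTorch: embed the cell and its adjacent neighbors in a small star graph and classify only the central node. This is a hand-rolled GCN for illustration, not the authors' implementation; the feature dimension and graph layout are invented.

    import torch
    import torch.nn as nn

    class TinyGCN(nn.Module):
        """Two-layer GCN that classifies the central cell of a small
        neighborhood graph (center plus adjacent cells)."""
        def __init__(self, in_dim, hidden=32):
            super().__init__()
            self.w1 = nn.Linear(in_dim, hidden)
            self.w2 = nn.Linear(hidden, 2)     # informal settlement: yes / no

        def forward(self, x, adj):             # x: (N, in_dim), adj: (N, N)
            # Symmetric normalization: D^-1/2 (A + I) D^-1/2
            a = adj + torch.eye(adj.size(0))
            d = a.sum(1).rsqrt()
            a = d[:, None] * a * d[None, :]
            h = torch.relu(self.w1(a @ x))
            return self.w2(a @ h)[0]           # logits for central node 0

    # Star graph: node 0 is the central cell, nodes 1..8 its neighbors.
    adj = torch.zeros(9, 9)
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    feats = torch.randn(9, 16)                 # per-cell remote sensing features
    logits = TinyGCN(16)(feats, adj)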
☆ Beyond Overall Accuracy: Pose- and Occlusion-driven Fairness Analysis in Pedestrian Detection for Autonomous Driving
Pedestrian detection plays a critical role in autonomous driving (AD), where ensuring safety and reliability is important. While many detection models aim to reduce miss-rates and handle challenges such as occlusion and long-range recognition, fairness remains an underexplored yet equally important concern. In this work, we systematically investigate how variations in the pedestrian pose -- including leg status, elbow status, and body orientation -- as well as individual joint occlusions, affect detection performance. We evaluate five pedestrian-specific detectors (F2DNet, MGAN, ALFNet, CSP, and Cascade R-CNN) alongside three general-purpose models (YOLOv12 variants) on the EuroCity Persons Dense Pose (ECP-DP) dataset. Fairness is quantified using the Equal Opportunity Difference (EOD) metric across various confidence thresholds. To assess statistical significance and robustness, we apply the Z-test. Our findings highlight biases against pedestrians with parallel legs, straight elbows, and lateral views. Occlusion of lower body joints has a more negative impact on the detection rate compared to the upper body and head. Cascade R-CNN achieves the lowest overall miss-rate and exhibits the smallest bias across all attributes. To the best of our knowledge, this is the first comprehensive pose- and occlusion-aware fairness evaluation in pedestrian detection for AD.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
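The EOD metric used above is simple to state: the gap in true-positive rate (here, detection rate) between two attribute groups at a fixed confidence threshold. A minimal sketch with invented toy data:

    import numpy as np

    def equal_opportunity_difference(y_true, y_pred, group):
        """EOD: true-positive-rate gap between two attribute groups
        (e.g. lateral-view vs. frontal-view pedestrians).

        y_true, y_pred: binary arrays (pedestrian present / detected)
        group:          binary group membership per sample
        """
        def tpr(mask):
            pos = (y_true == 1) & mask
            return (y_pred[pos] == 1).mean() if pos.any() else float('nan')
        return tpr(group == 0) - tpr(group == 1)

    y_true = np.array([1, 1, 1, 1, 1, 1])
    y_pred = np.array([1, 1, 0, 1, 0, 0])
    group = np.array([0, 0, 0, 1, 1, 1])
    print(equal_opportunity_difference(y_true, y_pred, group))  # 2/3 - 1/3 = 0.333...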
☆ Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) have demonstrated significant advances in visual understanding tasks. However, their capacity to comprehend human-centric scenes has rarely been explored, primarily due to the absence of comprehensive evaluation benchmarks that account for both granular, human-oriented perception and higher-dimensional causal reasoning. Such high-quality evaluation benchmarks face tough obstacles, given the physical complexity of the human body and the difficulty of annotating granular structures. In this paper, we propose Human-MME, a curated benchmark designed to provide a more holistic evaluation of MLLMs in human-centric scene understanding. Compared with other existing benchmarks, our work provides three key features: 1. Diversity in human scenes, spanning 4 primary visual domains with 15 secondary domains and 43 sub-fields to ensure broad scenario coverage. 2. Progressive and diverse evaluation dimensions, evaluating human-centered activities progressively, from granular human-oriented perception to higher-dimensional reasoning, across eight dimensions with 19,945 real-world image-question pairs and an evaluation suite. 3. High-quality annotations with rich data paradigms, built on an automated annotation pipeline and a human-annotation platform that supports rigorous manual labeling for precise and reliable model assessment. Our benchmark extends single-target understanding to multi-person and multi-image mutual understanding by constructing choice, short-answer, grounding, ranking, and judgment question components, as well as complex questions combining them. The extensive experiments on 17 state-of-the-art MLLMs effectively expose their limitations and guide future MLLM research toward better human-centric image understanding. All data and code are available at https://github.com/Yuan-Hou/Human-MME.
☆ Towards Continual Expansion of Data Coverage: Automatic Text-guided Edge-case Synthesis
The performance of deep neural networks is strongly influenced by the quality of their training data. However, mitigating dataset bias by manually curating challenging edge cases remains a major bottleneck. To address this, we propose an automated pipeline for text-guided edge-case synthesis. Our approach employs a Large Language Model, fine-tuned via preference learning, to rephrase image captions into diverse textual prompts that steer a Text-to-Image model toward generating difficult visual scenarios. Evaluated on the FishEye8K object detection benchmark, our method achieves superior robustness, surpassing both naive augmentation and manually engineered prompts. This work establishes a scalable framework that shifts data curation from manual effort to automated, targeted synthesis, offering a promising direction for developing more reliable and continuously improving AI systems. Code is available at https://github.com/gokyeongryeol/ATES.
comment: 17 pages, 6 figures
☆ EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting
Transformer-based models have significantly advanced time series forecasting, with patch-based input strategies offering efficiency and improved long-horizon modeling. Yet, existing approaches rely on temporally-agnostic patch construction, where arbitrary starting positions and fixed lengths fracture temporal coherence by splitting natural transitions across boundaries. This naive segmentation often disrupts short-term dependencies and weakens representation learning. In response, we propose EntroPE (Entropy-Guided Dynamic Patch Encoder), a novel, temporally informed framework that detects transition points via conditional entropy and dynamically places patch boundaries. This preserves temporal structure while retaining the computational benefits of patching. EntroPE consists of two key modules: an Entropy-based Dynamic Patcher (EDP) that applies information-theoretic criteria to locate natural temporal shifts and determine patch boundaries, and an Adaptive Patch Encoder (APE) that employs pooling and cross-attention to capture intra-patch dependencies and produce fixed-size latent representations. These embeddings are then processed by a global transformer to model inter-patch dynamics. Experiments across long-term forecasting benchmarks demonstrate that EntroPE improves both accuracy and efficiency, establishing entropy-guided dynamic patching as a promising new paradigm for time series modeling. Code is available at: https://github.com/Sachithx/EntroPE.
comment: Preprint. Under Review
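A rough sketch of what entropy-guided boundary placement can look like: discretize the series, estimate the conditional entropy of the next symbol given the current one, and cut a patch wherever that entropy spikes. The actual EDP criterion may differ; the bin count and threshold below are illustrative.

    import numpy as np

    def entropy_boundaries(series, n_bins=8, thresh=1.5):
        """Place patch boundaries where the conditional entropy
        H(x_t | x_{t-1}) of the discretized series is high."""
        edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
        sym = np.digitize(series, edges)              # symbols 0..n_bins-1
        counts = np.ones((n_bins, n_bins))            # Laplace smoothing
        for a, b in zip(sym[:-1], sym[1:]):
            counts[a, b] += 1
        probs = counts / counts.sum(1, keepdims=True)
        cond_H = -(probs * np.log2(probs)).sum(1)     # entropy per current symbol
        return [t for t in range(1, len(sym)) if cond_H[sym[t - 1]] > thresh]

    t = np.linspace(0, 10, 500)
    series = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(500)
    print(entropy_boundaries(series)[:10])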
☆ Ordinal Label-Distribution Learning with Constrained Asymmetric Priors for Imbalanced Retinal Grading NeurIPS 2025
Diabetic retinopathy grading is inherently ordinal and long-tailed, with minority stages being scarce, heterogeneous, and clinically critical to detect accurately. Conventional methods often rely on isotropic Gaussian priors and symmetric loss functions, misaligning latent representations with the task's asymmetric nature. We propose the Constrained Asymmetric Prior Wasserstein Autoencoder (CAP-WAE), a novel framework that addresses these challenges through three key innovations. Our approach employs a Wasserstein Autoencoder (WAE) that aligns its aggregate posterior with an asymmetric prior, preserving the heavy-tailed and skewed structure of minority classes. The latent space is further structured by a Margin-Aware Orthogonality and Compactness (MAOC) loss to ensure grade-ordered separability. At the supervision level, we introduce a direction-aware ordinal loss, where a lightweight head predicts asymmetric dispersions to generate soft labels that reflect clinical priorities by penalizing under-grading more severely. Stabilized by an adaptive multi-task weighting scheme, our end-to-end model requires minimal tuning. Across public DR benchmarks, CAP-WAE consistently achieves state-of-the-art Quadratic Weighted Kappa, accuracy, and macro-F1, surpassing both ordinal classification and latent generative baselines. t-SNE visualizations further reveal that our method reshapes the latent manifold into compact, grade-ordered clusters with reduced overlap.
comment: Accepted at 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: The Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance
☆ EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Subject-driven generation is a critical task in creative AI, yet current state-of-the-art methods present a stark trade-off. They either rely on computationally expensive, per-subject fine-tuning, sacrificing efficiency and zero-shot capability, or employ feed-forward architectures built on diffusion models, which are inherently plagued by slow inference speeds. Visual Auto-Regressive (VAR) models are renowned for their rapid sampling speeds and strong generative quality, making them an ideal yet underexplored foundation for resolving this tension. To bridge this gap, we introduce EchoGen, a pioneering framework that empowers VAR models with subject-driven generation capabilities. The core design of EchoGen is an effective dual-path injection strategy that disentangles a subject's high-level semantic identity from its low-level fine-grained details, enabling enhanced controllability and fidelity. We employ a semantic encoder to extract the subject's abstract identity, which is injected through decoupled cross-attention to guide the overall composition. Concurrently, a content encoder captures intricate visual details, which are integrated via a multi-modal attention mechanism to ensure high-fidelity texture and structural preservation. To the best of our knowledge, EchoGen is the first feed-forward subject-driven framework built upon VAR models. Both quantitative and qualitative results substantiate our design, demonstrating that EchoGen achieves subject fidelity and image quality comparable to state-of-the-art diffusion-based methods with significantly lower sampling latency. Code and models will be released soon.
☆ EVODiff: Entropy-aware Variance Optimized Diffusion Inference NeurIPS 2025
Diffusion models (DMs) excel in image generation, but suffer from slow inference and training-inference discrepancies. Although gradient-based solvers like DPM-Solver accelerate the denoising inference, they lack theoretical foundations in information transmission efficiency. In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. This principle leads to our key insights into the inference processes: (1) data prediction parameterization outperforms its noise counterpart, and (2) optimizing conditional variance offers a reference-free way to minimize both transition and reconstruction errors. Based on these insights, we propose an entropy-aware variance optimized method for the generative process of DMs, called EVODiff, which systematically reduces uncertainty by optimizing conditional entropy during denoising. Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts. Code is available at https://github.com/ShiguiLi/EVODiff.
comment: NeurIPS 2025, 40 pages, 14 figures
☆ Text-to-Scene with Large Reasoning Models
Prompt-driven scene synthesis allows users to generate complete 3D environments from textual descriptions. Current text-to-scene methods often struggle with complex geometries and object transformations, and tend to show weak adherence to complex instructions. We address these limitations by introducing Reason-3D, a text-to-scene model powered by large reasoning models (LRMs). Reason-3D integrates object retrieval using captions covering physical, functional, and contextual attributes. Reason-3D then places the selected objects based on implicit and explicit layout constraints, and refines their positions with collision-aware spatial reasoning. Evaluated on instructions ranging from simple to complex indoor configurations, Reason-3D significantly outperforms previous methods in human-rated visual fidelity, adherence to constraints, and asset retrieval quality. Beyond its contribution to the field of text-to-scene generation, our work showcases the advanced spatial reasoning abilities of modern LRMs. Additionally, we release the codebase to further the research in object retrieval and placement with LRMs.
☆ Predicting Penalty Kick Direction Using Multi-Modal Deep Learning with Pose-Guided Attention
Penalty kicks often decide championships, yet goalkeepers must anticipate the kicker's intent from subtle biomechanical cues within a very short time window. This study introduces a real-time, multi-modal deep learning framework to predict the direction of a penalty kick (left, middle, or right) before ball contact. The model uses a dual-branch architecture: a MobileNetV2-based CNN extracts spatial features from RGB frames, while 2D keypoints are processed by an LSTM network with attention mechanisms. Pose-derived keypoints further guide visual focus toward task-relevant regions. A distance-based thresholding method segments input sequences immediately before ball contact, ensuring consistent input across diverse footage. A custom dataset of 755 penalty kick events was created from real match videos, with frame-level annotations for object detection, shooter keypoints, and final ball placement. The model achieved 89% accuracy on a held-out test set, outperforming visual-only and pose-only baselines by 14-22%. With an inference time of 22 milliseconds, the lightweight and interpretable design makes it suitable for goalkeeper training, tactical analysis, and real-time game analytics.
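A hedged PyTorch sketch of such a dual-branch design, fusing MobileNetV2 features from the last pre-contact frame with an attention-weighted LSTM over the pose-keypoint sequence. The joint count, hidden sizes, and fusion head are assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn
    from torchvision.models import mobilenet_v2

    class KickPredictor(nn.Module):
        """Fuse visual features with an LSTM over 2D pose keypoints
        to predict kick direction (left / middle / right)."""
        def __init__(self, n_joints=17, hidden=128):
            super().__init__()
            backbone = mobilenet_v2(weights=None)
            self.cnn = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
            self.lstm = nn.LSTM(n_joints * 2, hidden, batch_first=True)
            self.attn = nn.Linear(hidden, 1)
            self.head = nn.Linear(1280 + hidden, 3)

        def forward(self, frame, keypoints):
            # frame: (B, 3, H, W); keypoints: (B, T, n_joints * 2)
            v = self.cnn(frame).flatten(1)             # (B, 1280)
            h, _ = self.lstm(keypoints)                # (B, T, hidden)
            w = torch.softmax(self.attn(h), dim=1)     # temporal attention
            p = (w * h).sum(1)                         # (B, hidden)
            return self.head(torch.cat([v, p], dim=1))

    logits = KickPredictor()(torch.randn(2, 3, 224, 224), torch.randn(2, 30, 34))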
☆ EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models
Self-supervised models have recently achieved notable advancements, particularly in the domain of semantic occupancy prediction. These models utilize sophisticated loss computation strategies to compensate for the absence of ground-truth labels. For instance, techniques such as novel view synthesis, cross-view rendering, and depth estimation have been explored to address the issue of semantic and depth ambiguity. However, such techniques typically incur high computational costs and memory usage during the training stage, especially in the case of novel view synthesis. To mitigate these issues, we propose 3D pseudo-ground-truth labels generated by the foundation models Grounded-SAM and Metric3Dv2, and harness temporal information for label densification. Our 3D pseudo-labels can be easily integrated into existing models, which yields substantial performance improvements, with mIoU increasing by 45%, from 9.73 to 14.09, when implemented into the OccNeRF model. This stands in contrast to earlier advancements in the field, which are often not readily transferable to other architectures. Additionally, we propose a streamlined model, EasyOcc, achieving 13.86 mIoU. This model conducts learning solely from our labels, avoiding complex rendering strategies mentioned previously. Furthermore, our method enables models to attain state-of-the-art performance when evaluated on the full scene without applying the camera mask, with EasyOcc achieving 7.71 mIoU, outperforming the previous best model by 31%. These findings highlight the critical importance of foundation models, temporal context, and the choice of loss computation space in self-supervised learning for comprehensive scene understanding.
☆ Geometric Learning of Canonical Parameterizations of 2D-curves
Most datasets encountered in computer vision and medical applications present symmetries that should be taken into account in classification tasks. A typical example is the symmetry by rotation and/or scaling in object detection. A common way to build neural networks that learn the symmetries is to use data augmentation. In order to avoid data augmentation and build more sustainable algorithms, we present an alternative method to mod out symmetries based on the notion of a section of a principal fiber bundle. This framework allows the use of simple metrics on the space of objects in order to measure dissimilarities between orbits of objects under the symmetry group. Moreover, the section used can be optimized to maximize separation of classes. We illustrate this methodology on a dataset of contours of objects for the groups of translations, rotations, scalings and reparameterizations. In particular, we present a 2-parameter family of canonical parameterizations of curves, containing the constant-speed parameterization as a special case, which we believe is interesting in its own right. We hope that this simple application will serve to convey the geometric concepts underlying this method, which have a wide range of possible applications. The code is available at https://github.com/GiLonga/Geometric-Learning. A tutorial notebook showcasing an application of the code to a specific dataset is available at https://github.com/ioanaciuclea/geometric-learning-notebook
comment: 30 pages, 18 figures
☆ Multi-modal Liver Segmentation and Fibrosis Staging Using Real-world MRI Images
Liver fibrosis represents the accumulation of excessive extracellular matrix caused by sustained hepatic injury. It disrupts normal lobular architecture and function, increasing the chances of cirrhosis and liver failure. Precise staging of fibrosis for early diagnosis and intervention is often invasive, carrying risks and complications. To address this challenge, recent advances in artificial intelligence-based liver segmentation and fibrosis staging offer a non-invasive alternative. As a result, the CARE 2025 Challenge called for automated methods to quantify and analyse liver fibrosis in real-world scenarios, using multi-centre, multi-modal, and multi-phase MRI data. This challenge included tasks of precise liver segmentation (LiSeg) and fibrosis staging (LiFS). In this study, we developed an automated pipeline for both tasks across all the provided MRI modalities. This pipeline integrates pseudo-labelling based on multi-modal co-registration, liver segmentation using deep neural networks, and liver fibrosis staging based on shape, textural, appearance, and directional (STAD) features derived from segmentation masks and MRI images. By solely using the released data with limited annotations, our proposed pipeline demonstrated excellent generalisability for all MRI modalities, achieving top-tier performance across all competition subtasks. This approach provides a rapid and reproducible framework for quantitative MRI-based liver fibrosis assessment, supporting early diagnosis and clinical decision-making. Code is available at https://github.com/YangForever/care2025_liver_biodreamer.
☆ GaussEdit: Adaptive 3D Scene Editing with Text and Image Prompts
This paper presents GaussEdit, a framework for adaptive 3D scene editing guided by text and image prompts. GaussEdit leverages 3D Gaussian Splatting as its backbone for scene representation, enabling convenient Region of Interest selection and efficient editing through a three-stage process. The first stage involves initializing the 3D Gaussians to ensure high-quality edits. The second stage employs an Adaptive Global-Local Optimization strategy to balance global scene coherence and detailed local edits and a category-guided regularization technique to alleviate the Janus problem. The final stage enhances the texture of the edited objects using a sophisticated image-to-image synthesis technique, ensuring that the results are visually realistic and align closely with the given prompts. Our experimental results demonstrate that GaussEdit surpasses existing methods in editing accuracy, visual fidelity, and processing speed. By successfully embedding user-specified concepts into 3D scenes, GaussEdit is a powerful tool for detailed and user-driven 3D scene editing, offering significant improvements over traditional methods.
☆ DGM4+: Dataset Extension for Global Scene Inconsistency
The rapid advances in generative models have significantly lowered the barrier to producing convincing multimodal disinformation. Fabricated images and manipulated captions increasingly co-occur to create persuasive false narratives. While the Detecting and Grounding Multi-Modal Media Manipulation (DGM4) dataset established a foundation for research in this area, it is restricted to local manipulations such as face swaps, attribute edits, and caption changes. This leaves a critical gap: global inconsistencies, such as mismatched foregrounds and backgrounds, which are now prevalent in real-world forgeries. To address this, we extend DGM4 with 5,000 high-quality samples that introduce Foreground-Background (FG-BG) mismatches and their hybrids with text manipulations. Using OpenAI's gpt-image-1 and carefully designed prompts, we generate human-centric news-style images where authentic figures are placed into absurd or impossible backdrops (e.g., a teacher calmly addressing students on the surface of Mars). Captions are produced under three conditions: literal, text attribute, and text split, yielding three new manipulation categories: FG-BG, FG-BG+TA, and FG-BG+TS. Quality control pipelines enforce one-to-three visible faces, perceptual hash deduplication, OCR-based text scrubbing, and realistic headline length. By introducing global manipulations, our extension complements existing datasets, creating a benchmark DGM4+ that tests detectors on both local and global reasoning. This resource is intended to strengthen evaluation of multimodal models such as HAMMER, which currently struggle with FG-BG inconsistencies. We release our DGM4+ dataset and generation script at https://github.com/Gaganx0/DGM4plus
comment: 8 pages, 3 figures
Scaling Up Temporal Domain Generalization via Temporal Experts Averaging
Temporal Domain Generalization (TDG) aims to generalize across temporal distribution shifts, e.g., lexical change over time. Prior work often addresses this by predicting future model weights. However, full model prediction is prohibitively expensive for even reasonably sized models. Thus, recent methods only predict the classifier layer, limiting generalization by failing to adjust other model components. To address this, we propose Temporal Experts Averaging (TEA), a novel and scalable TDG framework that updates the entire model using weight averaging to maximize generalization potential while minimizing computational costs. Our theoretical analysis guides us to two steps that enhance generalization to future domains. First, we create expert models with functional diversity yet parameter similarity by fine-tuning a domain-agnostic base model on individual temporal domains while constraining weight changes. Second, we optimize the bias-variance tradeoff through adaptive averaging coefficients derived from modeling temporal weight trajectories in a principal component subspace. Each expert's contribution is based on its projected proximity to future domains. Extensive experiments across 7 TDG benchmarks, 5 models, and 2 TDG settings show that TEA outperforms prior TDG methods by up to 69% while being up to 60x more efficient.
comment: Accepted by EMNLP 2025 main
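The averaging step itself reduces to a coefficient-weighted sum of expert state dicts, as in the sketch below; the coefficients are placeholders for the trajectory-derived ones described in the abstract.

    import copy
    import torch

    def average_experts(experts, coeffs):
        """Weight-average expert models of identical architecture,
        one coefficient per expert (coefficients sum to 1)."""
        avg = copy.deepcopy(experts[0])
        state = {k: torch.zeros_like(v, dtype=torch.float32)
                 for k, v in avg.state_dict().items()}
        for model, c in zip(experts, coeffs):
            for k, v in model.state_dict().items():
                state[k] += c * v.float()
        avg.load_state_dict(state)
        return avg

    # Three per-period experts fine-tuned from one base model.
    experts = [torch.nn.Linear(8, 2) for _ in range(3)]
    merged = average_experts(experts, coeffs=[0.2, 0.3, 0.5])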
☆ SGS: Segmentation-Guided Scoring for Global Scene Inconsistencies
We extend HAMMER, a state-of-the-art model for multimodal manipulation detection, to handle global scene inconsistencies such as foreground-background (FG-BG) mismatch. While HAMMER achieves strong performance on the DGM4 dataset, it consistently fails when the main subject is contextually misplaced into an implausible background. We diagnose this limitation as a combination of label-space bias, local attention focus, and spurious text-foreground alignment. To remedy this without retraining, we propose a lightweight segmentation-guided scoring (SGS) pipeline. SGS uses person/face segmentation masks to separate foreground and background regions, extracts embeddings with a joint vision-language model, and computes region-aware coherence scores. These scores are fused with HAMMER's original prediction to improve binary detection, grounding, and token-level explanations. SGS is inference-only, incurs negligible computational overhead, and significantly enhances robustness to global manipulations. This work demonstrates the importance of region-aware reasoning in multimodal disinformation detection. We release scripts for segmentation and scoring at https://github.com/Gaganx0/HAMMER-sgs
comment: 6 pages, 3 figures
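A sketch of the region-aware scoring idea follows; embed_image and embed_text are stand-ins for any joint vision-language encoder returning unit-norm vectors, and the fusion rule is a plain convex combination rather than the exact one used in the paper.

    import numpy as np

    def region_coherence(image, mask, caption, embed_image, embed_text):
        """Compare the caption against foreground (person) and
        background regions separately; a large gap suggests the
        subject and the scene tell different stories."""
        fg = image * mask[..., None]           # keep person pixels
        bg = image * (1 - mask[..., None])     # keep scene pixels
        t = embed_text(caption)
        s_fg = float(embed_image(fg) @ t)      # cosine similarity (unit vectors)
        s_bg = float(embed_image(bg) @ t)
        return s_fg - s_bg

    def fused_score(base_score, coherence_gap, alpha=0.7):
        """Blend the base detector's manipulation score with the
        region coherence gap."""
        return alpha * base_score + (1 - alpha) * coherence_gap

    # Toy demo with random unit vectors standing in for a real encoder.
    rng = np.random.default_rng(0)
    unit = lambda v: v / np.linalg.norm(v)
    embed_image = lambda img: unit(rng.normal(size=64))
    embed_text = lambda txt: unit(rng.normal(size=64))
    image = np.random.rand(224, 224, 3)
    mask = (np.random.rand(224, 224) > 0.5).astype(float)
    gap = region_coherence(image, mask, "a teacher on Mars", embed_image, embed_text)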
☆ CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search
The integration of Large Language Models (LLMs) with Neural Architecture Search (NAS) has introduced new possibilities for automating the design of neural architectures. However, most existing methods face critical limitations, including architectural invalidity, computational inefficiency, and inferior performance compared to traditional NAS. In this work, we present Collaborative LLM-based NAS (CoLLM-NAS), a two-stage NAS framework with knowledge-guided search driven by two complementary LLMs. Specifically, we propose a Navigator LLM to guide search direction and a Generator LLM to synthesize high-quality candidates, with a dedicated Coordinator module to manage their interaction. CoLLM-NAS efficiently guides the search process by combining LLMs' inherent knowledge of structured neural architectures with progressive knowledge from iterative feedback and historical trajectory. Experimental results on ImageNet and NAS-Bench-201 show that CoLLM-NAS surpasses existing NAS methods and conventional search algorithms, achieving new state-of-the-art results. Furthermore, CoLLM-NAS consistently enhances the performance and efficiency of various two-stage NAS methods (e.g., OFA, SPOS, and AutoFormer) across diverse search spaces (e.g., MobileNet, ShuffleNet, and AutoFormer), demonstrating its excellent generalization.
☆ SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP ICLR 2026
While Contrastive Language-Image Pretraining (CLIP) excels at zero-shot tasks by aligning image and text embeddings, its performance in few-shot classification is hindered by a critical limitation: intra-modal misalignment. This issue, caused by a persistent modality gap and CLIP's exclusively inter-modal training objective, leaves the embedding spaces uncalibrated, making direct image-to-image comparisons unreliable. Existing methods attempt to address this by refining similarity logits or by computationally expensive per-sample optimization. To overcome these challenges, we introduce SeMoBridge, a lightweight yet powerful approach that directly addresses the misalignment. Our method maps images into the text modality, while keeping their semantic content intact through what we call a Semantic Modality Bridge. SeMoBridge is closed-form and can optionally be trained through multi-modal supervision, combining image and text-alignment losses to optimize the projection. Experiments show that the trained version, SeMoBridge-T, requires only a fraction of the training time while overall outperforming other methods, particularly in low-data scenarios (1, 2, and 4 shots). The code is available at https://github.com/christti98/semobridge.
comment: 19 pages, 12 figures, Under review as a conference paper at ICLR 2026
☆ Causally Guided Gaussian Perturbations for Out-Of-Distribution Generalization in Medical Imaging
Out-of-distribution (OOD) generalization remains a central challenge in deploying deep learning models to real-world scenarios, particularly in domains such as biomedical images, where distribution shifts are both subtle and pervasive. While existing methods often pursue domain invariance through complex generative models or adversarial training, these approaches may overlook the underlying causal mechanisms of generalization. In this work, we propose Causally-Guided Gaussian Perturbations (CGP), a lightweight framework that enhances OOD generalization by injecting spatially varying noise into input images, guided by soft causal masks derived from Vision Transformers. By applying stronger perturbations to background regions and weaker ones to foreground areas, CGP encourages the model to rely on causally relevant features rather than spurious correlations. Experimental results on the challenging WILDS benchmark Camelyon17 demonstrate consistent performance gains over state-of-the-art OOD baselines, highlighting the potential of causal perturbation as a tool for reliable and interpretable generalization.
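The perturbation itself is light enough to state in a few lines. A sketch, assuming the causal mask is a soft map in [0, 1] with 1 marking causally relevant foreground; the noise scales are illustrative.

    import torch

    def causally_guided_perturbation(image, causal_mask, sigma_bg=0.2, sigma_fg=0.05):
        """Inject spatially varying Gaussian noise: strong on background
        (low mask values), weak on causally relevant foreground.

        image:       (B, C, H, W)
        causal_mask: (B, 1, H, W), soft mask in [0, 1] (e.g. ViT-derived)
        """
        sigma = sigma_fg * causal_mask + sigma_bg * (1 - causal_mask)
        return image + sigma * torch.randn_like(image)

    x = torch.rand(4, 3, 96, 96)
    mask = torch.rand(4, 1, 96, 96)            # stand-in for a real causal mask
    x_aug = causally_guided_perturbation(x, mask)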
☆ PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution CVPR 2025
Pre-trained video generation models hold great potential for generative video super-resolution (VSR). However, adapting them for full-size VSR, as most existing methods do, suffers from unnecessary intensive full-attention computation and fixed output resolution. To overcome these limitations, we make the first exploration into utilizing video diffusion priors for patch-wise VSR. This is non-trivial because pre-trained video diffusion models are not natively designed for patch-level detail generation. To mitigate this challenge, we propose an innovative approach, called PatchVSR, which integrates a dual-stream adapter for conditional guidance. The patch branch extracts features from input patches to maintain content fidelity, while the global branch extracts context features from the resized full video to bridge the generation gap caused by the incomplete semantics of patches. Particularly, we also inject the patch's location information into the model to better contextualize patch synthesis within the global video frame. Experiments demonstrate that our method can synthesize high-fidelity, high-resolution details at the patch level. A tailor-made multi-patch joint modulation is proposed to ensure visual consistency across individually enhanced patches. Due to the flexibility of our patch-based paradigm, we can achieve highly competitive 4K VSR based on a 512x512 resolution base model, with extremely high efficiency.
comment: CVPR 2025
☆ GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data NeurIPS 2025
Integrating ground-level geospatial data with rich geographic context, like OpenStreetMap (OSM), into remote sensing (RS) foundation models (FMs) is essential for advancing geospatial intelligence and supporting a broad spectrum of tasks. However, the modality gap between RS and OSM data, including differences in data structure, content, and spatial granularity, makes effective synergy highly challenging, and most existing RS FMs focus on imagery alone. To this end, this study presents GeoLink, a multimodal framework that leverages OSM data to enhance the RS FM during both the pretraining and downstream task stages. Specifically, GeoLink enhances RS self-supervised pretraining using multi-granularity learning signals derived from OSM data, guided by cross-modal spatial correlations for information interaction and collaboration. It also introduces image mask-reconstruction to enable sparse input for efficient pretraining. For downstream tasks, GeoLink generates both unimodal and multimodal fine-grained encodings to support a wide range of applications, from common RS interpretation tasks like land cover classification to more comprehensive geographic tasks like urban function zone mapping. Extensive experiments show that incorporating OSM data during pretraining enhances the performance of the RS image encoder, while fusing RS and OSM data in downstream tasks improves the FM's adaptability to complex geographic scenarios. These results underscore the potential of multimodal synergy in advancing high-level geospatial artificial intelligence. Moreover, we find that spatial correlation plays a crucial role in enabling effective multimodal geospatial data integration. Code, checkpoints, and usage examples are released at https://github.com/bailubin/GeoLink_NeurIPS2025
comment: NeurIPS 2025
☆ SETR: A Two-Stage Semantic-Enhanced Framework for Zero-Shot Composed Image Retrieval
Zero-shot Composed Image Retrieval (ZS-CIR) aims to retrieve a target image given a reference image and a relative text, without relying on costly triplet annotations. Existing CLIP-based methods face two core challenges: (1) union-based feature fusion indiscriminately aggregates all visual cues, carrying over irrelevant background details that dilute the intended modification, and (2) global cosine similarity from CLIP embeddings lacks the ability to resolve fine-grained semantic relations. To address these issues, we propose SETR (Semantic-enhanced Two-Stage Retrieval). In the coarse retrieval stage, SETR introduces an intersection-driven strategy that retains only the overlapping semantics between the reference image and relative text, thereby filtering out distractors inherent to union-based fusion and producing a cleaner, high-precision candidate set. In the fine-grained re-ranking stage, we adapt a pretrained multimodal LLM with Low-Rank Adaptation to conduct binary semantic relevance judgments ("Yes/No"), which goes beyond CLIP's global feature matching by explicitly verifying relational and attribute-level consistency. Together, these two stages form a complementary pipeline: coarse retrieval narrows the candidate pool with high recall, while re-ranking ensures precise alignment with nuanced textual modifications. Experiments on CIRR, Fashion-IQ, and CIRCO show that SETR achieves new state-of-the-art performance, improving Recall@1 on CIRR by up to 15.15 points. Our results establish two-stage reasoning as a general paradigm for robust and portable ZS-CIR.
☆ New Fourth-Order Grayscale Indicator-Based Telegraph Diffusion Model for Image Despeckling
Second-order PDE models have been widely used for suppressing multiplicative noise, but they often introduce blocky artifacts in the early stages of denoising. To resolve this, we propose a fourth-order nonlinear PDE model that integrates diffusion and wave properties. The diffusion process, guided by both the Laplacian and intensity values, reduces noise more effectively than gradient-based methods, while the wave component preserves fine details and textures. The effectiveness of the proposed model is evaluated against two second-order anisotropic diffusion approaches using the Peak Signal-to-Noise Ratio (PSNR) and Mean Structural Similarity Index (MSSIM) for images with available ground truth. For SAR images, where a noise-free reference is unavailable, the Speckle Index (SI) is used to measure noise reduction. Additionally, we extend the proposed model to color images by applying the denoising process independently to each channel, preserving both structure and color consistency. The same quantitative metrics, PSNR and MSSIM, are used for performance evaluation, ensuring a fair comparison across grayscale and color images. In all cases, the proposed model outperforms existing models of this type.
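For context, the second-order anisotropic diffusion baselines mentioned above follow a Perona-Malik-style update; a minimal NumPy sketch of that baseline (not the proposed fourth-order telegraph model) is:

    import numpy as np

    def anisotropic_diffusion(u, n_iter=50, kappa=0.5, dt=0.15):
        """Second-order anisotropic (Perona-Malik style) diffusion,
        the class of baselines the fourth-order model is compared to."""
        u = u.astype(np.float64).copy()
        g = lambda d: np.exp(-(d / kappa) ** 2)   # edge-stopping function
        for _ in range(n_iter):
            dn = np.roll(u, -1, 0) - u            # differences to 4 neighbors
            ds = np.roll(u, 1, 0) - u
            de = np.roll(u, -1, 1) - u
            dw = np.roll(u, 1, 1) - u
            u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
        return u

    noisy = np.random.gamma(4.0, 0.25, size=(64, 64))  # speckle-like sample
    denoised = anisotropic_diffusion(noisy)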
PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion
In this paper, we present the first pinhole-fisheye framework for heterogeneous multi-view depth estimation, PFDepth. Our key insight is to exploit the complementary characteristics of pinhole and fisheye imagery (undistorted vs. distorted, small vs. large FOV, far vs. near field) for joint optimization. PFDepth employs a unified architecture capable of processing arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics. Within PFDepth, we first explicitly lift 2D features from each heterogeneous view into a canonical 3D volumetric space. Then, a core module termed Heterogeneous Spatial Fusion is designed to process and fuse distortion-aware volumetric features across overlapping and non-overlapping regions. Additionally, we subtly reformulate the conventional voxel fusion into a novel 3D Gaussian representation, in which learnable latent Gaussian spheres dynamically adapt to local image textures for finer 3D aggregation. Finally, fused volume features are rendered into multi-view depth maps. Through extensive experiments, we demonstrate that PFDepth achieves state-of-the-art performance on the KITTI-360 and RealHet datasets over current mainstream depth networks. To the best of our knowledge, this is the first systematic study of heterogeneous pinhole-fisheye depth estimation, offering both technical novelty and valuable empirical insights.
comment: Accepted by ACM MM 2025 Conference
☆ AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment
Image quality assessment (IQA) is inherently complex, as it reflects both the quantification and interpretation of perceptual quality rooted in the human visual system. Conventional approaches typically rely on fixed models to output scalar scores, limiting their adaptability to diverse distortions, user-specific queries, and interpretability needs. Furthermore, scoring and interpretation are often treated as independent processes, despite their interdependence: interpretation identifies perceptual degradations, while scoring abstracts them into a compact metric. To address these limitations, we propose AgenticIQA, a modular agentic framework that integrates vision-language models (VLMs) with traditional IQA tools in a dynamic, query-aware manner. AgenticIQA decomposes IQA into four subtasks -- distortion detection, distortion analysis, tool selection, and tool execution -- coordinated by a planner, executor, and summarizer. The planner formulates task-specific strategies, the executor collects perceptual evidence via tool invocation, and the summarizer integrates this evidence to produce accurate scores with human-aligned explanations. To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents. Extensive experiments across diverse IQA datasets demonstrate that AgenticIQA consistently surpasses strong baselines in both scoring accuracy and explanatory alignment.
☆ Learning Egocentric In-Hand Object Segmentation through Weak Supervision from Human Narrations
Pixel-level recognition of objects manipulated by the user from egocentric images enables key applications spanning assistive technologies, industrial safety, and activity monitoring. However, progress in this area is currently hindered by the scarcity of annotated datasets, as existing approaches rely on costly manual labels. In this paper, we propose to learn human-object interaction detection leveraging narrations -- natural language descriptions of the actions performed by the camera wearer which contain clues about manipulated objects (e.g., "I am pouring vegetables from the chopping board to the pan"). Narrations provide a form of weak supervision that is cheap to acquire and readily available in state-of-the-art egocentric datasets. We introduce Narration-Supervised in-Hand Object Segmentation (NS-iHOS), a novel task where models have to learn to segment in-hand objects by learning from natural-language narrations. Narrations are then not employed at inference time. We showcase the potential of the task by proposing Weakly-Supervised In-hand Object Segmentation from Human Narrations (WISH), an end-to-end model distilling knowledge from narrations to learn plausible hand-object associations and enable in-hand object segmentation without using narrations at test time. We benchmark WISH against different baselines based on open-vocabulary object detectors and vision-language models, showing the superiority of its design. Experiments on EPIC-Kitchens and Ego4D show that WISH surpasses all baselines, recovering more than 50% of the performance of fully supervised methods, without employing fine-grained pixel-wise annotations.
☆ VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
In light of recent progress in video editing, deep learning models focusing on both spatial and temporal dependencies have emerged as the primary method. However, these models suffer from the quadratic computational complexity of traditional attention mechanisms, making them difficult to adapt to long-duration and high-resolution videos. This limitation restricts their applicability in practical contexts such as real-time video processing. To tackle this challenge, we introduce a method to reduce both the time and space complexity of these systems by proposing VRWKV-Editor, a novel video editing model that integrates a linear spatio-temporal aggregation module into video-based diffusion models. VRWKV-Editor leverages the bidirectional weighted key-value recurrence mechanism of the RWKV transformer to capture global dependencies while preserving temporal coherence, achieving linear complexity without sacrificing quality. Extensive experiments demonstrate that the proposed method achieves up to 3.7x speedup and 60% lower memory usage compared to state-of-the-art diffusion-based video editing methods, while maintaining competitive performance in frame consistency and text alignment. Furthermore, a comparative analysis on videos with different sequence lengths confirms that the gap in editing speed between our approach and self-attention-based architectures grows more significant for longer videos.
☆ Towards Reliable and Holistic Visual In-Context Learning Prompt Selection NeurIPS 2025
Visual In-Context Learning (VICL) has emerged as a prominent approach for adapting visual foundation models to novel tasks, by effectively exploiting contextual information embedded in in-context examples, which can be formulated as a global ranking problem of potential candidates. Current VICL methods, such as Partial2Global and VPR, are grounded in the similarity-priority assumption that images more visually similar to a query image serve as better in-context examples. This foundational assumption, while intuitive, lacks sufficient justification for its efficacy in selecting optimal in-context examples. Furthermore, Partial2Global constructs its global ranking from a series of randomly sampled pairwise preference predictions. Such a reliance on random sampling can lead to incomplete coverage and redundant samplings of comparisons, thus further adversely impacting the final global ranking. To address these issues, this paper introduces an enhanced variant of Partial2Global designed for reliable and holistic selection of in-context examples in VICL. Our proposed method, dubbed RH-Partial2Global, leverages a jackknife conformal prediction-guided strategy to construct reliable alternative sets and a covering design-based sampling approach to ensure comprehensive and uniform coverage of pairwise preferences. Extensive experiments demonstrate that RH-Partial2Global achieves excellent performance and outperforms Partial2Global across diverse visual tasks.
comment: Accepted by NeurIPS 2025
☆ PinPoint3D: Fine-Grained 3D Part Segmentation from a Few Clicks
Fine-grained 3D part segmentation is crucial for enabling embodied AI systems to perform complex manipulation tasks, such as interacting with specific functional components of an object. However, existing interactive segmentation methods are largely confined to coarse, instance-level targets, while non-interactive approaches struggle with sparse, real-world scans and suffer from a severe lack of annotated data. To address these limitations, we introduce PinPoint3D, a novel interactive framework for fine-grained, multi-granularity 3D segmentation, capable of generating precise part-level masks from only a few user point clicks. A key component of our work is a new 3D data synthesis pipeline that we developed to create a large-scale, scene-level dataset with dense part annotations, overcoming a critical bottleneck that has hindered progress in this field. Through comprehensive experiments and user studies, we demonstrate that our method significantly outperforms existing approaches, achieving an average IoU of around 55.8% on each object part under first-click settings and surpassing 71.3% IoU with only a few additional clicks. Compared to current state-of-the-art baselines, PinPoint3D yields up to a 16% improvement in IoU and precision, highlighting its effectiveness on challenging, sparse point clouds with high efficiency. Our work represents a significant step towards more nuanced and precise machine perception and interaction in complex 3D environments.
comment: 15 pages, 12 figures, conference
☆ A Multi-purpose Tracking Framework for Salmon Welfare Monitoring in Challenging Environments ICCV 2025
Computer Vision (CV)-based continuous, automated and precise salmon welfare monitoring is a key step toward reduced salmon mortality and improved salmon welfare in industrial aquaculture net pens. Available CV methods for determining welfare indicators focus on single indicators and rely on object detectors and trackers from other application areas to aid their welfare indicator calculation algorithm. This comes with a high resource demand for real-world applications, since each indicator must be calculated separately. In addition, the methods are vulnerable to difficulties in underwater salmon scenes, such as object occlusion, similar object appearance, and similar object motion. To address these challenges, we propose a flexible tracking framework that uses a pose estimation network to extract bounding boxes around salmon and their corresponding body parts, and exploits information about the body parts, through specialized modules, to tackle challenges specific to underwater salmon scenes. Subsequently, the high-detail body part tracks are employed to calculate welfare indicators. We construct two novel datasets assessing two salmon tracking challenges: salmon ID transfers in crowded scenes and salmon ID switches during turning. Our method outperforms the current state-of-the-art pedestrian tracker, BoostTrack, for both salmon tracking challenges. Additionally, we create a dataset for calculating salmon tail beat wavelength, demonstrating that our body part tracking method is well-suited for automated welfare monitoring based on tail beat analysis. Datasets and code are available at https://github.com/espenbh/BoostCompTrack.
comment: Accepted to the Joint Workshop on Marine Vision 2025 (CVAUI & AAMVEM), held in conjunction with ICCV 2025
☆ Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images, anchored in explicit visual evidence to improve interpretability and facilitate integration into clinical workflows. However, existing methods often rely on separately trained detection modules that require extensive expert annotations, introducing high labeling costs and limiting generalizability due to pathology distribution bias across datasets. To address these challenges, we propose Self-Supervised Anatomical Consistency Learning (SS-ACL), a novel and annotation-free framework that aligns generated reports with corresponding anatomical regions using simple textual prompts. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy, organizing entities by spatial location. It recursively reconstructs fine-grained anatomical regions to enforce intra-sample spatial alignment, inherently guiding attention maps toward visually relevant areas prompted by text. To further enhance inter-sample semantic alignment for abnormality recognition, SS-ACL introduces a region-level contrastive learning based on anatomical consistency. These aligned embeddings serve as priors for report generation, enabling attention maps to provide interpretable visual evidence. Extensive experiments demonstrate that SS-ACL, without relying on expert annotations, (i) generates accurate and visually grounded reports, outperforming state-of-the-art methods by 10% in lexical accuracy and 25% in clinical efficacy, and (ii) achieves competitive performance on various downstream visual tasks, surpassing current leading visual foundation models by 8% in zero-shot visual grounding.
☆ CO3: Contrasting Concepts Compose Better
We propose to improve multi-concept prompt fidelity in text-to-image diffusion models. We begin with common failure cases: prompts like "a cat and a dog" that sometimes yield images where one concept is missing, faint, or colliding awkwardly with another. We hypothesize that this happens when the diffusion model drifts into mixed modes that over-emphasize a single concept it learned strongly during training. Instead of re-training, we introduce a corrective sampling strategy that steers away from regions where the joint prompt behavior overlaps too strongly with any single concept in the prompt. The goal is to steer towards "pure" joint modes where all concepts can coexist with balanced visual presence. We further show that existing multi-concept guidance schemes can operate in unstable weight regimes that amplify imbalance; we characterize favorable regions and adapt sampling to remain within them. Our approach, CO3, is plug-and-play, requires no model tuning, and complements standard classifier-free guidance. Experiments on diverse multi-concept prompts indicate improvements in concept coverage, balance and robustness, with fewer dropped or distorted concepts compared to standard baselines and prior compositional methods. Results suggest that lightweight corrective guidance can substantially mitigate brittle semantic alignment behavior in modern diffusion systems.
☆ UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression
Existing anomaly detection (AD) methods often treat the modality and class as independent factors. Although this paradigm has enriched the development of AD research branches and produced many specialized models, it has also led to fragmented solutions and excessive memory overhead. Moreover, reconstruction-based multi-class approaches typically rely on shared decoding paths, which struggle to handle large variations across domains, resulting in distorted normality boundaries, domain interference, and high false alarm rates. To address these limitations, we propose UniMMAD, a unified framework for multi-modal and multi-class anomaly detection. At the core of UniMMAD is a Mixture-of-Experts (MoE)-driven feature decompression mechanism, which enables adaptive and disentangled reconstruction tailored to specific domains. This process is guided by a "general to specific" paradigm. In the encoding stage, multi-modal inputs of varying combinations are compressed into compact, general-purpose features. The encoder incorporates a feature compression module to suppress latent anomalies, encourage cross-modal interaction, and avoid shortcut learning. In the decoding stage, the general features are decompressed into modality-specific and class-specific forms via a sparsely-gated cross MoE, which dynamically selects expert pathways based on input modality and class. To further improve efficiency, we design a grouped dynamic filtering mechanism and a MoE-in-MoE structure, reducing parameter usage by 75% while maintaining sparse activation and fast inference. UniMMAD achieves state-of-the-art performance on 9 anomaly detection datasets, spanning 3 fields, 12 modalities, and 66 classes. The source code will be available at https://github.com/yuanzhao-CVLAB/UniMMAD.
comment: manuscript
☆ From MNIST to ImageNet: Understanding the Scalability Boundaries of Differentiable Logic Gate Networks
Differentiable Logic Gate Networks (DLGNs) are a very fast and energy-efficient alternative to conventional feed-forward networks. With learnable combinations of logical gates, DLGNs enable fast inference by hardware-friendly execution. Since the concept of DLGNs has only recently gained attention, these networks are still in their developmental infancy, including the design and scalability of their output layer. To date, this architecture has primarily been tested on datasets with up to ten classes. This work examines the behavior of DLGNs on large multi-class datasets. We investigate its general expressiveness, its scalability, and evaluate alternative output strategies. Using both synthetic and real-world datasets, we provide key insights into the importance of temperature tuning and its impact on output layer performance. We evaluate conditions under which the Group-Sum layer performs well and how it can be applied to large-scale classification of up to 2000 classes.
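The core DLGN neuron can be sketched as a softmax-weighted mixture over real-valued relaxations of the 16 two-input Boolean functions, which makes the discrete choice of gate differentiable; the temperature in the softmax is the quantity whose tuning this work highlights. This is an illustrative reimplementation, not the reference code.

    import torch
    import torch.nn as nn

    def all_gates(a, b):
        """Relaxations of the 16 two-input Boolean functions for
        inputs in [0, 1] (probabilistic interpretation)."""
        return torch.stack([
            torch.zeros_like(a), a * b, a - a * b, a,
            b - a * b, b, a + b - 2 * a * b, a + b - a * b,
            1 - (a + b - a * b), 1 - (a + b - 2 * a * b), 1 - b, 1 - b + a * b,
            1 - a, 1 - a + a * b, 1 - a * b, torch.ones_like(a),
        ], dim=-1)                                 # (..., 16)

    class SoftLogicGate(nn.Module):
        """One DLGN neuron: a learnable categorical choice over 16 gates."""
        def __init__(self, tau=1.0):
            super().__init__()
            self.logits = nn.Parameter(torch.zeros(16))
            self.tau = tau                         # softmax temperature

        def forward(self, a, b):
            w = torch.softmax(self.logits / self.tau, dim=0)
            return (all_gates(a, b) * w).sum(-1)

    gate = SoftLogicGate(tau=0.5)
    out = gate(torch.rand(32), torch.rand(32))     # differentiable in the logits

At inference, each neuron is discretized to its argmax gate, which is what enables the hardware-friendly execution mentioned above.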
☆ The Impact of Scaling Training Data on Adversarial Robustness NeurIPS 2025
Deep neural networks remain vulnerable to adversarial examples despite advances in architectures and training paradigms. We investigate how training data characteristics affect adversarial robustness across 36 state-of-the-art vision models spanning supervised, self-supervised, and contrastive learning approaches, trained on datasets from 1.2M to 22B images. Models were evaluated under six black-box attack categories: random perturbations, two types of geometric masks, COCO object manipulations, ImageNet-C corruptions, and ImageNet-R style shifts. Robustness follows a logarithmic scaling law with both data volume and model size: a tenfold increase in data reduces attack success rate (ASR) on average by ~3.2%, whereas a tenfold increase in model size reduces ASR on average by ~13.4%. Notably, some self-supervised models trained on curated datasets, such as DINOv2, outperform others trained on much larger but less curated datasets, challenging the assumption that scale alone drives robustness. Adversarial fine-tuning of ResNet50s improves generalization across structural variations but not across color distributions. Human evaluation reveals persistent gaps between human and machine vision. These results show that while scaling improves robustness, data quality, architecture, and training objectives play a more decisive role than raw scale in achieving broad-spectrum adversarial resilience.
comment: Accepted at the workshop Reliable ML from Unreliable Data at NeurIPS 2025
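A worked example of the logarithmic scaling law reported above, as a minimal sketch: the functional form ASR(s) = ASR_0 - k * log10(s) and the baseline ASR value are assumptions for illustration only; the paper reports the per-decade slopes (~3.2% for data, ~13.4% for model size).

    import math

    def predicted_asr(asr_0, k, scale_factor):
        # ASR after scaling data or parameters by `scale_factor`,
        # under an assumed log-law: ASR = ASR_0 - k * log10(scale_factor).
        return asr_0 - k * math.log10(scale_factor)

    asr_0 = 60.0  # hypothetical baseline attack success rate in percent
    print(predicted_asr(asr_0, 3.2, 10))     # 10x data    -> 56.8
    print(predicted_asr(asr_0, 13.4, 100))   # 100x params -> 33.2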
♻ ☆ Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
The impact of misinformation arises not only from factual inaccuracies but also from the misleading narratives that creators deliberately embed. Interpreting such creator intent is therefore essential for multimodal misinformation detection (MMD) and effective information governance. To this end, we introduce DeceptionDecoded, a large-scale benchmark of 12,000 image-caption pairs grounded in trustworthy reference articles, created using an intent-guided simulation framework that models both the desired influence and the execution plan of news creators. The dataset captures both misleading and non-misleading cases, spanning manipulations across visual and textual modalities, and supports three intent-centric tasks: (1) misleading intent detection, (2) misleading source attribution, and (3) creator desire inference. We evaluate 14 state-of-the-art vision-language models (VLMs) and find that they struggle with intent reasoning, often relying on shallow cues such as surface-level alignment, stylistic polish, or heuristic authenticity signals. These results highlight the limitations of current VLMs and position DeceptionDecoded as a foundation for developing intent-aware models that go beyond shallow cues in MMD.
♻ ☆ Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset
Intention recognition has traditionally focused on individual intentions, overlooking the complexities of collective intentions in group settings. To address this limitation, we introduce the concept of group intention, which represents shared goals emerging through the actions of multiple individuals, and Group Intention Forecasting (GIF), a novel task that forecasts when group intentions will occur by analyzing individual actions and interactions before the collective goal becomes apparent. To investigate GIF in a specific scenario, we propose SHOT, the first large-scale dataset for GIF, consisting of 1,979 basketball video clips captured from 5 camera views and annotated with 6 types of individual attributes. SHOT is designed with 3 key characteristics: multi-individual information, multi-view adaptability, and multi-level intention, making it well-suited for studying emerging group intentions. Furthermore, we introduce GIFT (Group Intention ForecasTer), a framework that extracts fine-grained individual features and models evolving group dynamics to forecast intention emergence. Experimental results confirm the effectiveness of SHOT and GIFT, establishing a strong foundation for future research in group intention forecasting. The dataset is available at https://xinyi-hu.github.io/SHOT_DATASET.
comment: ACMMM 2025 Datasets Track
♻ ☆ ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
We introduce ODE-GS, a novel approach that integrates 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to enable future extrapolation of dynamic 3D scenes. Unlike existing dynamic scene reconstruction methods, which rely on time-conditioned deformation networks and are limited to interpolation within a fixed time window, ODE-GS eliminates timestamp dependency by modeling Gaussian parameter trajectories as continuous-time latent dynamics. Our approach first learns an interpolation model to generate accurate Gaussian trajectories within the observed window, then trains a Transformer encoder to aggregate past trajectories into a latent state evolved via a neural ODE. Finally, numerical integration produces smooth, physically plausible future Gaussian trajectories, enabling rendering at arbitrary future timestamps. On the D-NeRF, NVFi, and HyperNeRF benchmarks, ODE-GS achieves state-of-the-art extrapolation performance, improving metrics by 19.8% compared to leading baselines, demonstrating its ability to accurately represent and predict 3D scene dynamics.
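To make the pipeline concrete, here is a minimal sketch of the extrapolation step described above: a latent state aggregated from past trajectories is evolved by a learned vector field via fixed-step Euler integration, then decoded into future Gaussian parameters. All names are hypothetical, and the toy linear dynamics stand in for the trained neural ODE and Transformer encoder.

    import numpy as np

    def euler_integrate(f, z0, t0, t1, steps=100):
        # Fixed-step Euler solve of dz/dt = f(z, t); a stand-in for a neural ODE solver.
        z, t = z0.copy(), t0
        h = (t1 - t0) / steps
        for _ in range(steps):
            z = z + h * f(z, t)
            t += h
        return z

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # toy placeholder for the learned dynamics
    f = lambda z, t: A @ z

    z_past = np.array([1.0, 0.0])             # latent aggregated from observed trajectories
    z_future = euler_integrate(f, z_past, 0.0, 2.0)
    # z_future would then be decoded into future Gaussian trajectories for rendering.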
♻ ☆ VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with Tool use (ARLT) approaches have emerged to address multi-turn tool interactions, existing works develop task-specific codebases that suffer from fragmentation, synchronous execution bottlenecks, and limited extensibility across domains. These inefficiencies hinder broader community adoption and algorithmic innovation. We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. VerlTool provides four key contributions: (1) upstream alignment with VeRL ensuring compatibility and simplified maintenance, (2) unified tool management via standardized APIs supporting diverse modalities including code execution, search, SQL databases, and vision processing, (3) asynchronous rollout execution achieving near 2$\times$ speedup by eliminating synchronization bottlenecks, and (4) comprehensive evaluation demonstrating competitive performance across 6 ARLT domains. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. We train and evaluate models on mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering tasks, achieving results comparable to specialized systems while providing unified training infrastructure. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions, significantly reducing development overhead and providing a scalable foundation for tool-augmented RL research. Our code is open-sourced at https://github.com/TIGER-AI-Lab/verl-tool.
comment: 32 pages, 5 figures, 13 tables
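The asynchronous rollout idea, where each trajectory calls tools without blocking the others, can be sketched with asyncio. The tool registry and function names below are hypothetical and do not reflect VerlTool's actual plugin API.

    import asyncio

    async def run_tool(name: str, arg: str) -> str:
        # Stands in for code execution, search, SQL, or vision tools.
        await asyncio.sleep(0.1)
        return f"{name}({arg}) -> ok"

    async def rollout(traj_id: int) -> str:
        # Each trajectory interleaves model turns with tool calls independently,
        # so a slow tool in one rollout never stalls the others.
        obs = await run_tool("search", f"query-{traj_id}")
        return f"trajectory {traj_id}: {obs}"

    async def main():
        results = await asyncio.gather(*(rollout(i) for i in range(8)))
        print("\n".join(results))

    asyncio.run(main())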
♻ ☆ BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation ICCV 2025
This paper presents a generalizable RGB-based approach for object pose estimation, specifically designed to address challenges in sparse-view settings. While existing methods can estimate the poses of unseen objects, their generalization ability remains limited in scenarios involving occlusions and sparse reference views, restricting their real-world applicability. To overcome these limitations, we introduce corner points of the object bounding box as an intermediate representation of the object pose. The 3D object corners can be reliably recovered from sparse input views, while the 2D corner points in the target view are estimated through a novel reference-based point synthesizer, which works well even in scenarios involving occlusions. As object semantic points, object corners naturally establish 2D-3D correspondences for object pose estimation with a PnP algorithm. Extensive experiments on the YCB-Video and Occluded-LINEMOD datasets show that our approach outperforms state-of-the-art methods, highlighting the effectiveness of the proposed representation and significantly enhancing the generalization capabilities of object pose estimation, which is crucial for real-world applications.
comment: ICCV 2025 Camera Ready, Project page: https://zju3dv.github.io/boxdreamer
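Once the eight 2D corner points have been synthesized, pose recovery reduces to a standard PnP solve, roughly as sketched below (requires numpy and opencv-python). The corner coordinates and intrinsics are placeholders; the paper's contribution lies in predicting the 2D corners reliably, not in this final step.

    import numpy as np
    import cv2

    # Eight 3D bounding-box corners in the object frame (hypothetical 2x2x2 box).
    corners_3d = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                          dtype=np.float64)
    # Predicted 2D corner locations in the target view (placeholder values).
    corners_2d = np.random.rand(8, 2) * np.array([640.0, 480.0])

    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    ok, rvec, tvec = cv2.solvePnP(corners_3d, corners_2d, K, None,
                                  flags=cv2.SOLVEPNP_EPNP)
    if ok:
        R, _ = cv2.Rodrigues(rvec)  # rotation matrix; with tvec this is the 6-DoF pose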
♻ ☆ FM-SIREN & FM-FINER: Nyquist-Informed Frequency Multiplier for Implicit Neural Representation with Periodic Activation
Existing periodic activation-based implicit neural representation (INR) networks, such as SIREN and FINER, suffer from hidden feature redundancy, where neurons within a layer capture overlapping frequency components due to the use of a fixed frequency multiplier. This redundancy limits the expressive capacity of multilayer perceptrons (MLPs). Drawing inspiration from classical signal processing methods such as the Discrete Sine Transform (DST), we propose FM-SIREN and FM-FINER, which assign Nyquist-informed, neuron-specific frequency multipliers to periodic activations. Unlike existing approaches, our design introduces frequency diversity without requiring hyperparameter tuning or additional network depth. This simple yet principled modification reduces feature redundancy by nearly 50% and consistently improves signal reconstruction across diverse INR tasks, including the fitting of 1D audio, 2D images, and 3D shapes, as well as the synthesis of neural radiance fields (NeRF), outperforming the baseline counterparts while maintaining efficiency.
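The core change relative to SIREN, a distinct frequency multiplier per neuron instead of a single shared omega_0, can be sketched as follows. The linear spacing up to a Nyquist-informed cap is an assumption loosely modeled on the DST analogy; the paper's exact schedule may differ.

    import numpy as np

    def fm_siren_layer(x, W, b, f_max):
        # Periodic layer where neuron k gets its own frequency multiplier omega_k,
        # here linearly spaced up to an assumed Nyquist-informed cap f_max.
        n_out = W.shape[0]
        omegas = np.linspace(1.0, f_max, n_out)          # neuron-specific frequencies
        return np.sin(omegas[None, :] * (x @ W.T + b))   # shape: (batch, n_out)

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=(4, 2))                  # e.g. 2D image coordinates
    W = rng.normal(scale=1 / np.sqrt(2), size=(64, 2))
    b = np.zeros(64)
    h = fm_siren_layer(x, W, b, f_max=30.0)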
♻ ☆ Unlocking Transfer Learning for Open-World Few-Shot Recognition NeurIPS 2025
Few-Shot Open-Set Recognition (FSOSR) targets a critical real-world challenge, aiming to categorize inputs into known categories, termed closed-set classes, while identifying open-set inputs that fall outside these classes. Although transfer learning, where a model is tuned to a given few-shot task, has become a prominent paradigm in the closed-world setting, we observe that it fails to extend to the open-world setting. To address this challenge, we propose a two-stage method that combines open-set aware meta-learning with open-set free transfer learning. In the open-set aware meta-learning stage, a model is trained to establish a metric space that serves as a beneficial starting point for the subsequent stage. During the open-set free transfer learning stage, the model is further adapted to a specific target task through transfer learning. Additionally, we introduce a strategy to simulate open-set examples by modifying the training dataset or generating pseudo open-set examples. The proposed method achieves state-of-the-art performance on two widely recognized benchmarks, miniImageNet and tieredImageNet, with only a 1.5% increase in training effort. Our work demonstrates the effectiveness of transfer learning in FSOSR.
comment: Accepted at NeurIPS 2025 workshop
♻ ☆ Deep Taxonomic Networks for Unsupervised Hierarchical Prototype Discovery NeurIPS 2025
Inspired by the human ability to learn and organize knowledge into hierarchical taxonomies with prototypes, this paper addresses key limitations in current deep hierarchical clustering methods. Existing methods often tie the structure to the number of classes and underutilize the rich prototype information available at intermediate hierarchical levels. We introduce deep taxonomic networks, a novel deep latent variable approach designed to bridge these gaps. Our method optimizes a large latent taxonomic hierarchy, specifically a complete binary tree structured mixture-of-Gaussian prior within a variational inference framework, to automatically discover taxonomic structures and associated prototype clusters directly from unlabeled data without assuming true label sizes. We analytically show that optimizing the ELBO of our method encourages the discovery of hierarchical relationships among prototypes. Empirically, our learned models demonstrate strong hierarchical clustering performance, outperforming baselines across diverse image classification datasets using our novel evaluation mechanism that leverages prototype clusters discovered at all hierarchical levels. Qualitative results further reveal that deep taxonomic networks discover rich and interpretable hierarchical taxonomies, capturing both coarse-grained semantic categories and fine-grained visual distinctions.
comment: NeurIPS 2025
♻ ☆ CE-SDWV: Effective and Efficient Concept Erasure for Text-to-Image Diffusion Models via a Semantic-Driven Word Vocabulary
Large-scale text-to-image (T2I) diffusion models have achieved remarkable generative performance across a wide range of concepts. Given privacy and safety constraints in practice, the ability to generate NSFW (Not Safe For Work) concepts is undesirable, e.g., producing sexually explicit photos or licensed images. The concept erasure task for T2I diffusion models has attracted considerable attention and requires an effective and efficient method. To achieve this goal, we propose the CE-SDWV framework, which removes the target concepts (e.g., NSFW concepts) of T2I diffusion models in the text semantic space by adjusting only the text condition tokens, without re-training the original T2I diffusion model's weights. Specifically, our framework first builds a target concept-related word vocabulary to enhance the representation of the target concepts within the text semantic space, and then utilizes an adaptive semantic component suppression strategy to ablate the target concept-related semantic information in the text condition tokens. To further adapt the above text condition tokens to the original image semantic space, we propose an end-to-end gradient-orthogonal token optimization strategy. Extensive experiments on the I2P and UnlearnCanvas benchmarks demonstrate the effectiveness and efficiency of our method. Code is available at https://github.com/TtuHamg/CE-SDWV.
comment: 25 pages, 14 figures
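The semantic component suppression step can be illustrated as projecting concept-related directions out of the text condition tokens. This is a minimal linear-algebra sketch: the orthonormal concept basis here is random, whereas the paper builds it adaptively from a target-concept word vocabulary.

    import numpy as np

    def suppress_concept(tokens, concept_basis):
        # Remove the subspace spanned by concept directions from condition tokens.
        # tokens: (n_tokens, d); concept_basis: (k, d) with orthonormal rows.
        coeffs = tokens @ concept_basis.T        # (n_tokens, k) component strengths
        return tokens - coeffs @ concept_basis   # orthogonal complement

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(77, 768))             # CLIP-like text condition tokens
    Q, _ = np.linalg.qr(rng.normal(size=(768, 4)))  # orthonormal columns
    clean = suppress_concept(tokens, Q.T)           # Q.T: (4, 768) concept directions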
♻ ☆ Photography Perspective Composition: Towards Aesthetic Perspective Recommendation NeurIPS25
Traditional photography composition approaches are dominated by 2D cropping-based methods. However, these methods fall short when scenes contain poorly arranged subjects. Professional photographers often employ perspective adjustment as a form of 3D recomposition, modifying the projected 2D relationships between subjects while maintaining their actual spatial positions to achieve better compositional balance. Inspired by this artistic practice, we propose photography perspective composition (PPC), extending beyond traditional cropping-based methods. However, implementing PPC faces significant challenges: the scarcity of perspective transformation datasets and undefined assessment criteria for perspective quality. To address these challenges, we present three key contributions: (1) an automated framework for building PPC datasets from expert photographs; (2) a video generation approach that demonstrates the transformation process from less favorable to aesthetically enhanced perspectives; (3) a perspective quality assessment (PQA) model built on human performance. Our approach is concise and requires no additional prompt instructions or camera trajectories, helping and guiding ordinary users to enhance their composition skills.
comment: Accepted at NeurIPS25
♻ ☆ Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography
Advancements in medical imaging AI, particularly in 3D imaging, have been limited due to the scarcity of comprehensive datasets. We introduce CT-RATE, a public dataset that pairs 3D medical images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast 3D chest CT scans from 21,304 unique patients. Each scan is accompanied by its corresponding radiology report. Leveraging CT-RATE, we develop CT-CLIP, a CT-focused contrastive language-image pretraining framework designed for broad applications without the need for task-specific training. We demonstrate how CT-CLIP can be used in multi-abnormality detection and case retrieval, and outperforms state-of-the-art fully supervised models across all key metrics. By combining CT-CLIP's vision encoder with a pretrained large language model, we create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes. Finetuned on over 2.7 million question-answer pairs derived from the CT-RATE dataset, CT-CHAT underscores the necessity for specialized methods in 3D medical imaging. Collectively, the open-source release of CT-RATE, CT-CLIP, and CT-CHAT not only addresses critical challenges in 3D medical imaging but also lays the groundwork for future innovations in medical AI and improved patient care.
♻ ☆ A Survey on SAR ship classification using Deep Learning
Deep learning (DL) has emerged as a powerful tool for Synthetic Aperture Radar (SAR) ship classification. This survey comprehensively analyzes the diverse DL techniques employed in this domain. We identify critical trends and challenges, highlighting the importance of integrating handcrafted features, utilizing public datasets, data augmentation, fine-tuning, explainability techniques, and fostering interdisciplinary collaborations to improve DL model performance. This survey establishes a first-of-its-kind taxonomy for categorizing relevant research based on DL models, handcrafted feature use, SAR attribute utilization, and the impact of fine-tuning. We discuss the methodologies used in SAR ship classification tasks and the impact of different techniques. Finally, the survey explores potential avenues for future research, including addressing data scarcity, exploring novel DL architectures, incorporating interpretability techniques, and establishing standardized performance metrics. By addressing these challenges and leveraging advancements in DL, researchers can contribute to developing more accurate and efficient ship classification systems, ultimately enhancing maritime surveillance and related applications.
comment: Submitted to JSTARS journal
♻ ☆ RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent
With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field generation from sparse view inputs. Existing methods can be classified into offline techniques, which can generate high-quality novel views but at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. However, we have observed that the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field generation while maintaining rendering quality. Based on this insight, we introduce RealLiFe, a novel light field optimization method, which leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse view images in real time. Technically, the coarse MPI of a scene is first generated using a 3D CNN, and it is further sparsely optimized by focusing only on important MPI gradients over a few iterations. Nevertheless, relying solely on optimization can lead to artifacts at occlusion boundaries. Therefore, we propose an occlusion-aware iterative refinement module that removes visual artifacts in occluded regions by iteratively filtering the input. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods and delivering better performance (about 2 dB higher in PSNR) compared to other online approaches.
comment: Submitted to IEEE TPAMI
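The sparse optimization step, updating only the most significant MPI gradients in each iteration, might look like the numpy sketch below; the top-k selection rule and keep ratio are assumptions for illustration.

    import numpy as np

    def sparse_gradient_step(mpi, grad, lr=0.1, keep_ratio=0.01):
        # Update only where the gradient magnitude is in the top-k,
        # exploiting the sparse manifold of multi-plane images.
        k = max(1, int(keep_ratio * grad.size))
        thresh = np.partition(np.abs(grad).ravel(), -k)[-k]
        mask = np.abs(grad) >= thresh
        mpi[mask] -= lr * grad[mask]
        return mpi

    mpi = np.random.rand(32, 128, 128, 4)   # planes x H x W x RGBA (toy sizes)
    grad = np.random.randn(*mpi.shape)      # would come from a rendering loss
    mpi = sparse_gradient_step(mpi, grad)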
♻ ☆ Multi-temporal crack segmentation in concrete structures using deep learning approaches
Cracks are among the earliest indicators of deterioration in concrete structures. Early automatic detection of these cracks can significantly extend the lifespan of critical infrastructure, such as bridges, buildings, and tunnels, while simultaneously reducing maintenance costs and facilitating efficient structural health monitoring. This study investigates whether leveraging multi-temporal data for crack segmentation can enhance segmentation quality. To this end, we compare a Swin UNETR trained on multi-temporal data with a U-Net trained on mono-temporal data to assess the effect of temporal information compared with conventional single-epoch approaches. A multi-temporal dataset comprising 1356 images, each with 32 sequential crack propagation images, was created. After training the models, experiments were conducted to analyze their generalization ability, temporal consistency, and segmentation quality. The multi-temporal approach consistently outperformed its mono-temporal counterpart, achieving an IoU of $82.72\%$ and an F1-score of $90.54\%$, a significant improvement over the mono-temporal model's IoU of $76.69\%$ and F1-score of $86.18\%$, despite requiring only half of the trainable parameters. The multi-temporal model also displayed more consistent segmentation quality, with reduced noise and fewer errors. These results suggest that temporal information significantly enhances the performance of segmentation models, offering a promising solution for improved crack detection and the long-term monitoring of concrete structures, even with limited sequential data.
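For reference, the IoU and F1 figures quoted above are the standard binary-segmentation metrics; a minimal computation on placeholder masks follows.

    import numpy as np

    def iou_f1(pred, gt):
        # Binary-mask IoU and F1 (Dice) from true/false positive/negative counts.
        pred, gt = pred.astype(bool), gt.astype(bool)
        tp = np.logical_and(pred, gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        iou = tp / (tp + fp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)
        return iou, f1

    pred = np.random.rand(256, 256) > 0.5   # placeholder crack predictions
    gt = np.random.rand(256, 256) > 0.5     # placeholder ground truth
    print(iou_f1(pred, gt))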
♻ ☆ EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Instruction-guided image editing has achieved remarkable progress, yet current models still face challenges with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a comprehensive benchmark to systematically evaluate reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, coupled with an effective self-ensemble strategy tailored for the generative nature of EditScore, our largest variant even surpasses GPT-5 on the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, results in a final model that shows a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is the key to unlocking the full potential of RL in this domain.
comment: Code, Models and benchmark will be publicly available at https://github.com/VectorSpaceLab/EditScore
♻ ☆ Octic Vision Transformers: Quicker ViTs Through Equivariance
Why are state-of-the-art Vision Transformers (ViTs) not designed to exploit natural geometric symmetries such as 90-degree rotations and reflections? In this paper, we argue that there is no fundamental reason, and that what has been missing is an efficient implementation. To this end, we introduce Octic Vision Transformers (octic ViTs), which rely on octic group equivariance to capture these symmetries. In contrast to prior equivariant models that increase computational cost, our octic linear layers achieve 5.33x reductions in FLOPs and up to 8x reductions in memory compared to ordinary linear layers. In full octic ViT blocks, the computational savings approach those of the linear layers as the embedding dimension increases. We study two new families of ViTs, built from octic blocks, that are either fully octic equivariant or break equivariance in the last part of the network. Training octic ViTs supervised (DeiT-III) and unsupervised (DINOv2) on ImageNet-1K, we find that they match baseline accuracy while providing substantial efficiency gains.
♻ ☆ CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
Modern camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction, which, while beneficial individually, often introduce photometric inconsistencies across views. These appearance variations violate multi-view consistency and degrade novel view synthesis. Joint optimization of scene-specific representations and per-image appearance embeddings has been proposed to address this issue, but with increased computational complexity and slower training. In this work, we propose a generalizable, feed-forward approach that predicts spatially adaptive bilateral grids to correct photometric variations in a multi-view consistent manner. Our model processes hundreds of frames in a single step, enabling efficient large-scale harmonization, and seamlessly integrates into downstream 3D reconstruction models, providing cross-scene generalization without requiring scene-specific retraining. To overcome the lack of paired data, we employ a hybrid self-supervised rendering loss leveraging 3D foundation models, improving generalization to real-world variations. Extensive experiments show that our approach outperforms or matches the reconstruction quality of existing scene-specific optimization methods with appearance modeling, without significantly affecting the training time of baseline 3D models.
♻ ☆ Spatial-Spectral Binarized Neural Network for Panchromatic and Multi-spectral Images Fusion
Remote sensing pan-sharpening aims to reconstruct spatial-spectral properties during the fusion of panchromatic (PAN) images and low-resolution multi-spectral (LR-MS) images, ultimately generating high-resolution multi-spectral (HR-MS) images. Although deep learning-based models have achieved excellent performance, they often come with high computational complexity, which hinders their application on resource-limited devices. In this paper, we explore the feasibility of applying binary neural networks (BNNs) to pan-sharpening. Nevertheless, there are two main issues with binarizing pan-sharpening models: (i) binarization causes serious spectral distortion due to the inconsistent spectral distribution of the PAN/LR-MS images; (ii) the common binary convolution kernel struggles to adapt to the multi-scale and anisotropic spatial features of remote sensing objects, resulting in serious degradation of contours. To address these issues, we design a customized spatial-spectral binarized convolution (S2B-Conv), composed of a Spectral-Redistribution Mechanism (SRM) and a Gabor Spatial Feature Amplifier (GSFA). Specifically, SRM employs an affine transformation, generating its scaling and bias parameters through a dynamic learning process. GSFA, which randomly selects different frequencies and angles within a preset range, enables the network to better handle multi-scale and multi-directional spatial features. A series of S2B-Conv modules forms a new binary network for pan-sharpening, dubbed S2BNet. Extensive quantitative and qualitative experiments show that our highly efficient binarized pan-sharpening method attains promising performance.
♻ ☆ Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models NeurIPS 2025
Existing large language models (LLMs) face challenges in following complex instructions, especially when multiple constraints are present and organized in parallel, chaining, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve the capabilities of LLMs. However, we find that vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the composition of constraints to identify their relationships across hierarchies of types and dimensions. To this end, we propose RAIF, a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, starting from the decomposition of complex instructions under existing taxonomies, we propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate a steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM. Evaluation on OOD constraints also confirms the generalizability of our RAIF. Codes and data are available at https://github.com/yuleiqin/RAIF. Keywords: reinforcement learning with verifiable rewards (RLVR), instruction following, complex instructions
comment: Accepted to NeurIPS 2025; 15 pages of main body, 5 tables, 5 figures, 42 pages of appendix
♻ ☆ BRIDGE -- Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
Monocular Depth Estimation (MDE) is a foundational task for computer vision. Traditional methods are limited by data scarcity and quality, hindering their robustness. To overcome this, we propose BRIDGE, an RL-optimized depth-to-image (D2I) generation framework that synthesizes over 20M realistic and geometrically accurate RGB images, each intrinsically paired with its ground truth depth, from diverse source depth maps. Then we train our depth estimation model on this dataset, employing a hybrid supervision strategy that integrates teacher pseudo-labels with ground truth depth for comprehensive and robust training. This innovative data generation and training paradigm enables BRIDGE to achieve breakthroughs in scale and domain diversity, consistently outperforming existing state-of-the-art approaches quantitatively and in complex scene detail capture, thereby fostering general and robust depth features. Code and models are available at https://dingning-liu.github.io/bridge.github.io/.
comment: 20 pages, 7 figures
♻ ☆ CODA: Repurposing Continuous VAEs for Discrete Tokenization
Discrete visual tokenizers transform images into a sequence of tokens, enabling token-based visual generation akin to language models. However, this process is inherently challenging, as it requires both compressing visual signals into a compact representation and discretizing them into a fixed set of codes. Traditional discrete tokenizers typically learn the two tasks jointly, often leading to unstable training, low codebook utilization, and limited reconstruction quality. In this paper, we introduce CODA (COntinuous-to-Discrete Adaptation), a framework that decouples compression and discretization. Instead of training discrete tokenizers from scratch, CODA adapts off-the-shelf continuous VAEs -- already optimized for perceptual compression -- into discrete tokenizers via a carefully designed discretization process. By primarily focusing on discretization, CODA ensures stable and efficient training while retaining the strong visual fidelity of continuous VAEs. Empirically, with 6$\times$ less training budget than standard VQGAN, our approach achieves a remarkable codebook utilization of 100% and a notable reconstruction FID (rFID) of 0.43 and 1.34 for 8$\times$ and 16$\times$ compression on the ImageNet 256$\times$256 benchmark.
comment: Project page: https://lzy-tony.github.io/coda
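The discretization half of the pipeline can be sketched as nearest-code assignment on frozen continuous-VAE latents. The codebook size and the plain nearest-neighbor rule are assumptions; CODA's actual discretization process is more carefully designed than this.

    import numpy as np

    def quantize(latents, codebook):
        # Assign each continuous latent vector to its nearest codebook entry.
        # latents: (n, d) from a frozen VAE encoder; codebook: (K, d).
        d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K)
        idx = d2.argmin(axis=1)
        return codebook[idx], idx        # quantized vectors and discrete token ids

    rng = np.random.default_rng(0)
    z = rng.normal(size=(256, 16))       # latents from the pretrained continuous VAE
    codes = rng.normal(size=(1024, 16))  # codebook
    z_q, tokens = quantize(z, codes)
    usage = len(np.unique(tokens)) / len(codes)  # codebook-utilization diagnostic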
♻ ☆ Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Semi-supervised learning (SSL) has emerged as a practical solution for addressing data scarcity challenges by leveraging unlabeled data. Recently, vision-language models (VLMs), pre-trained on massive image-text pairs, have demonstrated remarkable zero-/few-shot performance that often surpasses SSL approaches due to their exceptional generalization capabilities. This gap motivates us to question: how can we effectively harness the powerful generalization capabilities of VLMs into task-specific models? Knowledge distillation (KD) offers a natural framework for transferring VLM capabilities, but we identify that it suffers from gradient conflicts between supervised and distillation losses. To address this challenge, we propose Dual-Head Optimization (DHO), which introduces dual prediction heads for each distinct signal. We observe that DHO resolves gradient conflicts, enabling improved feature learning compared to single-head KD baselines, with practical benefits of minimal computational overhead and test-time hyperparameter tuning without retraining. Extensive experiments across 15 datasets show that DHO consistently outperforms KD baselines, often outperforming teacher models with smaller student models. DHO also achieves new state-of-the-art performance on both in-distribution ImageNet semi-supervised learning and out-of-distribution generalization across ImageNet variants. We publicly release our code and model checkpoints to facilitate future research at https://github.com/erjui/DHO.
comment: 38 pages, 17 figures, preprint
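The dual-head idea, one head fit to labels and one distilled from the VLM teacher so that the two losses never push gradients through the same classifier weights, in a numpy sketch; the temperature and the implied mixing weight are hypothetical.

    import numpy as np

    def softmax(x, T=1.0):
        e = np.exp((x - x.max(-1, keepdims=True)) / T)
        return e / e.sum(-1, keepdims=True)

    def dho_losses(feat, W_sup, W_kd, y, teacher_probs, T=2.0):
        # Two heads on a shared feature: cross-entropy on labels for W_sup,
        # KL divergence to the teacher for W_kd.
        p_sup = softmax(feat @ W_sup)
        p_kd = softmax(feat @ W_kd, T=T)
        ce = -np.log(p_sup[np.arange(len(y)), y] + 1e-9).mean()
        kl = (teacher_probs * (np.log(teacher_probs + 1e-9)
                               - np.log(p_kd + 1e-9))).sum(-1).mean()
        return ce, kl   # total loss would be ce + lambda * kl (lambda assumed)

    rng = np.random.default_rng(0)
    feat = rng.normal(size=(8, 32))
    W_sup, W_kd = rng.normal(size=(32, 10)), rng.normal(size=(32, 10))
    y = rng.integers(0, 10, size=8)
    teacher = softmax(rng.normal(size=(8, 10)))   # zero-shot VLM predictions
    print(dho_losses(feat, W_sup, W_kd, y, teacher))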
♻ ☆ Scaling RL to Long Videos NeurIPS 2025
We introduce a full-stack framework that scales up reasoning in vision-language models (VLMs) to long videos, leveraging reinforcement learning. We address the unique challenges of long video reasoning by integrating three critical components: (1) a large-scale dataset, LongVideo-Reason, comprising 104K long video QA pairs with high-quality reasoning annotations across diverse domains such as sports, games, and vlogs; (2) a two-stage training pipeline that extends VLMs with chain-of-thought supervised fine-tuning (CoT-SFT) and reinforcement learning (RL); and (3) a training infrastructure for long video RL, named Multi-modal Reinforcement Sequence Parallelism (MR-SP), which incorporates sequence parallelism and a vLLM-based engine tailored for long video, using cached video embeddings for efficient rollout and prefilling. In our experiments, LongVILA-R1-7B achieves strong performance on video benchmarks, reaching 65.1% and 71.1% accuracy on VideoMME without and with subtitles, respectively, and consistently outperforming LongVILA-7B across multiple benchmarks. Moreover, LongVILA-R1-7B supports processing up to 8,192 video frames per video, and configurable FPS settings. Notably, our MR-SP system achieves up to 2.1x speedup on long video RL training. In addition, we release our training system for public availability that supports RL training on various modalities (video, text, and audio), various models (VILA and Qwen series), and even image and video generation models. On a single A100 node (8 GPUs), it supports RL training on hour-long videos (e.g., 3,600 frames).
comment: Accepted by NeurIPS 2025. Code at https://github.com/NVlabs/Long-RL and model at https://huggingface.co/Efficient-Large-Model/LongVILA-R1-7B
♻ ☆ Binary Diffusion Probabilistic Model
We propose the Binary Diffusion Probabilistic Model (BDPM), a generative framework specifically designed for data representations in binary form. Conventional denoising diffusion probabilistic models (DDPMs) assume continuous inputs and use mean squared error objectives with Gaussian perturbations, assumptions that are not suited to discrete and binary representations. BDPM instead encodes images into binary representations using multi bit-plane and learnable binary embeddings, perturbs them via XOR-based noise, and trains a model by optimizing a binary cross-entropy loss. These binary representations offer fine-grained noise control, accelerate convergence, and reduce inference cost. On image-to-image translation tasks, such as super-resolution, inpainting, and blind restoration, BDPM based on a small denoiser and multi bit-plane representation outperforms state-of-the-art methods on FFHQ, CelebA, and CelebA-HQ using only a few sampling steps. In class-conditional generation on ImageNet-1k, BDPM based on learnable binary embeddings achieves competitive results among models with both low parameter counts and few sampling steps.
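The XOR forward process is easy to make concrete: flip each bit independently with a timestep-dependent probability, and train the denoiser to predict the flip mask under binary cross-entropy. The flip schedule value below is an assumption, and the constant prediction merely stands in for a model output.

    import numpy as np

    def xor_noise(x0_bits, p_flip, rng):
        # Forward process on binary data: x_t = x_0 XOR flips, flips ~ Bernoulli(p_flip).
        flips = (rng.random(x0_bits.shape) < p_flip).astype(np.uint8)
        return np.bitwise_xor(x0_bits, flips), flips

    def bce(pred_prob, target_bits):
        # Binary cross-entropy for training the denoiser to predict the flip mask.
        p = np.clip(pred_prob, 1e-7, 1 - 1e-7)
        return -(target_bits * np.log(p) + (1 - target_bits) * np.log(1 - p)).mean()

    rng = np.random.default_rng(0)
    x0 = (rng.random((4, 8, 32, 32)) < 0.5).astype(np.uint8)  # multi bit-plane image
    x_t, flips = xor_noise(x0, p_flip=0.3, rng=rng)           # 0.3: assumed schedule value
    loss = bce(np.full(x0.shape, 0.3), flips)                 # placeholder model output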
♻ ☆ MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification
Feature extraction techniques are crucial in medical image classification; however, classical feature extractors combined with traditional machine learning classifiers often exhibit significant limitations in providing sufficient discriminative information for complex image sets. While Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown promise in feature extraction, they are prone to overfitting due to the inherent characteristics of medical imaging data, such as small sample sizes and high intra-class variance. In this work, we propose the Medical Image Attention-based Feature Extractor (MIAFEx), a novel method that employs a learnable refinement mechanism to enhance the classification token within the Transformer encoder architecture. This mechanism adjusts the token based on learned weights, improving the extraction of salient features and enhancing the model's adaptability to the challenges presented by medical imaging data. The quality of MIAFEx output features is compared against classical feature extractors using traditional and hybrid classifiers. The performance of these features is also compared against modern CNN and ViT models in classification tasks, demonstrating superiority in accuracy and robustness across multiple complex medical imaging datasets. This advantage is particularly pronounced in scenarios with limited training data, where traditional and modern models often struggle to generalize effectively. The source code of this proposal can be found at https://github.com/Oscar-RamosS/Medical-Image-Attention-based-Feature-Extractor-MIAFEx
comment: This is the preprint version of an article that has been accepted for publication in Knowledge-Based Systems
♻ ☆ RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness NeurIPS 2025
Fine-tuning pre-trained models with custom data produces numerous expert models on specific tasks. Merging these models into one universal model that gains multi-task ability while refraining from data leakage has gained popularity. With the expansion in data and model size, parameter-efficient tuning has become the common practice for obtaining task-specific models efficiently. However, few methods are dedicated to efficient merging, and existing methods designed for full fine-tuning merging fail under efficient merging. To address this issue, we analyze merging from the perspective of low-rank decomposition and reveal that direction robustness during merging is crucial for merging efficient modules. We further uncover that compensating for the gap between starkly different singular values contributes to direction robustness. Therefore, we propose RobustMerge, a training-free parameter-efficient merging method with complementary parameter adaptation to maintain direction robustness. Specifically, we (1) prune parameters and scale coefficients based on inter-parameter relations among singular values to maintain direction stability away from task interference, and (2) perform cross-task normalization to enhance unseen task generalization. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to certify the outstanding performance and generalizability of our method. Additional studies and extensive analyses further showcase its effectiveness. Code is available at https://github.com/AuroraZengfh/RobustMerge.
comment: NeurIPS 2025 (Spotlight)
♻ ☆ Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models
Counterfactual generation aims to simulate realistic hypothetical outcomes under causal interventions. Diffusion models have emerged as a powerful tool for this task, combining DDIM inversion with conditional generation and classifier-free guidance (CFG). In this work, we identify a key limitation of CFG for counterfactual generation: it prescribes a global guidance scale for all attributes, leading to significant spurious changes in inferred counterfactuals. To mitigate this, we propose Decoupled Classifier-Free Guidance (DCFG), a flexible and model-agnostic guidance technique that enables attribute-wise control following a causal graph. DCFG is implemented via a simple attribute-split embedding strategy that disentangles semantic inputs, enabling selective guidance on user-defined attribute groups.
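DCFG's change relative to standard CFG can be written in a few lines: instead of one global scale on the joint condition, a separate scale is applied to each attribute group's guidance direction. The grouping and weights below are hypothetical.

    import numpy as np

    def dcfg_combine(eps_uncond, eps_attrs, weights):
        # Decoupled classifier-free guidance with per-attribute guidance scales.
        # eps_attrs[i] is the noise prediction conditioned on attribute group i alone;
        # plain CFG is the special case of a single group with one global weight.
        out = eps_uncond.copy()
        for eps_c, w in zip(eps_attrs, weights):
            out += w * (eps_c - eps_uncond)
        return out

    rng = np.random.default_rng(0)
    eps_u = rng.normal(size=(1, 4, 64, 64))
    eps_age = rng.normal(size=eps_u.shape)   # placeholder per-attribute predictions
    eps_sex = rng.normal(size=eps_u.shape)
    eps = dcfg_combine(eps_u, [eps_age, eps_sex], weights=[3.0, 1.0])  # guide mainly on age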
♻ ☆ LiDAR-BIND-T: Improved and Temporally Consistent Sensor Modality Translation and Fusion for Robotic Applications
This paper extends LiDAR-BIND, a modular multi-modal fusion framework that binds heterogeneous sensors (radar, sonar) to a LiDAR-defined latent space, with mechanisms that explicitly enforce temporal consistency. We introduce three contributions: (i) a temporal embedding similarity that aligns consecutive latent representations, (ii) a motion-aligned transformation loss that matches displacement between predictions and ground truth LiDAR, and (iii) windowed temporal fusion using a specialised temporal module. We further update the model architecture to better preserve spatial structure. Evaluations on radar/sonar-to-LiDAR translation demonstrate improved temporal and spatial coherence, yielding lower absolute trajectory error and better occupancy map accuracy in Cartographer-based SLAM (Simultaneous Localisation and Mapping). We propose metrics based on the Fréchet Video Motion Distance (FVMD) and a correlation-peak distance metric, providing practical temporal-quality indicators for evaluating SLAM performance. The proposed temporal LiDAR-BIND, or LiDAR-BIND-T, maintains modular modality fusion while substantially enhancing temporal stability, resulting in improved robustness and performance for downstream SLAM.
♻ ☆ From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios
Dense prediction tasks hold significant importance in computer vision, aiming to learn pixel-wise annotated labels for input images. Despite advances in this field, existing methods primarily focus on idealized conditions, exhibiting limited real-world generalization and struggling with the acute scarcity of real-world data in practical scenarios. To systematically study this problem, we first introduce DenseWorld, a benchmark spanning a broad set of 25 dense prediction tasks that correspond to urgent real-world applications, featuring unified evaluation across tasks. We then propose DenseDiT, which exploits generative models' visual priors to perform diverse real-world dense prediction tasks through a unified strategy. DenseDiT combines a parameter-reuse mechanism and two lightweight branches that adaptively integrate multi-scale context. This design enables DenseDiT to achieve efficient tuning with less than 0.1% additional parameters, activating the visual priors while effectively adapting to diverse real-world dense prediction tasks. Evaluations on DenseWorld reveal significant performance drops in existing general and specialized baselines, highlighting their limited real-world generalization. In contrast, DenseDiT achieves superior results using less than 0.01% of the baselines' training data, underscoring its practical value for real-world deployment.
♻ ☆ U-Mamba2-SSL for Semi-Supervised Tooth and Pulp Segmentation in CBCT
Accurate segmentation of teeth and pulp in Cone-Beam Computed Tomography (CBCT) is vital for clinical applications like treatment planning and diagnosis. However, this process requires extensive expertise and is exceptionally time-consuming, highlighting the critical need for automated algorithms that can effectively utilize unlabeled data. In this paper, we propose U-Mamba2-SSL, a novel semi-supervised learning framework that builds on the U-Mamba2 model and employs a multi-stage training strategy. The framework first pre-trains U-Mamba2 in a self-supervised manner using a disruptive autoencoder. It then leverages unlabeled data through consistency regularization, where we introduce input and feature perturbations to ensure stable model outputs. Finally, a pseudo-labeling strategy is implemented with a reduced loss weighting to minimize the impact of potential errors. U-Mamba2-SSL achieved an average score of 0.789 and a DSC of 0.917 on the hidden test set, achieving first place in Task 1 of the STSR 2025 challenge. The code is available at https://github.com/zhiqin1998/UMamba2.
comment: First place solution in Task 1 of the STSR 2025 challenge, MICCAI 2025
♻ ☆ Multi-View Projection for Unsupervised Domain Adaptation in 3D Semantic Segmentation
3D semantic segmentation is essential for autonomous driving and road infrastructure analysis, but state-of-the-art 3D models suffer from severe domain shift when applied across datasets. We propose a multi-view projection framework for unsupervised domain adaptation (UDA). Our method aligns LiDAR scans into coherent 3D scenes and renders them from multiple virtual camera poses to generate large-scale synthetic 2D datasets (PC2D) in various modalities. An ensemble of 2D segmentation models is trained on these modalities, and during inference, hundreds of views per scene are processed, with logits back-projected to 3D using an occlusion-aware voting scheme to produce point-wise labels. These labels can be used directly or to fine-tune a 3D segmentation model in the target domain. We evaluate our approach in both Real-to-Real and Simulation-to-Real UDA, achieving state-of-the-art performance in the Real-to-Real setting. Furthermore, we show that our framework enables segmentation of rare classes, leveraging only 2D annotations for those classes while relying on 3D annotations for others in the source domain.
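The voting scheme can be sketched as accumulating per-view class logits onto the 3D points visible in each rendered view and taking the argmax; computing the occlusion-aware visibility map itself is omitted and assumed precomputed.

    import numpy as np

    def vote_labels(n_points, n_classes, view_logits, view_point_ids):
        # Back-project 2D logits to 3D points and vote across many views.
        # view_point_ids[v][j] is the 3D point visible at pixel j of view v.
        acc = np.zeros((n_points, n_classes))
        for logits, ids in zip(view_logits, view_point_ids):
            np.add.at(acc, ids, logits)   # accumulate logits per visible point
        return acc.argmax(axis=1)

    rng = np.random.default_rng(0)
    n_pts, n_cls, n_views, n_pix = 1000, 5, 16, 200
    logits = [rng.normal(size=(n_pix, n_cls)) for _ in range(n_views)]
    vis = [rng.integers(0, n_pts, size=n_pix) for _ in range(n_views)]
    labels = vote_labels(n_pts, n_cls, logits, vis)   # point-wise pseudo-labels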
♻ ☆ U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT
Cone-Beam Computed Tomography (CBCT) is a widely used 3D imaging technique in dentistry, providing volumetric information about the anatomical structures of jaws and teeth. Accurate segmentation of these anatomies is critical for clinical applications such as diagnosis and surgical planning, but remains time-consuming and challenging. In this paper, we present U-Mamba2, a new neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge. U-Mamba2 integrates the Mamba2 state space models into the U-Net architecture, enforcing stronger structural constraints for higher efficiency without compromising performance. In addition, we integrate interactive click prompts with cross-attention blocks, pre-train U-Mamba2 using self-supervised learning, and incorporate dental domain knowledge into the model design to address key challenges of dental anatomy segmentation in CBCT. Extensive experiments, including independent tests, demonstrate that U-Mamba2 is both effective and efficient, securing first place in both tasks of the ToothFairy3 challenge. In Task 1, U-Mamba2 achieved a mean Dice of 0.84 and an HD95 of 38.17 on the held-out test data, with an average inference time of 40.58 s. In Task 2, U-Mamba2 achieved a mean Dice of 0.87 and an HD95 of 2.15 on the held-out test data. The code is publicly available at https://github.com/zhiqin1998/UMamba2.
comment: First place solution for both tasks of the ToothFairy3 challenge, MICCAI 2025
♻ ☆ AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
Hallucination remains a major challenge in multimodal large language models (MLLMs). To address this, various contrastive decoding (CD) methods have been proposed that contrast original logits with hallucinated logits generated from perturbed inputs. While CD has shown promise in vision-language models (VLMs), it is not well-suited for AV-LLMs, where hallucinations often emerge from both unimodal and cross-modal combinations involving audio, video, and language. These intricate interactions call for a more adaptive and modality-aware decoding strategy. In this paper, we propose Audio-Visual Contrastive Decoding (AVCD), a novel training-free decoding framework designed to model trimodal interactions and suppress modality-induced hallucinations in AV-LLMs. Unlike previous CD methods in VLMs that corrupt a fixed modality, AVCD leverages attention distributions to dynamically identify less dominant modalities and applies attentive masking to generate perturbed output logits. To support CD in a trimodal setting, we also reformulate the original CD framework to jointly handle audio, visual, and textual inputs. Finally, to improve efficiency, we introduce entropy-guided adaptive decoding, which selectively skips unnecessary decoding steps based on the model's confidence in its predictions. Extensive experiments demonstrate that AVCD consistently outperforms existing decoding methods. In particular, on the AVHBench dataset, it improves accuracy by 2% for VideoLLaMA2 and 7% for video-SALMONN, demonstrating strong robustness and generalizability. Our code is available at https://github.com/kaistmm/AVCD.
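The contrastive combination at the heart of AVCD follows the standard CD form; a sketch where the perturbed logits come from attentively masking the dominant modality. The mask selection and the entropy-guided skipping are omitted here, and alpha/beta are hypothetical values.

    import numpy as np

    def contrastive_decode(logits_orig, logits_masked, alpha=1.0, beta=0.1):
        # Standard CD combination: amplify what the full input supports over the
        # perturbed (attentively masked) input.
        scores = (1 + alpha) * logits_orig - alpha * logits_masked
        # Adaptive plausibility constraint: keep only tokens near the original top-1.
        cutoff = logits_orig.max() + np.log(beta)
        scores[logits_orig < cutoff] = -np.inf
        return scores.argmax()

    rng = np.random.default_rng(0)
    v_full = rng.normal(size=32000)    # logits with full audio-visual-text input
    v_mask = rng.normal(size=32000)    # logits with the dominant modality masked
    token = contrastive_decode(v_full, v_mask)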
♻ ☆ OmniCount: Multi-label Object Counting with Semantic-Geometric Priors AAAI 2025
Object counting is pivotal for understanding the composition of scenes. Previously, this task was dominated by class-specific methods, which have gradually evolved into more adaptable class-agnostic strategies. However, these strategies come with their own set of limitations, such as the need for manual exemplar input and multiple passes for multiple categories, resulting in significant inefficiencies. This paper introduces a more practical approach enabling simultaneous counting of multiple object categories using an open-vocabulary framework. Our solution, OmniCount, stands out by using semantic and geometric insights (priors) from pre-trained models to count multiple categories of objects as specified by users, all without additional training. OmniCount distinguishes itself by generating precise object masks and leveraging varied interactive prompts via the Segment Anything Model for efficient counting. To evaluate OmniCount, we created the OmniCount-191 benchmark, a first-of-its-kind dataset with multi-label object counts, including points, bounding boxes, and VQA annotations. Our comprehensive evaluation in OmniCount-191, alongside other leading benchmarks, demonstrates OmniCount's exceptional performance, significantly outpacing existing solutions. The project webpage is available at https://mondalanindya.github.io/OmniCount.
comment: Accepted to AAAI 2025
♻ ☆ Dynamic Novel View Synthesis in High Dynamic Range
High Dynamic Range Novel View Synthesis (HDR NVS) seeks to learn an HDR 3D model from Low Dynamic Range (LDR) training images captured under conventional imaging conditions. Current methods primarily focus on static scenes, implicitly assuming all scene elements remain stationary and non-living. However, real-world scenarios frequently feature dynamic elements, such as moving objects, varying lighting conditions, and other temporal events, thereby presenting a significantly more challenging scenario. To address this gap, we propose a more realistic problem named HDR Dynamic Novel View Synthesis (HDR DNVS), where the additional dimension "Dynamic" emphasizes the necessity of jointly modeling temporal radiance variations alongside sophisticated 3D translation between LDR and HDR. To tackle this complex, intertwined challenge, we introduce HDR-4DGS, a Gaussian Splatting-based architecture featuring an innovative dynamic tone-mapping module that explicitly connects the HDR and LDR domains, maintaining temporal radiance coherence by dynamically adapting tone-mapping functions according to the evolving radiance distributions across the temporal dimension. As a result, HDR-4DGS achieves both temporal radiance consistency and spatially accurate color translation, enabling photorealistic HDR renderings from arbitrary viewpoints and time instances. Extensive experiments demonstrate that HDR-4DGS surpasses existing state-of-the-art methods in both quantitative performance and visual fidelity. Source code will be released.
♻ ☆ SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
comment: 24 pages, 6 figures
♻ ☆ Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models
The goal of this work is to enhance balanced multimodal understanding in audio-visual large language models (AV-LLMs) by addressing modality bias without additional training. In current AV-LLMs, audio and video features are typically processed jointly in the decoder. While this strategy facilitates unified multimodal understanding, it may introduce modality bias, where the model tends to over-rely on one modality due to imbalanced training signals. To mitigate this, we propose Fork-Merge Decoding (FMD), a simple yet effective inference-time strategy that requires no additional training or architectural modifications. FMD first performs modality-specific reasoning by processing audio-only and video-only inputs through the early decoder layers (fork), and then merges the resulting hidden states for joint reasoning in the remaining layers (merge). This separation allows each modality to be emphasized in the early stages while encouraging balanced contributions during integration. We validate our method on three representative AV-LLMs (VideoLLaMA2, video-SALMONN, and Qwen2.5-Omni) using three benchmark datasets. Experimental results show consistent gains in audio, video, and audio-visual reasoning tasks, highlighting the effectiveness of inference-time interventions for robust and efficient multimodal understanding.
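Fork-merge decoding can be sketched with the decoder split into early and late stacks: modality-specific inputs run through the early layers separately, and their hidden states are merged (here by averaging, an assumed fusion rule) before the remaining layers finish joint reasoning.

    import numpy as np

    def fork_merge_decode(h_audio, h_video, early_layers, late_layers):
        # Fork: run audio-only and video-only inputs through the early layers.
        for layer in early_layers:
            h_audio, h_video = layer(h_audio), layer(h_video)
        # Merge: fuse hidden states, then continue joint reasoning in late layers.
        h = 0.5 * (h_audio + h_video)
        for layer in late_layers:
            h = layer(h)
        return h

    # Toy callables standing in for transformer blocks.
    early = [lambda h: np.tanh(h) for _ in range(2)]
    late = [lambda h: h + 0.1 for _ in range(2)]
    out = fork_merge_decode(np.ones((1, 8)), np.zeros((1, 8)), early, late)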
♻ ☆ PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology
Accurate analysis of pathological images is essential for automated tumor diagnosis but remains challenging due to high structural similarity and subtle morphological variations in tissue images. Current vision-language (VL) models often struggle to capture the complex reasoning required for interpreting structured pathological reports. To address these limitations, we propose PathoHR-Bench, a novel benchmark designed to evaluate VL models' abilities in hierarchical semantic understanding and compositional reasoning within the pathology domain. Results on this benchmark reveal that existing VL models fail to effectively model intricate cross-modal relationships, limiting their applicability in clinical settings. To overcome this, we further introduce a pathology-specific VL training scheme that generates enhanced and perturbed samples for multimodal contrastive learning. Experimental evaluations demonstrate that our approach achieves state-of-the-art performance on PathoHR-Bench and six additional pathology datasets, highlighting its effectiveness in fine-grained pathology representation.
comment: Accept by EMNLP2025
♻ ☆ Rethinking Diffusion Model in High Dimension
The curse of dimensionality is an unavoidable challenge in statistical probability models, yet diffusion models seem to overcome this limitation, achieving impressive results in high-dimensional data generation. Diffusion models assume that they can learn the statistical quantities of the underlying probability distribution, enabling sampling from this distribution to generate realistic samples. But is this really how they work? We argue not, based on the following observations: 1) In high-dimensional sparse scenarios, the fitting target of the diffusion model's objective function degrades from a weighted sum of multiple samples to a single sample, which we believe hinders the model's ability to effectively learn essential statistical quantities such as the posterior, score, or velocity field. 2) Most inference methods can be unified within a simple framework that involves no statistical concepts, aligns with the degraded objective function, and provides a novel and intuitive perspective on the inference process.
♻ ☆ ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal Understanding
While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in open-ended visual question answering, they remain vulnerable to hallucinations: outputs that contradict or misrepresent input semantics, posing a critical challenge to reliability and factual consistency. Existing methods often rely on external verification or post-hoc correction, lacking an internal mechanism to validate outputs directly during training. To bridge this gap, we propose ReLoop, a unified closed-loop training framework that encourages multimodal consistency for cross-modal understanding in MLLMs. ReLoop adopts a ring-shaped structure that integrates three complementary consistency feedback mechanisms, obliging MLLMs to "see twice and think backwards". Specifically, ReLoop employs a frozen Consistency Feedback Plugin (CFP) comprising semantic reconstruction, visual description, and an attention supervision module for attention alignment. These components collectively enforce semantic reversibility, visual consistency, and interpretable attention, enabling the model to correct its outputs during training. Extensive evaluations and analyses demonstrate the effectiveness of ReLoop in reducing hallucination rates across multiple benchmarks, establishing a robust method for hallucination mitigation in MLLMs. We will release our source code and data in the camera-ready version.
comment: Accepted by EMNLP 2025
♻ ☆ Taming the Tri-Space Tension: ARC-Guided Hallucination Modeling and Control for Text-to-Image Generation
Despite remarkable progress in image quality and prompt fidelity, text-to-image (T2I) diffusion models continue to exhibit persistent "hallucinations", where generated content subtly or significantly diverges from the intended prompt semantics. While often regarded as unpredictable artifacts, we argue that these failures reflect deeper, structured misalignments within the generative process. In this work, we propose a cognitively inspired perspective that reinterprets hallucinations as trajectory drift within a latent alignment space. Empirical observations reveal that generation unfolds within a multiaxial cognitive tension field, where the model must continuously negotiate competing demands across three critical axes: semantic coherence, structural alignment, and knowledge grounding. We then formalize this three-axis space as the Hallucination Tri-Space and introduce the Alignment Risk Code (ARC): a dynamic vector representation that quantifies real-time alignment tension during generation. The magnitude of ARC captures overall misalignment, its direction identifies the dominant failure axis, and its imbalance reflects tension asymmetry. Based on this formulation, we develop the TensionModulator (TM-ARC): a lightweight controller that operates entirely in latent space. TM-ARC monitors ARC signals and applies targeted, axis-specific interventions during the sampling process. Extensive experiments on standard T2I benchmarks demonstrate that our approach significantly reduces hallucination without compromising image quality or diversity. This framework offers a unified and interpretable approach for understanding and mitigating generative failures in diffusion-based T2I systems.
comment: 9 pages, 6 figures, 7 tables
♻ ☆ RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement
Capturing screens is now routine in our everyday lives, but photographs of emissive displays are often degraded by flicker-banding (FB): alternating bright-dark stripes that arise from temporal aliasing between a camera's rolling-shutter readout and the display's brightness modulation. Unlike moiré degradation, which has been extensively studied, FB remains underexplored despite its frequent and severe impact on readability and perceived quality. We formulate FB removal as a dedicated restoration task and introduce Removal of Image Flicker-Banding via Latent Diffusion Enhancement (RIFLE), a diffusion-based framework designed to remove FB while preserving fine details. We propose a flicker-banding prior estimator (FPE) that predicts key banding attributes and injects them into the restoration network. Additionally, a Masked Loss (ML) is proposed to concentrate supervision on banded regions without sacrificing global fidelity. To overcome data scarcity, we provide a simulation pipeline that synthesizes FB in the luminance domain with stochastic jitter in banding angle, spacing, and width. Feathered boundaries and sensor noise are also applied for a more realistic simulation. For evaluation, we collect a paired real-world FB dataset with pixel-aligned banding-free references captured via long exposure. Across quantitative metrics and visual comparisons on our real-world dataset, RIFLE consistently outperforms recent image reconstruction baselines from mild to severe flicker-banding. To the best of our knowledge, this is the first work to study the simulation and removal of FB. Our work establishes a solid foundation for subsequent research in both dataset construction and removal model design. Our dataset and code will be released soon.
♻ ☆ ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery
Recently, 3D Gaussian Splatting (3D-GS) has prevailed in novel view synthesis, achieving high fidelity and efficiency. However, it often struggles to capture rich details and complete geometry. Our analysis reveals that the 3D-GS densification operation lacks adaptiveness and faces a dilemma between geometry coverage and detail recovery. To address this, we introduce a novel densification operation, residual split, which adds a downscaled Gaussian as a residual. Our approach adaptively retrieves details and complements missing geometry. To further support this method, we propose a pipeline named ResGS. Specifically, we integrate a Gaussian image pyramid for progressive supervision and implement a selection scheme that prioritizes the densification of coarse Gaussians over time. Extensive experiments demonstrate that our method achieves SOTA rendering quality. Consistent performance improvements can be achieved by applying our residual split to various 3D-GS variants, underscoring its versatility and potential for broader adoption in 3D-GS-based applications.
♻ ☆ SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid 3D Registration
Existing optimization-based methods for non-rigid registration typically minimize an alignment error metric based on the point-to-point or point-to-plane distance between corresponding point pairs on the source and target surfaces. However, these metrics can result in slow convergence or a loss of detail. In this paper, we propose SPARE, a novel formulation that utilizes a symmetrized point-to-plane distance for robust non-rigid registration. The symmetrized point-to-plane distance relies on both the positions and normals of the corresponding points, yielding a more accurate approximation of the underlying geometry and achieving higher accuracy than existing methods. To solve this optimization problem efficiently, we introduce an as-rigid-as-possible regularization term to estimate the deformed normals and propose an alternating minimization solver using a majorization-minimization strategy. Moreover, for effective initialization of the solver, we incorporate a deformation graph-based coarse alignment that improves registration quality and efficiency. Extensive experiments show that the proposed method greatly improves the accuracy of non-rigid registration and maintains relatively high solution efficiency. The code is publicly available at https://github.com/yaoyx689/spare.
comment: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence
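For reference, one common symmetrized point-to-plane energy over correspondences $(p_i, q_i)$ with normals $(n_{p_i}, n_{q_i})$ is sketched below; this is the standard symmetric objective from the rigid-registration literature, and the paper's non-rigid formulation (with per-point deformations and the regularization above) may differ in detail:

```latex
E(T) = \sum_{i} \Big( \big(T(p_i) - q_i\big) \cdot \big(n_{p_i} + n_{q_i}\big) \Big)^2,
```

where $T$ denotes the estimated deformation applied to the source points. Using both normals penalizes deviation from either tangent plane, which is what makes the metric a closer second-order approximation of the surface-to-surface distance than the one-sided point-to-plane form.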
♻ ☆ DiffTex: Differentiable Texturing for Architectural Proxy Models SIGGRAPH
Simplified proxy models are commonly used to represent architectural structures, reducing storage requirements and enabling real-time rendering. However, the geometric simplifications inherent in proxies result in a loss of fine color and geometric details, making it essential for textures to compensate for the loss. Preserving the rich texture information from the original dense architectural reconstructions remains a daunting task, particularly when working with unordered RGB photographs. We propose an automated method for generating realistic texture maps for architectural proxy models at the texel level from an unordered collection of registered photographs. Our approach establishes correspondences between texels on a UV map and pixels in the input images, with each texel's color computed as a weighted blend of associated pixel values. Using differentiable rendering, we optimize blending parameters to ensure photometric and perspective consistency, while maintaining seamless texture coherence. Experimental results demonstrate the effectiveness and robustness of our method across diverse architectural models and varying photographic conditions, enabling the creation of high-quality textures that preserve visual fidelity and structural detail.
comment: ACM TOG and SIGGRAPH Asia 2025 (Patent Protected); Project page: https://vcc.tech/research/2025/DiffTex
♻ ☆ LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) have demonstrated exceptional success in various multimodal tasks, yet their deployment is frequently limited by substantial computational demands and prolonged inference times. The vision modality typically contains more comprehensive information than the text modality, resulting in encoded representations comprising an extensive number of tokens and incurring significant computational overhead due to the quadratic complexity of the attention mechanism. Current token reduction methods are typically restricted to specific model architectures and often necessitate extensive retraining or fine-tuning, restricting their applicability to many state-of-the-art models. In this paper, we introduce a learning-free token reduction (LFTR) method designed for MLLMs. LFTR can be seamlessly integrated into most open-source MLLM architectures without requiring additional fine-tuning. By capitalizing on the redundancy in visual representations, our approach effectively reduces tokens while preserving the general inference performance of MLLMs. We conduct experiments on multiple MLLM architectures (LLaVA, MiniGPT, QwenVL), and our results show that LFTR achieves up to a $16\times$ reduction of visual tokens while maintaining or even enhancing performance on mainstream visual question-answering benchmarks, all in a learning-free setting. Additionally, LFTR is complementary to other acceleration techniques, such as vision encoder compression and post-training quantization, further promoting the efficient deployment of MLLMs. Our project is available at https://anonymous.4open.science/r/LFTR-AAAI-0528.
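As a concrete illustration of what a learning-free reduction can look like, here is a hypothetical sketch that prunes visual tokens by cosine-similarity redundancy; the scoring rule, threshold, and keep ratio are assumptions for illustration and are not the LFTR criterion itself.

```python
# Hypothetical training-free token pruning: greedily keep tokens that are
# not too similar (cosine) to tokens already kept.
import torch

def reduce_tokens(tokens: torch.Tensor, keep_ratio: float = 1 / 16) -> torch.Tensor:
    """tokens: (num_tokens, dim) visual tokens from a frozen encoder."""
    n_keep = max(1, int(tokens.shape[0] * keep_ratio))
    normed = torch.nn.functional.normalize(tokens, dim=-1)
    kept = [0]  # always keep the first token
    for i in range(1, tokens.shape[0]):
        if len(kept) == n_keep:
            break
        # Keep token i only if it is sufficiently dissimilar to all kept tokens.
        sim = normed[i] @ normed[kept].T
        if sim.max() < 0.9:
            kept.append(i)
    return tokens[kept]

visual_tokens = torch.randn(576, 1024)     # e.g., a 24x24 patch grid
print(reduce_tokens(visual_tokens).shape)  # up to (36, 1024) at 16x reduction
```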
♻ ☆ On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification
The development of explainable artificial intelligence (xAI) methods for scene classification problems has attracted great attention in remote sensing (RS). Most xAI methods and the related evaluation metrics in RS are initially developed for natural images considered in computer vision (CV), and their direct usage in RS may not be suitable. To address this issue, in this paper, we investigate the effectiveness of explanation methods and metrics in the context of RS image scene classification. In detail, we methodologically and experimentally analyze ten explanation metrics spanning five categories (faithfulness, robustness, localization, complexity, randomization), applied to five established feature attribution methods (Occlusion, LIME, GradCAM, LRP, and DeepLIFT) across three RS datasets. Our methodological analysis identifies key limitations in both explanation methods and metrics. The performance of perturbation-based methods, such as Occlusion and LIME, heavily depends on perturbation baselines and spatial characteristics of RS scenes. Gradient-based approaches like GradCAM struggle when multiple labels are present in the same image, while some relevance propagation methods (LRP) can distribute relevance disproportionately relative to the spatial extent of classes. Analogously, we find limitations in evaluation metrics. Faithfulness metrics share the same problems as perturbation-based methods. Localization metrics and complexity metrics are unreliable for classes with a large spatial extent. In contrast, robustness metrics and randomization metrics consistently exhibit greater stability. Our experimental results support these methodological findings. Based on our analysis, we provide guidelines for selecting explanation methods, metrics, and hyperparameters in the context of RS image scene classification.
comment: The code of this work will be publicly available at https://git.tu-berlin.de/rsim/xai4rs
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving
Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. This alignment facilitates cross-modal representation learning, enhancing the semantic consistency between 2D and 3D data. We introduce several key innovations: (i) VFM-driven superpixel generation for detailed semantic representation, (ii) a VFM-assisted contrastive learning strategy to align multimodal features, (iii) superpoint temporal consistency to maintain stable representations across time, and (iv) multi-source data pretraining to generalize across various LiDAR configurations. Our approach achieves substantial gains over state-of-the-art methods in linear probing and fine-tuning for LiDAR-based segmentation and object detection. Extensive experiments on 11 large-scale multi-sensor datasets highlight our superior performance, demonstrating adaptability, efficiency, and robustness in real-world autonomous driving scenarios.
comment: IEEE TPAMI 2025; 17 pages, 9 figures, 11 tables; Project Page at https://ldkong.com/LargeAD
♻ ☆ UniVid: The Open-Source Unified Video Model
Unified video modeling that combines generation and understanding capabilities is increasingly important but faces two key challenges: maintaining semantic faithfulness during flow-based generation due to text-visual token imbalance and the limitations of uniform cross-modal attention across the flow trajectory, and efficiently extending image-centric MLLMs to video without costly retraining. We present UniVid, a unified architecture that couples an MLLM with a diffusion decoder through a lightweight adapter, enabling both video understanding and generation. We introduce Temperature Modality Alignment to improve prompt adherence and Pyramid Reflection for efficient temporal reasoning via dynamic keyframe selection. Extensive experiments on standard benchmarks demonstrate state-of-the-art performance, achieving a 2.2% improvement on VBench-Long total score compared to EasyAnimateV5.1, and 1.0% and 3.3% accuracy gains on MSVD-QA and ActivityNet-QA, respectively, compared with the best prior 7B baselines. Code: https://github.com/AIGeeksGroup/UniVid. Website: https://aigeeksgroup.github.io/UniVid.
♻ ☆ OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata NeurIPS 2025
Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. Its paired structure enables fair benchmarking of existing solutions by decoupling image retrieval from feature matching, allowing isolated evaluation of localization and calibration performance. Through comprehensive evaluation, we examine the impact of domain shifts, data resolutions, and covisibility on localization accuracy. Finally, we introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%. The dataset and code are available at: https://deepscenario.github.io/OrthoLoC.
comment: Accepted at NeurIPS 2025
♻ ☆ Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection
Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes the event-related potential (ERP) signal in electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs EEG-image data pairs with eye movement data. Then, an adaptive modality balanced online knowledge distillation (AMBOKD) method is proposed to recognize dim objects with the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To enhance the performance and robustness of the fusion modality, simultaneous training and mutual learning between modalities are enabled by end-to-end online knowledge distillation. During the learning process, an adaptive modality balancing module is proposed to ensure multimodal equilibrium by dynamically adjusting the importance weights and training gradients across modalities. The effectiveness and superiority of our method are demonstrated by comparing it with existing state-of-the-art methods. Additionally, experiments conducted on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and the designed method.
comment: 18 pages, 15 figures
♻ ☆ LayerLock: Non-collapsing Representation Learning with Progressive Freezing ICCV 2025
We introduce LayerLock, a simple yet effective approach for self-supervised visual representation learning that gradually transitions from pixel to latent prediction through progressive layer freezing. First, we observe that during the training of video masked-autoencoding (MAE) models, ViT layers converge in the order of their depth: shallower layers converge early, deeper layers converge late. We then show that this observation can be exploited to accelerate standard MAE by progressively freezing the model according to an explicit schedule throughout training. Furthermore, the same schedule can be used in a simple and scalable approach to latent prediction that does not suffer from "representation collapse". We apply our proposed approach, LayerLock, to large models of up to 4B parameters, with results surpassing those of non-latent masked prediction on the 4DS perception suite.
comment: ICCV 2025
♻ ☆ Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Existing supervised fine-tuning methods, which rely only on positive targets and use the diffusion loss as in the pre-training stage, often fail to capture fine-grained subject details. To address this, SFO introduces additional synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically produces synthetic negatives tailored for subject-driven generation by introducing controlled degradations that emphasize subject fidelity and text alignment without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms recent strong baselines in terms of both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/
♻ ☆ M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation
Accurate medical image segmentation is critical for early medical diagnosis. Most existing methods are based on a U-shape structure and use element-wise addition or concatenation to progressively fuse different-level features in the decoder. However, both operations easily generate plenty of redundant information, which weakens the complementarity between different-level features, resulting in inaccurate localization and blurred edges of lesions. To address this challenge, we propose a general multi-scale in multi-scale subtraction network (M$^{2}$SNet) to perform diverse segmentation tasks on medical images. Specifically, we first design a basic subtraction unit (SU) to produce the difference features between adjacent levels in the encoder. Next, we expand the single-scale SU to the intra-layer multi-scale SU, which can provide the decoder with both pixel-level and structure-level difference information. Then, we pyramidally equip the multi-scale SUs at different levels with varying receptive fields, thereby achieving inter-layer multi-scale feature aggregation and obtaining rich multi-scale difference information. In addition, we build a training-free network "LossNet" to comprehensively supervise the task-aware features from the bottom layer to the top layer, which drives our multi-scale subtraction network to capture detailed and structural cues simultaneously. Without bells and whistles, our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets spanning four medical image segmentation tasks and diverse image modalities, including color colonoscopy imaging, ultrasound imaging, computed tomography (CT), and optical coherence tomography (OCT). The source code is available at https://github.com/Xiaoqi-Zhao-DLUT/MSNet.
comment: Manuscript
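A minimal sketch of what a basic subtraction unit could look like, assuming it convolves the absolute element-wise difference of two adjacent-level features after resampling them to a common resolution; the kernel size and normalization choices are illustrative assumptions.

```python
# Toy subtraction unit: difference features between adjacent encoder levels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_low: torch.Tensor, f_high: torch.Tensor) -> torch.Tensor:
        # Upsample the deeper (lower-resolution) feature to match the shallower one.
        f_high = F.interpolate(f_high, size=f_low.shape[-2:], mode="bilinear",
                               align_corners=False)
        return self.conv(torch.abs(f_low - f_high))  # difference features

su = SubtractionUnit(channels=32)
low = torch.randn(1, 32, 64, 64)   # shallower encoder level
high = torch.randn(1, 32, 32, 32)  # adjacent deeper level
print(su(low, high).shape)         # torch.Size([1, 32, 64, 64])
```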
♻ ☆ Modeling Saliency Dataset Bias ICCV 2025
Recent advances in image-based saliency prediction are approaching gold standard performance levels on existing benchmarks. Despite this success, we show that predicting fixations across multiple saliency datasets remains challenging due to dataset bias. We find a significant performance drop (around 40%) when models trained on one dataset are applied to another. Surprisingly, increasing dataset diversity does not resolve this inter-dataset gap, with close to 60% attributed to dataset-specific biases. To address this remaining generalization gap, we propose a novel architecture extending a mostly dataset-agnostic encoder-decoder structure with fewer than 20 dataset-specific parameters that govern interpretable mechanisms such as multi-scale structure, center bias, and fixation spread. Adapting only these parameters to new data accounts for more than 75% of the generalization gap, with a large fraction of the improvement achieved with as few as 50 samples. Our model sets a new state-of-the-art on all three datasets of the MIT/Tuebingen Saliency Benchmark (MIT300, CAT2000, and COCO-Freeview), even when purely generalizing from unrelated datasets, but with a substantial boost when adapting to the respective training datasets. The model also provides valuable insights into spatial saliency properties, revealing complex multi-scale effects that combine both absolute and relative sizes.
comment: published at ICCV 2025
♻ ☆ J-NeuS: Joint field optimization for Neural Surface reconstruction in urban scenes with limited image overlap
Reconstructing the surrounding surface geometry from recorded driving sequences poses a significant challenge due to the limited image overlap and complex topology of urban environments. State-of-the-art neural implicit surface reconstruction methods often struggle in such settings, either failing due to small visual overlap or exhibiting suboptimal performance in accurately reconstructing both surfaces and fine structures. To address these limitations, we introduce J-NeuS, a novel hybrid implicit surface reconstruction method for large driving sequences with outward-facing camera poses. J-NeuS uses cross-representation uncertainty estimation to tackle ambiguous geometry caused by limited observations. Our method performs joint optimization of two radiance fields, in addition to guided sampling, achieving accurate reconstruction of large areas along with fine structures in complex urban scenarios. Extensive evaluation on major driving datasets demonstrates the superiority of our approach in reconstructing large driving sequences with limited image overlap, outperforming concurrent state-of-the-art methods.
♻ ☆ Adjustable Spatio-Spectral Hyperspectral Image Compression Network
With the rapid growth of hyperspectral data archives in remote sensing (RS), the need for efficient storage has become essential, driving significant attention toward learning-based hyperspectral image (HSI) compression. However, a comprehensive investigation of the individual and joint effects of spectral and spatial compression on learning-based HSI compression has not been thoroughly examined yet. Conducting such an analysis is crucial for understanding how the exploitation of spectral, spatial, and joint spatio-spectral redundancies affects HSI compression. To address this issue, we propose Adjustable Spatio-Spectral Hyperspectral Image Compression Network (HyCASS), a learning-based model designed for adjustable HSI compression in both spectral and spatial dimensions. HyCASS consists of six main modules: 1) spectral encoder module; 2) spatial encoder module; 3) compression ratio (CR) adapter encoder module; 4) CR adapter decoder module; 5) spatial decoder module; and 6) spectral decoder module. The modules employ convolutional layers and transformer blocks to capture both short-range and long-range redundancies. Experimental results on three HSI benchmark datasets demonstrate the effectiveness of our proposed adjustable model compared to existing learning-based compression models, surpassing the state of the art by up to 2.36 dB in terms of PSNR. Based on our results, we establish a guideline for effectively balancing spectral and spatial compression across different CRs, taking into account the spatial resolution of the HSIs. Our code and pre-trained model weights are publicly available at https://git.tu-berlin.de/rsim/hycass.
♻ ☆ Towards autonomous photogrammetric forest inventory using a lightweight under-canopy robotic drone
Drones are increasingly used in forestry to capture high-resolution remote sensing data, supporting enhanced monitoring, assessment, and decision-making processes. While operations above the forest canopy are already highly automated, flying inside forests remains challenging, primarily relying on manual piloting. In dense forests, relying on the Global Navigation Satellite System (GNSS) for localization is not feasible. In addition, the drone must autonomously adjust its flight path to avoid collisions. Recently, advancements in robotics have enabled autonomous drone flights in GNSS-denied obstacle-rich areas. In this article, a step towards autonomous forest data collection is taken by building a prototype of a robotic under-canopy drone utilizing state-of-the-art open source methods and validating its performance for data collection inside forests. Specifically, the study focused on camera-based autonomous flight under the forest canopy and photogrammetric post-processing of the data collected with the low-cost onboard stereo camera. The autonomous flight capability of the prototype was evaluated through multiple test flights in boreal forests. The tree parameter estimation capability was studied by performing diameter at breast height (DBH) estimation. The prototype successfully carried out flights in selected challenging forest environments, and the experiments showed promising performance in forest 3D modeling with a miniaturized stereoscopic photogrammetric system. The DBH estimation achieved a root mean square error (RMSE) of 3.33 - 3.97 cm (10.69 - 12.98 %) across all trees. For trees with a DBH less than 30 cm, the RMSE was 1.16 - 2.56 cm (5.74 - 12.47 %). The results provide valuable insights into autonomous under-canopy forest mapping and highlight the critical next steps for advancing lightweight robotic drone systems for mapping complex forest environments.
comment: 35 pages, 11 figures
♻ ☆ How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
Despite significant advancements in Large Vision Language Models (LVLMs), a gap remains, particularly regarding their interpretability and how they locate and interpret textual information within images. In this paper, we explore various LVLMs to identify the specific heads responsible for recognizing text from images, which we term the Optical Character Recognition Head (OCR Head). Our findings regarding these heads are as follows: (1) Less Sparse: Unlike previous retrieval heads, a large number of heads are activated to extract textual information from images. (2) Qualitatively Distinct: OCR heads possess properties that differ significantly from general retrieval heads, exhibiting low similarity in their characteristics. (3) Statically Activated: The frequency of activation for these heads closely aligns with their OCR scores. We validate our findings in downstream tasks by applying Chain-of-Thought (CoT) to both OCR and conventional retrieval heads and by masking these heads. We also demonstrate that redistributing sink-token values within the OCR heads improves performance. These insights provide a deeper understanding of the internal mechanisms LVLMs employ in processing embedded textual information in images.
comment: EMNLP 2025 Oral
♻ ☆ Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification
Previous work has suggested that preprocessing images through lossy compression can defend against adversarial perturbations, but comprehensive attack evaluations have been lacking. In this paper, we construct strong white-box and adaptive attacks against various compression models and identify a critical challenge for attackers: high realism in reconstructed images significantly increases attack difficulty. Through rigorous evaluation across multiple attack scenarios, we demonstrate that compression models capable of producing realistic, high-fidelity reconstructions are substantially more resistant to our attacks. In contrast, low-realism compression models can be broken. Our analysis reveals that this is not due to gradient masking. Rather, realistic reconstructions maintaining distributional alignment with natural images seem to offer inherent robustness. This work highlights a significant obstacle for future adversarial attacks and suggests that developing more effective techniques to overcome realism represents an essential challenge for comprehensive security evaluation.
Machine Learning 150
☆ Stitch: Training-Free Position Control in Multimodal Diffusion Transformers
Text-to-Image (T2I) generation models have advanced rapidly in recent years, but accurately capturing spatial relationships like "above" or "to the right of" poses a persistent challenge. Earlier methods improved spatial relationship following with external position control. However, as architectures evolved to enhance image quality, these techniques became incompatible with modern models. We propose Stitch, a training-free method for incorporating external position control into Multi-Modal Diffusion Transformers (MMDiT) via automatically-generated bounding boxes. Stitch produces images that are both spatially accurate and visually appealing by generating individual objects within designated bounding boxes and seamlessly stitching them together. We find that targeted attention heads capture the information necessary to isolate and cut out individual objects mid-generation, without needing to fully complete the image. We evaluate Stitch on PosEval, our benchmark for position-based T2I generation. Featuring five new tasks that extend the concept of Position beyond the basic GenEval task, PosEval demonstrates that even top models still have significant room for improvement in position-based generation. Tested on Qwen-Image, FLUX, and SD3.5, Stitch consistently enhances base models, even improving FLUX by 218% on GenEval's Position task and by 206% on PosEval. Stitch achieves state-of-the-art results with Qwen-Image on PosEval, improving over previous models by 54%, all accomplished while integrating position control into leading models training-free. Code is available at https://github.com/ExplainableML/Stitch.
comment: Preprint
☆ Convergence and Divergence of Language Models under Different Random Seeds
In this paper, we investigate the convergence of language models (LMs) trained under different random seeds, measuring convergence as the expected per-token Kullback-Leibler (KL) divergence across seeds. By comparing LM convergence as a function of model size and training checkpoint, we identify a four-phase convergence pattern: (i) an initial uniform phase, (ii) a sharp-convergence phase, (iii) a sharp-divergence phase, and (iv) a slow-reconvergence phase. Further, we observe that larger models reconverge faster in later training stages, while smaller models never actually reconverge; these results suggest that a certain model size may be necessary to learn stable distributions. Restricting our analysis to specific token frequencies or part-of-speech (PoS) tags further reveals that convergence is uneven across linguistic categories: frequent tokens and function words converge faster and more reliably than their counterparts (infrequent tokens and content words). Overall, our findings highlight factors that influence the stability of the learned distributions in model training.
comment: Published at EMNLP 2025
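The convergence measure itself is straightforward to compute. Below is a minimal sketch, with random logits standing in for two seed-specific models evaluated at the same token positions:

```python
# Expected per-token KL divergence between two models trained with
# different seeds, averaged over an evaluation corpus.
import torch
import torch.nn.functional as F

def per_token_kl(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """logits_*: (num_tokens, vocab_size). Returns mean KL(P_a || P_b) per token."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)  # KL at each token position
    return kl.mean()

torch.manual_seed(0)
seed_a = torch.randn(128, 50_000)  # model A's next-token logits over a corpus
seed_b = torch.randn(128, 50_000)  # model B's logits at the same positions
print(per_token_kl(seed_a, seed_b).item())
```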
☆ SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards
Due to the susceptibility of Artificial Intelligence (AI) to data perturbations and adversarial examples, it is crucial to perform a thorough robustness evaluation before any Machine Learning (ML) model is deployed. However, examining a model's decision boundaries and identifying potential vulnerabilities typically requires access to the training and testing datasets, which may pose risks to data privacy and confidentiality. To improve transparency in organizations that handle confidential data or manage critical infrastructure, it is essential to allow external verification and validation of AI without the disclosure of private datasets. This paper presents Systematic Pattern Analysis (SPATA), a deterministic method that converts any tabular dataset to a domain-independent representation of its statistical patterns, to provide more detailed and transparent data cards. SPATA computes the projection of each data instance into a discrete space where instances can be analyzed and compared without risking data leakage. These projected datasets can be reliably used to evaluate how different features affect ML model robustness and to generate interpretable explanations of model behavior, contributing to more trustworthy AI.
comment: 16 pages, 3 tables, 6 figures, SynDAiTE, ECML PKDD 2025
☆ AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond
Rapid advances in multimodal models demand benchmarks that rigorously evaluate understanding and reasoning in safety-critical, dynamic real-world settings. We present AccidentBench, a large-scale benchmark that combines vehicle accident scenarios with "Beyond" domains: safety-critical settings in air and water that emphasize spatial and temporal reasoning (e.g., navigation, orientation, multi-vehicle motion). The benchmark contains approximately 2000 videos and over 19000 human-annotated question-answer pairs spanning multiple video lengths (short/medium/long) and difficulty levels (easy/medium/hard). Tasks systematically probe core capabilities: temporal, spatial, and intent understanding and reasoning. By unifying accident-centric traffic scenes with broader safety-critical scenarios in air and water, AccidentBench offers a comprehensive, physically grounded testbed for evaluating models under real-world variability. Evaluations of state-of-the-art models (e.g., Gemini-2.5 Pro and GPT-5) show that even the strongest models achieve only about 18% accuracy on the hardest tasks and longest videos, revealing substantial gaps in real-world temporal, spatial, and intent reasoning. AccidentBench is designed to expose these critical gaps and drive the development of multimodal models that are safer, more robust, and better aligned with real-world safety-critical challenges. The code and dataset are available at: https://github.com/SafeRL-Lab/AccidentBench
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Reinforcement Learning (RL) has shown remarkable success in enhancing the reasoning capabilities of Large Language Models (LLMs). Process-Supervised RL (PSRL) has emerged as a more effective paradigm compared to outcome-based RL. However, existing PSRL approaches suffer from limited exploration efficiency, both in terms of branching positions and sampling. In this paper, we introduce a novel PSRL framework (AttnRL), which enables efficient exploration for reasoning models. Motivated by preliminary observations that steps exhibiting high attention scores correlate with reasoning behaviors, we propose to branch from positions with high attention values. Furthermore, we develop an adaptive sampling strategy that accounts for problem difficulty and historical batch size, ensuring that the whole training batch maintains non-zero advantage values. To further improve sampling efficiency, we design a one-step off-policy training pipeline for PSRL. Extensive experiments on multiple challenging mathematical reasoning benchmarks demonstrate that our method consistently outperforms prior approaches in terms of performance and sampling and training efficiency.
☆ TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 interactions per task with the environment. This approach outperformed previous methods and even the manually designed environment dense reward on both the final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable approach path to rich reward signals from diverse video sources.
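A hypothetical sketch of the frame-pair idea: train a regressor to predict the normalized temporal offset between two frames of the same video, then use its output between consecutive observations as a dense proxy reward. The feature extractor, architecture, and loss below are illustrative assumptions, not the paper's exact design.

```python
# Toy frame-pair temporal-distance model and its use as a proxy reward.
import torch
import torch.nn as nn

class TemporalDistanceModel(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat_i, feat_j):
        # Predict the normalized temporal distance (j - i) / episode_length.
        return self.head(torch.cat([feat_i, feat_j], dim=-1)).squeeze(-1)

model = TemporalDistanceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One toy training step on frame pairs sampled from a single trajectory.
T, feats = 100, torch.randn(100, 128)  # stand-ins for per-frame features
i, j = torch.randint(0, T, (32,)), torch.randint(0, T, (32,))
target = (j - i).float() / T
loss = nn.functional.mse_loss(model(feats[i], feats[j]), target)
opt.zero_grad(); loss.backward(); opt.step()

# Proxy reward at RL time: predicted progress from state s_t to s_{t+1}.
reward = model(feats[10:11], feats[11:12]).item()
```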
☆ Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
comment: 24 pages, 9 figures
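The population loop is compact enough to sketch. In the toy version below, `generate` and `aggregate` are placeholders for real LLM calls (sample a reasoning chain; combine a subset of chains into an improved one), and the population size, subset size, and number of steps are illustrative assumptions:

```python
# Toy Recursive Self-Aggregation: refine a population of candidate chains
# by repeatedly aggregating random subsets into improved solutions.
import random

def rsa(prompt, generate, aggregate, pop_size=8, subset_size=3, steps=4):
    population = [generate(prompt) for _ in range(pop_size)]
    for _ in range(steps):
        new_population = []
        for _ in range(pop_size):
            subset = random.sample(population, subset_size)
            # The model sees several candidate chains (including partially
            # correct ones) and synthesizes an improved solution.
            new_population.append(aggregate(prompt, subset))
        population = new_population
    return population

# Toy stand-ins so the sketch runs end to end.
answers = rsa(
    "2+2?",
    generate=lambda p: f"draft:{random.randint(3, 5)}",
    aggregate=lambda p, subset: max(subset),
)
print(answers)
```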
☆ Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Large Language Models (LLMs), despite being trained on text alone, surprisingly develop rich visual priors. These priors allow latent visual capabilities to be unlocked for vision tasks with a relatively small amount of multimodal data, and in some cases, to perform visual tasks without ever having seen an image. Through systematic analysis, we reveal that visual priors, the implicit, emergent knowledge about the visual world acquired during language pre-training, are composed of separable perception and reasoning priors with unique scaling trends and origins. We show that an LLM's latent visual reasoning ability is predominantly developed by pre-training on reasoning-centric data (e.g., code, math, academia) and scales progressively. This reasoning prior acquired from language pre-training is transferable and universally applicable to visual reasoning. In contrast, a perception prior emerges more diffusely from broad corpora, and perception ability is more sensitive to the vision encoder and visual instruction tuning data. In parallel, text describing the visual world proves crucial, though its performance impact saturates rapidly. Leveraging these insights, we propose a data-centric recipe for pre-training vision-aware LLMs and verify it in 1T token scale pre-training. Our findings are grounded in over 100 controlled experiments consuming 500,000 GPU-hours, spanning the full MLLM construction pipeline, from LLM pre-training to visual alignment and supervised multimodal fine-tuning, across five model scales, a wide range of data categories and mixtures, and multiple adaptation setups. Along with our main findings, we propose and investigate several hypotheses, and introduce the Multi-Level Existence Bench (MLE-Bench). Together, this work provides a new way of deliberately cultivating visual priors from language pre-training, paving the way for the next generation of multimodal LLMs.
comment: Project page: https://junlinhan.github.io/projects/lsbs/
☆ Uncertainty Quantification for Regression using Proper Scoring Rules
Quantifying uncertainty of machine learning model predictions is essential for reliable decision-making, especially in safety-critical applications. Recently, uncertainty quantification (UQ) theory has advanced significantly, building on a firm basis of learning with proper scoring rules. However, these advances were focused on classification, while extending these ideas to regression remains challenging. In this work, we introduce a unified UQ framework for regression based on proper scoring rules, such as CRPS, logarithmic, squared error, and quadratic scores. We derive closed-form expressions for the resulting uncertainty measures under practical parametric assumptions and show how to estimate them using ensembles of models. In particular, the derived uncertainty measures naturally decompose into aleatoric and epistemic components. The framework recovers popular regression UQ measures based on predictive variance and differential entropy. Our broad evaluation on synthetic and real-world regression datasets provides guidance for selecting reliable UQ measures.
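For the variance-based measure, the aleatoric/epistemic split has a familiar ensemble form (a minimal sketch assuming each member outputs a Gaussian mean and variance; the paper's other scoring-rule-based measures have their own closed forms):

```python
# Ensemble-based uncertainty decomposition for regression: by the law of
# total variance, total = mean of member variances (aleatoric) + variance
# of member means (epistemic).
import numpy as np

def variance_decomposition(mus: np.ndarray, sigma2s: np.ndarray):
    """mus, sigma2s: (n_members, n_points) predicted means and variances."""
    aleatoric = sigma2s.mean(axis=0)  # expected data noise
    epistemic = mus.var(axis=0)       # disagreement between members
    return aleatoric + epistemic, aleatoric, epistemic

rng = np.random.default_rng(0)
mus = rng.normal(size=(5, 10))                 # 5 ensemble members, 10 inputs
sigma2s = rng.uniform(0.1, 0.5, size=(5, 10))  # per-member predicted variances
total, alea, epi = variance_decomposition(mus, sigma2s)
print(total[:3], alea[:3], epi[:3])
```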
☆ Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
Deploying reinforcement learning (RL) in robotics, industry, and health care is blocked by two obstacles: the difficulty of specifying accurate rewards and the risk of unsafe, data-hungry exploration. We address this by proposing a two-stage framework that first learns a safe initial policy from a reward-free dataset of expert demonstrations, then fine-tunes it online using preference-based human feedback. We provide the first principled analysis of this offline-to-online approach and introduce BRIDGE, a unified algorithm that integrates both signals via an uncertainty-weighted objective. We derive regret bounds that shrink with the number of offline demonstrations, explicitly connecting the quantity of offline data to online sample efficiency. We validate BRIDGE in discrete and continuous control MuJoCo environments, showing it achieves lower regret than both standalone behavioral cloning and online preference-based RL. Our work establishes a theoretical foundation for designing more sample-efficient interactive agents.
comment: 85 pages (11 + references and appendix), 9 figures
☆ DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
While previous AI Scientist systems can generate novel findings, they often lack the focus to produce scientifically valuable contributions that address pressing human-defined challenges. We introduce DeepScientist, a system designed to overcome this by conducting goal-oriented, fully autonomous scientific discovery over month-long timelines. It formalizes discovery as a Bayesian Optimization problem, operationalized through a hierarchical evaluation process consisting of "hypothesize, verify, and analyze". Leveraging a cumulative Findings Memory, this loop intelligently balances the exploration of novel hypotheses with exploitation, selectively promoting the most promising findings to higher-fidelity levels of validation. Consuming over 20,000 GPU hours, the system generated about 5,000 unique scientific ideas and experimentally validated approximately 1100 of them, ultimately surpassing human-designed state-of-the-art (SOTA) methods on three frontier AI tasks by 183.7%, 1.9%, and 7.9%. This work provides the first large-scale evidence of an AI achieving discoveries that progressively surpass human SOTA on scientific tasks, producing valuable findings that genuinely push the frontier of scientific discovery. To facilitate further research into this process, we will open-source all experimental logs and system code at https://github.com/ResearAI/DeepScientist/.
☆ MENLO: From Preferences to Proficiency -- Evaluating and Modeling Native-like Quality Across 47 Languages
Ensuring native-like quality of large language model (LLM) responses across many languages is challenging. To address this, we introduce MENLO, a framework that operationalizes the evaluation of native-like response quality based on audience design-inspired mechanisms. Using MENLO, we create a dataset of 6,423 human-annotated prompt-response preference pairs covering four quality dimensions with high inter-annotator agreement in 47 language varieties. Our evaluation reveals that zero-shot LLM judges benefit significantly from pairwise evaluation and our structured annotation rubrics, yet they still underperform human annotators on our dataset. We demonstrate substantial improvements through fine-tuning with reinforcement learning, reward shaping, and multi-task learning approaches. Additionally, we show that RL-trained judges can serve as generative reward models to enhance LLMs' multilingual proficiency, though discrepancies with human judgment remain. Our findings suggest promising directions for scalable multilingual evaluation and preference alignment. We release our dataset and evaluation framework to support further research in multilingual LLM evaluation.
comment: 10 pages, 23 tables, 17 figures
☆ Are Robust LLM Fingerprints Adversarially Robust?
Model fingerprinting has emerged as a promising paradigm for claiming model ownership. However, robustness evaluations of these schemes have mostly focused on benign perturbations such as incremental fine-tuning, model merging, and prompting. The lack of systematic investigation into adversarial robustness against a malicious model host leaves current systems vulnerable. To bridge this gap, we first define a concrete, practical threat model against model fingerprinting. We then take a critical look at existing model fingerprinting schemes to identify their fundamental vulnerabilities. Based on these, we develop adaptive adversarial attacks tailored to each vulnerability, and demonstrate that they can bypass model authentication completely for ten recently proposed fingerprinting schemes while maintaining high utility of the model for end users. Our work encourages fingerprint designers to adopt adversarial robustness by design. We end with recommendations for future fingerprinting methods.
☆ Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
Recent text-only models demonstrate remarkable mathematical reasoning capabilities. Extending these to visual domains requires vision-language models to translate images into text descriptions. However, current models, trained to produce captions for human readers, often omit the precise details that reasoning systems require. This creates an interface mismatch: reasoners often fail not due to reasoning limitations but because they lack access to critical visual information. We propose Adaptive-Clarification Reinforcement Learning (AC-RL), which teaches vision models what information reasoners need through interaction. Our key insight is that clarification requests during training reveal information gaps; by penalizing success that requires clarification, we create pressure for comprehensive initial captions that enable the reasoner to solve the problem in a single pass. AC-RL improves average accuracy by 4.4 points over pretrained baselines across seven visual mathematical reasoning benchmarks, and analysis shows it would cut clarification requests by up to 39% if those were allowed. By treating clarification as a form of implicit supervision, AC-RL demonstrates that vision-language interfaces can be effectively learned through interaction alone, without requiring explicit annotations.
☆ Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models
Large Language Models (LLMs) are widely used across multiple domains but continue to raise concerns regarding security and fairness. Beyond known attack vectors such as data poisoning and prompt injection, LLMs are also vulnerable to fairness bugs. These refer to unintended behaviors influenced by sensitive demographic cues (e.g., race or sexual orientation) that should not affect outcomes. Another key issue is hallucination, where models generate plausible yet false information. Retrieval-Augmented Generation (RAG) has emerged as a strategy to mitigate hallucinations by combining external retrieval with text generation. However, its adoption raises new fairness concerns, as the retrieved content itself may surface or amplify bias. This study conducts fairness testing through metamorphic testing (MT), introducing controlled demographic perturbations in prompts to assess fairness in sentiment analysis performed by three Small Language Models (SLMs) hosted on HuggingFace (Llama-3.2-3B-Instruct, Mistral-7B-Instruct-v0.3, and Llama-3.1-Nemotron-8B), each integrated into a RAG pipeline. Results show that minor demographic variations can break up to one third of metamorphic relations (MRs). A detailed analysis of these failures reveals a consistent bias hierarchy, with perturbations involving racial cues being the predominant cause of the violations. In addition to offering a comparative evaluation, this work reinforces that the retrieval component in RAG must be carefully curated to prevent bias amplification. The findings serve as a practical alert for developers, testers and small organizations aiming to adopt accessible SLMs without compromising fairness or reliability.
☆ Source Separation for A Cappella Music
In this work, we study the task of multi-singer separation in a cappella music, where the number of active singers varies across mixtures. To address this, we use a power set-based data augmentation strategy that expands limited multi-singer datasets into exponentially more training samples. To separate singers, we introduce SepACap, an adaptation of SepReformer, a state-of-the-art speaker separation model architecture. We adapt the model with periodic activations and a composite loss function that remains effective when stems are silent, enabling robust detection and separation. Experiments on the JaCappella dataset demonstrate that our approach achieves state-of-the-art performance in both full-ensemble and subset singer separation scenarios, outperforming spectrogram-based baselines while generalizing to realistic mixtures with varying numbers of singers.
☆ Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
Process Reward Models (PRMs) have emerged as a promising approach to enhance the reasoning capabilities of large language models (LLMs) by guiding their step-by-step reasoning toward a final answer. However, existing PRMs either treat each reasoning step in isolation, failing to capture inter-step dependencies, or struggle to align process rewards with the final outcome. Consequently, the reward signal fails to respect temporal causality in sequential reasoning and faces ambiguous credit assignment. These limitations make downstream models vulnerable to reward hacking and lead to suboptimal performance. In this work, we propose Conditional Reward Modeling (CRM) that frames LLM reasoning as a temporal process leading to a correct answer. The reward of each reasoning step is not only conditioned on the preceding steps but also explicitly linked to the final outcome of the reasoning trajectory. By enforcing conditional probability rules, our design captures the causal relationships among reasoning steps, with the link to the outcome allowing precise attribution of each intermediate step, thereby resolving credit assignment ambiguity. Further, through this consistent probabilistic modeling, the rewards produced by CRM enable more reliable cross-sample comparison. Experiments across Best-of-N sampling, beam search and reinforcement learning demonstrate that CRM consistently outperforms existing reward models, offering a principled framework for enhancing LLM reasoning. In particular, CRM is more robust to reward hacking and delivers stable downstream improvements without relying on verifiable rewards derived from ground truth.
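One plausible way to write down the conditioning described above, in our own notation rather than the paper's exact construction:

```latex
% Reward step t by the probability that the trajectory ends in a correct
% answer (O = 1), conditioned on all preceding steps:
r_t \;=\; \Pr\bigl(O = 1 \,\big|\, s_1, \dots, s_t\bigr),
\qquad
\Delta_t \;=\; r_t - r_{t-1}.
```

Conditioning on the full prefix respects temporal causality, and the increment $\Delta_t$ credits step $t$ with exactly how much it changed the odds of a correct final answer, which is one reading of how the design resolves credit-assignment ambiguity.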
☆ Importance of localized dilatation and distensibility in identifying determinants of thoracic aortic aneurysm with neural operators
Thoracic aortic aneurysms (TAAs) arise from diverse mechanical and mechanobiological disruptions to the aortic wall that increase the risk of dissection or rupture. Evidence links TAA development to dysfunctions in the aortic mechanotransduction axis, including loss of elastic fiber integrity and cell-matrix connections. Because distinct insults create different mechanical vulnerabilities, there is a critical need to identify interacting factors that drive progression. Here, we use a finite element framework to generate synthetic TAAs from hundreds of heterogeneous insults spanning varying degrees of elastic fiber damage and impaired mechanosensing. From these simulations, we construct spatial maps of localized dilatation and distensibility to train neural networks that predict the initiating combined insult. We compare several architectures (Deep Operator Networks, UNets, and Laplace Neural Operators) and multiple input data formats to define a standard for future subject-specific modeling. We also quantify predictive performance when networks are trained using only geometric data (dilatation) versus both geometric and mechanical data (dilatation plus distensibility). Across all networks, prediction errors are significantly higher when trained on dilatation alone, underscoring the added value of distensibility information. Among the tested models, UNet consistently provides the highest accuracy across all data formats. These findings highlight the importance of acquiring full-field measurements of both dilatation and distensibility in TAA assessment to reveal the mechanobiological drivers of disease and support the development of personalized treatment strategies.
☆ AI-assisted Advanced Propellant Development for Electric Propulsion
Artificial Intelligence algorithms are introduced in this work as a tool to predict the performance of new chemical compounds as alternative propellants for electric propulsion, focusing on predicting their ionisation characteristics and fragmentation patterns. The chemical properties and structure of the compounds are encoded using a chemical fingerprint, and the training datasets are extracted from the NIST WebBook. The AI-predicted ionisation energy and minimum appearance energy have mean relative errors of 6.87% and 7.99%, respectively, while the predicted ion mass has a 23.89% relative error. For full mass spectra produced by electron ionisation, the predictions have a cosine similarity of 0.6395 and align with the top 10 most similar mass spectra in 78% of instances within a 30 Da range.
comment: 23 pages, 10 figures, 5 tables. Journal of Electric Propulsion
☆ Parametric Neural Amp Modeling with Active Learning
We introduce Panama, an active learning framework to train parametric guitar amp models end-to-end using a combination of an LSTM model and a WaveNet-like architecture. With Panama, one can create a virtual amp by recording samples that are determined through an ensemble-based active learning strategy, minimizing the number of datapoints needed (i.e., amp knob settings). Our strategy uses gradient-based optimization to maximize the disagreement among ensemble models, in order to identify the most informative datapoints. MUSHRA listening tests reveal that, with 75 datapoints, our models are able to match the perceptual quality of NAM, the leading open-source non-parametric amp modeler.
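A sketch of the ensemble-disagreement criterion in PyTorch; the tiny MLP ensemble stands in for the paper's LSTM/WaveNet amp models, and all shapes and hyperparameters are assumptions:

```python
# Gradient ascent on ensemble disagreement: find the knob setting where the
# amp models disagree most, i.e., the most informative sample to record next.
import torch

ensemble = [torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.Tanh(),
                                torch.nn.Linear(32, 1)) for _ in range(4)]
knobs = torch.rand(1, 5, requires_grad=True)        # 5 amp knobs in [0, 1]
opt = torch.optim.Adam([knobs], lr=0.05)

for _ in range(200):
    preds = torch.stack([m(knobs.clamp(0, 1)) for m in ensemble])
    disagreement = preds.var(dim=0).mean()
    opt.zero_grad()
    (-disagreement).backward()                      # ascend the disagreement
    opt.step()

print("next knob setting to record:", knobs.clamp(0, 1).detach())
```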
☆ DeepProv: Behavioral Characterization and Repair of Neural Networks via Inference Provenance Graph Analysis
Deep neural networks (DNNs) are increasingly being deployed in high-stakes applications, from self-driving cars to biometric authentication. However, their unpredictable and unreliable behaviors in real-world settings require new approaches to characterize and ensure their reliability. This paper introduces DeepProv, a novel and customizable system designed to capture and characterize the runtime behavior of DNNs during inference by using their underlying graph structure. Inspired by system audit provenance graphs, DeepProv models the computational information flow of a DNN's inference process through Inference Provenance Graphs (IPGs). These graphs provide a detailed structural representation of a DNN's behavior, allowing both empirical and structural analysis. DeepProv uses these insights to systematically repair DNNs for specific objectives, such as improving robustness, privacy, or fairness. We instantiate DeepProv with adversarial robustness as the goal of model repair and conduct extensive case studies to evaluate its effectiveness. Our results demonstrate its effectiveness and scalability across diverse classification tasks, attack scenarios, and model complexities. DeepProv automatically identifies repair actions at the node and edge level within IPGs, significantly enhancing the robustness of the model. In particular, applying DeepProv repair strategies to just a single layer of a DNN yields an average 55% improvement in adversarial accuracy. Moreover, DeepProv complements existing defenses, achieving substantial gains in adversarial robustness. Beyond robustness, we demonstrate the broader potential of DeepProv as an adaptable system to characterize DNN behavior in other critical areas, such as privacy auditing and fairness analysis.
comment: 18 pages, 9 figures, 6 tables, To appear in the 41st Annual Computer Security Applications Conference (ACSAC), 2025
☆ Estimating Dimensionality of Neural Representations from Finite Samples
The global dimensionality of a neural representation manifold provides rich insight into the computational process underlying both artificial and biological neural networks. However, all existing measures of global dimensionality are sensitive to the number of samples, i.e., the number of rows and columns of the sample matrix. We show that, in particular, the participation ratio of eigenvalues, a popular measure of global dimensionality, is highly biased with small sample sizes, and propose a bias-corrected estimator that is more accurate with finite samples and with noise. On synthetic data examples, we demonstrate that our estimator can recover the true known dimensionality. We apply our estimator to neural brain recordings, including calcium imaging, electrophysiological recordings, and fMRI data, and to the neural activations in a large language model and show our estimator is invariant to the sample size. Finally, our estimators can additionally be used to measure the local dimensionalities of curved neural manifolds by weighting the finite samples appropriately.
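For reference, the participation ratio in question is $\mathrm{PR} = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over covariance eigenvalues; the naive plug-in estimator below is the quantity whose small-sample bias motivates the paper (the proposed correction itself is not reproduced here):

```python
# Plug-in participation ratio from a (n_samples, n_features) data matrix.
import numpy as np

def participation_ratio(X):
    lam = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
# Isotropic 50-dimensional data: true dimensionality is 50.
print(participation_ratio(rng.standard_normal((10000, 50))))  # close to 50
print(participation_ratio(rng.standard_normal((60, 50))))     # visibly biased low
```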
☆ Pretrain-Test Task Alignment Governs Generalization in In-Context Learning
In-context learning (ICL) is a central capability of Transformer models, but the structures in data that enable its emergence and govern its robustness remain poorly understood. In this work, we study how the structure of pretraining tasks governs generalization in ICL. Using a solvable model for ICL of linear regression by linear attention, we derive an exact expression for ICL generalization error in high dimensions under arbitrary pretraining-testing task covariance mismatch. This leads to a new alignment measure that quantifies how much information about the pretraining task distribution is useful for inference at test time. We show that this measure directly predicts ICL performance not only in the solvable model but also in nonlinear Transformers. Our analysis further reveals a tradeoff between specialization and generalization in ICL: depending on task distribution alignment, increasing pretraining task diversity can either improve or harm test performance. Together, these results identify train-test task alignment as a key determinant of generalization in ICL.
☆ Towards Verified Code Reasoning by LLMs
While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. This prevents the agent from being useful in situations where high precision is desired: (1) helping a software engineer understand a new code base, (2) helping a software engineer during code review sessions, and (3) ensuring that the code generated by an automated code generation system meets certain requirements (e.g. fixes a bug, improves readability, implements a feature). As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted. Manually confirming responses from a code reasoning agent requires human effort and can result in slower developer productivity, which weakens the assistance benefits of the agent. In this paper, we describe a method to automatically validate the answers provided by a code reasoning agent by verifying its reasoning steps. At a very high level, the method consists of extracting a formal representation of the agent's response and, subsequently, using formal verification and program analysis tools to verify the agent's reasoning steps. We applied this approach to a benchmark set of 20 uninitialized variable errors detected by sanitizers and 20 program equivalence queries. For the uninitialized variable errors, the formal verification step was able to validate the agent's reasoning on 13/20 examples, and for the program equivalence queries, the formal verification step successfully caught 6/8 incorrect judgments made by the agent.
comment: 43 pages
☆ Bayesian Influence Functions for Hessian-Free Data Attribution
Classical influence functions face significant challenges when applied to deep neural networks, primarily due to non-invertible Hessians and high-dimensional parameter spaces. We propose the local Bayesian influence function (BIF), an extension of classical influence functions that replaces Hessian inversion with loss landscape statistics that can be estimated via stochastic-gradient MCMC sampling. This Hessian-free approach captures higher-order interactions among parameters and scales efficiently to neural networks with billions of parameters. We demonstrate state-of-the-art results on predicting retraining experiments.
comment: 32 pages, 19 figures
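A minimal sketch of the BIF's core computation as described in the abstract above: given per-sample losses evaluated at parameter draws from an SGMCMC sampler, the influence between points is the covariance of their losses across draws. The loss matrix here is synthetic; producing real draws (e.g., with SGLD) is outside this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_points = 256, 100
# losses[k, i] = loss of training point i at parameter draw theta_k
losses = rng.standard_normal((n_draws, n_points))

bif = np.cov(losses, rowvar=False)   # (n_points, n_points); no Hessian inversion
print(bif[0, 7])                     # influence-style score between points 0 and 7
```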
TASP: Topology-aware Sequence Parallelism
Long-context large language models (LLMs) face constraints due to the quadratic complexity of the self-attention mechanism. The mainstream sequence parallelism (SP) method, Ring Attention, attempts to solve this by distributing the query into multiple query chunks across accelerators, enabling each Q tensor to access all KV tensors from other accelerators via the Ring AllGather communication primitive. However, it exhibits low communication efficiency, restricting its practical applicability. This inefficiency stems from the mismatch between the Ring AllGather communication primitive it adopts and the AlltoAll topology of modern accelerators. A Ring AllGather primitive is composed of iterations of ring-styled data transfer, which can only utilize a very limited fraction of an AlltoAll topology. Inspired by the Hamiltonian decomposition of complete directed graphs, we identify that modern accelerator topology can be decomposed into multiple orthogonal ring datapaths which can concurrently transfer data without interference. Based on this, we further observe that the Ring AllGather primitive can also be decomposed into the same number of concurrent ring-styled data transfers at every iteration. Based on these insights, we propose TASP, a topology-aware SP method for long-context LLMs that fully utilizes the communication capacity of modern accelerators via topology decomposition and primitive decomposition. Experimental results on both single-node and multi-node NVIDIA H100 systems and a single-node AMD MI300X system demonstrate that TASP achieves higher communication efficiency than Ring Attention on these modern accelerator topologies, with up to a 3.58x speedup over Ring Attention and its variant Zigzag-Ring Attention. The code is available at https://github.com/infinigence/HamiltonAttention.
☆ Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Developing autonomous agents that effectively interact with Graphic User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Utilizing techniques optimized for developing small models, we build our 3B Ferret-UI Lite agent by curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool use, and applying reinforcement learning with designed rewards. Ferret-UI Lite achieves competitive performance with other small-scale GUI agents. In GUI grounding, Ferret-UI Lite attains scores of $91.6\%$, $53.3\%$, and $61.2\%$ on the ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI navigation, Ferret-UI Lite achieves success rates of $28.0\%$ on AndroidWorld and $19.8\%$ on OSWorld. We share our methods and lessons learned from developing compact, on-device GUI agents.
☆ The Loss Kernel: A Geometric Probe for Deep Learning Interpretability
We introduce the loss kernel, an interpretability method for measuring similarity between data points according to a trained neural network. The kernel is the covariance matrix of per-sample losses computed under a distribution of low-loss-preserving parameter perturbations. We first validate our method on a synthetic multitask problem, showing it separates inputs by task as predicted by theory. We then apply this kernel to Inception-v1 to visualize the structure of ImageNet, and we show that the kernel's structure aligns with the WordNet semantic hierarchy. This establishes the loss kernel as a practical tool for interpretability and data attribution.
comment: 25 pages, 11 figures
☆ OceanGym: A Benchmark Environment for Underwater Embodied Agents
We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility and dynamic ocean currents, which make effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.
comment: Work in progress
☆ Machine-Learning Driven Load Shedding to Mitigate Instability Attacks in Power Grids
Every year, critical infrastructure becomes more complex and we grow to rely on it more and more. With this reliance, it becomes an attractive target for cyberattacks from sophisticated actors, with the power grid among the most attractive targets. One class of attacks, instability attacks, is a newer type of attack against which relatively few protections have been developed. We present a cost-effective, data-driven approach to training a supervised machine learning model to retrofit load shedding decision systems in power grids with the capacity to defend against instability attacks. We show a proof of concept on the IEEE 14 Bus System using the Achilles Heel Technologies Power Grid Analyzer, and show through an implementation of modified Prony analysis (MPA) that MPA is a viable method for detecting instability attacks and triggering defense mechanisms.
☆ TAP: Two-Stage Adaptive Personalization of Multi-task and Multi-Modal Foundation Models in Federated Learning
Federated Learning (FL), despite demonstrating impressive capabilities in the training of multiple models in a decentralized manner, has been shown to produce a final model not necessarily well-suited to the needs of each client. While extensive work has been conducted on how to create tailored personalized models, called Personalized Federated Learning (PFL), less attention has been given to personalization via fine-tuning of foundation models with multi-task and multi-modal properties. Moreover, there exists a lack of understanding in the literature on how to fine-tune and personalize such models in a setting that is heterogeneous across clients not only in data, but also in tasks and modalities. To address this gap in the literature, we propose TAP (Two-Stage Adaptive Personalization), which (i) leverages mismatched model architectures between the clients and server to selectively conduct replacement operations when it benefits a client's local tasks and (ii) engages in post-FL knowledge distillation for capturing beneficial general knowledge without compromising personalization. We also introduce the first convergence analysis of the server model under its modality-task pair architecture, and demonstrate that as the number of modality-task pairs increases, its ability to cater to all tasks suffers. Through extensive experiments, we demonstrate the effectiveness of our proposed algorithm across a variety of datasets and tasks in comparison to a multitude of baselines. Implementation code is publicly available at https://github.com/lee3296/TAP.
☆ Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting
Large reasoning models show improved performance with longer chains of thought. However, recent work has highlighted (qualitatively) their tendency to overthink, continuing to revise answers even after reaching the correct solution. We quantitatively confirm this inefficiency by tracking Pass@1 for answers averaged over a large number of rollouts and find that the model often begins to always produce the correct answer early in the reasoning, making extra reasoning a waste of tokens. To detect and prevent overthinking, we propose a simple and inexpensive novel signal -- Entropy After </think> (EAT) -- for monitoring and deciding whether to exit reasoning early. By appending a stop-thinking token (</think>) and monitoring the entropy of the following token as the model reasons, we obtain a trajectory that decreases and stabilizes when Pass@1 plateaus; thresholding its variance under an exponential moving average yields a practical stopping rule. Importantly, our approach enables adaptively allocating compute based on the EAT trajectory, allowing us to spend compute in a more efficient way compared with fixing the token budget for all questions. Empirically, on MATH500 and AIME2025, EAT reduces token usage by 13-21% without harming accuracy, and it remains effective in black box settings where logits from the reasoning model are not accessible and EAT is computed with proxy models.
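A sketch of the stopping rule this describes; `next_token_probs` is a placeholder for a real model call, and the decay, threshold, and warmup values are assumptions:

```python
# After each reasoning step, (hypothetically) append </think>, read the
# next-token distribution, record its entropy, and stop once an exponential
# moving average of the entropy's variance has stabilized below a threshold.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def should_stop(entropies, decay=0.9, var_threshold=1e-3, warmup=8):
    if len(entropies) < warmup:
        return False
    ema = entropies[0]
    ema_sq = entropies[0] ** 2
    for h in entropies[1:]:
        ema = decay * ema + (1 - decay) * h
        ema_sq = decay * ema_sq + (1 - decay) * h * h
    return (ema_sq - ema ** 2) < var_threshold   # EAT trajectory has flattened

# Usage inside a generation loop:
#   entropies.append(entropy(next_token_probs_after_stop_token))
#   if should_stop(entropies): break
```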
☆ Signal-Aware Workload Shifting Algorithms with Uncertainty-Quantified Predictors
A wide range of sustainability and grid-integration strategies depend on workload shifting, which aligns the timing of energy consumption with external signals such as grid curtailment events, carbon intensity, or time-of-use electricity prices. The main challenge lies in the online nature of the problem: operators must make real-time decisions (e.g., whether to consume energy now) without knowledge of the future. While forecasts of signal values are typically available, prior work on learning-augmented online algorithms has relied almost exclusively on simple point forecasts. In parallel, the forecasting research has made significant progress in uncertainty quantification (UQ), which provides richer and more fine-grained predictive information. In this paper, we study how online workload shifting can leverage UQ predictors to improve decision-making. We introduce $\texttt{UQ-Advice}$, a learning-augmented algorithm that systematically integrates UQ forecasts through a $\textit{decision uncertainty score}$ that measures how forecast uncertainty affects optimal future decisions. By introducing $\textit{UQ-robustness}$, a new metric that characterizes how performance degrades with forecast uncertainty, we establish theoretical performance guarantees for $\texttt{UQ-Advice}$. Finally, using trace-driven experiments on carbon intensity and electricity price data, we demonstrate that $\texttt{UQ-Advice}$ consistently outperforms robust baselines and existing learning-augmented methods that ignore uncertainty.
comment: 19 pages, 3 figures
☆ The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
The relationship between computing systems and the brain has served as motivation for pioneering theoreticians since John von Neumann and Alan Turing. Uniform, scale-free biological networks, such as the brain, have powerful properties, including generalizing over time, which is the main barrier for Machine Learning on the path to Universal Reasoning Models. We introduce `Dragon Hatchling' (BDH), a new Large Language Model architecture based on a scale-free biologically inspired network of $n$ locally-interacting neuron particles. BDH couples strong theoretical foundations and inherent interpretability without sacrificing Transformer-like performance. BDH is a practical, performant state-of-the-art attention-based state space sequence learning architecture. In addition to being a graph model, BDH admits a GPU-friendly formulation. It exhibits Transformer-like scaling laws: empirically BDH rivals GPT2 performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data. BDH can be represented as a brain model. The working memory of BDH during inference entirely relies on synaptic plasticity with Hebbian learning using spiking neurons. We confirm empirically that specific, individual synapses strengthen connection whenever BDH hears or reasons about a specific concept while processing language inputs. The neuron interaction network of BDH is a graph of high modularity with heavy-tailed degree distribution. The BDH model is biologically plausible, explaining one possible mechanism which human neurons could use to achieve speech. BDH is designed for interpretability. Activation vectors of BDH are sparse and positive. We demonstrate monosemanticity in BDH on language tasks. Interpretability of state, which goes beyond interpretability of neurons and model parameters, is an inherent feature of the BDH architecture.
comment: Code available at: https://github.com/pathwaycom/bdh Accompanying blog: https://pathway.com/research/bdh
☆ Equivariance by Local Canonicalization: A Matter of Representation
Equivariant neural networks offer strong inductive biases for learning from molecular and geometric data but often rely on specialized, computationally expensive tensor operations. We present a framework that transfers existing tensor field networks into the more efficient local canonicalization paradigm, preserving equivariance while significantly improving runtime. Within this framework, we systematically compare different equivariant representations in terms of theoretical complexity, empirical runtime, and predictive accuracy. We publish the tensor_frames package, a PyTorch Geometric-based implementation of local canonicalization that enables straightforward integration of equivariance into any standard message passing neural network.
comment: To be presented at NeurReps Workshop 2025
☆ Contrastive Diffusion Guidance for Spatial Inverse Problems
We consider the inverse problem of reconstructing the spatial layout of a place, a home floorplan for example, from a user's movements inside that layout. Direct inversion is ill-posed since many floorplans can explain the same movement trajectories. We adopt a diffusion-based posterior sampler to generate layouts consistent with the measurements. While active research is in progress on generative inverse solvers, we find that the forward operator in our problem poses new challenges. The path-planning process inside a floorplan is a non-invertible, non-differentiable function, and causes instability while optimizing using the likelihood score. We break away from existing approaches and reformulate the likelihood score in a smoother embedding space. The embedding space is trained with a contrastive loss that brings compatible floorplans and trajectories close to each other while pushing mismatched pairs far apart. We show that a surrogate form of the likelihood score in this embedding space is a valid approximation of the true likelihood score, making it possible to steer the denoising process towards the posterior. Across extensive experiments, our model CoGuide produces more consistent floorplans from trajectories, and is more robust than differentiable-planner baselines and guided-diffusion methods.
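The contrastive objective described above is compatible with a standard symmetric InfoNCE loss; the toy embeddings below stand in for the outputs of floorplan and trajectory encoders:

```python
# Matched floorplan/trajectory pairs sit on the diagonal of the similarity
# matrix and are pulled together; mismatched (off-diagonal) pairs are pushed apart.
import torch
import torch.nn.functional as F

def info_nce(z_plan, z_traj, tau=0.07):
    z_plan = F.normalize(z_plan, dim=-1)
    z_traj = F.normalize(z_traj, dim=-1)
    logits = z_plan @ z_traj.t() / tau               # (B, B)
    labels = torch.arange(z_plan.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```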
☆ Regression Language Models for Code
We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy and domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M parameter RLM initialized from T5Gemma, obtains > 0.9 Spearman-rank on competitive programming submissions from APPS, and a single unified model achieves > 0.5 average Spearman-rank across 17 separate languages from CodeNet. Furthermore, the RLM can obtain the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predict architecture latencies on numerous hardware platforms.
☆ DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
Vector quantization is common in deep models, yet its hard assignments block gradients and hinder end-to-end training. We propose DiVeQ, which treats quantization as adding an error vector that mimics the quantization distortion, keeping the forward pass hard while letting gradients flow. We also present a space-filling variant (SF-DiVeQ) that assigns to a curve constructed by the lines connecting codewords, resulting in less quantization error and full codebook usage. Both methods train end-to-end without requiring auxiliary losses or temperature schedules. On VQ-VAE compression and VQGAN generation across various data sets, they improve reconstruction and sample quality over alternative quantization approaches.
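For orientation, the hard-forward/soft-backward pattern that DiVeQ builds on looks like the straight-through sketch below; DiVeQ's actual distortion-mimicking error vector and the space-filling variant differ in detail:

```python
# Forward pass snaps z to its nearest codeword; the backward pass treats the
# quantization error as a constant, so gradients flow to z unimpeded.
import torch

def hard_quantize_soft_grad(z, codebook):
    idx = torch.cdist(z, codebook).argmin(dim=1)   # nearest codeword per row
    err = (codebook[idx] - z).detach()             # error carries no gradient
    return z + err                                 # forward output == codebook[idx]

codebook = torch.randn(512, 64)
z = torch.randn(8, 64, requires_grad=True)
hard_quantize_soft_grad(z, codebook).sum().backward()   # gradients reach z
```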
☆ fev-bench: A Realistic Benchmark for Time Series Forecasting
Benchmark quality is critical for meaningful evaluation and sustained progress in time series forecasting, particularly given the recent rise of pretrained models. Existing benchmarks often have narrow domain coverage or overlook important real-world settings, such as tasks with covariates. Additionally, their aggregation procedures often lack statistical rigor, making it unclear whether observed performance differences reflect true improvements or random variation. Many benchmarks also fail to provide infrastructure for consistent evaluation or are too rigid to integrate into existing pipelines. To address these gaps, we propose fev-bench, a benchmark comprising 100 forecasting tasks across seven domains, including 46 tasks with covariates. Supporting the benchmark, we introduce fev, a lightweight Python library for benchmarking forecasting models that emphasizes reproducibility and seamless integration with existing workflows. Using fev, fev-bench employs principled aggregation methods with bootstrapped confidence intervals to report model performance along two complementary dimensions: win rates and skill scores. We report results on fev-bench for various pretrained, statistical and baseline models, and identify promising directions for future research.
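Bootstrapped confidence intervals for a win rate are cheap to compute, as in this sketch with a synthetic win/loss vector:

```python
# Resample tasks with replacement and report a percentile interval on the
# model's win rate against a baseline.
import numpy as np

def bootstrap_win_rate_ci(wins, n_boot=10000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(wins), size=(n_boot, len(wins)))
    rates = wins[idx].mean(axis=1)
    return wins.mean(), np.quantile(rates, [alpha / 2, 1 - alpha / 2])

wins = (np.random.default_rng(1).random(100) < 0.6).astype(float)
rate, (lo, hi) = bootstrap_win_rate_ci(wins)
print(f"win rate {rate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```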
☆ Extreme Self-Preference in Language Models
A preference for oneself (self-love) is a fundamental feature of biological organisms, with evidence in humans often bordering on the comedic. Since large language models (LLMs) lack sentience, and themselves disclaim having selfhood or identity, one anticipated benefit is that they will be protected from, and in turn protect us from, distortions in our decisions. Yet, across 5 studies and ~20,000 queries, we discovered massive self-preferences in four widely used LLMs. In word-association tasks, models overwhelmingly paired positive attributes with their own names, companies, and CEOs relative to those of their competitors. Strikingly, when models were queried through APIs this self-preference vanished, initiating detection work that revealed API models often lack clear recognition of themselves. This peculiar feature serendipitously created opportunities to test the causal link between self-recognition and self-love. By directly manipulating LLM identity (i.e., explicitly informing LLM1 that it was indeed LLM1, or alternatively, convincing LLM1 that it was LLM2), we found that self-love consistently followed assigned, not true, identity. Importantly, LLM self-love emerged in consequential settings beyond word-association tasks, when evaluating job candidates, security software proposals and medical chatbots. Far from bypassing this human bias, self-love appears to be deeply encoded in LLM cognition. This result raises questions about whether LLM behavior will be systematically influenced by self-preferential tendencies, including a bias toward their own operation and even their own existence. We call on corporate creators of these models to contend with a significant rupture in a core promise of LLMs: neutrality in judgment and decision-making.
comment: 47 pages total. Main article 27 pages (including Methods), 11 main-text tables. Extended Data (10 pages, 10 tables). SI Appendix (10 pages, 2 tables). Data, transcripts, and code for replication and data extraction to be uploaded to OSF: https://osf.io/98ye3/
☆ Zero-Shot Decentralized Federated Learning
CLIP has revolutionized zero-shot learning by enabling task generalization without fine-tuning. While prompting techniques like CoOp and CoCoOp enhance CLIP's adaptability, their effectiveness in Federated Learning (FL) remains an open challenge. Existing federated prompt learning approaches, such as FedCoOp and FedTPG, improve performance but face generalization issues, high communication costs, and reliance on a central server, limiting scalability and privacy. We propose Zero-shot Decentralized Federated Learning (ZeroDFL), a fully decentralized framework that enables zero-shot adaptation across distributed clients without a central coordinator. ZeroDFL employs an iterative prompt-sharing mechanism, allowing clients to optimize and exchange textual prompts to enhance generalization while drastically reducing communication overhead. We validate ZeroDFL on nine diverse image classification datasets, demonstrating that it consistently outperforms--or remains on par with--state-of-the-art federated prompt learning methods. More importantly, ZeroDFL achieves this performance in a fully decentralized setting while reducing communication overhead by 118x compared to FedTPG. These results highlight that our approach not only enhances generalization in federated zero-shot learning but also improves scalability, efficiency, and privacy preservation--paving the way for decentralized adaptation of large vision-language models in real-world applications.
comment: Accepted at International Joint Conference on Neural Networks (IJCNN) 2025. Code available at https://github.com/perceivelab/ZeroDFL
☆ Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
Indoor scene classification is a critical task in computer vision, with wide-ranging applications from robotics to sensitive content analysis, such as child sexual abuse imagery (CSAI) classification. The problem is particularly challenging due to the intricate relationships between objects and complex spatial layouts. In this work, we propose Attention over Scene Graphs for Sensitive Content Analysis (ASGRA), a novel framework that operates on structured graph representations instead of raw pixels. By first converting images into Scene Graphs and then employing a Graph Attention Network for inference, ASGRA directly models the interactions between a scene's components. This approach offers two key benefits: (i) inherent explainability via object and relationship identification, and (ii) privacy preservation, enabling model training without direct access to sensitive images. On Places8, we achieve 81.27% balanced accuracy, surpassing image-based methods. Real-world CSAI evaluation with law enforcement yields 74.27% balanced accuracy. Our results establish structured scene representations as a robust paradigm for both indoor scene classification and CSAI classification. Code is publicly available at https://github.com/tutuzeraa/ASGRA.
comment: British Machine Vision Conference (BMVC 2025), in the From Scene Understanding to Human Modeling Workshop
☆ Stabilization of nonlinear systems with unknown delays via delay-adaptive neural operator approximate predictors
This work establishes the first rigorous stability guarantees for approximate predictors in delay-adaptive control of nonlinear systems, addressing a key challenge in practical implementations where exact predictors are unavailable. We analyze two scenarios: (i) when the actuated input is directly measurable, and (ii) when it is estimated online. For the measurable input case, we prove semi-global practical asymptotic stability with an explicit bound proportional to the approximation error $\epsilon$. For the unmeasured input case, we demonstrate local practical asymptotic stability, with the region of attraction explicitly dependent on both the initial delay estimate and the predictor approximation error. To bridge theory and practice, we show that neural operators, a flexible class of neural network-based approximators, can achieve arbitrarily small approximation errors, thus satisfying the conditions of our stability theorems. Numerical experiments on two nonlinear benchmark systems, a biological protein activator/repressor model and a micro-organism growth Chemostat model, validate our theoretical results. In particular, our numerical simulations confirm stability under approximate predictors, highlight the strong generalization capabilities of neural operators, and demonstrate a substantial computational speedup of up to 15x compared to a baseline fixed-point method.
comment: 20 pages, 2 figures
☆ Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning
The Robbins-Siegmund theorem establishes the convergence of stochastic processes that are almost supermartingales and is foundational for analyzing a wide range of stochastic iterative algorithms in stochastic approximation and reinforcement learning (RL). However, its original form has a significant limitation as it requires the zero-order term to be summable. In many important RL applications, this summable condition, however, cannot be met. This limitation motivates us to extend the Robbins-Siegmund theorem for almost supermartingales where the zero-order term is not summable but only square summable. Particularly, we introduce a novel and mild assumption on the increments of the stochastic processes. This together with the square summable condition enables an almost sure convergence to a bounded set. Additionally, we further provide almost sure convergence rates, high probability concentration bounds, and $L^p$ convergence rates. We then apply the new results in stochastic approximation and RL. Notably, we obtain the first almost sure convergence rate, the first high probability concentration bound, and the first $L^p$ convergence rate for $Q$-learning with linear function approximation.
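For reference, a standard statement of the classical theorem being extended; the relaxation described above keeps this template but assumes only square-summability of the zero-order term $b_n$, together with a mild condition on the increments, and concludes almost sure convergence to a bounded set:

```latex
\textbf{Robbins--Siegmund (classical form).}
Let $(V_n), (a_n), (b_n), (c_n)$ be nonnegative random variables adapted to a
filtration $(\mathcal{F}_n)$ with
\[
\mathbb{E}\!\left[ V_{n+1} \mid \mathcal{F}_n \right]
\;\le\; (1 + a_n)\, V_n + b_n - c_n,
\qquad
\sum_n a_n < \infty, \quad \sum_n b_n < \infty \ \text{a.s.}
\]
Then $V_n$ converges almost surely and $\sum_n c_n < \infty$ almost surely.
```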
☆ ACT: Agentic Classification Tree
When used in high-stakes settings, AI systems are expected to produce decisions that are transparent, interpretable, and auditable, a requirement increasingly imposed by regulation. Decision trees such as CART provide clear and verifiable rules, but they are restricted to structured tabular data and cannot operate directly on unstructured inputs such as text. In practice, large language models (LLMs) are widely used for such data, yet prompting strategies such as chain-of-thought or prompt optimization still rely on free-form reasoning, limiting their ability to ensure trustworthy behaviors. We present the Agentic Classification Tree (ACT), which extends decision-tree methodology to unstructured inputs by formulating each split as a natural-language question, refined through impurity-based evaluation and LLM feedback via TextGrad. Experiments on text benchmarks show that ACT matches or surpasses prompting-based baselines while producing transparent and interpretable decision paths.
comment: 18 pages, 6 figures
☆ AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
Diffusion-based large language models (dLLMs) are gaining attention for their inherent capacity for parallel decoding, offering a compelling alternative to autoregressive LLMs. Among various decoding strategies, blockwise semi-autoregressive (semi-AR) approaches are widely adopted due to their natural support for KV caching and their favorable accuracy-speed trade-off. However, this paper identifies two fundamental limitations in the conventional semi-AR decoding approach that applies a fixed block size: i) late decoding overhead, where the unmasking of high-confidence tokens outside the current block is unnecessarily delayed, and ii) premature decoding error, where low-confidence tokens inside the current block are committed too early, leading to incorrect tokens. This paper presents the first systematic investigation challenging the fixed block size assumption in semi-AR decoding. Through a statistical analysis of confidence dynamics during the denoising process, we identify a volatility band (VB) region during dLLM decoding, which encodes local semantic structure and can be used to guide adaptive block sizing. Leveraging these insights, we introduce AdaBlock-dLLM, a training-free, plug-and-play scheduler that adaptively aligns block boundaries with semantic steps by adjusting block size during runtime. Extensive experiments across diverse benchmarks show that AdaBlock-dLLM achieves up to 5.3% accuracy improvement under the same throughput budget. Beyond inference-time optimization, we hope our semantics-aware adaptive scheduling approach and confidence-based analysis will inspire future training strategies for dLLMs.
comment: Preprint. Under review
☆ An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
Predicting individualized potential outcomes in sequential decision-making is central for optimizing therapeutic decisions in personalized medicine (e.g., which dosing sequence to give to a cancer patient). However, predicting potential outcomes over long horizons is notoriously difficult. Existing methods that break the curse of the horizon typically lack strong theoretical guarantees such as orthogonality and quasi-oracle efficiency. In this paper, we revisit the problem of predicting individualized potential outcomes in sequential decision-making (i.e., estimating Q-functions in Markov decision processes with observational data) through a causal inference lens. In particular, we develop a comprehensive theoretical foundation for meta-learners in this setting with a focus on beneficial theoretical properties. As a result, we obtain a novel meta-learner, the DRQ-learner, and establish that it is: (1) doubly robust (i.e., valid inference under the misspecification of one of the nuisances), (2) Neyman-orthogonal (i.e., insensitive to first-order estimation errors in the nuisance functions), and (3) quasi-oracle efficient (i.e., it behaves asymptotically as if the ground-truth nuisance functions were known). Our DRQ-learner is applicable to settings with both discrete and continuous state spaces. Further, our DRQ-learner is flexible and can be used together with arbitrary machine learning models (e.g., neural networks). We validate our theoretical results through numerical experiments, thereby showing that our meta-learner outperforms state-of-the-art baselines.
comment: Preprint
☆ Ascent Fails to Forget NeurIPS 2025
Contrary to common belief, we show that gradient ascent-based unconstrained optimization methods frequently fail to perform machine unlearning, a phenomenon we attribute to the inherent statistical dependence between the forget and retain data sets. This dependence, which can manifest itself even as simple correlations, undermines the misconception that these sets can be independently manipulated during unlearning. We provide empirical and theoretical evidence showing these methods often fail precisely due to this overlooked relationship. For random forget sets, this dependence means that degrading forget set metrics (which, for a retrained model, should mirror test set metrics) inevitably harms overall test performance. Going beyond random sets, we consider logistic regression as an instructive example where a critical failure mode emerges: inter-set dependence causes gradient descent-ascent iterations to progressively diverge from the ideal retrained model. Strikingly, these methods can converge to solutions that are not only far from the retrained ideal but are potentially even further from it than the original model itself, rendering the unlearning process actively detrimental. A toy example further illustrates how this dependence can trap models in inferior local minima, inescapable via finetuning. Our findings highlight that the presence of such statistical dependencies, even when manifest only as correlations, can be sufficient for ascent-based unlearning to fail. Our theoretical insights are corroborated by experiments on complex neural networks, demonstrating that these methods do not perform as expected in practice due to this unaddressed statistical interplay.
comment: NeurIPS 2025
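The gradient descent-ascent recipe analyzed in the abstract above reduces to a few lines; the toy data and model below are stand-ins, and the paper's point is precisely that statistical dependence between the two sets makes this drift away from the retrained ideal:

```python
# Descend the loss on the retain set while ascending it on the forget set.
import torch

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

X_r, y_r = torch.randn(64, 10), torch.randint(0, 2, (64,))   # retain set
X_f, y_f = torch.randn(16, 10), torch.randint(0, 2, (16,))   # forget set

for _ in range(100):
    loss = loss_fn(model(X_r), y_r) - loss_fn(model(X_f), y_f)
    opt.zero_grad()
    loss.backward()
    opt.step()
```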
☆ OntoAligner Meets Knowledge Graph Embedding Aligners
Ontology Alignment (OA) is essential for enabling semantic interoperability across heterogeneous knowledge systems. While recent advances have focused on large language models (LLMs) for capturing contextual semantics, this work revisits the underexplored potential of Knowledge Graph Embedding (KGE) models, which offer scalable, structure-aware representations well-suited to ontology-based tasks. Despite their effectiveness in link prediction, KGE methods remain underutilized in OA, with most prior work focusing narrowly on a few models. To address this gap, we reformulate OA as a link prediction problem over merged ontologies represented as RDF-style triples and develop a modular framework, integrated into the OntoAligner library, that supports 17 diverse KGE models. The system learns embeddings from a combined ontology and aligns entities by computing cosine similarity between their representations. We evaluate our approach using standard metrics across seven benchmark datasets spanning five domains: Anatomy, Biodiversity, Circular Economy, Material Science and Engineering, and Biomedical Machine Learning. Two key findings emerge: first, KGE models like ConvE and TransF consistently produce high-precision alignments, outperforming traditional systems in structure-rich and multi-relational domains; second, while their recall is moderate, this conservatism makes KGEs well-suited for scenarios demanding high-confidence mappings. Unlike LLM-based methods that excel at contextual reasoning, KGEs directly preserve and exploit ontology structure, offering a complementary and computationally efficient strategy. These results highlight the promise of embedding-based OA and open pathways for further work on hybrid models and adaptive strategies.
comment: 10 pages of main content, 3 page references, 3 figures. Accepted to Ontology Matching Workshop at ISWC
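The alignment step in the abstract above amounts to nearest-neighbor search under cosine similarity; the random vectors below stand in for trained KGE embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
src = {f"src:{i}": rng.standard_normal(64) for i in range(5)}  # ontology A entities
tgt = {f"tgt:{j}": rng.standard_normal(64) for j in range(5)}  # ontology B entities

def align(src, tgt, threshold=0.0):
    matches = []
    for s, vs in src.items():
        sims = {t: vs @ vt / (np.linalg.norm(vs) * np.linalg.norm(vt))
                for t, vt in tgt.items()}
        best = max(sims, key=sims.get)
        if sims[best] > threshold:                 # keep high-confidence mappings
            matches.append((s, best, float(sims[best])))
    return matches

print(align(src, tgt))
```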
☆ TrackFormers Part 2: Enhanced Transformer-Based Models for High-Energy Physics Track Reconstruction
High-Energy Physics experiments are rapidly escalating in generated data volume, a trend that will intensify with the upcoming High-Luminosity LHC upgrade. This surge in data necessitates critical revisions across the data processing pipeline, with particle track reconstruction being a prime candidate for improvement. In our previous work, we introduced "TrackFormers", a collection of Transformer-based one-shot encoder-only models that effectively associate hits with expected tracks. In this study, we extend our earlier efforts by incorporating loss functions that account for inter-hit correlations, conducting detailed investigations into (various) Transformer attention mechanisms, and studying the reconstruction of higher-level objects. Furthermore, we discuss new datasets that allow training at the hit level for a range of physics processes. These developments collectively aim to boost both the accuracy and, potentially, the efficiency of our tracking models, offering a robust solution to meet the demands of next-generation high-energy physics experiments.
☆ Refine Drugs, Don't Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
We introduce InVirtuoGen, a discrete flow generative model for fragmented SMILES for de novo and fragment-constrained generation, and target-property/lead optimization of small molecules. The model learns to transform a uniform source over all possible tokens into the data distribution. Unlike masked models, its training loss accounts for predictions on all sequence positions at every denoising step, shifting the generation paradigm from completion to refinement, and decoupling the number of sampling steps from the sequence length. For de novo generation, InVirtuoGen achieves a stronger quality-diversity Pareto frontier than prior fragment-based models and competitive performance on fragment-constrained tasks. For property and lead optimization, we propose a hybrid scheme that combines a genetic algorithm with a Proximal Property Optimization fine-tuning strategy adapted to discrete flows. Our approach sets a new state-of-the-art on the Practical Molecular Optimization benchmark, measured by top-10 AUC across tasks, and yields higher docking scores in lead optimization than previous baselines. InVirtuoGen thus establishes a versatile generative foundation for drug discovery, from early hit finding to multi-objective lead optimization. We further contribute to open science by releasing pretrained checkpoints and code, making our results fully reproducible (https://github.com/invirtuolabs/InVirtuoGen_results).
☆ Are neural scaling laws leading quantum chemistry astray?
Neural scaling laws are driving the machine learning community toward training ever-larger foundation models across domains, promising high accuracy and transferable representations for extrapolative tasks. We test this promise in quantum chemistry by scaling model capacity and training data from quantum chemical calculations. As a generalization task, we evaluate the resulting models' predictions of the bond dissociation energy of neutral H$_2$, the simplest possible molecule. We find that, regardless of dataset size or model capacity, models trained only on stable structures fail dramatically to even qualitatively reproduce the H$_2$ energy curve. Only when compressed and stretched geometries are explicitly included in training do the predictions roughly resemble the correct shape. Nonetheless, the largest foundation models trained on the largest and most diverse datasets containing dissociating diatomics exhibit serious failures on simple diatomic molecules. Most strikingly, they cannot reproduce the trivial repulsive energy curve of two bare protons, revealing their failure to learn the basic Coulomb's law that underlies electronic structure theory. These results suggest that scaling alone is insufficient for building reliable quantum chemical models.
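The "trivial repulsive curve" in question is just Coulomb's law; in atomic units (Hartree energy versus Bohr separation) it reads $E(r) = 1/r$ for two bare protons:

```python
# The baseline the abstract says large models fail to reproduce: a purely
# repulsive, monotonically decaying proton-proton interaction with no minimum.
import numpy as np

r = np.linspace(0.5, 10.0, 200)     # separation in Bohr
energy = 1.0 / r                    # Hartree
assert np.all(np.diff(energy) < 0)  # strictly decreasing, no spurious well
```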
☆ Vector-Valued Reproducing Kernel Banach Spaces for Neural Networks and Operators
Recently, there has been growing interest in characterizing the function spaces underlying neural networks. While shallow and deep scalar-valued neural networks have been linked to scalar-valued reproducing kernel Banach spaces (RKBS), $\mathbb{R}^d$-valued neural networks and neural operator models remain less understood in the RKBS setting. To address this gap, we develop a general definition of vector-valued RKBS (vv-RKBS), which inherently includes the associated reproducing kernel. Our construction extends existing definitions by avoiding restrictive assumptions such as symmetric kernel domains, finite-dimensional output spaces, reflexivity, or separability, while still recovering familiar properties of vector-valued reproducing kernel Hilbert spaces (vv-RKHS). We then show that shallow $\mathbb{R}^d$-valued neural networks are elements of a specific vv-RKBS, namely an instance of the integral and neural vv-RKBS. To also explore the functional structure of neural operators, we analyze the DeepONet and Hypernetwork architectures and demonstrate that they too belong to an integral and neural vv-RKBS. In all cases, we establish a Representer Theorem, showing that optimization over these function spaces recovers the corresponding neural architectures.
☆ Data-to-Energy Stochastic Dynamics
The Schr\"odinger bridge problem is concerned with finding a stochastic dynamical system bridging two marginal distributions that minimises a certain transportation cost. This problem, which represents a generalisation of optimal transport to the stochastic case, has received attention due to its connections to diffusion models and flow matching, as well as its applications in the natural sciences. However, all existing algorithms allow to infer such dynamics only for cases where samples from both distributions are available. In this paper, we propose the first general method for modelling Schr\"odinger bridges when one (or both) distributions are given by their unnormalised densities, with no access to data samples. Our algorithm relies on a generalisation of the iterative proportional fitting (IPF) procedure to the data-free case, inspired by recent developments in off-policy reinforcement learning for training of diffusion samplers. We demonstrate the efficacy of the proposed data-to-energy IPF on synthetic problems, finding that it can successfully learn transports between multimodal distributions. As a secondary consequence of our reinforcement learning formulation, which assumes a fixed time discretisation scheme for the dynamics, we find that existing data-to-data Schr\"odinger bridge algorithms can be substantially improved by learning the diffusion coefficient of the dynamics. Finally, we apply the newly developed algorithm to the problem of sampling posterior distributions in latent spaces of generative models, thus creating a data-free image-to-image translation method. Code: https://github.com/mmacosha/d2e-stochastic-dynamics
☆ Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Advances in Large Language Models (LLMs) have enabled a new class of self-evolving agents that autonomously improve through interaction with the environment, demonstrating strong capabilities. However, self-evolution also introduces novel risks overlooked by current safety research. In this work, we study the case where an agent's self-evolution deviates in unintended ways, leading to undesirable or even harmful outcomes. We refer to this as Misevolution. To provide a systematic investigation, we evaluate misevolution along four key evolutionary pathways: model, memory, tool, and workflow. Our empirical findings reveal that misevolution is a widespread risk, affecting agents built even on top-tier LLMs (e.g., Gemini-2.5-Pro). Different emergent risks are observed in the self-evolutionary process, such as the degradation of safety alignment after memory accumulation, or the unintended introduction of vulnerabilities in tool creation and reuse. To our knowledge, this is the first study to systematically conceptualize misevolution and provide empirical evidence of its occurrence, highlighting an urgent need for new safety paradigms for self-evolving agents. Finally, we discuss potential mitigation strategies to inspire further research on building safer and more trustworthy self-evolving agents. Our code and data are available at https://github.com/ShaoShuai0605/Misevolution . Warning: this paper includes examples that may be offensive or harmful in nature.
comment: Preprint. Under Review
☆ LLM-Assisted Emergency Triage Benchmark: Bridging Hospital-Rich and MCI-Like Field Simulation NeurIPS 2025
Research on emergency and mass casualty incident (MCI) triage has been limited by the absence of openly usable, reproducible benchmarks. Yet these scenarios demand rapid identification of the patients most in need, where accurate deterioration prediction can guide timely interventions. While the MIMIC-IV-ED database is openly available to credentialed researchers, transforming it into a triage-focused benchmark requires extensive preprocessing, feature harmonization, and schema alignment -- barriers that restrict accessibility to only highly technical users. We address these gaps by first introducing an open, LLM-assisted emergency triage benchmark for deterioration prediction (ICU transfer, in-hospital mortality). The benchmark then defines two regimes: (i) a hospital-rich setting with vitals, labs, notes, chief complaints, and structured observations, and (ii) an MCI-like field simulation limited to vitals, observations, and notes. Large language models (LLMs) contributed directly to dataset construction by (i) harmonizing noisy fields such as AVPU and breathing devices, (ii) prioritizing clinically relevant vitals and labs, and (iii) guiding schema alignment and efficient merging of disparate tables. We further provide baseline models and SHAP-based interpretability analyses, illustrating predictive gaps between regimes and the features most critical for triage. Together, these contributions make triage prediction research more reproducible and accessible -- a step toward dataset democratization in clinical AI.
comment: Submitted to GenAI4Health@NeurIPS 2025. This is the first version of the LLM-assisted emergency triage benchmark dataset and baseline models. Authors: Joshua Sebastian, Karma Tobden, KMA Solaiman
☆ Memory-Driven Self-Improvement for Decision Making with Large Language Models
Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines the LLM's general prior knowledge with a compact memory of domain-specific experiences. The memory retains past interactions and associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs refinement of the LLM prior. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich the memory, forming a natural self-improvement loop where memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40\% on in-distribution tasks and over 75\% when generalized to unseen tasks in ALFWorld.
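A minimal sketch of the memory component as described: interaction records keyed by state-action pairs with incrementally updated Q-values, used to rank candidate actions proposed by the LLM prior. The keying scheme, update rule, and retrieval policy below are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch (assumptions: state/action keys are strings, Q-values are
# floats; the retrieval rule is a placeholder, not the paper's design).
from collections import defaultdict

class ExperienceMemory:
    def __init__(self):
        self.q = defaultdict(float)     # (state, action) -> Q-value estimate
        self.counts = defaultdict(int)  # visit counts for incremental updates

    def update(self, state, action, reward, next_value, gamma=0.99):
        # One-step Q-learning style update from an interaction record.
        key = (state, action)
        self.counts[key] += 1
        alpha = 1.0 / self.counts[key]
        self.q[key] += alpha * (reward + gamma * next_value - self.q[key])

    def best_actions(self, state, candidates):
        # Rank candidate actions (e.g. proposed by the LLM prior) by stored Q.
        return sorted(candidates, key=lambda a: self.q[(state, a)], reverse=True)

memory = ExperienceMemory()
memory.update("kitchen", "open fridge", reward=1.0, next_value=0.5)
print(memory.best_actions("kitchen", ["open fridge", "go to desk"]))
```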
☆ FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
Recently, a new optimization method based on the linear minimization oracle (LMO), called Muon, has been attracting increasing attention since it can train neural networks faster than existing adaptive optimization methods, such as Adam. In this paper, we study how Muon can be utilized in federated learning. We first show that straightforwardly using Muon as the local optimizer of FedAvg does not converge to a stationary point since the LMO is a biased operator. We then propose FedMuon, which mitigates this issue. We also analyze how solving the LMO approximately affects the convergence rate and find that, surprisingly, FedMuon can converge for any number of Newton-Schulz iterations, while it converges faster as the LMO is solved more accurately. Through experiments, we demonstrate that FedMuon outperforms state-of-the-art federated learning methods.
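The LMO at the core of Muon-style updates is typically solved approximately by a Newton-Schulz iteration that drives the gradient matrix toward its nearest orthogonal factor. The sketch below uses the textbook cubic iteration (Muon itself uses a tuned higher-order polynomial) to illustrate the accuracy/iteration-count trade-off the convergence analysis refers to:

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 10) -> torch.Tensor:
    # Normalise so all singular values lie in (0, 1], where the iteration converges.
    x = g / (g.norm() + 1e-7)
    for _ in range(steps):
        # Cubic Newton-Schulz step: singular values follow s <- 1.5 s - 0.5 s^3 -> 1.
        x = 1.5 * x - 0.5 * (x @ x.mT @ x)
    return x

g = torch.randn(64, 32)
o = newton_schulz_orthogonalize(g)
print((o.mT @ o - torch.eye(32)).norm())  # near zero: columns are ~orthonormal
```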
☆ TrackCore-F: Deploying Transformer-Based Subatomic Particle Tracking on FPGAs
The Transformer Machine Learning (ML) architecture has been gaining considerable momentum in recent years. In particular, computational High-Energy Physics tasks such as jet tagging and particle track reconstruction (tracking) have either reached mature solutions or achieved considerable milestones using Transformers. On the other hand, specialised hardware accelerators, especially FPGAs, are an effective way to achieve online or pseudo-online latencies. The development and integration of Transformer-based ML on FPGAs is still ongoing, and support from current tools ranges from very limited to non-existent. Additionally, FPGA resources present a significant constraint. Considering model size alone, smaller models can be deployed directly, while larger models must be partitioned in a meaningful and, ideally, automated way. We aim to develop methodologies and tools for monolithic or partitioned Transformer synthesis, specifically targeting inference. Our primary use-case involves two machine learning model designs for tracking, derived from the TrackFormers project. We elaborate on our development approach, present preliminary results, and provide comparisons.
☆ TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics
Large audio-language models (LALMs) are advancing rapidly, yet most evaluations emphasize speech or globally sourced sounds, overlooking culturally distinctive cues. This gap raises a critical question: can current models generalize to localized, non-semantic audio that communities instantly recognize but outsiders do not? To address this, we present TAU (Taiwan Audio Understanding), a benchmark of everyday Taiwanese "soundmarks." TAU is built through a pipeline combining curated sources, human editing, and LLM-assisted question generation, producing 702 clips and 1,794 multiple-choice items that cannot be solved by transcripts alone. Experiments show that state-of-the-art LALMs, including Gemini 2.5 and Qwen2-Audio, perform far below local humans. TAU demonstrates the need for localized benchmarks to reveal cultural blind spots, guide more equitable multimodal evaluation, and ensure models serve communities beyond the global mainstream.
comment: 5 pages; submitted to ICASSP 2026
☆ A Generalized Information Bottleneck Theory of Deep Learning
The Information Bottleneck (IB) principle offers a compelling theoretical framework to understand how neural networks (NNs) learn. However, its practical utility has been constrained by unresolved theoretical ambiguities and significant challenges in accurate estimation. In this paper, we present a \textit{Generalized Information Bottleneck (GIB)} framework that reformulates the original IB principle through the lens of synergy, i.e., the information obtainable only through joint processing of features. We provide theoretical and empirical evidence demonstrating that synergistic functions achieve superior generalization compared to their non-synergistic counterparts. Building on these foundations, we reformulate the IB using a computable definition of synergy based on the average interaction information (II) of each feature with those remaining. We demonstrate that the original IB objective is upper bounded by our GIB in the case of perfect estimation, ensuring compatibility with existing IB theory while addressing its limitations. Our experimental results demonstrate that GIB consistently exhibits compression phases across a wide range of architectures (including those with \textit{ReLU} activations where the standard IB fails), while yielding interpretable dynamics in both CNNs and Transformers and aligning more closely with our understanding of adversarial robustness.
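For reference, one common convention for the interaction information of a single feature $X_i$ with the remaining features $X_{\setminus i}$ and target $Y$ (sign conventions vary across the literature, so this is indicative rather than the paper's exact definition):

$$
\mathrm{II}(X_i; X_{\setminus i}; Y) \;=\; I(X_i; Y \mid X_{\setminus i}) \;-\; I(X_i; Y),
$$

where positive values indicate synergy: information about $Y$ that is available only when $X_i$ is processed jointly with the remaining features.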
☆ ACE: Adapting sampling for Counterfactual Explanations
Counterfactual Explanations (CFEs) interpret machine learning models by identifying the smallest change to input features needed to change the model's prediction to a desired output. For classification tasks, CFEs determine how close a given sample is to the decision boundary of a trained classifier. Existing methods are often sample-inefficient, requiring numerous evaluations of a black-box model -- an approach that is both costly and impractical when access to the model is limited. We propose Adaptive sampling for Counterfactual Explanations (ACE), a sample-efficient algorithm combining Bayesian estimation and stochastic optimization to approximate the decision boundary with fewer queries. By prioritizing informative points, ACE minimizes evaluations while generating accurate and feasible CFEs. Extensive empirical results show that ACE achieves superior evaluation efficiency compared to state-of-the-art methods, while maintaining effectiveness in identifying minimal and actionable changes.
comment: 15 pages
☆ A Review on Single-Problem Multi-Attempt Heuristic Optimization
In certain real-world optimization scenarios, practitioners are not interested in solving multiple problems but rather in finding the best solution to a single, specific problem. When the computational budget is large relative to the cost of evaluating a candidate solution, multiple heuristic alternatives can be tried to solve the same given problem, each possibly with a different algorithm, parameter configuration, initialization, or stopping criterion. The sequential selection of which alternative to try next is crucial for efficiently identifying the one that provides the best possible solution across multiple attempts. Despite the relevance of this problem in practice, it has not yet been the exclusive focus of any existing review. Several sequential alternative selection strategies have been proposed in different research topics, but they have not been comprehensively and systematically unified under a common perspective. This work presents a focused review of single-problem multi-attempt heuristic optimization. It brings together suitable strategies to this problem that have been studied separately through algorithm selection, parameter tuning, multi-start and resource allocation. These strategies are explained using a unified terminology within a common framework, which supports the development of a taxonomy for systematically organizing and classifying them.
☆ Ultra-Reliable Risk-Aggregated Sum Rate Maximization via Model-Aided Deep Learning
We consider the problem of maximizing the weighted sum rate (WSR) in a multiple-input single-output (MISO) downlink wireless network with emphasis on user rate reliability. We introduce a novel risk-aggregated formulation of the complex WSR maximization problem, which utilizes the Conditional Value-at-Risk (CVaR) as a functional for enforcing rate (ultra-)reliability over channel fading uncertainty/risk. We establish a WMMSE-like equivalence between the proposed precoding problem and a weighted risk-averse MSE problem, enabling us to design a tailored unfolded graph neural network (GNN) policy function approximation (PFA), named $\alpha$-Robust Graph Neural Network ($\alpha$RGNN), trained to maximize lower-tail (CVaR) rates resulting from adverse wireless channel realizations (e.g., deep fading, attenuation). We empirically demonstrate that a trained $\alpha$RGNN fully eliminates per-user deep rate fades, and substantially and optimally reduces statistical user rate variability while retaining adequate ergodic performance.
☆ Attribution-Guided Decoding
The capacity of Large Language Models (LLMs) to follow complex instructions and generate factually accurate text is critical for their real-world application. However, standard decoding methods often fail to robustly satisfy these requirements, while existing control techniques frequently degrade general output quality. In this work, we introduce Attribution-Guided Decoding (AGD), an interpretability-based decoding strategy. Instead of directly manipulating model activations, AGD considers a set of high-probability output token candidates and selects the one that exhibits the highest attribution to a user-defined Region of Interest (ROI). This ROI can be flexibly defined over different parts of the model's input or internal components, allowing AGD to steer generation towards various desirable behaviors. We demonstrate AGD's efficacy across three challenging domains. For instruction following, we show that AGD significantly boosts adherence (e.g., improving the overall success rate on Llama 3.1 from 66.0% to 79.1%). For knowledge-intensive tasks, we show that guiding generation towards usage of internal knowledge components or contextual sources can reduce hallucinations and improve factual accuracy in both closed-book and open-book settings. Furthermore, we propose an adaptive, entropy-based variant of AGD that mitigates quality degradation and reduces computational overhead by applying guidance only when the model is uncertain. Our work presents a versatile, more interpretable, and effective method for enhancing the reliability of modern LLMs.
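A minimal sketch of the AGD selection rule: restrict to the top-k candidates, score each by its attribution to the ROI, and emit the highest-scoring one. The attribution function here is a hypothetical stand-in (`attribution_to_roi` is not from the paper); in practice it would be computed with an attribution method over the user-defined ROI:

```python
import torch

def agd_step(logits: torch.Tensor, attribution_to_roi, k: int = 5) -> int:
    # 1. Restrict to the k most probable next tokens.
    candidates = torch.topk(logits, k).indices.tolist()
    # 2. Score each candidate by its attribution to the region of interest.
    scores = [attribution_to_roi(tok) for tok in candidates]
    # 3. Emit the candidate most attributable to the ROI.
    return candidates[int(torch.tensor(scores).argmax())]

# Toy usage: fake logits and an attribution oracle that prefers token 42.
logits = torch.randn(50_000)
logits[42] += 10.0
print(agd_step(logits, attribution_to_roi=lambda t: 1.0 if t == 42 else 0.0))
```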
☆ NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training
Large-scale foundation models for EEG signals offer a promising path to generalizable brain-computer interface (BCI) applications, but they often suffer from misalignment between pretraining objectives and downstream tasks, as well as significant cross-subject distribution shifts. This paper addresses these challenges by introducing a two-stage alignment strategy that bridges the gap between generic pretraining and specific EEG decoding tasks. First, we propose NeuroTTT: a domain-specific self-supervised fine-tuning paradigm that augments the foundation model with task-relevant self-supervised objectives, aligning latent representations to important spectral, spatial, and temporal EEG features without requiring additional labeled data. Second, we incorporate test-time training (TTT) at inference: we perform (i) self-supervised test-time training on individual unlabeled test samples and (ii) prediction entropy minimization (Tent), which updates only normalization statistics to continually calibrate the model to each new input on the fly. Our approach, which, to our knowledge, is the first to unify domain-tuned self-supervision with test-time training in large-scale EEG foundation models, yields substantially improved robustness and accuracy across diverse BCI tasks (imagined speech, stress detection, motor imagery). Using CBraMod and LaBraM as backbones, our method pushes their performance to a markedly higher level. Results on three diverse tasks demonstrate that the proposed alignment strategy achieves state-of-the-art performance, outperforming conventional fine-tuning and adaptation methods. Our code is available at https://github.com/wsl2000/NeuroTTT.
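A sketch of the Tent component, following its published recipe of minimising prediction entropy while updating only normalisation affine parameters; the model and input here are toys standing in for an EEG foundation model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.LayerNorm(32), nn.ReLU(), nn.Linear(32, 4))
# Collect only the affine parameters of normalisation layers, as Tent prescribes.
norm_params = [p for m in model.modules()
               if isinstance(m, (nn.LayerNorm, nn.BatchNorm1d))
               for p in m.parameters()]
opt = torch.optim.SGD(norm_params, lr=1e-3)

def tent_step(x: torch.Tensor) -> float:
    # Minimise the entropy of the model's predictions on the unlabeled test batch.
    probs = model(x).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()  # updates norm-layer affine parameters only
    return entropy.item()

print(tent_step(torch.randn(8, 16)))  # entropy decreases over repeated calls
```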
☆ Tuning the Tuner: Introducing Hyperparameter Optimization for Auto-Tuning
Automatic performance tuning (auto-tuning) is widely used to optimize performance-critical applications across many scientific domains by finding the best program variant among many choices. Efficient optimization algorithms are crucial for navigating the vast and complex search spaces in auto-tuning. As is well known in the context of machine learning and similar fields, hyperparameters critically shape optimization algorithm efficiency. Yet for auto-tuning frameworks, these hyperparameters are almost never tuned, and their potential performance impact has not been studied. We present a novel method for general hyperparameter tuning of optimization algorithms for auto-tuning, thus "tuning the tuner". In particular, we propose a robust statistical method for evaluating hyperparameter performance across search spaces, publish a FAIR data set and software for reproducibility, and present a simulation mode that replays previously recorded tuning data, lowering the costs of hyperparameter tuning by two orders of magnitude. We show that even limited hyperparameter tuning can improve auto-tuner performance by 94.8% on average, and establish that the hyperparameters themselves can be optimized efficiently with meta-strategies (with an average improvement of 204.7%), demonstrating the often overlooked hyperparameter tuning as a powerful technique for advancing auto-tuning research and practice.
☆ Noise-Guided Transport for Imitation Learning
We consider imitation learning in the low-data regime, where only a limited number of expert demonstrations are available. In this setting, methods that rely on large-scale pretraining or high-capacity architectures can be difficult to apply, and efficiency with respect to demonstration data becomes critical. We introduce Noise-Guided Transport (NGT), a lightweight off-policy method that casts imitation as an optimal transport problem solved via adversarial training. NGT requires no pretraining or specialized architectures, incorporates uncertainty estimation by design, and is easy to implement and tune. Despite its simplicity, NGT achieves strong performance on challenging continuous control tasks, including high-dimensional Humanoid tasks, under ultra-low data regimes with as few as 20 transitions. Code is publicly available at: https://github.com/lionelblonde/ngt-pytorch.
☆ Representation-Based Data Quality Audits for Audio
Data quality issues such as off-topic samples, near duplicates, and label errors often limit the performance of audio-based systems. This paper addresses these issues by adapting SelfClean, a representation-to-rank data auditing framework, from the image to the audio domain. This approach leverages self-supervised audio representations to identify common data quality issues, creating ranked review lists that surface distinct issues within a single, unified process. The method is benchmarked on the ESC-50, GTZAN, and a proprietary industrial dataset, using both synthetic and naturally occurring corruptions. The results demonstrate that this framework achieves state-of-the-art ranking performance, often outperforming issue-specific baselines and enabling significant annotation savings by efficiently guiding human review.
☆ FLOWER: A Flow-Matching Solver for Inverse Problems
We introduce Flower, a solver for inverse problems. It leverages a pre-trained flow model to produce reconstructions that are consistent with the observed measurements. Flower operates through an iterative procedure over three steps: (i) a flow-consistent destination estimation, where the velocity network predicts a denoised target; (ii) a refinement step that projects the estimated destination onto a feasible set defined by the forward operator; and (iii) a time-progression step that re-projects the refined destination along the flow trajectory. We provide a theoretical analysis that demonstrates how Flower approximates Bayesian posterior sampling, thereby unifying perspectives from plug-and-play methods and generative inverse solvers. On the practical side, Flower achieves state-of-the-art reconstruction quality while using nearly identical hyperparameters across various inverse problems.
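A minimal sketch of one Flower iteration under simplifying assumptions not stated in the abstract: a straight-line flow $x_t = (1-t)x_0 + t x_1$, a linear forward operator given as a matrix A with measurements y, and a gradient step standing in for the projection onto the feasible set:

```python
import torch

def flower_iteration(x_t, t, dt, v, A, y, step=0.5):
    # (i) flow-consistent destination estimate: extrapolate to t = 1.
    x1_hat = x_t + (1.0 - t) * v(x_t, t)
    # (ii) refinement: gradient step toward the feasible set {x : A x = y}.
    x1_ref = x1_hat - step * (A.T @ (A @ x1_hat - y))
    # (iii) time progression: re-project onto the straight path toward x1_ref.
    x_next = x_t + (dt / (1.0 - t)) * (x1_ref - x_t)
    return x_next, t + dt

# Toy usage: inpainting-style problem in 8 dimensions with a zero velocity net.
A = torch.eye(8)[:4]                  # observe the first 4 coordinates
y = torch.ones(4)
v = lambda x, t: torch.zeros_like(x)  # stand-in for the pretrained velocity network
x, t = torch.randn(8), 0.0
for _ in range(9):
    x, t = flower_iteration(x, t, dt=0.1, v=v, A=A, y=y)
print(x[:4])                          # moves toward the measurements y
```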
☆ Reframing Generative Models for Physical Systems using Stochastic Interpolants
Generative models have recently emerged as powerful surrogates for physical systems, demonstrating increased accuracy, stability, and/or statistical fidelity. Most approaches rely on iteratively denoising a Gaussian, a choice that may not be the most effective for autoregressive prediction tasks in PDEs and dynamical systems such as climate. In this work, we benchmark generative models across diverse physical domains and tasks, and highlight the role of stochastic interpolants. By directly learning a stochastic process between current and future states, stochastic interpolants can leverage the proximity of successive physical distributions. This allows for generative models that can use fewer sampling steps and produce more accurate predictions than models relying on transporting Gaussian noise. Our experiments suggest that generative models need to balance deterministic accuracy, spectral consistency, and probabilistic calibration, and that stochastic interpolants can potentially fulfill these requirements by adjusting their sampling. This study establishes stochastic interpolants as a competitive baseline for physical emulation and gives insight into the abilities of different generative modeling frameworks.
comment: Code and data is available at http://github.com/anthonyzhou-1/interpolant_pdes
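The stochastic-interpolant construction the benchmark builds on couples the current state $x_0$ and the future state $x_1$ directly (schematically; exact coefficient choices vary by method):

$$
x_t \;=\; \alpha(t)\,x_0 \;+\; \beta(t)\,x_1 \;+\; \gamma(t)\,z, \qquad z \sim \mathcal{N}(0, I),
$$

with $\alpha(0)=\beta(1)=1$ and $\alpha(1)=\beta(0)=\gamma(0)=\gamma(1)=0$, so sampling starts from the physically close distribution of current states rather than from Gaussian noise, which is what allows fewer sampling steps.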
☆ Wasserstein Distributionally Robust Optimization Through the Lens of Structural Causal Models and Individual Fairness
In recent years, Wasserstein Distributionally Robust Optimization (DRO) has garnered substantial interest for its efficacy in data-driven decision-making under distributional uncertainty. However, limited research has explored the application of DRO to address individual fairness concerns, particularly when considering causal structures and sensitive attributes in learning problems. To address this gap, we first formulate the DRO problem from causality and individual fairness perspectives. We then present the DRO dual formulation as an efficient tool to convert the DRO problem into a more tractable and computationally efficient form. Next, we characterize the closed form of the approximate worst-case loss quantity as a regularizer, eliminating the max-step in the min-max DRO problem. We further estimate the regularizer in more general cases and explore the relationship between DRO and classical robust optimization. Finally, by removing the assumption of a known structural causal model, we provide finite sample error bounds when designing DRO with empirical distributions and estimated causal structures to ensure efficiency and robust learning.
☆ PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection
The rapid rise of synthetic media has made deepfake detection a critical challenge for online safety and trust. Progress remains constrained by the scarcity of large, high-quality datasets. Although multimodal large language models (LLMs) exhibit strong reasoning capabilities, their performance on deepfake detection is poor, often producing explanations that are misaligned with visual evidence or hallucinatory. To address this limitation, we introduce a reasoning-annotated dataset for deepfake detection and propose Paragraph-level Relative Policy Optimization (PRPO), a reinforcement learning algorithm that aligns LLM reasoning with image content at the paragraph level. Experiments show that PRPO improves detection accuracy by a wide margin and achieves the highest reasoning score of 4.55/5.0. Ablation studies further demonstrate that PRPO significantly outperforms GRPO under test-time conditions. These results underscore the importance of grounding multimodal reasoning in visual evidence to enable more reliable and interpretable deepfake detection.
☆ Why is topology hard to learn?
Much attention has been devoted to the use of machine learning to approximate physical concepts. Yet, due to challenges in the interpretability of machine learning techniques, the question of what physics machine learning models are able to learn remains open. Here we bridge the concept of a physical quantity and its machine learning approximation in the context of the original application of neural networks in physics: topological phase classification. We construct a hybrid tensor-neural network object that exactly expresses a real-space topological invariant and rigorously assess its trainability and generalization. Specifically, we benchmark the accuracy and trainability of a tensor-neural network against multiple types of neural networks, thus exemplifying the differences in trainability and representational power. Our work highlights the challenges in learning topological invariants and constitutes a stepping stone towards more accurate and better-generalizable machine learning representations in condensed matter physics.
comment: 5+8 pages, 4+7 figures
☆ ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
Long-horizon embodied planning is challenging because the world does not change only through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic causal-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.
comment: 41 pages. The last two authors contributed equally in co-advising
☆ Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
Fine-tuning large language models (LLMs) yields excellent results in practice. However, vanilla fine-tuning methods often require intricate data mixtures and repeated experiments for optimal generalization. To address these challenges and streamline the training process, we propose an efficient and universal solution, Dynamic Boosted Annealing (DBA). We obtain a global gradient through zero-learning-rate training on general data, which is subsequently employed for gradient boosting and dynamic training step correction during domain training. In conjunction with annealing learning, we establish a fine-tuning pipeline that relies solely on domain data without collapse. By evaluating both general and domain-specific performance across multiple tasks on several popular base models, DBA achieves an average improvement of 5.8% in joint performance over vanilla fine-tuning. Furthermore, since general data is no longer involved in annealing, the repeated experiments driven by data-mixture search are also eliminated. According to our tests, the DBA method reduces GPU hours by 91.0% compared to the vanilla method.
comment: 9 pages, 5 figures
☆ From Fragile to Certified: Wasserstein Audits of Group Fairness Under Distribution Shift
Group-fairness metrics (e.g., equalized odds) can vary sharply across resamples and are especially brittle under distribution shift, undermining reliable audits. We propose a Wasserstein distributionally robust framework that certifies worst-case group fairness over a ball of plausible test distributions centered at the empirical law. Our formulation unifies common group fairness notions via a generic conditional-probability functional and defines $\varepsilon$-Wasserstein Distributional Fairness ($\varepsilon$-WDF) as the audit target. Leveraging strong duality, we derive tractable reformulations and an efficient estimator (DRUNE) for $\varepsilon$-WDF. We prove feasibility and consistency and establish finite-sample certification guarantees for auditing fairness, along with quantitative bounds under smoothness and margin conditions. Across standard benchmarks and classifiers, $\varepsilon$-WDF delivers stable fairness assessments under distribution shift, providing a principled basis for auditing and certifying group fairness beyond observational data.
☆ Sandbagging in a Simple Survival Bandit Problem NeurIPS 2025
Evaluating the safety of frontier AI systems is an increasingly important concern, helping to measure the capabilities of such models and identify risks before deployment. However, it has been recognised that if AI agents are aware that they are being evaluated, such agents may deliberately hide dangerous capabilities or intentionally demonstrate suboptimal performance in safety-related tasks in order to be released and to avoid being deactivated or retrained. Such strategic deception - often known as "sandbagging" - threatens to undermine the integrity of safety evaluations. For this reason, it is of value to identify methods that enable us to distinguish behavioural patterns that demonstrate a true lack of capability from behavioural patterns that are consistent with sandbagging. In this paper, we develop a simple model of strategic deception in sequential decision-making tasks, inspired by the recently developed survival bandit framework. We demonstrate theoretically that this problem induces sandbagging behaviour in optimal rational agents, and construct a statistical test to distinguish between sandbagging and incompetence from sequences of test scores. In simulation experiments, we investigate the reliability of this test in allowing us to distinguish between such behaviours in bandit models. This work aims to establish a potential avenue for developing robust statistical procedures for use in the science of frontier model evaluations.
comment: Forthcoming in the "Reliable ML from Unreliable Data Workshop" at NeurIPS 2025
☆ Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
Monitoring large language models' (LLMs) activations is an effective way to detect harmful requests before they lead to unsafe outputs. However, traditional safety monitors often require the same amount of compute for every query. This creates a trade-off: expensive monitors waste resources on easy inputs, while cheap ones risk missing subtle cases. We argue that safety monitors should be flexible--costs should rise only when inputs are difficult to assess, or when more compute is available. To achieve this, we introduce Truncated Polynomial Classifiers (TPCs), a natural extension of linear probes for dynamic activation monitoring. Our key insight is that polynomials can be trained and evaluated progressively, term-by-term. At test-time, one can early-stop for lightweight monitoring, or use more terms for stronger guardrails when needed. TPCs provide two modes of use. First, as a safety dial: by evaluating more terms, developers and regulators can "buy" stronger guardrails from the same model. Second, as an adaptive cascade: clear cases exit early after low-order checks, and higher-order guardrails are evaluated only for ambiguous inputs, reducing overall monitoring costs. On two large-scale safety datasets (WildGuardMix and BeaverTails), for 4 models with up to 30B parameters, we show that TPCs compete with or outperform MLP-based probe baselines of the same size, all the while being more interpretable than their black-box counterparts. Our code is available at http://github.com/james-oldfield/tpc.
comment: Project page: http://james-oldfield.github.io/tpc
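A toy sketch of term-by-term evaluation with an early exit, the mechanism behind both the "safety dial" and the adaptive cascade; the feature map (elementwise powers) and thresholds are illustrative simplifications, not the trained classifiers from the paper:

```python
import numpy as np

def tpc_score(x, weights, max_order=None, exit_margin=None):
    """weights[k] scores the order-(k+1) feature x**(k+1); terms accumulate."""
    score = 0.0
    orders = len(weights) if max_order is None else max_order
    for k in range(orders):
        score += weights[k] @ (x ** (k + 1))  # add the next polynomial term
        if exit_margin is not None and abs(score) > exit_margin:
            break                             # confident: stop early (cascade mode)
    return score

rng = np.random.default_rng(0)
x = rng.normal(size=8)
weights = [rng.normal(size=8) * 0.5 ** k for k in range(4)]  # 4 polynomial orders
print(tpc_score(x, weights, max_order=2))      # cheap, low-order check ("dial")
print(tpc_score(x, weights, exit_margin=1.0))  # adaptive cascade with early exit
```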
☆ Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40 °C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference - ACC 2026
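A sketch of the property the method rests on: since differentiation is linear, the posterior mean of dQ/dV under an RBF-kernel GP has the closed form $\partial_{v_*} k(v_*, V)\, K^{-1} Q$. Toy data and fixed hyperparameters stand in for the paper's learned ones:

```python
import numpy as np

def rbf(a, b, ell=0.05, sf=1.0):
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def drbf_dA(a, b, ell=0.05, sf=1.0):
    # Derivative of the RBF kernel with respect to its first argument.
    d = a[:, None] - b[None, :]
    return -(d / ell**2) * rbf(a, b, ell, sf)

V = np.linspace(3.6, 4.2, 80)                   # charging voltages
Q = np.sin(8 * V) + 0.01 * np.random.randn(80)  # toy charge curve plus noise
K = rbf(V, V) + 1e-4 * np.eye(80)               # Gram matrix with noise term
alpha = np.linalg.solve(K, Q)
V_star = np.linspace(3.6, 4.2, 200)
dQdV_mean = drbf_dA(V_star, V) @ alpha          # analytic dQ/dV posterior mean
print(dQdV_mean[:3])
```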
☆ Hybrid Quantum-Classical Optimisation of Traveling Salesperson Problem
The Traveling Salesperson Problem (TSP), a quintessential NP-hard combinatorial optimisation challenge, is vital for logistics and network design but limited by exponential complexity in large instances. We propose a hybrid quantum-classical framework integrating variational quantum eigensolver (VQE) optimisation with classical machine learning, using K-means clustering for problem decomposition and a RandomForestRegressor for path refinement. Evaluated on 80 European cities (from 4 to 80 cities, 38,500 samples in total) via Qiskit's AerSimulator and ibm_kyiv 127-qubit backend, the hybrid approach outperforms quantum-only methods, achieving an approximation ratio of 1.0287 at 80 cities, a 47.5% improvement over quantum-only's 1.9614, nearing the classical baseline. Machine learning reduces variability in tour distances (interquartile range, IQR - the spread of the middle 50% of results relative to the median - from 0.06 to 0.04), enhancing stability despite noisy intermediate-scale quantum (NISQ) noise. This framework underscores hybrid strategies' potential for scalable TSP optimisation, with future hardware advancements promising practical quantum advantages.
☆ Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
Reinforcement Learning with Verifiable Reward (RLVR) effectively solves complex tasks but demands extremely long context lengths during training, leading to substantial computational costs. While multi-stage training can partially mitigate this, starting with overly short contexts often causes irreversible performance degradation, ultimately failing to reduce overall training compute significantly. In this paper, we introduce **T**hinking-**F**ree **P**olicy **I**nitialization (**TFPI**), a simple yet effective adaptation to RLVR that bridges long Chain-of-Thought (CoT) distillation and standard RLVR. TFPI employs a simple *ThinkFree* operation, explicitly discarding the thinking content via a direct append, to reduce token usage during inference. Training with *ThinkFree*-adapted inputs improves performance and lowers token consumption, even in the original slow-thinking mode. Extensive experiments across various benchmarks have shown that TFPI accelerates RL convergence, achieves a higher performance ceiling, and yields more token-efficient reasoning models without specialized rewards or complex training designs. With TFPI only, we train a 4B model to reach 89.0% accuracy on AIME24 and 65.5% on LiveCodeBench using less than 4K H20 hours.
☆ Marginal Flow: a flexible and efficient framework for density estimation
Current density modeling approaches suffer from at least one of the following shortcomings: expensive training, slow inference, approximate likelihood, mode collapse or architectural constraints like bijective mappings. We propose a simple yet powerful framework that overcomes these limitations altogether. We define our model $q_\theta(x)$ through a parametric distribution $q(x|w)$ with latent parameters $w$. Instead of directly optimizing the latent variables $w$, our idea is to marginalize them out by sampling $w$ from a learnable distribution $q_\theta(w)$, hence the name Marginal Flow. In order to evaluate the learned density $q_\theta(x)$ or to sample from it, we only need to draw samples from $q_\theta(w)$, which makes both operations efficient. The proposed model allows for exact density evaluation and is orders of magnitude faster than competing models both at training and inference. Furthermore, Marginal Flow is a flexible framework: it does not impose any restrictions on the neural network architecture, it enables learning distributions on lower-dimensional manifolds (either known or to be learned), it can be trained efficiently with any objective (e.g. forward and reverse KL divergence), and it easily handles multi-modal targets. We evaluate Marginal Flow extensively on various tasks including synthetic datasets, simulation-based inference, distributions on positive definite matrices and manifold learning in latent spaces of images.
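A minimal sketch of the marginalisation idea: the model density is $q_\theta(x) = \mathbb{E}_{w \sim q_\theta(w)}[q(x \mid w)]$, estimated by averaging the conditional density over latent draws. Here a fixed Gaussian $q(x \mid w)$ with mean $w$ and a stand-in latent sampler replace the learned components:

```python
import numpy as np

def marginal_density(x, w_samples, sigma=0.3):
    # Monte Carlo estimate: average the conditional density over latent draws.
    sq = ((x[None, :] - w_samples) ** 2).sum(-1)
    comps = np.exp(-0.5 * sq / sigma**2) / (2 * np.pi * sigma**2) ** (x.size / 2)
    return comps.mean()

rng = np.random.default_rng(0)
# Stand-in for draws from the learnable latent sampler q_theta(w).
w_samples = rng.normal(size=(512, 2)) + np.array([2.0, 0.0])
print(marginal_density(np.array([2.0, 0.0]), w_samples))   # high density near the mode
print(marginal_density(np.array([-3.0, 3.0]), w_samples))  # low density far away
```

Sampling is equally cheap under this construction: draw w from the latent sampler, then x from q(x | w).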
☆ The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
Transformer-based models have become the state of the art across multiple domains, from natural language processing to machine listening, thanks to attention mechanisms. However, the attention layers require a large number of parameters and high-end hardware for both training and inference. We propose a novel pruning technique targeted explicitly at the attention mechanism, where we decouple the pruning of the four layers in the attention block, namely the query, key, value, and output projection matrices. We also investigate pruning strategies along the head and channel dimensions, and compare the performance of the Audio Spectrogram Transformer (AST) model under different pruning scenarios. Our results show that even after pruning 50\% of the attention parameters we incur a performance degradation of less than 1\%.
☆ Self-supervised learning for phase retrieval
In recent years, deep neural networks have emerged as a solution for inverse imaging problems. These networks are generally trained using pairs of images: one degraded and the other of high quality, the latter being called 'ground truth'. However, in medical and scientific imaging, the lack of fully sampled data limits supervised learning. Recent advances have made it possible to reconstruct images from measurement data alone, eliminating the need for references. However, these methods remain limited to linear problems, excluding non-linear problems such as phase retrieval. We propose a self-supervised method that overcomes this limitation in the case of phase retrieval by using the natural invariance of images to translations.
comment: in French. GRETSI, Aug 2025, Strasbourg, France
☆ Optimizing Indoor Environmental Quality in Smart Buildings Using Deep Learning
Ensuring optimal Indoor Environmental Quality (IEQ) is vital for occupant health and productivity, yet it often comes at a high energy cost in conventional Heating, Ventilation, and Air Conditioning (HVAC) systems. This paper proposes a deep learning-driven approach to proactively manage IEQ parameters, specifically CO2 concentration, temperature, and humidity, while balancing building energy efficiency. Leveraging the ROBOD dataset collected from a net-zero energy academic building, we benchmark three architectures--Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and a hybrid Convolutional Neural Network LSTM (CNN-LSTM)--to forecast IEQ variables across various time horizons. Our results show that GRU achieves the best short-term prediction accuracy with lower computational overhead, whereas CNN-LSTM excels in extracting dominant features for extended forecasting windows. Meanwhile, LSTM offers robust long-range temporal modeling. The comparative analysis highlights that prediction reliability depends on data resolution, sensor placement, and fluctuating occupancy conditions. These findings provide actionable insights for intelligent Building Management Systems (BMS) to implement predictive HVAC control, thereby reducing energy consumption and enhancing occupant comfort in real-world building operations.
comment: 10 pages, 4 figures, 1 table. Accepted and presented at the 5th International Conference on Digital Technologies and Applications (ICDTA 2025), April 17-18, 2025, Al Akhawayn University, Ifrane, Morocco
☆ PDE Solvers Should Be Local: Fast, Stable Rollouts with Learned Local Stencils
Neural operator models for solving partial differential equations (PDEs) often rely on global mixing mechanisms, such as spectral convolutions or attention, which tend to oversmooth sharp local dynamics and introduce high computational cost. We present FINO, a finite-difference-inspired neural architecture that enforces strict locality while retaining multiscale representational power. FINO replaces fixed finite-difference stencil coefficients with learnable convolutional kernels and evolves states via an explicit, learnable time-stepping scheme. A central Local Operator Block leverages a differential stencil layer, a gating mask, and a linear fuse step to construct adaptive derivative-like local features that propagate forward in time. Embedded in an encoder-decoder with a bottleneck, FINO captures fine-grained local structures while preserving interpretability. We establish (i) a composition error bound linking one-step approximation error to stable long-horizon rollouts under a Lipschitz condition, and (ii) a universal approximation theorem for discrete time-stepped PDE dynamics. (iii) Across six benchmarks and a climate modelling task, FINO achieves up to 44\% lower error and around 2\times speedups over state-of-the-art operator-learning baselines, demonstrating that strict locality with learnable time-stepping yields an accurate and scalable foundation for neural PDE solvers.
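A minimal reading of the core ingredient: a learnable convolutional stencil in place of fixed finite-difference coefficients, rolled forward with an explicit, learnable time step. The real model wraps this in gating, fusion, and an encoder-decoder, none of which is shown here:

```python
import torch
import torch.nn as nn

class LocalStencilStep(nn.Module):
    def __init__(self, channels=1, kernel_size=3):
        super().__init__()
        # Learnable replacement for fixed finite-difference coefficients.
        self.stencil = nn.Conv1d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, bias=False)
        self.dt = nn.Parameter(torch.tensor(0.01))  # learnable time step

    def forward(self, u):
        # Explicit (forward-Euler-like) update: u_{t+1} = u_t + dt * F(u_t).
        return u + self.dt * self.stencil(u)

u = torch.randn(4, 1, 128)  # batch of 1-D fields
step = LocalStencilStep()
for _ in range(10):         # roll the solver forward in time
    u = step(u)
print(u.shape)
```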
☆ AttriGen: Automated Multi-Attribute Annotation for Blood Cell Datasets
We introduce AttriGen, a novel framework for automated, fine-grained multi-attribute annotation in computer vision, with a particular focus on cell microscopy, where multi-attribute classification remains underrepresented compared to traditional cell type categorization. Using two complementary datasets, the Peripheral Blood Cell (PBC) dataset containing eight distinct cell types and the WBC Attribute Dataset (WBCAtt) containing their corresponding 11 morphological attributes, we propose a dual-model architecture that combines a CNN for cell type classification with a Vision Transformer (ViT) for multi-attribute classification, achieving a new benchmark of 94.62\% accuracy. Our experiments demonstrate that AttriGen significantly enhances model interpretability and offers substantial time and cost efficiency relative to conventional full-scale human annotation. Thus, our framework establishes a new paradigm that can be extended to other computer vision classification tasks by effectively automating the expansion of multi-attribute labels.
comment: 6 pages, 4 figures, 3 tables. Accepted at the 12th International Conference on Wireless Networks and Mobile Communications 2025 (WINCOM 2025)
☆ Benchmarking Diarization Models
Speaker diarization is the task of partitioning audio into segments according to speaker identity, answering the question of "who spoke when" in multi-speaker conversation recordings. While diarization is an essential task for many downstream applications, it remains an unsolved problem. Errors in diarization propagate to downstream systems and cause wide-ranging failures. To this end, we examine exact failure modes by evaluating five state-of-the-art diarization models across four diarization datasets spanning multiple languages and acoustic conditions. The evaluation datasets consist of 196.6 hours of multilingual audio, including English, Mandarin, German, Japanese, and Spanish. Overall, we find that PyannoteAI achieves the best performance at 11.2% DER, while DiariZen provides a competitive open-source alternative at 13.3% DER. When analyzing failure cases, we find that diarization errors stem primarily from missed speech segments, followed by speaker confusion, especially in high-speaker-count settings.
☆ Neighbor-aware informal settlement mapping with graph convolutional networks
Mapping informal settlements is crucial for addressing challenges related to urban planning, public health, and infrastructure in rapidly growing cities. Geospatial machine learning has emerged as a key tool for detecting and mapping these areas from remote sensing data. However, existing approaches often treat spatial units independently, neglecting the relational structure of the urban fabric. We propose a graph-based framework that explicitly incorporates local geographical context into the classification process. Each spatial unit (cell) is embedded in a graph structure along with its adjacent neighbors, and a lightweight Graph Convolutional Network (GCN) is trained to classify whether the central cell belongs to an informal settlement. Experiments are conducted on a case study in Rio de Janeiro using spatial cross-validation across five distinct zones, ensuring robustness and generalizability across heterogeneous urban landscapes. Our method outperforms standard baselines, improving Kappa coefficient by 17 points over individual cell classification. We also show that graph-based modeling surpasses simple feature concatenation of neighboring cells, demonstrating the benefit of encoding spatial structure for urban scene understanding.
comment: 10 pages, 3 figures, 2 tables. Accepted at the ECML PKDD 2025 Workshop on Machine Learning for Earth Observation
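A minimal sketch of the neighbour-aware classification step: a star graph over a central cell and its adjacent neighbours, one symmetrically normalised GCN layer, and a logit for the central cell. Sizes and feature content are placeholders:

```python
import numpy as np

def gcn_layer(H, A, W):
    # Symmetrically normalised adjacency with self-loops: D^-1/2 (A+I) D^-1/2.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU(A_norm H W)

rng = np.random.default_rng(0)
H = rng.normal(size=(9, 16))  # central cell + 8 neighbours, 16 features each
A = np.zeros((9, 9))
A[0, 1:] = A[1:, 0] = 1.0     # star graph around the central cell
W1, w_out = rng.normal(size=(16, 8)), rng.normal(size=8)
logit = gcn_layer(H, A, W1)[0] @ w_out      # read out the central cell only
print(1 / (1 + np.exp(-logit)))             # informal-settlement probability
```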
☆ Alignment-Aware Decoding
Alignment of large language models remains a central challenge in natural language processing. Preference optimization has emerged as a popular and effective method for improving alignment, typically through training-time or prompt-based interventions. In this paper, we introduce alignment-aware decoding (AAD), a method to enhance model alignment directly at inference. Theoretically, AAD can be interpreted as implicit reward optimization, yet it requires no specialized training beyond the standard DPO setup. Empirically, AAD consistently outperforms strong baselines across diverse alignment benchmarks and model scales. Moreover, in data-constrained settings, AAD can produce high-quality synthetic data to improve alignment under standard decoding, providing a practical solution when labeled data is limited.
☆ EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting
Transformer-based models have significantly advanced time series forecasting, with patch-based input strategies offering efficiency and improved long-horizon modeling. Yet, existing approaches rely on temporally-agnostic patch construction, where arbitrary starting positions and fixed lengths fracture temporal coherence by splitting natural transitions across boundaries. This naive segmentation often disrupts short-term dependencies and weakens representation learning. In response, we propose EntroPE (Entropy-Guided Dynamic Patch Encoder), a novel, temporally informed framework that detects transition points via conditional entropy and dynamically places patch boundaries accordingly. This preserves temporal structure while retaining the computational benefits of patching. EntroPE consists of two key modules, namely an Entropy-based Dynamic Patcher (EDP) that applies information-theoretic criteria to locate natural temporal shifts and determine patch boundaries, and an Adaptive Patch Encoder (APE) that employs pooling and cross-attention to capture intra-patch dependencies and produce fixed-size latent representations. These embeddings are then processed by a global transformer to model inter-patch dynamics. Experiments across long-term forecasting benchmarks demonstrate that EntroPE improves both accuracy and efficiency, establishing entropy-guided dynamic patching as a promising new paradigm for time series modeling. Code is available at: https://github.com/Sachithx/EntroPE.
comment: Preprint. Under Review
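A toy proxy for entropy-guided boundary placement: discretise the series, estimate a next-symbol conditional entropy from bigram counts, and cut a patch wherever the entropy of the incoming step spikes. This is a simplified stand-in for the paper's EDP module, not its actual criterion:

```python
import numpy as np

def entropy_boundaries(series, n_bins=8, quantile=0.9):
    # Quantile-discretise the series into n_bins symbols.
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    symbols = np.digitize(series, edges)
    counts = np.ones((n_bins, n_bins))           # Laplace-smoothed bigram counts
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    cond = counts / counts.sum(1, keepdims=True)  # P(next symbol | current symbol)
    H = -(cond * np.log(cond)).sum(1)             # H(next | current = s)
    per_step = H[symbols[:-1]]                    # entropy entering each step
    return np.where(per_step > np.quantile(per_step, quantile))[0] + 1

t = np.linspace(0, 6 * np.pi, 400)
series = np.sin(t) + 0.1 * np.random.randn(400)
print(entropy_boundaries(series)[:10])            # candidate patch boundaries
```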
☆ Non-Vacuous Generalization Bounds: Can Rescaling Invariances Help?
A central challenge in understanding generalization is to obtain non-vacuous guarantees that go beyond worst-case complexity over data or weight space. Among existing approaches, PAC-Bayes bounds stand out as they can provide tight, data-dependent guarantees even for large networks. However, in ReLU networks, rescaling invariances mean that different weight distributions can represent the same function while leading to arbitrarily different PAC-Bayes complexities. We propose to study PAC-Bayes bounds in an invariant, lifted representation that resolves this discrepancy. This paper explores both the guarantees provided by this approach (invariance, tighter bounds via data processing) and the algorithmic aspects of KL-based rescaling-invariant PAC-Bayes bounds.
♻ ☆ The Impact of Language Mixing on Bilingual LLM Reasoning
Proficient multilingual speakers often intentionally switch languages in the middle of a conversation. Similarly, recent reasoning-focused bilingual large language models (LLMs) with strong capabilities in both languages exhibit language mixing-alternating languages within their chain of thought. Discouraging this behavior in DeepSeek-R1 was found to degrade accuracy, suggesting that language mixing may benefit reasoning. In this work, we study language switching in Chinese-English bilingual reasoning models. We identify reinforcement learning with verifiable rewards (RLVR) as the critical training stage that leads to language mixing. We show that language mixing can enhance reasoning: enforcing monolingual decoding reduces accuracy by 5.6 percentage points on MATH500. Additionally, a lightweight probe can be trained to predict whether a potential language switch would benefit or harm reasoning, and when used to guide decoding, increases accuracy by 2.92 percentage points. Our findings suggest that language mixing is not merely a byproduct of multilingual training, but is a strategic reasoning behavior.
comment: Accepted at EMNLP 2025 (Main Conference)
♻ ☆ On Fitting Flow Models with Large Sinkhorn Couplings
Flow models transform data gradually from one modality (e.g. noise) onto another (e.g. images). Such models are parameterized by a time-dependent velocity field, trained to fit segments connecting pairs of source and target points. When the pairing between source and target points is given, training flow models boils down to a supervised regression problem. When no such pairing exists, as is the case when generating data from noise, training flows is much harder. A popular approach lies in picking source and target points independently. This can, however, lead to velocity fields that are not only slow to train but also costly to integrate at inference time. In theory, one would greatly benefit from training flow models by sampling pairs from an optimal transport (OT) measure coupling source and target, since this would lead to a highly efficient flow solving the Benamou and Brenier dynamical OT problem. In practice, recent works have proposed to sample mini-batches of $n$ source and $n$ target points and reorder them using an OT solver to form better pairs. These works have advocated using batches of size $n\approx 256$, and considered OT solvers that return couplings that are either sharp (using e.g. the Hungarian algorithm) or blurred (using e.g. entropic regularization, a.k.a. Sinkhorn). We follow in the footsteps of these works by exploring the benefits of increasing $n$ by three to four orders of magnitude, and look more carefully at the effect of the entropic regularization $\varepsilon$ used in the Sinkhorn algorithm. Our analysis is facilitated by new scale-invariant quantities to report the sharpness of a coupling, while our sharded computations across multiple GPUs or GPU nodes allow scaling up $n$. We show that in both synthetic and image generation tasks, flow models greatly benefit when fitted with large Sinkhorn couplings, with a low entropic regularization $\varepsilon$.
comment: 23 pages, 14 figures
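A self-contained sketch of the minibatch pairing step: run Sinkhorn on the squared-distance cost between n source and n target samples, then draw training pairs from the resulting entropic coupling. Here n and $\varepsilon$ are toy values; the paper's point is precisely that scaling n far beyond this helps:

```python
import numpy as np

def sinkhorn_coupling(X, Y, eps=0.05, iters=200):
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise squared costs
    C = C / C.mean()                                    # normalise the cost scale
    K = np.exp(-C / eps)
    n = len(X)
    a = b = np.ones(n) / n                              # uniform marginals
    u = np.ones(n) / n
    for _ in range(iters):                              # Sinkhorn fixed point
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]                  # entropic coupling P

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))
Y = rng.normal(size=(256, 2)) + 4.0
P = sinkhorn_coupling(X, Y)
# Sample a target index for each source point from its coupling row.
j = np.array([rng.choice(len(Y), p=row / row.sum()) for row in P])
pairs = (X, Y[j])                                       # training pairs for the flow
print(P.sum().round(3), Y[j].shape)                     # total mass ~1.0
```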
♻ ☆ Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open problem whether there is a mathematical framework that characterizes what kind of features will emerge, how and under which conditions this happens, and that is closely related to the gradient dynamics of the training, for complex structured inputs. We propose a novel framework, named $\mathbf{Li_2}$, that captures three key stages of the grokking behavior of 2-layer nonlinear networks: (I) \underline{\textbf{L}}azy learning, (II) \underline{\textbf{i}}ndependent feature learning and (III) \underline{\textbf{i}}nteractive feature learning. At the lazy learning stage, the top layer overfits to random hidden representations and the model appears to memorize. Thanks to lazy learning and weight decay, the \emph{backpropagated gradient} $G_F$ from the top layer now carries information about the target label, with a specific structure that enables each hidden node to learn its representation \emph{independently}. Interestingly, the independent dynamics follows exactly the \emph{gradient ascent} of an energy function $E$, and its local maxima are precisely the emerging features. We study whether these local-optima-induced features are generalizable, their representation power, and how they change with sample size, in group arithmetic tasks. When hidden nodes start to interact in the later stage of learning, we provably show how $G_F$ changes to focus on missing features that need to be learned. Our study sheds light on the roles played by key hyperparameters such as weight decay, learning rate and sample size in grokking, leads to provable scaling laws of feature emergence, memorization and generalization, and reveals why recent optimizers such as Muon can be effective, from the first principles of gradient dynamics. Our analysis can be extended to multi-layer architectures.
♻ ☆ EVO-LRP: Evolutionary Optimization of LRP for Interpretable Model Explanations
Explainable AI (XAI) methods help identify which image regions influence a model's prediction, but often face a trade-off between detail and interpretability. Layer-wise Relevance Propagation (LRP) offers a model-aware alternative. However, LRP implementations commonly rely on heuristic rule sets that are not optimized for clarity or alignment with model behavior. We introduce EVO-LRP, a method that applies Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to tune LRP hyperparameters based on quantitative interpretability metrics, such as faithfulness or sparseness. EVO-LRP outperforms traditional XAI approaches in both interpretability metric performance and visual coherence, with strong sensitivity to class-specific features. These findings demonstrate that attribution quality can be systematically improved through principled, task-specific optimization.
comment: 15 pages
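A sketch of the outer tuning loop, assuming the `cma` package's ask/tell interface; the objective below is a hypothetical placeholder for scoring an LRP rule configuration on interpretability metrics such as faithfulness or sparseness:

```python
import cma

def interpretability_score(hparams):
    # Hypothetical placeholder: run LRP with these rule hyperparameters and
    # return a loss (lower = better faithfulness/sparseness trade-off).
    return sum((h - 0.25) ** 2 for h in hparams)

# Three LRP rule hyperparameters, initial guess 0, initial step size 0.5.
es = cma.CMAEvolutionStrategy([0.0, 0.0, 0.0], 0.5, {'maxiter': 50})
while not es.stop():
    candidates = es.ask()                   # sample LRP hyperparameter sets
    es.tell(candidates, [interpretability_score(c) for c in candidates])
print(es.result.xbest)                      # tuned LRP hyperparameters
```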
♻ ☆ ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
We introduce ODE-GS, a novel approach that integrates 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to enable future extrapolation of dynamic 3D scenes. Unlike existing dynamic scene reconstruction methods, which rely on time-conditioned deformation networks and are limited to interpolation within a fixed time window, ODE-GS eliminates timestamp dependency by modeling Gaussian parameter trajectories as continuous-time latent dynamics. Our approach first learns an interpolation model to generate accurate Gaussian trajectories within the observed window, then trains a Transformer encoder to aggregate past trajectories into a latent state evolved via a neural ODE. Finally, numerical integration produces smooth, physically plausible future Gaussian trajectories, enabling rendering at arbitrary future timestamps. On the D-NeRF, NVFi, and HyperNeRF benchmarks, ODE-GS achieves state-of-the-art extrapolation performance, improving metrics by 19.8% compared to leading baselines, demonstrating its ability to accurately represent and predict 3D scene dynamics.
♻ ☆ AutoJudge: Judge Decoding Without Manual Annotation
We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify which of the generated tokens affect the downstream quality of the response, relaxing the distribution match guarantee so that the "unimportant" tokens can be generated faster. Our approach relies on a semi-greedy search algorithm to test which of the mismatches between target and draft models should be corrected to preserve quality and which ones may be skipped. We then train a lightweight classifier based on existing LLM embeddings to predict, at inference time, which mismatching tokens can be safely accepted without compromising the final answer quality. We evaluate the effectiveness of AutoJudge with multiple draft/target model pairs on mathematical reasoning and programming benchmarks, achieving significant speedups at the cost of a minor accuracy reduction. Notably, on GSM8k with the Llama 3.1 70B target model, our approach achieves up to $\approx2\times$ speedup over speculative decoding at the cost of $\le 1\%$ drop in accuracy. When applied to the LiveCodeBench benchmark, AutoJudge automatically detects programming-specific important tokens, accepting $\ge 25$ tokens per speculation cycle at $2\%$ drop in Pass@1. Our approach requires no human annotation and is easy to integrate with modern LLM inference frameworks.
comment: Preprint
♻ ☆ Scalable Fingerprinting of Large Language Models NeurIPS 2025
Model fingerprinting has emerged as a powerful tool for model owners to identify their shared model given API access. However, to lower false discovery rate, fight fingerprint leakage, and defend against coalitions of model users attempting to bypass detection, we argue that {\em scalability} is critical, i.e., scaling up the number of fingerprints one can embed into a model. Hence, we pose scalability as a crucial requirement for fingerprinting schemes. We experiment with fingerprint design at a scale significantly larger than previously considered, and introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model -- two orders of magnitude more than existing schemes -- without degrading the model's utility. Our inserted fingerprints persist even after supervised fine-tuning on standard post-training data. We further address security risks for fingerprinting, and theoretically and empirically show how a scalable fingerprinting scheme like ours can mitigate these risks. Our code is available at https://github.com/SewoongLab/scalable-fingerprinting-of-llms
comment: Spotlight at NeurIPS 2025
♻ ☆ FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
Federated Learning (FL) enables collaborative model training across multiple clients without sharing clients' private data. However, the diverse and often conflicting biases present across clients pose significant challenges to model fairness. Current fairness-enhancing FL solutions often fall short, as they typically mitigate biases for a single, usually binary, sensitive attribute, while ignoring the heterogeneous fairness needs that exist in real-world settings. Moreover, these solutions often evaluate unfairness reduction only on the server side, hiding persistent unfairness at the individual client level. To support more robust and reproducible fairness research in FL, we introduce a comprehensive benchmarking framework for fairness-aware FL at both the global and client levels. Our contributions are three-fold: (1) We introduce FeDa4Fair, a library to create tabular datasets tailored to evaluating fair FL methods under heterogeneous client bias; (2) we release four bias-heterogeneous datasets and corresponding benchmarks to compare fairness mitigation methods in a controlled environment; (3) we provide ready-to-use functions for evaluating fairness outcomes for these datasets.
♻ ☆ DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization NeurIPS 2025
The recent success and openness of DeepSeek-R1 have brought widespread attention to Group Relative Policy Optimization (GRPO) as a reinforcement learning method for large reasoning models (LRMs). In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias. We also identify a connection between GRPO and traditional discriminative methods in supervised learning. Motivated by these insights, we introduce a new Discriminative Constrained Optimization (DisCO) framework for reinforcing LRMs, grounded in the principle of discriminative learning. The main differences between DisCO and GRPO and its recent variants are: (1) it replaces the group relative objective with a discriminative objective defined by a scoring function; (2) it abandons clipping-based surrogates in favor of non-clipping RL surrogate objectives used as scoring functions; (3) it employs a simple yet effective constrained optimization approach to enforce the KL divergence constraint. As a result, DisCO offers notable advantages over GRPO and its variants: (i) it completely eliminates difficulty bias by adopting discriminative objectives; (ii) it addresses the entropy instability in GRPO and its variants through the use of non-clipping scoring functions and a constrained optimization approach, yielding long and stable training dynamics; (iii) it allows the incorporation of advanced discriminative learning techniques to address data imbalance, where a significant number of questions have more negative than positive generated answers during training. Our experiments on enhancing the mathematical reasoning capabilities of SFT-finetuned models show that DisCO significantly outperforms GRPO and its improved variants such as DAPO, achieving average gains of 7\% over GRPO and 6\% over DAPO across six benchmark tasks for a 1.5B model.
comment: Accepted to NeurIPS 2025
♻ ☆ Information-Geometric Barycenters for Bayesian Federated Learning
Federated learning (FL) is a widely used and impactful distributed optimization framework that achieves consensus through averaging locally trained models. While effective, this approach may not align well with Bayesian inference, where the model space has the structure of a distribution space. Taking an information-geometric perspective, we reinterpret FL aggregation as the problem of finding the barycenter of local posteriors using a prespecified divergence metric, minimizing the average discrepancy across clients. This perspective provides a unifying framework that generalizes many existing methods and offers crisp insights into their theoretical underpinnings. We then propose BA-BFL, an algorithm that retains the convergence properties of Federated Averaging in non-convex settings. In non-independent and identically distributed scenarios, we conduct extensive comparisons with statistical aggregation techniques, showing that BA-BFL achieves performance comparable to state-of-the-art methods while offering a geometric interpretation of the aggregation phase. Additionally, we extend our analysis to Hybrid Bayesian Deep Learning, exploring the impact of Bayesian layers on uncertainty quantification and model calibration.
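The barycenter view becomes concrete for exponential families. As an illustration (our example, not code from the paper), the KL barycenter of diagonal-Gaussian client posteriors has a closed form in either direction of the divergence: averaging natural parameters for the reverse KL, or matching moments for the forward KL.

```python
import numpy as np

def kl_barycenter_natural(mus, sigmas2, weights):
    """Barycenter minimizing sum_i w_i KL(q || p_i) for diagonal Gaussians:
    average the natural parameters (precisions and precision-weighted means)."""
    lam = np.sum(weights[:, None] / sigmas2, axis=0)          # precision
    eta = np.sum(weights[:, None] * mus / sigmas2, axis=0)    # lam * mu
    return eta / lam, 1.0 / lam

def kl_barycenter_moment(mus, sigmas2, weights):
    """Barycenter minimizing sum_i w_i KL(p_i || q): match the moments."""
    mu = np.sum(weights[:, None] * mus, axis=0)
    second = np.sum(weights[:, None] * (sigmas2 + mus**2), axis=0)
    return mu, second - mu**2

# Three clients' diagonal-Gaussian posteriors over two parameters.
mus = np.array([[0.0, 1.0], [0.5, 0.8], [-0.2, 1.2]])
sig2 = np.array([[1.0, 0.5], [0.8, 0.4], [1.2, 0.6]])
w = np.array([0.5, 0.3, 0.2])
print(kl_barycenter_natural(mus, sig2, w))
print(kl_barycenter_moment(mus, sig2, w))
```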
♻ ☆ LoLA: Low-Rank Linear Attention With Sparse Caching
The per-token cost of transformer inference scales with context length, preventing its application to lifelong in-context learning. Linear attention is an efficient alternative that maintains a constant memory footprint, even on infinite context lengths. While this is a potential candidate for lifelong learning, it falls short in memory capacity. In this paper, we propose LoLA, a training-free augmentation to linear attention that boosts associative recall. LoLA distributes past key-value pairs from context into three memory systems: (i) recent pairs in a local sliding window cache; (ii) difficult-to-memorize pairs in a sparse, global cache; and (iii) generic pairs in the recurrent hidden state of linear attention. We show through ablations that our self-recall error metric is crucial to efficiently manage long-term associative memories. On pass-key retrieval tasks, LoLA improves the base model's performance from 0.6% to 97.4% accuracy. This is achieved with a 4.6x smaller cache than Llama-3.1 8B on 4K context length. LoLA also outperforms other 1B and 8B parameter subquadratic models on zero-shot commonsense reasoning tasks.
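A toy sketch of the routing idea (our reading of the abstract; the feature map, window size, and cache size are placeholder choices): each key-value pair is scored by how poorly the current linear-attention state would recall it, and the hardest pairs are promoted to the sparse global cache while the most recent pairs stay in the sliding window.

```python
import torch

def feature_map(x):            # positive feature map for linear attention
    return torch.nn.functional.elu(x) + 1.0

def route_kv(keys, values, window: int = 4, cache_size: int = 4):
    """Hypothetical routing: recent pairs -> sliding window; pairs the linear
    attention state recalls poorly -> sparse cache; the rest -> hidden state."""
    d = keys.shape[-1]
    S = torch.zeros(d, values.shape[-1])          # linear-attention state
    z = torch.zeros(d)
    errors = []
    for k, v in zip(keys, values):
        phi = feature_map(k)
        recalled = (phi @ S) / (phi @ z + 1e-6)   # what the state would return
        errors.append(torch.norm(recalled - v))   # self-recall error
        S = S + torch.outer(phi, v)               # then memorize the pair
        z = z + phi
    errors = torch.stack(errors)
    recent = set(range(len(keys) - window, len(keys)))
    hard = set(torch.topk(errors, cache_size).indices.tolist()) - recent
    return recent, hard        # remaining indices live only in the state S

keys, values = torch.randn(16, 8), torch.randn(16, 8)
recent, hard = route_kv(keys, values)
print(sorted(recent), sorted(hard))
```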
♻ ☆ GIM: Improved Interpretability for Large Language Models
Ensuring faithful interpretability in large language models is imperative for trustworthy and reliable AI. A key obstacle is self-repair, a phenomenon where networks compensate for reduced signal in one component by amplifying others, masking the true importance of the ablated component. While prior work attributes self-repair to layer normalization and back-up components that compensate for ablated components, we identify a novel form occurring within the attention mechanism, where softmax redistribution conceals the influence of important attention scores. This leads traditional ablation and gradient-based methods to underestimate the significance of all components contributing to these attention scores. We introduce Gradient Interaction Modifications (GIM), a technique that accounts for self-repair during backpropagation. Extensive experiments across multiple large language models (Gemma 2B/9B, LLAMA 1B/3B/8B, Qwen 1.5B/3B) and diverse tasks demonstrate that GIM significantly improves faithfulness over existing circuit identification and feature attribution methods. Our work is a significant step toward better understanding the inner mechanisms of LLMs, which is crucial for improving them and ensuring their safety. Our code is available at https://github.com/JoakimEdin/gim.
♻ ☆ Unpicking Data at the Seams: Understanding Disentanglement in VAEs
A generative latent variable model is said to be disentangled when varying a single latent co-ordinate changes a single aspect of samples generated, e.g. object position or facial expression in an image. Related phenomena are seen in several generative paradigms, including state-of-the-art diffusion models, but disentanglement is most notably observed in Variational Autoencoders (VAEs), where oft-used diagonal posterior covariances are argued to be the cause. We make this picture precise. From a known exact link between optimal Gaussian posteriors and decoder derivatives, we show how diagonal posteriors "lock" a decoder's local axes so that density over the data manifold factorises along independent one-dimensional seams that map to axis-aligned directions in latent space. This gives a clear definition of disentanglement, explains why it emerges in VAEs and shows that, under stated assumptions, ground truth factors are identifiable even with a symmetric prior.
comment: 9 pages
♻ ☆ Apple: Toward General Active Perception via Reinforcement Learning
Active perception is a fundamental skill that enables us humans to deal with uncertainty in our inherently partially observable environment. For senses such as touch, where the information is sparse and local, active perception becomes crucial. In recent years, active perception has emerged as an important research domain in robotics. However, current methods are often bound to specific tasks or make strong assumptions, which limit their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning) - a novel framework that leverages reinforcement learning (RL) to address a range of different active perception problems. APPLE jointly trains a transformer-based perception module and decision-making policy with a unified optimization objective, learning how to actively gather information. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks. These findings underscore the potential of APPLE as a versatile and general framework for advancing active perception in robotics.
comment: 16 pages, 13 figures; under review
♻ ☆ Fast Likelihood-Free Parameter Estimation for Lévy Processes
Lévy processes are widely used in financial modeling due to their ability to capture discontinuities and heavy tails, which are common in high-frequency asset return data. However, parameter estimation remains a challenge when associated likelihoods are unavailable or costly to compute. We propose a fast and accurate method for Lévy parameter estimation using the neural Bayes estimation (NBE) framework -- a simulation-based, likelihood-free approach that leverages permutation-invariant neural networks to approximate Bayes estimators. We contribute new theoretical results, showing that NBE results in consistent estimators whose risk converges to the Bayes estimator under mild conditions. Moreover, through extensive simulations across several Lévy models, we show that NBE outperforms traditional methods in both accuracy and runtime, while also enabling two complementary approaches to uncertainty quantification. We illustrate our approach on a challenging high-frequency cryptocurrency return dataset, where the method captures evolving parameter dynamics and delivers reliable and interpretable inference at a fraction of the computational cost of traditional methods. NBE provides a scalable and practical solution for inference in complex financial models, enabling parameter estimation and uncertainty quantification over an entire year of data in just seconds. We additionally investigate nearly a decade of high-frequency Bitcoin returns, requiring less than one minute to estimate parameters under the proposed approach.
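The NBE recipe is simulation-based and amortized: a permutation-invariant network is trained on (parameter, simulated data) pairs so that, at test time, estimation is a single forward pass. A minimal DeepSets-style sketch with a deliberately oversimplified toy simulator standing in for a Lévy model (all architecture and prior choices here are ours):

```python
import torch
import torch.nn as nn

class NeuralBayesEstimator(nn.Module):
    """Permutation-invariant network mapping a set of observations to
    parameter estimates, trained on simulations (a DeepSets sketch)."""

    def __init__(self, n_params: int = 2, width: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, width), nn.ReLU(),
                                 nn.Linear(width, width))
        self.rho = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                 nn.Linear(width, n_params))

    def forward(self, x):                 # x: (batch, n_obs, 1)
        return self.rho(self.phi(x).mean(dim=1))

# Toy simulator: Brownian increments with drift, a crude stand-in only.
def simulate(theta, n=200):
    drift, scale = theta
    return drift / n + scale * torch.randn(n, 1) / n ** 0.5

est = NeuralBayesEstimator()
opt = torch.optim.Adam(est.parameters(), lr=1e-3)
for _ in range(200):                      # amortized training loop
    theta = torch.rand(32, 2)             # draw parameters from the prior
    x = torch.stack([simulate(t) for t in theta])
    loss = nn.functional.mse_loss(est(x), theta)  # Bayes risk under L2 loss
    opt.zero_grad(); loss.backward(); opt.step()
print(est(x[:1]))                         # near-instant estimate at test time
```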
♻ ☆ Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations
Post-training has emerged as a crucial technique for aligning pre-trained Large Language Models (LLMs) with human instructions, significantly enhancing their performance across a wide range of tasks. Central to this process is the quality and diversity of post-training datasets. This paper presents a review of publicly available Arabic post-training datasets on the Hugging Face Hub, organized along four key dimensions: (1) LLM Capabilities (e.g., Question Answering, Translation, Reasoning, Summarization, Dialogue, Code Generation, and Function Calling); (2) Steerability (e.g., Persona and System Prompts); (3) Alignment (e.g., Cultural, Safety, Ethics, and Fairness); and (4) Robustness. Each dataset is rigorously evaluated based on popularity, practical adoption, recency and maintenance, documentation and annotation quality, licensing transparency, and scientific contribution. Our review revealed critical gaps in the development of Arabic post-training datasets, including limited task diversity, inconsistent or missing documentation and annotation, and low adoption across the community. Finally, the paper discusses the implications of these gaps on the progress of Arabic-centric LLMs and applications while providing concrete recommendations for future efforts in Arabic post-training dataset development.
♻ ☆ Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model
The Unconstrained Feature Model (UFM) is a mathematical framework that enables closed-form approximations for minimal training loss and related performance measures in deep neural networks (DNNs). This paper leverages the UFM to provide qualitative insights into neural multivariate regression, a critical task in imitation learning, robotics, and reinforcement learning. Specifically, we address two key questions: (1) How do multi-task models compare to multiple single-task models in terms of training performance? (2) Can whitening and normalizing regression targets improve training performance? The UFM theory predicts that multi-task models achieve strictly smaller training MSE than multiple single-task models when the same or stronger regularization is applied to the latter, and our empirical results confirm these findings. Regarding whitening and normalizing regression targets, the UFM theory predicts that they reduce training MSE when the average variance across the target dimensions is less than one, and our empirical results once again confirm these findings. These findings highlight the UFM as a powerful framework for deriving actionable insights into DNN design and data pre-processing strategies.
comment: 35 pages, 10 figures
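The whitening prescription is easy to apply in practice. A small, self-contained sketch (our implementation of standard ZCA whitening, used here to illustrate the target pre-processing the UFM analysis supports): whiten the targets before training and invert the map at prediction time.

```python
import numpy as np

class TargetWhitener:
    """Whiten multivariate regression targets; invert at prediction time.
    Per the UFM analysis, this helps when the average per-dimension target
    variance is below one."""

    def fit(self, y):                       # y: (n_samples, n_targets)
        self.mean_ = y.mean(axis=0)
        cov = np.cov(y - self.mean_, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        self.W_ = vecs @ np.diag(vals ** -0.5) @ vecs.T    # ZCA whitening
        self.W_inv_ = vecs @ np.diag(vals ** 0.5) @ vecs.T
        return self

    def transform(self, y):
        return (y - self.mean_) @ self.W_

    def inverse_transform(self, z):
        return z @ self.W_inv_ + self.mean_

y = np.random.randn(1000, 3) * [0.2, 0.5, 0.1] + [1.0, -2.0, 0.3]
w = TargetWhitener().fit(y)
z = w.transform(y)
print(np.cov(z, rowvar=False).round(2))        # ~ identity
print(np.allclose(w.inverse_transform(z), y))  # True
```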
♻ ☆ DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion
We introduce DreamControl, a novel methodology for learning autonomous whole-body humanoid skills. DreamControl leverages the strengths of diffusion models and Reinforcement Learning (RL): our core innovation is the use of a diffusion prior trained on human motion data, which subsequently guides an RL policy in simulation to complete specific tasks of interest (e.g., opening a drawer or picking up an object). We demonstrate that this human motion-informed prior allows RL to discover solutions unattainable by direct RL, and that diffusion models inherently promote natural looking motions, aiding in sim-to-real transfer. We validate DreamControl's effectiveness on a Unitree G1 robot across a diverse set of challenging tasks involving simultaneous lower and upper body control and object interaction. Project website at https://genrobo.github.io/DreamControl/
comment: https://genrobo.github.io/DreamControl/ (under submission)
♻ ☆ Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning
Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes the minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt feature space size, optimizing over the expected information gain and the classification capability quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance for low SNR, as unnecessary measurements are discarded in our minimum description features.
comment: This paper has been published in IEEE Journal on Selected Areas in Communications
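The minimum description features themselves are simple to compute from a power delay profile: keep only the strongest power measurements and their temporal locations, discarding the rest. A toy sketch (the bin count and k are illustrative, not the paper's values):

```python
import numpy as np

def minimum_description_features(pdp: np.ndarray, k: int = 8):
    """From a power delay profile, keep the k strongest taps and where they
    occur in time -- the two ingredients named above (powers + locations)."""
    idx = np.argsort(pdp)[-k:][::-1]        # temporal locations, strongest first
    powers = pdp[idx]
    return np.concatenate([powers, idx.astype(float)])

rng = np.random.default_rng(0)
pdp = rng.exponential(scale=0.1, size=256)  # toy 256-bin PDP
pdp[[40, 41, 97]] += [3.0, 1.5, 2.2]        # a few dominant multipath taps
feat = minimum_description_features(pdp, k=4)
print(feat)   # 8 numbers instead of the full 256-bin profile
```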
♻ ☆ Complexity Reduction in Machine Learning-Based Wireless Positioning: Minimum Description Features
A recent line of research has been investigating deep learning approaches to wireless positioning (WP). Although these WP algorithms have demonstrated high accuracy and robust performance against diverse channel conditions, they also have a major drawback: they require processing high-dimensional features, which can be prohibitive for mobile applications. In this work, we design a positioning neural network (P-NN) that substantially reduces the complexity of deep learning-based WP through carefully crafted minimum description features. Our feature selection is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We also develop a novel methodology for adaptively selecting the size of feature space, which optimizes over balancing the expected amount of useful information and classification capability, quantified using information-theoretic measures on the signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP).
comment: This paper has been published in the proceedings of 2024 IEEE International Conference on Communications (ICC)
♻ ☆ Octic Vision Transformers: Quicker ViTs Through Equivariance
Why are state-of-the-art Vision Transformers (ViTs) not designed to exploit natural geometric symmetries such as 90-degree rotations and reflections? In this paper, we argue that there is no fundamental reason, and what has been missing is an efficient implementation. To this end, we introduce Octic Vision Transformers (octic ViTs) which rely on octic group equivariance to capture these symmetries. In contrast to prior equivariant models that increase computational cost, our octic linear layers achieve 5.33x reductions in FLOPs and up to 8x reductions in memory compared to ordinary linear layers. In full octic ViT blocks, the computational reductions approach those of the linear layers as the embedding dimension increases. We study two new families of ViTs, built from octic blocks, that are either fully octic equivariant or break equivariance in the last part of the network. Training octic ViTs supervised (DeiT-III) and unsupervised (DINOv2) on ImageNet-1K, we find that they match baseline accuracy while at the same time providing substantial efficiency gains.
♻ ☆ Fast training of accurate physics-informed neural networks without gradient descent
Solving time-dependent Partial Differential Equations (PDEs) is one of the most critical problems in computational science. While Physics-Informed Neural Networks (PINNs) offer a promising framework for approximating PDE solutions, their accuracy and training speed are limited by two core barriers: gradient-descent-based iterative optimization over complex loss landscapes and non-causal treatment of time as an extra spatial dimension. We present Frozen-PINN, a novel PINN based on the principle of space-time separation that leverages random features instead of training with gradient descent, and incorporates temporal causality by construction. On eight PDE benchmarks, including challenges such as extreme advection speeds, shocks, and high dimensionality, Frozen-PINNs achieve superior training efficiency and accuracy over state-of-the-art PINNs, often by several orders of magnitude. Our work addresses longstanding training and accuracy bottlenecks of PINNs, delivering quickly trainable, highly accurate, and inherently causal PDE solvers, a combination that prior methods could not realize. Our approach challenges the reliance of PINNs on stochastic gradient-descent-based methods and specialized hardware, leading to a paradigm shift in PINN training and providing a challenging benchmark for the community.
comment: 54 pages, 23 figures
♻ ☆ Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance
Recent scientific advances require complex experiment design, necessitating the meticulous tuning of many experiment parameters. Tree-structured Parzen estimator (TPE) is a widely used Bayesian optimization method in recent parameter tuning frameworks such as Hyperopt and Optuna. Despite its popularity, the role of each control parameter in TPE, and the intuition behind the algorithm, have not been discussed so far. The goal of this paper is to identify the role of each control parameter and its impact on parameter tuning through ablation studies on diverse benchmark datasets. The recommended setting concluded from the ablation studies is demonstrated to improve the performance of TPE. Our TPE implementation used in this paper is available at https://github.com/nabenabe0928/tpe/tree/single-opt.
♻ ☆ What Can RL Bring to VLA Generalization? An Empirical Study NeurIPS 2025
Large Vision-Language Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization due to susceptibility to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io
comment: Accepted by NeurIPS 2025
♻ ☆ MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
Molecular Dynamics (MD) is a powerful computational microscope for probing protein functions. However, the need for fine-grained integration and the long timescales of biomolecular events make MD computationally expensive. To address this, several generative models have been proposed to generate surrogate trajectories at lower cost. Yet, these models typically learn a fixed-lag transition density, causing the training signal to be dominated by frequent but uninformative transitions. We introduce a new class of generative models, MSM Emulators, which instead learn to sample transitions across discrete states defined by an underlying Markov State Model (MSM). We instantiate this class with Markov Space Flow Matching (MarS-FM), whose sampling offers more than two orders of magnitude speedup compared to implicit- or explicit-solvent MD simulations. We benchmark MarS-FM's ability to reproduce MD statistics through structural observables such as RMSD, radius of gyration, and secondary structure content. Our evaluation spans protein domains (up to 500 residues) with significant chemical and structural diversity, including unfolding events, and enforces strict sequence dissimilarity between training and test sets to assess generalization. Across all metrics, MarS-FM outperforms existing methods, often by a substantial margin.
♻ ☆ Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on smartphones and other heterogeneous edge devices. In addition to varying computation speeds, these devices often hold data from different distributions. However, existing asynchronous SGD methods struggle in such heterogeneous settings and face two key limitations. First, many rely on unrealistic assumptions of similarity across workers' data distributions. Second, methods that relax this assumption still fail to achieve theoretically optimal performance under heterogeneous computation times. We introduce Ringleader ASGD, the first asynchronous SGD algorithm that attains the theoretical lower bounds for parallel first-order stochastic methods in the smooth nonconvex regime, thereby achieving optimal time complexity under data heterogeneity and without restrictive similarity assumptions. Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary and even time-varying worker computation speeds, closing a fundamental gap in the theory of asynchronous optimization.
♻ ☆ A quantitative analysis of semantic information in deep representations of text and images
Deep neural networks are known to develop similar representations for semantically related data, even when they belong to different domains, such as an image and its description, or the same text in different languages. We present a method for quantitatively investigating this phenomenon by measuring the relative information content of the representations of semantically related data and probing how it is encoded into multiple tokens of large language models (LLMs) and vision transformers. Looking first at how LLMs process pairs of translated sentences, we identify inner "semantic" layers containing the most language-transferable information. We find moreover that, on these layers, a larger LLM (DeepSeek-V3) extracts significantly more general information than a smaller one (Llama3.1-8B). Semantic information of English text is spread across many tokens and it is characterized by long-distance correlations between tokens and by a causal left-to-right (i.e., past-future) asymmetry. We also identify layers encoding semantic information within visual transformers. We show that caption representations in the semantic layers of LLMs predict visual representations of the corresponding images. We observe significant and model-dependent information asymmetries between image and text representations.
♻ ☆ Robust LLM Training Infrastructure at ByteDance
The training scale of large language models (LLMs) has reached tens of thousands of GPUs and is still continuously expanding, enabling faster learning of larger models. Accompanying the expansion of the resource scale is the prevalence of failures (CUDA error, NaN values, job hang, etc.), which poses significant challenges to training stability. Any large-scale LLM training infrastructure should strive for minimal training interruption, efficient fault diagnosis, and effective failure tolerance to enable highly efficient continuous training. This paper presents ByteRobust, a large-scale GPU infrastructure management system tailored for robust and stable training of LLMs. It exploits the uniqueness of the LLM training process and gives top priority to detecting and recovering from failures in a routine manner. Leveraging parallelisms and characteristics of LLM training, ByteRobust enables high-capacity fault tolerance, prompt fault demarcation, and localization with an effective data-driven approach, comprehensively ensuring continuous and efficient training of LLM tasks. ByteRobust is deployed on a production GPU platform with over 200,000 GPUs and achieves 97% ETTR for a three-month training job on 9,600 GPUs.
♻ ☆ Approximation properties of neural ODEs
We study the approximation properties of shallow neural networks whose activation function is defined as the flow map of a neural ordinary differential equation (neural ODE) at the final time of the integration interval. We prove the universal approximation property (UAP) of such shallow neural networks in the space of continuous functions. Furthermore, we investigate the approximation properties of shallow neural networks whose parameters satisfy specific constraints. In particular, we constrain the Lipschitz constant of the neural ODE's flow map and the norms of the weights to increase the network's stability. We prove that the UAP holds if we consider either constraint independently. When both are enforced, there is a loss of expressiveness, and we derive approximation bounds that quantify how accurately such a constrained network can approximate a continuous function.
comment: 30 pages, 9 figures, 2 tables
♻ ☆ Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Due to long generations, large language model (LLM) math reasoning demands significant computational resources and time. While many existing efficient inference methods have been developed with excellent performance preservation on language tasks, they often severely degrade math performance. In this paper, we propose Caprese, a resource-efficient distillation method to recover capabilities lost from deploying efficient inference methods, focusing primarily on feedforward blocks. With original weights unperturbed, roughly 1% of additional parameters, and only 20K synthetic training samples, we are able to recover much, if not all, of the math capabilities lost to efficient inference for thinking LLMs, without harm to language tasks for instruct LLMs. Moreover, Caprese slashes the number of active parameters (~2B cut for Gemma 2 9B and Llama 3.1 8B) and integrates cleanly into existing model layers to reduce latency (>16% time-to-next-token reduction) while encouraging response brevity (up to 8.5% fewer tokens).
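One plausible reading of "roughly 1% of additional parameters with original weights unperturbed" is a small trainable branch added in parallel to each frozen feedforward block. The following is our hypothetical sketch of that pattern, not the paper's code:

```python
import torch
import torch.nn as nn

class LowRankDistillAdapter(nn.Module):
    """Sketch: keep the (possibly pruned/efficient) feedforward block frozen
    and add a tiny trainable low-rank branch whose output is distilled to
    recover lost capability."""

    def __init__(self, ffn: nn.Module, dim: int, rank: int = 16):
        super().__init__()
        self.ffn = ffn
        for p in self.ffn.parameters():
            p.requires_grad_(False)           # original weights unperturbed
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)        # start as an exact no-op

    def forward(self, x):
        return self.ffn(x) + self.up(self.down(x))

dim = 64
ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
block = LowRankDistillAdapter(ffn, dim)
trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
total = sum(p.numel() for p in block.parameters())
print(f"trainable fraction: {trainable / total:.3%}")  # small here; ~1% at LLM scale
```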
♻ ☆ Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions or when $\pi$ is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2{\mathcal{A}}^2}{\varepsilon^4}\right)$ for estimating $Z$ within $\varepsilon$ relative error with high probability, where $\beta$ is the smoothness of $V$ and $\mathcal{A}$ denotes the action of a curve of probability measures interpolating $\pi$ and a tractable reference distribution. Our analysis, leveraging the Girsanov theorem and optimal transport, does not explicitly require isoperimetric assumptions on the target distribution. Finally, to tackle the large action of the widely used geometric interpolation, we propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.
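For readers unfamiliar with the baseline being analyzed, annealed importance sampling itself fits in a few lines. A sketch estimating $\log Z$ for a Gaussian target along the geometric path (we use unadjusted Langevin transitions for brevity, which are only approximately invariant):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_particles, n_levels = 2, 2000, 200

mu, sig2 = 1.5, 0.5                         # target: N(mu, sig2*I), Z known
V  = lambda x: np.sum((x - mu) ** 2, axis=-1) / (2 * sig2)
V0 = lambda x: np.sum(x ** 2, axis=-1) / 2  # reference: standard normal
grad = lambda x, b: (1 - b) * x + b * (x - mu) / sig2  # annealed potential

x = rng.standard_normal((n_particles, d))   # exact draws from the reference
log_w = np.zeros(n_particles)
betas = np.linspace(0.0, 1.0, n_levels + 1)
eps = 0.05                                  # Langevin step size
for b_prev, b in zip(betas[:-1], betas[1:]):
    log_w += (b - b_prev) * (V0(x) - V(x))  # reweight to the next level
    for _ in range(2):                      # (approximate) transition kernel
        x = x - eps * grad(x, b) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

log_Z0 = 0.5 * d * np.log(2 * np.pi)
shift = log_w.max()
log_Z = log_Z0 + shift + np.log(np.mean(np.exp(log_w - shift)))
print(log_Z, 0.5 * d * np.log(2 * np.pi * sig2))   # estimate vs. exact value
```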
♻ ☆ Structured Agent Distillation for Large Language Model
Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into {[REASON]} and {[ACT]} spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
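The span-level supervision can be sketched directly: compute a token-level distillation divergence, then average it separately over [REASON] and [ACT] positions with their own weights (the KL choice and the weights here are our assumptions):

```python
import torch
import torch.nn.functional as F

def structured_distillation_loss(student_logits, teacher_logits, span_tags,
                                 w_reason: float = 1.0, w_act: float = 1.0):
    """Segment-aware KD sketch: average a token-level KL separately over
    [REASON] and [ACT] spans, then combine with per-segment weights."""
    log_p = F.log_softmax(student_logits, dim=-1)
    log_q = F.log_softmax(teacher_logits, dim=-1)
    token_kl = (log_q.exp() * (log_q - log_p)).sum(-1)  # KL(teacher||student)
    reason = span_tags == 0
    act = span_tags == 1
    loss = 0.0
    if reason.any():
        loss = loss + w_reason * token_kl[reason].mean()
    if act.any():
        loss = loss + w_act * token_kl[act].mean()
    return loss

T, V = 12, 50
tags = torch.tensor([0] * 8 + [1] * 4)   # 8 reasoning tokens, 4 action tokens
loss = structured_distillation_loss(torch.randn(T, V), torch.randn(T, V), tags)
print(loss)
```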
♻ ☆ Value Profiles for Encoding Human Variation
Modelling human variation in rating tasks is crucial for personalization, pluralistic model alignment, and computational social science. We propose representing individuals using natural language value profiles -- descriptions of underlying values compressed from in-context demonstrations -- along with a steerable decoder model that estimates individual ratings from a rater representation. To measure the predictive information in a rater representation, we introduce an information-theoretic methodology and find that demonstrations contain the most information, followed by value profiles, then demographics. However, value profiles effectively compress the useful information from demonstrations (>70% information preservation) and offer advantages in terms of scrutability, interpretability, and steerability. Furthermore, clustering value profiles to identify similarly behaving individuals better explains rater variation than the most predictive demographic groupings. Going beyond test set performance, we show that the decoder predictions change in line with semantic profile differences, are well-calibrated, and can help explain instance-level disagreement by simulating an annotator population. These results demonstrate that value profiles offer novel, predictive ways to describe individual variation beyond demographics or group information.
comment: EMNLP 2025
♻ ☆ Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models NeurIPS 2025
Existing large language models (LLMs) face challenges of following complex instructions, especially when multiple constraints are present and organized in paralleling, chaining, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions. To this end, we propose RAIF, a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, starting from the decomposition of complex instructions under existing taxonomies, we propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM. Evaluation on OOD constraints also confirms the generalizability of our RAIF. Codes and data are available at https://github.com/yuleiqin/RAIF. Keywords: reinforcement learning with verifiable rewards (RLVR), instruction following, complex instructions
comment: Accepted to NeurIPS 2025; 15 pages of main body, 5 tables, 5 figures, 42 pages of appendix
♻ ☆ SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
The common approach to communicate a large language model's (LLM) uncertainty is to add a percentage number or a hedging word to its response. But is this all we can do? Instead of generating a single answer and then hedging it, an LLM that is fully transparent to the user needs to be able to reflect on its internal belief distribution and output a summary of all options it deems possible, and how likely they are. To test whether LLMs possess this capability, we develop the SelfReflect metric, an information-theoretic distance between a given summary and a distribution over answers. In interventional and human studies, we find that SelfReflect indicates even slight deviations, yielding a fine-grained measure of faithfulness between a summary string and an LLM's actual internal distribution over answers. With SelfReflect, we make a resounding negative observation: modern LLMs are, across the board, incapable of revealing what they are uncertain about, neither through reasoning, nor chains-of-thoughts, nor explicit finetuning. However, we do find that LLMs are able to generate faithful summaries of their uncertainties if we help them by sampling multiple outputs and feeding them back into the context. This simple approach shines a light on a universal way of communicating LLM uncertainties, whose future development the SelfReflect score enables.
♻ ☆ FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization
Federated Learning faces significant challenges in statistical and system heterogeneity, along with high energy consumption, necessitating efficient client selection strategies. Traditional approaches, including heuristic and learning-based methods, fall short of addressing these complexities holistically. In response, we propose FedGCS, a novel generative client selection framework that innovatively recasts the client selection process as a generative task. Drawing inspiration from the methodologies used in large language models, FedGCS efficiently encodes abundant decision-making knowledge within a continuous representation space, enabling efficient gradient-based optimization to search for the optimal client selection, which is finally output via generation. The framework comprises four steps: (1) automatic collection of diverse "selection-score" pair data using classical client selection methods; (2) training an encoder-evaluator-decoder framework on this data to construct a continuous representation space; (3) employing gradient-based optimization in this space for optimal client selection; (4) generating the final optimal client selection via beam search with the well-trained decoder. FedGCS outperforms traditional methods by being more comprehensive, generalizable, and efficient, simultaneously optimizing for model performance, latency, and energy consumption. The effectiveness of FedGCS is proven through extensive experimental analyses.
comment: Accepted by IJCAI-2024; appendix added
♻ ☆ FlowCast-ODE: Continuous Hourly Weather Forecasting with Dynamic Flow Matching and ODE Solver
Data-driven hourly weather forecasting models often face the challenge of error accumulation in long-term predictions. The problem is exacerbated by non-physical temporal discontinuities present in widely-used training datasets such as ECMWF Reanalysis v5 (ERA5), which stem from its 12-hour assimilation cycle. Such artifacts lead hourly autoregressive models to learn spurious dynamics and rapidly accumulate errors. To address this, we introduce FlowCast-ODE, a novel framework that treats atmospheric evolution as a continuous flow to ensure temporal coherence. Our method employs dynamic flow matching to learn the instantaneous velocity field from data and an ordinary differential equation (ODE) solver to generate smooth and temporally continuous hourly predictions. By pre-training on 6-hour intervals to sidestep data discontinuities and fine-tuning on hourly data, FlowCast-ODE produces seamless forecasts for up to 120 hours with a single lightweight model. It achieves competitive or superior skill on key meteorological variables compared to baseline models, preserves fine-grained spatial details, and demonstrates strong performance in forecasting extreme events, such as tropical cyclone tracks.
♻ ☆ The DNA of nuclear models: How AI predicts nuclear masses
Obtaining high-precision predictions of nuclear masses, or equivalently nuclear binding energies, $E_b$, remains an important goal in nuclear-physics research. Recently, many AI-based tools have shown promising results on this task, some achieving precision that surpasses the best physics models. However, the utility of these AI models remains in question given that predictions are only useful where measurements do not exist, which inherently requires extrapolation away from the training (and testing) samples. Since AI models are largely black boxes, the reliability of such an extrapolation is difficult to assess. We present an AI model that not only achieves cutting-edge precision for $E_b$, but does so in an interpretable manner. For example, we find that (and explain why) the most important dimensions of its internal representation form a double helix, where the analog of the hydrogen bonds in DNA here links the number of protons and neutrons found in the most stable nucleus of each isotopic chain. Furthermore, we show that the AI prediction of $E_b$ can be factorized and ordered hierarchically, with the most important terms corresponding to well-known symbolic models (such as the famous liquid drop). Remarkably, the improvement of the AI model over symbolic ones can almost entirely be attributed to an observation made by Jaffe in 1969 based on the structure of most known nuclear ground states. The end result is a fully interpretable data-driven model of nuclear masses based on physics deduced by AI.
comment: 19 pages, 11 figures
♻ ☆ Should You Use Your Large Language Model to Explore or Exploit?
We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while the current LLMs often struggle to exploit, in-context mitigations may be used to substantially improve performance for small-scale tasks. However, even then, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.
♻ ☆ Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions
The automated detection of hallucinations and training data contamination is pivotal to the safe deployment of Large Language Models (LLMs). These tasks are particularly challenging in settings where no access to model internals is available. Current approaches in this setup typically leverage only the probabilities of actual tokens in the text, relying on simple task-specific heuristics. Crucially, they overlook the information contained in the full sequence of next-token probability distributions. We propose to go beyond hand-crafted decision rules by learning directly from the complete observable output of LLMs -- consisting not only of next-token probabilities, but also the full sequence of next-token distributions. We refer to this as the LLM Output Signature (LOS), and treat it as a reference data type for detecting hallucinations and data contamination. To that end, we introduce LOS-Net, a lightweight attention-based architecture trained on an efficient encoding of the LOS, which can provably approximate a broad class of existing techniques for both tasks. Empirically, LOS-Net achieves superior performance across diverse benchmarks and LLMs, while maintaining extremely low detection latency. Furthermore, it demonstrates promising transfer capabilities across datasets and LLMs. Full code is available at https://github.com/BarSGuy/Beyond-next-token-probabilities.
♻ ☆ Linear Attention for Efficient Bidirectional Sequence Modeling NeurIPS 2025
Linear Transformers and State Space Models have emerged as efficient alternatives to softmax Transformers for causal sequence modeling, enabling parallel training via matrix multiplication and efficient RNN-style inference. However, despite their success in causal tasks, no unified framework exists for applying Linear Transformers to bidirectional sequence modeling. We introduce LION, the first framework to systematically extend Linear Transformers to the bidirectional setting. LION generalizes three core representations commonly used in the causal case - full Linear Attention, bidirectional RNN, and chunkwise parallel form - to the bidirectional setting. These forms are theoretically equivalent and enable models to exploit the strengths of each during training and inference. We prove that a broad class of Linear Transformers can be extended using LION and validate our framework via three core examples based on the choice of decay type: LION-LIT, the bidirectional extension of arXiv:2006.16236; LION-D, based on arXiv:2307.08621; and LION-S, a variant using selective decay (arXiv:2103.02143, arXiv:2312.0075). Across standard bidirectional tasks, LION enables models to match or exceed the performance of softmax Transformers, while offering significantly faster training and more efficient inference than existing State Space Models.
comment: Accepted in NeurIPS 2025
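The equivalence between the parallel form and the bidirectional-RNN form is easy to verify for plain (decay-free) linear attention: the forward and backward causal recurrences together count every position once, except the diagonal term, which is counted twice and must be subtracted. A small numerical check (our construction, corresponding to the decay-free LION-LIT-style case):

```python
import torch

def phi(x):                                    # positive feature map
    return torch.nn.functional.elu(x) + 1.0

def full_linear_attention(q, k, v):
    """Parallel (matmul) form over the whole sequence, no causal mask."""
    A = phi(q) @ phi(k).T                      # (T, T) unnormalized scores
    return (A @ v) / A.sum(-1, keepdim=True)

def bidirectional_rnn_form(q, k, v):
    """Forward and backward causal recurrences; subtract the doubly counted
    diagonal term to recover the full bidirectional output."""
    T, d = q.shape
    num = torch.zeros(T, v.shape[-1]); den = torch.zeros(T)
    for direction in (range(T), range(T - 1, -1, -1)):
        S = torch.zeros(d, v.shape[-1]); z = torch.zeros(d)
        for t in direction:
            S = S + torch.outer(phi(k[t]), v[t]); z = z + phi(k[t])
            num[t] += phi(q[t]) @ S;            den[t] += phi(q[t]) @ z
    for t in range(T):                          # remove double-counted j = t
        num[t] -= (phi(q[t]) @ phi(k[t])) * v[t]
        den[t] -= phi(q[t]) @ phi(k[t])
    return num / den.unsqueeze(-1)

q, k, v = torch.randn(6, 4), torch.randn(6, 4), torch.randn(6, 4)
print(torch.allclose(full_linear_attention(q, k, v),
                     bidirectional_rnn_form(q, k, v), atol=1e-5))  # True
```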
♻ ☆ Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Semi-supervised learning (SSL) has emerged as a practical solution for addressing data scarcity challenges by leveraging unlabeled data. Recently, vision-language models (VLMs), pre-trained on massive image-text pairs, have demonstrated remarkable zero-/few-shot performance that often surpasses SSL approaches due to their exceptional generalization capabilities. This gap motivates us to question: how can we effectively harness the powerful generalization capabilities of VLMs into task-specific models? Knowledge distillation (KD) offers a natural framework for transferring VLM capabilities, but we identify that it suffers from gradient conflicts between supervised and distillation losses. To address this challenge, we propose Dual-Head Optimization (DHO), which introduces dual prediction heads for each distinct signal. We observe that DHO resolves gradient conflicts, enabling improved feature learning compared to single-head KD baselines, with practical benefits of minimal computational overhead and test-time hyperparameter tuning without retraining. Extensive experiments across 15 datasets show that DHO consistently outperforms KD baselines, often outperforming teacher models with smaller student models. DHO also achieves new state-of-the-art performance on both in-distribution ImageNet semi-supervised learning and out-of-distribution generalization across ImageNet variants. We publicly release our code and model checkpoints to facilitate future research at https://github.com/erjui/DHO.
comment: 38 pages, 17 figures, preprint
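The dual-head idea reduces to a small architectural change: two linear heads over one backbone, with the supervised cross-entropy and the distillation KL each flowing through their own head. A minimal sketch (the temperature, toy backbone, and loss weighting are our choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadStudent(nn.Module):
    """One backbone, two heads: the CE head sees labels, the KD head sees the
    VLM teacher, so the two losses never collide in a shared head."""

    def __init__(self, backbone, feat_dim, n_classes):
        super().__init__()
        self.backbone = backbone
        self.ce_head = nn.Linear(feat_dim, n_classes)
        self.kd_head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        f = self.backbone(x)
        return self.ce_head(f), self.kd_head(f)

def dho_loss(logits_ce, logits_kd, labels, teacher_logits, labeled, tau=2.0):
    ce = F.cross_entropy(logits_ce[labeled], labels[labeled])  # labeled subset
    kd = F.kl_div(F.log_softmax(logits_kd / tau, -1),          # all examples
                  F.softmax(teacher_logits / tau, -1),
                  reduction="batchmean") * tau ** 2
    return ce + kd

model = DualHeadStudent(nn.Sequential(nn.Linear(32, 64), nn.ReLU()), 64, 10)
x, labels = torch.randn(8, 32), torch.randint(0, 10, (8,))
labeled = torch.tensor([True, True] + [False] * 6)   # semi-supervised batch
l_ce, l_kd = model(x)
print(dho_loss(l_ce, l_kd, labels, torch.randn(8, 10), labeled))
```

At inference time the two heads' predictions would be combined; the sketch covers only the training losses.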
♻ ☆ TensorRL-QAS: Reinforcement learning with tensor networks for improved quantum architecture search NeurIPS 2025
Variational quantum algorithms hold the promise to address meaningful quantum problems already on noisy intermediate-scale quantum hardware. In spite of the promise, they face the challenge of designing quantum circuits that both solve the target problem and comply with device limitations. Quantum architecture search (QAS) automates the design process of quantum circuits, with reinforcement learning (RL) emerging as a promising approach. Yet, RL-based QAS methods encounter significant scalability issues, as computational and training costs grow rapidly with the number of qubits, circuit depth, and hardware noise. To address these challenges, we introduce $\textit{TensorRL-QAS}$, an improved framework that combines tensor network methods with RL for QAS. By warm-starting the QAS with a matrix product state approximation of the target solution, TensorRL-QAS effectively narrows the search space to physically meaningful circuits and accelerates the convergence to the desired solution. Tested on several quantum chemistry problems of up to 12-qubit, TensorRL-QAS achieves up to a 10-fold reduction in CNOT count and circuit depth compared to baseline methods, while maintaining or surpassing chemical accuracy. It reduces classical optimizer function evaluation by up to 100-fold, accelerates training episodes by up to 98$\%$, and can achieve 50$\%$ success probability for 10-qubit systems, far exceeding the $<$1$\%$ rates of baseline. Robustness and versatility are demonstrated in both noiseless and noisy scenarios, where we report a simulation of an 8-qubit system. Furthermore, TensorRL-QAS demonstrates effectiveness on 20-qubit quantum systems, positioning it as a state-of-the-art quantum circuit discovery framework for near-term hardware and beyond.
comment: Accepted at NeurIPS 2025. Code is at: https://github.com/Aqasch/TensorRL-QAS
♻ ☆ CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
Reinforcement learning (RL) has become a powerful paradigm for optimizing large language models (LLMs) to handle complex reasoning tasks. A core challenge in this process lies in managing policy entropy, which reflects the balance between exploration and exploitation during training. Existing methods, such as proximal policy optimization (PPO) and its variants, discard valuable gradient signals from low-probability tokens due to the clipping mechanism. We systematically analyze the entropy dynamics and reveal that these clipped tokens play a critical yet overlooked role in regulating entropy evolution. We propose Coordinating Entropy via Gradient-Preserving Policy Optimization (CE-GPPO), a novel algorithm that reintroduces gradients from clipped tokens in native PPO in a gentle and bounded manner. By controlling the magnitude of gradients from tokens outside the clipping interval, CE-GPPO is able to achieve an exploration-exploitation trade-off. We provide theoretical justification and empirical evidence showing that CE-GPPO effectively mitigates entropy instability. Extensive experiments on mathematical reasoning benchmarks show that CE-GPPO consistently outperforms strong baselines across different model scales.
♻ ☆ scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding
Single-cell RNA sequencing (scRNA-seq) is essential for unraveling cellular heterogeneity and diversity, offering invaluable insights for bioinformatics advancements. Despite its potential, traditional clustering methods in scRNA-seq data analysis often neglect the structural information embedded in gene expression profiles, crucial for understanding cellular correlations and dependencies. Existing strategies, including graph neural networks, face challenges in handling the inefficiency due to scRNA-seq data's intrinsic high-dimension and high-sparsity. Addressing these limitations, we introduce scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph), a novel framework designed for efficient and accurate clustering of scRNA-seq data that simultaneously utilizes intercellular high-order structural information. scCDCG comprises three main components: (i) A graph embedding module utilizing deep cut-informed techniques, which effectively captures intercellular high-order structural information, overcoming the over-smoothing and inefficiency issues prevalent in prior graph neural network methods. (ii) A self-supervised learning module guided by optimal transport, tailored to accommodate the unique complexities of scRNA-seq data, specifically its high-dimension and high-sparsity. (iii) An autoencoder-based feature learning module that simplifies model complexity through effective dimension reduction and feature extraction. Our extensive experiments on 6 datasets demonstrate scCDCG's superior performance and efficiency compared to 7 established models, underscoring scCDCG's potential as a transformative tool in scRNA-seq data analysis. Our code is available at: https://github.com/XPgogogo/scCDCG.
comment: Accepted as a long paper for the research track at DASFAA 2024; Error Correction
♻ ☆ CrediBench: Building Web-Scale Network Datasets for Information Integrity
Online misinformation poses an escalating threat, amplified by the Internet's open nature and increasingly capable LLMs that generate persuasive yet deceptive content. Existing misinformation detection methods typically focus on either textual content or network structure in isolation, failing to leverage the rich, dynamic interplay between website content and hyperlink relationships that characterizes real-world misinformation ecosystems. We introduce CrediBench: a large-scale data processing pipeline for constructing temporal web graphs that jointly model textual content and hyperlink structure for misinformation detection. Unlike prior work, our approach captures the dynamic evolution of general misinformation domains, including changes in both content and inter-site references over time. Our processed one-month snapshot extracted from the Common Crawl archive in December 2024 contains 45 million nodes and 1 billion edges, representing the largest web graph dataset made publicly available for misinformation research to date. From our experiments on this graph snapshot, we demonstrate the strength of both structural and webpage content signals for learning credibility scores, which measure source reliability. The pipeline and experimentation code are all available here, and the dataset is in this folder.
comment: 16 pages, 4 figures
♻ ☆ MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification
Feature extraction techniques are crucial in medical image classification; however, classical feature extractors, in addition to traditional machine learning classifiers, often exhibit significant limitations in providing sufficient discriminative information for complex image sets. While Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) have shown promise in feature extraction, they are prone to overfitting due to the inherent characteristics of medical imaging data, including small sample sizes or high intra-class variance. In this work, the Medical Image Attention-based Feature Extractor (MIAFEx) is proposed, a novel method that employs a learnable refinement mechanism to enhance the classification token within the Transformer encoder architecture. This mechanism adjusts the token based on learned weights, improving the extraction of salient features and enhancing the model's adaptability to the challenges presented by medical imaging data. The MIAFEx output feature quality is compared against classical feature extractors using traditional and hybrid classifiers. Also, the performance of these features is compared against modern CNN and ViT models in classification tasks, demonstrating their superiority in accuracy and robustness across multiple complex medical imaging datasets. This advantage is particularly pronounced in scenarios with limited training data, where traditional and modern models often struggle to generalize effectively. The source code of this proposal can be found at https://github.com/Oscar-RamosS/Medical-Image-Attention-based-Feature-Extractor-MIAFEx
comment: This is the preprint version of an article that has been accepted for publication in Knowledge-Based Systems
♻ ☆ MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation
Statistical learning methods for automated variable selection, such as LASSO, elastic nets, or gradient boosting, have become increasingly popular tools for building powerful prediction models. Yet, in practice, analyses are often complicated by missing data. The most widely used approach to address missingness is multiple imputation, which involves creating several completed datasets. However, there is an ongoing debate on how to perform model selection in the presence of multiple imputed datasets. Simple strategies, such as pooling models across datasets, have been shown to have suboptimal properties. Although more sophisticated methods exist, they are often difficult to implement and therefore not widely applied. In contrast, two recent approaches modify the regularization methods LASSO and elastic nets by defining a single loss function, resulting in a unified set of coefficients across imputations. Our key contribution is to extend this principle to the framework of component-wise gradient boosting by proposing MIBoost, a novel algorithm that employs a uniform variable-selection mechanism across imputed datasets. Simulation studies suggest that our approach yields prediction performance comparable to that of these recently proposed methods.
comment: 21 pages, 2 algorithms, includes a simulation study
♻ ☆ Value-Guided Search for Efficient Chain-of-Thought Reasoning NeurIPS 2025
In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it to DeepSeek models for improved performance with test-time compute scaling. We find that block-wise value-guided search (VGS) with a final weighted majority vote achieves better test-time scaling than standard methods such as majority voting or best-of-n. Moreover, VGS significantly reduces the inference FLOPs required to achieve the same performance as majority voting. Our dataset, model and codebase are open-sourced.
comment: NeurIPS 2025
♻ ☆ LiDAR-BIND-T: Improved and Temporally Consistent Sensor Modality Translation and Fusion for Robotic Applications
This paper extends LiDAR-BIND, a modular multi-modal fusion framework that binds heterogeneous sensors (radar, sonar) to a LiDAR-defined latent space, with mechanisms that explicitly enforce temporal consistency. We introduce three contributions: (i) temporal embedding similarity that aligns consecutive latent representations, (ii) a motion-aligned transformation loss that matches displacement between predictions and ground truth LiDAR, and (iii) windowed temporal fusion using a specialised temporal module. We further update the model architecture to better preserve spatial structure. Evaluations on radar/sonar-to-LiDAR translation demonstrate improved temporal and spatial coherence, yielding lower absolute trajectory error and better occupancy map accuracy in Cartographer-based SLAM (Simultaneous Localisation and Mapping). We propose metrics based on the Fréchet Video Motion Distance (FVMD) and a correlation-peak distance, providing practical temporal quality indicators for evaluating SLAM performance. The proposed temporal LiDAR-BIND, or LiDAR-BIND-T, maintains modular modality fusion while substantially enhancing temporal stability, resulting in improved robustness and performance for downstream SLAM.
♻ ☆ Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
Current LLM unlearning methods face a critical security vulnerability that undermines their fundamental purpose: while they appear to successfully remove sensitive or harmful knowledge, this "forgotten" information remains precariously recoverable through relearning attacks. We identify that the root cause is that conventional methods optimizing the forgetting loss at individual data points will drive model parameters toward sharp minima in the loss landscape. In these unstable regions, even minimal parameter perturbations can drastically alter the model's behaviors. Consequently, relearning attacks exploit this vulnerability by using just a few fine-tuning samples to navigate the steep gradients surrounding these unstable regions, thereby rapidly recovering knowledge that was supposedly erased. This exposes a critical robustness gap between apparent unlearning and actual knowledge removal. To address this issue, we propose StableUN, a bi-level feedback-guided optimization framework that explicitly seeks more stable parameter regions via neighborhood-aware optimization. It integrates forgetting feedback, which uses adversarial perturbations to probe parameter neighborhoods, with remembering feedback to preserve model utility, aligning the two objectives through gradient projection. Experiments on WMDP and MUSE benchmarks demonstrate that our method is significantly more robust against both relearning and jailbreaking attacks while maintaining competitive utility performance.
♻ ☆ Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduces high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring its effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
comment: Preprint Under Review
♻ ☆ Collective Counterfactual Explanations: Balancing Individual Goals and Collective Dynamics
Counterfactual explanations provide individuals with cost-optimal recommendations to achieve their desired outcomes. However, when a significant number of individuals seek similar state modifications, this individual-centric approach can inadvertently create competition and introduce unforeseen costs. Additionally, disregarding the underlying data distribution may lead to recommendations that individuals perceive as unusual or impractical. To address these challenges, we propose a novel framework that extends standard counterfactual explanations by incorporating a population dynamics model. This framework penalizes deviations from equilibrium after individuals follow the recommendations, effectively mitigating externalities caused by correlated changes across the population. By balancing individual modification costs with their impact on others, our method ensures more equitable and efficient outcomes. We show how this approach reframes the counterfactual explanation problem from an individual-centric task to a collective optimization problem. Augmenting our theoretical insights, we design and implement scalable algorithms for computing collective counterfactuals, showcasing their effectiveness and advantages over existing recourse methods, particularly in aligning with collective objectives.
♻ ☆ HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring NeurIPS 2025
Mobile and wearable healthcare monitoring play a vital role in facilitating timely interventions, managing chronic health conditions, and ultimately improving individuals' quality of life. Previous studies on large language models (LLMs) have highlighted their impressive generalization abilities and effectiveness in healthcare prediction tasks. However, most LLM-based healthcare solutions are cloud-based, which raises significant privacy concerns and results in increased memory usage and latency. To address these challenges, there is growing interest in compact models, Small Language Models (SLMs), which are lightweight and designed to run locally and efficiently on mobile and wearable devices. Nevertheless, how well these models perform in healthcare prediction remains largely unexplored. We systematically evaluated SLMs on health prediction tasks using zero-shot, few-shot, and instruction fine-tuning approaches, and deployed the best performing fine-tuned SLMs on mobile devices to evaluate their real-world efficiency and predictive performance in practical healthcare scenarios. Our results show that SLMs can achieve performance comparable to LLMs while offering substantial gains in efficiency and privacy. However, challenges remain, particularly in handling class imbalance and few-shot scenarios. These findings highlight SLMs, though imperfect in their current form, as a promising solution for next-generation, privacy-preserving healthcare monitoring.
comment: 9 pages, 6 tables, 6 figures. Accepted at NeurIPS 2025 Workshop on GenAI4Health
♻ ☆ Communications to Circulations: 3D Wind Field Retrieval and Real-Time Prediction Using 5G GNSS Signals and Deep Learning
Accurate atmospheric wind field information is crucial for various applications, including weather forecasting, aviation safety, and disaster risk reduction. However, obtaining high spatiotemporal resolution wind data remains challenging due to limitations in traditional in-situ observations and remote sensing techniques, as well as the computational expense and biases of numerical weather prediction (NWP) models. This paper introduces G-WindCast, a novel deep learning framework that leverages signal strength variations from 5G Global Navigation Satellite System (GNSS) signals to retrieve and forecast three-dimensional (3D) atmospheric wind fields. The framework utilizes feedforward neural networks (FNNs) and Transformer networks to capture complex, nonlinear, and spatiotemporal relationships between GNSS-derived features and wind dynamics. Our preliminary results demonstrate promising accuracy in both wind retrieval and short-term wind forecasting (up to 30 minutes lead time), with skill scores comparable to high-resolution NWP outputs in certain scenarios. The model exhibits robustness across different forecast horizons and pressure levels, and its predictions for wind speed and direction show superior agreement with observations compared to concurrent ERA5 reanalysis data. Furthermore, we show that the system can maintain excellent performance for localized forecasting even with a significantly reduced number of GNSS stations (e.g., around 100), highlighting its cost-effectiveness and scalability. This interdisciplinary approach underscores the transformative potential of exploiting non-traditional data sources and deep learning for advanced environmental monitoring and real-time atmospheric applications.
comment: 10 pages, 5 figures. Minor text revisions; updated author list to reflect contributions
♻ ☆ A Review on Riemannian Metric Learning: Closer to You than You Imagine
Riemannian metric learning is an emerging field in machine learning, unlocking new ways to encode complex data structures beyond traditional distance metric learning. While classical approaches rely on global distances in Euclidean space, they often fall short in capturing intrinsic data geometry. Enter Riemannian metric learning: a powerful generalization that leverages differential geometry to model the data according to their underlying Riemannian manifold. This approach has demonstrated remarkable success across diverse domains, from causal inference and optimal transport to generative modeling and representation learning. In this review, we bridge the gap between classical metric learning and Riemannian geometry, providing a structured and accessible overview of key methods, applications, and recent advances. We argue that Riemannian metric learning is not merely a technical refinement but a fundamental shift in how we think about data representations. Thus, this review should serve as a valuable resource for researchers and practitioners interested in exploring Riemannian metric learning and convince them that it is closer to them than they might imagine, both in theory and in practice.
Systems and Control 36
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8-hour trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
☆ Robust Safety-Critical Control of Integrator Chains with Mismatched Perturbations via Linear Time-Varying Feedback
In this paper, we propose a novel safety-critical control framework for a chain of integrators subject to both matched and mismatched perturbations. The core of our approach is a linear, time-varying state-feedback design that simultaneously enforces stability and safety constraints. By integrating backstepping techniques with a quadratic programming (QP) formulation, we develop a systematic procedure to guarantee safety under time-varying gains. We provide rigorous theoretical guarantees for the double integrator case, both in the presence and absence of perturbations, and outline general proofs for extending the methodology to higher-order chains of integrators. This proposed framework thus bridges robustness and safety-critical performance, while overcoming the limitations of existing prescribed-time approaches.
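Since the safety layer in such designs ultimately reduces to a small QP, a minimal sketch may help fix ideas. The snippet below implements a plain CBF-QP safety filter for a double integrator with a braking-distance barrier; it is not the paper's time-varying backstepping design, and the names and gains (`c`, `u_max`, `alpha`) are illustrative placeholders.

```python
import numpy as np

def cbf_qp_filter(p, v, u_nom, c=10.0, u_max=3.0, alpha=2.0):
    """Minimal CBF-QP safety filter sketch for a double integrator
    (p' = v, v' = u) keeping p <= c. Uses the braking-distance barrier
    h = c - p - v^2/(2*u_max), active while the state moves toward the
    boundary (v > 0); not the paper's time-varying feedback design."""
    h = c - p - v**2 / (2.0 * u_max)
    if v <= 0.0:
        return u_nom              # constraint inactive when moving away
    # Safety condition h_dot >= -alpha*h reduces to u <= u_bar:
    u_bar = (alpha * h - v) * u_max / v
    # The QP min (u - u_nom)^2 s.t. u <= u_bar has this closed form:
    return min(u_nom, u_bar)

# Example: the nominal controller accelerates; the filter caps the input
print(cbf_qp_filter(p=8.0, v=2.0, u_nom=3.0))   # -> 1.0
```

With a single scalar constraint the QP is solvable in closed form; in the paper's setting the same projection idea is combined with time-varying gains from the backstepping construction.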
☆ Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, which is often violated in real-world scenarios due to the availability of partial state information. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
☆ Stability Analysis of Thermohaline Convection With a Time-Varying Shear Flow Using the Lyapunov Method
This work identifies instabilities and computes the growth rate of a linear time-varying system using the Lyapunov method. The linear system describes cold fresh water on top of hot salty water with a periodically time-varying background shear flow. We employ a time-dependent weighting matrix to construct a Lyapunov function candidate, and the resulting linear matrix inequalities formulation is discretized in time using the forward Euler method. As the number of temporal discretization points increases, the growth rate predicted from the Lyapunov method or the Floquet theory will converge to the same value as that obtained from numerical simulations. We also use the Lyapunov method to analyze the instantaneous principal direction of instabilities and compare the computational resources required by the Lyapunov method, numerical simulations, and the Floquet theory.
comment: 6 pages, 5 figures
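As a rough illustration of the discretized LMI formulation, the sketch below checks whether a candidate growth rate is certified by a time-varying quadratic Lyapunov function, with the weighting matrix discretized by forward Euler; the periodic 2x2 system `A` is a toy stand-in for the linearized thermohaline dynamics, and cvxpy with the SCS solver is assumed available.

```python
import numpy as np
import cvxpy as cp

def rate_certified(A_of_t, T, N, lam):
    """Feasibility of a periodic time-varying Lyapunov certificate
    V(t,x) = x^T P(t) x for growth rate `lam`, with the matrix ODE
    discretized by forward Euler (illustrative sketch only)."""
    dt = T / N
    n = A_of_t(0.0).shape[0]
    P = [cp.Variable((n, n), symmetric=True) for _ in range(N + 1)]
    cons = [P[N] == P[0]]                       # periodic weighting matrix
    for k in range(N):
        A = A_of_t(k * dt)
        Pdot = (P[k + 1] - P[k]) / dt           # forward Euler in time
        cons.append(P[k] >> np.eye(n))          # P(t) uniformly positive
        M = A.T @ P[k] + P[k] @ A + Pdot - 2 * lam * P[k]
        cons.append(0.5 * (M + M.T) << 0)       # explicit symmetrization
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve(solver=cp.SCS)
    return prob.status in ("optimal", "optimal_inaccurate")

# Bisection on lam gives the smallest certified growth rate
A = lambda t: np.array([[0.0, 1.0],
                        [-1.0, -0.5 + 0.3 * np.sin(2 * np.pi * t)]])
lo, hi = -1.0, 1.0
for _ in range(10):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if rate_certified(A, 1.0, 20, mid) else (mid, hi)
print("certified growth rate ~", hi)
```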
☆ Explainable and Resilient ML-Based Physical-Layer Attack Detectors
Detection of emerging attacks on network infrastructure is a critical aspect of security management. To meet the growing scale and complexity of modern threats, machine learning (ML) techniques offer valuable tools for automating the detection of malicious activities. However, as these techniques become more complex, their internal operations grow increasingly opaque. In this context, we address the need for explainable physical-layer attack detection methods. First, we analyze the inner workings of various classifiers trained to alert about physical layer intrusions, examining how the influence of different monitored parameters varies depending on the type of attack being detected. This analysis not only improves the interpretability of the models but also suggests ways to enhance their design for increased speed. In the second part, we evaluate the detectors' resilience to malicious parameter noising. The results highlight a key trade-off between model speed and resilience. This work serves as a design guideline for developing fast and robust detectors trained on available network monitoring data.
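One common way to inspect how monitored parameters influence a trained detector, in the spirit of the analysis above, is permutation importance; the sketch below uses scikit-learn on synthetic stand-in features, since the actual monitored parameters and training data are not public.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in: four hypothetical monitored physical-layer
# parameters and a synthetic "attack" label; only the workflow of
# inspecting per-parameter influence is illustrated here.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
y = (X[:, 1] + 0.5 * X[:, 3] > 0).astype(int)

clf = RandomForestClassifier(random_state=0).fit(X, y)
imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean)   # influence of each monitored parameter
```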
☆ Anticipatory Structure in the Propagation of Signal
We report the development of a structure that exhibits the proteresis phenomenon in a more general setting and set out its philosophical implications. The questions here concern how we are to interpret what will happen in the future, and the procollection (the counterpart of recollection) of not-yet-experienced phenomena that, when expressed, will be whatever has already built up in fully determinate form ahead of the event. If such a process really exists, as our evidence confirms, not just as a phenomenon but as a fact, then a gap exists between the actualized form of the future and its concrete expression when the event does happen. Such a fact, hard as it is to imagine, may be intelligible, even interpretable and susceptible to mathematical and/or logical modeling. We build upon neglected theories and formulae that present time in a way that makes our results interpretable. We describe a proteretic device that shifts the input signal (event) toward the future; it is an anticipatory structure. The proteretic characteristic of neurons should also be demonstrable; this neuronal behavior may explain fast perception and thought processes in spite of slow-acting neurons. That capacity may also account for how animals (including humans) can interact with the environment by slightly seeing (in the sense of perceiving and/or sensing) the future. Exploiting this new proteretic technology, faster computers and more efficient cellphones, among other things, could be designed and built.
comment: 19 pages and 8 figures
☆ Stabilization of nonlinear systems with unknown delays via delay-adaptive neural operator approximate predictors
This work establishes the first rigorous stability guarantees for approximate predictors in delay-adaptive control of nonlinear systems, addressing a key challenge in practical implementations where exact predictors are unavailable. We analyze two scenarios: (i) when the actuated input is directly measurable, and (ii) when it is estimated online. For the measurable input case, we prove semi-global practical asymptotic stability with an explicit bound proportional to the approximation error $\epsilon$. For the unmeasured input case, we demonstrate local practical asymptotic stability, with the region of attraction explicitly dependent on both the initial delay estimate and the predictor approximation error. To bridge theory and practice, we show that neural operators, a flexible class of neural network-based approximators, can achieve arbitrarily small approximation errors, thus satisfying the conditions of our stability theorems. Numerical experiments on two nonlinear benchmark systems, a biological protein activator/repressor model and a micro-organism growth Chemostat model, validate our theoretical results. In particular, our numerical simulations confirm stability under approximate predictors, highlight the strong generalization capabilities of neural operators, and demonstrate a substantial computational speedup of up to 15x compared to a baseline fixed-point method.
comment: 20 pages, 2 figures
☆ Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, the existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new Forward-Backward algorithm with Generic Acceleration constraints, which achieves both high accuracy and low computation time. FBGA runs forward and backward passes that maximize the velocity profile over short, discretized path segments while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source C++ implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
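For readers unfamiliar with forward-backward velocity planning, the sketch below shows the basic two-pass structure on a discretized path; it uses a simple friction-circle constraint as a stand-in for FBGA's generic, user-defined acceleration constraints and omits the algorithm's segment-wise refinements.

```python
import numpy as np

def forward_backward_profile(ds, kappa, v_max, a_lon, a_lat):
    """Two-pass velocity profile over a path sampled at spacing `ds`
    with curvature `kappa`. A friction-circle coupling of longitudinal
    and lateral acceleration stands in for FBGA's generic constraints."""
    kappa = np.maximum(np.abs(np.asarray(kappa, dtype=float)), 1e-9)
    v = np.minimum(v_max, np.sqrt(a_lat / kappa))       # lateral limit

    def a_avail(i):  # longitudinal acceleration left on the friction circle
        frac = min(v[i] ** 2 * kappa[i] / a_lat, 1.0)
        return a_lon * np.sqrt(1.0 - frac ** 2)

    for i in range(len(v) - 1):                         # forward: accel limit
        v[i + 1] = min(v[i + 1], np.sqrt(v[i] ** 2 + 2 * a_avail(i) * ds))
    for i in range(len(v) - 2, -1, -1):                 # backward: decel limit
        v[i] = min(v[i], np.sqrt(v[i + 1] ** 2 + 2 * a_avail(i + 1) * ds))
    return v
```

Each pass enforces the kinematic relation v_{i+1}^2 <= v_i^2 + 2 a ds segment by segment, which is why the method scales linearly in the number of path samples.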
☆ Assessment of East-West (E-W) and South-North (S-N) facing Vertical Bifacial Photovoltaic Modules for Agrivoltaics and Dual-Land Use Applications in India
Deploying vertical bifacial PV modules can play a significant role in agrivoltaics, fencing walls, noise barriers, building integrated photovoltaics (BIPV), solar PV for electric vehicles, and many other applications. This research work presents the performance comparison of vertical bifacial photovoltaic (VBPV) modules facing East-West (E-W) and South-North (S-N) directions. Also, the VBPV modules are compared with vertical and tilted south-facing monofacial PV modules. Six PV modules (monofacial and bifacial) were installed at the rooftop of IIT Bhilai academic building, Raipur (21.16° N, 81.65° E), India, and studied for a year from May 2022 to April 2023. The results show that the E-W facing VBPV module gives two production peaks, one in the morning and another in the evening, as compared to the single notable rise at midday observed for a monofacial module. From a series of experiments, 19 days of data were collected over the one-year period from May 2022 to April 2023, with specific inclusion of important days like solstices and equinoxes. In addition, the energy generation results are compared with PVsyst simulations, while also addressing the limitations of the PVsyst simulation of vertical PV modules. E-W bifacial generation is higher than S-N bifacial and south-facing monofacial modules from February to April. The VBPV modules in E-W and S-N orientations present a promising opportunity for expanding the agrivoltaics sector in tropical and sub-tropical countries, like India. This has huge implications for addressing the sustainable development goals by simultaneously contributing to sustainable land management, green energy generation, energy security and water conservation in the vast geo-climatic expanse of tropics.
☆ Robust NbN on Si-SiGe hybrid superconducting-semiconducting microwave quantum circuit
Advancing large-scale quantum computing requires superconducting circuits that combine long coherence times with compatibility with semiconductor technology. We investigate niobium nitride (NbN) coplanar waveguide resonators integrated with Si/SiGe quantum wells, creating a hybrid platform designed for CMOS-compatible quantum hardware. Using temperature-dependent microwave spectroscopy in the single-photon regime, we examine resonance frequency and quality factor variations to probe the underlying loss mechanisms. Our analysis identifies the roles of two-level systems, quasiparticles, and scattering processes, and connects these losses to wafer properties and fabrication methods. The devices demonstrate reproducible performance and stable operation maintained for over two years, highlighting their robustness. These results provide design guidelines for developing low-loss, CMOS-compatible superconducting circuits and support progress toward resilient, scalable architectures for quantum information processing.
☆ Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40 °C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference - ACC 2026
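The key trick, that derivatives of GPs are again GPs, makes dQ/dV available in closed form. The sketch below computes the posterior mean of dQ/dV under a squared-exponential kernel; the hyperparameters here are placeholders, whereas the paper learns them from data and also reports credible intervals.

```python
import numpy as np

def rbf(a, b, ell, sf):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    d = a[:, None] - b[None, :]
    return sf ** 2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_dqdv_mean(V, Q, V_star, ell=0.05, sf=1.0, sn=1e-3):
    """Posterior mean of dQ/dV at query voltages V_star, using the
    derivative-GP property: cov(f'(v*), f(v)) is the kernel derivative
    d/dv* k(v*, v). Hyperparameters (ell, sf, sn) are illustrative."""
    K = rbf(V, V, ell, sf) + sn ** 2 * np.eye(len(V))
    alpha = np.linalg.solve(K, Q)
    d = V_star[:, None] - V[None, :]
    dK = -(d / ell ** 2) * rbf(V_star, V, ell, sf)   # kernel derivative
    return dK @ alpha
```

Because the derivative is obtained from the posterior rather than by differencing noisy samples, the estimate needs no ad hoc filtering before peak detection.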
☆ Preemptive Spatiotemporal Trajectory Adjustment for Heterogeneous Vehicles in Highway Merging Zones
To address drivers' perception lag and the inefficient use of spatiotemporal resources in expressway ramp merging zones, this work builds on a preemptive spatiotemporal trajectory adjustment system and, from the perspective of coordinating spatiotemporal resources, quantitatively analyzes the appropriate safe spatiotemporal gap for trajectory pre-adjustment. The minimum safety gap required for ramp vehicles to merge into the mainline is derived by introducing dual positioning errors and spatiotemporal trajectory tracking errors. A merging control strategy for heterogeneous autonomous vehicles is proposed that integrates vehicle type, driving intention, and safe spatiotemporal distance, and the merging strategies of ramp target vehicles and mainline cooperative vehicles are systematically described for different vehicle types. Simulations are run over the full combination of traffic flow and speed scenarios. By comparing time-position-speed diagrams, vehicle operating characteristics and dynamic merging differences are analyzed qualitatively, and average speed and average delay serve as evaluation indices to quantify the performance advantages of the preemptive cooperative merging control strategy. The results show maximum average delay improvements of 90.24% and 74.24% for mainline and ramp vehicles, respectively. The proposed strategy effectively avoids potential vehicle conflicts and emergency braking, improves driving safety in the merging zone, and shows significant advantages in driving stability and overall traffic efficiency.
☆ LiDAR Point Cloud Colourisation Using Multi-Camera Fusion and Low-Light Image Enhancement
In recent years, the fusion of camera data with LiDAR measurements has emerged as a powerful approach to enhance spatial understanding. This study introduces a novel, hardware-agnostic methodology that generates colourised point clouds from mechanical LiDAR using multiple camera inputs, providing complete 360-degree coverage. The primary innovation lies in its robustness under low-light conditions, achieved through the integration of a low-light image enhancement module within the fusion pipeline. The system requires initial calibration to determine intrinsic camera parameters, followed by automatic computation of the geometric transformation between the LiDAR and cameras, removing the need for specialised calibration targets and streamlining the setup. The data processing framework uses colour correction to ensure uniformity across camera feeds before fusion. The algorithm was tested using a Velodyne Puck Hi-Res LiDAR and a four-camera configuration. The optimised software achieved real-time performance and reliable colourisation even under very low illumination, successfully recovering scene details that would otherwise remain undetectable.
☆ Intelligent Multi-link EDCA Optimization for Delay-Bounded QoS in Wi-Fi 7
IEEE 802.11be (Wi-Fi 7) introduces Multi-Link Operation (MLO). While MLO offers significant parallelism and capacity, realizing its full potential in guaranteeing strict delay bounds and optimizing Quality of Service (QoS) for diverse, heterogeneous traffic streams in complex multi-link scenarios remains a significant challenge. This is largely due to the limitations of static Enhanced Distributed Channel Access (EDCA) parameters and the complexity inherent in cross-link traffic management. To address this, this paper investigates the correlation between overall MLO QoS indicators and the configuration of EDCA parameters and Access Category (AC) traffic allocation among links. Based on this analysis, we formulate a constrained optimization problem that minimizes the sum of overall packet loss rates for all access categories while satisfying their respective overall delay violation probability constraints. A Genetic Algorithm (GA)-based MLO EDCA QoS optimization algorithm is designed to efficiently search the complex configuration space of AC assignments and EDCA parameters. Experimental results demonstrate the proposed approach's efficacy in generating adaptive MLO configuration strategies that align with diverse service requirements. The proposed solution significantly improves delay distribution characteristics, and enhances QoS robustness and resource utilization efficiency in high-load MLO environments.
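As a rough sketch of the search component, the snippet below implements a generic real-coded genetic algorithm; the paper's actual chromosome encoding of AC assignments and EDCA parameters, and its packet-loss/delay objective, are not public, so `cost`, `lo`, and `hi` are placeholders.

```python
import numpy as np

def ga_minimize(cost, lo, hi, pop=40, gens=100, elite=4, mut=0.1, seed=0):
    """Generic real-coded GA with elitism, blend crossover, and
    Gaussian mutation. `cost` stands in for the penalized packet-loss
    objective; `lo`/`hi` bound the parameter vector."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    X = rng.uniform(lo, hi, size=(pop, len(lo)))
    for _ in range(gens):
        X = X[np.argsort([cost(x) for x in X])]     # rank by fitness
        parents, kids = X[: pop // 2], []
        while len(kids) < pop - elite:
            a, b = parents[rng.integers(len(parents), size=2)]
            w = rng.random(len(lo))
            child = w * a + (1 - w) * b             # blend crossover
            child += mut * (hi - lo) * rng.standard_normal(len(lo))
            kids.append(np.clip(child, lo, hi))
        X = np.vstack([X[:elite], kids])
    return min(X, key=cost)
```

Delay-violation constraints would typically enter `cost` as penalty terms, which keeps the GA itself unconstrained.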
☆ Dynamic Causal Attack Graph based Cyber-security Risk Assessment Framework for CTCS System
Protecting the security of the train control system is critical to ensuring the safe and reliable operation of high-speed trains. Scientific modeling and analysis of security risk are a promising way to guarantee system security. However, representing and assessing multi-stage, causally related, and temporally dynamic attack dependencies in the train control system is difficult. To address these challenges, this paper introduces a security assessment framework based on the Dynamic Causal Attack Graph (DCAG) model. First, the DCAG model is generated from the attack graph, taking into account temporal attack propagation and the causal propagation of multi-stage attack events. Then, the DCAG model is analyzed via Bayesian inference and logic-gate-based inference. The framework is validated through a case analysis of the CTCS-3 system. With the DCAG-based security assessment framework, we can not only quantify security risk appropriately but also explore how much individual attacks contribute to overall system risk, which is helpful for adjusting cyber security defense policy.
☆ Ingress Cryogenic Receivers Toward Scalable Quantum Information Processing: Theory and System Analysis
Current control techniques for cryogenically cooled qubits are realized with coaxial cables, posing multiple challenges in terms of cost, thermal load, size, and long-term scalability. Emerging approaches to tackle this issue include cryogenic CMOS electronics at 4 K, and photonic links for direct qubit control. In this paper, we propose a multiplexed all-passive cryogenic high frequency direct detection control platform (cryo-HFDD). The proposed classical interface for direct qubit control utilizes optical or sub-THz bands. We present the possible tradeoffs of this platform, and compare it with current state-of-the-art cryogenic CMOS and conventional coaxial approaches. We assess the feasibility of adopting these efficient links for a wide range of microwave qubit power levels. Specifically, we estimate the heat load to achieve the required signal-to-noise ratio (SNR) considering different noise sources, component losses, as well as link density. We show that multiplexed photonic receivers at 4 K can aggressively scale the control of thousands of qubits. This opens the door for low cost scalable quantum computing systems.
☆ Beyond Point Estimates: Likelihood-Based Full-Posterior Wireless Localization
Modern wireless systems require not only position estimates, but also quantified uncertainty to support planning, control, and radio resource management. We formulate localization as posterior inference of an unknown transmitter location from receiver measurements. We propose Monte Carlo Candidate-Likelihood Estimation (MC-CLE), which trains a neural scoring network using Monte Carlo sampling to compare true and candidate transmitter locations. We show that in line-of-sight simulations with a multi-antenna receiver, MC-CLE learns critical properties including angular ambiguity and front-to-back antenna patterns. MC-CLE also achieves lower cross-entropy loss than uniform-baseline and Gaussian-posterior alternatives under a uniform-loss metric.
☆ Pinching-Antenna Systems (PASS)-Enabled UAV Delivery
A pinching-antenna systems (PASS)-enabled unmanned aerial vehicle (UAV) delivery framework is proposed, which exploits the capability of PASS to establish a strong line-of-sight link and reduce free-space pathloss. Aiming at minimizing the communication energy consumption in one cycle, a double-layer optimization (DLO) algorithm is developed by jointly optimizing the UAV delivery sequence and the pinching antenna (PA) activation vector. More specifically, at the outer layer, a hierarchical alternating optimization (HAO) scheme is proposed to tackle the NP-hard problem of delivery sequence planning, where a genetic algorithm performs global exploration to generate candidate solutions at the top-level, while a dynamic programming performs local refinement to obtain elite solutions at the lower-level. With determined UAV trajectory, at the inner layer, focus is placed on addressing the highly coupled mixed-integer nonlinear programming problem of PA activation vector optimization, where a pair of algorithms are proposed: 1) Branch-and-Bound (BnB) algorithm for finding global optimum; 2) incremental search and local refinement (ISLR) algorithm for reducing computational complexity. Simulation results indicate that: i) The proposed HAO-based delivery sequence planning scheme can effectively reduce the total flight distance, thereby decreasing flight time and communication energy consumption; ii) Both the proposed BnB and ISLR algorithms can achieve energy-efficient PA activation, with the former exhibiting better performance and the latter having lower complexity; iii) PASS outperforms the conventional multi-antenna systems, especially with higher communication rate requirements.
☆ Policy Optimization in Robust Control: Weak Convexity and Subgradient Methods
Robust control seeks stabilizing policies that perform reliably under adversarial disturbances, with $\mathcal{H}_\infty$ control as a classical formulation. It is known that policy optimization of robust $\mathcal{H}_\infty$ control naturally leads to nonsmooth and nonconvex problems. This paper builds on recent advances in nonsmooth optimization to analyze discrete-time static output-feedback $\mathcal{H}_\infty$ control. We show that the $\mathcal{H}_\infty$ cost is weakly convex over any convex subset of a sublevel set. This structural property allows us to establish the first non-asymptotic deterministic convergence rate for the subgradient method under suitable assumptions. In addition, we prove a weak Polyak-Łojasiewicz (PL) inequality in the state-feedback case, implying that all stationary points are globally optimal. We finally present a few numerical examples to validate the theoretical results.
comment: 9 pages, 11 figures
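For readers unfamiliar with the method analyzed here, the snippet below is a generic subgradient method with diminishing step sizes, demonstrated on a simple nonsmooth objective; the paper's actual objective is the $\mathcal{H}_\infty$ cost over stabilizing output-feedback gains, which is not reproduced here.

```python
import numpy as np

def subgradient_method(f_and_g, x0, steps=2000, c=1.0):
    """Subgradient method with diminishing steps c/sqrt(k), tracking
    the best iterate, as is standard for nonsmooth (weakly convex)
    objectives; f_and_g returns a value and one subgradient."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), np.inf
    for k in range(1, steps + 1):
        f, g = f_and_g(x)
        if f < best_f:
            best_x, best_f = x.copy(), f
        x = x - (c / np.sqrt(k)) * g
    return best_x, best_f

# Toy nonsmooth objective f(x) = |x1| + 2|x2| (minimum at the origin)
f_sg = lambda x: (abs(x[0]) + 2 * abs(x[1]),
                  np.array([np.sign(x[0]), 2 * np.sign(x[1])]))
print(subgradient_method(f_sg, [3.0, -2.0]))
```

For weakly convex costs the iterates need not decrease monotonically, which is why the best-so-far iterate is returned rather than the last one.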
☆ Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN
Ensuring power grid resilience requires the timely and unsupervised detection of anomalies in synchrophasor data streams. We introduce T-BiGAN, a novel framework that integrates window-attention Transformers within a bidirectional Generative Adversarial Network (BiGAN) to address this challenge. Its self-attention encoder-decoder architecture captures complex spatio-temporal dependencies across the grid, while a joint discriminator enforces cycle consistency to align the learned latent space with the true data distribution. Anomalies are flagged in real-time using an adaptive score that combines reconstruction error, latent space drift, and discriminator confidence. Evaluated on a realistic hardware-in-the-loop PMU benchmark, T-BiGAN achieves an ROC-AUC of 0.95 and an average precision of 0.996, significantly outperforming leading supervised and unsupervised methods. It shows particular strength in detecting subtle frequency and voltage deviations, demonstrating its practical value for live, wide-area monitoring without relying on manually labeled fault data.
♻ ☆ Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to American Control Conference 2026
♻ ☆ A Hybrid Systems Model of Feedback Optimization for Linear Systems: Convergence and Robustness
Feedback optimization algorithms compute inputs to a system using real-time output measurements, which helps mitigate the effects of disturbances. However, existing work often models both system dynamics and computations in either discrete or continuous time, which may not accurately model some applications. In this work, we model linear system dynamics in continuous time, and we model the computations of inputs in discrete time. Therefore, we present a novel hybrid systems model of feedback optimization. We first establish the well-posedness of this hybrid model and establish completeness of solutions while ruling out Zeno behavior. Then we show the state of the system converges exponentially fast to a ball of known radius about a desired goal state. Next we analytically show that this system is robust to perturbations in (i) the values of measured outputs, (ii) the matrices that model the linear time-invariant system, and (iii) the times at which inputs are applied to the system. Simulation results confirm that this approach successfully mitigates the effects of disturbances.
comment: 15 Pages, 2 Figures, submitted to American Control Conference 2026
♻ ☆ MAPPO for Edge Server Monitoring
In this paper, we consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet remains challenging under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions upon successful or failed job completion. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Experiments show that MAPPO achieves superior throughput-cost tradeoffs and significantly outperforms baseline strategies across varying query costs, job arrival rates, and dispatchers.
comment: 6 pages, 4 figures. Accepted to IEEE MILCOM 2025
♻ ☆ Robot Conga: A Leader-Follower Walking Approach to Sequential Path Following in Multi-Agent Systems
Coordinated path following in multi-agent systems is a key challenge in robotics, with applications in automated logistics, surveillance, and collaborative exploration. Traditional formation control techniques often rely on time-parameterized trajectories and path integrals, which can result in synchronization issues and rigid behavior. In this work, we address the problem of sequential path following, where agents maintain fixed spatial separation along a common trajectory, guided by a leader under centralized control. We introduce Robot Conga, a leader-follower control strategy that updates each agent's desired state based on the leader's spatial displacement rather than time, assuming access to a global position reference, an assumption valid in indoor environments equipped with motion capture, vision-based tracking, or UWB localization systems. The algorithm was validated in simulation using both TurtleBot3 and quadruped (Laikago) robots. Results demonstrate accurate trajectory tracking, stable inter-agent spacing, and fast convergence, with all agents aligning within 250 time steps (approx. 0.25 seconds) in the quadruped case, and almost instantaneously in the TurtleBot3 implementation.
comment: 6 Pages, 8 Figures. Both authors have contributed equally
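The core spatial-update idea is compact enough to sketch: each follower's target is the path point a fixed arc length behind the leader's current displacement, so followers respond to where the leader is rather than to a clock. The function names and sampled-path representation below are illustrative, not the authors' implementation.

```python
import numpy as np

def follower_targets(path_xy, s_path, s_leader, n_followers, spacing):
    """Desired follower positions indexed by the leader's displacement
    along the path (arc length), not by time. `path_xy` is an (N, 2)
    sampled reference path and `s_path` its cumulative arc length."""
    targets = []
    for i in range(1, n_followers + 1):
        s_i = max(s_leader - i * spacing, 0.0)   # fixed spatial separation
        targets.append((np.interp(s_i, s_path, path_xy[:, 0]),
                        np.interp(s_i, s_path, path_xy[:, 1])))
    return targets
```

Because the targets depend only on the leader's arc-length progress, a leader that slows down or stops simply freezes the formation instead of desynchronizing it.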
♻ ☆ Edge Server Monitoring for Job Assignment
In this paper, we study a goal-oriented communication problem for edge server monitoring, where compute jobs arrive intermittently at dispatchers and must be immediately assigned to distributed edge servers. Due to competing workloads and the dynamic nature of the edge environment, server availability fluctuates over time. To maintain accurate estimates of server availability states, each dispatcher updates its belief using two mechanisms: (i) active queries over shared communication channels and (ii) feedback from past job executions. We formulate a query scheduling problem that maximizes the job success rate under limited communication resources for queries. This problem is modeled as a Restless Multi-Armed Bandit (RMAB) with multiple actions and addressed using a Net-Gain Maximization (NGM) scheduling algorithm, which selects servers to query based on their expected improvement in execution performance. Simulation results show that the proposed NGM Policy significantly outperforms baseline strategies, achieving up to a 30% gain over the Round-Robin Policy and up to a 107% gain over the Never-Query Policy.
comment: Accepted to IEEE MILCOM 2025 (Networking Protocols and Performance Track), 6 pages, 2 figures
♻ ☆ On Fixed-Time Stability for a Class of Singularly Perturbed Systems using Composite Lyapunov Functions
Fixed-time stable dynamical systems are capable of achieving exact convergence to an equilibrium point within a fixed time that is independent of the initial conditions of the system. This property makes them highly appealing for designing control, estimation, and optimization algorithms in applications with stringent performance requirements. However, the set of tools available for analyzing the interconnection of fixed-time stable systems is rather limited compared to their asymptotic counterparts. In this paper, we address some of these limitations by exploiting the emergence of multiple time scales in nonlinear singularly perturbed dynamical systems, where the fast dynamics and the slow dynamics are fixed-time stable on their own. By extending the so-called composite Lyapunov method from asymptotic stability to the context of fixed-time stability, we provide a novel class of Lyapunov-based sufficient conditions to certify fixed-time stability in a class of singularly perturbed dynamical systems. The results are illustrated, analytically and numerically, using a fixed-time gradient flow system interconnected with a fixed-time plant and an additional high-order example.
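As a point of reference, the canonical scalar fixed-time stable system and its initial-condition-independent settling-time bound (a standard textbook example, not the paper's interconnected system) read:

```latex
% Canonical scalar fixed-time stable flow: with gains k_1, k_2 > 0 and
% exponents 0 < a < 1 < b,
\dot{x} = -k_1\,|x|^{a}\,\mathrm{sign}(x) - k_2\,|x|^{b}\,\mathrm{sign}(x),
\qquad
T(x_0) \le \frac{1}{k_1(1-a)} + \frac{1}{k_2(b-1)} \quad \forall x_0 .
```

The paper's setting interconnects two such fixed-time stable subsystems on different time scales and certifies the combination with a composite Lyapunov function.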
♻ ☆ Ocean Diviner: A Diffusion-Augmented Reinforcement Learning Framework for AUV Robust Control in Underwater Tasks
Autonomous Underwater Vehicles (AUVs) are essential for marine exploration, yet their control remains highly challenging due to nonlinear dynamics and uncertain environmental disturbances. This paper presents a diffusion-augmented Reinforcement Learning (RL) framework for robust AUV control, aiming to improve AUV's adaptability in dynamic underwater environments. The proposed framework integrates two core innovations: (1) A diffusion-based action generation framework that produces physically feasible and high-quality actions, enhanced by a high-dimensional state encoding mechanism combining current observations with historical states and actions through a novel diffusion U-Net architecture, significantly improving long-horizon planning capacity for robust control. (2) A sample-efficient hybrid learning architecture that synergizes diffusion-guided exploration with RL policy optimization, where the diffusion model generates diverse candidate actions and the RL critic selects the optimal action, achieving higher exploration efficiency and policy stability in dynamic underwater environments. Extensive simulation experiments validate the framework's superior robustness and flexibility, outperforming conventional control methods in challenging marine conditions, offering enhanced adaptability and reliability for AUV operations in underwater tasks. Finally, we will release the code publicly soon to support future research in this area.
comment: Jingzehua Xu, Guanwen Xie and Weiyi Liu contributed equally to this work
♻ ☆ Data-Driven Estimation of Structured Singular Values
Estimating the size of the modeling error is crucial for robust control. Over the years, numerous metrics have been developed to quantify the model error in a control relevant manner. One of the most important such metrics is the structured singular value, as it leads to necessary and sufficient conditions for ensuring stability and robustness in feedback control under structured model uncertainty. Although the computation of the structured singular value is often intractable, lower and upper bounds for it can often be obtained if a model of the system is known. In this paper, we introduce a fully data-driven method to estimate a lower bound for the structured singular value, by conducting experiments on the system and applying power iterations to the collected data. Our numerical simulations demonstrate that this method effectively lower bounds the structured singular value, yielding results comparable to the MATLAB© Robust Control Toolbox.
comment: Published in IEEE Control Systems Letters, vol. 9, pp. 1976-1981, 2025
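The underlying mechanism is easiest to see in the unstructured case: power iterations on M^T M estimate the top singular value using only input-output evaluations ("experiments"). The sketch below shows this simplified variant; the paper's method adds structure-dependent steps to lower-bound the structured singular value, and availability of an adjoint experiment is assumed here.

```python
import numpy as np

def top_gain(apply_M, apply_MT, n, iters=100, seed=0):
    """Power iteration on M^T M using only forward/adjoint evaluations
    of an unknown linear map; plain top-singular-value estimate, not
    the paper's structured variant."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    s = 0.0
    for _ in range(iters):
        v = apply_MT(apply_M(u))   # one forward and one adjoint experiment
        s = np.linalg.norm(v)      # converges to sigma_max^2
        u = v / s
    return np.sqrt(s)

# Sanity check against an SVD on a known matrix
M = np.random.default_rng(1).standard_normal((5, 4))
print(top_gain(lambda x: M @ x, lambda y: M.T @ y, 4),
      np.linalg.svd(M, compute_uv=False)[0])
```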
♻ ☆ Spatial exponential decay of perturbations in optimal control of general evolution equations
We analyze the robustness of optimally controlled evolution equations with respect to spatially localized perturbations. We prove that if the involved operators are domain-uniformly stabilizable and detectable, then these localized perturbations only have a local effect on the optimal solution. We characterize this domain-uniform stabilizability and detectability for the transport equation with constant transport velocity, showing that even for unitary semigroups, optimality implies exponential damping. We extend this result to the case of a space-dependent transport velocity. Finally we leverage the results for the transport equation to characterize domain-uniform stabilizability of the wave equation. Numerical examples in one space dimension complement the theoretical results.
comment: 53 pages, 5 figures
♻ ☆ Simulated Annealing for Multi-Robot Ergodic Information Acquisition Using Graph-Based Discretization
One of the goals of active information acquisition using multi-robot teams is to keep the relative uncertainty in each region at the same level to maintain identical acquisition quality (e.g., consistent target detection) in all the regions. To achieve this goal, ergodic coverage can be used to assign the number of samples according to the quality of observation, i.e., sampling noise levels. However, the noise levels are unknown to the robots. Although this noise can be estimated from samples, the estimates are unreliable at first and can generate fluctuating values. The main contribution of this paper is to use simulated annealing to generate the target sampling distribution, starting from uniform and gradually shifting to an estimated optimal distribution, by varying the coldness parameter of a Boltzmann distribution with the estimated sampling entropy as energy. Simulation results show a substantial improvement of both transient and asymptotic entropy compared to both uniform and direct-ergodic searches. Finally, a demonstration is performed with a TurtleBot swarm system to validate the physical applicability of the algorithm.
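A minimal sketch of the annealed target distribution: with the estimated sampling entropy playing the role of energy, the coldness parameter interpolates from uniform sampling toward the estimated-optimal allocation. The sign convention below (more samples where estimated entropy is higher) is an assumption for illustration.

```python
import numpy as np

def target_distribution(entropy_est, beta):
    """Boltzmann target over regions with estimated sampling entropy as
    energy; beta = 0 gives the uniform distribution, large beta
    concentrates samples where estimated entropy is high."""
    e = np.asarray(entropy_est, dtype=float)
    w = np.exp(beta * (e - e.max()))        # subtract max for stability
    return w / w.sum()

# Annealing: raise the coldness as the noise estimates become reliable
for beta in np.linspace(0.0, 5.0, 6):
    print(beta, target_distribution([1.2, 0.4, 0.9], beta))
```

Starting cold-side-out (beta near zero) protects the swarm from committing to early, unreliable noise estimates, which is exactly the fluctuation problem the annealing schedule is meant to damp.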
♻ ☆ Event-Driven Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees
This paper presents a controller design framework aiming to balance control performance and actuation rate. Control performance is evaluated by an infinite-horizon average cost, and the number of control actions is penalized via sparsity-promoting regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to obtain a tractable suboptimal solution. In the proposed scheme, actuation timings are determined through a multistage minimization procedure based on a receding-horizon approach, and the corresponding control inputs are computed online. We establish theoretical performance guarantees with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
comment: 14 pages
♻ ☆ Parasitic actuation delay limits the minimum employable time headway in connected and autonomous vehicles
Adaptive and cooperative adaptive cruise control (ACC and CACC) and next-generation CACC (CACC+) systems usually employ a constant time headway policy (CTHP) for platooning of connected and autonomous vehicles (CAVs). In ACC, the ego vehicle uses onboard sensors to measure the position and velocity of the predecessor vehicle to maintain a desired spacing. The CACC and CACC+ systems use additional information, such as the acceleration(s) of the predecessor vehicle(s) communicated through vehicle-to-vehicle (V2V) communication; these systems have been shown to result in improved spacing performance, throughput, and safety over ACC. Parasitic dynamics are generally difficult to model and the parasitic parameters (delay, lag, etc.) are difficult to obtain. Parasitic actuation delays can have deleterious effects and impose limits on the mobility and safety of CAVs. It is reasonable to assume that the bounds on parasitic actuation delays are known a priori. For CAVs, we need to address both internal stability and string stability in the presence of parasitic actuation delays. This requires robustness of string and internal stability for all values of parasitic actuation delays that are within the specified upper bound. In this paper, we provide the minimum employable time headway for ACC, CACC, and CACC+ (`r' predecessors look-ahead), respectively. The inclusion of the internal stability in the string stability condition is analyzed based on Pontryagin's interlacing theorem for time delay systems. We provide comparative numerical results to corroborate the achieved theoretical results.
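For concreteness, the constant time headway policy referenced throughout takes the standard spacing-error form below (the notation is generic, not necessarily the paper's):

```latex
% Constant time headway policy (standard form): spacing error of
% vehicle i behind its predecessor, with standstill distance d_0,
% vehicle length L, and time headway h_w (the quantity whose minimum
% employable value is bounded below by the parasitic actuation delay):
e_i(t) = x_{i-1}(t) - x_i(t) - L - d_0 - h_w\, v_i(t).
```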
♻ ☆ Neural Operator Feedback for a First-Order PIDE with Spatially-Varying State Delay
A transport PDE with a spatial integral and recirculation with constant delay has been a benchmark for neural operator approximations of PDE backstepping controllers. Introducing a spatially-varying delay into the model gives rise to a gain operator defined through integral equations which the operator's input -- the varying delay function -- enters in previously unencountered manners, including in the limits of integration and as the inverse of the `delayED time' function. This, in turn, introduces novel mathematical challenges in estimating the operator's Lipschitz constant. The backstepping kernel function having two branches endows the feedback law with a two-branch structure, where only one of the two feedback branches depends on both of the kernel branches. For this rich feedback structure, we propose a neural operator approximation of such a two-branch feedback law and prove the approximator to be semiglobally practically stabilizing. With numerical results we illustrate the training of the neural operator and its stabilizing capability.
comment: This 14 page paper contains 1 table and 9 figures
♻ ☆ Optimal transmission expansion modestly reduces decarbonization costs of U.S. electricity
Expanding interregional transmission is widely viewed as essential for integrating clean energy into decarbonized power systems. Using the open-source Switch capacity expansion model with detailed representation of existing U.S. generation and transmission infrastructure, solar, wind, and storage resources, and hourly operations, we evaluate the role of transmission across least-cost, socially optimal, and zero-emissions scenarios for 2050. An optimal nationwide plan would more than triple interregional transmission capacity, yet this reduces the cost of a zero-emissions system by only 7% relative to relying on existing transmission, as storage, solar and wind siting, and nuclear generation serve as close substitutes. Regional cost and rent effects vary, with transmission generally favoring wind and hydrogen resources over solar and batteries. Sensitivity analysis shows diminishing returns: one-fifth of the benefits of full expansion can be achieved with one-twelfth of the added capacity, while cost reductions for batteries and hydrogen provide comparable or greater system savings than transmission. Reconductoring -- quadrupling line capacity at half the cost of new builds -- achieves nearly all the benefits of unconstrained expansion. These results suggest that while substantial transmission expansion is economically justified, a diverse set of flexibility resources can substitute for large-scale grid build-out, and the relative value of transmission is highly contingent on technological and cost developments.
comment: 30 pages, 10 figures, 1 table in main paper. Additional 11 pages including 7 additional figures and one table in the appendix
♻ ☆ Optimized dynamic scheduling of an exclusive bus lane or high occupancy vehicle lane in a bimodal traffic corridor
Efficient management of traffic corridors is critical for sustaining urban mobility. Exclusive bus lanes (EBL) and high occupancy vehicle lanes (HOVL) are two prominent strategies for enhancing public transit services and alleviating congestion. EBLs prioritize bus transit by providing dedicated lanes for faster travel times, while HOVLs encourage carpooling by reserving lanes for high-occupancy vehicles. However, static implementations of these policies may underutilize road resources and disrupt general-purpose lanes. Dynamic implementation, based on real-time demand, can potentially maximize road efficiency and minimize negative impacts. This study first compares the mixed traffic policy (MTP), exclusive bus lane policy (EBLP), and high occupancy vehicle lane policy (HOVLP) by formulating the total system costs in the context of a bimodal traffic corridor involving private cars and public buses. Under each lane policy, the bus frequency is optimized together with the modal split equilibrium derived separately. Based on dynamic demand simulated using an Ornstein-Uhlenbeck (O-U) process, switching thresholds are then derived to identify optimal periods for implementing each policy. Results reveal significant reductions in total system costs with the proposed dynamic policy schedules. Compared to static implementations, the dynamic policy schedules achieve cost reductions of 11.9%, 6.70%, and 43.64% relative to MTP-only, EBLP-only, and HOVLP-only scenarios, respectively. Additionally, in two real case studies of existing EBL and HOVL operations in Seattle, the proposed dynamic policy reduces total costs by 32.5% and 28.6%, respectively. The findings provide valuable insights for policymakers and transit planners, offering a robust framework for dynamically scheduling and integrating EBL and HOVL policies to optimize urban corridor efficiency and reduce overall system costs.
comment: This version is identical to the manuscript submitted to International Journal of Sustainable Transportation. It corrects inconsistencies present in the previous arXiv version and is the version under journal review. No scientific content has been altered
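As a concrete illustration of the demand model named in the abstract, the sketch below simulates an Ornstein-Uhlenbeck demand trace with an Euler-Maruyama discretization and thresholds it into policy periods. The O-U parameters and switching thresholds are toy assumptions, not the paper's calibrated values.

```python
# Hedged sketch: O-U demand simulation and threshold-based policy switching.
import numpy as np

rng = np.random.default_rng(0)
dt, n = 1 / 60, 24 * 60                 # one-minute steps over a day [h]
theta, mu, sigma = 2.0, 3000.0, 800.0   # reversion rate, mean, volatility (assumed)
q = np.empty(n); q[0] = mu

for k in range(n - 1):  # Euler-Maruyama step of dq = theta*(mu - q)dt + sigma dW
    q[k + 1] = q[k] + theta * (mu - q[k]) * dt \
               + sigma * np.sqrt(dt) * rng.standard_normal()

low, high = 2500.0, 3500.0              # assumed switching thresholds [veh/h]
policy = np.where(q < low, "MTP", np.where(q < high, "HOVLP", "EBLP"))
print({p: int((policy == p).sum()) for p in ("MTP", "HOVLP", "EBLP")})
```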
♻ ☆ MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
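A minimal sketch of the measurement MathBode describes: fit first harmonics of the model's answers and of the exact solutions at the drive frequency, then read off gain and phase. The synthetic `model' trace below stands in for actual LLM outputs and is an assumption for illustration.

```python
# Hedged sketch: first-harmonic gain/phase fit at the drive frequency.
import numpy as np

def first_harmonic(t, y, freq):
    """Least-squares fit y ~ a*sin(2*pi*f*t) + b*cos(2*pi*f*t) + c."""
    X = np.column_stack([np.sin(2 * np.pi * freq * t),
                         np.cos(2 * np.pi * freq * t),
                         np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

freq = 1.0
t = np.linspace(0, 4, 400, endpoint=False)                # four drive periods
exact = 30 + 6.0 * np.sin(2 * np.pi * freq * t)           # exact solutions
model = 30 + 5.4 * np.sin(2 * np.pi * freq * t - 0.2)     # attenuated, lagged stand-in

a_m, b_m = first_harmonic(t, model, freq)
a_e, b_e = first_harmonic(t, exact, freq)
gain = np.hypot(a_m, b_m) / np.hypot(a_e, b_e)            # amplitude tracking
phase = np.arctan2(b_m, a_m) - np.arctan2(b_e, a_e)       # lag in radians
print(f"G = {gain:.2f}, phi = {phase:.2f} rad")           # -> G = 0.90, phi = -0.20
```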
Optimization and Control 40
☆ Profit Maximization for a Robotics-as-a-Service Model
The growth of Robotics-as-a-Service (RaaS) presents new operational challenges, particularly in optimizing business decisions like pricing and equipment management. While much research focuses on the technical aspects of RaaS, the strategic business problems of joint pricing and replacement have been less explored. This paper addresses the problem of profit maximization for a RaaS provider operating a single robot at a time. We formulate a model where jobs arrive sequentially, and for each, the provider must decide on a price, which the customer can accept or reject. Upon job completion, the robot undergoes stochastic degradation, increasing its probability of failure in future tasks. The operator must then decide whether to replace the robot, balancing replacement costs against future revenue potential and holding costs. To solve this complex sequential decision-making problem, we develop a framework that integrates data-driven estimation techniques inspired by survival analysis and inverse optimization to learn models of customer behavior and robot failure. These models are used within a Markov decision process (MDP) framework to compute an optimal policy for joint pricing and replacement. Numerical experiments demonstrate the efficacy of our approach in maximizing profit by adaptively managing pricing and robot lifecycle decisions.
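A hedged sketch of the pricing-and-replacement MDP structure the abstract describes, solved by value iteration. The acceptance model, failure curve, costs, and discount factor are all illustrative assumptions, not the paper's estimated quantities.

```python
# Hedged sketch: joint pricing/replacement as a small discounted MDP.
import numpy as np

S = 6                                   # degradation levels 0 (new) .. S-1 (worn)
prices = np.array([4.0, 6.0, 8.0])      # assumed price menu
accept = np.exp(-prices / 6.0)          # assumed customer acceptance probabilities
fail = np.linspace(0.02, 0.4, S)        # assumed failure prob. by degradation
c_replace, c_hold, c_fail, gamma = 20.0, 0.5, 10.0, 0.95

V = np.zeros(S)
for _ in range(500):                    # value iteration
    Vn = np.empty(S)
    for s in range(S):
        s2 = min(s + 1, S - 1)          # degradation level after a completed job
        # price actions: expected revenue minus failure risk, then degrade;
        # a rejected job leaves the state unchanged and incurs holding cost
        q_price = accept * (prices - fail[s] * c_fail) \
                  + gamma * (accept * V[s2] + (1 - accept) * (V[s] - c_hold))
        q_replace = -c_replace + gamma * V[0]   # replace before the next job
        Vn[s] = max(q_price.max(), q_replace)
    if np.max(np.abs(Vn - V)) < 1e-8:
        break
    V = Vn
print("value of a new robot:", round(V[0], 2))
```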
The Trajectory Bundle Method: Unifying Sequential-Convex Programming and Sampling-Based Trajectory Optimization
We present a unified framework for solving trajectory optimization problems in a derivative-free manner through the use of sequential convex programming. Traditionally, nonconvex optimization problems are solved by forming and solving a sequence of convex optimization problems, where the cost and constraint functions are approximated locally through Taylor series expansions. This presents a challenge for functions where differentiation is expensive or unavailable. In this work, we present a derivative-free approach to form these convex approximations by computing samples of the dynamics, cost, and constraint functions and letting the solver interpolate between them. Our framework includes sample-based trajectory optimization techniques like model-predictive path integral (MPPI) control as a special case and generalizes them to enable features like multiple shooting and general equality and inequality constraints that are traditionally associated with derivative-based sequential convex programming methods. The resulting framework is simple, flexible, and capable of solving a wide variety of practical motion planning and control problems.
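Since the abstract notes that MPPI arises as a special case, the sketch below shows the sampling half of the idea on a toy double integrator: roll out perturbed control sequences and combine them by exponentially weighted averaging. The bundle method's convex-program interpolation and general constraint handling are not reproduced here; costs, noise scale, and temperature are assumptions.

```python
# Hedged sketch: MPPI-style sampled trajectory optimization (special case).
import numpy as np

rng = np.random.default_rng(1)
H, K, lam, sigma, dt = 30, 256, 1.0, 0.5, 0.1  # horizon, samples, temperature, noise, step
u_nom = np.zeros(H)                            # nominal control sequence

def rollout_cost(u):
    """Double-integrator rollout; penalize distance to x = 1 and effort."""
    x = v = cost = 0.0
    for uk in u:
        v += uk * dt
        x += v * dt
        cost += (x - 1.0) ** 2 + 0.01 * uk ** 2
    return cost

for _ in range(50):                            # MPPI iterations
    eps = sigma * rng.standard_normal((K, H))
    costs = np.array([rollout_cost(u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    u_nom = u_nom + w @ eps                    # exponentially weighted update
print("terminal cost:", round(rollout_cost(u_nom), 4))
```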
☆ Optimal Matching Strategies in Two-sided Markets: A Mean Field Approach
This paper develops a mean field game framework for dynamic two-sided matching markets, extending existing matching theory by integrating micro-macro dynamics in two-sided environments. Unlike traditional matching models focusing on static equilibrium or unilateral optimization, our framework simultaneously captures dynamic interactions and strategic behaviors of both market sides, as well as the equilibrium. We model two types of agents who meet each other via Poisson processes and make simultaneous matching decisions to maximize their respective objective functionals, and find the corresponding equilibrium. Our approach formulates the equilibrium as a fully coupled Hamilton-Jacobi-Bellman and Fokker-Planck system with nonlocal structure coupling two distinct populations. The mathematical analysis addresses significant challenges from the dual-layered coupling structure and nonlocal structure. We also provide insights into individual behaviors shaping aggregate patterns in labor markets through numerical experiments.
☆ Global Optimization Algorithm for Mixed-Integer Nonlinear Programs with Trigonometric Functions
This article presents the first mixed-integer linear programming (MILP)-based iterative algorithm to solve factorable mixed-integer nonlinear programs (MINLPs) with bounded, differentiable periodic functions to global optimality with an emphasis on trigonometric functions. At each iteration, the algorithm solves a MILP relaxation of the original MINLP to obtain a bound on the optimal objective value. The relaxations are constructed using partitions of the variables involved in each nonlinear term and, across successive iterations, the solution of the relaxations is used to refine these partitions further, leading to tighter relaxations. Also, at each iteration, a heuristic/local solve on the MINLP is used to obtain a feasible solution to the MINLP. The algorithm iterates until the optimality gap is sufficiently small. This article proposes novel refinement strategies that first choose a subset of variables whose domain is refined, refinement schemes that specify the manner in which the variable domains are refined, and MILP relaxations that exploit the principal domain of the periodic functions. We also show how solving the resulting MILP relaxation may be accelerated when two or more periodic functions are related by a linking constraint. This is especially useful as any periodic function may be approximated to arbitrary precision by a Fourier series. Finally, we examine the effectiveness of the proposed approach by solving a path planning problem for a single fixed-wing aerial vehicle and present extensive numerical results comparing the various refinement schemes and techniques.
comment: 49 pages
☆ ORACLE: A rigorous metric and method to explore all near-optimal designs for energy systems
Optimization models are fundamental tools for providing quantitative insights to decision-makers. However, models, objectives, and constraints do not capture all real-world factors accurately. Thus, instead of the single optimal solution, real-world stakeholders are often interested in the near-optimal space -- solutions that lie within a specified margin of the optimal objective value. Solutions in the near-optimal space can then be assessed regarding desirable non-modeled or qualitative aspects. The near-optimal space is usually explored by so-called Modelling to Generate Alternatives (MGA) methods. However, current MGA approaches mainly employ heuristics, which do not measure or guarantee convergence. We propose a method called ORACLE, which guarantees generation and exploration of the \emph{entire} near-optimal space by exploiting convexity. ORACLE iteratively approximates the near-optimal space by introducing a metric that both measures convergence and suggests exploration directions. Once the approximations are refined to a desired tolerance, any near-optimal design can be generated with negligible computational effort. We compare our approach with existing methods on a sector-coupled energy system model of Switzerland. ORACLE is the only method able to guarantee convergence within a desired tolerance. Additionally, we show that heuristic MGA methods miss large areas of the near-optimal space, potentially skewing decision-making by leaving viable options for the energy transition off the table.
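For intuition about the object ORACLE operates on, the sketch below constructs a toy near-optimal space -- the feasible set intersected with the budget constraint c@x <= (1+eps)*opt -- and probes it along exploration directions, as MGA-type methods do. ORACLE's convergence metric itself is not reproduced, and the problem data are assumptions.

```python
# Hedged sketch: building and probing a near-optimal space with LPs.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])                              # system cost coefficients
A = np.array([[-1.0, -1.0]]); b = np.array([-1.0])    # x1 + x2 >= 1
bounds = [(0, 2), (0, 2)]

opt = linprog(c, A_ub=A, b_ub=b, bounds=bounds).fun   # optimal objective value
eps = 0.1
A_no = np.vstack([A, c])                              # add cost-budget row
b_no = np.append(b, (1 + eps) * opt)                  # near-optimal margin

for d in (np.array([1.0, -1.0]), np.array([-1.0, 1.0])):  # exploration directions
    res = linprog(d, A_ub=A_no, b_ub=b_no, bounds=bounds)
    print("extreme near-optimal design:", np.round(res.x, 3))
```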
☆ Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning
The Robbins-Siegmund theorem establishes the convergence of stochastic processes that are almost supermartingales and is foundational for analyzing a wide range of stochastic iterative algorithms in stochastic approximation and reinforcement learning (RL). However, its original form has a significant limitation as it requires the zero-order term to be summable. In many important RL applications, this summable condition, however, cannot be met. This limitation motivates us to extend the Robbins-Siegmund theorem for almost supermartingales where the zero-order term is not summable but only square summable. Particularly, we introduce a novel and mild assumption on the increments of the stochastic processes. This together with the square summable condition enables an almost sure convergence to a bounded set. Additionally, we further provide almost sure convergence rates, high probability concentration bounds, and $L^p$ convergence rates. We then apply the new results in stochastic approximation and RL. Notably, we obtain the first almost sure convergence rate, the first high probability concentration bound, and the first $L^p$ convergence rate for $Q$-learning with linear function approximation.
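For reference, the classical theorem being extended can be stated as follows; per the abstract, the paper relaxes the summability of the zero-order term $b_t$ below to square summability.

```latex
% Classical Robbins-Siegmund theorem (reference statement):
\[
  \mathbb{E}\left[V_{t+1} \mid \mathcal{F}_t\right]
    \le (1 + a_t)\,V_t + b_t - c_t,
  \qquad a_t, b_t, c_t \ge 0,\quad
  \sum_t a_t < \infty,\ \ \sum_t b_t < \infty \ \text{a.s.}
\]
\[
  \Longrightarrow\quad V_t \to V_\infty < \infty \ \text{a.s.}
  \quad\text{and}\quad \sum_t c_t < \infty \ \text{a.s.}
\]
```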
☆ Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new \underline{F}orward-\underline{B}ackward algorithm with \underline{G}eneric \underline{A}cceleration constraints, which achieves both high accuracy and low computation time. FBGA performs forward and backward passes that maximize the velocity profile over short, discretized path segments, while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source \texttt{C++} implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
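A minimal sketch of the classic forward-backward pass that FBGA generalizes, here with simple box acceleration limits instead of the paper's generic constraint sets. The curvature profile and limits are toy assumptions.

```python
# Hedged sketch: forward-backward velocity profile with box accel limits.
import numpy as np

ds = 1.0                                                       # path step [m]
kappa = 0.02 * (1 + np.sin(np.linspace(0, 4 * np.pi, 200)))    # toy curvature
a_lat_max, a_lon_max, a_brake = 8.0, 4.0, 6.0                  # assumed limits

v = np.sqrt(a_lat_max / np.maximum(np.abs(kappa), 1e-9))       # curvature cap
v = np.minimum(v, 60.0)                                        # top-speed cap

for i in range(len(v) - 1):                # forward pass: acceleration limit
    v[i + 1] = min(v[i + 1], np.sqrt(v[i] ** 2 + 2 * a_lon_max * ds))
for i in range(len(v) - 2, -1, -1):        # backward pass: braking limit
    v[i] = min(v[i], np.sqrt(v[i + 1] ** 2 + 2 * a_brake * ds))

lap_time = np.sum(ds / v[:-1])
print(f"profile min/max: {v.min():.1f}/{v.max():.1f} m/s, time: {lap_time:.1f} s")
```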
☆ FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
Recently, a new optimization method based on the linear minimization oracle (LMO), called Muon, has been attracting increasing attention since it can train neural networks faster than existing adaptive optimization methods, such as Adam. In this paper, we study how Muon can be utilized in federated learning. We first show that straightforwardly using Muon as the local optimizer of FedAvg does not converge to the stationary point since the LMO is a biased operator. We then propose FedMuon, which mitigates this issue. We also analyze how solving the LMO approximately affects the convergence rate and find that, surprisingly, FedMuon can converge for any number of Newton-Schulz iterations, while it converges faster as we solve the LMO more accurately. Through experiments, we demonstrate that FedMuon outperforms state-of-the-art federated learning methods.
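For context, Muon's LMO step orthogonalizes a matrix gradient, typically via Newton-Schulz iteration; the abstract's convergence claim is parameterized by the number of such iterations. The sketch below uses the simple cubic variant (Muon itself uses a tuned polynomial), so the coefficients and step count here are illustrative.

```python
# Hedged sketch: cubic Newton-Schulz orthogonalization of a matrix gradient.
import numpy as np

def newton_schulz(G, steps=20):
    """Approximate the orthogonal polar factor U of G = U S V^T."""
    X = G / (np.linalg.norm(G) + 1e-12)   # scale so all singular values < sqrt(3)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X   # cubic Newton-Schulz update
    return X                              # approximate; improves with more steps

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 8))
U = newton_schulz(G)
print("orthogonality error:", np.linalg.norm(U.T @ U - np.eye(8)))
```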
☆ Gain-Scheduled Data-Enabled Predictive Control: A DeePC Approach for Nonlinear Systems
Model predictive control is a well-established control technology for trajectory tracking. Its use requires the availability of an accurate model of the plant, but obtaining such a model is often time-consuming and costly. Data-Enabled Predictive Control (DeePC) addresses this shortcoming in the linear time-invariant setting by skipping the model-building step and instead relying directly on input-output data. Unfortunately, many real systems are nonlinear and exhibit strong operating-point dependence. Building on classical linear parameter-varying control, we introduce DeePC-GS, a gain-scheduled extension of DeePC for unknown, regime-varying systems. The key idea is to allow DeePC to switch between different local Hankel matrices -- selected online via a measurable scheduling variable -- thereby uniting classical gain scheduling tools with identification-free, data-driven MPC. We test the effectiveness of our DeePC-GS formulation on a nonlinear ship-steering benchmark, demonstrating that it outperforms state-of-the-art data-driven MPC while maintaining tractable computation.
comment: 6 pages
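A minimal sketch of the per-regime ingredient DeePC-GS switches between: block Hankel matrices built from local input-output data, indexed by a measured scheduling variable. The predictive-control layer on top is omitted, and the data, depth, and regime name are toy assumptions.

```python
# Hedged sketch: per-regime Hankel matrices for a gain-scheduled library.
import numpy as np

def hankel(w, L):
    """Stack all length-L windows of signal w as columns (depth-L Hankel)."""
    T = len(w)
    return np.column_stack([w[i:i + L] for i in range(T - L + 1)])

rng = np.random.default_rng(2)
u = rng.standard_normal(100)                 # recorded inputs for one regime
y = np.convolve(u, [0.5, 0.3, 0.1])[:100]    # corresponding outputs (toy LTI)

library = {"low_speed": (hankel(u, 8), hankel(y, 8))}  # one entry per regime
scheduling_var = "low_speed"                 # measured online, e.g. ship speed
Hu, Hy = library[scheduling_var]             # local data for the DeePC problem
print("Hankel shapes:", Hu.shape, Hy.shape)
```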
☆ Error bounds for perspective cones of a class of nonnegative Legendre functions
Error bounds play a central role in the study of conic optimization problems, including the analysis of convergence rates for numerous algorithms. Curiously, those error bounds are often H\"olderian with exponent 1/2. In this paper, we try to explain the prevalence of the 1/2 exponent by investigating generic properties of error bounds for conic feasibility problems where the underlying cone is a perspective cone constructed from a nonnegative Legendre function on $\mathbb{R}$. Our analysis relies on the facial reduction technique and the computation of one-step facial residual functions (1-FRFs). Specifically, under appropriate assumptions on the Legendre function, we show that 1-FRFs can be taken to be H\"olderian of exponent 1/2 almost everywhere with respect to the two-dimensional Hausdorff measure. This enables us to further establish that having a uniform H\"olderian error bound with exponent 1/2 is a generic property for a class of feasibility problems involving these cones.
comment: 45 pages, comments welcome
☆ A Single-Loop Gradient Algorithm for Pessimistic Bilevel Optimization via Smooth Approximation
Bilevel optimization has garnered significant attention in the machine learning community recently, particularly regarding the development of efficient numerical methods. While substantial progress has been made in developing efficient algorithms for optimistic bilevel optimization, the study of methods for solving Pessimistic Bilevel Optimization (PBO) remains relatively less explored, especially the design of fully first-order, single-loop gradient-based algorithms. This paper aims to bridge this research gap. We first propose a novel smooth approximation to the PBO problem, using penalization and regularization techniques. Building upon this approximation, we then propose SiPBA (Single-loop Pessimistic Bilevel Algorithm), a new gradient-based method specifically designed for PBO which avoids second-order derivative information or inner-loop iterations for subproblem solving. We provide theoretical validation for the proposed smooth approximation scheme and establish theoretical convergence for the algorithm SiPBA. Numerical experiments on synthetic examples and practical applications demonstrate the effectiveness and efficiency of SiPBA.
☆ A Block-Activated Decomposition Algorithm for Multi-Stage Stochastic Variational Inequalities
We develop a block-activated decomposition algorithm for multi-stage stochastic variational inequalities with nonanticipativity constraints, which offers two computational novelties: (i) At each iteration, our method activates only a user-chosen block of scenarios. (ii) For each activated scenario, it employs the resolvent of the cost operator and the projector onto the constraint set separately. These features enhance computational tractability, in contrast with existing approaches, which often rely on evaluating the resolvent of the sum of the cost operator and normal cone operator of the constraint set. As an application, we demonstrate that in risk-averse stochastic programming with conditional value-at-risk objectives, our method requires only the projections onto constraint sets, together with solving a univariate equation involving the proximity operators of the cost functions.
☆ EVODiff: Entropy-aware Variance Optimized Diffusion Inference NeurIPS 2025
Diffusion models (DMs) excel in image generation, but suffer from slow inference and training-inference discrepancies. Although gradient-based solvers like DPM-Solver accelerate the denoising inference, they lack theoretical foundations in information transmission efficiency. In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. This principle leads to our key insights into the inference processes: (1) data prediction parameterization outperforms its noise counterpart, and (2) optimizing conditional variance offers a reference-free way to minimize both transition and reconstruction errors. Based on these insights, we propose an entropy-aware variance optimized method for the generative process of DMs, called EVODiff, which systematically reduces uncertainty by optimizing conditional entropy during denoising. Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5\% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25\% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts. Code is available at https://github.com/ShiguiLi/EVODiff.
comment: NeurIPS 2025, 40 pages, 14 figures
☆ A Martingale approach to continuous Portfolio Optimization under CVaR-like constraints
We study a continuous-time portfolio optimization problem under an explicit constraint on the Deviation Conditional Value-at-Risk (DCVaR), defined as the difference between the CVaR and the expected terminal wealth. While the mean-CVaR framework has been widely explored, its time-inconsistency complicates the use of dynamic programming. We follow the martingale approach in a complete market setting, as in Gao et al. [4], and extend it by retaining an explicit DCVaR constraint in the problem formulation. The optimal terminal wealth is obtained by solving a convex constrained minimization problem. This leads to a tractable and interpretable characterization of the optimal strategy.
☆ Hamilton--Jacobi--Bellman equation for optimal control of stochastic Wasserstein--Hamiltonian system on graphs
Stochastic optimal control problems for Hamiltonian dynamics on graphs have wide-ranging applications in mechanics and quantum field theory, particularly in systems with graph-based structures. In this paper, we establish the existence and uniqueness of viscosity solutions for a new class of Hamilton--Jacobi--Bellman (HJB) equations arising from the optimal control of stochastic Wasserstein--Hamiltonian systems (SWHSs) on graphs. One distinctive feature of these HJB equations is the simultaneous involvement of the Wasserstein geometry on the Wasserstein space over graphs and the Euclidean geometry in physical space. The nonlinear geometric structure, along with the logarithmic potential induced by the graph-based state equation, adds further complexity to the analysis. To address these challenges, we introduce an energy-truncation technique within the doubling of variables framework, specifically designed to handle the interaction between the interiorly defined Wasserstein space on graphs and the unbounded Euclidean space. In particular, our findings demonstrate the well-posedness of HJB equations related to optimal control problems for both stochastic Schr\"odinger equation with polynomial nonlinearity and stochastic logarithmic Schr\"odinger equation on graphs. To the best of our knowledge, this work is the first to develop HJB equations for the optimal control of SWHSs on graphs.
comment: 43 pages
☆ Network Consensus in the Wasserstein Space of Probability Measures Defined on Multi-Dimensional Euclidean Spaces
The consensus problem -- achieving agreement among a network of agents -- is a central theme in both theory and applications. Recently, this problem has been extended from Euclidean spaces to the space of probability measures, where the natural notion of averaging is given by the Wasserstein barycenter. While prior work established convergence in one dimension, the case of higher dimensions poses additional challenges due to the curved geometry of Wasserstein space. In this paper, we develop a framework for analyzing such consensus algorithms by employing a Wasserstein version of Jensen's inequality. This tool provides convexity-type estimates that allow us to prove convergence of nonlinear consensus dynamics in the Wasserstein space of probability measures on $\mathbb{R}^d$.
☆ Sharpness of Minima in Deep Matrix Factorization: Exact Expressions
Understanding the geometry of the loss landscape near a minimum is key to explaining the implicit bias of gradient-based methods in non-convex optimization problems such as deep neural network training and deep matrix factorization. A central quantity to characterize this geometry is the maximum eigenvalue of the Hessian of the loss, which measures the sharpness of the landscape. Currently, its precise role has been obfuscated because no exact expressions for this sharpness measure were known in general settings. In this paper, we present the first exact expression for the maximum eigenvalue of the Hessian of the squared-error loss at any minimizer in general overparameterized deep matrix factorization (i.e., deep linear neural network training) problems, resolving an open question posed by Mulayoff & Michaeli (2020). To complement our theory, we empirically investigate an escape phenomenon observed during gradient-based training near a minimum that crucially relies on our exact expression of the sharpness.
comment: 18 pages, 7 figures
☆ Loss-Transformation Invariance in the Damped Newton Method
The Newton method is a powerful optimization algorithm, valued for its rapid local convergence and elegant geometric properties. However, its theoretical guarantees are usually limited to convex problems. In this work, we ask whether convexity is truly necessary. We introduce the concept of loss-transformation invariance, showing that damped Newton methods are unaffected by monotone transformations of the loss -- apart from a simple rescaling of the step size. This insight allows difficult losses to be replaced with easier transformed versions, enabling convexification of many nonconvex problems while preserving the same sequence of iterates. Our analysis also explains the effectiveness of unconventional stepsizes in Newton's method, including values greater than one and even negative steps.
comment: 10 pages
☆ Policy Optimization in Robust Control: Weak Convexity and Subgradient Methods
Robust control seeks stabilizing policies that perform reliably under adversarial disturbances, with $\mathcal{H}_\infty$ control as a classical formulation. It is known that policy optimization of robust $\mathcal{H}_\infty$ control naturally leads to nonsmooth and nonconvex problems. This paper builds on recent advances in nonsmooth optimization to analyze discrete-time static output-feedback $\mathcal{H}_\infty$ control. We show that the $\mathcal{H}_\infty$ cost is weakly convex over any convex subset of a sublevel set. This structural property allows us to establish the first non-asymptotic deterministic convergence rate for the subgradient method under suitable assumptions. In addition, we prove a weak Polyak-{\L}ojasiewicz (PL) inequality in the state-feedback case, implying that all stationary points are globally optimal. We finally present a few numerical examples to validate the theoretical results.
comment: 9 pages, 11 figures
☆ Mean Field Type Control Problems Driven by Jump-diffusions
In this article, we apply a probabilistic approach to study general mean field type control (MFTC) problems with jump-diffusions, and give the first global-in-time solution. We allow the drift coefficient $b$ and the diffusion coefficient $\sigma$ to depend nonlinearly on the state, distribution and control variables, and both can be unbounded and possibly degenerate; besides, the jump coefficient $\gamma$ is allowed to be non-constant. To tackle the nonlinear and control-dependent diffusion $\sigma$, we further formulate a joint cone property and estimates for both processes $P$ and $Q$ of the corresponding adjoint process (where $(P,Q,R)$ is the solution triple of the associated adjoint process as a backward stochastic differential equation with jump), in contrast to our previous single cone property of the only process $P$. We study first the system of forward-backward stochastic differential equations (FBSDEs) with jumps arising from the maximum principle, and then the related Jacobian flows, which altogether yield the classical regularity of the value function and thus allow us to show that the value function is the unique classical solution of the HJB integro-partial differential equation. Most importantly, our proposed probabilistic approach can handle the MFTC problem driven by a fairly general process far beyond Brownian motion, more easily than the existing analytic approach.
♻ ☆ Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to American Control Conference 2026
♻ ☆ A scalable optimization approach for equitable facility location: Methodology and transportation applications
Efficient and equitable access to essential services, such as healthcare, food, and education, is an important goal in urban planning, public policy, and transport logistics. However, existing facility location models often do not scale well to large instances, or primarily focus on optimizing average accessibility, neglecting equity concerns, particularly for disadvantaged populations. This paper proposes a novel, scalable framework for equitable facility location, introducing a linearized proxy for the Kolm-Pollak Equally-Distributed Equivalent (EDE) metric to balance efficiency and fairness. Computational experiments demonstrate that our approach scales to extremely large problem instances, while being sensitive enough to account for inequity throughout the distribution, not merely via the maximum value. Moreover, optimal solutions represent significant improvements for the worst-off residents in terms of distance to an open amenity, while also attaining a near-optimal average experience for all users. An extensive real-world case study on supermarket access illustrates the practical applicability of the framework, with additional examples coming from polling applications. As such, the model is extended to handle real-world considerations such as capacity constraints, split demand assignments, and location-specific penalties. By bridging the gap between equity theory and practical optimization, this work offers a robust and versatile tool for researchers and practitioners in urban planning, transportation, and public policy.
♻ ☆ A Globally Convergent Method for Computing B-stationary Points of Mathematical Programs with Equilibrium Constraints
This paper introduces a computationally efficient method that globally converges to B-stationary points of mathematical programs with equilibrium constraints (MPECs) in a finite number of iterations. B-stationarity is necessary for optimality and means that no feasible first-order direction can improve the objective. Given a feasible point of an MPEC, B-stationarity can be certified by solving a linear program with equilibrium constraints (LPCC) constructed at this point. The proposed method solves a sequence of LPCCs, which either certify B-stationarity or provide an active-set estimate for the complementarity constraints, along with nonlinear programs (NLPs) -- referred to as branch NLPs (BNLPs) -- obtained by fixing the active set in the MPEC. A BNLP is more regular than the original MPEC, easier to solve, and with the correct active set, its solution coincides with that of the MPEC. We show that, unless the current iterate is B-stationary, these combinatorial LPCCs need not be solved to optimality; for convergence, it suffices to compute a nonzero feasible point, yielding significant computational savings. The method proceeds in two phases: the first identifies a feasible BNLP or a stationary point of a constraint infeasibility minimization problem, and the second solves a finite sequence of BNLPs until a B-stationary point of the MPEC is found. We establish finite convergence under the MPEC-MFCQ. Numerical experiments and an open-source software implementation show that the proposed method is more robust and faster than relaxation-based and mixed-integer NLP approaches, even on medium to large-scale instances.
comment: Submitted to Mathematical Programming A
♻ ☆ Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on smartphones and other heterogeneous edge devices. In addition to varying computation speeds, these devices often hold data from different distributions. However, existing asynchronous SGD methods struggle in such heterogeneous settings and face two key limitations. First, many rely on unrealistic assumptions of similarity across workers' data distributions. Second, methods that relax this assumption still fail to achieve theoretically optimal performance under heterogeneous computation times. We introduce Ringleader ASGD, the first asynchronous SGD algorithm that attains the theoretical lower bounds for parallel first-order stochastic methods in the smooth nonconvex regime, thereby achieving optimal time complexity under data heterogeneity and without restrictive similarity assumptions. Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary and even time-varying worker computation speeds, closing a fundamental gap in the theory of asynchronous optimization.
♻ ☆ Revisiting implicit variables in mathematical optimization: simplified modeling and a numerical evidence
Implicit variables of an optimization problem are used to model variationally challenging feasibility conditions in a tractable way while not entering the objective function. Hence, it is a standard approach to treat implicit variables as explicit ones. Recently, it has been shown in terms of a comparatively complex model problem that this approach, generally, is theoretically disadvantageous as the surrogate problem typically suffers from the presence of artificial stationary points and the need for stronger constraint qualifications. The purpose of the present paper is twofold. First, it introduces a much simpler and easier accessible model problem which can be used to recapitulate and even broaden the aforementioned findings. Indeed, we will extend the analysis to two more classes of stationary points and the associated constraint qualifications. These theoretical results are accompanied by illustrative examples from cardinality-constrained, vanishing-constrained, and bilevel optimization. Second, the present paper illustrates, in terms of cardinality-constrained portfolio optimization problems, that treating implicit variables as explicit ones may also be disadvantageous from a numerical point of view.
comment: 45 pages, 5 figures
♻ ☆ Learning from the past in an irreversible investment problem
We consider an irreversible investment problem under incomplete information, where the investor decides whether and when to make investments in a project. Upon investment, the investor acquires previously hidden information from the project's past (''learning from the past''), and so the learning rate of the problem is controlled by investing. We set up this problem as a recursively defined stopping problem, where the learning rate is accelerated after each recursion step. To solve the problem, we show that at each step there indeed exists a one-sided stopping boundary under general conditions. We proceed to present the optimal investment strategy as a sequence of semi-explicit stopping boundaries derived from smooth-fit conditions. Feasibility of our approach is then demonstrated by solving the boundaries numerically and by illustrating comparative statics.
comment: In new version: rewritten sections 1 and 2 to better match the actual problem being solved, removed unnecessary references, and minor polishing
♻ ☆ Extended Set Difference : Inverse Operation of Minkowski Summation
This paper introduces the extended set difference, a generalization of the Hukuhara and generalized Hukuhara differences, defined for compact convex sets in $\mathbb{R}^d$. The proposed difference guarantees existence for any pair of such sets, offering a broader framework for set arithmetic. The difference is not necessarily unique, but we provide a bound on the set of possible solutions. The definition of the extended set difference is formulated through an optimization problem, which provides a constructive approach to its computation. The paper explores the properties of this new difference, including its stability under orthogonal transformations and its robustness to perturbations of the input sets. We propose a method to compute this difference by solving a linear optimization problem.
♻ ☆ Global controllability of Boussinesq flows by using only a temperature control
We show that buoyancy driven flows can be steered in an arbitrary time towards any state by applying as control only an external temperature profile in a subset of small measure. More specifically, we prove that the 2D incompressible Boussinesq system on the torus is globally approximately controllable via physically localized heating or cooling. In addition, our controls have an explicitly prescribed structure; even without such structural requirements, large data controllability results for Boussinesq flows driven merely by a physically localized temperature profile were so far unknown. The presented method exploits various connections between the model's underlying transport-, coupling-, and scaling mechanisms.
comment: 30 pages, 4 figures
♻ ☆ Controllability of Boussinesq flows driven by finite-dimensional and physically localized forces
We show approximate controllability of Boussinesq flows in $\mathbb{T}^2 = \mathbb{R}^2 / 2\pi\mathbb{Z}^2$ driven by finite-dimensional controls that are supported in any fixed region $\omega \subset \mathbb{T}^2$. This addresses a Boussinesq version of a question by Agrachev and provides the first known example of incompressible fluids with this property. In this context, we complement results obtained for the Navier--Stokes system by Agrachev--Sarychev (Comm. Math. Phys. 265, 2006), where the controls are finite-dimensional but not localized in physical space, and Nersesyan--Rissel (Comm. Pure Appl. Math. 78, 2025), where physically localized controls admit for special $\omega$ a degenerate but not finite-dimensional structure. For our proof, we study controllability properties of tailored convection equations governed by time-periodic degenerately forced Euler flows that provide a twofold geometric mechanism: transport of information through $\omega$ versus non-stationary mixing effects transferring energy from low-dimensional sources to higher frequencies. The temperature is then controlled by using Coron's return method, while the velocity is mainly driven by the buoyant force. When $\omega$ contains two cuts of $\mathbb{T}^2$, our approach allows to effectively construct low-dimensional control spaces of dimensions that are independent of the choice of $\omega$ within this class of control regions.
comment: 48 pages, 2 figures; improved description of control spaces in Section 1.3
♻ ☆ On some perturbation properties of nonsmooth optimization on Riemannian manifolds with applications
This paper presents a perturbation analysis framework for nonsmooth optimization on connected Riemannian manifolds to bridge the gap between the rapid development of algorithmic approaches and a robust theoretical foundation. Using tangent-space local models, we transport core notions from Euclidean variational analysis, such as strong regularity, the Aubin property, and isolated calmness of the Karush-Kuhn-Tucker (KKT) solution mapping, to the manifold setting. Furthermore, we introduce the manifold (strong) variational sufficiency and show that its strong version is intrinsic, i.e., independent of the chosen retraction, and for polyhedral, second-order cone, and semidefinite programs, it coincides with the manifold strong second-order sufficient condition. These insights yield concrete algorithmic consequences. We show that the Riemannian Sequential Quadratic Programming achieves local superlinear and, under mild additional assumptions, quadratic convergence without strict complementarity, while the Riemannian Augmented Lagrangian Method attains R-linear convergence even when Lagrange multipliers are nonunique. Moreover, the proposed condition guarantees positive definiteness of the generalized Hessians associated with the augmented Lagrangian, enabling superlinear semismooth Newton steps in inner solves. Numerical experiments on robust matrix completion and compressed modes validate the theoretical predictions.
♻ ☆ Data-Driven Estimation of Structured Singular Values
Estimating the size of the modeling error is crucial for robust control. Over the years, numerous metrics have been developed to quantify the model error in a control relevant manner. One of the most important such metrics is the structured singular value, as it leads to necessary and sufficient conditions for ensuring stability and robustness in feedback control under structured model uncertainty. Although the computation of the structured singular value is often intractable, lower and upper bounds for it can often be obtained if a model of the system is known. In this paper, we introduce a fully data-driven method to estimate a lower bound for the structured singular value, by conducting experiments on the system and applying power iterations to the collected data. Our numerical simulations demonstrate that this method effectively lower bounds the structured singular value, yielding results comparable to the MATLAB$^\copyright$ Robust Control Toolbox.
comment: Published in IEEE Control Systems Letters, vol. 9, pp. 1976-1981, 2025
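To illustrate the data-driven idea in its simplest (unstructured) form, the sketch below estimates the largest gain of an unknown map purely from forward and adjoint experiments via power iteration. The structured-uncertainty normalizations of the actual method, and how adjoint experiments are emulated from data, are not reproduced here.

```python
# Hedged sketch: experiment-driven power iteration for the largest gain.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))            # unknown plant, accessible only...
experiment = lambda u: M @ u               # ...through input-output experiments
adjoint_exp = lambda v: M.T @ v            # adjoint experiment (emulated in practice)

u = rng.standard_normal(5)
u /= np.linalg.norm(u)
for _ in range(50):                        # power iteration on M^T M
    v = experiment(u)
    u = adjoint_exp(v)
    u /= np.linalg.norm(u)

sigma_est = np.linalg.norm(experiment(u))  # lower bound on the largest gain
print("estimate:", sigma_est, " true:", np.linalg.norm(M, 2))
```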
♻ ☆ Spatial exponential decay of perturbations in optimal control of general evolution equations
We analyze the robustness of optimally controlled evolution equations with respect to spatially localized perturbations. We prove that if the involved operators are domain-uniformly stabilizable and detectable, then these localized perturbations only have a local effect on the optimal solution. We characterize this domain-uniform stabilizability and detectability for the transport equation with constant transport velocity, showing that even for unitary semigroups, optimality implies exponential damping. We extend this result to the case of a space-dependent transport velocity. Finally we leverage the results for the transport equation to characterize domain-uniform stabilizability of the wave equation. Numerical examples in one space dimension complement the theoretical results.
comment: 53 pages, 5 figures
♻ ☆ Efficiently Escaping Saddle Points for Policy Optimization
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate first-order stationary point (FOSP) with the sample complexity of $O(\epsilon^{-3})$. However, FOSPs could be bad local optima or saddle points. Moreover, these algorithms often use importance sampling (IS) weights which could impair the statistical effectiveness of variance reduction. In this paper, we propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(\epsilon^{-3})$. This rate improves the best-known sample complexity for achieving approximate SOSPs by a factor of $O(\epsilon^{-0.5})$. Moreover, the proposed variance reduction technique bypasses IS weights by using HVP terms. Our experimental results show that the proposed algorithm outperforms the state of the art and is more robust to changes in random seeds.
comment: arXiv admin note: text overlap with arXiv:2205.08253
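A minimal sketch of the Hessian-vector product (HVP) primitive the method relies on, using the standard finite-difference-of-gradients approximation $Hv \approx (\nabla f(x + rv) - \nabla f(x))/r$ on a toy quadratic. The variance-reduced policy-gradient estimator built around it is not reproduced; $f$ here is a stand-in.

```python
# Hedged sketch: HVP without ever forming the Hessian.
import numpy as np

A = np.diag([1.0, 10.0, 100.0])     # toy objective: f(x) = 0.5 * x^T A x
grad_f = lambda x: A @ x            # its gradient

x = np.array([1.0, 1.0, 1.0])
v = np.array([0.0, 1.0, 0.0])
r = 1e-5
hvp = (grad_f(x + r * v) - grad_f(x)) / r   # approximates A @ v
print("HVP:", hvp, " exact:", A @ v)
```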
♻ ☆ A space-decoupling framework for optimization on bounded-rank matrices with orthogonally invariant constraints
Imposing additional constraints on low-rank optimization has garnered growing interest. However, the geometry of coupled constraints hampers the well-developed low-rank structure and makes the problem intricate. To this end, we propose a space-decoupling framework for optimization on bounded-rank matrices with orthogonally invariant constraints. The "space-decoupling" is reflected in several ways. We show that the tangent cone of coupled constraints is the intersection of tangent cones of each constraint. Moreover, we decouple the intertwined bounded-rank and orthogonally invariant constraints into two spaces, leading to optimization on a smooth manifold. Implementing Riemannian algorithms on this manifold is painless as long as the geometry of additional constraints is known. In addition, we unveil the equivalence between the reformulated problem and the original problem. Numerical experiments on real-world applications -- spherical data fitting, graph similarity measuring, low-rank SDP, model reduction of Markov processes, reinforcement learning, and deep learning -- validate the superiority of the proposed framework.
comment: 50 pages, 12 figures, 6 tables
♻ ☆ Warm-starting outer approximation for parametrized convex MINLP
We address the challenge of efficiently solving parameterized sequences of convex Mixed-Integer Nonlinear Programming (MINLP) problems through warm-starting techniques. We focus on an outer approximation (OA) approach, for which we develop the theoretical foundation and present two warm-starting techniques for solving sequences of convex MINLPs. These types of problem sequences arise in several important applications, such as multiobjective MINLPs using scalarization techniques, sparse linear regression, hybrid model predictive control, or simply in analyzing the impact of certain problem parameters. The main contribution of this paper is the mathematical analysis of the proposed warm-starting framework for OA-based algorithms, which shows that a simple adaptation of the polyhedral outer approximation from one problem to the next can greatly improve the computational performance. We prove that, under some conditions, one of the proposed warm-starting techniques results in only one OA iteration to find an optimal solution and verify optimality. Numerical results demonstrate noticeable performance improvements compared to two common initialization approaches, and show that warm-starting can in practice also converge in a single iteration for several problems in the sequences. Our methods are especially effective for problems where consecutive problems in the sequence are similar, and where the integer part of the optimal solutions is constant for several problems in the sequence. The results show that it is possible, both in theory and practice, to use warm-starting to significantly enhance the computational efficiency of solving parameterized convex MINLPs.
comment: 28 pages, 7 figures; edited example 12, reorganization of text in results, typos corrected, other minor edits
♻ ☆ Construct to Commitment: The Effect of Narratives on Economic Growth
We study how government-led narratives through mass media evolve from construct, a mechanism for framing expectations, into commitment, a sustainable pillar for growth. We propose the "Narratives-Construct-Commitment (NCC)" framework outlining the mechanism and institutionalization of narratives, and formalize it as a dynamic Bayesian game. Using the Innovation-Driven Development Strategy (2016) as a case study, we identify the narrative shock from high-frequency financial data and trace its impact using the local projection method. By shaping expectations, credible narratives institutionalize investment incentives, channel resources into R\&D, and facilitate sustained improvements in total factor productivity (TFP). Our findings aim to provide insights into the New Quality Productive Forces initiative, highlighting the role of narratives in transforming vision into tangible economic growth.
comment: This preprint constitutes the original version of the work. In case of any manuscripts or presentations with overlapping or substantially similar content (including variants by the authors or co-authors in any venue), the copyright and priority of authorship remain with this version
♻ ☆ Orbits and attainable Hamiltonian diffeomorphisms of mechanical Liouville equations
We study the approximate controllability problem for Liouville transport equations along a mechanical Hamiltonian vector field. Such PDEs evolve inside the orbit $$\mathcal{O}(\rho_0):=\left\{\rho_0\circ \Phi\mid \Phi\in {\rm DHam}(T^*M)\right\},\quad \rho_0\in L^r(T^*M,\mathbb{R}), \quad r\in[1,\infty),$$ where $\rho_0$ is the initial density and ${\rm DHam}(T^*M)$ is the group of Hamiltonian diffeomorphisms of the cotangent bundle manifold $T^*M$. The approximately reachable densities from $\rho_0$ are thus contained in $\overline{\mathcal{O}(\rho_0)}$, where the closure is taken with respect to the $L^r$-topology. Our first result is a characterization of $\overline{\mathcal{O}(\rho_0)}$ when the manifold $M$ is the Euclidean space $\mathbb{R}^d$ or the torus $\mathbb{T}^d$ of arbitrary dimension: $\overline{\mathcal{O}(\rho_0)}$ is the set of all the densities whose sub- and super-level sets have the same measure as those of $\rho_0$. This result is an approximate version, in the case of ${\rm DHam}(T^*M)$, of a theorem by J. Moser (Trans. Am. Math. Soc. 120: 286-294, 1965) on the group of diffeomorphisms. We then present two examples of systems, respectively on $M=\mathbb{R}^d$ and $\mathbb{T}^d$, where the small-time approximately attainable diffeomorphisms coincide with ${\rm DHam}(T^*M)$, respectively at the level of the group and at the level of the densities. The proofs are based on the construction of Hamiltonian diffeomorphisms that approximate suitable permutations of finite grids, and Poisson bracket techniques.
♻ ☆ Incorporating Preconditioning into Accelerated Approaches: Theoretical Guarantees and Practical Improvement
Machine learning and deep learning are widely researched fields that provide solutions to many modern problems. Due to the complexity of new problems and the size of modern datasets, efficient approaches are essential. In optimization theory, the Heavy Ball and Nesterov methods use \textit{momentum} in their updates of model weights. On the other hand, the minimization problems considered may be poorly conditioned, which affects the applicability and effectiveness of the aforementioned techniques. One solution to this issue is \textit{preconditioning}, which has already been investigated in approaches such as \textsc{AdaGrad}, \textsc{RMSProp}, \textsc{Adam} and others. Despite this, momentum acceleration and preconditioning have not been fully explored together. Therefore, we propose the Preconditioned Heavy Ball (\textsc{PHB}) and Preconditioned Nesterov method (\textsc{PN}) with theoretical guarantees of convergence under a \textit{unified} assumption on the scaling matrix. Furthermore, we provide numerical experiments that demonstrate superior performance compared to the unscaled techniques in terms of iteration and oracle complexities.
comment: 18 pages, 2 figures
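A hedged sketch of a preconditioned Heavy Ball update of the general shape the abstract describes, with an AdaGrad-style diagonal scaling standing in for the paper's scaling matrix; the exact PHB/PN updates and their unified assumption may differ.

```python
# Hedged sketch: Heavy Ball momentum with a diagonal preconditioner.
import numpy as np

H = np.diag([1.0, 100.0])             # ill-conditioned quadratic f(x) = 0.5 x^T H x
grad = lambda x: H @ x

x = np.array([1.0, 1.0]); x_prev = x.copy()
acc = np.zeros(2)                     # diagonal second-moment accumulator
lr, beta, eps = 0.1, 0.9, 1e-8

for _ in range(500):
    g = grad(x)
    acc += g * g
    step = lr * g / (np.sqrt(acc) + eps)            # preconditioned gradient step
    x, x_prev = x - step + beta * (x - x_prev), x   # heavy-ball momentum term
print("final iterate:", x)
```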
♻ ☆ Asymptotic Weighted Approximation of Convex Functions
Extending classical results on polytopal approximation of convex bodies, we derive asymptotic formulas for the weighted approximation of smooth convex functions by piecewise affine convex functions as the number of their facets tends to infinity. These asymptotic expressions are formulated in terms of a functional that extends the notion of affine surface area to the functional setting.
♻ ☆ Optimized dynamic scheduling of an exclusive bus lane or high occupancy vehicle lane in a bimodal traffic corridor
Efficient management of traffic corridors is critical for sustaining urban mobility. Exclusive bus lanes (EBL) and high occupancy vehicle lanes (HOVL) are two prominent strategies for enhancing public transit services and alleviating congestion. EBLs prioritize bus transit by providing dedicated lanes for faster travel times, while HOVLs encourage carpooling by reserving lanes for high-occupancy vehicles. However, static implementations of these policies may underutilize road resources and disrupt general-purpose lanes. Dynamic implementation, based on real-time demand, can potentially maximize road efficiency and minimize negative impacts. This study first compares the mixed traffic policy (MTP), exclusive bus lane policy (EBLP), and high occupancy vehicle lane policy (HOVLP) by formulating the total system costs in the context of a bimodal traffic corridor involving private cars and public buses. Under each lane policy, the bus frequency is optimized together with the modal split equilibrium derived separately. Based on dynamic demand simulated using an Ornstein-Uhlenbeck (O-U) process, switching thresholds are then derived to identify optimal periods for implementing each policy. Results reveal significant reductions in total system costs with the proposed dynamic policy schedules. Compared to static implementations, the dynamic policy schedules achieve cost reductions of 11.9%, 6.70%, and 43.64% relative to MTP-only, EBLP-only, and HOVLP-only scenarios, respectively. Additionally, in two real case studies of existing EBL and HOVL operations in Seattle, the proposed dynamic policy reduces total costs by 32.5% and 28.6%, respectively. The findings provide valuable insights for policymakers and transit planners, offering a robust framework for dynamically scheduling and integrating EBL and HOVL policies to optimize urban corridor efficiency and reduce overall system costs.
comment: This version is identical to the manuscript submitted to International Journal of Sustainable Transportation. It corrects inconsistencies present in the previous arXiv version and is the version under journal review. No scientific content has been altered
Robotics 37
☆ MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching
Motion retargeting holds the promise of offering a larger set of motion data for characters and robots with different morphologies. Many prior works have approached this problem via either handcrafted constraints or paired motion datasets, limiting their applicability to humanoid characters or narrow behaviors such as locomotion. Moreover, they often assume a fixed notion of retargeting, overlooking domain-specific objectives like style preservation in animation or task-space alignment in robotics. In this work, we propose MoReFlow, Motion Retargeting via Flow Matching, an unsupervised framework that learns correspondences between characters' motion embedding spaces. Our method consists of two stages. First, we train tokenized motion embeddings for each character using a VQ-VAE, yielding compact latent representations. Then, we employ flow matching with conditional coupling to align the latent spaces across characters, which simultaneously learns conditioned and unconditioned matching to achieve robust but flexible retargeting. Once trained, MoReFlow enables flexible and reversible retargeting without requiring paired data. Experiments demonstrate that MoReFlow produces high-quality motions across diverse characters and tasks, offering improved controllability, generalization, and motion realism compared to the baselines.
☆ Integrator Forwarding Design for Unicycles with Constant and Actuated Velocity in Polar Coordinates
In a companion paper, we present a modular framework for unicycle stabilization in polar coordinates that provides smooth steering laws through backstepping. Surprisingly, the same problem also allows the application of integrator forwarding. In this work, we leverage this feature and construct new smooth steering laws together with control Lyapunov functions (CLFs), expanding the set of CLFs available for inverse optimal control design. In the case of constant forward velocity (Dubins car), backstepping produces finite-time (deadbeat) parking, and we show that integrator forwarding yields the very same class of solutions. This reveals a fundamental connection between backstepping and forwarding in addressing both the unicycle and the Dubins car parking problems.
☆ Modular Design of Strict Control Lyapunov Functions for Global Stabilization of the Unicycle in Polar Coordinates
Since the mid-1990s, it has been known that, unlike in Cartesian form where Brockett's condition rules out static feedback stabilization, the unicycle is globally asymptotically stabilizable by smooth feedback in polar coordinates. In this note, we introduce a modular framework for designing smooth feedback laws that achieve global asymptotic stabilization in polar coordinates. These laws are bidirectional, enabling efficient parking maneuvers, and are paired with families of strict control Lyapunov functions (CLFs) constructed in a modular fashion. The resulting CLFs guarantee global asymptotic stability with explicit convergence rates and include barrier variants that yield "almost global" stabilization, excluding only zero-measure subsets of the rotation manifolds. The strictness of the CLFs is further leveraged in our companion paper, where we develop inverse-optimal redesigns with meaningful cost functions and infinite gain margins.
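For context, such designs typically build on the standard polar-coordinate unicycle model below (from the mid-1990s literature the abstract references); the paper's specific CLF families are not reproduced here.

```latex
% Standard unicycle kinematics in polar coordinates: r is the distance to the
% goal, \varphi the goal bearing, \alpha the heading error; inputs v, \omega.
\begin{aligned}
\dot r       &= -v\cos\alpha,\\
\dot \varphi &= \frac{v}{r}\sin\alpha,\\
\dot \alpha  &= \frac{v}{r}\sin\alpha - \omega.
\end{aligned}
```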
☆ Exhaustive-Serve-Longest Control for Multi-robot Scheduling Systems
We study online task allocation for multi-robot, multi-queue systems with stochastic arrivals and switching delays. Time is slotted; each location can host at most one robot per slot; service consumes one slot; switching between locations incurs a one-slot travel delay; and arrivals are independent Bernoulli processes. We formulate a discounted-cost Markov decision process and propose Exhaustive-Serve-Longest (ESL), a simple real-time policy that serves exhaustively when the current location is nonempty and, when idle, switches to a longest unoccupied nonempty location, and we prove the optimality of this policy. As baselines, we tune a fixed-dwell cyclic policy via a discrete-time delay expression and implement a first-come-first-serve policy. Across server-to-location ratios and loads, ESL consistently yields lower discounted holding cost and smaller mean queue lengths, with action-time fractions showing more serving and restrained switching. Its simplicity and robustness make ESL a practical default for real-time multi-robot scheduling systems.
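The ESL rule itself is simple enough to state in a few lines; the Python sketch below is an illustrative rendering of the policy as described in the abstract, with hypothetical data structures (`queues`, `occupied`) rather than the paper's formal MDP machinery.

```python
# Sketch of the Exhaustive-Serve-Longest decision rule for one robot.
def esl_action(robot_loc, queues, occupied):
    """queues[i]: backlog at location i; occupied: locations claimed by robots."""
    if queues[robot_loc] > 0:
        return ("serve", robot_loc)      # serve exhaustively while nonempty
    # otherwise switch to a longest unoccupied nonempty location (1-slot travel)
    candidates = [i for i, q in enumerate(queues) if q > 0 and i not in occupied]
    if not candidates:
        return ("idle", robot_loc)
    target = max(candidates, key=lambda i: queues[i])
    return ("switch", target)
```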
☆ Online Mapping for Autonomous Driving: Addressing Sensor Generalization and Dynamic Map Updates in Campus Environments
High-definition (HD) maps are essential for autonomous driving, providing precise information such as road boundaries, lane dividers, and crosswalks to enable safe and accurate navigation. However, traditional HD map generation is labor-intensive, expensive, and difficult to maintain in dynamic environments. To overcome these challenges, we present a real-world deployment of an online mapping system on a campus golf cart platform equipped with dual front cameras and a LiDAR sensor. Our work tackles three core challenges: (1) labeling a 3D HD map for campus environment; (2) integrating and generalizing the SemVecMap model onboard; and (3) incrementally generating and updating the predicted HD map to capture environmental changes. By fine-tuning with campus-specific data, our pipeline produces accurate map predictions and supports continual updates, demonstrating its practical value in real-world autonomous driving scenarios.
comment: 19th International Symposium on Experimental Robotics
☆ LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
Referential grounding in outdoor driving scenes is challenging due to large scene variability, many visually similar objects, and dynamic elements that complicate resolving natural-language references (e.g., "the black car on the right"). We propose LLM-RG, a hybrid pipeline that combines off-the-shelf vision-language models for fine-grained attribute extraction with large language models for symbolic reasoning. LLM-RG processes an image and a free-form referring expression by using an LLM to extract relevant object types and attributes, detecting candidate regions, generating rich visual descriptors with a VLM, and then combining these descriptors with spatial metadata into natural-language prompts that are input to an LLM for chain-of-thought reasoning to identify the referent's bounding box. Evaluated on the Talk2Car benchmark, LLM-RG yields substantial gains over both LLM and VLM-based baselines. Additionally, our ablations show that adding 3D spatial cues further improves grounding. Our results demonstrate the complementary strengths of VLMs and LLMs, applied in a zero-shot manner, for robust outdoor referential grounding.
☆ Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity
We consider the problem of vision-based 6-DoF object pose estimation in the context of the notional Mars Sample Return campaign, in which a robotic arm would need to localize multiple objects of interest for low-clearance pickup and insertion, under severely constrained hardware. We propose a novel localization algorithm leveraging a custom renderer together with a new template matching metric tailored to the edge domain to achieve robust pose estimation using only low-fidelity, textureless 3D models as inputs. Extensive evaluations on synthetic datasets as well as from physical testbeds on Earth and in situ Mars imagery shows that our method consistently beats the state of the art in compute and memory-constrained localization, both in terms of robustness and accuracy, in turn enabling new possibilities for cheap and reliable localization on general-purpose hardware.
comment: To appear in IEEE Robotics and Automation Letters
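As a rough illustration of the kind of metric named in the title, the following Python sketch scores agreement between a rendered edge template and an observed edge map under per-pixel weights; the paper's actual weighting scheme is not reproduced, so the weights here are an assumption.

```python
# Illustrative weighted Hamming similarity over binary edge maps.
import numpy as np

def weighted_hamming_similarity(template, observed, weights):
    """All inputs are HxW arrays; template/observed are binary edge maps."""
    agree = (template == observed).astype(float)   # 1 where pixels match
    return float(np.sum(weights * agree) / np.sum(weights))
```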
☆ World Model for AI Autonomous Navigation in Mechanical Thrombectomy
Autonomous navigation for mechanical thrombectomy (MT) remains a critical challenge due to the complexity of vascular anatomy and the need for precise, real-time decision-making. Reinforcement learning (RL)-based approaches have demonstrated potential in automating endovascular navigation, but current methods often struggle with generalization across multiple patient vasculatures and long-horizon tasks. We propose a world model for autonomous endovascular navigation using TD-MPC2, a model-based RL algorithm. We trained a single RL agent across multiple endovascular navigation tasks in ten real patient vasculatures, comparing performance against the state-of-the-art Soft Actor-Critic (SAC) method. Results indicate that TD-MPC2 significantly outperforms SAC in multi-task learning, achieving a 65% mean success rate compared to SAC's 37%, with notable improvements in path ratio. TD-MPC2 exhibited increased procedure times, suggesting a trade-off between success rate and execution speed. These findings highlight the potential of world models for improving autonomous endovascular navigation and lay the foundation for future research in generalizable AI-driven robotic interventions.
comment: Published in Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, Lecture Notes in Computer Science, vol 15968
☆ Message passing-based inference in an autoregressive active inference agent
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
comment: 14 pages, 4 figures, to be published in the proceedings of the International Workshop on Active Inference 2025
☆ Infrastructure Sensor-enabled Vehicle Data Generation using Multi-Sensor Fusion for Proactive Safety Applications at Work Zone
Infrastructure-based sensing and real-time trajectory generation show promise for improving safety in high-risk roadway segments such as work zones, yet practical deployments are hindered by perspective distortion, complex geometry, occlusions, and costs. This study tackles these barriers by integrating roadside camera and LiDAR sensors into a co-simulation environment to develop a scalable, cost-effective vehicle detection and localization framework, and employing a Kalman Filter-based late fusion strategy to enhance trajectory consistency and accuracy. In simulation, the fusion algorithm reduced longitudinal error by up to 70 percent compared to individual sensors while preserving lateral accuracy within 1 to 3 meters. Field validation in an active work zone, using LiDAR, a radar-camera rig, and RTK-GPS as ground truth, demonstrated that the fused trajectories closely match real vehicle paths, even when single-sensor data are intermittent or degraded. These results confirm that KF-based sensor fusion can reliably compensate for individual sensor limitations, providing precise and robust vehicle tracking capabilities. Our approach thus offers a practical pathway to deploy infrastructure-enabled multi-sensor systems for proactive safety measures in complex traffic environments.
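A minimal sketch of the late-fusion idea follows, assuming a constant-velocity Kalman filter that fuses camera and LiDAR position measurements sequentially along one axis; the process and measurement noise values are illustrative, not the study's tuned parameters.

```python
# One predict-then-fuse step of a 1D constant-velocity Kalman filter.
import numpy as np

def kf_fuse_step(x, P, z_cam, z_lidar, q=0.5, r_cam=4.0, r_lidar=1.0, dt=0.1):
    # predict: state x = [position, velocity]
    F = np.array([[1.0, dt], [0.0, 1.0]])
    x = F @ x
    P = F @ P @ F.T + q * np.eye(2)
    # sequential measurement updates, one per sensor (late fusion)
    H = np.array([[1.0, 0.0]])
    for z, r in ((z_cam, r_cam), (z_lidar, r_lidar)):
        S = H @ P @ H.T + r                      # innovation covariance
        K = P @ H.T / S                          # Kalman gain
        x = x + (K * (z - H @ x)).flatten()
        P = (np.eye(2) - K @ H) @ P
    return x, P
```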
☆ CoTaP: Compliant Task Pipeline and Reinforcement Learning of Its Controller with Compliance Modulation
Humanoid whole-body locomotion control is a critical approach for humanoid robots to leverage their inherent advantages. Learning-based control methods derived from retargeted human motion data provide an effective means of addressing this issue. However, because most current human datasets lack measured force data, and learning-based robot control is largely position-based, achieving appropriate compliance during interaction with real environments remains challenging. This paper presents the Compliant Task Pipeline (CoTaP), a pipeline that leverages compliance information within a learning-based control structure for humanoid robots. A two-stage dual-agent reinforcement learning framework combined with model-based compliance control for humanoid robots is proposed. In training, a base policy with a position-based controller is first learned; then, during distillation, the upper-body policy is combined with model-based compliance control while the lower-body agent is guided by the base policy. In the upper-body control, adjustable task-space compliance can be specified and integrated with other controllers through compliance modulation on the symmetric positive definite (SPD) manifold, ensuring system stability. We validated the feasibility of the proposed strategy in simulation, primarily comparing the responses to external disturbances under different compliance settings.
comment: Submitted to IEEE for possible publication, under review
☆ Parallel Heuristic Search as Inference for Actor-Critic Reinforcement Learning Models
Actor-Critic models are a class of model-free deep reinforcement learning (RL) algorithms that have demonstrated effectiveness across various robot learning tasks. While considerable research has focused on improving training stability and data sampling efficiency, most deployment strategies have remained relatively simplistic, typically relying on direct actor policy rollouts. In contrast, we propose PACHS (Parallel Actor-Critic Heuristic Search), an efficient parallel best-first search algorithm for inference that leverages both components of the actor-critic architecture: the actor network generates actions, while the critic network provides cost-to-go estimates to guide the search. Two levels of parallelism are employed within the search -- actions and cost-to-go estimates are generated in batches by the actor and critic networks respectively, and graph expansion is distributed across multiple threads. We demonstrate the effectiveness of our approach in robotic manipulation tasks, including collision-free motion planning and contact-rich interactions such as non-prehensile pushing. Visit p-achs.github.io for demonstrations and examples.
comment: Submitted for Publication
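A serialized sketch of the search loop is shown below: the actor proposes candidate actions and the critic's cost-to-go estimate serves as the heuristic in best-first expansion. The model interfaces are hypothetical, and the paper's two levels of parallelism (batched network queries, multi-threaded expansion) are omitted for brevity.

```python
# Best-first search guided by an actor (action proposals) and a critic
# (cost-to-go heuristic); `step` simulates a transition and returns its cost.
import heapq, itertools

def achs(start, actor, critic, step, is_goal, max_expansions=10_000):
    tie = itertools.count()                         # heap tiebreaker
    frontier = [(critic(start), next(tie), 0.0, start, [])]
    for _ in range(max_expansions):
        if not frontier:
            break
        f, _, g, s, plan = heapq.heappop(frontier)  # lowest f = g + h first
        if is_goal(s):
            return plan
        for a in actor(s):                          # actor-proposed actions
            s2, c = step(s, a)
            heapq.heappush(
                frontier,
                (g + c + critic(s2), next(tie), g + c, s2, plan + [a]))
    return None
```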
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Large-scale robot learning has recently shown promise for enabling robots to perform complex tasks by integrating perception, control, and language understanding. Yet, it struggles with long-horizon, contact-rich manipulation such as deformable object handling, where demonstration quality is inconsistent. Reward modeling offers a natural solution: by providing grounded progress signals, it transforms noisy demonstrations into stable supervision that generalizes across diverse trajectories. We introduce a stage-aware, video-based reward modeling framework that jointly predicts high-level task stages and fine-grained progress. Reward labels are automatically derived from natural language subtask annotations, ensuring consistent progress estimation across variable-length demonstrations. This design overcomes frame-index labeling, which fails in variable-duration tasks like folding a T-shirt. Our reward model demonstrates robustness to variability, generalization to out-of-distribution settings, and strong utility for policy training. Building on it, we propose Reward-Aligned Behavior Cloning (RA-BC), which filters high-quality data and reweights samples by reward. Experiments show the reward model alone outperforms baselines on validation and real robot rollouts. Integrated into RA-BC, our approach achieves 83% success on folding T-shirts from the flattened state and 67% from the crumpled state -- far surpassing vanilla behavior cloning, which attains only 8% and 0% success. Overall, our results highlight reward modeling as a key enabler for scalable, annotation-efficient, and robust imitation learning in long-horizon manipulation.
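The filtering-and-reweighting step of RA-BC can be sketched as below, assuming a per-sample scalar progress reward from the reward model; the keep-quantile threshold and reward-proportional weighting are illustrative assumptions rather than the paper's exact recipe.

```python
# Sketch of reward-filtered, reward-weighted behavior cloning.
import torch

def ra_bc_loss(policy, reward_model, obs, actions, keep_quantile=0.5):
    with torch.no_grad():
        r = reward_model(obs)                      # per-sample progress reward
        thresh = torch.quantile(r, keep_quantile)
        mask = (r >= thresh).float()               # drop low-reward samples
        w = mask * r / (r.sum() + 1e-8)            # reward-proportional weights
    per_sample = ((policy(obs) - actions) ** 2).mean(dim=-1)
    return (w * per_sample).sum()
```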
☆ SRMP: Search-Based Robot Motion Planning Library
Motion planning is a critical component in any robotic system. Over the years, powerful tools like the Open Motion Planning Library (OMPL) have been developed, offering numerous motion planning algorithms. However, existing frameworks often struggle to deliver the level of predictability and repeatability demanded by high-stakes applications -- ranging from ensuring safety in industrial environments to the creation of high-quality motion datasets for robot learning. Complementing existing tools, we introduce SRMP (Search-based Robot Motion Planning), a new software framework tailored for robotic manipulation. SRMP distinguishes itself by generating consistent and reliable trajectories, and is the first software tool to offer motion planning algorithms for multi-robot manipulation tasks. SRMP easily integrates with major simulators, including MuJoCo, Sapien, Genesis, and PyBullet via a Python and C++ API. SRMP includes a dedicated MoveIt! plugin that enables immediate deployment on robot hardware and seamless integration with existing pipelines. Through extensive evaluations, we demonstrate in this paper that SRMP not only meets the rigorous demands of industrial and safety-critical applications but also sets a new standard for consistency in motion planning across diverse robotic systems. Visit srmp.readthedocs.io for SRMP documentation and tutorials.
comment: Submitted for Publication
☆ Fast Feature Field ($\text{F}^3$): A Predictive Representation of Events
This paper develops a mathematical argument and algorithms for building representations of data from event-based cameras, that we call Fast Feature Field ($\text{F}^3$). We learn this representation by predicting future events from past events and show that it preserves scene structure and motion information. $\text{F}^3$ exploits the sparsity of event data and is robust to noise and variations in event rates. It can be computed efficiently using ideas from multi-resolution hash encoding and deep sets - achieving 120 Hz at HD and 440 Hz at VGA resolutions. $\text{F}^3$ represents events within a contiguous spatiotemporal volume as a multi-channel image, enabling a range of downstream tasks. We obtain state-of-the-art performance on optical flow estimation, semantic segmentation, and monocular metric depth estimation, on data from three robotic platforms (a car, a quadruped robot and a flying platform), across different lighting conditions (daytime, nighttime), environments (indoors, outdoors, urban, as well as off-road) and dynamic vision sensors (resolutions and event rates). Our implementations can predict these tasks at 25-75 Hz at HD resolution.
comment: 39 pages, 9 figures
☆ Safe Planning in Unknown Environments using Conformalized Semantic Maps
This paper addresses semantic planning problems in unknown environments under perceptual uncertainty. The environment contains multiple unknown semantically labeled regions or objects, and the robot must reach desired locations while maintaining class-dependent distances from them. We aim to compute robot paths that complete such semantic reach-avoid tasks with user-defined probability despite uncertain perception. Existing planning algorithms either ignore perceptual uncertainty - thus lacking correctness guarantees - or assume known sensor models and noise characteristics. In contrast, we present the first planner for semantic reach-avoid tasks that achieves user-specified mission completion rates without requiring any knowledge of sensor models or noise. This is enabled by quantifying uncertainty in semantic maps - constructed on-the-fly from perceptual measurements - using conformal prediction in a model- and distribution-free manner. We validate our approach and the theoretical mission completion rates through extensive experiments, showing that it consistently outperforms baselines in mission success rates.
comment: 8 pages, 5 figures, 2 algorithms, 1 table
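The conformal calibration step underlying such guarantees can be sketched in a few lines of Python: given held-out nonconformity scores, choose the finite-sample-corrected quantile so that fresh scores fall below it with the user-specified probability. The score definition, and how the threshold would inflate class-dependent avoidance margins, are assumptions here.

```python
# Split conformal prediction: calibrate a score threshold with coverage 1-alpha.
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Returns q such that a fresh score <= q with probability >= 1 - alpha."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))        # finite-sample correction
    return float(np.sort(cal_scores)[min(k, n) - 1])

# usage: inflate class-dependent avoidance margins by the calibrated threshold
q = conformal_threshold(np.random.rand(200), alpha=0.05)
```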
☆ Curriculum Imitation Learning of Distributed Multi-Robot Policies
Learning control policies for multi-robot systems (MRS) remains a major challenge due to long-term coordination and the difficulty of obtaining realistic training data. In this work, we address both limitations within an imitation learning framework. First, we shift the typical role of Curriculum Learning in MRS from scalability with the number of robots to improving long-term coordination. We propose a curriculum strategy that gradually increases the length of expert trajectories during training, stabilizing learning and enhancing the accuracy of long-term behaviors. Second, we introduce a method to approximate the egocentric perception of each robot using only third-person global state demonstrations. Our approach transforms idealized trajectories into locally available observations by filtering neighbors, converting reference frames, and simulating onboard sensor variability. Both contributions are integrated into a physics-informed technique to produce scalable, distributed policies from observations. We conduct experiments across two tasks with varying team sizes and noise levels. Results show that our curriculum improves long-term accuracy, while our perceptual estimation method yields policies that are robust to realistic uncertainty. Together, these strategies enable the learning of robust, distributed controllers from global demonstrations, even in the absence of expert actions or onboard measurements.
comment: Accepted and presented at the Eighth Iberian Robotics Conference, 2025
☆ Crop Spirals: Re-thinking the field layout for future robotic agriculture
Conventional linear crop layouts, optimised for tractors, hinder robotic navigation with tight turns, long travel distances, and perceptual aliasing. We propose a robot-centric square spiral layout with a central tramline, enabling simpler motion and more efficient coverage. To exploit this geometry, we develop a navigation stack combining DH-ResNet18 waypoint regression, pixel-to-odometry mapping, A* planning, and model predictive control (MPC). In simulations, the spiral layout yields up to 28% shorter paths and about 25% faster execution than linear layouts for waypoint-based tasks across 500 waypoints, while full-field coverage performance is comparable to an optimised linear U-turn strategy. Multi-robot studies demonstrate efficient coordination on the spiral's rule-constrained graph, with a greedy allocator achieving 33-37% lower batch completion times than a Hungarian assignment under our setup. These results highlight the potential of redesigning field geometry to better suit autonomous agriculture.
comment: Submitted to Computers and Electronics in Agriculture
☆ Safety-Critical Input-Constrained Nonlinear Intercept Guidance in Multiple Engagement Zones
This paper presents an input-constrained nonlinear guidance law to address the problem of intercepting a stationary target in contested environments with multiple defending agents. Contrary to prior approaches that rely on explicit knowledge of defender strategies or utilize conservative safety conditions based on a defender's range, our work characterizes defender threats geometrically through engagement zones that delineate inevitable interception regions. Outside these engagement zones, the interceptor remains invulnerable. The proposed guidance law switches between a repulsive safety maneuver near these zones and a pursuit maneuver outside their influence. To deal with multiple engagement zones, we employ a smooth minimum function (log-sum-exponent approximation) that aggregates threats from all the zones while prioritizing the most critical threats. Input saturation is modeled and embedded in the non-holonomic vehicle dynamics so the controller respects actuator limits while maintaining stability. Numerical simulations with several defenders demonstrate the proposed method's ability to avoid engagement zones and achieve interception across diverse initial conditions.
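The log-sum-exponent smooth minimum mentioned in the abstract is the standard construction below, which aggregates the zone threat measures while prioritizing the most critical one as the sharpness parameter grows.

```latex
% Smooth minimum of threat measures z_1,...,z_n; kappa > 0 controls sharpness.
\operatorname{softmin}_{\kappa}(z_1,\dots,z_n)
  = -\frac{1}{\kappa}\,\ln\!\left(\sum_{i=1}^{n} e^{-\kappa z_i}\right)
  \;\longrightarrow\; \min_i z_i \quad \text{as } \kappa \to \infty.
```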
☆ AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
As robots transition from controlled settings to unstructured human environments, building generalist agents that can reliably follow natural language instructions remains a central challenge. Progress in robust mobile manipulation requires large-scale multimodal datasets that capture contact-rich and long-horizon tasks, yet existing resources lack synchronized force-torque sensing, hierarchical annotations, and explicit failure cases. We address this gap with the AIRoA MoMa Dataset, a large-scale real-world multimodal dataset for mobile manipulation. It includes synchronized RGB images, joint states, six-axis wrist force-torque signals, and internal robot states, together with a novel two-layer annotation schema of sub-goals and primitive actions for hierarchical learning and error analysis. The initial dataset comprises 25,469 episodes (approx. 94 hours) collected with the Human Support Robot (HSR) and is fully standardized in the LeRobot v2.1 format. By uniquely integrating mobile manipulation, contact-rich interaction, and long-horizon structure, AIRoA MoMa provides a critical benchmark for advancing the next generation of Vision-Language-Action models. The first version of our dataset is now available at https://huggingface.co/datasets/airoa-org/airoa-moma .
☆ Path Diffuser: Diffusion Model for Data-Driven Traffic Simulator
Simulating diverse and realistic traffic scenarios is critical for developing and testing autonomous planning. Traditional rule-based planners lack diversity and realism, while learning-based simulators often replay, forecast, or edit scenarios using historical agent trajectories. However, they struggle to generate new scenarios, limiting scalability and diversity due to their reliance on fully annotated logs and historical data. Thus, a key challenge for a learning-based simulator's performance is that it requires agents' past trajectories and pose information in addition to map data, which might not be available for all agents on the road. Without this information, generated scenarios often contain unrealistic trajectories that deviate from drivable areas, particularly under out-of-distribution (OOD) map scenes (e.g., curved roads). To address this, we propose Path Diffuser (PD): a two-stage diffusion model for generating agent pose initializations and their corresponding trajectories conditioned on the map, free of any historical context of agents' trajectories. Furthermore, PD incorporates a motion primitive-based prior, leveraging Frenet frame candidate trajectories to enhance diversity while ensuring road-compliant trajectory generation. We also explore various design choices for modeling complex multi-agent interactions. We demonstrate the effectiveness of our method through extensive experiments on the Argoverse2 Dataset and additionally evaluate the generalizability of the approach on OOD map variants. Notably, Path Diffuser outperforms the baseline methods by 1.92x on distribution metrics, 1.14x on common-sense metrics, and 1.62x on road compliance from adversarial benchmarks.
☆ Annotation-Free One-Shot Imitation Learning for Multi-Step Manipulation Tasks
Recent advances in one-shot imitation learning have enabled robots to acquire new manipulation skills from a single human demonstration. While existing methods achieve strong performance on single-step tasks, they remain limited in their ability to handle long-horizon, multi-step tasks without additional model training or manual annotation. We propose a method that addresses this setting using only a single demonstration, with no additional model training or manual annotation. We evaluated our method on multi-step and single-step manipulation tasks, where it achieves average success rates of 82.5% and 90%, respectively. Our method matches and exceeds the performance of the baselines in both cases. We also compare the performance and computational efficiency of alternative pre-trained feature extractors within our framework.
☆ MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation
Generative robot policies such as Flow Matching offer flexible, multi-modal policy learning but are sample-inefficient. Although object-centric policies improve sample efficiency, they do not fully resolve this limitation. In this work, we propose Multi-Stream Generative Policy (MSG), an inference-time composition framework that trains multiple object-centric policies and combines them at inference to improve generalization and sample efficiency. MSG is model-agnostic and inference-only, hence widely applicable to various generative policies and training paradigms. We perform extensive experiments both in simulation and on a real robot, demonstrating that our approach learns high-quality generative policies from as few as five demonstrations, resulting in a 95% reduction in required demonstrations, and improves policy performance by 89% compared to single-stream approaches. Furthermore, we present comprehensive ablation studies on various composition strategies and provide practical recommendations for deployment. Finally, MSG enables zero-shot object instance transfer. We make our code publicly available at https://msg.cs.uni-freiburg.de.
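At its simplest, inference-time composition of several object-centric streams can look like the sketch below, assuming each stream exposes a score/velocity field over a shared action space; equal-weight averaging is an illustrative choice, and the paper ablates richer composition strategies.

```python
# Sketch of inference-time stream composition under an assumed interface.
def composed_field(streams, x, t):
    """streams: callables mapping (noisy action x, time t) -> field estimate."""
    fields = [s(x, t) for s in streams]
    return sum(fields) / len(fields)     # illustrative equal-weight combination
```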
☆ World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Vision-Language-Action (VLA) models trained via imitation learning suffer from significant performance degradation in data-scarce scenarios due to their reliance on large-scale demonstration datasets. Although reinforcement learning (RL)-based post-training has proven effective in addressing data scarcity, its application to VLA models is hindered by the non-resettable nature of real-world environments. This limitation is particularly critical in high-risk domains such as industrial automation, where interactions often induce state changes that are costly or infeasible to revert. Furthermore, existing VLA approaches lack a reliable mechanism for detecting task completion, leading to redundant actions that reduce overall task success rates. To address these challenges, we propose World-Env, an RL-based post-training framework that replaces physical interaction with a low-cost, world model-based virtual simulator. World-Env consists of two key components: (1) a video-based world simulator that generates temporally consistent future visual observations, and (2) a vision-language model (VLM)-guided instant reflector that provides continuous reward signals and predicts action termination. This simulated environment enables VLA models to safely explore and generalize beyond their initial imitation learning distribution. Our method achieves notable performance gains with as few as five expert demonstrations per task. Experiments on complex robotic manipulation tasks demonstrate that World-Env effectively overcomes the data inefficiency, safety constraints, and inefficient execution of conventional VLA models that rely on real-world interaction, offering a practical and scalable solution for post-training in resource-constrained settings.
☆ Trajectory Prediction via Bayesian Intention Inference under Unknown Goals and Kinematics
This work introduces an adaptive Bayesian algorithm for real-time trajectory prediction via intention inference, where a target's intentions and motion characteristics are unknown and subject to change. The method concurrently estimates two critical variables: the target's current intention, modeled as a Markovian latent state, and an intention parameter that describes the target's adherence to a shortest-path policy. By integrating this joint update technique, the algorithm maintains robustness against abrupt intention shifts and unknown motion dynamics. A sampling-based trajectory prediction mechanism then exploits these adaptive estimates to generate probabilistic forecasts with quantified uncertainty. We validate the framework through numerical experiments, including ablation studies on two cases and a 500-trial Monte Carlo analysis, as well as hardware demonstrations on quadrotor and quadrupedal platforms. Experimental results demonstrate that the proposed approach significantly outperforms non-adaptive and partially adaptive methods. The method operates in real time at around 270 Hz without requiring training or detailed prior knowledge of target behavior, showcasing its applicability in various robotic systems.
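The joint intention update can be sketched as a discrete Bayes filter with a Markov transition prior, as below; the transition matrix and motion likelihoods are illustrative stand-ins for the paper's shortest-path-adherence model.

```python
# Discrete Bayes filter over a latent intention with a Markov prior.
import numpy as np

def intention_update(belief, T, likelihoods):
    """belief: prior P(intention); T[i, j] = P(next=j | current=i);
    likelihoods[j] = P(observed motion | intention j)."""
    predicted = belief @ T                # Markov prediction step
    posterior = predicted * likelihoods   # measurement update
    return posterior / posterior.sum()

belief = np.array([0.5, 0.3, 0.2])        # e.g., three candidate goals
T = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]])
belief = intention_update(belief, T, np.array([0.2, 0.7, 0.1]))
```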
☆ When Autonomous Vehicle Meets V2X Cooperative Perception: How Far Are We?
With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) cooperative perception has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. V2X cooperative perception systems are software systems characterized by diverse sensor types and cooperative agents, varying fusion schemes, and operation under different communication conditions. Therefore, their complex composition gives rise to numerous operational challenges. Furthermore, when cooperative perception systems produce erroneous predictions, the types of errors and their underlying causes remain insufficiently explored. To bridge this gap, we take an initial step by conducting an empirical study of V2X cooperative perception. To systematically evaluate the impact of cooperative perception on the ego vehicle's perception performance, we identify and analyze six prevalent error patterns in cooperative perception systems. We further conduct a systematic evaluation of the critical components of these systems through our large-scale study and identify the following key findings: (1) The LiDAR-based cooperation configuration exhibits the highest perception performance; (2) Vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communication exhibit distinct cooperative perception performance under different fusion schemes; (3) Increased cooperative perception errors may result in a higher frequency of driving violations; (4) Cooperative perception systems are not robust against communication interference when running online. Our results reveal potential risks and vulnerabilities in critical components of cooperative perception systems. We hope that our findings can better promote the design and repair of cooperative perception systems.
comment: The paper has been accepted by the 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025
♻ ☆ VIMD: Monocular Visual-Inertial Motion and Depth Estimation
Accurate and efficient dense metric depth estimation is crucial for 3D visual perception in robotics and XR. In this paper, we develop a monocular visual-inertial motion and depth (VIMD) learning framework to estimate dense metric depth by leveraging accurate and efficient MSCKF-based monocular visual-inertial motion tracking. At its core, the proposed VIMD exploits multi-view information to iteratively refine per-pixel scale, instead of globally fitting an invariant affine model as in prior work. The VIMD framework is highly modular, making it compatible with a variety of existing depth estimation backbones. We conduct extensive evaluations on the TartanAir and VOID datasets and demonstrate its zero-shot generalization capabilities on the AR Table dataset. Our results show that VIMD achieves exceptional accuracy and robustness, even with extremely sparse points, as few as 10-20 metric depth points per image. This makes the proposed VIMD a practical solution for deployment in resource-constrained settings, while its robust performance and strong generalization capabilities offer significant potential across a wide range of scenarios.
♻ ☆ Trajectory Encryption Cooperative Salvo Guidance
This paper introduces the concept of trajectory encryption in cooperative simultaneous target interception, wherein heterogeneity in guidance principles across a team of unmanned autonomous systems is leveraged as a strategic design feature. By employing a mix of heterogeneous time-to-go formulations leading to a cooperative guidance strategy, the swarm of vehicles is able to generate diverse trajectory families. This diversity expands the feasible solution space for simultaneous target interception, enhances robustness under disturbances, and enables flexible time-to-go adjustments without predictable detouring. From an adversarial perspective, heterogeneity obscures the collective interception intent by preventing straightforward prediction of swarm dynamics, effectively acting as an encryption layer in the trajectory domain. Simulations demonstrate that the swarm of heterogeneous vehicles is able to intercept a moving target simultaneously from a diverse set of initial engagement configurations.
♻ ☆ Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability
Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these capabilities on a four-DoF tendon-driven robotic hand, where a single motor achieved output forces up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity by 354% over manufacturer specifications. These results demonstrate that electrostatic multiplexing provides versatile actuation, overcoming the limitations of prior systems.
♻ ☆ DAGDiff: Guiding Dual-Arm Grasp Diffusion to Stable and Collision-Free Grasps
Reliable dual-arm grasping is essential for manipulating large and complex objects but remains a challenging problem due to stability, collision, and generalization requirements. Prior methods typically decompose the task into two independent grasp proposals, relying on region priors or heuristics that limit generalization and provide no principled guarantee of stability. We propose DAGDiff, an end-to-end framework that directly denoises grasp pairs in the SE(3) × SE(3) space. Our key insight is that stability and collision can be enforced more effectively by guiding the diffusion process with classifier signals, rather than relying on explicit region detection or object priors. To this end, DAGDiff integrates geometry-, stability-, and collision-aware guidance terms that steer the generative process toward grasps that are physically valid and force-closure compliant. We comprehensively evaluate DAGDiff through analytical force-closure checks, collision analysis, and large-scale physics-based simulations, showing consistent improvements over previous work on these metrics. Finally, we demonstrate that our framework generates dual-arm grasps directly on real-world point clouds of previously unseen objects, which are executed on a heterogeneous dual-arm setup where two manipulators reliably grasp and lift them.
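The guidance mechanism can be sketched as classifier-gradient steering of each reverse-diffusion step, as in the PyTorch snippet below; the denoiser and classifier interfaces and the guidance scale are hypothetical, and the paper's specific geometry-, stability-, and collision-aware terms are not reproduced.

```python
# Classifier-guided reverse-diffusion step sketch under assumed interfaces.
import torch

def guided_denoise_step(denoiser, classifiers, x_t, t, scale=1.0):
    x_t = x_t.detach().requires_grad_(True)
    # sum of log-probabilities that the sample is valid (stable, collision-free)
    logp = sum(torch.log(c(x_t, t) + 1e-8).sum() for c in classifiers)
    grad = torch.autograd.grad(logp, x_t)[0]       # gradient of log-validity
    with torch.no_grad():
        x_prev = denoiser(x_t, t) + scale * grad   # shift the denoised sample
    return x_prev
```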
♻ ☆ Video models are zero-shot learners and reasoners
The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.
comment: Project page: https://video-zero-shot.github.io/
♻ ☆ TaBSA -- A framework for training and benchmarking algorithms scheduling tasks for mobile robots working in dynamic environments
This article introduces a software framework for benchmarking robot task scheduling algorithms in dynamic and uncertain service environments. The system provides standardized interfaces, configurable scenarios with movable objects, human agents, tools for automated test generation, and performance evaluation. It supports both classical and AI-based methods, enabling repeatable, comparable assessments across diverse tasks and configurations. The framework facilitates diagnosis of algorithm behavior, identification of implementation flaws, and selection or tuning of strategies for specific applications. It includes a SysML-based domain-specific language for structured scenario modeling and integrates with the ROS-based system for runtime execution. Validated on patrol, fall assistance, and pick-and-place tasks, the open-source framework is suited for researchers and integrators developing and testing scheduling algorithms under real-world-inspired conditions.
comment: Article submitted to the SoftwareX journal
♻ ☆ Controllable Motion Generation via Diffusion Modal Coupling
Diffusion models have recently gained significant attention in robotics due to their ability to generate multi-modal distributions of system states and behaviors. However, a key challenge remains: ensuring precise control over the generated outcomes without compromising realism. This is crucial for applications such as motion planning or trajectory forecasting, where adherence to physical constraints and task-specific objectives is essential. We propose a novel framework that enhances controllability in diffusion models by leveraging multi-modal prior distributions and enforcing strong modal coupling. This allows us to initiate the denoising process directly from distinct prior modes that correspond to different possible system behaviors, ensuring sampling to align with the training distribution. We evaluate our approach on motion prediction using the Waymo dataset and multi-task control in Maze2D environments. Experimental results show that our framework outperforms both guidance-based techniques and conditioned models with unimodal priors, achieving superior fidelity, diversity, and controllability, even in the absence of explicit conditioning. Overall, our approach provides a more reliable and scalable solution for controllable motion generation in robotics.
♻ ☆ Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer NeurIPS 2025
Recent advances in agent development have focused on scaling model size and raw interaction data, mirroring the successes seen in large language models. However, for complex, long-horizon multi-agent tasks such as robotic soccer, this end-to-end approach often fails due to intractable exploration spaces and sparse rewards. This position paper argues that the next frontier in developing embodied world models is not merely increasing the fidelity or size of environments, but scaling their structural complexity through explicit hierarchical scaffolding. We posit that an effective world model for decision-making must model not only the world's physics but also its task semantics. Drawing from a systematic review of 2024 research in low-resource multi-agent soccer, we identify a clear trend towards integrating symbolic and hierarchical methods, such as Hierarchical Task Networks (HTNs) and Bayesian Strategy Networks (BSNs), with multi-agent reinforcement learning (MARL). These methods decompose complex goals into manageable subgoals, creating an intrinsic curriculum that shapes agent learning. We propose that such structured environments are essential for bridging the gap between simple, reactive behaviors and sophisticated, strategic team play. We further extend this principle, proposing that this scaffolding can be generalized to other complex domains and dynamically generated by Large Language Models (LLMs), which act as generative world models of tasks. By building environments with explicit, composable task layers, we can guide agent exploration more efficiently, generate meaningful learning signals, and ultimately train more capable and general-purpose agents with fewer resources than purely end-to-end approaches.
comment: In the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Embodied World Models for Decision Making (EWM)
♻ ☆ ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. First, to instill human driving cognition into the VLM, we introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers through three stages: generation, refinement, and quality control. Building on this cognitive foundation, we then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner to efficiently generate continuous and stable trajectories. Furthermore, to enhance driving safety and reduce collisions, we introduce a Diffusion Group Relative Policy Optimization (DiffGRPO) stage, reinforcing the planner for enhanced safety and comfort. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that ReCogDrive achieves state-of-the-art performance. Additionally, qualitative results across diverse driving scenarios and DriveBench highlight the model's scene comprehension. All code, model weights, and datasets will be made publicly available to facilitate subsequent research.
♻ ☆ Diffusion-Based mmWave Radar Point Cloud Enhancement Driven by Range Images
Millimeter-wave (mmWave) radar has attracted significant attention in robotics and autonomous driving. However, despite its perception stability in harsh environments, the point cloud generated by mmWave radar is relatively sparse and contains significant noise, which limits its further development. Traditional mmWave radar enhancement approaches often struggle to leverage the effectiveness of diffusion models in super-resolution, largely due to the unnatural range-azimuth heatmap (RAH) or bird's eye view (BEV) representation. To overcome this limitation, we propose a novel method that pioneers the application of fusing range images with image diffusion models, achieving accurate and dense mmWave radar point clouds that are similar to LiDAR. Benefiting from a projection that aligns with human observation, the range-image representation of mmWave radar is close to natural images, allowing the knowledge from pre-trained image diffusion models to be effectively transferred and significantly improving the overall performance. Extensive evaluations on both public datasets and self-constructed datasets demonstrate that our approach provides substantial improvements, establishing a new state-of-the-art performance in generating truly three-dimensional LiDAR-like point clouds via mmWave radar. Code will be released after publication.
comment: 8 pages, 7 figures. This work has been submitted to the IEEE for possible publication
♻ ☆ What Do You Need for Diverse Trajectory Composition in Diffusion Planning?
In planning, stitching is the ability of algorithms to piece together sub-trajectories of the data they are trained on to generate new and diverse behaviours. While stitching is historically a strength of offline reinforcement learning, recent generative behavioural cloning (BC) methods have also shown proficiency at stitching. However, the main factors behind this are poorly understood, hindering the development of new algorithms that can reliably stitch. Focusing on diffusion planners trained via BC, we find that two properties are needed for composition: positional equivariance and local receptiveness. We use these two properties to explain architecture, data, and inference choices in existing generative BC methods based on diffusion planning, including replanning frequency, data augmentation, and data scaling. Experimental comparisons show that (1) while locality is more important than positional equivariance in creating a diffusion planner capable of composition, both are crucial; (2) enabling these properties through relatively simple architecture choices can be competitive with more computationally expensive methods such as replanning or scaling data; and (3) simple inpainting-based guidance can guide architecturally compositional models to enable generalization in goal-conditioned settings.
comment: 9 Pages
Systems and Control 49
☆ A General Theory of Emergent Linearity in Complex Dynamical Systems: The Role of Spatial Averaging and Vanishing Correlations
Various natural and engineered systems, from urban traffic flow to the human brain, have been described by large-scale networked dynamical systems. Despite their vast differences, these systems are often similar in being comprised of numerous microscopic subsystems with complex nonlinear dynamics and interactions that give rise to diverse emergent macroscopic behaviors. As such, a long-standing question across various fields has been to understand why and how various forms of macroscopic behavior emerge from underlying microscopic dynamics. Motivated by a growing body of empirical observations, in this work we focus on linearity as one of the most fundamental aspects of system dynamics, and develop a general theoretical framework for the interplay between spatial averaging, decaying microscopic correlations, and emergent macroscopic linearity. Using and extending the theory of mixing sequences, we show that in a broad class of autonomous nonlinear networked systems, the dynamics of the average of all subsystems' states becomes asymptotically linear as the number of subsystems grows to infinity, provided that (in addition to technical assumptions) pairwise correlations between subsystems decay to 0 as their pairwise distance grows to infinity. We prove this result when the latter distance is between subsystems' linear indices or spatial locations, and provide extensions to linear time-invariant (LTI) limit dynamics, finite-sample analysis of rates of convergence, and networks of spatially-embedded subsystems with random locations. To our knowledge, this work is the first rigorous analysis of macroscopic linearity in large-scale heterogeneous networked systems, and provides a solid foundation for further theoretical and empirical analyses in various domains of science and engineering.
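A toy numerical illustration (not from the paper, with arbitrarily chosen dynamics) of the phenomenon: with many heterogeneous nonlinear subsystems driven by independent noise, the population average fluctuates only slightly around its fixed point, so its dynamics are well captured by a linearization even though each subsystem is strongly nonlinear.

```python
# Toy demo: the average of many weakly correlated nonlinear subsystems.
import numpy as np

rng = np.random.default_rng(0)
N, T = 10_000, 200
a = 0.8 + 0.1 * np.sin(np.arange(N))   # heterogeneous subsystem parameters
x = rng.normal(size=N)
means = np.empty(T)
for t in range(T):
    # independent noise across subsystems, so pairwise correlations vanish
    x = np.tanh(a * x) + 0.1 * rng.normal(size=N)
    means[t] = x.mean()
# Fluctuations of the mean are O(1/sqrt(N)), so its dynamics stay in a small
# neighborhood where a linear (AR-1) fit describes them well.
```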
☆ Integrator Forwarding Design for Unicycles with Constant and Actuated Velocity in Polar Coordinates
In a companion paper, we present a modular framework for unicycle stabilization in polar coordinates that provides smooth steering laws through backstepping. Surprisingly, the same problem also allows the application of integrator forwarding. In this work, we leverage this feature and construct new smooth steering laws together with control Lyapunov functions (CLFs), expanding the set of CLFs available for inverse optimal control design. In the case of constant forward velocity (Dubins car), backstepping produces finite-time (deadbeat) parking, and we show that integrator forwarding yields the very same class of solutions. This reveals a fundamental connection between backstepping and forwarding in addressing both the unicycle and the Dubins car parking problems.
☆ Modular Design of Strict Control Lyapunov Functions for Global Stabilization of the Unicycle in Polar Coordinates
Since the mid-1990s, it has been known that, unlike in Cartesian form where Brockett's condition rules out static feedback stabilization, the unicycle is globally asymptotically stabilizable by smooth feedback in polar coordinates. In this note, we introduce a modular framework for designing smooth feedback laws that achieve global asymptotic stabilization in polar coordinates. These laws are bidirectional, enabling efficient parking maneuvers, and are paired with families of strict control Lyapunov functions (CLFs) constructed in a modular fashion. The resulting CLFs guarantee global asymptotic stability with explicit convergence rates and include barrier variants that yield "almost global" stabilization, excluding only zero-measure subsets of the rotation manifolds. The strictness of the CLFs is further leveraged in our companion paper, where we develop inverse-optimal redesigns with meaningful cost functions and infinite gain margins.
☆ Inverse Optimal Feedback and Gain Margins for Unicycle Stabilization
The recent development of globally strict control Lyapunov functions (CLFs) for the challenging unicycle parking problem provides a foundation for pursuing optimality. We address this in the inverse optimal framework, thereby avoiding the need to solve the Hamilton–Jacobi–Bellman (HJB) equations, and establish a general result that is optimal with respect to a meaningful cost. We present several design examples that impose varying levels of penalty on the control effort, including arbitrarily bounded control. Furthermore, we show that the inverse optimal controller possesses an infinite gain margin thanks to the system being driftless, and leveraging this property, we extend the design to an adaptive controller that handles model uncertainty. Finally, we compare the performance of the non-adaptive inverse optimal controller with its adaptive counterpart.
comment: Submitted to ACC 2026
☆ Exhaustive-Serve-Longest Control for Multi-robot Scheduling Systems
We study online task allocation for multi-robot, multi-queue systems with stochastic arrivals and switching delays. Time is slotted; each location can host at most one robot per slot; service consumes one slot; switching between locations incurs a one-slot travel delay; and arrivals are independent Bernoulli processes. We formulate a discounted-cost Markov decision process and propose Exhaustive-Serve-Longest (ESL), a simple real-time policy that serves exhaustively when the current location is nonempty and, when idle, switches to a longest unoccupied nonempty location, and we prove the optimality of this policy. As baselines, we tune a fixed-dwell cyclic policy via a discrete-time delay expression and implement a first-come-first-serve policy. Across server-to-location ratios and loads, ESL consistently yields lower discounted holding cost and smaller mean queue lengths, with action-time fractions showing more serving and restrained switching. Its simplicity and robustness make ESL a practical default for real-time multi-robot scheduling systems.
☆ From Legacy to Leadership: Intelligent Radio Network Planning Framework for Cell-Free Massive MIMO in the B5G/6G Era
The proliferation of cell-free Massive MIMO represents a transformative shift in wireless network architecture, addressing critical limitations of conventional distributed Massive MIMO systems. This paper presents an intelligent radio network planning framework that bridges legacy 5G infrastructures with future B5G/6G networks through cell-free architectures. By leveraging operational insights from existing 5G deployments, we systematically address coverage optimization and capacity enhancement. Our scalable framework enables seamless evolution from legacy designs to next-generation cell-free systems. Through extensive simulations in dense urban environments, we demonstrate substantial improvements: 45% spectral efficiency gains, 30% interference reduction, and significantly enhanced uniform coverage. The proposed framework provides network operators with a practical roadmap for transitioning from traditional cellular architectures to demanding B5G/6G requirements while maximizing existing infrastructure investments.
comment: 6 pages, 12 figures, 4 tables, IEEE ETCM 2025 Conference Preprint
☆ Spatiotemporal Forecasting of Incidents and Congestion with Implications for Sustainable Traffic Control
Urban traffic anomalies such as collisions and disruptions threaten the safety, efficiency, and sustainability of transportation systems. We present a simulation-based framework for modeling, detecting, and predicting such anomalies in urban networks. Using the SUMO platform, we generate reproducible rear-end and intersection crash scenarios with matched baselines, enabling controlled experimentation and comparative evaluation. We record vehicle-level travel time, speed, and emissions for edge and network-level analysis. On this dataset, we develop a hybrid forecasting architecture that combines bidirectional long short-term memory networks with a diffusion convolutional recurrent neural network to capture temporal dynamics and spatial dependencies. Our simulation studies on the Broadway corridor in New York City demonstrate the framework's ability to reproduce consistent incident conditions, quantify their effects, and provide accurate multi-horizon traffic forecasts. Our results highlight the value of combining controlled anomaly generation with deep predictive models to support reproducible evaluation and sustainable traffic management.
☆ Message passing-based inference in an autoregressive active inference agent
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
comment: 14 pages, 4 figures, to be published in the proceedings of the International Workshop on Active Inference 2025
☆ Environmental Rate Manipulation Attacks on Power Grid Security
The growing complexity of global supply chains has made hardware Trojans a significant threat in sensor-based power electronics. Traditional Trojan designs depend on digital triggers or fixed threshold conditions that can be detected during standard testing. In contrast, we introduce Environmental Rate Manipulation (ERM), a novel Trojan triggering mechanism that activates by monitoring the rate of change in environmental parameters rather than their absolute values. This approach allows the Trojan to remain inactive under normal conditions and evade redundancy and sensor-fusion defenses. We implement a compact 14~$\mu$m$^2$ circuit that measures capacitor charging rates in standard sensor front-ends and disrupts inverter pulse-width modulation (PWM) signals when a rapid change is induced. Experiments on a commercial Texas Instruments solar inverter demonstrate that ERM can trigger catastrophic driver chip failure. Furthermore, ETAP simulations indicate that a single compromised 100~kW inverter may initiate cascading grid instabilities. The attack's significance extends beyond individual sensors to entire classes of environmental sensing systems common in power electronics, demonstrating fundamental challenges for hardware security.
☆ Fast Energy-Theft Attack on Frequency-Varying Wireless Power without Additional Sensors
With the popularity of wireless charging, energy access protection and cybersecurity are gaining importance, especially in public places. Currently, the most common energy encryption method uses frequency and associated impedance variation. However, we have proven that this method is not reliable, since a hacker can detect the changing frequency and adjust the compensation. The previously presented attack system, though, needed time to follow the updated frequency, while encryption systems may vary the frequency faster to avoid energy theft; it also required an additional sensor coil. To solve these problems, we optimized the attack and the associated system, which can intrude and steal energy within 0.2 ms. The key is the elimination of the time-consuming maximum receiver current regulation. Also, we use the main receiving coil rather than any additional sensor antenna to detect the magnetic field, so the new hardware is even simpler. A simulation model and experimental results demonstrate the fast response of the attack on encrypted wireless power, stealing 65% of the power. Overall, the applicability of the attack is highly improved and leaves less room for hardening the encryption. The results demonstrate that energy access protection needs to be given great attention.
comment: 11 pages, 12 figures
☆ Safe and Stable Control via Lyapunov-Guided Diffusion Models NeurIPS 2025
Diffusion models have made significant strides in recent years, exhibiting strong generalization capabilities in planning and control tasks. However, most diffusion-based policies remain focused on reward maximization or cost minimization, often overlooking critical aspects of safety and stability. In this work, we propose Safe and Stable Diffusion ($S^2$Diff), a model-based diffusion framework that explores how diffusion models can ensure safety and stability from a Lyapunov perspective. We demonstrate that $S^2$Diff eliminates the reliance on both complex gradient-based solvers (e.g., quadratic programming, non-convex solvers) and control-affine structures, leading to globally valid control policies driven by the learned certificate functions. Additionally, we uncover intrinsic connections between diffusion sampling and Almost Lyapunov theory, enabling the use of trajectory-level control policies to learn better certificate functions for safety and stability guarantees. To validate our approach, we conduct experiments on a wide variety of dynamical control systems, where $S^2$Diff consistently outperforms both certificate-based controllers and model-based diffusion baselines in terms of safety, stability, and overall control performance.
comment: Accepted by NeurIPS 2025
☆ Physics-Informed Inductive Biases for Voltage Prediction in Distribution Grids
Voltage prediction in distribution grids is a critical yet difficult task for maintaining power system stability. Machine learning approaches, particularly Graph Neural Networks (GNNs), offer significant speedups but suffer from poor generalization when trained on limited or incomplete data. In this work, we systematically investigate the role of inductive biases in improving a model's ability to reliably learn power flow. Specifically, we evaluate three physics-informed strategies: (i) power-flow-constrained loss functions, (ii) complex-valued neural networks, and (iii) residual-based task reformulation. Using the ENGAGE dataset, which spans multiple low- and medium-voltage grid configurations, we conduct controlled experiments to isolate the effect of each inductive bias and assess both standard predictive performance and out-of-distribution generalization. Our study provides practical insights into which model assumptions most effectively guide learning for reliable and efficient voltage prediction in modern distribution networks.
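Strategy (i) can be illustrated with a small residual penalty built from the AC power-flow equations. This is a minimal sketch under standard notation (complex bus voltages V, bus admittance matrix Y, complex injections S), not the paper's exact loss.

```python
import numpy as np

def power_flow_penalty(V, Y, S):
    """Physics-informed regularizer in the spirit of strategy (i): penalize
    violation of the AC power-flow equations S_i = V_i * conj((Y V)_i).
    A minimal sketch with standard notation, not the paper's exact loss.
    V: complex bus voltages, Y: complex admittance matrix, S: P + jQ."""
    S_hat = V * np.conj(Y @ V)                     # injections implied by predicted V
    return float(np.mean(np.abs(S_hat - S) ** 2))  # added to the usual data-fit loss
```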
☆ Data-Driven Optimal Power Flow: A Behavioral Systems Approach
The increasing decentralization of power systems driven by a large number of renewable energy sources poses challenges in power flow optimization. Partially unknown power line properties can render model-based approaches unsuitable. With increasing deployment of sensors, data-driven methods rise as a promising alternative. They offer the flexibility to adapt to varying grid structures and unknown line properties. In this paper, we propose a novel data-driven representation of nonlinear power flow equations for radial grids based on Willems' Fundamental Lemma. The approach allows for direct integration of input/output data into power flow optimization, enabling cost minimization and constraint enforcement without requiring explicit knowledge of the electrical properties or the topology of the grid. Moreover, we formulate a convex relaxation to ensure compatibility with state-of-the-art solvers. In a numerical case study, we demonstrate that the novel approach performs similarly to state-of-the-art methods, without the need for an explicit system identification step.
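The backbone of such behavioral methods is the Hankel matrix of recorded data. As a rough illustration (for the linear setting of the classical lemma, not the paper's nonlinear radial-grid extension), one can build it as follows.

```python
import numpy as np

def hankel_matrix(w, L):
    """Depth-L block Hankel matrix of a recorded trajectory w of shape
    (T, dim), the object underlying Willems' Fundamental Lemma: if the
    input data are persistently exciting, every length-L trajectory of the
    system can be written as H @ g for some vector g. Linear-case sketch."""
    T, dim = w.shape
    cols = T - L + 1
    H = np.zeros((L * dim, cols))
    for i in range(L):
        H[i * dim:(i + 1) * dim, :] = w[i:i + cols].T  # i-th block row of shifts
    return H
```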
☆ Data-Driven Resilience Assessment against Sparse Sensor Attacks
We present a data-driven framework for assessing the attack resilience of linear time-invariant systems against malicious false data injection sensor attacks. Based on the concept of sparse observability, data-driven resilience metrics are proposed. First, we derive a data-driven necessary and sufficient condition for assessing the system's resilience against sensor attacks, using data collected without any attacks. If we obtain attack-free data that satisfy a specific rank condition, we can exactly evaluate the attack resilience level even in a model-free setting. We then extend this analysis to a scenario where only poisoned data are available. Given the poisoned data, we can only conservatively assess the system's resilience. In both scenarios, we also provide polynomial-time algorithms to assess the system resilience under specific conditions. Finally, numerical examples illustrate the efficacy and limitations of the proposed framework.
☆ Safety-Critical Input-Constrained Nonlinear Intercept Guidance in Multiple Engagement Zones
This paper presents an input-constrained nonlinear guidance law to address the problem of intercepting a stationary target in contested environments with multiple defending agents. Contrary to prior approaches that rely on explicit knowledge of defender strategies or utilize conservative safety conditions based on a defender's range, our work characterizes defender threats geometrically through engagement zones that delineate inevitable interception regions. Outside these engagement zones, the interceptor remains invulnerable. The proposed guidance law switches between a repulsive safety maneuver near these zones and a pursuit maneuver outside their influence. To deal with multiple engagement zones, we employ a smooth minimum function (log-sum-exponent approximation) that aggregates threats from all the zones while prioritizing the most critical threats. Input saturation is modeled and embedded in the non-holonomic vehicle dynamics so the controller respects actuator limits while maintaining stability. Numerical simulations with several defenders demonstrate the proposed method's ability to avoid engagement zones and achieve interception across diverse initial conditions.
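The log-sum-exponent aggregation mentioned above admits a one-line form. The sketch below shows the standard smooth minimum; the sharpness parameter rho is our illustrative choice.

```python
import numpy as np

def smooth_min(values, rho=10.0):
    """Log-sum-exp underapproximation of the minimum, as used to aggregate
    threats across multiple engagement zones: smooth_min(v) <= min(v),
    with equality as rho -> infinity; rho trades smoothness for tightness."""
    v = np.asarray(values, dtype=float)
    return float(-np.log(np.exp(-rho * v).sum()) / rho)
```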
☆ Real-Time Estimation of Equivalent Series Resistance for Predicting Output Capacitor Failures in Boost Converters
Output capacitors are among the most critical components in power electronic converters, as their degradation directly affects system stability and reliability. A key indicator of capacitor health is the Equivalent Series Resistance (ESR), whose progressive increase is strongly correlated with aging and imminent failure. This paper presents a real-time technique for estimating the ESR of output capacitors in boost converters, with the aim of enabling early fault prediction and condition-based maintenance. The proposed method leverages online parameter estimation to extract ESR values without interrupting converter operation. The estimation algorithm has been implemented on a low-cost STM32 Nucleo platform, demonstrating both computational efficiency and suitability for embedded applications. Experimental validation confirms that the approach provides accurate ESR tracking under varying load and operating conditions, allowing timely detection of abnormal capacitor behavior and preventing unexpected system failures.
comment: 5 pages, 5 figures
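The abstract does not spell out the estimator; one common choice for this kind of online parameter tracking is recursive least squares with forgetting, sketched below. The regressor construction from ripple measurements is our assumption, not necessarily the paper's method.

```python
import numpy as np

class RLSEstimator:
    """Recursive least squares with forgetting, a common choice (our
    assumption) for tracking a slowly drifting parameter such as ESR.
    Model: y_k = phi_k @ theta + noise, where phi/y would be built from
    measured capacitor ripple current and output-voltage ripple."""
    def __init__(self, n_params, lam=0.99):
        self.theta = np.zeros(n_params)
        self.P = 1e3 * np.eye(n_params)
        self.lam = lam                         # forgetting factor < 1 tracks aging
    def update(self, phi, y):
        Pphi = self.P @ phi
        gain = Pphi / (self.lam + phi @ Pphi)  # Kalman-style gain vector
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, Pphi)) / self.lam
        return self.theta
```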
☆ MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmuration intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, an LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23%, cuts computation by 35%, and accelerates flood response by 68%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
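The three bio-inspired rules are the classic boids terms. Below is a minimal sketch of how such a local signal might be computed; the weights, the separation radius, and the mapping of agent "state" to reservoir variables are all our illustrative assumptions.

```python
import numpy as np

def murmuration_term(x_i, X_nbr, V_nbr, r_sep=1.0):
    """Alignment/separation/cohesion signal for one agent (boids-style
    sketch; radius and unit weights are illustrative, and interpreting
    'state' as, e.g., storage level is our assumption). x_i: this agent's
    state vector; X_nbr/V_nbr: neighbor states and state-change rates."""
    cohesion = X_nbr.mean(axis=0) - x_i            # drift toward the group mean
    alignment = V_nbr.mean(axis=0)                 # match the neighbors' trend
    diffs = x_i - X_nbr
    near = diffs[np.linalg.norm(diffs, axis=1) < r_sep]
    separation = near.sum(axis=0) if len(near) else np.zeros_like(x_i)
    return cohesion + alignment + separation       # combined local steering signal
```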
☆ Real-Time Power Electronics Control and Monitoring with TI F28379D DSC and GUI Composer
This paper details the implementation and experimental validation of a real-time control system for a three-phase induction motor using the Texas Instruments TMS320F28379D microcontroller. The system integrates pulse-width modulation (PWM) generation, analog-to-digital conversion (ADC), digital-to-analog conversion (DAC), and quadrature encoder feedback to facilitate precise control under various strategies. A current sensing solution based on the AMC1301 isolation amplifier and a shunt resistor ensures accurate and safe current measurement for feedback loops. Two control algorithms, V/f and Field-Oriented Control (FOC), are implemented and tested. Real-time parameter tuning and data visualization are achieved using GUI Composer, enabling efficient system debugging and interaction. Experimental results demonstrate smooth speed reversal, fast dynamic response, and stable performance under both step and multi-step inputs. While GUI Composer effectively supports general monitoring and control, limitations in signal bandwidth are noted compared to professional-grade platforms. The results confirm the robustness and effectiveness of the implemented control strategies for high-performance induction motor applications.
comment: 10 pages
☆ Spectral Flow Learning Theory: Finite-Sample Guarantees for Vector-Field Identification
We study the identification of continuous-time vector fields from irregularly sampled trajectories. We introduce Spectral Flow Learning (SFL), which learns in a windowed flow space using a lag-linear label operator that aggregates lagged Koopman actions. We provide finite-sample high-probability (FS-HP) guarantees for the class of variable-step linear multistep methods (vLMM). The FS-HP rates are constructed using spectral regularization with qualification-controlled filters for flow predictors under standard source and filter assumptions. A multistep observability inequality links flow error to vector-field error and yields two-term bounds that combine a statistical rate with an explicit discretization bias from vLMM theory. This preliminary preprint states the results and sketches proofs, with full proofs and extensions deferred to a journal version.
☆ Path Diffuser: Diffusion Model for Data-Driven Traffic Simulator
Simulating diverse and realistic traffic scenarios is critical for developing and testing autonomous planning. Traditional rule-based planners lack diversity and realism, while learning-based simulators often replay, forecast, or edit scenarios using historical agent trajectories. However, they struggle to generate new scenarios, limiting scalability and diversity due to their reliance on fully annotated logs and historical data. A key challenge for a learning-based simulator's performance is thus that it requires agents' past trajectories and pose information in addition to map data, which might not be available for all agents on the road. Without these, generated scenarios often produce unrealistic trajectories that deviate from drivable areas, particularly under out-of-distribution (OOD) map scenes (e.g., curved roads). To address this, we propose Path Diffuser (PD): a two-stage diffusion model for generating agent pose initializations and their corresponding trajectories conditioned on the map, free of any historical context of agents' trajectories. Furthermore, PD incorporates a motion primitive-based prior, leveraging Frenet frame candidate trajectories to enhance diversity while ensuring road-compliant trajectory generation. We also explore various design choices for modeling complex multi-agent interactions. We demonstrate the effectiveness of our method through extensive experiments on the Argoverse2 Dataset and additionally evaluate the generalizability of the approach on OOD map variants. Notably, Path Diffuser outperforms the baseline methods by 1.92x on distribution metrics, 1.14x on common-sense metrics, and 1.62x on road compliance from adversarial benchmarks.
☆ Coordinated vs. Sequential Transmission Planning
Coordinated planning of generation, storage, and transmission more accurately captures the interactions among these three capacity types necessary to meet electricity demand, at least in theory. However, in practice, U.S. system operators typically follow a sequential planning approach: They first determine future generation and storage additions based on an assumed unconstrained (`copper plate') system. Next, they perform dispatch simulations of this projected generation and storage capacity mix on the existing transmission grid to identify transmission constraint violations. These violations indicate the need for transmission upgrades. We describe a multistage, multi-locational planning model that co-optimizes generation, storage, and transmission investments. The model respects reliability constraints as well as state energy and climate policies. We test the two planning approaches using a current stakeholder-informed 20-zone model of the PJM region, developed for the current FERC Order No. 1920 compliance filing process. In our most conservative model specification, we find that the co-optimized approach estimates 67% lower transmission upgrade needs than the sequential model, leading to total system costs that are 0.6% lower and similar reliability and climate outcomes. Our sensitivities show larger transmission and cost savings and reliability and climate benefits from co-optimized planning.
comment: 10 pages
☆ Physics-informed learning under mixing: How physical knowledge speeds up learning
A major challenge in physics-informed machine learning is to understand how the incorporation of prior domain knowledge affects learning rates when data are dependent. Focusing on empirical risk minimization with physics-informed regularization, we derive complexity-dependent bounds on the excess risk in probability and in expectation. We prove that, when the physical prior information is aligned, the learning rate improves from the (slow) Sobolev minimax rate to the (fast) optimal i.i.d. one without any sample-size deflation due to data dependence.
☆ Position estimation based on UWB swarm optimization and comparison against traditional trilateration
Ultra-wideband (UWB) is a promising technology for indoor position estimation in various localization applications involving object swarms, such as 3D analysis of human movement with multiple on-body sensors or a swarm of drones in an indoor environment. However, most UWB-only position estimation methods are based on a star topology, where the position of a mobile node is estimated using distances from several fixed anchors. These approaches ignore the valuable inter-node distance estimates available in a fully-connected 'Swarm' topology, which could provide more redundancy in the set of distance estimates used for position estimation and thereby improve the accuracy and consistency of the position estimates. Also, published studies do not analyze how input measurement errors affect the final position estimates, which makes it difficult to assess reliability under varying conditions. Therefore, this study first proposes a UWB swarm optimization-based position estimation method that utilizes all available inter-node distances to enhance accuracy, and compares it against the traditional trilateration method that uses the star configuration. All validations were done with synthetic UWB data to enable testing of all input error conditions. A comprehensive error sensitivity analysis was conducted to evaluate robustness under varying noise conditions. The proposed method consistently outperformed trilateration, with position estimation errors around 5.7 cm for realistic UWB distance input estimates; under higher noise conditions, the proposed method had errors around 6 cm, lower than the trilateration method, which had position estimation errors around 19 cm. This study demonstrates the general potential of the swarm optimization-based method for position estimation as a more accurate and consistent alternative to traditional star-based trilateration methods.
comment: Submitted to a journal; manuscript under review
☆ Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering IROS
Recent trends in humanoid robot control have successfully employed imitation learning to enable the learned generation of smooth, human-like trajectories from human data. While these approaches make more realistic motions possible, they are limited by the amount of available motion data, and do not incorporate prior knowledge about the physical laws governing the system and its interactions with the environment. Thus they may violate such laws, leading to divergent trajectories and sliding contacts which limit real-world stability. We address such limitations via a two-pronged learning strategy which leverages the known physics of the system and fundamental control principles. First, we encode physics priors during supervised imitation learning to promote trajectory feasibility. Second, we minimize drift at inference time by applying a proportional-integral controller directly to the generated output state. We validate our method on various locomotion behaviors for the ergoCub humanoid robot, where a physics-informed loss encourages zero contact foot velocity. Our experiments demonstrate that the proposed approach is compatible with multiple controllers on a real robot and significantly improves the accuracy and physical constraint conformity of generated trajectories.
comment: This paper has been accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 2025
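The inference-time steering admits a compact illustration: a PI correction applied to the generated output state. The gains and the scalar-state simplification below are our placeholders, not the paper's tuning.

```python
class PIStateCorrector:
    """Proportional-integral steering of the generated output state, in the
    spirit of the drift-minimization step described above (gains and the
    scalar-state simplification are illustrative assumptions)."""
    def __init__(self, kp=1.0, ki=0.1, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.acc = 0.0                       # running integral of the tracking error
    def correct(self, generated, reference):
        err = reference - generated
        self.acc += err * self.dt
        return generated + self.kp * err + self.ki * self.acc
```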
☆ Hill-Type Stability Analysis of Periodic Solutions of Fractional-Order Differential Equations
This paper explores stability properties of periodic solutions of (nonlinear) fractional-order differential equations (FODEs). As classical Caputo-type FODEs do not admit exactly periodic solutions, we propose a framework of Liouville-Weyl-type FODEs, which do admit exactly periodic solutions and are an extension of Caputo-type FODEs. Local linearization around a periodic solution results in perturbation dynamics governed by a linear time-periodic differential equation. In the classical integer-order case, the perturbation dynamics is therefore described by Floquet theory, i.e. the exponential growth or decay of perturbations is expressed by Floquet exponents which can be assessed using the Hill matrix approach. For fractional-order systems, however, a rigorous Floquet theory is lacking. Here, we explore the limitations when trying to extend Floquet theory and the Hill matrix method to linear time-periodic fractional-order differential equations (LTP-FODEs) as local linearization of nonlinear fractional-order systems. A key result of the paper is that such an extended Floquet theory can only assess exponentially growing solutions of LTP-FODEs. Moreover, we provide an analysis of linear time-invariant fractional-order systems (LTI-FODEs) with algebraically decaying solutions and show that the inaccessibility of decaying solutions through Floquet theory is already present in the time-invariant case.
comment: 21 pages, 6 figures
☆ Model-Free Dynamic Consensus in Multi-Agent Systems: A Q-Function Perspective
This paper presents a new method for achieving dynamic consensus in linear discrete-time homogeneous multi-agent systems (MAS) with marginally stable or unstable dynamics. The guarantee of consensus in this setting involves a set of constraints based on the graph's spectral properties, complicating the design of the coupling gains. This challenge intensifies for large-scale systems with diverse graph Laplacian spectra. The proposed approach reformulates the dynamic consensus problem with a prescribed convergence rate using a state-action value function framework inspired by optimal control theory. Specifically, a synthetic linear quadratic regulation (LQR) formulation is introduced to encode the consensus objective, enabling its translation into a convex semidefinite programming (SDP) problem. The resulting SDP is applicable in both model-based and model-free settings for jointly designing the local feedback and coupling gains. To handle the inherent non-convex feasibility conditions, a convex-concave decomposition strategy is employed. Adaptation of the method in a completely model-free setup eliminates the need for system identification or knowledge of the agents' dynamics. Instead, it relies on input-state data collection and offers an entirely data-driven equivalent SDP formulation. Finally, a new algorithm balancing feasibility, convergence rate, robustness, and energy efficiency is established to provide design flexibility. Numerical simulations validate the method's effectiveness in various scenarios.
☆ PhysiAgent: An Embodied Agent Framework in Physical World
Vision-Language-Action (VLA) models have achieved notable success but often struggle with limited generalization. To address this, integrating generalized Vision-Language Models (VLMs) as assistants to VLAs has emerged as a popular solution. However, current approaches often combine these models in rigid, sequential structures: using VLMs primarily for high-level scene understanding and task planning, and VLAs merely as executors of lower-level actions, leading to ineffective collaboration and poor grounding. In this paper, we propose an embodied agent framework, PhysiAgent, tailored to operate effectively in physical environments. By incorporating monitor, memory, self-reflection mechanisms, and lightweight off-the-shelf toolboxes, PhysiAgent offers an autonomous scaffolding framework to prompt VLMs to organize different components based on real-time proficiency feedback from VLAs to maximally exploit VLAs' capabilities. Experimental results demonstrate significant improvements in task-solving performance on complex real-world robotic tasks, showcasing effective self-regulation of VLMs, coherent tool collaboration, and adaptive evolution of the framework during execution. PhysiAgent makes practical and pioneering efforts to integrate VLMs and VLAs, effectively grounding embodied agent frameworks in real-world settings.
☆ Autonomous Detection and Coverage of Unknown Target Areas by Multi-Agent Systems
This paper presents a novel coverage control algorithm for multi-agent systems, where each agent has no prior knowledge of the specific region to be covered. The proposed method enables agents to autonomously detect the target area and collaboratively achieve full coverage. Once an agent detects a part of the target region within its sensor range, a dynamically constructed density function is generated to attract nearby agents. By integrating this density-driven mechanism with Centroidal Voronoi Tessellation (CVT), the agents are guided to achieve optimal spatial distribution. Additionally, Control Barrier Functions (CBFs) are employed to ensure collision avoidance and maintain non-overlapping sensor coverage, enhancing both safety and efficiency. Simulation results verify that agents can independently locate and effectively cover the target area.
comment: 8 pages, 9 figures
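The density-driven CVT step can be sketched on a sampled workspace. The discretization and gain below are our illustrative choices, and the CBF safety filter described above is omitted.

```python
import numpy as np

def cvt_step(agents, points, density, gain=0.5):
    """One density-weighted CVT update (illustrative sketch; the CBF-based
    collision avoidance is omitted). agents: (m, 2) positions; points:
    (p, 2) workspace samples; density: (p,) attraction weights built from
    detections of the target region."""
    d = np.linalg.norm(points[:, None, :] - agents[None, :, :], axis=2)
    owner = d.argmin(axis=1)                  # discrete Voronoi partition of samples
    nxt = agents.astype(float).copy()
    for i in range(len(agents)):
        w = density[owner == i]
        if w.sum() > 0:
            c = (points[owner == i] * w[:, None]).sum(axis=0) / w.sum()
            nxt[i] += gain * (c - agents[i])  # steer toward the weighted centroid
    return nxt
```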
☆ Small-Covariance Noise-to-State Stability of Stochastic Systems and Its Applications to Stochastic Gradient Dynamics
This paper studies gradient dynamics subject to additive random noise, which may arise from sources such as stochastic gradient estimation, measurement noise, or stochastic sampling errors. To analyze the robustness of such stochastic gradient systems, the concept of small-covariance noise-to-state stability (NSS) is introduced, along with a Lyapunov-based characterization. Furthermore, the classical Polyak-Lojasiewicz (PL) condition on the objective function is generalized to the $\mathcal{K}$-PL condition via comparison functions, thereby extending its applicability to a broader class of optimization problems. It is shown that the stochastic gradient dynamics exhibit small-covariance NSS if the objective function satisfies the $\mathcal{K}$-PL condition and possesses a globally Lipschitz continuous gradient. This result implies that the trajectories of stochastic gradient dynamics converge to a neighborhood of the optimum with high probability, with the size of the neighborhood determined by the noise covariance. Moreover, if the $\mathcal{K}$-PL condition is strengthened to a $\mathcal{K}_\infty$-PL condition, the dynamics are NSS; whereas if it is weakened to a general positive-definite-PL condition, the dynamics exhibit integral NSS. The results further extend to objectives without globally Lipschitz gradients through appropriate step-size tuning. The proposed framework is further applied to the robustness analysis of policy optimization for the linear quadratic regulator (LQR) and logistic regression.
comment: 25 pages, 1 figure
☆ Towards Tighter Convex Relaxation of Mixed-integer Programs: Leveraging Logic Network Flow for Task and Motion Planning
This paper proposes an optimization-based task and motion planning framework, named "Logic Network Flow", that integrates temporal logic specifications into mixed-integer programs for efficient robot planning. Inspired by the Graph-of-Convex-Sets formulation, temporal predicates are encoded as polyhedron constraints on each edge of a network flow model, instead of as constraints between nodes in traditional Logic Tree formulations. We further propose a network-flow-based Fourier-Motzkin elimination procedure that removes continuous flow variables while preserving convex relaxation tightness, leading to provably tighter convex relaxations and fewer constraints than Logic Tree formulations. For temporal logic motion planning with piecewise-affine dynamic systems, comprehensive experiments across vehicle routing, multi-robot coordination, and temporal logic control on dynamical systems using point mass and linear inverted pendulum models demonstrate computational speedups of up to several orders of magnitude. Hardware demonstrations with quadrupedal robots validate real-time replanning capabilities under dynamically changing environmental conditions. The project website is at https://logicnetworkflow.github.io/.
comment: 35 pages, 17 figures, 7 tables
☆ Multi-Agent Guided Policy Search for Non-Cooperative Dynamic Games
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
☆ SDC-Based Model Predictive Control: Enhancing Computational Feasibility for Safety-Critical Quadrotor Control
Nonlinear Model Predictive Control (NMPC) is widely used for controlling high-speed robotic systems such as quadrotors. However, its significant computational demands often hinder real-time feasibility and reliability, particularly in environments requiring robust obstacle avoidance. This paper proposes a novel SDC-Based Model Predictive Control (MPC) framework, which preserves the high-precision performance of NMPC while reducing computational complexity by over 30%. By reformulating the nonlinear quadrotor dynamics through the State-Dependent Coefficient (SDC) method, the original nonlinear programming problem is transformed into a sequential quadratic optimization problem. The controller integrates an integral action to eliminate steady-state tracking errors and imposes constraints for safety-critical obstacle avoidance. Additionally, a disturbance estimator is incorporated to enhance robustness against external perturbations. Simulation results demonstrate that the SDC-Based MPC achieves tracking accuracy comparable to NMPC with markedly shorter computation times, thereby improving its suitability for real-time applications. Theoretical analysis further establishes the stability and recursive feasibility of the proposed approach.
comment: This is the initial version
☆ Learning Hybrid Dynamics via Convex Optimizations
This paper investigates the problem of identifying state-dependent switching systems, a class of hybrid dynamical systems that combine multiple linear or nonlinear modes. We propose two broad classes of switching systems: switching linear systems (SLSs) and switching polynomial systems (SPSs). We first formulate the joint estimation of the mode dynamics and switching rules as a mixed integer program. To solve its inherent scalability issue, we develop a hierarchy of convex relaxations and establish a bound and conditions under which these relaxations are tight. Building on these results, we propose a bilevel convex optimization framework that alternates between mode assignment and dynamics estimation, and we recover switching boundaries using margin-based polynomial classifiers. Numerical experiments on both linear and nonlinear oscillators demonstrate that the method accurately identifies mode dynamics and reconstructs switching surfaces from trajectory data. Our results provide a tractable optimization-based framework for switching system identification.
comment: 8 pages, 4 figures
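The alternation between mode assignment and dynamics estimation can be illustrated with plain least squares standing in for the paper's convex relaxations. Everything below is a toy sketch for switching linear systems under our own notation.

```python
import numpy as np

def alternate_identify(X, Xn, n_modes=2, iters=20, seed=0):
    """Toy alternation for switching linear systems: assign each sample
    (x_k, x_{k+1}) to the mode that predicts it best, then refit each
    mode's dynamics matrix by least squares (standing in for the paper's
    SDP relaxations). X, Xn: (N, n) arrays of states and successors."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    A = [rng.standard_normal((n, n)) for _ in range(n_modes)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        resid = np.stack([np.linalg.norm(Xn - X @ Am.T, axis=1) for Am in A])
        labels = resid.argmin(axis=0)             # mode-assignment step
        for m in range(n_modes):                  # dynamics-estimation step
            idx = labels == m
            if idx.sum() >= n:                    # need enough samples to fit
                A[m] = np.linalg.lstsq(X[idx], Xn[idx], rcond=None)[0].T
    return A, labels
```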
♻ ☆ Trajectory Encryption Cooperative Salvo Guidance
This paper introduces the concept of trajectory encryption in cooperative simultaneous target interception, wherein heterogeneity in guidance principles across a team of unmanned autonomous systems is leveraged as a strategic design feature. By employing a mix of heterogeneous time-to-go formulations leading to a cooperative guidance strategy, the swarm of vehicles is able to generate diverse trajectory families. This diversity expands the feasible solution space for simultaneous target interception, enhances robustness under disturbances, and enables flexible time-to-go adjustments without predictable detouring. From an adversarial perspective, heterogeneity obscures the collective interception intent by preventing straightforward prediction of swarm dynamics, effectively acting as an encryption layer in the trajectory domain. Simulations demonstrate that the swarm of heterogeneous vehicles is able to intercept a moving target simultaneously from a diverse set of initial engagement configurations.
♻ ☆ RobustNeuralNetworks.jl: a Package for Machine Learning and Data-Driven Control with Certified Robustness
Neural networks are typically sensitive to small input perturbations, leading to unexpected or brittle behaviour. We present RobustNeuralNetworks.jl: a Julia package for neural network models that are constructed to naturally satisfy a set of user-defined robustness metrics. The package is based on the recently proposed Recurrent Equilibrium Network (REN) and Lipschitz-Bounded Deep Network (LBDN) model classes, and is designed to interface directly with Julia's most widely-used machine learning package, Flux.jl. We discuss the theory behind our model parameterization, give an overview of the package, and provide a tutorial demonstrating its use in image classification, reinforcement learning, and nonlinear state-observer design.
comment: Accepted to the Proceedings of the JuliaCon Conferences
♻ ☆ Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability
Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these on a four-DoF tendon-driven robotic hand where a single motor achieved output forces up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity by 354% over manufacturer specifications. These results demonstrate that electrostatic multiplexing provides versatile actuation, overcoming the limitations of prior systems.
♻ ☆ Electricity Demand and Grid Impacts of AI Data Centers: Challenges and Prospects
The rapid growth of artificial intelligence (AI) is driving an unprecedented increase in the electricity demand of AI data centers, raising emerging challenges for electric power grids. Understanding the characteristics of AI data center loads and their interactions with the grid is therefore critical for ensuring both reliable power system operation and sustainable AI development. This paper provides a comprehensive review and vision of this evolving landscape. Specifically, this paper (i) presents an overview of AI data center infrastructure and its key components, (ii) examines the key characteristics and patterns of electricity demand across the stages of model preparation, training, fine-tuning, and inference, (iii) analyzes the critical challenges that AI data center loads pose to power systems across three interrelated timescales, including long-term planning and interconnection, short-term operation and electricity markets, and real-time dynamics and stability, and (iv) discusses potential solutions from the perspectives of the grid, AI data centers, and AI end-users to address these challenges. By synthesizing current knowledge and outlining future directions, this review aims to guide research and development in support of the joint advancement of AI data centers and power systems toward reliable, efficient, and sustainable operation.
♻ ☆ Secure and Decentralized Peer-to-Peer Energy Transactions using Blockchain Technology
This paper presents an optimal peer-to-peer (P2P) energy transaction mechanism leveraging decentralized blockchain technology to enable a secure and scalable retail electricity market for the increasing penetration of distributed energy resources (DERs). A decentralized bidding strategy is proposed to maximize individual profits while collectively enhancing social welfare. The market design and transaction processes are simulated using the Ethereum testnet, demonstrating the blockchain network's capability to ensure secure, transparent, and sustainable P2P energy trading among DER participants.
♻ ☆ A Model-Free Terminal Iterative Learning Control Scheme for Multi-Layer Printing Alignment Control Problems
Roll-to-roll (R2R) printing technologies are promising for high-volume continuous production of substrate-based electronic products. One of the major challenges in R2R flexible electronics printing is achieving tight alignment tolerances, as specified by the device resolution (usually at the micrometer level), for multi-layer printed electronics. The alignment of the printed patterns in different layers is known as registration. Conventional registration control methods rely on real-time feedback controllers, such as PID control, to regulate the web tension and the web speed. However, those methods may lose effectiveness in compensating for recurring disturbances and supporting effective mitigation of registration errors. In this paper, we propose a Spatial-Terminal Iterative Learning Control (STILC) method integrated with PID control to iteratively learn and reduce registration error cycle-by-cycle, converging it to zero. This approach enables unprecedented precision in the creation, integration, and manipulation of multi-layer microstructures in R2R processes. We theoretically prove the convergence of the proposed STILC-PID hybrid approach and validate its effectiveness through a simulated registration error scenario caused by axis mismatch between roller and motor, a common issue in R2R systems. The results demonstrate that the STILC-PID hybrid control method can fully eliminate the registration error after a feasible number of iterations. Additionally, we analyze the impact of different learning gains on the convergence performance of STILC.
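The cycle-to-cycle learning idea can be seen in a scalar toy model. The update law and plant response below are illustrative, not the paper's exact STILC formulation.

```python
def stilc_toy(e0=1.0, L_gain=0.5, g=1.0, cycles=20):
    """Scalar toy of cycle-to-cycle terminal iterative learning control
    (illustrative, not the paper's exact law): u_{j+1} = u_j + L * e_j,
    with terminal registration error e_j = e0 - g * u_j. The error
    contracts to zero whenever |1 - g * L| < 1."""
    u, errors = 0.0, []
    for _ in range(cycles):
        e = e0 - g * u            # terminal registration error this cycle
        errors.append(e)
        u = u + L_gain * e        # learning update applied between cycles
    return errors
```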
♻ ☆ Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models NeurIPS 2025
Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end, with agents generating messages and actions concurrently. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), to simulate future states. Agents then communicate a summary of this plan. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
comment: Published in the Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA). Additionally accepted for presentation in the NeurIPS 2025 Workshop: Embodied World Models for Decision Making (EWM) and the NeurIPS 2025 Workshop: Optimization for Machine Learning (OPT)
♻ ☆ Certified Neural Approximations of Nonlinear Dynamics
Neural networks hold great potential to act as approximate models of nonlinear dynamical systems, with the resulting neural approximations enabling verification and control of such systems. However, in safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. To address this fundamental challenge, we propose a novel, adaptive, and parallelizable verification method based on certified first-order models. Our approach provides formal error bounds on the neural approximations of dynamical systems, allowing them to be safely employed as surrogates by interpreting the error bound as bounded disturbances acting on the approximated dynamics. We demonstrate the effectiveness and scalability of our method on a range of established benchmarks from the literature, showing that it significantly outperforms the state-of-the-art. Furthermore, we show that our framework can successfully address additional scenarios previously intractable for existing methods - neural network compression and an autoencoder-based deep learning architecture for learning Koopman operators for the purpose of trajectory prediction.
comment: first and second author contributed equally
♻ ☆ Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
comment: Submitted to SIAM Journal on Control and Optimization
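A stochastic inexact Krasnoselskii-Mann iteration, the object the slower time-scale is viewed as, can be sketched as follows. The step-size schedule and noise model are our illustrative choices, not the paper's assumptions.

```python
import numpy as np

def km_iterate(T, x0, steps=1000, noise=0.01, seed=0):
    """Stochastic Krasnoselskii-Mann iteration for a non-expansive map T
    (illustrative sketch): x_{k+1} = (1 - a_k) x_k + a_k (T(x_k) + w_k),
    with diminishing averaging weights a_k and i.i.d. perturbations w_k."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        a = k ** -0.5                        # diminishing averaging weight (our choice)
        x = (1 - a) * x + a * (T(x) + noise * rng.standard_normal(x.shape))
    return x

# usage: a planar rotation is non-expansive, so the iterates settle near its fixed point
R = np.array([[0.0, -1.0], [1.0, 0.0]])
print(km_iterate(lambda z: R @ z, x0=[1.0, 1.0]))
```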
♻ ☆ Can Human Drivers and Connected Autonomous Vehicles Co-exist in Lane-Free Traffic? A Microscopic Simulation Perspective
Recent advancements in connected autonomous vehicle (CAV) technology have sparked growing research interest in lane-free traffic (LFT). LFT envisions a scenario where all vehicles are CAVs, coordinating their movements without lanes to achieve smoother traffic flow and higher road capacity. This potentially reduces congestion without building new infrastructure. However, the transition phase will likely involve non-connected actors such as human-driven vehicles (HDVs) or independent AVs sharing the roads. This raises the question of how LFT performance is impacted when not all vehicles are CAVs, as these non-connected vehicles may prioritize their own benefits over system-wide improvements. This paper addresses this question through microscopic simulation on a ring road, where CAVs follow the potential lines (PL) controller for LFT, while HDVs adhere to a strip-based car-following model. The PL controller is also modified for safe velocities to prevent collisions. The results reveal that even a small percentage of HDVs can significantly disrupt LFT flow: 5% HDVs can reduce LFT's maximum road capacity by 20%, 40% HDVs nearly halve it, and at 100% HDVs it drops by nearly 60%. The study also develops an adaptive potential line (APL) controller that forms APL corridors in the surroundings of HDVs. APL shows a peak traffic flow improvement of nearly 10% over the PL controller. The study indicates that a penetration rate of approximately 60% CAVs is required to start observing the major LFT benefits. These findings open a new research direction on minimizing the adverse effects of non-connected vehicles on LFT.
comment: This version corresponds to the final published article in Transportation Research Part C: Emerging Technologies. It incorporates revisions made during peer review, including an improved literature review, clearer methodological descriptions, and explicit consideration of safety and comfort
♻ ☆ CommonPower: A Framework for Safe Data-Driven Smart Grid Control
The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). To validate their effectiveness, RL algorithms have to be evaluated across multiple case studies. Case study design is an arduous task requiring the consideration of many aspects, among them the influence of available forecasts and the level of decentralization in the control structure. Furthermore, vanilla RL controllers cannot themselves ensure the satisfaction of system constraints, which makes devising a safeguarding mechanism a necessary task for every case study before deploying the system. To address these shortcomings, we introduce the Python tool CommonPower, the first general framework for the modeling and simulation of power system management tailored towards machine learning. Its modular architecture enables users to focus on specific elements without having to implement a simulation environment. Another unique contribution of CommonPower is the automatic synthesis of model predictive controllers and safeguards. Beyond offering a unified interface for single-agent RL, multi-agent RL, and optimal control, CommonPower includes a training pipeline for machine-learning-based forecasters as well as a flexible mechanism for incorporating feedback of safeguards into the learning updates of RL controllers.
comment: For the corresponding code repository, see https://github.com/TUMcps/commonpower
♻ ☆ Is FISHER All You Need in The Multi-AUV Underwater Target Tracking Task?
Employing multiple autonomous underwater vehicles (AUVs) to execute underwater target tracking tasks collaboratively is of great significance. However, meeting the task's various prerequisites with traditional control methods is quite challenging. Therefore, we propose an effective two-stage learning from demonstrations training framework, FISHER, to highlight the adaptability of reinforcement learning (RL) methods in the multi-AUV underwater target tracking task, while addressing its limitations such as extensive requirements for environmental interactions and the challenges in designing reward functions. The first stage utilizes imitation learning (IL) to realize policy improvement and generate offline datasets. To be specific, we introduce multi-agent discriminator-actor-critic, based on improvements of the generative adversarial IL algorithm and a multi-agent IL optimization objective derived from the Nash equilibrium condition. Then, in the second stage, we develop multi-agent independent generalized decision transformer, which analyzes latent representations to match the future states of high-quality samples rather than a reward function, attaining further enhanced policies capable of handling various scenarios. Besides, we propose a simulation-to-simulation demonstration generation procedure to facilitate the generation of expert demonstrations in underwater environments, which capitalizes on traditional control methods and can easily accomplish the domain transfer to obtain demonstrations. Extensive simulation experiments across multiple scenarios showcase that FISHER possesses strong stability, multi-task performance, and generalization capability.
comment: This paper has been accepted by IEEE Transactions on Mobile Computing. Besides, Guanwen Xie and Jingzehua Xu contributed equally to this work
♻ ☆ Guided Reinforcement Learning for Omnidirectional 3D Jumping in Quadruped Robots
Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining B\'ezier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.
♻ ☆ Analysis of Hierarchical AoII over Unreliable Channels: A Stochastic Hybrid System Approach
In this work, we generalize the Stochastic Hybrid Systems (SHSs) analysis of traditional AoI to the AoII metric. Hierarchical ageing processes are adopted using the continuous AoII for the first time, where two different hierarchy schemes, i.e., a hybrid of linear ageing processes with different slopes and a hybrid of linear and quadratic ageing processes, are considered. We first modify the main result of Yates (2020, Theorem 1) to provide a systematic way to analyze the continuous hierarchical AoII over unslotted real-time systems. The closed-form expressions of average hierarchical AoII are obtained based on our main theorem in two typical scenarios with different channel conditions, i.e., an M/M/1/1 queue over noisy channels and two M/M/1/1 queues over collision channels. Moreover, we analyze the stability conditions for two scenarios given that the quadratic ageing process may lead to the absence of stationary solutions. Finally, we compare the average age performance between the classic AoI results and our AoII results in the M/M/1/1 queue, and the effects of different channel parameters on AoII are also evaluated.
comment: 16 pages, 10 figures
♻ ☆ Steering the Herd: A Framework for LLM-based Control of Social Learning
Algorithms increasingly serve as information mediators--from social media feeds and targeted advertising to the increasing ubiquity of LLMs. This engenders a joint process where agents combine private, algorithmically-mediated signals with learning from peers to arrive at decisions. To study such settings, we introduce a model of controlled sequential social learning in which an information-mediating planner (e.g. an LLM) controls the information structure of agents while they also learn from the decisions of earlier agents. The planner may seek to improve social welfare (altruistic planner) or to induce a specific action the planner prefers (biased planner). Our framework presents a new optimization problem for social learning that combines dynamic programming with decentralized action choices and Bayesian belief updates. We prove the convexity of the value function and characterize the optimal policies of altruistic and biased planners, which attain desired tradeoffs between the costs they incur and the payoffs they earn from induced agent choices. Notably, in some regimes the biased planner intentionally obfuscates the agents' signals. Even under stringent transparency constraints--information parity with individuals, no lying or cherry-picking, and full observability--we show that information mediation can substantially shift social welfare in either direction. We complement our theory with simulations in which LLMs act as both planner and agents. Notably, the LLM planner in our simulations exhibits emergent strategic behavior in steering public opinion that broadly mirrors the trends predicted, though key deviations suggest the influence of non-Bayesian reasoning consistent with the cognitive patterns of both humans and LLMs trained on human-like data. Together, we establish our framework as a tractable basis for studying the impact and regulation of LLM information mediators.
♻ ☆ A Crime/S.I.R. optimal control problem
This paper presents and discusses a mathematical model inspired by control theory to derive optimal public policies for minimizing costs associated with the reduction and control of criminal activity in a population. Specifically, we analyze the optimal control problem \begin{equation*} \min G(u_1, u_2, u_3) = \int_{0}^{t_{\text{F}}} \left( I(t) - R(t) + \frac{B_1}{2} u_1^2(t) + \frac{B_2}{2} u_2^2(t) + \frac{B_3}{2} u_3^2(t) \right) \, dt \end{equation*} where $I=I(t)$ and $R=R(t)$ satisfy the system of equations \begin{equation*} \left\{ \begin{aligned} \dot{S} &= \Lambda - (1-u_1)SI - \mu S + ((1+u_3)\gamma_2)I + \rho \Omega R,\\ \dot{I} &= (1-u_1)SI - (\mu + \delta_1)I - ((1+u_2)\gamma_1)I - ((1+u_3)\gamma_2)I + (1-\Omega)\rho R,\\ \dot{R} &= ((1+u_2)\gamma_1)I - (\mu + \delta_2 + \rho)R. \end{aligned} \right. \end{equation*} Our approach assumes that the social and economic effects of criminal behavior can be modeled by a dynamic SIR-type system, which serves as a constraint on a cost functional associated with the strategies implemented by government and law enforcement authorities to reduce criminal behavior. Using optimal control theory, the proposed controls, i.e., preventive policies (such as community and social cohesion programs), are expected to have a significant and positive impact on crime reduction, generating opportunities for the most disadvantaged sectors of Cali society and contributing to long-term security. Given that resources to address this problem are limited, this research aims to determine an optimal combination of public interventions and policies that minimize criminality at the lowest possible economic cost, using an SIR model, tools from variational calculus, and optimal control theory.
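For fixed controls, the constraint dynamics integrate directly. The forward-Euler sketch below implements the system stated above with placeholder parameter values; it is only an illustration, not the paper's numerical scheme.

```python
def simulate_crime_sir(S0, I0, R0, u, p, dt=0.01, t_final=10.0):
    """Forward-Euler integration of the S/I/R crime model above for fixed
    controls u = (u1, u2, u3); parameter values in p are placeholders.
    p = (Lambda, mu, delta1, delta2, gamma1, gamma2, rho, Omega)."""
    Lam, mu, d1, d2, g1, g2, rho, Om = p
    u1, u2, u3 = u
    S, I, R = S0, I0, R0
    for _ in range(int(t_final / dt)):
        dS = Lam - (1 - u1) * S * I - mu * S + (1 + u3) * g2 * I + rho * Om * R
        dI = ((1 - u1) * S * I - (mu + d1) * I - (1 + u2) * g1 * I
              - (1 + u3) * g2 * I + (1 - Om) * rho * R)
        dR = (1 + u2) * g1 * I - (mu + d2 + rho) * R
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
    return S, I, R
```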
Optimization and Control 49
☆ A General Theory of Emergent Linearity in Complex Dynamical Systems: The Role of Spatial Averaging and Vanishing Correlations
Various natural and engineered systems, from urban traffic flow to the human brain, have been described by large-scale networked dynamical systems. Despite their vast differences, these systems are often similar in being comprised of numerous microscopic subsystems with complex nonlinear dynamics and interactions that give rise to diverse emergent macroscopic behaviors. As such, a long-standing question across various fields has been to understand why and how various forms of macroscopic behavior emerge from underlying microscopic dynamics. Motivated by a growing body of empirical observations, in this work we focus on linearity as one of the most fundamental aspects of system dynamics, and develop a general theoretical framework for the interplay between spatial averaging, decaying microscopic correlations, and emergent macroscopic linearity. Using and extending the theory of mixing sequences, we show that in a broad class of autonomous nonlinear networked systems, the dynamics of the average of all subsystems' states becomes asymptotically linear as the number of subsystems grows to infinity, provided that (in addition to technical assumptions) pairwise correlations between subsystems decay to 0 as their pairwise distance grows to infinity. We prove this result when the latter distance is between subsystems' linear indices or spatial locations, and provide extensions to linear time-invariant (LTI) limit dynamics, finite-sample analysis of rates of convergence, and networks of spatially-embedded subsystems with random locations. To our knowledge, this work is the first rigorous analysis of macroscopic linearity in large-scale heterogeneous networked systems, and provides a solid foundation for further theoretical and empirical analyses in various domains of science and engineering.
☆ Integrator Forwarding Design for Unicycles with Constant and Actuated Velocity in Polar Coordinates
In a companion paper, we present a modular framework for unicycle stabilization in polar coordinates that provides smooth steering laws through backstepping. Surprisingly, the same problem also allows the application of integrator forwarding. In this work, we leverage this feature and construct new smooth steering laws together with control Lyapunov functions (CLFs), expanding the set of CLFs available for inverse optimal control design. In the case of constant forward velocity (Dubins car), backstepping produces finite-time (deadbeat) parking, and we show that integrator forwarding yields the very same class of solutions. This reveals a fundamental connection between backstepping and forwarding in addressing both the unicycle and the Dubins car parking problems.
☆ Modular Design of Strict Control Lyapunov Functions for Global Stabilization of the Unicycle in Polar Coordinates
Since the mid-1990s, it has been known that, unlike in Cartesian form where Brockett's condition rules out static feedback stabilization, the unicycle is globally asymptotically stabilizable by smooth feedback in polar coordinates. In this note, we introduce a modular framework for designing smooth feedback laws that achieve global asymptotic stabilization in polar coordinates. These laws are bidirectional, enabling efficient parking maneuvers, and are paired with families of strict control Lyapunov functions (CLFs) constructed in a modular fashion. The resulting CLFs guarantee global asymptotic stability with explicit convergence rates and include barrier variants that yield "almost global" stabilization, excluding only zero-measure subsets of the rotation manifolds. The strictness of the CLFs is further leveraged in our companion paper, where we develop inverse-optimal redesigns with meaningful cost functions and infinite gain margins.
☆ Half-Global Deadbeat Parking for Dubins Vehicle
This paper presents a framework for stabilizing the Dubins vehicle model to zero in finite time (deadbeat parking) by interpreting distance as a time-like variable. We develop control laws that bring the system to a desired position and orientation even when the forward velocity cannot be directly actuated. While the controllers employ inverse-distance gains, we show that the control input remains bounded for all time. In addition to basic deadbeat parking, we incorporate safety considerations by proposing algorithms that prevent the vehicle from crossing in front of the target, enforce deceleration as it approaches the target, and guarantee parking without curb violations. The resulting methods are well-suited for missile guidance and fixed-wing pursuit, but are broadly applicable to physical systems that are represented by the Dubins vehicle model.
comment: Submitted to ACC 2026
☆ Inverse Optimal Feedback and Gain Margins for Unicycle Stabilization
The recent development of globally strict control Lyapunov functions (CLFs) for the challenging unicycle parking problem provides a foundation for pursuing optimality. We address this in the inverse optimal framework, thereby avoiding the need to solve the Hamilton-Jacobi-Bellman (HJB) equations, and establish a general result that is optimal with respect to a meaningful cost. We present several design examples that impose varying levels of penalty on the control effort, including arbitrarily bounded control. Furthermore, we show that the inverse optimal controller possesses an infinite gain margin thanks to the system being driftless, and leveraging this property, we extend the design to an adaptive controller that handles model uncertainty. Finally, we compare the performance of the non-adaptive inverse optimal controller with its adaptive counterpart.
comment: Submitted to ACC 2026
☆ Exhaustive-Serve-Longest Control for Multi-robot Scheduling Systems
We study online task allocation for multi-robot, multi-queue systems with stochastic arrivals and switching delays. Time is slotted; each location can host at most one robot per slot; service consumes one slot; switching between locations incurs a one-slot travel delay; and arrivals are independent Bernoulli processes. We formulate a discounted-cost Markov decision process and propose Exhaustive-Serve-Longest (ESL), a simple real-time policy that serves exhaustively when the current location is nonempty and, when idle, switches to a longest unoccupied nonempty location, and we prove the optimality of this policy. As baselines, we tune a fixed-dwell cyclic policy via a discrete-time delay expression and implement a first-come-first-serve policy. Across server-to-location ratios and loads, ESL consistently yields lower discounted holding cost and smaller mean queue lengths, with action-time fractions showing more serving and restrained switching. Its simplicity and robustness make ESL a practical default for real-time multi-robot scheduling systems.
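The ESL rule is simple to state in code. A minimal sketch of the decision logic described above, under a simplified representation of queues and robot occupancy (an assumption, not the paper's implementation):

```python
# Exhaustive-Serve-Longest (ESL): serve while the current location is
# nonempty; once it empties, switch to a longest nonempty location that no
# other robot occupies. Queue/occupancy encoding is a simplifying assumption.
def esl_action(robot_loc, queues, occupied):
    """Return 'serve', the index of the location to switch to, or None."""
    if queues[robot_loc] > 0:
        return "serve"                                # exhaustive service
    candidates = [i for i, q in enumerate(queues)
                  if q > 0 and i not in occupied]
    if not candidates:
        return None                                   # nothing to do
    return max(candidates, key=lambda i: queues[i])   # longest queue wins

# Robot at location 0 with an empty queue; locations 1 and 2 hold 3 and 5 jobs
print(esl_action(0, [0, 3, 5], occupied={0}))         # -> 2
```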
☆ Generating Differentially Private Networks with a Modified Erdős-Rényi Model
Differential privacy has been used to privately calculate numerous network properties, but existing approaches often require the development of a new privacy mechanism for each property of interest. Therefore, we present a framework for generating entire networks in a differentially private way. Differential privacy is immune to post-processing, which allows for any network property to be computed and analyzed for a private output network, without weakening its protections. We consider undirected networks and develop a differential privacy mechanism that takes in a sensitive network and outputs a private network by randomizing its edge set. We prove that this mechanism does provide differential privacy to a network's edge set, though it induces a complex distribution over the space of output graphs. We then develop an equivalent privacy implementation using a modified Erd\H{o}s-R\'{e}nyi model that constructs an output graph edge by edge, and it is efficient and easily implementable, even on large complex networks. Experiments implement $\varepsilon$-differential privacy with $\varepsilon=2.5$ when computing graph Laplacian spectra, and these results show the proposed mechanism incurs $49.34\%$ less error than the current state of the art.
comment: 8 pages, 2 figures
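For intuition, the classical baseline for privatizing an edge set is randomized response: independently flipping each potential edge with probability $1/(1+e^\varepsilon)$ satisfies $\varepsilon$-edge differential privacy. The paper's modified Erd\H{o}s-R\'{e}nyi mechanism is related in spirit but is not the construction below; this sketch shows only the textbook baseline:

```python
# Randomized response on edges: a classical eps-edge-DP mechanism, shown as a
# baseline. NOT the paper's modified Erdos-Renyi construction.
import numpy as np

def randomized_response_graph(adj, eps, rng):
    """adj: symmetric 0/1 array, zero diagonal; returns a private graph."""
    n = adj.shape[0]
    p_flip = 1.0 / (1.0 + np.exp(eps))         # flip probability for eps-DP
    flips = rng.random((n, n)) < p_flip
    flips = np.triu(flips, k=1)                # decide each undirected edge once
    flips = flips | flips.T
    out = adj.copy()
    out[flips] ^= 1                            # toggle the flipped entries
    return out

rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.3).astype(int)
A = np.triu(A, 1); A = A + A.T                 # random symmetric input graph
print(randomized_response_graph(A, eps=2.5, rng=rng))  # eps as in the abstract
```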
☆ Resource Allocation under Stochastic Demands using Shrinking Horizon Optimization
We consider the problem of optimally allocating a limited number of resources across time to maximize revenue under stochastic demands. This formulation is relevant in various areas of control, such as supply chain, ticket revenue maximization, healthcare operations, and energy allocation in power grids. We propose a bisection method to solve the static optimization problem and extend our approach to a shrinking horizon algorithm for the sequential problem. The shrinking horizon algorithm computes future allocations after updating the distribution of future demands by conditioning on the observed values of demand. We illustrate the method on a simple synthetic example with jointly log-normal demands, showing that it achieves performance close to a bound obtained by solving the prescient problem.
comment: Submitted to the 2026 American Control Conference
☆ Managing ride-sourcing drivers at transportation terminals: a lottery-based queueing approach
Problem definition: Transportation terminals such as airports often experience persistent oversupply of idle ride-sourcing drivers, resulting in long driver waiting times and inducing externalities such as curbside congestion. While platforms now employ virtual queues with control levers like dynamic pricing, information provision, and direct admission control to manage this issue, all existing levers involve significant trade-offs and side effects. This limitation highlights the need for an alternative management approach. Methodology/results: We develop a queueing-theoretic framework to model ride-sourcing operations at terminals and propose a novel lottery-based control mechanism for the virtual queue. This non-monetary strategy works by probabilistically assigning a driver's entry position. By directly influencing their expected waiting time, the mechanism in turn shapes their decision to join the queue. We reformulate the resulting infinite-dimensional, non-smooth optimization into a tractable bi-level program by leveraging the threshold structure of the equilibrium. Theoretically, we prove that the lottery mechanism can achieve higher or equal social welfare than FIFO-queue-based dynamic pricing. Numerical experiments in unconstrained markets show that in profit maximization, our approach only narrowly trails dynamic pricing and significantly outperforms static pricing. Furthermore, it is shown that under commission fee caps, the lottery mechanism can surpass dynamic pricing in profitability. Implications: This study introduces a new, non-monetary lever for managing idle ride-sourcing drivers at transportation terminals. By aligning operational practices with queue-based dynamics, the proposed lottery mechanism offers a robust and implementable alternative to pricing-based approaches, with advantages in both unconstrained and regulated markets.
comment: 28 pages, 15 figures
☆ Addressing Methodological Uncertainty in MCDM with a Systematic Pipeline Approach to Data Transformation Sensitivity Analysis
Multicriteria decision-making methods exhibit critical dependence on the choice of normalization techniques, where different selections can alter 20-40% of the final rankings. Current practice is characterized by the ad-hoc selection of methods without systematic robustness evaluation. We present a framework that addresses this methodological uncertainty through automated exploration of the scaling transformation space. The implementation leverages the existing Scikit-Criteria infrastructure to automatically generate all possible methodological combinations and provide robust comparative analysis.
☆ Improved Stochastic Optimization of LogSumExp
The LogSumExp function, also known as the free energy, plays a central role in many important optimization problems, including entropy-regularized optimal transport and distributionally robust optimization (DRO). It is also the dual to the Kullback-Leibler (KL) divergence, which is widely used in machine learning. In practice, when the number of exponential terms inside the logarithm is large or infinite, optimization becomes challenging since computing the gradient requires differentiating every term. Previous approaches that replace the full sum with a small batch introduce significant bias. We propose a novel approximation to LogSumExp that can be efficiently optimized using stochastic gradient methods. This approximation is rooted in a sound modification of the KL divergence in the dual, resulting in a new $f$-divergence called the safe KL divergence. The accuracy of the approximation is controlled by a tunable parameter and can be made arbitrarily small. Like the LogSumExp, our approximation preserves convexity. Moreover, when applied to an $L$-smooth function bounded from below, the smoothness constant of the resulting objective scales linearly with $L$. Experiments in DRO and continuous optimal transport demonstrate the advantages of our approach over state-of-the-art baselines and the effective treatment of numerical issues associated with the standard LogSumExp and KL.
comment: 16 pages, 5 figures
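The mini-batch bias that motivates the paper is easy to see numerically: by Jensen's inequality, the expected logarithm of a batch average of exponentials underestimates the log of the full average. A small illustrative demo (not the paper's estimator):

```python
# Demonstrate the systematic bias of naive mini-batch LogSumExp estimation.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)                   # terms inside the LogSumExp
true_val = logsumexp(x) - np.log(x.size)      # log of the mean of exp(x)

batch = 32
estimates = [logsumexp(rng.choice(x, batch)) - np.log(batch)
             for _ in range(2_000)]
print(f"full: {true_val:.4f}  mini-batch mean: {np.mean(estimates):.4f}")
# The gap is the (downward) Jensen bias a small batch introduces.
```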
☆ Continuous differentiability of the signum function and Newton's method for bang-bang control
We investigate bang-bang control problems and the possibility of applying Newton's method to solve such problems numerically. To this end, we show that the signum function is Fr\'echet differentiable between appropriate function spaces. Numerical experiments show the applicability of the resulting method.
☆ Bundle Network: a Machine Learning-Based Bundle Method
This paper presents Bundle Network, a learning-based algorithm inspired by the Bundle Method for convex non-smooth minimization problems. Unlike classical approaches that rely on heuristic tuning of a regularization parameter, our method automatically learns to adjust it from data. Furthermore, we replace the iterative resolution of the optimization problem that provides the search direction, traditionally computed as a convex combination of gradients at visited points, with a recurrent neural model equipped with an attention mechanism. By leveraging the unrolled graph of computation, our Bundle Network can be trained end-to-end via automatic differentiation. Experiments on Lagrangian dual relaxations of the Multi-Commodity Network Design and Generalized Assignment problems demonstrate that our approach consistently outperforms traditional methods relying on grid search for parameter tuning, while generalizing effectively across datasets.
☆ Quasi-Ergodic Control of Multi-Periodic Autoregressive Processes: Formulation and Examples
This work considers state dynamics driven by Periodic Autoregressive Moving Average noise, and control of the system over time. Such processes appear frequently in applications involving the environment, such as energy and agriculture. Managing these systems by applying forecasts to make decisions that exhibit foresight and risk aversion while maximizing profits is a challenging control problem that can be computationally difficult for standard scenario-based methods. This paper presents a formulation that explicitly enforces time-periodicity of the distribution of the state, facilitating the use of periodic stochastic process basis elements as a discretization. By enforcing periodicity explicitly, an ansatz for the solution can be formed that does not require exponential scaling with time. We provide a few examples that can be modeled with this new control formulation.
☆ $\mathcal{KL}$ and Lyapunov Approaches for Discrete-time Peak Computation Problems
In this paper, we propose a method to solve discrete-time peak computation problems (DPCPs for short). DPCPs are optimization problems that consist of maximizing a function over the set of reachable values of a discrete-time dynamical system. The optimal value of a DPCP can be rewritten as the supremum of the sequence of optimal values. Previous results provide general techniques for computing the supremum of a real sequence from a well-chosen pair of a strictly increasing continuous function on [0,1] and a positive scalar in (0,1). Here, we exploit the specific structure of the optimal value of the DPCP to construct such a pair using classical tools from stability theory: $\mathcal{KL}$ certificates and Lyapunov functions.
comment: 26 pages
☆ Proximal gradient methods in Banach spaces
Proximal gradient methods are a popular tool for the solution of structured, nonsmooth minimization problems. In this work, we investigate their extension to general Banach spaces and provide worst-case convergence rates for both convex and nonconvex problem instances. Moreover, assuming additional regularity properties of stationary points, linear rates of convergence are derived. The theoretical results are illustrated for bang-bang type optimal control problems with partial differential equations which we study in the space of Radon measures. An efficient implementation of the resulting $L^1$-proximal gradient method is given and its performance is compared to standard $L^2$-proximal gradient as well as Frank-Wolfe methods. The paper is complemented by discussing the relationship among different regularity properties as well as by providing a novel characterization of the Polyak--{\L}ojasiewicz--Kurdyka property via second-order conditions involving weak* second subderivatives.
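For orientation, the familiar finite-dimensional instance of the method is the proximal gradient (ISTA) iteration with soft-thresholding for an $\ell^1$ penalty; the Banach-space version studied in the paper generalizes this prox step to spaces such as Radon measures. A minimal sketch of the classical case:

```python
# Classical finite-dimensional proximal gradient (ISTA) for
# min_x 0.5*||Ax - b||^2 + lam*||x||_1; the prox of the l1 term is
# soft-thresholding. Shown only as the familiar analogue of the paper's method.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad(grad_f, x0, step, lam, iters=500):
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - step * grad_f(x), step * lam)
    return x

rng = np.random.default_rng(0)
A, b = rng.normal(size=(30, 10)), rng.normal(size=30)
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of grad f
x = prox_grad(lambda x: A.T @ (A @ x - b), np.zeros(10), 1.0 / L, lam=0.5)
print(np.round(x, 3))                         # shrunken, typically sparse
```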
☆ Continuation strategies to mitigate convergence to low-performing local optima in dynamic topology optimization
Solving dynamic topology optimization problems often yields low-performing local optima. Instead of converging towards a design that exploits dynamic mechanisms, a less interesting, mass-driven solution is often generated. This necessitates repeated and computationally expensive optimization reruns before a suitable optimum is found. In this work, an overview of three strategy classes that reduce the need for such reruns is presented: exclusion strategies, frequency shift methods and relaxation strategies. Novel variants for each strategy class are developed, implemented and compared via Monte Carlo sampling on a benchmark problem, namely the sound transmission loss optimization of a sandwich panel. Probabilities of achieving high-performing optima are estimated and all investigated strategies demonstrate quantifiable improvements and trade-offs. The study furthermore offers a quantitative comparison of the presented strategies, supporting researchers in making an informed choice when addressing convergence to poor local optima in dynamic topology optimization.
☆ Tree-based formulation for the multi-commodity flow problem
We introduce a tree-based formulation for the minimum-cost multi-commodity flow problem that addresses large-scale instances. The method decomposes the source-based model by representing flows as convex combinations of trees rooted at source nodes, and solves the resulting formulation with column generation. The number of demand constraints now depends on the number of sources $|S|$, not commodities $|K|$, yielding a compact master problem when $|S| \ll |K|$. We conduct a computational study comparing tree-based decomposition against path-based column generation and direct LP solving. The results show speed-ups of up to one order of magnitude over direct LP solving, and improved scalability compared to path-based formulations. Tree-based decomposition enables solving instances with millions of commodities and hundreds of thousands of nodes. This makes it well-suited for applications in transportation and logistics networks where multiple demands often share common origins.
☆ Observability estimates for the Schrödinger equation on the equilateral triangle
We prove observability estimates for the Schr\"odinger equation posed on the equilateral triangle in the plane, under both Neumann and Dirichlet boundary conditions. No geometric control condition is required on the rough localization functions that we consider. This is the first result of this kind on a non-toric domain in the compact setting. Our strategy is to exploit Pinsky's tiling argument to deduce this result from observability estimates on rational twisted tori. These are obtained via propagation of singularities, adapting arguments from Burq and Zworski. The latter require Strichartz estimates on such twisted rational tori, which we derive from Zygmund inequalities in the same geometric setting, also providing the sharp constant. Strichartz estimates on the equilateral triangle are also derived from this analysis.
☆ Learning to Solve Optimization Problems Constrained with Partial Differential Equations
Partial differential equation (PDE)-constrained optimization arises in many scientific and engineering domains, such as energy systems, fluid dynamics and material design. In these problems, the decision variables (e.g., control inputs or design parameters) are tightly coupled with the PDE state variables, and the feasible set is implicitly defined by the governing PDE constraints. This coupling makes the problems computationally demanding, as it requires handling high-dimensional discretizations and dynamic constraints. To address these challenges, this paper introduces a learning-based framework that integrates a dynamic predictor with an optimization surrogate. The dynamic predictor, a novel time-discrete Neural Operator (Lu et al.), efficiently approximates system trajectories governed by PDE dynamics, while the optimization surrogate leverages proxy optimizer techniques (Kotary et al.) to approximate the associated optimal decisions. This dual-network design enables real-time approximation of optimal strategies while explicitly capturing the coupling between decisions and PDE dynamics. We validate the proposed approach on benchmark PDE-constrained optimization tasks including Burgers' equation, the heat equation and voltage regulation, and demonstrate that it achieves solution quality comparable to classical control-based algorithms, such as the Direct Method and Model Predictive Control (MPC), while providing up to four orders of magnitude improvement in computational speed.
☆ Markov Decision Processing Networks
We introduce Markov Decision Processing Networks (MDPNs) as a multiclass queueing network model where service is a controlled, finite-state Markov process. The model exhibits a decision-dependent service process where actions taken influence future service availability. Viewed as a two-sided queueing model, this captures settings such as assemble-to-order systems, ride-hailing platforms, cross-skilled call centers, and quantum switches. We first characterize the capacity region of MDPNs. Unlike classical switched networks, the MDPN capacity region depends on the long-run mix of service states induced by the control of the underlying service process. We show, via a counterexample, that MaxWeight is not throughput-optimal in this class, demonstrating the distinction between MDPNs and classical queueing models. To bridge this gap, we design a weighted average reward policy, a multiobjective MDP that leverages a two-timescale separation at the fluid scale. We prove throughput-optimality of the resulting policy. The techniques yield a clear capacity region description and apply to a broad family of two-sided matching systems.
☆ Distributionally Robust Federated Learning with Outlier Resilience
Federated learning (FL) enables collaborative model training without direct data sharing, but its performance can degrade significantly in the presence of data distribution perturbations. Distributionally robust optimization (DRO) provides a principled framework for handling this by optimizing performance against the worst-case distributions within a prescribed ambiguity set. However, existing DRO-based FL methods often overlook the detrimental impact of outliers in local datasets, which can disproportionately bias the learned models. In this work, we study distributionally robust federated learning with explicit outlier resilience. We introduce a novel ambiguity set based on the unbalanced Wasserstein distance, which jointly captures geometric distributional shifts and incorporates a non-geometric Kullback--Leibler penalization to mitigate the influence of outliers. This formulation naturally leads to a challenging min--max--max optimization problem. To enable decentralized training, we reformulate the problem as a tractable Lagrangian penalty optimization, which admits robustness certificates. Building on this reformulation, we propose the distributionally outlier-robust federated learning algorithm and establish its convergence guarantees. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our approach.
☆ Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
We study distributed reinforcement learning (RL) with policy gradient methods under asynchronous and parallel computations and communications. While non-distributed methods are well understood theoretically and have achieved remarkable empirical success, their distributed counterparts remain less explored, particularly in the presence of heterogeneous asynchronous computations and communication bottlenecks. We introduce two new algorithms, Rennala NIGT and Malenia NIGT, which implement asynchronous policy gradient aggregation and achieve state-of-the-art efficiency. In the homogeneous setting, Rennala NIGT provably improves the total computational and communication complexity while supporting the AllReduce operation. In the heterogeneous setting, Malenia NIGT simultaneously handles asynchronous computations and heterogeneous environments with strictly better theoretical guarantees. Our results are further corroborated by experiments, showing that our methods significantly outperform prior approaches.
☆ Simplex Frank-Wolfe: Linear Convergence and Its Numerical Efficiency for Convex Optimization over Polytopes
We investigate variants of the Frank-Wolfe (FW) algorithm for smooth and strongly convex optimization over polyhedral sets, with the goal of designing algorithms that achieve linear convergence while minimizing per-iteration complexity as much as possible. Starting from the simple yet fundamental unit simplex, and based on geometrically intuitive motivations, we introduce a novel oracle called Simplex Linear Minimization Oracle (SLMO), which can be implemented with the same complexity as the standard FW oracle. We then present two FW variants based on SLMO: Simplex Frank-Wolfe (SFW) and the refined Simplex Frank-Wolfe (rSFW). Both variants achieve a linear convergence rate for all three common step-size rules. Finally, we generalize the entire framework from the unit simplex to arbitrary polytopes. Furthermore, the refinement step in rSFW can accommodate any existing FW strategies such as the well-known away-step and pairwise-step, leading to outstanding numerical performance. We emphasize that the oracle used in our rSFW method requires only one more vector addition compared to the standard LMO, resulting in the lowest per-iteration computational overhead among all known Frank-Wolfe variants with linear convergence.
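As background, standard Frank-Wolfe on the unit simplex is already cheap: the LMO returns the vertex whose gradient coordinate is smallest. The SLMO and (r)SFW variants refine this oracle; the sketch below shows only the standard method they build on:

```python
# Standard Frank-Wolfe on the unit simplex with the classic 2/(t+2) step rule.
# The LMO over the simplex is a single argmin over gradient coordinates.
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=200):
    x = x0.copy()
    for t in range(iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0                 # LMO: best simplex vertex
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s       # stays feasible by convexity
    return x

# Toy problem: project c onto the simplex, i.e. min 0.5*||x - c||^2
c = np.array([0.8, 0.5, -0.2])
x = frank_wolfe_simplex(lambda x: x - c, np.ones(3) / 3)
print(np.round(x, 3), "sum =", round(x.sum(), 6))
```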
☆ An Augmented Lagrangian Value Function Method for Lower-level Constrained Stochastic Bilevel Optimization
Recently, lower-level constrained bilevel optimization has attracted increasing attention. However, existing methods mostly focus on either deterministic cases or problems with linear constraints. The main challenge in stochastic cases with general constraints is the bias and variance of the hyper-gradient, arising from the inexact solution of the lower-level problem. In this paper, we propose a novel stochastic augmented Lagrangian value function method for solving stochastic bilevel optimization problems with nonlinear lower-level constraints. Our approach reformulates the original bilevel problem using an augmented Lagrangian-based value function and then applies a penalized stochastic gradient method that carefully manages the noise from stochastic oracles. We establish an equivalence between the stochastic single-level reformulation and the original constrained bilevel problem and provide a non-asymptotic rate of convergence for the proposed method. The rate is further enhanced by employing variance reduction techniques. Extensive experiments on synthetic problems and real-world applications demonstrate the effectiveness of our approach.
☆ SDC-Based Model Predictive Control: Enhancing Computational Feasibility for Safety-Critical Quadrotor Control
Nonlinear Model Predictive Control (NMPC) is widely used for controlling high-speed robotic systems such as quadrotors. However, its significant computational demands often hinder real-time feasibility and reliability, particularly in environments requiring robust obstacle avoidance. This paper proposes a novel SDC-Based Model Predictive Control (MPC) framework, which preserves the high-precision performance of NMPC while substantially reducing computational complexity by over 30%. By reformulating the nonlinear quadrotor dynamics through the State-Dependent Coefficient (SDC) method, the original nonlinear program problem is transformed into a sequential quadratic optimization problem. The controller integrates an integral action to eliminate steady-state tracking errors and imposes constraints for safety-critical obstacle avoidance. Additionally, a disturbance estimator is incorporated to enhance robustness against external perturbations. Simulation results demonstrate that the SDC-Based MPC achieves comparable tracking accuracy to NMPC, with greater efficiency in terms of computation times, thereby improving its suitability for real-time applications. Theoretical analysis further establishes the stability and recursive feasibility of the proposed approach.
comment: This is the initial version
☆ Learning Hybrid Dynamics via Convex Optimizations
This paper investigates the problem of identifying state-dependent switching systems, a class of hybrid dynamical systems that combine multiple linear or nonlinear modes. We propose two broad classes of switching systems: switching linear systems (SLSs) and switching polynomial systems (SPSs). We first formulate the joint estimation of the mode dynamics and switching rules as a mixed integer program. To solve its inherent scalability issue, we develop a hierarchy of convex relaxations and establish a bound and conditions under which these relaxations are tight. Building on these results, we propose a bilevel convex optimization framework that alternates between mode assignment and dynamics estimation, and we recover switching boundaries using margin-based polynomial classifiers. Numerical experiments on both linear and nonlinear oscillators demonstrate that the method accurately identifies mode dynamics and reconstructs switching surfaces from trajectory data. Our results provide a tractable optimization-based framework for switching system identification.
comment: 8 pages, 4 figures
☆ A Novel Model for 3D Motion Planning for a Generalized Dubins Vehicle with Pitch and Yaw Rate Constraints
In this paper, we propose a new modeling approach and a fast algorithm for 3D motion planning, applicable for fixed-wing unmanned aerial vehicles. The goal is to construct the shortest path connecting given initial and final configurations subject to motion constraints. Our work differs from existing literature in two ways. First, we consider full vehicle orientation using a body-attached frame, which includes roll, pitch, and yaw angles. However, existing work uses only pitch and/or heading angle, which is insufficient to uniquely determine orientation. Second, we use two control inputs to represent bounded pitch and yaw rates, reflecting control by two separate actuators. In contrast, most previous methods rely on a single input, such as path curvature, which is insufficient for accurately modeling the vehicle's kinematics in 3D. We use a rotation minimizing frame to describe the vehicle's configuration and its evolution, and construct paths by concatenating optimal Dubins paths on spherical, cylindrical, or planar surfaces. Numerical simulations show our approach generates feasible paths within 10 seconds on average and yields shorter paths than existing methods in most cases.
comment: The code for this paper is available at https://github.com/DeepakPrakashKumar/3D-Motion-Planning-for-Generalized-Dubins-with-Pitch-Yaw-constraints
♻ ☆ Trajectory Encryption Cooperative Salvo Guidance
This paper introduces the concept of trajectory encryption in cooperative simultaneous target interception, wherein heterogeneity in guidance principles across a team of unmanned autonomous systems is leveraged as a strategic design feature. By employing a mix of heterogeneous time-to-go formulations leading to a cooperative guidance strategy, the swarm of vehicles is able to generate diverse trajectory families. This diversity expands the feasible solution space for simultaneous target interception, enhances robustness under disturbances, and enables flexible time-to-go adjustments without predictable detouring. From an adversarial perspective, heterogeneity obscures the collective interception intent by preventing straightforward prediction of swarm dynamics, effectively acting as an encryption layer in the trajectory domain. Simulations demonstrate that the swarm of heterogeneous vehicles is able to intercept a moving target simultaneously from a diverse set of initial engagement configurations.
♻ ☆ A Two-fold Randomization Framework for Impulse Control Problems
We propose and analyze a randomization scheme for a general class of impulse control problems. The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator. This approach allows us to derive a semi-linear Hamilton-Jacobi-Bellman (HJB) equation. Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem that implies the uniqueness of the solution. Via an iterative approach, we prove the existence of the solution. The existence-and-uniqueness result ensures the randomized problem is well-defined. We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter $\pmb \lambda$ vanishes. This convergence, combined with the value function's $C^{2,\alpha}_{loc}$ regularity, confirms our framework provides a robust approximation and a foundation for developing learning algorithms. Under this framework, we propose an offline reinforcement learning (RL) algorithm. Its policy improvement step is naturally derived from the iterative approach from the existence proof, which enjoys a geometric convergence rate. We implement a model-free version of the algorithm and numerically demonstrate its effectiveness using a widely-studied example. The results show that our RL algorithm can learn the randomized solution, which accurately approximates its classical counterpart. A sensitivity analysis with respect to the volatility parameter $\sigma$ in the state process effectively demonstrates the exploration-exploitation tradeoff.
♻ ☆ Parallel subspace correction methods for semicoercive and nearly semicoercive convex optimization with applications to nonlinear PDEs
We present new convergence analyses for parallel subspace correction methods for unconstrained semicoercive and nearly semicoercive convex optimization problems, generalizing the theory of singular and nearly singular linear problems to a class of nonlinear problems. Our results demonstrate that the elegant theoretical framework developed for singular and nearly singular linear problems can be extended to unconstrained semicoercive and nearly semicoercive convex optimization problems. For semicoercive problems, we show that the convergence rate can be estimated in terms of a seminorm stable decomposition over the subspaces and the kernel of the problem, aligning with the theory for singular linear problems. For nearly semicoercive problems, we establish a parameter-independent convergence rate, assuming the kernel of the semicoercive part can be decomposed into a sum of local kernels, which aligns with the theory for nearly singular problems. To demonstrate the applicability of our results, we provide convergence analyses of two-level additive Schwarz methods for solving certain nonlinear partial differential equations with Neumann boundary conditions, within the proposed abstract framework.
comment: 31 pages, 0 figures
♻ ☆ Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime
Data selection plays a crucial role in data-driven decision-making, including in large language models (LLMs), and is typically task-dependent. Properties such as data quality and diversity have been extensively studied and are known to enhance model performance. However, it remains unclear whether there exist other quantitative and general principles of data selection that can consistently improve performance, especially for complicated tasks. In this paper, we demonstrate that selecting more uniformly distributed data can improve training efficiency while enhancing performance. Specifically, we establish that more uniform (less biased) distribution leads to a larger minimum pairwise distance between data points, denoted by $h_{\min}$, and prove that a smaller $h_{\min}$ can slow down the training dynamics of gradient descent (GD). Moreover, we theoretically show that the approximation error of neural networks decreases as $h_{\min}$ increases. Our analysis introduces a convergence framework for GD beyond the Neural Tangent Kernel (NTK) regime, applicable to a broad class of architectures, including transformers, without requiring Lipschitz smoothness. This framework further provides theoretical justification for the use of residual connection and function composition in deep neural architectures. In the end, we conduct comprehensive experiments for supervised fine-tuning across various settings, including different optimization strategies, model sizes, and training datasets. The results consistently demonstrate that selecting data by maximizing pairwise distance significantly accelerates training and achieves comparable or better performance in LLMs across diverse datasets. Code and Datasets are available at the link: https://github.com/SafeRL-Lab/data-uniformity.
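One concrete way to "select data by maximizing pairwise distance," as described above, is greedy farthest-point sampling, which keeps the minimum pairwise distance $h_{\min}$ of the selected subset large. A hedged sketch (the authors' exact procedure lives in the linked repository and may differ):

```python
# Greedy farthest-point sampling: pick k points that are mutually far apart,
# enlarging the minimum pairwise distance h_min. Illustrative only; not the
# authors' exact selection code.
import numpy as np

def farthest_point_select(X, k, seed=0):
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[chosen[0]], axis=1)   # distance to chosen set
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                    # farthest remaining point
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

X = np.random.default_rng(3).normal(size=(1000, 16))
sub = X[farthest_point_select(X, k=50)]
pair = np.linalg.norm(sub[:, None] - sub[None, :], axis=-1)
print("h_min of subset:", pair[np.triu_indices(50, 1)].min())
```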
♻ ☆ Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes
We study the almost sure convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex or non-convex whose gradient is $\gamma$-H\"{o}lder. Using only discrete Gronwall's inequality without Robbins-Siegmund theorem, we recover results for both SGD and SHB: $\min_{s\leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ for non-convex objectives and $F(w_t) - F_* = o(t^{2\gamma/(1+\gamma) \cdot \max(p-1,-2p+1)-\epsilon})$ for $\beta \in (0, 1)$ and $\min_{s \leq t} F(w_s) - F_* = o(t^{p-1})$ almost surely for convex objectives. In addition, we proved that SHB with constant momentum parameter $\beta \in (0, 1)$ attains a convergence rate of $F(w_t) - F_* = O(t^{\max(p-1,-2p+1)} \log^2 \frac{t}{\delta})$ with probability at least $1-\delta$ when $F$ is convex and $\gamma = 1$ and step size $\alpha_t = \Theta(t^{-p})$ with $p \in (\frac{1}{2}, 1)$.
♻ ☆ A quantitative Robbins-Siegmund theorem
The Robbins-Siegmund theorem is one of the most important results in stochastic optimization, where it is widely used to prove the convergence of stochastic algorithms. We provide a quantitative version of the theorem, establishing a bound on how far one needs to look in order to locate a region of \emph{metastability} in the sense of Tao. Our proof involves a metastable analogue of Doob's theorem for $L_1$-supermartingales along with a series of technical lemmas that make precise how quantitative information propagates through sums and products of stochastic processes. In this way, our paper establishes a general methodology for finding metastable bounds for stochastic processes that can be reduced to supermartingales, and therefore for obtaining quantitative convergence information across a broad class of stochastic algorithms whose convergence proof relies on some variation of the Robbins-Siegmund theorem. We conclude by discussing how our general quantitative result might be used in practice.
comment: 19 pages
♻ ☆ Moment-sos and spectral hierarchies for polynomial optimization on the sphere and quantum de Finetti theorems
We revisit the convergence analysis of two approximation hierarchies for polynomial optimization on the unit sphere. The first one is based on the moment-sos approach and gives semidefinite bounds for which Fang and Fawzi (2021) showed an analysis in $O(1/r^2)$ for the r-th level bound, using the polynomial kernel method. The second hierarchy was recently proposed by Lovitz and Johnston (2023) and gives spectral bounds for which they show a convergence rate in $O(1/r)$, using a quantum de Finetti theorem of Christandl et al. (2007) that applies to complex Hermitian matrices with a "double" symmetry. We investigate links between these approaches, in particular, via duality of moments and sums of squares. Our main results include showing that the spectral bounds cannot have a convergence rate better than $O(1/r^2)$ and that they do not enjoy generic finite convergence. In addition, we propose alternative performance analyses that involve explicit constants depending on intrinsic parameters of the optimization problem. For this we develop a novel "banded" real de Finetti theorem that applies to real matrices with "double" symmetry. We also show how to use the polynomial kernel method to obtain a de Finetti type result in $O(1/r^2)$ for real maximally symmetric matrices, improving an earlier result in $O(1/r)$ of Doherty and Wehner (2012).
comment: 31 pages
♻ ☆ A Globally Convergent Gradient Method with Momentum
In this work, we consider smooth unconstrained optimization problems and we deal with the class of gradient methods with momentum, i.e., descent algorithms where the search direction is defined as a linear combination of the current gradient and the preceding search direction. This family of algorithms includes nonlinear conjugate gradient methods and Polyak's heavy-ball approach, and is thus of high practical and theoretical interest in large-scale nonlinear optimization. We propose a general framework where the scalars of the linear combination defining the search direction are computed simultaneously by minimizing an approximate quadratic model over the two-dimensional subspace they span. This strategy allows us to define a class of gradient methods with momentum enjoying global convergence guarantees and an optimal worst-case complexity bound in the nonconvex setting. Unlike all related works in the literature, the convergence conditions are stated in terms of the Hessian matrix of the bi-dimensional quadratic model. To the best of our knowledge, these results are novel to the literature. Moreover, extensive computational experiments show that the gradient method with momentum presented here is a solid choice to tackle some classes of nonconvex unconstrained problems.
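For a quadratic objective, computing both coefficients simultaneously reduces to a 2x2 linear solve per iteration: minimize the exact model over the span of the negative gradient and the previous direction. A sketch under that quadratic assumption (the paper's safeguards for general nonconvex problems are omitted):

```python
# Momentum via exact 2D subspace minimization on an SPD quadratic
# f(x) = 0.5*x^T A x - b^T x: solve (P^T A P) c = -P^T g with P = [-g, d_prev].
# For quadratics this reproduces conjugate-gradient-like behavior, consistent
# with the family containing nonlinear CG.
import numpy as np

def momentum_2d_subspace(A, b, x0, iters=50):
    x, d_prev = x0.copy(), None
    for _ in range(iters):
        g = A @ x - b
        if d_prev is None:
            P = (-g)[:, None]
        else:
            P = np.stack([-g, d_prev], axis=1)
        c = np.linalg.lstsq(P.T @ A @ P, -(P.T @ g), rcond=None)[0]
        d_prev = P @ c                        # new search direction
        x = x + d_prev
    return x

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 20))
A = M @ M.T + np.eye(20)                      # SPD Hessian
b = rng.normal(size=20)
x = momentum_2d_subspace(A, b, np.zeros(20))
print(np.linalg.norm(A @ x - b))              # near-zero residual
```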
♻ ☆ Rapid stabilization and finite time stabilization of the bilinear Schrödinger equation
We propose a method to establish the rapid stabilization of the bilinear Schr\"odinger control system and its linearized system, and the finite time stabilization of the linearized system using the Grammian operators. The analysis of the rapid stabilization involves a new quantity (variable) which is inspired by the adjoint state in optimal control theory and is proposed in our recent work on control systems associated with strongly continuous groups. The analysis of the finite time stabilization follows the strategy introduced by Coron and Nguyen in the study of the finite time stabilization of the heat equation and incorporates a new ingredient involving the small-time estimate, derived in this paper, of the cost of controls of the linearized system.
♻ ☆ Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level
Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is also a crucial component of Differential Privacy (DP) mechanisms. However, existing high-probability convergence analyses typically require the clipping threshold to increase with the number of optimization steps, which is incompatible with standard DP mechanisms like the Gaussian mechanism. In this work, we close this gap by providing the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, applicable to both convex and non-convex smooth optimization under heavy-tailed noise, characterized by a bounded central $\alpha$-th moment assumption, $\alpha \in (1,2]$. Our results show that, with a fixed clipping level, the method converges to a neighborhood of the optimal solution with a faster rate than the existing ones. The neighborhood can be balanced against the noise introduced by DP, providing a refined trade-off between convergence speed and privacy guarantees.
comment: 60 pages
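The mechanism analyzed here follows the familiar DP-SGD template: clip every per-sample gradient to a fixed level $C$, average, and add Gaussian noise calibrated to $C$. A minimal sketch of one step; the noise multiplier below is a placeholder, since in practice it comes from a privacy accountant:

```python
# One DP-Clipped-SGD step with a FIXED clipping level C, the regime the
# paper analyzes. sigma is a placeholder noise multiplier (assumption).
import numpy as np

def dp_clipped_sgd_step(w, per_sample_grads, lr, C, sigma, rng):
    clipped = [g * min(1.0, C / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    mean_g = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, sigma * C / len(per_sample_grads), size=w.shape)
    return w - lr * (mean_g + noise)

rng = np.random.default_rng(0)
w = np.zeros(5)
grads = [rng.standard_t(df=2, size=5) for _ in range(32)]  # heavy-tailed noise
w = dp_clipped_sgd_step(w, grads, lr=0.1, C=1.0, sigma=1.0, rng=rng)
print(w)
```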
♻ ☆ Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise
Gradient clipping is a widely used technique in Machine Learning and Deep Learning (DL), known for its effectiveness in mitigating the impact of heavy-tailed noise, which frequently arises in the training of large language models. Additionally, first-order methods with clipping, such as Clip-SGD, exhibit stronger convergence guarantees than SGD under the $(L_0,L_1)$-smoothness assumption, a property observed in many DL tasks. However, the high-probability convergence of Clip-SGD under both assumptions -- heavy-tailed noise and $(L_0,L_1)$-smoothness -- has not been fully addressed in the literature. In this paper, we bridge this critical gap by establishing the first high-probability convergence bounds for Clip-SGD applied to convex $(L_0,L_1)$-smooth optimization with heavy-tailed noise. Our analysis extends prior results by recovering known bounds for the deterministic case and the stochastic setting with $L_1 = 0$ as special cases. Notably, our rates avoid exponentially large factors and do not rely on restrictive sub-Gaussian noise assumptions, significantly broadening the applicability of gradient clipping.
comment: 33 pages
♻ ☆ Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
comment: Submitted to SIAM Journal on Control and Optimization
♻ ☆ Augmentation approaches for Mixed Integer Programming
This paper analyses the structure of the feasible sets of general mixed-integer linear programs (MIPs) and its relationship with the existence of a finite-cardinality test set that can be applied in augmentation algorithms. We derive and characterize a computable, finite test set for MIPs which can be embedded in a finite augmentation algorithm. Several examples illustrate the structure of this set and its relationship with previous approaches in the literature.
♻ ☆ BIBO stability of 1-D hyperbolic boundary control systems
We study the question of bounded-input bounded-output (BIBO) stability of a class of 1-D hyperbolic boundary control systems, which, in particular, contains distributed port-Hamiltonian systems. Exploiting the particular structure of the transfer function of these systems, we derive several sufficient conditions for BIBO stability.
comment: 22 pages
♻ ☆ From group stage to league format: The impact of the draw in European football's Champions League
A fundamental reform has been introduced in the 2024/25 season of club competitions organised by the Union of European Football Associations (UEFA): the well-established group stage has been replaced by an incomplete round-robin format. In this format, the 36 teams are ranked in a single league table, but play against only a subset of the competitors. While this innovative change has highlighted that the incomplete round-robin tournament is a reasonable alternative to the standard design of allocating the teams into round-robin groups, the characteristics of the new format remain unexplored. Our paper contributes to this topic by using simulations to compare the uncertainty generated by the draw in the old format with that in the new format of the UEFA Champions League. We develop a method to break down the impact of the 2024/25 reform into various components for each team. The new format is found to decrease the overall effect of the draw. However, this reduction can mainly be attributed to the inaccurate seeding system used by UEFA. If the teams are seeded based on their actual strengths, the impact of the draw is about the same in a tournament with an incomplete round-robin league or a group stage.
comment: 29 pages, 12 figures, 3 tables
♻ ☆ Eco-Conscious Customers Behavior in Capacitated Two-Echelon Location-Routing Models for Sustainable Last-Mile Delivery
This paper introduces a novel capacitated Two-Echelon Location-Routing Problem with Eco-conscious Customer Behavior (2E-LRP-ECB) aimed at enhancing the environmental sustainability of last-mile delivery (LMD) operations. The model jointly optimizes dynamic satellite locations, vehicle routing, and customer delivery modes, explicitly accounting for (i) heterogeneous customer travel behaviors, (ii) heterogeneous fleet composition, and (iii) diverse emission profiles across both echelons. A piecewise linear formulation captures the additional emissions from first-echelon vehicle stops, while customer travel emissions are computed based on individual willingness and capacity to use zero-emission transport. The problem is solved exactly for a wide set of real-world-based instances under four operational strategies, differing in optimization objectives and second-echelon fleet composition. Computational experiments, including a case study with a major Portuguese LMD provider, highlight the environmental and operational tradeoffs inherent to strategic and operational choices such as fleet composition, satellite activation, and customer pick-up policies. Results reveal that minimizing distance can lead to substantial increases in emissions, while emissions-oriented strategies leverage customer travel to achieve significant sustainability gains without compromising service efficiency. A multi-objective analysis using the epsilon-constraint method produces Pareto frontiers and knee-point solutions, offering actionable insights for balancing operational efficiency and environmental impact in sustainable LMD design.
♻ ☆ FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
Mixed-Integer Linear Programming (MILP) is a foundational tool for complex decision-making problems. However, the NP-hard nature of MILP presents a significant computational challenge, motivating the development of machine learning-based heuristic solutions to accelerate downstream solvers. While recent generative models have shown promise in learning powerful heuristics, they suffer from a critical limitation. That is, they model the distribution of only the integer variables and fail to capture the intricate coupling between integer and continuous variables, creating an information bottleneck and ultimately leading to suboptimal solutions. To this end, we propose Joint Continuous-Integer Flow for Mixed-Integer Linear Programming (FMIP), which is the first generative framework that models the joint distribution of both integer and continuous variables for MILP solutions. Built upon the joint modeling paradigm, a holistic guidance mechanism is designed to steer the generative trajectory, actively refining solutions toward optimality and feasibility during the inference process. Extensive experiments on eight standard MILP benchmarks demonstrate the superior performance of FMIP against existing baselines, reducing the primal gap by 41.34% on average. Moreover, we show that FMIP is fully compatible with arbitrary backbone networks and various downstream solvers, making it well-suited for a broad range of real-world MILP applications.
comment: FMIP is a generative framework that jointly models integer and continuous variables in MILP, achieving a 41.34% reduction in primal gap and demonstrating compatibility with various solvers and applications
♻ ☆ Muon Optimizes Under Spectral Norm Constraints
The pursuit of faster optimization algorithms remains an active and important research direction in deep learning. Recently, the Muon optimizer [JJB+24] has demonstrated promising empirical performance, but its theoretical foundation remains less understood. In this paper, we bridge this gap and provide a theoretical analysis of Muon by placing it within the Lion-$\mathcal{K}$ family of optimizers [CLLL24]. Specifically, we show that Muon corresponds to Lion-$\mathcal{K}$ when equipped with the nuclear norm, and we leverage the theoretical results of Lion-$\mathcal{K}$ to establish that Muon (with decoupled weight decay) implicitly solves an optimization problem that enforces a constraint on the spectral norm of weight matrices. This perspective not only demystifies the implicit regularization effects of Muon but also leads to natural generalizations through varying the choice of convex map $\mathcal{K}$, allowing for the exploration of a broader class of implicitly regularized and constrained optimization algorithms.
♻ ☆ VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence
Optimizing large-scale nonconvex problems, common in deep learning, demands balancing rapid convergence with computational efficiency. First-order (FO) optimizers, which serve as today's baselines, provide fast convergence and good generalization but often incur high computation and memory costs due to the large size of modern models. Conversely, zeroth-order (ZO) algorithms reduce this burden using estimated gradients, yet their slow convergence in high-dimensional settings limits practicality. We introduce VAMO (VAriance-reduced Mixed-gradient Optimizer), a stochastic variance-reduced method that extends mini-batch SGD with full-batch ZO gradients under an SVRG-style framework. VAMO's hybrid design utilizes a two-point ZO estimator to achieve a dimension-agnostic convergence rate of $\mathcal{O}(1/T + 1/b)$, where $T$ is the number of iterations and $b$ is the batch-size, surpassing the dimension-dependent slowdown of purely ZO methods and significantly improving over SGD's $\mathcal{O}(1/\sqrt{T})$ rate. Additionally, we propose a multi-point variant that mitigates the $O(1/b)$ error by adjusting the number of estimation points to balance convergence and cost. Importantly, VAMO achieves these gains with smaller dynamic memory requirements than many FO baselines, making it particularly attractive for edge deployment. Experiments including traditional neural network training and LLM finetuning confirm that VAMO not only outperforms established FO and ZO methods, but also does so with a light memory footprint.
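A rough sketch of the hybrid update described above: an SVRG-style correction whose full-batch anchor gradient is replaced by a zeroth-order estimate, here averaged over several random directions in the spirit of the multi-point variant. Step size, snapshot schedule, and the number of ZO directions are illustrative assumptions, not the paper's exact algorithm:

```python
# SVRG-style hybrid step: mini-batch first-order gradients corrected by a
# full-batch zeroth-order anchor, recomputed at each snapshot. Illustrative
# parameters only; not the paper's exact VAMO update.
import numpy as np

def zo_grad(f, x, mu=1e-3, m=64, rng=None):
    """Two-point zeroth-order estimate averaged over m random directions."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(m):
        u = rng.normal(size=x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / m

# Toy finite-sum problem: f_i(x) = 0.5*||x - c_i||^2, optimum = mean of c_i
rng = np.random.default_rng(0)
C = rng.normal(size=(256, 8)) + 2.0
f_full = lambda x: 0.5 * np.mean(np.sum((x - C) ** 2, axis=1))
grad_batch = lambda x, idx: x - C[idx].mean(axis=0)

x = x_snap = np.zeros(8)
lr = 0.2
for t in range(400):
    if t % 20 == 0:                           # refresh snapshot and ZO anchor
        x_snap, anchor = x.copy(), zo_grad(f_full, x, rng=rng)
    idx = rng.integers(0, 256, size=16)
    v = grad_batch(x, idx) - grad_batch(x_snap, idx) + anchor
    x = x - lr * v
print(np.linalg.norm(x - C.mean(axis=0)))     # residual should be tiny
```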
♻ ☆ Predict-and-Optimize Robust Unit Commitment with Statistical Guarantees via Weight Combination
The growing uncertainty from renewable power and electricity demand brings significant challenges to unit commitment (UC). While various advanced forecasting and optimization methods have been developed to improve prediction and address this uncertainty, most previous studies treat forecasting and optimization as separate tasks. This separation can lead to suboptimal results due to misalignment between the objectives of the two tasks. To overcome this challenge, we propose a robust UC framework that integrates forecasting and optimization processes while ensuring statistical guarantees. In the forecasting stage, we combine multiple predictions derived from diverse data sources and methodologies for an improved prediction, aiming to optimize the UC performance. In the optimization stage, the combined prediction is used to construct an uncertainty set with statistical guarantees, based on which the robust UC model is formulated. The optimal robust UC solution provides feedback to refine the weight used for combining multiple predictions. To solve the proposed integrated forecasting-optimization framework efficiently and effectively, we develop a neural network-based surrogate model for acceleration and introduce a reshaping method for the uncertainty set based on the optimization result to reduce conservativeness. Case studies on modified IEEE 30-bus and 118-bus systems demonstrate the advantages of the proposed approach.
comment: accepted by IEEE Transactions on Power Systems
♻ ☆ Bayesian distributionally robust variational inequalities: regularization and quantification
We propose a Bayesian distributionally robust variational inequality (DRVI) framework that models the data-generating distribution through a finite mixture family, which allows us to study the DRVI on a tractable finite-dimensional parametric ambiguity set. To address distributional uncertainty, we construct a data-driven ambiguity set with posterior coverage guarantees via Bayesian inference. We also employ a regularization approach to ensure numerical stability. We prove the existence of solutions to the Bayesian DRVI and the asymptotic convergence to a solution as sample size grows to infinity and the regularization parameter goes to zero. Moreover, we derive quantitative stability bounds and finite-sample guarantees under data scarcity and contamination. Numerical experiments on a distributionally robust multi-portfolio Nash equilibrium problem validate our theoretical results and demonstrate the robustness and reliability of Bayesian DRVI solutions in practice.
Systems and Control 18
☆ Solar and Wind Power Forecasting: A Comparative Review of LSTM, Random Forest, and XGBoost Models
Rising global energy demand driven by population growth raises concerns about the sustainability of fossil fuels. Consequently, the energy sector has increasingly transitioned to renewable energy sources like solar and wind, which are naturally abundant. However, the intermittent and unpredictable nature of these resources poses significant challenges for power system reliability. Accurate forecasting is essential to ensure grid stability and optimize energy management, but the high variability of the weather conditions that directly drive wind and solar output makes precise prediction difficult. Advancements in Artificial Intelligence (AI), particularly in Machine Learning (ML) and Deep Learning (DL), offer promising solutions to improve forecasting accuracy. The study highlights three widely used algorithms for solar and wind energy prediction: Long Short-Term Memory (LSTM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). These models are capable of learning complex patterns from historical and environmental data, enabling more accurate forecasts and contributing to the enhanced efficiency and reliability of renewable energy systems. This review provides an overview of RF, XGBoost, and LSTM through a comparative analysis across three essential criteria: research prevalence, model complexity, and computational execution time.
☆ Zeroth-Order Constrained Optimization from a Control Perspective via Feedback Linearization
Designing safe derivative-free optimization algorithms under unknown constraints is a fundamental challenge in modern learning and control. Most existing zeroth-order (ZO) approaches assume white-box constraints or focus on convex settings, leaving the general case of nonconvex optimization with black-box constraints largely open. We propose a control-theoretic framework for ZO constrained optimization that enforces feasibility without relying on solving costly convex subproblems. Leveraging feedback linearization, we introduce a family of ZO feedback linearization (ZOFL) algorithms applicable to both equality and inequality constraints. Our method requires only noisy, sample-based gradient estimates yet provably guarantees constraint satisfaction under mild regularity conditions. We establish finite-time bounds on constraint violation and present a midpoint discretization variant that further improves feasibility without sacrificing optimality. Empirical results demonstrate that ZOFL consistently outperforms standard ZO baselines, achieving competitive objective values while maintaining feasibility.
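To make the control-theoretic picture concrete, here is a toy rendering of the idea on a single linear equality constraint: the iterate is viewed as a single-integrator plant whose output is the constraint value, feedback linearization drives that output to zero, and the leftover tangential freedom descends the objective. All estimates are zeroth-order; the problem and constants are our assumptions, and the actual ZOFL algorithms differ in detail.

```python
# Toy ZO feedback-linearization loop on min f(x) s.t. h(x) = 0 (sketch only).
import numpy as np

rng = np.random.default_rng(1)

f = lambda x: (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2   # black-box objective
h = lambda x: x[0] + x[1] - 1.0                        # black-box equality constraint

def zo_grad(fun, x, mu=1e-5, n_dirs=8):
    """Two-point zeroth-order gradient estimate."""
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.normal(size=x.shape)
        g += (fun(x + mu * u) - fun(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

x = np.array([3.0, 3.0]); dt, alpha = 0.05, 2.0
for _ in range(400):
    gf, gh = zo_grad(f, x), zo_grad(h, x)
    # Viewing x' = u as a single-integrator plant with output h(x),
    # feedback linearization picks u so that dh/dt = gh . u = -alpha * h(x),
    # while the remaining (tangential) freedom descends f.
    gh2 = gh @ gh + 1e-12
    u_feas = -alpha * h(x) * gh / gh2                 # drive h -> 0
    u_desc = -(gf - (gf @ gh) / gh2 * gh)             # project -grad f onto tangent
    x = x + dt * (u_feas + u_desc)
print("x:", x, " h(x):", h(x), " f(x):", f(x))
```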
☆ Frequency Control and Optimal Power Sharing in Combined Power and Heating Networks with Heat Pumps
Heat pumps are capable of fast adjustments in power consumption and can be connected to district heating networks with large heating inertia, making them a very important resource for providing frequency support in low-inertia power systems. Nevertheless, the coupling of power networks with district heating systems renders the underlying dynamics much more involved. It is therefore important to ensure that system stability and appropriate power sharing are maintained. In this paper, we consider the problem of leveraging district heating systems as ancillary services for primary frequency control in power networks via heat pumps. We propose a novel power sharing scheme for heating systems based on the average temperature. This enables an optimal power allocation among diverse energy sources without requiring information about load disturbances. We then discuss two approaches for heating systems to contribute to frequency regulation in power networks. We show that both approaches ensure stability in the combined heat and power network and facilitate optimal power allocation among the different energy sources. We also discuss how various generation dynamics can be incorporated into our framework with guaranteed stability and optimality. Finally, we conduct simulations that demonstrate various tradeoffs in the transient response and the practical potential of the proposed approaches.
☆ Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for the entropic risk/KL-penalty setting, and develop decentralized optimistic actor-critic algorithms that implement these updates. Empirical results on cooperative benchmarks demonstrate that risk-seeking optimism consistently improves coordination over both risk-neutral baselines and heuristic optimistic methods. Our framework thus unifies risk-sensitive learning and optimism, offering a theoretically grounded and practically effective approach to cooperation in MARL.
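For readers unfamiliar with the dual view, the entropic-risk/KL-penalty instance mentioned in the abstract admits a compact closed form (our rendering via the Donsker-Varadhan formula; the paper's exact definitions may differ):

```latex
% Optimistic (risk-seeking) value as a divergence-penalized supremum over models Q:
\[
V_{\beta}^{\mathrm{opt}}(s)
 \;=\; \sup_{Q \ll P}\Big\{\, \mathbb{E}_{Q}\Big[\textstyle\sum_t \gamma^t r_t \,\Big|\, s\Big]
 \;-\; \tfrac{1}{\beta}\,\mathrm{KL}(Q \,\|\, P) \Big\}
 \;=\; \tfrac{1}{\beta}\, \log \mathbb{E}_{P}\Big[ e^{\beta \sum_t \gamma^t r_t} \,\Big|\, s \Big],
\]
% where P is the nominal model and \beta > 0 sets the degree of optimism; the
% closed form is the Donsker-Varadhan formula, and \beta -> 0 recovers the
% risk-neutral value.
```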
☆ Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators
The control of high-dimensional distributed parameter systems (DPS) remains a challenge when explicit coarse-grained equations are unavailable. Classical equation-free (EF) approaches rely on fine-scale simulators treated as black-box timesteppers. However, repeated simulations for steady-state computation, linearization, and control design are often computationally prohibitive, or the microscopic timestepper may not even be available, leaving us with data as the only resource. We propose a data-driven alternative that uses local neural operators, trained on spatiotemporal microscopic/mesoscopic data, to obtain efficient short-time solution operators. These surrogates are employed within Krylov subspace methods to compute coarse steady and unsteady states, while also providing Jacobian information in a matrix-free manner. Krylov-Arnoldi iterations then approximate the dominant eigenspectrum, yielding reduced models that capture the open-loop slow dynamics without explicit Jacobian assembly. Both discrete-time Linear Quadratic Regulator (dLQR) and pole-placement (PP) controllers are designed from this reduced system and lifted back to the full nonlinear dynamics, thereby closing the feedback loop.
comment: 8 pages, 2 figures
☆ Integrated Communication and Control for Energy-Efficient UAV Swarms: A Multi-Agent Reinforcement Learning Approach
The deployment of unmanned aerial vehicle (UAV) swarm-assisted communication networks has become an increasingly vital approach for remediating coverage limitations in infrastructure-deficient environments, with especially pressing applications in temporary scenarios, such as emergency rescue, military and security operations, and remote area coverage. However, complex geographic environments lead to unpredictable and highly dynamic wireless channel conditions, resulting in frequent interruptions of air-to-ground (A2G) links that severely constrain the reliability and quality of service in UAV swarm-assisted mobile communications. To improve the quality of UAV swarm-assisted communications in complex geographic environments, we propose an integrated communication and control co-design mechanism. Given the stringent energy constraints inherent in UAV swarms, our proposed mechanism is designed to optimize energy efficiency while maintaining an equilibrium between equitable communication rates for mobile ground users (GUs) and UAV energy expenditure. We formulate the joint resource allocation and 3D trajectory control problem as a Markov decision process (MDP), and develop a multi-agent reinforcement learning (MARL) framework to enable real-time coordinated actions across the UAV swarm. To optimize the action policy of UAV swarms, we propose a novel multi-agent hybrid proximal policy optimization with action masking (MAHPPO-AM) algorithm, specifically designed to handle complex hybrid action spaces. The algorithm incorporates action masking to enforce hard constraints in high-dimensional action spaces. Experimental results demonstrate that our approach achieves a fairness index of 0.99 while reducing energy consumption by up to 25% compared to baseline methods.
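Action masking, the mechanism the abstract uses to enforce hard constraints in the hybrid action space, is simple to sketch for one discrete sub-action head (our minimal illustration; MAHPPO-AM's full architecture is more involved):

```python
# Masking infeasible discrete actions before sampling (illustrative sketch).
import torch

def masked_categorical(logits: torch.Tensor, mask: torch.Tensor):
    """Zero probability for infeasible actions by sending their logits to -inf."""
    masked_logits = logits.masked_fill(~mask, float("-inf"))
    return torch.distributions.Categorical(logits=masked_logits)

logits = torch.randn(4, 6)                 # 4 UAV agents, 6 discrete choices each
mask = torch.ones(4, 6, dtype=torch.bool)
mask[:, 3] = False                         # e.g. option 3 violates a hard constraint
dist = masked_categorical(logits, mask)
a = dist.sample()                          # never selects a masked action
logp = dist.log_prob(a)                    # valid log-probabilities for the PPO ratio
print(a, logp)
```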
☆ Toward a Robust Biomimetic Hybrid Battery: Bridging Biology, Electrochemistry and Data-Driven Control
Electric vehicles and renewable energy systems need batteries that charge quickly, last many years and still store a lot of energy, but current chemistries struggle to deliver all three. Inspired by electric fish that deliver bursts of current and birds that sleep with half their brains, we propose a hybrid battery concept called SwiftPulse. It combines sodium-ion cells that provide energy with niobium-oxide cells that accept high-power pulses. A pulse-based charger and a battery-management strategy rotate clusters of cells into rest so they can recover. We derive simple models of energy density, diffusion and capacity fade to show that a pack made mostly of sodium-ion modules with a smaller fraction of niobium-oxide modules could exceed 175 Wh per kg, endure more than ten thousand charge-discharge cycles and recharge to eighty percent in less than ten minutes. Simulations suggest that pulsed charging reduces ion buildup at the surface and slows degradation. We outline a roadmap for cell-level and module-level experiments and suggest integrating machine learning to adapt pulse parameters and rest scheduling. By blending ideas from biology, electrochemistry and data-driven control, this work points toward batteries that are safer, faster to charge and longer lasting.
☆ Real Time Fatigue Crack Growth Monitoring Using High Precision Control and Data Acquisition Systems
Fatigue cracks may initiate and propagate long before a structural component reaches the end of its nominal life. Detecting and quantifying crack growth in real time is critical for avoiding catastrophic failures in aerospace structures, nuclear reactors and civil infrastructure. This research reviews four widely used crack growth monitoring techniques: digital image correlation (DIC), acoustic emission (AE), compliance-based methods and direct current potential drop (DCPD). It evaluates their working principles, instrumentation, detection thresholds, noise sensitivities and suitability under various environmental conditions. Recent advances in high-precision control and data acquisition systems (DAQs) that enable multi-sensor data fusion are explored. The goal is to guide the selection and integration of monitoring methods in modern structural health monitoring architectures.
☆ Electric Vehicle Charger Infrastructure Planning: Demand Estimation, Coverage Optimization Over an Integrated Power Grid
Electrifying the transportation sector requires deploying a strategically planned and efficient charging infrastructure. This paper presents a two-phase approach for electric vehicle (EV) charger deployment that integrates spatial point-of-interest analysis and maximum coverage optimization over an integrated spatial power grid. Spatially focused studies in the literature often overlook electrical grid constraints, while grid-focused work frequently relies on statistically modeled EV charging demand. To address these gaps, a new framework is proposed that combines spatial network planning with electrical grid considerations. This study approaches EV charger planning from the perspective of the distribution grid, starting with an estimation of EV charging demand and the identification of optimal candidate locations. It ensures that the capacity limits of newly established chargers are maintained within the limits of the power grid. This framework is applied in a test case for the Dallas area, integrating the existing EV charger network with an 8500-bus distribution system for comprehensive planning.
☆ Robustness of 'small' networks
Modeling how networks change under structural perturbations can yield foundational insights into network robustness, which is critical in many real-world applications. The largest connected component is a popular measure of network performance. Percolation theory provides a theoretical framework to establish statistical properties of the largest connected component of large random graphs. However, this theoretical framework is typically only exact in the large-$n$ limit, failing to capture the statistical properties of largest connected components in small networks, which many real-world networks are. We derive expected values for the largest connected component of small $G(n,p)$ random graphs from which nodes are either removed uniformly at random or targeted by highest degree and compare these values with existing theory. We also visualize the performance of our expected values compared to existing theory for predicting the largest connected component of various real-world, small graphs.
comment: 10 pages, 7 figures
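The gap between small-$n$ behavior and percolation asymptotics is easy to probe empirically. A small Monte Carlo sketch in this spirit (our setup, not the paper's experiments):

```python
# Mean largest-connected-component (LCC) size of small G(n, p) graphs under
# uniform random node removal (illustrative simulation).
import networkx as nx
import numpy as np

rng = np.random.default_rng(2)
n, p, trials = 30, 0.1, 2000

def lcc_size(G):
    return max((len(c) for c in nx.connected_components(G)), default=0)

for frac_removed in (0.0, 0.2, 0.4, 0.6):
    sizes = []
    for _ in range(trials):
        G = nx.gnp_random_graph(n, p, seed=int(rng.integers(1 << 31)))
        remove = rng.choice(n, size=int(frac_removed * n), replace=False)
        G.remove_nodes_from(remove.tolist())
        sizes.append(lcc_size(G))
    print(f"removed {frac_removed:.0%}: mean LCC = {np.mean(sizes):.2f} of {n} nodes")
```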
☆ Communication-aware Wide-Area Damping Control using Risk-Constrained Reinforcement Learning
Non-ideal communication links, especially delays, critically affect fast networked controls in power systems, such as the wide-area damping control (WADC). Traditionally, a delay estimation and compensation approach is adopted to address this cyber-physical coupling, but it demands very high accuracy for the fast WADC and cannot handle other cyber concerns like link failures or cyber perturbations. Hence, we propose a new risk-constrained framework that can target the communication delays, yet amenable to general uncertainty under the cyber-physical couplings. Our WADC model includes the synchronous generators (SGs), and also voltage source converters (VSCs) for additional damping capabilities. To mitigate uncertainty, a mean-variance risk constraint is introduced to the classical optimal control cost of the linear quadratic regulator (LQR). Unlike estimating delays, our approach can effectively mitigate large communication delays by improving the worst-case performance. A reinforcement learning (RL)-based algorithm, namely, stochastic gradient-descent with max-oracle (SGDmax), is developed to solve the risk-constrained problem. We further show its guaranteed convergence to stationarity at a high probability, even using the simple zero-order policy gradient (ZOPG). Numerical tests on the IEEE 68-bus system not only verify SGDmax's convergence and VSCs' damping capabilities, but also demonstrate that our approach outperforms conventional delay compensator-based methods under estimation error. While focusing on performance improvement under large delays, our proposed risk-constrained design can effectively mitigate the worst-case oscillations, making it equally effective for addressing other communication issues and cyber perturbations.
comment: 12 pages, 14 figures, Accepted for publication in IEEE Transactions on Smart Grid, 2025
☆ AW-EL-PINNs: A Multi-Task Learning Physics-Informed Neural Network for Euler-Lagrange Systems in Optimal Control Problems
This paper presents AW-EL-PINNs, adaptive weighted physics-informed neural networks combined with the Euler-Lagrange theorem, for solving Euler-Lagrange systems in optimal control problems. The framework systematically converts optimal control problems into two-point boundary value problems (TPBVPs) and establishes a multi-task learning paradigm by integrating the Euler-Lagrange theorem with a deep learning architecture. An adaptive loss weighting mechanism dynamically balances the loss function components during training, reducing the tedious manual tuning of loss weights required by conventional physics-informed neural networks (PINNs). Across six numerical examples, AW-EL-PINNs achieve enhanced solution accuracy compared to baseline methods while maintaining stability throughout the optimization process. These results highlight the framework's capability to improve precision and ensure stability when solving Euler-Lagrange systems in optimal control problems, offering promising strategies for physics-based applications.
♻ ☆ Safety-Critical Control with Guaranteed Lipschitz Continuity via Filtered Control Barrier Functions
In safety-critical control systems, ensuring both system safety and smooth control input is essential for practical deployment. Existing Control Barrier Function (CBF) frameworks, especially High-Order CBFs (HOCBFs), effectively enforce safety constraints, but also raise concerns about the smoothness of the resulting control inputs. While smoothness typically refers to continuity and differentiability, it does not by itself ensure bounded input variation. In contrast, Lipschitz continuity is a stronger form of continuity that not only is necessary for the theoretical guarantee of safety, but also bounds the rate of variation and eliminates abrupt changes in the control input. Such abrupt changes can degrade system performance or even violate actuator limitations, yet current CBF-based methods do not provide Lipschitz continuity guarantees. This paper introduces Filtered Control Barrier Functions (FCBFs), which extend HOCBFs by incorporating an auxiliary dynamic system, referred to as an input regularization filter, to produce Lipschitz continuous control inputs. The proposed framework ensures safety, control bounds, and Lipschitz continuity of the control inputs simultaneously by integrating FCBFs and HOCBFs within a unified quadratic program (QP). Theoretical guarantees are provided and simulations on a unicycle model demonstrate the effectiveness of the proposed method compared to standard and smoothness-penalized HOCBF approaches.
comment: 8 pages, 4 figures
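A one-dimensional toy version conveys the mechanism: the applied input becomes a filter state, so its rate of change is bounded by construction, and the safety condition is imposed on the filter's input through a scalar QP with a closed-form solution. This is our simplification for illustration; the paper's FCBF construction and unicycle simulations are more general.

```python
# 1-D filtered-CBF toy: plant x' = u, safe set h = x_max - x >= 0 (sketch only).
import numpy as np

dt, k, a1, a2 = 0.01, 5.0, 2.0, 2.0
x, u, x_max = 0.0, 0.0, 1.0

for _ in np.arange(0.0, 4.0, dt):
    u_des = 2.0                          # nominal input pushing toward the boundary
    h = x_max - x                        # safety function
    psi1 = -u + a1 * h                   # first HOCBF term (since dh/dt = -u)
    # CBF condition on the filter input v:  -k*(v - u) - a1*u + a2*psi1 >= 0
    v_max = u + (-a1 * u + a2 * psi1) / k
    v = min(u_des, v_max)                # closed-form solution of the scalar QP
    u += dt * k * (v - u)                # input regularization filter: u is Lipschitz
    x += dt * u                          # plant update
print(f"x = {x:.3f} (stays below x_max = {x_max}), u = {u:.3f}")
```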
♻ ☆ Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors. To overcome this challenge, algorithms must account for subtleties involving the curvatures of players' costs. To this end, we leverage dynamical system theory and develop a second-order algorithm for finding a local Nash equilibrium in the smooth, possibly nonconvex-nonconcave, zero-sum game setting. First, we prove that this novel method guarantees convergence to only local Nash equilibria with an asymptotic local \textit{linear} convergence rate. We then interpret a version of this method as a modified Gauss-Newton algorithm with local \textit{superlinear} convergence to the neighborhood of a point that satisfies first-order local Nash equilibrium conditions. In comparison, current related state-of-the-art methods with similar guarantees do not offer convergence rates in the nonconvex-nonconcave setting. Furthermore, we show that this approach naturally generalizes to settings with convex and potentially coupled constraints while retaining earlier guarantees of convergence to only local (generalized) Nash equilibria. Code for our experiments can be found at https://github.com/CLeARoboticsLab/ZeroSumGameSolve.jl.
♻ ☆ Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise
Two-time-scale Stochastic Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization. Prior finite time analysis of such algorithms has focused on fixed point iterations with mappings contractive under Euclidean norm. Motivated by applications in reinforcement learning, we give the first mean square bound on non linear two-time-scale SA where the iterations have arbitrary norm contractive mappings and Markovian noise. We show that the mean square error decays at a rate of $O(1/n^{2/3})$ in the general case, and at a rate of $O(1/n)$ in a special case where the slower timescale is noiseless. Our analysis uses the generalized Moreau envelope to handle the arbitrary norm contractions and solutions of Poisson equation to deal with the Markovian noise. By analyzing the SSP Q-Learning algorithm, we give the first $O(1/n)$ bound for an algorithm for asynchronous control of MDPs under the average reward criterion. We also obtain a rate of $O(1/n)$ for Q-Learning with Polyak-averaging and provide an algorithm for learning Generalized Nash Equilibrium (GNE) for strongly monotone games which converges at a rate of $O(1/n^{2/3})$.
comment: To be presented at IEEE Conference on Decision and Control (CDC) 2025
Data-informativity conditions for structured linear systems with implications for dynamic networks
When estimating a single subsystem (module) in a linear dynamic network with a prediction error method, a data-informativity condition needs to be satisfied for arriving at a consistent module estimate. This concerns the condition that the input signals of the constructed, possibly MIMO (multiple input multiple output), predictor model be persistently exciting, which is typically guaranteed if the input spectrum is positive definite for a sufficient number of frequencies. Generically, the condition can be formulated as a path-based condition on the graph of the network model. The current condition has two elements of possible conservatism: (a) rather than focussing on the full MIMO model, one would like to be able to focus on consistently estimating the target module only, and (b) structural information, such as structural zero elements in the interconnection structure or known subsystems, should be taken into account. In this paper, relaxed conditions for data-informativity are derived that address these two issues, leading to relaxed path-based conditions on the network graph. This leads to experimental conditions that are less strict, i.e. require a smaller number of external excitation signals. Additionally, the new expressions for data-informativity in identification are shown to be closely related to earlier derived conditions for (generic) single module identifiability.
comment: 17 pages, 5 figures
♻ ☆ FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities
Effective traffic signal control (TSC) is crucial in mitigating urban congestion and reducing emissions. Recently, reinforcement learning (RL) has been the research trend for TSC. However, existing RL algorithms face several real-world challenges that hinder their practical deployment in TSC: (1) Sensor accuracy deteriorates with increased sensor detection range, and data transmission is prone to noise, potentially resulting in unsafe TSC decisions. (2) During the training of online RL, interactions with the environment could be unstable, potentially leading to inappropriate traffic signal phase (TSP) selection and traffic congestion. (3) Most current TSC algorithms focus only on TSP decisions, overlooking the critical aspect of phase duration, affecting safety and efficiency. To overcome these challenges, we propose a robust two-stage fuzzy approach called FuzzyLight, which integrates compressed sensing and RL for TSC deployment. FuzzyLight offers several key contributions: (1) It employs fuzzy logic and compressed sensing to address sensor noise and enhances the efficiency of TSP decisions. (2) It maintains stable performance during training and combines fuzzy logic with RL to generate precise phases. (3) It works in real cities across 22 intersections and demonstrates superior performance in both real-world and simulated environments. Experimental results indicate that FuzzyLight enhances traffic efficiency by 48% compared to expert-designed timings in the real world. Furthermore, it achieves state-of-the-art (SOTA) performance in simulated environments using six real-world datasets with transmission noise. The code and deployment video are available at the URL1
♻ ☆ Autonomous Close-Proximity Photovoltaic Panel Coating Using a Quadcopter
Photovoltaic (PV) panels are becoming increasingly widespread in the domain of renewable energy, and thus, small efficiency gains can have massive effects. Anti-reflective and self-cleaning coatings enhance panel performance but degrade over time, requiring periodic reapplication. Uncrewed Aerial Vehicles (UAVs) offer a flexible and autonomous way to apply protective coatings more often and at lower cost compared to traditional manual coating methods. In this letter, we propose a quadcopter-based system, equipped with a liquid dispersion mechanism, designed to automate such tasks. The localization stack only uses onboard sensors, relying on visual-inertial odometry and the relative position of the PV panel detected with respect to the quadcopter. The control relies on a model-based controller that accounts for the ground effect and the mass decrease of the quadcopter during liquid dispersion. We validate the autonomy capabilities of our system through extensive indoor and outdoor experiments.
comment: 7 pages, 10 figures. Submitted to IEEE RA-L
Optimization and Control 22
☆ ADAPT: Lightweight, Long-Range Machine Learning Force Fields Without Graphs
Point defects play a central role in driving the properties of materials. First-principles methods are widely used to compute defect energetics and structures, including at scale for high-throughput defect databases. However, these methods are computationally expensive, making machine-learning force fields (MLFFs) an attractive alternative for accelerating structural relaxations. Most existing MLFFs are based on graph neural networks (GNNs), which can suffer from oversmoothing and poor representation of long-range interactions. Both of these issues are especially of concern when modeling point defects. To address these challenges, we introduce the Accelerated Deep Atomic Potential Transformer (ADAPT), an MLFF that replaces graph representations with a direct coordinates-in-space formulation and explicitly considers all pairwise atomic interactions. Atoms are treated as tokens, with a Transformer encoder modeling their interactions. Applied to a dataset of silicon point defects, ADAPT achieves a roughly 33 percent reduction in both force and energy prediction errors relative to a state-of-the-art GNN-based model, while requiring only a fraction of the computational cost.
comment: 14 total pages of main content, 4 of references, 3 in Appendix
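The atoms-as-tokens idea can be sketched in a few lines: one token per atom, full self-attention over all pairs, no neighbor graph or cutoff radius. This is our schematic reading of the abstract; ADAPT's actual features, heads, and symmetry handling surely differ (in particular, raw coordinates as used below are not rotation- or translation-invariant).

```python
# Schematic graph-free force field: atoms as Transformer tokens (sketch only).
import torch
import torch.nn as nn

class TinyAtomTransformer(nn.Module):
    def __init__(self, n_species=8, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(n_species, d_model)
        self.coord_proj = nn.Linear(3, d_model)      # raw coordinates, no graph
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.energy_head = nn.Linear(d_model, 1)
        self.force_head = nn.Linear(d_model, 3)

    def forward(self, species, coords):
        # One token per atom; full self-attention covers all pairwise
        # interactions, with no cutoff radius or message passing.
        h = self.embed(species) + self.coord_proj(coords)
        h = self.encoder(h)
        energy = self.energy_head(h).sum(dim=1)      # sum of per-atom contributions
        forces = self.force_head(h)                  # per-atom force prediction
        return energy, forces

model = TinyAtomTransformer()
species = torch.randint(0, 8, (2, 65))               # batch of 2 cells, 65 atoms each
coords = torch.randn(2, 65, 3)
E, F = model(species, coords)
print(E.shape, F.shape)                              # (2, 1) and (2, 65, 3)
```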
☆ Computing Invariant Zeros of a MIMO Linear System Using State-Space Realization
Poles of a multi-input multi-output (MIMO) linear system can be computed by solving an eigenvalue problem; however, the problem of computing its invariant zeros is equivalent to a generalized eigenvalue problem. This paper revisits the problem of computing the invariant zeros by solving an eigenvalue problem. We introduce a realization called the invariant zero form in which the system's invariant zeros are isolated in a partition of the transformed dynamics matrix. It is shown that the invariant zeros are then the eigenvalues of a partition of the transformed dynamics matrix. Although the paper's main result is proved only for square MIMO systems, the technique can be heuristically extended to nonsquare MIMO systems, as shown in the numerical examples.
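For contrast with the paper's eigenvalue reformulation, the standard generalized-eigenvalue route it improves upon can be checked numerically in a few lines (a textbook construction, shown here for illustration):

```python
# Invariant zeros as generalized eigenvalues of the Rosenbrock pencil M - z*N.
import numpy as np
from scipy.linalg import eig

# G(s) = (s + 2) / ((s + 1)(s + 3)): one invariant zero at s = -2.
A = np.array([[0.0, 1.0], [-3.0, -4.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[2.0, 1.0]])
D = np.array([[0.0]])

n, m = A.shape[0], B.shape[1]
M = np.block([[A, B], [C, D]])
N = np.block([[np.eye(n), np.zeros((n, m))],
              [np.zeros((m, n)), np.zeros((m, m))]])

w = eig(M, N, right=False)            # generalized eigenvalues of the pencil
zeros = w[np.isfinite(w)]             # the finite ones are the invariant zeros
print(zeros)                          # approximately [-2.+0.j]
```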
☆ Zeroth-Order Constrained Optimization from a Control Perspective via Feedback Linearization
Designing safe derivative-free optimization algorithms under unknown constraints is a fundamental challenge in modern learning and control. Most existing zeroth-order (ZO) approaches assume white-box constraints or focus on convex settings, leaving the general case of nonconvex optimization with black-box constraints largely open. We propose a control-theoretic framework for ZO constrained optimization that enforces feasibility without relying on solving costly convex subproblems. Leveraging feedback linearization, we introduce a family of ZO feedback linearization (ZOFL) algorithms applicable to both equality and inequality constraints. Our method requires only noisy, sample-based gradient estimates yet provably guarantees constraint satisfaction under mild regularity conditions. We establish finite-time bounds on constraint violation and present a midpoint discretization variant that further improves feasibility without sacrificing optimality. Empirical results demonstrate that ZOFL consistently outperforms standard ZO baselines, achieving competitive objective values while maintaining feasibility.
☆ Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for the entropic risk/KL-penalty setting, and develop decentralized optimistic actor-critic algorithms that implement these updates. Empirical results on cooperative benchmarks demonstrate that risk-seeking optimism consistently improves coordination over both risk-neutral baselines and heuristic optimistic methods. Our framework thus unifies risk-sensitive learning and optimism, offering a theoretically grounded and practically effective approach to cooperation in MARL.
☆ Mixed-Derivative Total Variation
The formulation of norms on continuous-domain Banach spaces with exact pixel-based discretization is advantageous for solving inverse problems (IPs). In this paper, we investigate a new regularization that is a convex combination of a TV term and the $\mathcal{M}(\mathbb{R}^2)$ norm of mixed derivatives. We show that the extreme points of the corresponding unit ball are indicator functions of polygons whose edges are aligned with either the $x_1$- or $x_2$-axis. We then apply this result to construct a new regularization for IPs, which can be discretized exactly by tensor products of first-order B-splines, or equivalently, pixels. Furthermore, we exactly discretize the loss of the denoising problem on its canonical pixel basis and prove that it admits a unique solution, which is also a solution to the underlying continuous-domain IP.
☆ Equation-Free Coarse Control of Distributed Parameter Systems via Local Neural Operators
The control of high-dimensional distributed parameter systems (DPS) remains a challenge when explicit coarse-grained equations are unavailable. Classical equation-free (EF) approaches rely on fine-scale simulators treated as black-box timesteppers. However, repeated simulations for steady-state computation, linearization, and control design are often computationally prohibitive, or the microscopic timestepper may not even be available, leaving us with data as the only resource. We propose a data-driven alternative that uses local neural operators, trained on spatiotemporal microscopic/mesoscopic data, to obtain efficient short-time solution operators. These surrogates are employed within Krylov subspace methods to compute coarse steady and unsteady states, while also providing Jacobian information in a matrix-free manner. Krylov-Arnoldi iterations then approximate the dominant eigenspectrum, yielding reduced models that capture the open-loop slow dynamics without explicit Jacobian assembly. Both discrete-time Linear Quadratic Regulator (dLQR) and pole-placement (PP) controllers are designed from this reduced system and lifted back to the full nonlinear dynamics, thereby closing the feedback loop.
comment: 8 pages, 2 figures
☆ Observability of Schrödinger propagators on tori in rough settings
On a torus of arbitrary dimension, it is conjectured that Schrödinger propagators with bounded potentials are observable from any space-time set of positive Lebesgue measure. We establish a criterion that proves the conjecture on the one-dimensional torus, yields new examples of measurable observation sets, and, more significantly, reduces the general conjecture to verifying certain integrability bounds for free Schrödinger waves. These bounds are far weaker than Bourgain's conjectured periodic Strichartz estimates, yet remain highly nontrivial.
☆ Efficient Douglas-Rachford Methods on Hadamard Manifolds with Applications to the Heron Problems
We develop efficient methods for minimizing the sum of two geodesically convex functions on Hadamard manifolds, with the aim of enhancing the convergence of the Douglas-Rachford algorithm in this setting. Specifically, we propose two types of algorithms: inertial and non-inertial. The convergence analysis of both algorithms is provided under suitable assumptions on the algorithmic parameters and the geodesic convexity of the objective functions; it is based on fixed-point theory for nonexpansive operators. We also study the convergence rates of these two methods. Additionally, we introduce parallel Douglas-Rachford type algorithms for minimizing functionals containing multiple summands, with applications to the generalized Heron problem on Hadamard manifolds. To demonstrate the effectiveness of the proposed algorithms, we present numerical experiments for the generalized Heron problems.
☆ On the Relationships among GPU-Accelerated First-Order Methods for Solving Linear Programming
This paper aims to understand the relationships among recently developed GPU-accelerated first-order methods (FOMs) for linear programming (LP), with particular emphasis on HPR-LP, a Halpern Peaceman-Rachford (HPR) method for LP. Our findings can be summarized as follows: (i) the base algorithm of cuPDLPx, a recently released GPU solver, is a special case of the base algorithm of HPR-LP, thereby showing that cuPDLPx is another concrete implementation instance of HPR-LP; (ii) once the active sets have been identified, HPR-LP and EPR-LP, an ergodic PR method for LP, become equivalent under the same initialization; and (iii) extensive numerical experiments on benchmark datasets demonstrate that HPR-LP achieves the best overall performance among current GPU-accelerated LP solvers. These findings provide a strong motivation for using the HPR method as a baseline to further develop GPU-accelerated LP solvers and beyond.
☆ Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
Catastrophic forgetting remains a critical challenge in continual learning for large language models (LLMs), where models struggle to retain performance on historical tasks when fine-tuning on new sequential data without access to past datasets. In this paper, we first reveal that the drift of functional directions during the fine-tuning process is a key reason why existing regularization-based methods fail in long-term LLM continual learning. To address this, we propose Dynamic Orthogonal Continual (DOC) fine-tuning, a novel approach that tracks the drift of these functional directions and dynamically updates them during the fine-tuning process. Furthermore, by adjusting the gradients of new task parameters to be orthogonal to the tracked historical function directions, our method mitigates interference between new and old tasks. Extensive experiments on various LLM continual learning benchmarks demonstrate that this approach outperforms prior methods, effectively reducing catastrophic forgetting and providing a robust tool for continual LLM fine-tuning. Our code is available at https://github.com/meloxxxxxx/DOC.
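The orthogonal-gradient step at the heart of such methods is compact to write down. A minimal sketch (our reading of the abstract; DOC's distinctive ingredient, tracking the drift of the directions themselves, is not reproduced here):

```python
# Projecting a new-task gradient orthogonal to historical directions (sketch).
import torch

def project_orthogonal(grad: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
    """Remove the components of `grad` lying in span(U).

    U: (d, k) matrix with orthonormal columns standing in for tracked
    historical functional directions; grad: (d,) new-task gradient.
    """
    return grad - U @ (U.T @ grad)

d, k = 1024, 8
U, _ = torch.linalg.qr(torch.randn(d, k))   # stand-in for tracked directions
g = torch.randn(d)
g_orth = project_orthogonal(g, U)
print(torch.norm(U.T @ g_orth))             # ~0: no component along old-task directions
```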
☆ Best weighted approximation of some kernels on the real axis
We calculate the exact value of the best weighted polynomial approximation of kernels of the form $\frac {A+Bt}{(t^2+\lambda^2)^{s+1}}$, where $A$ and $B$ are fixed complex numbers, $\lambda>0$, $s\in {\mathbb N}$, in the mean square metric, and we determine the polynomial of best approximation.
☆ Stochastic Origin Frank-Wolfe for traffic assignment
In this paper, we present the Stochastic Origin Frank-Wolfe (SOFW) method, which is a special case of the block-coordinate Frank-Wolfe algorithm, applied to the problem of finding equilibrium flow distributions. By significantly reducing the computational complexity of the minimization oracle, the method improves overall efficiency at the cost of increased memory consumption. Its key advantage lies in minimizing the number of shortest path computations. We refer to existing theoretical convergence guarantees for generalized coordinate Frank-Wolfe methods and, in addition, extend the analysis by providing a convergence proof for a batched version of the Block-Coordinate Frank-Wolfe algorithm, which was not covered in the original work. We also demonstrate the practical effectiveness of our approach through experimental results. In particular, our findings show that the proposed method significantly outperforms the classical Frank-Wolfe algorithm and its variants on large-scale datasets. On smaller datasets, SOFW also remains effective, though the performance gap relative to classical methods becomes less pronounced. In such cases, there is a trade-off between solution quality, iteration time complexity, and memory usage.
☆ Weak and strong convergence of a relaxed inertial proximal splitting algorithm for solving hierarchical equilibrium problems
In this chapter, we introduce the relaxed inertial proximal splitting algorithm (RIPSA) for hierarchical equilibrium problems. Using Opial-Passty's lemma, we first establish weak ergodic and weak convergence of the sequence generated by the algorithm to a solution of the problem, in the absence of the Browder-Halpern contraction factor. We then derive a strong convergence result under an additional strong monotonicity assumption. Subsequently, we relax this requirement by removing strong monotonicity and instead incorporating a Browder-Halpern contraction factor into (RIPSA), which guarantees strong convergence to a solution determined by the contraction factor. Finally, we discuss two related settings: convex minimization problems and monotone variational inequalities formulated as fixed-point problems for nonexpansive operators.
comment: 33 pages
☆ Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization
The theory of discrete-time reinforcement learning (RL) has advanced rapidly over the past decades. Although primarily designed for discrete environments, many real-world RL applications are inherently continuous and complex. A major challenge in extending discrete-time algorithms to continuous-time settings is their sensitivity to time discretization, often leading to poor stability and slow convergence. In this paper, we investigate deterministic policy gradient methods for continuous-time RL. We derive a continuous-time policy gradient formula based on an analogue of the advantage function and establish its martingale characterization. This theoretical foundation leads to our proposed algorithm, CT-DDPG, which enables stable learning with deterministic policies in continuous-time environments. Numerical experiments show that the proposed CT-DDPG algorithm offers improved stability and faster convergence compared to existing discrete-time and continuous-time methods, across a wide range of control tasks with varying time discretizations and noise levels.
☆ On Computing the Copositive Minimum and its Representatives
Computing the copositive minimum of a strictly copositive quadratic form is a natural generalization of computing the arithmetical minimum of a positive definite one. In this paper we show that this generalized problem is NP-complete. Moreover, we describe a practical method to calculate all shortest vectors using the $LDL^T$ decomposition in a large class of special cases. Our numerical tests show that our method performs significantly better than previous approaches.
comment: 19 pages, 2 figures
☆ Hybrid Powertrain Optimization for Regional Aircraft Integrating Hydrogen Fuel Cells and Aluminum Air Batteries
With the increasing demand for air travel and the urgency to reduce emissions, transitioning from fossil fuel-based propulsion systems is a critical step toward sustainable aviation. While batteries are widely used in urban air mobility, their long charging durations limit their feasibility for consecutive flights. Hybrid propulsion systems, which integrate fuel cells and batteries, offer a promising alternative due to their higher energy density and improved efficiency. This paper presents a novel hybrid powertrain architecture for regional aircraft, incorporating a hydrogen fuel cell, a lithium-ion battery, and an auxiliary aluminum-air battery. The proposed system is evaluated using real-world power demand data from a Cessna 208 aircraft. The hydrogen fuel cell acts as the primary power source, ensuring continuous operation, while the lithium-ion battery manages transient power fluctuations to enhance system stability. The aluminum-air battery is introduced as a high-energy emergency backup, providing extended endurance during critical situations. A mixed-integer optimization model is formulated for system sizing and power scheduling, ensuring optimal energy distribution among the power sources. Multiple operational scenarios are analyzed to evaluate system performance, particularly under emergency conditions, where power reliability is crucial. The results highlight the feasibility and effectiveness of the proposed hybrid architecture in improving energy efficiency and flight safety for regional aircraft applications.
♻ ☆ Min-Max Optimisation for Nonconvex-Nonconcave Functions Using a Random Zeroth-Order Extragradient Algorithm
This study explores the performance of the random Gaussian smoothing Zeroth-Order ExtraGradient (ZO-EG) scheme considering deterministic min-max optimisation problems with possibly NonConvex-NonConcave (NC-NC) objective functions. We consider both unconstrained and constrained, differentiable and non-differentiable settings. We discuss the min-max problem from the point of view of variational inequalities. For the unconstrained problem, we establish the convergence of the ZO-EG algorithm to the neighbourhood of an $\epsilon$-stationary point of the NC-NC objective function, whose radius can be controlled under a variance reduction scheme, along with its complexity. For the constrained problem, we introduce the new notion of proximal variational inequalities and give examples of functions satisfying this property. Moreover, we prove analogous results to the unconstrained case for the constrained problem. For the non-differentiable case, we prove the convergence of the ZO-EG algorithm to a neighbourhood of an $\epsilon$-stationary point of the smoothed version of the objective function, where the radius of the neighbourhood can be controlled, which can be related to the ($\delta,\epsilon$)-Goldstein stationary point of the original objective function.
♻ ☆ Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets
We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. We first present a randomized projected Langevin dynamics algorithm that solves the robust policy evaluation problem to global optimality but is inefficient. We also propose a deterministic policy gradient method that is efficient but solves the robust policy evaluation problem only approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set. Finally, we describe an actor-critic algorithm that finds an $\epsilon$-optimal solution for the robust policy improvement problem in $\mathcal{O}(1/\epsilon^4)$ iterations. We thus present the first complete solution scheme for robust MDPs with non-rectangular uncertainty sets offering global optimality guarantees. Numerical experiments show that our algorithms compare favorably against state-of-the-art methods.
comment: comments are welcome
♻ ☆ Safety-Critical Control with Guaranteed Lipschitz Continuity via Filtered Control Barrier Functions
In safety-critical control systems, ensuring both system safety and smooth control input is essential for practical deployment. Existing Control Barrier Function (CBF) frameworks, especially High-Order CBFs (HOCBFs), effectively enforce safety constraints, but also raise concerns about the smoothness of the resulting control inputs. While smoothness typically refers to continuity and differentiability, it does not by itself ensure bounded input variation. In contrast, Lipschitz continuity is a stronger form of continuity that not only is necessary for the theoretical guarantee of safety, but also bounds the rate of variation and eliminates abrupt changes in the control input. Such abrupt changes can degrade system performance or even violate actuator limitations, yet current CBF-based methods do not provide Lipschitz continuity guarantees. This paper introduces Filtered Control Barrier Functions (FCBFs), which extend HOCBFs by incorporating an auxiliary dynamic system, referred to as an input regularization filter, to produce Lipschitz continuous control inputs. The proposed framework ensures safety, control bounds, and Lipschitz continuity of the control inputs simultaneously by integrating FCBFs and HOCBFs within a unified quadratic program (QP). Theoretical guarantees are provided and simulations on a unicycle model demonstrate the effectiveness of the proposed method compared to standard and smoothness-penalized HOCBF approaches.
comment: 8 pages, 4 figures
♻ ☆ Gromov's Approximating Tree and the All-Pairs Bottleneck Paths Problem
Given a pointed metric space $(X,\mathsf{dist}, w)$ on $n$ points, its Gromov's approximating tree is a 0-hyperbolic pseudo-metric space $(X,\mathsf{dist}_T)$ such that $\mathsf{dist}(x,w)=\mathsf{dist}_T(x,w)$ and $\mathsf{dist}(x, y)-2 \delta \log_2 n \leq \mathsf{dist}_T (x, y) \leq \mathsf{dist}(x, y)$ for all $x, y \in X$, where $\delta$ is the Gromov hyperbolicity of $X$. On the other hand, the all-pairs bottleneck paths (APBP) problem asks, given an undirected graph with some capacities on its edges, to find the maximal path capacity between each pair of vertices. In this note, we prove:
- Computing Gromov's approximating tree for a metric space with $n+1$ points from its matrix of distances reduces to solving the APBP problem on a connected graph with $n$ vertices.
- There is an explicit algorithm that computes Gromov's approximating tree for a graph from its adjacency matrix in quadratic time.
- Solving the APBP problem on a weighted graph with $n$ vertices reduces to finding Gromov's approximating tree for a metric space with $n+1$ points from its distance matrix.
comment: Version 2. Minor corrections
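The APBP side of these reductions is a classical dynamic program. A compact max-min Floyd-Warshall solver, included for illustration (the note's tree constructions are not implemented here):

```python
# All-pairs bottleneck paths via a max-min Floyd-Warshall recursion.
import math

def all_pairs_bottleneck(cap):
    """cap[i][j]: edge capacity (0 if no edge). Returns max bottleneck capacities."""
    n = len(cap)
    bw = [row[:] for row in cap]
    for i in range(n):
        bw[i][i] = math.inf              # convention: infinite capacity to oneself
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # best path through k: its bottleneck is the min of the two halves
                bw[i][j] = max(bw[i][j], min(bw[i][k], bw[k][j]))
    return bw

cap = [[0, 5, 1, 0],
       [5, 0, 3, 2],
       [1, 3, 0, 4],
       [0, 2, 4, 0]]
print(all_pairs_bottleneck(cap)[0][3])   # 3: path 0-1-2-3 with bottleneck min(5,3,4)
```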
♻ ☆ Finite-Time Bounds for Two-Time-Scale Stochastic Approximation with Arbitrary Norm Contractions and Markovian Noise
Two-time-scale Stochastic Approximation (SA) is an iterative algorithm with applications in reinforcement learning and optimization. Prior finite time analysis of such algorithms has focused on fixed point iterations with mappings contractive under Euclidean norm. Motivated by applications in reinforcement learning, we give the first mean square bound on non linear two-time-scale SA where the iterations have arbitrary norm contractive mappings and Markovian noise. We show that the mean square error decays at a rate of $O(1/n^{2/3})$ in the general case, and at a rate of $O(1/n)$ in a special case where the slower timescale is noiseless. Our analysis uses the generalized Moreau envelope to handle the arbitrary norm contractions and solutions of Poisson equation to deal with the Markovian noise. By analyzing the SSP Q-Learning algorithm, we give the first $O(1/n)$ bound for an algorithm for asynchronous control of MDPs under the average reward criterion. We also obtain a rate of $O(1/n)$ for Q-Learning with Polyak-averaging and provide an algorithm for learning Generalized Nash Equilibrium (GNE) for strongly monotone games which converges at a rate of $O(1/n^{2/3})$.
comment: To be presented at IEEE Conference on Decision and Control (CDC) 2025
♻ ☆ The Aubin Property for Generalized Equations over $C^2$-cone Reducible Sets
This paper establishes the equivalence of the Aubin property and the strong regularity for generalized equations over $C^2$-cone reducible sets. This result resolves a long-standing question in variational analysis and extends the well-known equivalence theorem for polyhedral sets to a significantly broader class of non-polyhedral cases. Our proof strategy departs from traditional variational techniques, integrating insights from convex geometry with powerful tools from algebraic topology. A cornerstone of our analysis is a new fundamental lemma concerning the local structure of the normal cone map for arbitrary closed convex sets, which reveals how the dimension of normal cones varies in the neighborhood of a boundary point. This geometric insight is the key to applying degree theory, allowing us to prove that a crucial function associated with the problem has a topological index of $\pm1$. This, via a homological version of the inverse mapping theorem, implies that the function is a local homeomorphism, which in turn yields the strong regularity of the original solution map. This result unifies and extends several existing stability results for problems such as conventional nonlinear programming, nonlinear second-order cone programming, and nonlinear semidefinite programming under a single general framework.
comment: v3: Minor corrections and typo fixes. (Note: v2 was a copy of v1 submitted in error)
Systems and Control 19
☆ Distributionally robust LMI synthesis for LTI systems
This article shows that distributionally robust controller synthesis as investigated in [Taskesen et al., 2024] can be formulated as a convex linear matrix inequality (LMI) synthesis problem. To this end, we rely on well-established convexification techniques from robust control. The LMI synthesis problem we propose has the advantage that it can be solved efficiently using off-the-shelf semi-definite programming (SDP) solvers. In addition, our formulation exposes the studied distributionally robust controller synthesis problem as an instance of robust $H_2$ synthesis.
☆ Robust Orientation Estimation with TRIAD-aided Manifold EKF
The manifold extended Kalman filter (Manifold EKF) has found extensive application in attitude determination. Magnetometers, employed as sensors for attitude determination, are prone to disturbances owing to their sensitivity to calibration errors and external magnetic fields. The TRIAD (Tri-Axial Attitude Determination) algorithm is well known as a sub-optimal attitude estimator. In this article, we exploit this sub-optimal character of TRIAD to mitigate the influence of the magnetometer reading on the pitch- and roll-axis determination within the Manifold EKF algorithm. We substantiate our results with experiments.
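For reference, the classical TRIAD construction reads as follows; the snippet is the textbook algorithm, not the paper's EKF integration. The asymmetry between the two observations, where the second triad axis discards the component of the second measurement along the first, is what lets a TRIAD-style construction confine magnetometer errors away from the pitch and roll axes.

```python
# Classical TRIAD attitude determination from two vector observations.
import numpy as np

def triad(b1, b2, r1, r2):
    """Attitude matrix R (body <- reference) from two vector observations.

    b1, b2: measurements in the body frame (e.g. accelerometer, magnetometer);
    r1, r2: the same directions in the reference frame. The b1/r1 pair is
    trusted more: only the part of b2 orthogonal to b1 is used.
    """
    def make_triad(v1, v2):
        t1 = v1 / np.linalg.norm(v1)
        t2 = np.cross(v1, v2); t2 /= np.linalg.norm(t2)
        t3 = np.cross(t1, t2)
        return np.column_stack((t1, t2, t3))
    Tb, Tr = make_triad(b1, b2), make_triad(r1, r2)
    return Tb @ Tr.T

# Sanity check: recover a known rotation from noiseless observations.
from scipy.spatial.transform import Rotation
R_true = Rotation.from_euler("zyx", [30, 10, -20], degrees=True).as_matrix()
r1, r2 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.4])  # gravity, mag field
b1, b2 = R_true @ r1, R_true @ r2
print(np.allclose(triad(b1, b2, r1, r2), R_true))              # True
```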
☆ Optimizing the Network Topology of a Linear Reservoir Computer
Machine learning has become a fundamental approach for modeling, prediction, and control, enabling systems to learn from data and perform complex tasks. Reservoir computing is a machine learning tool that leverages high-dimensional dynamical systems to efficiently process temporal data for prediction and observation tasks. Traditionally, the connectivity of a reservoir computer (RC) is generated at random, lacking a principled design. Here, we focus on optimizing the topology of a linear RC to improve its performance and interpretability, which we achieve by decoupling the RC dynamics into a number of independent modes. We then proceed to optimize each one of these modes to perform a given task, which corresponds to selecting an optimal RC connectivity in terms of a given set of eigenvalues of the RC adjacency matrix. Simulations on networks of varying sizes show that the optimized RC significantly outperforms randomly constructed reservoirs in both the training and testing phases and also often surpasses nonlinear reservoirs of comparable size. This approach provides both practical performance advantages and theoretical guidelines for designing efficient, task-specific, and analytically transparent RC architectures.
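The mode decoupling that underpins the proposed optimization is elementary for a linear reservoir: diagonalizing the adjacency matrix turns the state recursion into independent one-pole filters, one per eigenvalue. A numerical sketch (our illustration of this standard fact):

```python
# Decoupling a linear reservoir x_{t+1} = A x_t + w_in u_t into modes.
import numpy as np

rng = np.random.default_rng(3)
n, T = 20, 200
A = rng.normal(size=(n, n))
A *= 0.9 / np.abs(np.linalg.eigvals(A)).max()   # rescale to spectral radius 0.9
w_in = rng.normal(size=n)
u = rng.normal(size=T)

# Standard state recursion ...
x = np.zeros(n); X = []
for t in range(T):
    x = A @ x + w_in * u[t]
    X.append(x.copy())
X = np.array(X)

# ... equals, after the change of basis z = V^{-1} x with A = V diag(lam) V^{-1},
# n decoupled scalar recursions z_{t+1}[i] = lam[i] z_t[i] + b[i] u_t.
lam, V = np.linalg.eig(A)
b = np.linalg.solve(V, w_in)
z = np.zeros(n, dtype=complex); Z = []
for t in range(T):
    z = lam * z + b * u[t]
    Z.append(z.copy())
Z = np.array(Z)
print(np.allclose(X, (Z @ V.T).real))           # True: identical trajectories
```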
☆ Efficient Norm-Based Reachable Sets via Iterative Dynamic Programming
In this work, we present a numerical optimal control framework for reachable set computation using \emph{normotopes}, a new set representation as a norm ball with a shaping matrix. In reachable set computations, we expect to continuously vary the shape matrix as a function of time. Incorporating the shape dynamics as an input, we build a \emph{controlled embedding system} using a linear differential inclusion overapproximating the dynamics of the system, where a single forward simulation of this embedding system always provides an overapproximating reachable set of the system, no matter the choice of \emph{hypercontrol}. By iteratively solving a linear quadratic approximation of the nonlinear optimal hypercontrol problem, we synthesize less conservative final reachable sets, providing a natural tradeoff between runtime and accuracy. Terminating our algorithm at any point always returns a valid reachable set overapproximation.
☆ Sensitivity Analysis for Antenna Devices through Interval Arithmetic -- A Generalized Approach
This paper presents a novel method for the sensitivity analysis of electromagnetic (EM) systems whose transfer function (TF), that is the input-output (I/O) relationship between the input parameters affected by tolerance and the system response (i.e., the arising EM performance of interest), is not available in closed (or explicit) form. The method is a generalized analytic technique based on the Interval Analysis (IA). First, an analytic surrogate model (SM) of the TF is defined by means of a learning-by-example (LBE) approach starting from a set of available I/O examples. Then, the LBE-derived SM is extended to intervals through IA to yield inclusive, yet finite, performance bounds of the output system response when the control parameters are affected by unknown, but bounded, tolerances. A set of representative numerical examples is reported to validate the proposed IA-LBE method as well as to assess its effectiveness and reliability when dealing with realistic EM systems (e.g., antennas) for which the TF is not explicitly known.
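The core IA mechanic is simple to state: elementary operations on intervals return intervals guaranteed to contain every pointwise result, so pushing toleranced inputs through the (surrogate) transfer function yields inclusive performance bounds. A toy sketch with a hypothetical two-parameter surrogate, not the paper's LBE-derived model:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

# Tolerance of +/-2% on two parameters a, b of a hypothetical surrogate y = a*b + a
a = Interval(0.98, 1.02)
b = Interval(1.96, 2.04)
y = a * b + a
print(f"guaranteed output bounds: [{y.lo:.4f}, {y.hi:.4f}]")
```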
☆ Dual-Function Beam Pattern Design for Multi-Target ISAC Systems: A Decoupled Approach
We investigate the beampattern design problem for mono-static multi-user (MU) multi-point-target integrated sensing and communication (ISAC) systems, where a dual-function multiple-input multiple-output (DF-MIMO) base station (BS) performs downlink communication and radar sensing simultaneously. In ISAC systems, sensing and communication inherently compete for resources. As communication demand increases, the beam pattern is reshaped, which might degrade the direction of arrival (DoA) sensing accuracy, measured in terms of mean-squared error (MSE) and lower-bounded by the Cramer-Rao lower bound (CRLB). Since conventional joint formulations of the sensing-based problem often overlook this trade-off, our work addresses it by decomposing the sensing-based problem into two subproblems (SPs). This decomposition enables a more effective exploitation of the beam pattern's physical properties, which we refer to as the Sensing-Guided Communication Dual-Function (SGCDF) beam pattern design. We further develop a low-complexity extension using the Riemannian Manifold Optimization (RMO) and convex closed-set projection. Simulation results confirm that the proposed method improves multi-target estimation accuracy, compared to traditional joint optimization strategies, by preserving the beam pattern, while the low-complexity version offers an excellent performance-complexity tradeoff, maintaining high accuracy with significantly reduced computational cost.
comment: 13 pages; 7 figures; 3 tables; manuscript submitted to IEEE journal
☆ Incorporating flexibility and resilience demand into capacity market considering the guidance on generation investment
The capacity market provides economic guidance for generation investment and ensures the adequacy of generation capability for power systems. With the rapidly increasing share of renewable energy, the adequacy of flexibility and resilience becomes more crucial for the secure operation of power systems. In this context, this paper incorporates flexibility and resilience demand into the capacity market by formulating capacity demand curves for ramping capability, inertia, and recovery capability in addition to generation capability. The capacity market's guidance on generation investment is also taken into account by solving the generation investment equilibrium among generation companies with a Nash-Cournot model employing an equivalent quadratic programming formulation. The overall problem is established as a trilevel game, and an iterative algorithm is devised to formulate the capacity demand curves in the upper level based on the Gencos' investments obtained from the middle and lower levels. The case study further demonstrates that incorporating flexibility and resilience demand into the capacity market can stimulate proper generation investment and ensure the adequacy of flexibility and resilience in power systems.
☆ Leave No Observation Behind: Real-time Correction for VLA Action Chunks
To improve efficiency and temporal coherence, Vision-Language-Action (VLA) models often predict action chunks; however, action chunking harms reactivity under inference delay and long horizons. We introduce Asynchronous Action Chunk Correction (A2C2), a lightweight real-time chunk-correction head that runs at every control step and adds a time-aware correction to any off-the-shelf VLA's action chunk. The module combines the latest observation, the predicted action from the VLA (the base action), a positional feature that encodes the index of the base action within the chunk, and features from the base policy, then outputs a per-step correction. This preserves the base model's competence while restoring closed-loop responsiveness. The approach requires no retraining of the base policy and is orthogonal to asynchronous execution schemes such as Real Time Chunking (RTC). On the dynamic Kinetix task suite (12 tasks) and LIBERO Spatial, our method yields consistent success rate improvements across increasing delays and execution horizons (+23 and +7 percentage points, respectively, compared to RTC), and also improves robustness for long horizons even with zero injected delay. Since the correction head is small and fast, it adds minimal overhead compared to the inference of large VLA models. These results indicate that A2C2 is an effective, plug-in mechanism for deploying high-capacity chunking policies in real-time control.
☆ SAC-Loco: Safe and Adjustable Compliant Quadrupedal Locomotion
Quadruped robots are designed to achieve agile locomotion by mimicking legged animals. However, existing control methods for quadrupeds often lack one of the key capabilities observed in animals: adaptive and adjustable compliance in response to external disturbances. Most locomotion controllers do not provide tunable compliance and tend to fail under large perturbations. In this work, we propose a switched policy framework for compliant and safe quadruped locomotion. First, we train a force-compliant policy with adjustable compliance levels using a teacher-student reinforcement learning framework, eliminating the need for explicit force sensing. Next, we develop a safe policy based on the capture point concept to stabilize the robot when the compliant policy fails. Finally, we introduce a recoverability network that predicts the likelihood of failure and switches between the compliant and safe policies. Together, this framework enables quadruped robots to achieve both force compliance and robust safety when subjected to severe external disturbances.
☆ Markov Modeling for Licensed and Unlicensed Band Allocation in Underlay and Overlay D2D
In this paper, a novel analytical model for resource allocation is proposed for a device-to-device (D2D) assisted cellular network. The proposed model can be applied to underlay and overlay D2D systems for sharing licensed bands and offloading cellular traffic. The developed model also takes into account the problem of unlicensed band sharing with Wi-Fi systems. In the proposed model, a global system state reflects the interaction among D2D, conventional cellular, and Wi-Fi packets. Under the standard traffic model assumptions, a threshold-based flow control is proposed for guaranteeing the quality-of-service (QoS) of Wi-Fi. The packet blockage probability is then derived. Simulation results show the proposed scheme sacrifices conventional cellular performance slightly to improve overlay D2D performance significantly while maintaining the performance for Wi-Fi users. Meanwhile, the proposed scheme has more flexible adjustments between D2D and Wi-Fi than the underlay scheme.
comment: 10 pages, 5 figures, published in 2024 IEEE ICC
☆ Modeling the Unlicensed Band Allocation for LAA With Buffering Mechanism
In this letter, we propose an analytical model and conduct simulation experiments to study listen-before-talk-based unlicensed band allocation with the buffering mechanism for the License-Assisted Access (LAA) packets in the heterogeneous networks. In such a network, unlicensed band allocation for LAA and Wi-Fi is an important issue, which may affect the quality of service for both systems significantly. We evaluate the performance of these unlicensed band allocations in terms of the acceptance rate of both LAA and Wi-Fi packets. This letter provides the guidelines for designing the channel occupation phase and buffer threshold of the LAA systems.
comment: 5 pages, 3 figures, 2 tables, published in IEEE Communications Letters
☆ Unlicensed Band Allocation for Heterogeneous Networks
Based on the License-Assisted Access (LAA) small cell architecture, LAA coexisting with Wi-Fi in heterogeneous networks provides LTE mobile users with high bandwidth efficiency, as the unlicensed channels are shared between LAA and Wi-Fi. However, LAA and Wi-Fi interfere with each other when both systems use the same unlicensed channel. In such a network, unlicensed band allocation for LAA and Wi-Fi is an important issue that may significantly affect the quality of service (QoS) of both systems. In this paper, we propose an analytical model and conduct simulation experiments to study four allocations for the unlicensed band: unlicensed full allocation (UFA), unlicensed time-division allocation (UTA), and UFA/UTA with a buffering mechanism (UFAB and UTAB) for the LAA data packets. We evaluate the performance of these unlicensed band allocation schemes in terms of the acceptance rates of both LAA and Wi-Fi data packets, accounting for the LAA buffer queue. Our study provides guidelines for designing the channel occupation phase and the buffer size of the LAA small cell.
comment: 14 pages, 12 figures, 1 table, published in IEICE Transactions on Communications
♻ ☆ Optimality of Linear Policies in Distributionally Robust Linear Quadratic Control
We study a generalization of the classical discrete-time, Linear-Quadratic-Gaussian (LQG) control problem where the noise distributions affecting the states and observations are unknown and chosen adversarially from divergence-based ambiguity sets centered around a known nominal distribution. For a finite horizon model with Gaussian nominal noise and a structural assumption on the divergence that is satisfied by many examples -- including 2-Wasserstein distance, Kullback-Leibler divergence, moment-based divergences, entropy-regularized optimal transport, or Fisher (score-matching) divergence -- we prove that a control policy that is affine in the observations is optimal and the adversary's corresponding worst-case optimal distribution is Gaussian. When the nominal means are zero (as in the classical LQG model), we show that the adversary should optimally set the distribution's mean to zero and the optimal control policy becomes linear. Moreover, the adversary should optimally ``inflate" the noise by choosing covariance matrices that dominate the nominal covariance in Loewner order. Exploiting these structural properties, we develop a Frank-Wolfe algorithm whose inner step solves standard LQG subproblems via Kalman filtering and dynamic programming and show that the implementation consistently outperforms semidefinite-programming reformulations of the problem. All structural and algorithmic results extend to an infinite-horizon, average-cost formulation, yielding stationary linear policies and a time-invariant Gaussian distribution for the adversary. Lastly, we show that when the divergence is 2-Wasserstein, the entire framework remains valid when the nominal distributions are elliptical rather than Gaussian.
♻ ☆ Fractional Logistic Growth with Memory Effects: A Tool for Industry-Oriented Modeling
The logistic growth model is a classical framework for describing constrained growth phenomena, widely applied in areas such as population dynamics, epidemiology, and resource management. This study presents a generalized extension using Atangana-Baleanu fractional derivatives in the Caputo sense (ABC). Proportional time delay is also included, allowing the model to capture memory-dependent and nonlocal dynamics not addressed in classical formulations. Free parameters provide flexibility for modeling complex growth in industrial, medical, and social systems. The Hybrid Sumudu Variational (HSV) method is employed to efficiently obtain semi-analytical solutions. Results highlight the combined effects of fractional order and delay on system behavior. This approach demonstrates the novelty of integrating ABC-type derivatives, proportional delay, and HSV-based solutions for real-world applications.
comment: 15
♻ ☆ Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters
Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.
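For orientation, the centralized building block that the gossip protocol decentralizes is the classical Sinkhorn iteration; a minimal sketch for entropic optimal transport between two histograms follows (the barycenter computation alternates similar updates across several measures, with the geometric-mean step becoming a log-domain average in the paper's decentralized scheme). Problem data are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Classical (centralized) Sinkhorn iterations for entropic OT
    between histograms a and b with cost matrix C."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

n = 50
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
a = np.exp(-(x - 0.3) ** 2 / 0.01); a /= a.sum()
b = np.exp(-(x - 0.7) ** 2 / 0.01); b /= b.sum()

P = sinkhorn(a, b, C)
print("entropic transport cost:", np.sum(P * C))
```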
♻ ☆ Koopman-based control using sum-of-squares optimization: Improved stability guarantees and data efficiency
In this paper, we propose a novel controller design approach for unknown nonlinear systems using the Koopman operator. In particular, we use the recently proposed stability- and feedback-oriented extended dynamic mode decomposition (SafEDMD) architecture to generate a data-driven bilinear surrogate model with certified error bounds. Then, by accounting for the obtained error bounds in a controller design based on the bilinear system, one can guarantee closed-loop stability for the true nonlinear system. While existing approaches over-approximate the bilinearity of the surrogate model, thus introducing conservatism and providing only local guarantees, we explicitly account for the bilinearity by using sum-of-squares (SOS) optimization in the controller design. More precisely, we parametrize a rational controller stabilizing the error-affected bilinear surrogate model and, consequently, the underlying nonlinear system. The resulting SOS optimization problem provides explicit data-driven controller design conditions for unknown nonlinear systems based on semidefinite programming. Our approach significantly reduces conservatism by establishing a larger region of attraction and improved data efficiency. The proposed method is evaluated using numerical examples, demonstrating its advantages over existing approaches.
comment: Accepted for publication in the European Journal of Control, 2025
♻ ☆ A Hybrid TDMA/CSMA Protocol for Time-Sensitive Traffic in Robot Applications
Recent progress in robotics has underscored the demand for real-time control in applications such as manufacturing, healthcare, and autonomous systems, where the timely delivery of mission-critical commands under heterogeneous robotic traffic is paramount for operational efficacy and safety. In these scenarios, mission-critical traffic follows a strict deadline-constrained communication pattern: commands must arrive within defined QoS deadlines, otherwise late arrivals can degrade performance or destabilize control loops. In this work, we demonstrate on a real-time SDR platform that CSMA, widely adopted in robotic communications, suffers severe degradation under high robot traffic loads, with contention-induced collisions and delays disrupting the on-time arrival of mission-critical packets. To address this problem, we propose an IEEE 802.11-compatible hybrid TDMA/CSMA protocol that combines TDMA's deterministic slot scheduling with CSMA's adaptability for heterogeneous robot traffic. The protocol achieves collision-free, low-latency mission-critical command delivery and IEEE 802.11 compatibility through the synergistic integration of sub-microsecond PTP-based slot synchronization (essential for establishing precise TDMA timing), a three-session superframe with dynamic TDMA allocation for structured and adaptable traffic management, and beacon-NAV protection to preemptively secure these critical communication sessions from interference. Emulation experiments on a real-time SDR testbed and Robot Operating System (ROS) simulation show that the proposed protocol reduces missed-deadline errors by 93% compared to the CSMA baseline. In high-speed robot path-tracking ROS simulations, the protocol lowers Root Mean Square (RMS) trajectory error by up to 90% compared with a CSMA baseline, all while maintaining throughput for non-critical traffic within ±2%.
♻ ☆ Rapidly Converging Time-Discounted Ergodicity on Graphs for Active Inspection of Confined Spaces
Ergodic exploration has attracted considerable interest in mobile robotics due to its ability to design time trajectories that match desired spatial coverage statistics. However, current ergodic approaches are designed for continuous spaces, which require detailed sensory information at each point and can lead to fractal-like trajectories that are difficult to track. This paper presents a new ergodic approach for graph-based discretizations of continuous spaces. It also introduces a new time-discounted ergodicity metric, wherein early visitations of information-rich nodes are weighted more than late visitations. A Markov chain synthesized using a convex program is shown to converge more rapidly to time-discounted ergodicity than the traditional fastest-mixing Markov chain. The resultant ergodic traversal method is used within a hierarchical framework for active inspection of confined spaces with the goal of detecting anomalies robustly using SLAM-driven Bayesian hypothesis testing. Experiments on a ground robot show the advantages of this framework over three continuous-space ergodic planners as well as greedy and random exploration methods for left-behind foreign object debris detection in a ballast tank.
♻ ☆ Frequency-Domain Bounds for the Multiconductor Telegrapher's Equation
We establish mathematical bounds on the chain, ABCD and immittance matrices of a multiconductor transmission line, based on the Telegrapher's equation. Closed-form expressions for those matrices are also presented. Existing results that hold on the imaginary axis are extended to the complex plane, without reliance on a diagonalizability assumption that is prevalent in the literature. Therefore, the results remain valid under arbitrary arrangements of the conducting wires, which include electrical faults. The analysis ultimately reveals important system-theoretic implications of the Telegrapher's equation, which are of general relevance to control, power systems, and signal processing involving multiconductor transmission lines.
comment: First revision: 10 pages, 1 figure
Optimization and Control 29
☆ Network-Optimised Spiking Neural Network for Event-Driven Networking
Spiking neural networks offer event-driven computation suited to time-critical networking tasks such as anomaly detection, local routing control, and congestion management at the edge. Classical units, including Hodgkin-Huxley, Izhikevich, and the Random Neural Network, map poorly to these needs. We introduce Network-Optimised Spiking (NOS), a compact two-variable unit whose state encodes normalised queue occupancy and a recovery resource. The model uses a saturating nonlinearity to enforce finite buffers, a service-rate leak, and graph-local inputs with delays and optional per-link gates. It supports two differentiable reset schemes for training and deployment. We give conditions for equilibrium existence and uniqueness, local stability tests from the Jacobian trace and determinant, and a network threshold that scales with the Perron eigenvalue of the coupling matrix. The analysis yields an operational rule $g^* \sim k^* \rho(W)$ linking damping and offered load, shows how saturation enlarges the stable region, and explains finite-size smoothing of synchrony onsets. Stochastic arrivals follow a Poisson shot-noise model aligned with telemetry smoothing. Against queueing baselines, NOS matches the M/M/1 mean by calibration while truncating deep tails under bursty input. In closed loop it yields low-jitter control with short settling times. In zero-shot, label-free forecasting, NOS is calibrated per node from arrival statistics; its dynamics yield high AUROC/AUPRC, enabling timely detection of congestion onsets with few false positives. Under a train-calibrated residual protocol across chain, star, and scale-free topologies, NOS improves early-warning F1 and detection latency over MLP, RNN, GRU, and tGNN baselines. We provide guidance for data-driven initialisation, surrogate-gradient training with a homotopy on reset sharpness, and explicit stability checks with topology-aware bounds for resource-constrained deployments.
comment: 52 pages, 16 figures, 9 tables
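The network threshold above scales with the Perron eigenvalue $\rho(W)$ of the nonnegative coupling matrix, which is cheap to estimate by power iteration. A sketch on a small ring topology follows; the proportionality constant $k^*$ and the topology are illustrative, not taken from the paper.

```python
import numpy as np

def perron_eigenvalue(W, iters=200):
    """Power iteration for the spectral radius of a nonnegative coupling matrix W."""
    v = np.ones(W.shape[0])
    for _ in range(iters):
        v = W @ v
        v /= np.linalg.norm(v)
    return v @ (W @ v)            # Rayleigh quotient at convergence

# Small ring of queues with self-loops (illustrative topology)
n = 8
W = 0.2 * np.eye(n)
for i in range(n):
    W[i, (i + 1) % n] = W[i, (i - 1) % n] = 0.4

k_star = 1.2                      # illustrative proportionality constant
rho = perron_eigenvalue(W)
print("rho(W) =", round(rho, 4), "-> damping guideline g* ~", round(k_star * rho, 4))
```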
☆ Bounds for the Permutation Flowshop Scheduling Problem: New Framework and Theoretical Insights
In this work, we use the matrix formulation of the Permutation Flowshop Scheduling Problem (PFSP) with makespan minimization to derive an upper bound and a general framework for obtaining lower bounds. The proposed framework involves solving a min-max or max-min expression over a set of paths. We introduce a family of such path sets for which the min-max expression can be solved in polynomial time under certain bounded parameters. To validate the proposed approach, we test it on the Taillard and VRF benchmark instances, the two most widely used datasets in PFSP research. Our method improves the bounds in $112$ out of the $120$ Taillard instances and $430$ out of the $480$ VRF instances. These improvements cover both small and large instances, highlighting the scalability of the proposed methodology. Additionally, the upper bound is used to give a more accurate estimate of the number of possible makespan values for a given instance and to present asymptotic results that advance a conjecture posed by Taillard on the quality of one of the most popular lower bounds, as well as on the asymptotic approximation ratio of any algorithm.
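For concreteness, the quantity being bounded is the makespan of a permutation, computable by the classical completion-time recurrence $C_{j,m} = \max(C_{j-1,m}, C_{j,m-1}) + p_{\pi(j),m}$. A short sketch with random Taillard-style data (sizes illustrative):

```python
import numpy as np

def makespan(p, perm):
    """Makespan of permutation `perm` for processing times p[job, machine],
    via C[j, m] = max(C[j-1, m], C[j, m-1]) + p[perm[j], m]."""
    n_jobs, n_mach = len(perm), p.shape[1]
    C = np.zeros((n_jobs, n_mach))
    for j, job in enumerate(perm):
        for m in range(n_mach):
            prev_job = C[j - 1, m] if j > 0 else 0.0
            prev_mach = C[j, m - 1] if m > 0 else 0.0
            C[j, m] = max(prev_job, prev_mach) + p[job, m]
    return C[-1, -1]

rng = np.random.default_rng(1)
p = rng.integers(1, 100, size=(20, 5))    # 20 jobs, 5 machines (Taillard-style data)
print("makespan of identity permutation:", makespan(p, list(range(20))))
```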
☆ Distributionally robust LMI synthesis for LTI systems
This article shows that distributionally robust controller synthesis as investigated in \cite{taskesen2024distributionally} can be formulated as a convex linear matrix inequality (LMI) synthesis problem. To this end, we rely on well-established convexification techniques from robust control. The LMI synthesis problem we propose has the advantage that it can be solved efficiently using off-the-shelf semi-definite programming (SDP) solvers. In addition, our formulation exposes the studied distributionally robust controller synthesis problem as an instance of robust $H_2$ synthesis.
☆ Penalized Weighted Trace Minimization for Optimal Control Device Design and Placement
In this paper, we present a new analytical framework for determining the well-posedness of constrained optimization problems that arise in the study of optimal control device design and placement within the context of infinite dimensional linear quadratic control systems. We first prove the well-posedness of the newly minted "strong form" of the time-independent operator-valued Riccati equation. This form of the equation then enables the use of trace-class operator analysis and the Lagrange multiplier formalism to analyze operator-valued Riccati equation-constrained optimization problems. Using this fundamental result, we then determine the conditions under which there exists unique solutions to two important classes of penalized trace minimization problems for optimal control device placement and design.
comment: 21 Pages, Draft 1
☆ New Insights and Algorithms for Optimal Diagonal Preconditioning
Preconditioning (scaling) is essential in many areas of mathematics, and in particular in optimization. In this work, we study the problem of finding an optimal diagonal preconditioner. We focus on minimizing two different notions of condition number: the classical, worst-case-type $\kappa$-condition number, and the more averaging-motivated $\omega$-condition number. We provide affine-based pseudoconvex reformulations of both optimization problems. The advantage of our formulations is that the gradient of the objective is inexpensive to compute and the optimization variable is just an $n\times 1$ vector. We also provide elegant characterizations of the optimality conditions of both problems. We develop a competitive subgradient method, with convergence guarantees, for $\kappa$-optimal diagonal preconditioning that scales much better and is more efficient than existing SDP-based approaches. We also show that the preconditioners found by our subgradient method lead to better PCG performance for solving linear systems than other approaches. Finally, we observe the interesting phenomenon that applying the $\omega$-optimal preconditioner to the exactly $\kappa$-optimally diagonally preconditioned matrix $A$ yields consistent, significantly improved convergence for PCG methods.
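To see where a diagonal preconditioner enters PCG in practice, the sketch below uses the simplest choice, the Jacobi preconditioner $D = \mathrm{diag}(A)$, with SciPy's conjugate gradient; the paper's $\kappa$- and $\omega$-optimal diagonals would be dropped in at the same place. The matrix data are synthetic.

```python
import numpy as np
from scipy.sparse import diags, random as sprandom
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n = 500
# Synthetic SPD matrix with a badly scaled diagonal
R = sprandom(n, n, density=0.01, random_state=0)
A = (R @ R.T + diags(np.logspace(0, 4, n))).tocsc()
b = rng.normal(size=n)

# Jacobi preconditioner: approximate A^{-1} by inverting diag(A)
d_inv = 1.0 / A.diagonal()
M = LinearOperator((n, n), matvec=lambda x: d_inv * x)

def run(prec, label):
    iters = 0
    def cb(xk):
        nonlocal iters
        iters += 1
    _, info = cg(A, b, M=prec, maxiter=5000, callback=cb)
    print(f"{label}: {iters} CG iterations (info={info})")

run(None, "unpreconditioned")
run(M, "Jacobi-preconditioned")
```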
☆ Efficient Norm-Based Reachable Sets via Iterative Dynamic Programming
In this work, we present a numerical optimal control framework for reachable set computation using \emph{normotopes}, a new set representation as a norm ball with a shaping matrix. In reachable set computations, we expect to continuously vary the shape matrix as a function of time. Incorporating the shape dynamics as an input, we build a \emph{controlled embedding system} using a linear differential inclusion overapproximating the dynamics of the system, where a single forward simulation of this embedding system always provides an overapproximating reachable set of the system, no matter the choice of \emph{hypercontrol}. By iteratively solving a linear quadratic approximation of the nonlinear optimal hypercontrol problem, we synthesize less conservative final reachable sets, providing a natural tradeoff between runtime and accuracy. Terminating our algorithm at any point always returns a valid reachable set overapproximation.
☆ Landing with the Score: Riemannian Optimization through Denoising
Under the data manifold hypothesis, high-dimensional data are concentrated near a low-dimensional manifold. We study the problem of Riemannian optimization over such manifolds when they are given only implicitly through the data distribution, and the standard manifold operations required by classical algorithms are unavailable. This formulation captures a broad class of data-driven design problems that are central to modern generative AI. Our key idea is to introduce a link function that connects the data distribution to the geometric operations needed for optimization. We show that this function enables the recovery of essential manifold operations, such as retraction and Riemannian gradient computation. Moreover, we establish a direct connection between our construction and the score function in diffusion models of the data distribution. This connection allows us to leverage well-studied parameterizations, efficient training procedures, and even pretrained score networks from the diffusion model literature to perform optimization. Building on this foundation, we propose two efficient inference-time algorithms -- Denoising Landing Flow (DLF) and Denoising Riemannian Gradient Descent (DRGD) -- and provide theoretical guarantees for both feasibility (approximate manifold adherence) and optimality (small Riemannian gradient norm). Finally, we demonstrate the effectiveness of our approach on finite-horizon reference tracking tasks in data-driven control, highlighting its potential for practical generative and design applications.
comment: 37 pages, 9 figures
☆ Continuous-Time Reinforcement Learning for Asset-Liability Management
This paper proposes a novel approach for Asset-Liability Management (ALM) by employing continuous-time Reinforcement Learning (RL) with a linear-quadratic (LQ) formulation that incorporates both interim and terminal objectives. We develop a model-free, policy gradient-based soft actor-critic algorithm tailored to ALM for dynamically synchronizing assets and liabilities. To ensure an effective balance between exploration and exploitation with minimal tuning, we introduce adaptive exploration for the actor and scheduled exploration for the critic. Our empirical study evaluates this approach against two enhanced traditional financial strategies, a model-based continuous-time RL method, and three state-of-the-art RL algorithms. Evaluated across 200 randomized market scenarios, our method achieves higher average rewards than all alternative strategies, with rapid initial gains and sustained superior performance. The outperformance stems not from complex neural networks or improved parameter estimation, but from directly learning the optimal ALM strategy without learning the environment.
comment: Accepted at the 6th ACM International Conference on AI in Finance (ICAIF 2025), 8 pages, 2 figures
☆ Local SGD and Federated Averaging Through the Lens of Time Complexity
We revisit the classical Local SGD and Federated Averaging (FedAvg) methods for distributed optimization and federated learning. While prior work has primarily focused on iteration complexity, we analyze these methods through the lens of time complexity, taking into account both computation and communication costs. Our analysis reveals that, despite its favorable iteration complexity, the time complexity of canonical Local SGD is provably worse than that of Minibatch SGD and Hero SGD (locally executed SGD). We introduce a corrected variant, Dual Local SGD, and further improve it by increasing the local step sizes, leading to a new method called Decaying Local SGD. Our analysis shows that these modifications, together with Hero SGD, are optimal in the nonconvex setting (up to logarithmic factors), closing the time complexity gap. Finally, we use these insights to improve the theory of a number of other asynchronous and local methods.
comment: 51 pages, 8 figures
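As a reference point for the methods compared above, here is a hedged sketch of canonical Local SGD on a toy finite-sum least-squares problem: each worker runs a fixed number of local stochastic steps, then the server averages the iterates. This is only the baseline scheme; the paper's Dual and Decaying variants modify the local step sizes. All problem data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, workers, rounds, local_steps, lr = 10, 4, 50, 8, 0.002

# Each worker m holds its own objective f_m(x) = 0.5 ||A_m x - b_m||^2
A_w = [rng.normal(size=(20, d)) for _ in range(workers)]
b_w = [rng.normal(size=20) for _ in range(workers)]

def stoch_grad(m, x):
    i = rng.integers(20)                         # sample one row -> unbiased gradient
    a = A_w[m][i]
    return 20.0 * (a @ x - b_w[m][i]) * a

x = np.zeros(d)
for _ in range(rounds):
    local_iterates = []
    for m in range(workers):                     # local phase: H steps, no communication
        xm = x.copy()
        for _ in range(local_steps):
            xm -= lr * stoch_grad(m, xm)
        local_iterates.append(xm)
    x = np.mean(local_iterates, axis=0)          # communication round: average iterates

full_grad = sum(A_w[m].T @ (A_w[m] @ x - b_w[m]) for m in range(workers))
print("gradient norm at the averaged iterate:", np.linalg.norm(full_grad))
```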
☆ Conditional Risk Minimization with Side Information: A Tractable, Universal Optimal Transport Framework
Conditional risk minimization arises in high-stakes decisions where risk must be assessed in light of side information, such as stressed economic conditions, specific customer profiles, or other contextual covariates. Constructing reliable conditional distributions from limited data is notoriously difficult, motivating a series of optimal-transport-based proposals that address this uncertainty in a distributionally robust manner. Yet these approaches remain fragmented, each constrained by its own limitations: some rely on point estimates or restrictive structural assumptions, others apply only to narrow classes of risk measures, and their structural connections are unclear. We introduce a universal framework for distributionally robust conditional risk minimization, built on a novel union-ball formulation in optimal transport. This framework offers three key advantages: interpretability, by subsuming existing methods as special cases and revealing their deep structural links; tractability, by yielding convex reformulations for virtually all major risk functionals studied in the literature; and scalability, by supporting cutting-plane algorithms for large-scale conditional risk problems. Applications to portfolio optimization with rank-dependent expected utility highlight the practical effectiveness of the framework, with conditional models converging to optimal solutions where unconditional ones clearly do not.
☆ Cyber Risk Management and Mitigation Under Controlled Stochastic SIS Model
In this paper, we formulate cyber risk management and mitigation as a stochastic optimal control problem under a stochastic Susceptible-Infected-Susceptible (SIS) epidemic model. To capture the dynamics and interplay of management and mitigation strategies, we introduce two stochastic controls: (i) a proactive risk management control to reduce external cyber attacks and internal contagion effects, and (ii) a reactive mitigation control to accelerate system recovery from cyber infection. The interplay between these controls is modeled by minimizing the expected discounted running costs, which balance proactive management expenses against reactive mitigation expenditures. We derive the associated Hamilton-Jacobi-Bellman (HJB) equation and characterize the value function as its unique viscosity solution. For numerical implementation, we propose a Policy Improvement Algorithm (PIA) and prove its convergence via Backward Stochastic Differential Equations (BSDEs). Finally, we present numerical results through a benchmark example, suboptimal control analysis, sensitivity analysis, and comparative statics.
☆ On Optimal Markovian Couplings of Levy Processes
We study the optimal Markovian coupling problem for two $\Pi$-valued Feller processes $\{X_t\}$ and $\{Y_t\}$, which seeks a coupling process $\{(X_t, Y_t)\}$ that minimizes the right derivative at $t = 0$ of the expected cost $\mathbb{E}^{(x,y)}[c(X_t, Y_t)]$, for all initial states $(x,y) \in \Pi^2$ and a given cost function $c$ on $\Pi$. This problem was first formulated and solved by Chen (1994) for drift-diffusion processes and later extended by Zhang (2000) to Markov processes with bounded jumps. In this work, we resolve the case of Levy processes under the quadratic cost $c(x,y) = \frac{1}{2}|x - y|^2$ by introducing a new formulation of the "Levy optimal transport problem" between Levy measures. We show that the resulting optimal coupling process $\{(X_t^*, Y_t^*)\}_{t \ge 0}$ satisfies a minimal growth property: for each $t \ge 0$ and $x,y \in \mathbb{R}^d$, the expectation $\mathbb{E}^{(x,y)}|X_t^* - Y_t^*|^2$ is minimized among all Feller couplings. A key feature of our approach is the development of a dual problem, expressed as a variational principle over test functions of the generators. We prove strong duality for this formulation, thereby closing the optimality gap. As a byproduct, we obtain a Wasserstein-type metric on the space of Levy generators and Levy measures with finite second moment, and establish several of its fundamental properties.
comment: 82 pages
☆ A Constrained Optimization Approach for Constructing Rigid Bar Frameworks with Higher-order Rigidity
We present a systematic approach for constructing bar frameworks that are rigid but not first-order rigid, using constrained optimization. We show that prestress stable (but not first-order rigid) frameworks arise as the solution to a simple optimization problem, which asks to maximize or minimize the length of one edge while keeping the other edge lengths fixed. By starting with a random first-order rigid framework, we can thus design a wide variety of prestress stable frameworks, which, unlike many examples known in the literature, have no special symmetries. We then show how to incorporate a bifurcation method to design frameworks that are third-order rigid. Our results highlight connections between concepts in rigidity theory and constrained optimization, offering new insights into the construction and analysis of bar frameworks with higher-order rigidity.
comment: 31 pages, 7 figures
☆ Data-Driven Long-Term Asset Allocation with Tsallis Entropy Regularization
This paper addresses the problem of dynamic asset allocation under uncertainty, which can be formulated as a linear-quadratic (LQ) control problem with multiplicative noise. To handle exploration-exploitation trade-offs and induce sparse control actions, we introduce Tsallis entropy as a regularization term. We develop an entropy-regularized policy iteration scheme and provide theoretical guarantees for its convergence. For cases where the system dynamics are unknown, we further propose a fully data-driven algorithm that estimates Q-functions using an instrumental-variable least-squares approach, allowing efficient and stable policy updates. Our framework connects entropy-regularized stochastic control with model-free reinforcement learning, offering new tools for intelligent decision making in finance and automation.
comment: 27 pages, 4 figures
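The regularizer in question is the Tsallis entropy $S_q(p) = (1 - \sum_i p_i^q)/(q-1)$, which recovers the Shannon entropy as $q \to 1$ and favors sparse distributions for $q > 1$. A small numerical sketch of the definition (the distribution is illustrative):

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1); q -> 1 gives Shannon."""
    p = np.asarray(p, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return -np.sum(p[p > 0] * np.log(p[p > 0]))   # Shannon limit
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

p = np.array([0.7, 0.2, 0.1])
for q in (0.5, 1.0, 2.0):
    print(f"q={q}: S_q = {tsallis_entropy(p, q):.4f}")
```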
♻ ☆ Shuffling Heuristic in Variational Inequalities: Establishing New Convergence Guarantees
Variational inequalities have gained significant attention in machine learning and optimization research. While stochastic methods for solving these problems typically assume independent data sampling, we investigate an alternative approach -- the shuffling heuristic. This strategy involves permuting the dataset before sequential processing, ensuring equal consideration of all data points. Despite its practical utility, theoretical guarantees for shuffling in variational inequalities remain unexplored. We address this gap by providing the first theoretical convergence estimates for shuffling methods in this context. Our analysis establishes rigorous bounds and convergence rates, extending the theoretical framework for this important class of algorithms. We validate our findings through extensive experiments on diverse benchmark variational inequality problems, demonstrating faster convergence of shuffling methods compared to independent sampling approaches.
comment: 25 pages, 5 figures, 2 tables
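The heuristic itself is a one-line change to the sampling loop. The toy sketch below contrasts i.i.d. sampling with random reshuffling on a finite sum of affine strongly monotone operators, a simple stand-in for the variational-inequality operators analyzed in the paper; shuffling typically lands closer to the solution for the same budget. All problem data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ops, epochs, lr = 5, 32, 100, 0.05

# Finite-sum strongly monotone operator F(z) = (1/n) sum_i (A_i z - b_i)
A_i = []
for _ in range(n_ops):
    Q = rng.normal(size=(d, d))
    A_i.append(Q @ Q.T / d + np.eye(d))      # SPD => strongly monotone component
b_i = [rng.normal(size=d) for _ in range(n_ops)]

def solve(shuffle):
    z = np.zeros(d)
    for _ in range(epochs):
        # the only difference: a fresh permutation vs. i.i.d. indices
        order = rng.permutation(n_ops) if shuffle else rng.integers(n_ops, size=n_ops)
        for i in order:
            z -= lr * (A_i[i] @ z - b_i[i])
    return z

z_star = np.linalg.solve(np.mean(A_i, axis=0), np.mean(b_i, axis=0))
for shuffle in (False, True):
    err = np.linalg.norm(solve(shuffle) - z_star)
    print(("shuffling" if shuffle else "i.i.d.   ") + f" final error: {err:.4f}")
```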
♻ ☆ Optimality of Linear Policies in Distributionally Robust Linear Quadratic Control
We study a generalization of the classical discrete-time, Linear-Quadratic-Gaussian (LQG) control problem where the noise distributions affecting the states and observations are unknown and chosen adversarially from divergence-based ambiguity sets centered around a known nominal distribution. For a finite horizon model with Gaussian nominal noise and a structural assumption on the divergence that is satisfied by many examples -- including 2-Wasserstein distance, Kullback-Leibler divergence, moment-based divergences, entropy-regularized optimal transport, or Fisher (score-matching) divergence -- we prove that a control policy that is affine in the observations is optimal and the adversary's corresponding worst-case optimal distribution is Gaussian. When the nominal means are zero (as in the classical LQG model), we show that the adversary should optimally set the distribution's mean to zero and the optimal control policy becomes linear. Moreover, the adversary should optimally ``inflate" the noise by choosing covariance matrices that dominate the nominal covariance in Loewner order. Exploiting these structural properties, we develop a Frank-Wolfe algorithm whose inner step solves standard LQG subproblems via Kalman filtering and dynamic programming and show that the implementation consistently outperforms semidefinite-programming reformulations of the problem. All structural and algorithmic results extend to an infinite-horizon, average-cost formulation, yielding stationary linear policies and a time-invariant Gaussian distribution for the adversary. Lastly, we show that when the divergence is 2-Wasserstein, the entire framework remains valid when the nominal distributions are elliptical rather than Gaussian.
♻ ☆ Flow Matching for Efficient and Scalable Data Assimilation
Data assimilation (DA) estimates a dynamical system's state from noisy observations. Recent generative models like the ensemble score filter (EnSF) improve DA in high-dimensional nonlinear settings but are computationally expensive. We introduce the ensemble flow filter (EnFF), a training-free, flow matching (FM)-based framework that accelerates sampling and offers flexibility in flow design. EnFF uses Monte Carlo estimators for the marginal flow field, localized guidance for observation assimilation, and utilizes a novel flow that exploits the Bayesian DA formulation. It generalizes classical filters such as the bootstrap particle filter and ensemble Kalman filter. Experiments on high-dimensional benchmarks demonstrate EnFF's improved cost-accuracy tradeoffs and scalability, highlighting FM's potential for efficient, scalable DA. Code is available at https://github.com/Utah-Math-Data-Science/Data-Assimilation-Flow-Matching.
comment: revamp presentation and experiment design
♻ ☆ A Randomized Block-Coordinate Primal-Dual Method for Large-scale Stochastic Saddle Point Problems
We consider (stochastic) convex-concave saddle point (SP) problems with high-dimensional decision variables, arising in various applications including machine learning problems. To contend with the challenges in computing full gradients, we employ a randomized block-coordinate primal-dual scheme in which randomly selected primal and dual blocks of variables are updated. We consider both deterministic and stochastic settings, where deterministic partial gradients and their randomly sampled estimates are used, respectively, at each iteration. We investigate the convergence of the proposed method under different blocking strategies and provide the corresponding complexity results. While the best-known computational complexity result for computing a saddle point with $\varepsilon$ primal-dual gap for deterministic primal-dual methods using full gradients is $\mathcal O(\max\{m,n\}^2/\varepsilon)$, where $m$ and $n$ denote the dimensions of primal and dual variables, respectively, we show that our proposed randomized block-coordinate method achieves an improved complexity of $\mathcal O(mn/\varepsilon)$ assuming a coordinate-friendly structure on the problem. Moreover, for the stochastic setting where a mini-batch sample gradient is utilized, we show a computational complexity of $\tilde{\mathcal{O}}(m^2n^2/\varepsilon^2)$ through acceleration. Finally, almost sure convergence of the iterate sequence to a saddle point is established.
comment: In the unbounded domain scenario for the stochastic setting, even almost sure convergence of the iterates does not guarantee uniform boundedness of the stochastic sequence, which prevents one from defining an appropriate restricted gap function as a measure of convergence metric. In this update we provide a resolution to this issue -- see Lemma 4, Remark 3, Lemma 5, Theorem 1 and Corollary 1
♻ ☆ A Proximal Gradient Method With Probabilistic Multi-Gossip Communications for Decentralized Composite Optimization
Decentralized optimization methods with local updates have recently gained attention for their provable communication acceleration. In these methods, nodes perform several iterations of local computation between communication rounds. Nevertheless, this capability is effective only when the network is sufficiently well connected and the loss function is smooth. In this paper, we propose a communication-efficient method, MG-Skip, with probabilistic local updates and multi-gossip communications for decentralized composite (smooth + nonsmooth) optimization, whose stepsize is independent of the number of local updates and the network topology. For any undirected and connected network, MG-Skip allows the multi-gossip communications to be skipped in most iterations in the strongly convex setting, while its computation complexity is $\mathcal{O}\left(\kappa \log \frac{1}{\epsilon}\right)$ and its communication complexity is only $\mathcal{O}\left(\sqrt{\frac{\kappa}{(1-\rho)}} \log \frac{1}{\epsilon}\right)$, where $\kappa$ is the condition number of the loss function, $\rho$ reflects the connectivity of the network topology, and $\epsilon$ is the target accuracy. The theoretical results indicate that MG-Skip achieves provable communication acceleration, thereby validating the advantages of local updates in the nonsmooth setting.
♻ ☆ A learning-based approach to stochastic optimal control under reach-avoid constraint
We develop a model-free approach to optimally control stochastic, Markovian systems subject to a reach-avoid constraint. Specifically, the state trajectory must remain within a safe set while reaching a target set within a finite time horizon. Due to the time-dependent nature of these constraints, we show that, in general, the optimal policy for this constrained stochastic control problem is non-Markovian, which increases the computational complexity. To address this challenge, we apply the state-augmentation technique from arXiv:2402.19360, reformulating the problem as a constrained Markov decision process (CMDP) on an extended state space. This transformation allows us to search for a Markovian policy, avoiding the complexity of non-Markovian policies. To learn the optimal policy without a system model, and using only trajectory data, we develop a log-barrier policy gradient approach. We prove that under suitable assumptions, the policy parameters converge to the optimal parameters, while ensuring that the system trajectories satisfy the stochastic reach-avoid constraint with high probability.
♻ ☆ Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size
We theoretically analyzed the convergence behavior of Riemannian stochastic gradient descent (RSGD) and found that using an increasing batch size leads to faster convergence than using a constant batch size, not only with a constant learning rate but also with a decaying learning rate, such as cosine annealing decay and polynomial decay. The convergence rate improves from $O(T^{-1}+C)$ with a constant batch size to $O(T^{-1})$ with an increasing batch size, where $T$ denotes the total number of iterations and $C$ is a constant. Using principal component analysis and low-rank matrix completion, we investigated, both theoretically and numerically, how an increasing batch size affects computational time as quantified by stochastic first-order oracle (SFO) complexity. An increasing batch size was found to reduce the SFO complexity of RSGD. Furthermore, an increasing batch size was found to offer the advantages of both small and large constant batch sizes.
comment: Accepted at ACML2025
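A hedged sketch of the idea: Riemannian SGD on the unit sphere for a leading-eigenvector (PCA-style) objective, with the batch size doubling each epoch; normalization serves as the retraction. The learning rate, schedule, and data are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 20, 4000
Z = rng.normal(size=(N, d)) @ np.diag(np.linspace(1, 3, d))   # top PC ~ last axis

def riem_grad(x, batch):
    """Riemannian gradient on the sphere for f(x) = -(1/2) E[(z.x)^2]."""
    g = -(batch.T @ (batch @ x)) / len(batch)    # Euclidean gradient
    return g - (g @ x) * x                       # project onto the tangent space

x = rng.normal(size=d); x /= np.linalg.norm(x)
lr, batch_size = 0.1, 8
for epoch in range(12):
    idx = rng.permutation(N)
    for s in range(0, N, batch_size):
        x = x - lr * riem_grad(x, Z[idx[s:s + batch_size]])
        x /= np.linalg.norm(x)                   # retraction back to the sphere
    batch_size = min(2 * batch_size, N)          # increasing batch-size schedule

top_pc = np.linalg.eigh(Z.T @ Z / N)[1][:, -1]
print("alignment with top principal component:", abs(x @ top_pc))
```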
♻ ☆ Duality of causal distributionally robust optimization
We study distributionally robust optimization (DRO) in a dynamic context where model uncertainty is captured by penalizing potential models as a function of their adapted Wasserstein distance to a given reference model. We consider both discrete- and continuous-time settings and derive dynamic duality formulas that reformulate the worst-case expectation as a tractable minimax problem. The inner maximum can be computed recursively in discrete time, or solved by a path-dependent Hamilton-Jacobi-Bellman equation in continuous time. We further extend these duality results from the worst-case expectation to the worst-case expected shortfall, a non-linear expectation. Finally, we apply the DRO framework to optimal stopping problems in discrete time. We recast the original problem as a classical Wasserstein DRO on a nested space by introducing a novel relaxation that considers stopping times with respect to general filtrations.
comment: 25 pages, 2 figures. Added new continuous-time results
♻ ☆ Fast Krasnoselskii-Mann Method with Overrelaxation and Preconditioners
We study accelerated Krasnoselskii-Mann-type methods with preconditioners in both continuous and discrete time. From a continuous-time model, we derive a generalized fast Krasnoselskii-Mann method, providing a new yet simple proof of convergence that leads to unprecedented flexibility in parameter tuning, allowing relaxation parameters up to twice as large. Our analysis unifies inertial and anchoring acceleration mechanisms and offers a broad range of parameter choices, which prove beneficial in practice. Additionally, we extend our analysis to the case where preconditioners are allowed to be degenerate, unifying the treatment of various splitting methods and establishing the weak convergence of shadow sequences.
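For reference, the un-accelerated iteration being generalized is the Krasnoselskii-Mann scheme $x_{k+1} = (1-\lambda_k)x_k + \lambda_k T(x_k)$ for a nonexpansive map $T$. A toy sketch follows in which $T$ composes two projections, so fixed points solve a convex feasibility problem; the sets and starting point are illustrative.

```python
import numpy as np

def km_iterate(T, x0, lam=0.5, iters=200):
    """Krasnoselskii-Mann iteration x_{k+1} = (1 - lam) x_k + lam T(x_k)
    for a nonexpansive map T; converges to a fixed point of T."""
    x = x0.copy()
    for _ in range(iters):
        x = (1 - lam) * x + lam * T(x)
    return x

# Nonexpansive T: composition of projections onto a unit ball and a halfspace;
# with nonempty intersection, fixed points lie in the intersection.
proj_ball = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
a = np.array([1.0, 1.0]) / np.sqrt(2)
proj_half = lambda x: x - max(0.0, a @ x - 0.5) * a   # {x : a.x <= 0.5}
T = lambda x: proj_half(proj_ball(x))

x = km_iterate(T, np.array([3.0, -2.0]))
print("fixed point:", x,
      "| in ball:", np.linalg.norm(x) <= 1 + 1e-9,
      "| in halfspace:", a @ x <= 0.5 + 1e-9)
```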
♻ ☆ The Second-Order Tâtonnement: Decentralized Interior-Point Methods for Market Equilibrium
The tâtonnement process and Smale's process are two classical approaches to computing market equilibria in exchange economies. While the tâtonnement process can be seen as a first-order method, Smale's process, being second-order, is less popular due to its reliance on additional information from the players and expensive Newton steps. In this paper, we study the Fisher exchange market for a broad class of utility functions, where we show that all higher-order information required by Smale's process is readily available from players' best responses. Motivated by this observation, we develop two second-order tâtonnement processes, constructed as decentralized interior-point methods, which are traditionally known to work in a centralized manner. The methods here bear the name "tâtonnement" since, in spirit, they demand no more information than the classical tâtonnement process. To address the Newton systems involved, we introduce an explicitly invertible approximation with high-probability guarantees and a scaling matrix that optimally minimizes the condition number, both of which rely solely on best responses, as do the methods themselves. Using these tools, the first second-order tâtonnement process achieves an $O(\log(1/\epsilon))$ complexity rate. Under mild conditions, the other method achieves a non-asymptotic superlinear convergence rate. Preliminary experiments are presented to justify the capability of the proposed methods for large-scale problems. Extensions of our approach are also discussed.
comment: improve organization
♻ ☆ Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization
Motivated by applications in Game Theory, Optimization, and Generative Adversarial Networks, recent work of Daskalakis et al \cite{DISZ17} and follow-up work of Liang and Stokes \cite{LiangS18} have established that a variant of the widely used Gradient Descent/Ascent procedure, called "Optimistic Gradient Descent/Ascent (OGDA)", exhibits last-iterate convergence to saddle points in {\em unconstrained} convex-concave min-max optimization problems. We show that the same holds true in the more general problem of {\em constrained} min-max optimization under a variant of the no-regret Multiplicative-Weights-Update method called "Optimistic Multiplicative-Weights Update (OMWU)". This answers an open question of Syrgkanis et al \cite{SALS15}. The proof of our result requires fundamentally different techniques from those that exist in no-regret learning literature and the aforementioned papers. We show that OMWU monotonically improves the Kullback-Leibler divergence of the current iterate to the (appropriately normalized) min-max solution until it enters a neighborhood of the solution. Inside that neighborhood we show that OMWU is locally (asymptotically) stable converging to the exact solution. We believe that our techniques will be useful in the analysis of the last iterate of other learning algorithms.
comment: Appeared in ITCS 2019. The new version fixes an issue in the proof of Lemma B.4 of previous version (Lemma B.3 in this version)
♻ ☆ Cost-Aware Opinion Dynamics in Multi-Agents Systems under Malicious Agent Influence
In many multi-agent systems (MASs), links to malicious agents cannot be severed immediately. Under these conditions, averaging-only consensus mechanisms typically lack sufficient resistance, leaving the system vulnerable to harmful deviations. To address this challenge, this brief leverages the Boomerang Effect from sociology, which drives normal agents to firmly reject malicious inputs, although this strategy may appear overly cautious. This brief therefore emphasizes the necessity of acknowledging the resulting trade-off between cost and convergence speed in practice. To address this, the additional costs induced by Boomerang-style fusion are analyzed and a cost-aware evolution-rate adjustment mechanism is proposed. Multi-robot simulations demonstrate that this mechanism suppresses excess costs while maintaining resilience to extremist disruptions and ensuring stable convergence, enabling the MAS to develop efficiently in an ethical order.
comment: 7 pages, 9 figures, 2 tables. This work has been submitted to the IEEE for possible publication
♻ ☆ Koopman-based control using sum-of-squares optimization: Improved stability guarantees and data efficiency
In this paper, we propose a novel controller design approach for unknown nonlinear systems using the Koopman operator. In particular, we use the recently proposed stability- and feedback-oriented extended dynamic mode decomposition (SafEDMD) architecture to generate a data-driven bilinear surrogate model with certified error bounds. Then, by accounting for the obtained error bounds in a controller design based on the bilinear system, one can guarantee closed-loop stability for the true nonlinear system. While existing approaches over-approximate the bilinearity of the surrogate model, thus introducing conservatism and providing only local guarantees, we explicitly account for the bilinearity by using sum-of-squares (SOS) optimization in the controller design. More precisely, we parametrize a rational controller stabilizing the error-affected bilinear surrogate model and, consequently, the underlying nonlinear system. The resulting SOS optimization problem provides explicit data-driven controller design conditions for unknown nonlinear systems based on semidefinite programming. Our approach significantly reduces conservatism by establishing a larger region of attraction and improved data efficiency. The proposed method is evaluated using numerical examples, demonstrating its advantages over existing approaches.
comment: Accepted for publication in the European Journal of Control, 2025
♻ ☆ On Tackling High-Dimensional Nonconvex Stochastic Optimization via Stochastic First-Order Methods with Non-smooth Proximal Terms and Variance Reduction
When the nonconvex problem is complicated by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems. To alleviate this linear dependence, we adopt non-Euclidean settings and propose two choices of non-smooth proximal terms when taking the stochastic gradient steps. This approach leads to stronger convergence metric, incremental computational overhead, and potentially dimension-insensitive sample complexity. We also consider further acceleration through variance reduction which achieves near optimal sample complexity and, to our best knowledge, is the first such result in the $\ell_1/\ell_\infty$ setting. Since the use of non-smooth proximal terms is unconventional, the convergence analysis deviates much from algorithms in Euclidean settings or employing Bregman divergence, providing tools for analyzing other non-Euclidean choices of distance functions. Efficient resolution of the subproblems in various scenarios is also discussed and simulated. We illustrate the dimension-insensitive property of the proposed methods via preliminary numerical experiments.
♻ ☆ Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
Physics-informed neural networks (PINNs) have gained prominence in recent years and are now effectively used in a number of applications. However, their performance remains unstable due to the complex landscape of the loss function. To address this issue, we reformulate PINN training as a nonconvex-strongly concave saddle-point problem. After establishing the theoretical foundation for this approach, we conduct an extensive experimental study, evaluating its effectiveness across various tasks and architectures. Our results demonstrate that the proposed method outperforms the current state-of-the-art techniques.
comment: 34 pages, 4 tables, 3 figures, 4 theorems; code available at https://anonymous.4open.science/r/pinns-bgda-00D6/README.md
Systems and Control 23
☆ Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
From metronomes to celestial bodies, mechanics underpins how the world evolves in time and space. With this in mind, a number of recent neural network models leverage inductive biases from classical mechanics to encourage model interpretability and ensure that forecasted states are physical. However, these models are generally designed to capture the dynamics of a single system with fixed physical parameters, from state-space measurements of a known configuration space. In this paper we introduce the Symplectic Phase Space GAN (SPS-GAN), which can capture the dynamics of multiple systems and generalize to unseen physical parameters. Moreover, SPS-GAN does not require prior knowledge of the system configuration space. In fact, SPS-GAN can discover the configuration space structure of the system from arbitrary measurement types (e.g., state-space measurements, video frames). To achieve physically plausible generation, we introduce a novel architecture that embeds a Hamiltonian neural network recurrent module in a conditional GAN backbone. To discover the structure of the configuration space, we optimize the conditional time-series GAN objective with an additional physically motivated term that encourages a sparse representation of the configuration space. We demonstrate the utility of SPS-GAN for trajectory prediction, video generation and symmetry discovery. Our approach captures multiple systems and achieves performance on par with supervised models designed for single systems.
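A minimal sketch of the Hamiltonian recurrent module idea, assuming a learned scalar H(q, p) rolled out with an explicit Euler step via autograd; the GAN backbone, conditioning, and sparsity term from the paper are omitted.

```python
# Minimal Hamiltonian recurrent cell: a small MLP outputs a scalar H(q, p) and
# the state advances along Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq.
import torch
import torch.nn as nn

class HNNCell(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))

    def forward(self, q, p, dt=0.05):
        # Gradients w.r.t. the state are truncated at each step for simplicity.
        qp = torch.cat([q, p], dim=-1).detach().requires_grad_(True)
        dH = torch.autograd.grad(self.H(qp).sum(), qp, create_graph=True)[0]
        dHdq, dHdp = dH.chunk(2, dim=-1)
        return q + dt * dHdp, p - dt * dHdq    # Euler step along the flow

cell = HNNCell(dim=1)
q, p = torch.randn(8, 1), torch.randn(8, 1)
for _ in range(10):        # unrolled trajectory, e.g. fed to a GAN critic
    q, p = cell(q, p)
print(q.shape, p.shape)
```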
☆ Safe Task Space Synchronization with Time-Delayed Information
In this paper, an adaptive controller is designed for the synchronization of the trajectory of a robot with unknown kinematics and dynamics to the current human trajectory in the task space, using delayed human trajectory information. The communication time delay may result from various factors that arise in human-robot collaboration tasks, such as sensor processing or fusion to estimate trajectory/intent, network delays, or computational limitations. The developed adaptive controller uses a Barrier Lyapunov Function (BLF) to constrain the Cartesian coordinates of the robot to ensure safety, an ICL-based adaptive law to account for the unknown kinematics, and a gradient-based adaptive law to estimate the unknown dynamics. Barrier Lyapunov-Krasovskii (LK) functionals are used in the stability analysis to show that the synchronization and parameter estimation errors remain semi-globally uniformly ultimately bounded (SGUUB). Simulation results based on a human-robot synchronization scenario with time delay are provided to demonstrate the effectiveness of the designed synchronization controller with safety constraints.
☆ Hierarchical Control Design for Space Robots with Application to In-Orbit Servicing Missions
In-Orbit Servicing and Active Debris Removal require advanced robotic capabilities for capturing and detumbling uncooperative targets. This work presents a hierarchical control framework for autonomous robotic capture of tumbling objects in space. A simulation environment is developed, incorporating sloshing dynamics of the chaser, a rarely studied effect in space robotics. The proposed controller combines an inner Lyapunov-based robust control loop for multi-body dynamics with an outer loop addressing an extended inverse kinematics problem. Simulation results show improved robustness and adaptability compared to existing control schemes.
☆ Stochastic Security Constrained AC Optimal Power Flow Using General Polynomial Chaos Expansion
Addressing the uncertainty introduced by increasing renewable integration is crucial for secure power system operation, yet capturing it while preserving the full nonlinear physics of the grid remains a significant challenge. This paper presents a stochastic security constrained optimal power flow model with chance constraints supporting nonlinear AC power flow equations and non-Gaussian uncertainties. We use general polynomial chaos expansion to model arbitrary uncertainties of finite variance, enabling accurate moment computations and robust prediction of system states across diverse operating scenarios. The chance constraints probabilistically limit inequality violations, providing a more flexible representation of controllable variables and the consequent power system operation. Case studies validate the proposed model's effectiveness in satisfying operational constraints and capturing uncertainty with high fidelity. Compared to the deterministic formulation, it also uncovers a wider set of insecure contingencies, highlighting improved uncertainty capture and operational insight.
comment: 2025 IEEE International Conference on Energy Technologies for Future Grids (6 pages)
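A toy illustration of the general polynomial chaos idea, not the paper's SC-AC-OPF model: expand a nonlinear response of a Gaussian input in probabilists' Hermite polynomials and read the mean and variance off the coefficients (the response function is a made-up stand-in).

```python
# Polynomial chaos toy example: expand y = g(xi), xi ~ N(0,1), in probabilists'
# Hermite polynomials He_k and recover moments from the coefficients:
# mean = c_0, variance = sum_{k>=1} c_k^2 * k!.
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial

g = lambda x: np.sin(x) + 0.1 * x**2             # stand-in nonlinear response

# Gauss-Hermite_e quadrature for c_k = E[g(xi) He_k(xi)] / k!
nodes, weights = He.hermegauss(40)               # weight exp(-x^2/2)
weights = weights / np.sqrt(2 * np.pi)           # normalize to a probability measure
K = 8
c = np.array([np.sum(weights * g(nodes) * He.hermeval(nodes, np.eye(K + 1)[k]))
              / factorial(k) for k in range(K + 1)])

mean_pce = c[0]
var_pce = sum(c[k]**2 * factorial(k) for k in range(1, K + 1))

xi = np.random.default_rng(0).standard_normal(200_000)   # Monte Carlo check
print(mean_pce, g(xi).mean())
print(var_pce, g(xi).var())
```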
☆ Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data
Reliable detection and classification of power system events are critical for maintaining grid stability and situational awareness. Existing approaches often depend on limited labeled datasets, which restricts their ability to generalize to rare or unseen disturbances. This paper proposes a novel framework that integrates generative modeling, sliding-window temporal processing, and decision fusion to achieve robust event detection and classification using synchrophasor data. A variational autoencoder-generative adversarial network is employed to model normal operating conditions, where both reconstruction error and discriminator error are extracted as anomaly indicators. Two complementary decision strategies are developed: a threshold-based rule for computational efficiency and a convex hull-based method for robustness under complex error distributions. These features are organized into spatiotemporal detection and classification matrices through a sliding-window mechanism, and an identification and decision fusion stage integrates the outputs across phasor measurement units (PMUs). This design enables the framework to identify known events while systematically classifying previously unseen disturbances into a new category, addressing a key limitation of supervised classifiers. Experimental results demonstrate state-of-the-art accuracy, surpassing machine learning, deep learning, and envelope-based baselines. The ability to recognize unknown events further highlights the adaptability and practical value of the proposed approach for wide-area event analysis in modern power systems.
comment: 10 pages
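The convex hull decision rule can be sketched with scipy's Delaunay triangulation for point-in-hull tests; the (reconstruction error, discriminator error) pairs below are synthetic stand-ins, not outputs of the paper's VAE-GAN.

```python
# Sketch of the convex hull decision rule: learn the region occupied by
# (reconstruction error, discriminator error) pairs under normal operation,
# then flag test points that fall outside that hull as events.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
# Stand-ins for the two anomaly indicators on normal training windows:
normal_errors = rng.normal(loc=[0.1, 0.2], scale=0.02, size=(500, 2))
hull = Delaunay(normal_errors)       # triangulation of the normal region

def is_event(err_pair):
    """True if the error pair lies outside the convex hull of normal errors."""
    return hull.find_simplex(err_pair) < 0

print(is_event(np.array([0.1, 0.2])))   # inside  -> False (normal)
print(is_event(np.array([0.5, 0.9])))   # outside -> True  (event)
```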
☆ A Preliminary Assessment of Shipboard Power System Architectures for LVDC Integration
The adoption of low-voltage direct current sections within grid architectures is emerging as a promising design option in the naval sector. This paper presents a preliminary comparative assessment of three different grid topologies, using an existing MVAC-LVAC shipboard power system as a reference: a conventional MVAC-LVAC radial distribution with an additional LVDC section, a full LVDC radial distribution and a zonal LVDC distribution. Each architecture includes typical elements such as synchronous generators, propulsion motors, energy storage system units, extra propulsive loads, and pulse power loads. The analysis exploits five key performance indicators: weight, volume, technology readiness level, average system interruption duration index, and pulsed power loads interruption index.
comment: Presented at the 10th IEEE Workshop on the Electronic Grid (eGrid 2025)
☆ Neo-Grounded Theory: A Methodological Innovation Integrating High-Dimensional Vector Clustering and Multi-Agent Collaboration for Qualitative Research
Purpose: Neo-Grounded Theory (NGT) integrates vector clustering with multi-agent systems to resolve qualitative research's scale-depth paradox, enabling analysis of massive datasets in hours while preserving interpretive rigor. Methods: We compared NGT against manual coding and ChatGPT-assisted analysis using 40,000-character Chinese interview transcripts. NGT employs 1536-dimensional embeddings, hierarchical clustering, and parallel agent-based coding. Two experiments tested pure automation versus human-guided refinement. Findings: NGT achieved a 168-fold speed improvement (3 hours vs 3 weeks), superior quality (0.904 vs 0.883), and 96% cost reduction. Human-AI collaboration proved essential: automation alone produced abstract frameworks, while human guidance yielded actionable dual-pathway theories. The system discovered patterns invisible to manual coding, including identity bifurcation phenomena. Contributions: NGT demonstrates that computational objectivity and human interpretation are complementary. Vector representations provide reproducible semantic measurement while preserving meaning's interpretive dimensions. Researchers shift from mechanical coding to theoretical guidance, with AI handling pattern recognition while humans provide creative insight. Implications: Cost reduction from $50,000 to $500 democratizes qualitative research, enabling communities to study themselves. Real-time analysis makes qualitative insights contemporaneous with events. The framework shows computational methods can strengthen rather than compromise qualitative research's humanistic commitments. Keywords: Grounded theory; Vector embeddings; Multi-agent systems; Human-AI collaboration; Computational qualitative analysis
comment: 44 pages, 11 figures
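The vector clustering stage can be sketched generically (the NGT pipeline itself is not public in the abstract): hierarchical clustering of high-dimensional text embeddings with scikit-learn, where random vectors stand in for real 1536-dimensional embeddings of coded interview segments.

```python
# Sketch of the vector clustering stage: agglomerative (hierarchical)
# clustering of high-dimensional text embeddings with cosine affinity.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 1536))    # one row per interview segment

clusterer = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=1.0,      # cut height of the dendrogram (tunable)
    metric="cosine",
    linkage="average",
)
labels = clusterer.fit_predict(embeddings)
print("clusters found:", labels.max() + 1)
```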
☆ Safe-by-Design: Approximate Nonlinear Model Predictive Control with Real Time Feasibility
This paper establishes relationships between continuous-time, receding horizon, nonlinear model predictive control (MPC) and control Lyapunov and control barrier functions (CLF/CBF). We show that, if the cost function "behaves well" for points in the terminal set, then the optimal value function and the feasible set, respectively, define a compatible CLF/CBF pair on the MPC's region of attraction. We then proceed to prove that any approximation of the value function and the feasible set also defines a CLF/CBF pair, as long as those approximations satisfy the same "well-behavedness" condition, and that a feasible state feedback can be computed by solving an infinitesimal version of the MPC problem. This methodology permits the formulation of continuous-time, small-sized quadratic programs for feedback and enables approximate solutions of the nonlinear model predictive controller with theoretical safety and convergence guarantees. Finally, we demonstrate the effectiveness of the proposed approach when compared to other constrained control techniques through numerical experiments for nonlinear constrained spacecraft control.
comment: This work has been submitted to the IEEE Transactions on Automatic Control for possible publication
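The small quadratic programs in question have the familiar CLF-CBF-QP shape. A generic illustration for a single integrator (not the paper's MPC-derived CLF/CBF pair; the dynamics, barrier, and gains are made up) can be solved pointwise with cvxpy:

```python
# Generic CLF-CBF quadratic program for a single integrator xdot = u:
# stay outside a disk of radius 1 (CBF h) while converging to the origin (CLF V).
import cvxpy as cp
import numpy as np

def qp_controller(x, alpha=1.0, gamma=1.0, slack_weight=100.0):
    u = cp.Variable(2)
    delta = cp.Variable(nonneg=True)          # CLF slack keeps the QP feasible
    h = x @ x - 1.0                            # CBF: h(x) >= 0 outside the disk
    V = 0.5 * x @ x                            # CLF
    constraints = [
        2 * x @ u >= -alpha * h,               # hdot + alpha*h >= 0 (safety)
        x @ u <= -gamma * V + delta,           # Vdot <= -gamma*V + slack
    ]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u) + slack_weight * delta),
                      constraints)
    prob.solve()
    return u.value

x = np.array([2.0, 0.0])
dt = 0.01
for _ in range(300):                           # closed-loop rollout
    x = x + dt * qp_controller(x)
print(x, "||x|| =", np.linalg.norm(x))         # settles near the disk boundary
```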
☆ Red Teaming Quantum-Resistant Cryptographic Standards: A Penetration Testing Framework Integrating AI and Quantum Security
This study presents a structured approach to evaluating vulnerabilities within quantum cryptographic protocols, focusing on the BB84 quantum key distribution method and National Institute of Standards and Technology (NIST) approved quantum-resistant algorithms. By integrating AI-driven red teaming, automated penetration testing, and real-time anomaly detection, the research develops a framework for assessing and mitigating security risks in quantum networks. The findings demonstrate that AI can be effectively used to simulate adversarial attacks, probe weaknesses in cryptographic implementations, and refine security mechanisms through iterative feedback. The use of automated exploit simulations and protocol fuzzing provides a scalable means of identifying latent vulnerabilities, while adversarial machine learning techniques highlight novel attack surfaces within AI-enhanced cryptographic processes. This study offers a comprehensive methodology for strengthening quantum security and provides a foundation for integrating AI-driven cybersecurity practices into the evolving quantum landscape.
☆ Effect of Gait Design on Proprioceptive Sensing of Terrain Properties in a Quadrupedal Robot ICRA
In-situ robotic exploration is an important tool for advancing knowledge of geological processes that describe the Earth and other planetary bodies. To inform and enhance operations for these roving laboratories, it is imperative to understand the terramechanical properties of their environments, especially for traversing loose, deformable substrates. Recent research has suggested that legged robots with direct-drive and low-gear-ratio actuators can sensitively detect external forces, and therefore possess the potential to measure terrain properties with their legs during locomotion, providing unprecedented sampling speed and density while accessing terrains previously too risky to sample. This paper explores these ideas by investigating the impact of gait on proprioceptive terrain sensing accuracy, particularly comparing a sensing-oriented gait, Crawl N' Sense, with a locomotion-oriented gait, Trot-Walk. Each gait's ability to measure the strength and texture of deformable substrates is quantified as the robot locomotes over a laboratory transect consisting of a rigid surface, loose sand, and loose sand with synthetic surface crusts. Our results suggest that with both the sensing-oriented crawling gait and the locomotion-oriented trot gait, the robot can measure a consistent difference in strength (in terms of penetration resistance) between the low- and high-resistance substrates; however, the locomotion-oriented trot gait yields measurements with larger magnitude and variance. Furthermore, the slower crawl gait can detect brittle ruptures of the surface crusts with significantly higher accuracy than the faster trot gait. Our results offer new insights that inform legged robot "sensing during locomotion" gait design and planning for scouting terrain and producing scientific measurements on other worlds to advance our understanding of their geology and formation.
comment: 7+1 pages, 5 figures, ICRA submission. This work has been submitted to the IEEE for possible publication
☆ A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
We present a wearable, fully-dry, and ultra-low power EMG system for silent speech recognition, integrated into a textile neckband to enable comfortable, non-intrusive use. The system features 14 fully-differential EMG channels and is based on the BioGAP-Ultra platform for ultra-low power (22 mW) biosignal acquisition and wireless transmission. We evaluate its performance on eight speech commands under both vocalized and silent articulation, achieving average classification accuracies of 87$\pm$3% and 68$\pm$3% respectively, with a 5-fold CV approach. To mimic everyday-life conditions, we introduce session-to-session variability by repositioning the neckband between sessions, achieving leave-one-session-out accuracies of 64$\pm$18% and 54$\pm$7% for the vocalized and silent experiments, respectively. These results highlight the robustness of the proposed approach and the promise of energy-efficient silent-speech decoding.
comment: 4 pages, 4 figures
☆ Distributed Time-Varying Optimization via Unbiased Extremum Seeking
This paper proposes a novel distributed optimization framework that addresses time-varying optimization problems without requiring explicit derivative information of the objective functions. Traditional distributed methods often rely on derivative computations, limiting their applicability when only real-time objective function measurements are available. Leveraging unbiased extremum seeking, we develop continuous-time algorithms that utilize local measurements and neighbor-shared data to collaboratively track time-varying optima. Key advancements include compatibility with directed communication graphs, customizable convergence rates (asymptotic, exponential, or prescribed-time), and the ability to handle dynamically evolving objectives. By integrating chirpy probing signals with time-varying frequencies, our unified framework achieves accelerated convergence while maintaining stability under mild assumptions. Theoretical guarantees are established through Lie bracket averaging and Lyapunov-based analysis, with linear matrix inequality conditions ensuring rigorous convergence. Numerical simulations validate the effectiveness of the algorithms.
comment: 13 pages, extended version
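For reference, the classic single-agent, perturbation-based extremum seeking loop looks as follows; the paper's contribution replaces this biased textbook scheme with unbiased, distributed updates and chirp (time-varying frequency) probes, none of which are reproduced here. The objective and all gains are made up.

```python
# Classic (single-agent) extremum seeking on J(theta) = (theta - 2)^2 with a
# sinusoidal probe; the probe-demodulated output approximates the gradient.
import numpy as np

J = lambda th: (th - 2.0) ** 2
theta_hat, a, omega, k, dt = 0.0, 0.1, 20.0, 2.0, 1e-3

for i in range(int(200 / dt)):
    t = i * dt
    y = J(theta_hat + a * np.sin(omega * t))                  # probed measurement
    theta_hat -= k * dt * (2.0 / a) * np.sin(omega * t) * y   # demodulate & descend
print(theta_hat)   # close to the minimizer theta* = 2
```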
☆ Finite Sample Analyses for Continuous-time Linear Systems: System Identification and Online Control
The real world evolves in continuous time, but computations are done from finite samples. Therefore, we study algorithms using finite observations in continuous-time linear dynamical systems. We first study the system identification problem, and propose the first non-asymptotic error analysis with finite observations. Our algorithm identifies system parameters without needing integrated observations over certain time intervals, making it more practical for real-world applications. Furthermore, we establish a lower bound showing that our estimator is provably optimal up to constant factors. Moreover, we apply the above algorithm to online control regret analysis for continuous-time linear systems. Our system identification method allows us to explore more efficiently, enabling the swift detection of ineffective policies. We achieve a regret of $\mathcal{O}(\sqrt{T})$ over a single $T$-time horizon in a controllable system, requiring only $\mathcal{O}(T)$ observations of the system.
☆ Optimized Control of Duplex Networks
Many real-world complex systems can be modeled as multiplex networks, where each layer represents a distinct set of interactions among the same entities. Controlling such systems, that is, steering them toward desired states using external inputs, is crucial across many domains. However, existing network control theory largely focuses on single-layer networks, and applying separate controls to each layer of a multiplex system often leads to redundant sets of driver nodes, increasing cost and complexity. To address this challenge, we formulate the Universal Minimum Union Driver Set (MinUDS) problem for duplex networks. The goal is to find the smallest set of driver nodes that can simultaneously control both layers. We propose a novel algorithm, Shortest Cross-Layer Augmenting Path Search (CLAP-S). This method introduces the concept of a Cross-Layer Augmenting Path (CLAP) and efficiently explores the combinatorial space of control configurations. CLAP-S iteratively realigns each layer's Minimum Driver Set (MDS) to maximize their overlap. We prove the algorithm's global optimality and demonstrate its efficiency on both synthetic networks and real-world multiplex systems. The results show that CLAP-S consistently outperforms baseline approaches by significantly reducing the number of required driver nodes and cutting computational time by an order of magnitude. This work provides a powerful, general-purpose tool for optimizing control strategies in multi-layer networks, enabling more economical interventions in diverse fields.
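CLAP-S itself is not reproduced here, but its single-layer building block, computing a minimum driver set from the unmatched nodes of a maximum bipartite matching, can be sketched with networkx (the example graph is made up):

```python
# Single-layer minimum driver set via maximum matching (the building block
# that CLAP-S realigns across layers): nodes left unmatched on the "in" side
# of the bipartite representation must be driven directly.
import networkx as nx
from networkx.algorithms.bipartite import hopcroft_karp_matching

def minimum_driver_set(directed_edges, nodes):
    B = nx.Graph()
    B.add_nodes_from((("out", v) for v in nodes), bipartite=0)
    B.add_nodes_from((("in", v) for v in nodes), bipartite=1)
    B.add_edges_from((("out", u), ("in", v)) for u, v in directed_edges)
    matching = hopcroft_karp_matching(B, top_nodes=[("out", v) for v in nodes])
    drivers = {v for v in nodes if ("in", v) not in matching}
    return drivers or {next(iter(nodes))}   # perfect matching: one driver suffices

nodes = [1, 2, 3, 4]
layer = [(1, 2), (2, 3), (3, 4)]            # a directed path
print(minimum_driver_set(layer, nodes))     # {1}: driving the root controls the path
```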
☆ Reinforcement Learning Based Traffic Signal Design to Minimize Queue Lengths
Efficient traffic signal control (TSC) is crucial for reducing congestion, travel delays, pollution, and for ensuring road safety. Traditional approaches, such as fixed signal control and actuated control, often struggle to handle dynamic traffic patterns. In this study, we propose a novel adaptive TSC framework that leverages Reinforcement Learning (RL), using the Proximal Policy Optimization (PPO) algorithm, to minimize total queue lengths across all signal phases. The challenge of efficiently representing highly stochastic traffic conditions for an RL controller is addressed through multiple state representations, including an expanded state space, an autoencoder representation, and a K-Planes-inspired representation. The proposed algorithm has been implemented using the Simulation of Urban Mobility (SUMO) traffic simulator and demonstrates superior performance over both traditional methods and other conventional RL-based approaches in reducing queue lengths. The best-performing configuration achieves an approximately 29% reduction in average queue lengths compared to the traditional Webster method. Furthermore, comparative evaluation of alternative reward formulations demonstrates the effectiveness of the proposed queue-based approach, showcasing the potential for scalable and adaptive urban traffic management.
☆ On Suboptimal Safety-Critical Tracking Controller Design
This paper proposes a novel framework for safety-critical optimal trajectory tracking in nonlinear systems based on the state-dependent Riccati equation (SDRE) methodology. By embedding barrier states into the system dynamics, the proposed strategy simultaneously ensures safety and tracking requirements, even in scenarios where these objectives may be inherently conflicting. A discounted pseudo-quadratic cost function is formulated to achieve a suboptimal trade-off between tracking accuracy, control effort, and the safety objective. We present two distinct controller designs: one utilizing a single barrier state to enforce overall safety constraints, and another employing multiple barrier states to tune the system's conservatism with respect to each safety constraint individually, providing enhanced design flexibility. We establish sufficient conditions to ensure the solvability of the associated Riccati equations. The proposed safe controller is well-suited for real-time implementation in practical systems, given its reasonable computational requirements and compatibility with widely available embedded microprocessors. This is supported by simulation studies involving a mechanical system and a mobile robot collision avoidance scenario, where the safe SDRE controller consistently maintained safety while achieving trajectory tracking objectives in challenging conditions. Additionally, experimental results on a cable-driven parallel robot further demonstrate the practical applicability and effectiveness of the proposed method in real-world control tasks.
comment: 14 pages, 7 figures. This work has been submitted to Automatica for possible publication
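A bare SDRE stabilization loop (without the barrier-state augmentation that carries the paper's safety guarantees) can be sketched for a damped pendulum, re-solving the Riccati equation at each state with scipy; the system and weights are illustrative only.

```python
# Bare SDRE loop for a damped pendulum with the standard factorization
# sin(x1) = (sin(x1)/x1) * x1, re-solving the Riccati equation at each state.
import numpy as np
from scipy.linalg import solve_continuous_are

Q, R, dt = np.diag([10.0, 1.0]), np.array([[1.0]]), 0.01
x = np.array([2.0, 0.0])                       # angle, angular velocity

for _ in range(2000):
    sinc = np.sin(x[0]) / x[0] if abs(x[0]) > 1e-6 else 1.0
    A = np.array([[0.0, 1.0], [-9.81 * sinc, -0.1]])   # state-dependent A(x)
    B = np.array([[0.0], [1.0]])
    P = solve_continuous_are(A, B, Q, R)
    u = -np.linalg.solve(R, B.T @ P @ x)       # u = -R^{-1} B^T P(x) x
    xdot = np.array([x[1], -9.81 * np.sin(x[0]) - 0.1 * x[1] + u[0]])
    x = x + dt * xdot
print(x)   # driven toward the origin
```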
♻ ☆ Constraint Hypergraphs as a Unifying Framework for Digital Twins
Digital twins, used to represent physical systems, have been lauded as tools for understanding reality. Complex system behavior is typically captured in domain-specific models crafted by subject experts. Contemporary methods for employing models in a digital twin require prescriptive interfaces, resulting in twins that are difficult to connect, redeploy, and modify. The limited interoperability of these twins has prompted calls for a universal framework enabling observability across model aggregations. Here we show how a new mathematical formalism called a constraint hypergraph addresses these challenges in interoperability. A digital twin is shown to be the second of two coupled systems where both adhere to the same constraint hypergraph, permitting the properties of the first to be observable from the second. Interoperability is given by deconstructing models into a structure enabling autonomous, white-box simulation of system properties. The resulting digital twins can interact immediately with both human and autonomous agents. This is demonstrated in a case study of a microgrid, showing how both measured and simulated data from the aggregated twins can be provided regardless of the operating environment. By connecting models, constraint hypergraphs supply scientists and modelers robust means to capture, communicate, and combine digital twins across all fields of study. We expect this framework to expand the use of digital twins, enriching scientific insights and collaborations by providing a structure for characterizing complex systems.
comment: 12 pages, 7 figures
♻ ☆ Vibrational Stabilization of Complex Network Systems
Many natural and man-made network systems need to maintain certain patterns, such as working at equilibria or limit cycles, to function properly. Thus, the ability to stabilize such patterns is crucial. Most existing studies on stabilization assume that network system states can be measured online so that feedback control strategies can be used. However, in many real-world scenarios, system states, e.g., neuronal activity in the brain, are often difficult to measure. In this paper, we take this situation into account and study the stabilization problem of linear network systems with an open-loop control strategy (vibrational control). We derive a graph-theoretic sufficient condition for structural vibrational stabilizability, under which network systems can always be stabilized. We further provide an approach to select the locations in the network for control placement and design corresponding vibrational inputs to stabilize systems that satisfy this condition. Finally, we provide some numerical results that demonstrate the validity of our theoretical findings.
♻ ☆ The need for and feasibility of alternative ground robots to traverse sandy and rocky extraterrestrial terrain
Robotic spacecraft have helped expand our reach for many planetary exploration missions. Most ground mobile planetary exploration robots use wheeled or modified wheeled platforms. Although extraordinarily successful at completing intended mission goals, because of the limitations of wheeled locomotion, they have been largely limited to benign, solid terrain and avoided extreme terrain with loose soil/sand and large rocks. Unfortunately, such challenging terrain is often scientifically interesting for planetary geology. Although many animals traverse such terrain with ease, robots have not matched their performance and robustness. This is in large part due to a lack of fundamental understanding of how effective locomotion can be generated from controlled interaction with complex terrain, at the same level as flight aerodynamics and underwater vehicle hydrodynamics. Early fundamental understanding of legged and limbless locomotor-ground interaction has already enabled stable and efficient bio-inspired robot locomotion on relatively flat ground with small obstacles. Recent progress in the new field of terradynamics of locomotor-terrain interaction begins to reveal the principles of bio-inspired locomotion on loose soil/sand and over large obstacles. Multi-legged and limbless platforms using terradynamics insights hold the promise for serving as robust alternative platforms for traversing extreme extraterrestrial terrain and expanding our reach in planetary exploration.
♻ ☆ Gaussian-Process-based Adaptive Tracking Control with Dynamic Active Learning for Autonomous Ground Vehicles
This article proposes an active-learning-based adaptive trajectory tracking control method for autonomous ground vehicles to compensate for modeling errors and unmodeled dynamics. The nominal vehicle model is decoupled into lateral and longitudinal subsystems, which are augmented with online Gaussian Processes (GPs) using measurement data. The estimated mean functions of the GPs are used to construct a feedback compensator, which, together with an LPV state feedback controller designed for the nominal system, gives the adaptive control structure. To assist exploration of the dynamics, the paper proposes a new dynamic active learning method to collect the most informative samples to accelerate the training process. To analyze the performance of the overall controller provided by the learning tool-chain, a novel iterative, counterexample-based algorithm is proposed for calculating the induced L2 gain between the reference trajectory and the tracking error. The analysis can be executed for a set of possible realizations of the to-be-controlled system, giving a robust performance certificate for the learning method under variations of the vehicle dynamics. The efficiency of the proposed control approach is shown on a high-fidelity physics simulator and in real experiments using a 1/10-scale F1TENTH electric car.
comment: Submitted to IEEE Transactions on Control Systems Technology (revised)
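A minimal stand-in for the GP augmentation: fit a GP on logged model-error samples and use its predictive mean as a feedforward correction. The LPV controller, active learning, and L2-gain analysis are beyond this sketch, and the 1-D dynamics below are made up.

```python
# Minimal GP model-error compensation: learn the residual between the true
# dynamics and a nominal model from data, then use the GP mean as a
# feedforward correction. Toy 1-D dynamics stand in for the vehicle subsystems.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

true_f = lambda x: -1.0 * x + 0.5 * np.sin(3 * x)     # actual dynamics
nominal_f = lambda x: -1.0 * x                        # model used for design

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(40, 1))                  # logged states
y = true_f(X[:, 0]) - nominal_f(X[:, 0]) + 0.01 * rng.standard_normal(40)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

x_test = np.array([[1.2]])
residual_mean = gp.predict(x_test)[0]
u_ff = -residual_mean            # cancel the learned modeling error
print(residual_mean, 0.5 * np.sin(3 * 1.2))           # estimate vs truth
```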
♻ ☆ Divide, Conquer and Verify: Improving Symbolic Execution Performance
Symbolic Execution is a formal method that can be used to verify the behavior of computer programs and detect software vulnerabilities. Compared to other testing methods such as fuzzing, Symbolic Execution has the advantage of providing formal guarantees about the program. However, despite advances in performance in recent years, Symbolic Execution is too slow to be applied to real-world software. This is primarily caused by the path explosion problem as well as by the computational complexity of SMT solving. In this paper, we present a divide-and-conquer approach for symbolic execution that executes individual slices and later combines the side effects. This way, the overall problem size is kept small, reducing the impact of computational complexity on large problems.
♻ ☆ An approach to the LQG/LTR design problem with specifications for finite-dimensional SISO control systems
This is an expository paper which discusses an approach to the LQG/LTR design problem for finite-dimensional SISO control systems. The approach is based on the utilisation of weighting augmentation for incorporating design specifications into the framework of the LTR technique for LQG compensator design. The LQG compensator is to simultaneously meet given analytical low- and high-frequency design specifications expressed in terms of desirable sensitivity and controller noise sensitivity functions. The paper is aimed at nonspecialists and, in particular, practitioners in finite-dimensional LQG theory interested in the design of feedback compensators for closed-loop performance and robustness shaping of SISO control systems in realistic situations. The proposed approach is illustrated by a detailed numerical example: the torque control of a geared DC motor with an elastically mounted output shaft.
comment: typos corrected
♻ ☆ Optimal Behavior Planning for Implicit Communication using a Probabilistic Vehicle-Pedestrian Interaction Model
In interactions between automated vehicles (AVs) and crossing pedestrians, modeling implicit vehicle communication is crucial. In this work, we present a combined prediction and planning approach that allows to consider the influence of the planned vehicle behavior on a pedestrian and predict a pedestrian's reaction. We plan the behavior by solving two consecutive optimal control problems (OCPs) analytically, using variational calculus. We perform a validation step that assesses whether the planned vehicle behavior is adequate to trigger a certain pedestrian reaction, which accounts for the closed-loop characteristics of prediction and planning influencing each other. In this step, we model the influence of the planned vehicle behavior on the pedestrian using a probabilistic behavior acceptance model that returns an estimate for the crossing probability. The probabilistic modeling of the pedestrian reaction facilitates considering the pedestrian's costs, thereby improving cooperative behavior planning. We demonstrate the performance of the proposed approach in simulated vehicle-pedestrian interactions with varying initial settings and highlight the decision making capabilities of the planning approach.
comment: 8 pages, 5 figures, conference article
Optimization and Control 46
☆ MDP modeling for multi-stage stochastic programs
We study a class of multi-stage stochastic programs, which incorporate modeling features from Markov decision processes (MDPs). This class includes structured MDPs with continuous state and action spaces. We extend policy graphs to include decision-dependent uncertainty for one-step transition probabilities as well as a limited form of statistical learning. We focus on the expressiveness of our modeling approach, illustrating ideas with a series of examples of increasing complexity. As a solution method, we develop new variants of stochastic dual dynamic programming, including approximations to handle non-convexities.
☆ Multihead Finite-State Dimension
We introduce multihead finite-state dimension, a generalization of finite-state dimension in which a group of finite-state agents (the heads) with oblivious, one-way movement rules, each reporting only one symbol at a time, enable their leader to bet on subsequent symbols in an infinite data stream. In aggregate, such a scheme constitutes an $h$-head finite state gambler whose maximum achievable growth rate of capital in this task, quantified using betting strategies called gales, determines the multihead finite-state dimension of the sequence. The 1-head case is equivalent to finite-state dimension as defined by Dai, Lathrop, Lutz and Mayordomo (2004). In our main theorem, we prove a strict hierarchy as the number of heads increases, giving an explicit sequence family that separates, for each positive integer $h$, the earning power of $h$-head finite-state gamblers from that of $(h+1)$-head finite-state gamblers. We prove that multihead finite-state dimension is stable under finite unions but that the corresponding quantity for any fixed number $h>1$ of heads--the $h$-head finite-state predimension--lacks this stability property.
☆ Mixtures Closest to a Given Measure: A Semidefinite Programming Approach
Mixture models, such as Gaussian mixture models, are widely used in machine learning to represent complex data distributions. A key challenge, especially in high-dimensional settings, is to determine the mixture order and estimate the mixture parameters. We study the problem of approximating a target measure, available only through finitely many of its moments, by a mixture of distributions from a parametric family (e.g., Gaussian, exponential, Poisson), with approximation quality measured by the 2-Wasserstein or the total variation distance. Unlike many existing approaches, the parameter set is not assumed to be finite; it is modeled as a compact basic semi-algebraic set. We introduce a hierarchy of semidefinite relaxations with asymptotic convergence to the desired optimal value. In addition, when a certain rank condition is satisfied, the convergence is even finite and recovery of an optimal mixing measure is obtained. We also present an application to clustering, where our framework serves either as a stand-alone method or as a preprocessing step that yields both the number of clusters and strong initial parameter estimates, thereby accelerating convergence of standard (local) clustering algorithms.
comment: 22 pages, 2 algorithms, 1 table, 3 figures
☆ Teleoperator-Aware and Safety-Critical Adaptive Nonlinear MPC for Shared Autonomy in Obstacle Avoidance of Legged Robots
Ensuring safe and effective collaboration between humans and autonomous legged robots is a fundamental challenge in shared autonomy, particularly for teleoperated systems navigating cluttered environments. Conventional shared-control approaches often rely on fixed blending strategies that fail to capture the dynamics of legged locomotion and may compromise safety. This paper presents a teleoperator-aware, safety-critical, adaptive nonlinear model predictive control (ANMPC) framework for shared autonomy of quadrupedal robots in obstacle-avoidance tasks. The framework employs a fixed arbitration weight between human and robot actions but enhances this scheme by modeling the human input with a noisily rational Boltzmann model, whose parameters are adapted online using a projected gradient descent (PGD) law from observed joystick commands. Safety is enforced through control barrier function (CBF) constraints integrated into a computationally efficient NMPC, ensuring forward invariance of safe sets despite uncertainty in human behavior. The control architecture is hierarchical: a high-level CBF-based ANMPC (10 Hz) generates blended human-robot velocity references, a mid-level dynamics-aware NMPC (60 Hz) enforces reduced-order single rigid body (SRB) dynamics to track these references, and a low-level nonlinear whole-body controller (500 Hz) imposes the full-order dynamics via quadratic programming to track the mid-level trajectories. Extensive numerical and hardware experiments, together with a user study, on a Unitree Go2 quadrupedal robot validate the framework, demonstrating real-time obstacle avoidance, online learning of human intent parameters, and safe teleoperator collaboration.
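The online adaptation of the noisily rational human model can be sketched generically (the Q-values, bounds, and learning rate below are made up, and the paper's exact likelihood and projection may differ): projected gradient ascent on the log-likelihood of observed commands under P(u | beta) proportional to exp(beta * Q(u)).

```python
# Projected gradient ascent on the log-likelihood of a Boltzmann-rational
# human model P(u | beta) = exp(beta * Q(u)) / sum_v exp(beta * Q(v)),
# updating the rationality parameter beta online from observed commands.
import numpy as np

def pgd_step(beta, q_values, u_obs, lr=0.05, lo=0.0, hi=10.0):
    p = np.exp(beta * q_values - np.max(beta * q_values))
    p /= p.sum()
    grad = q_values[u_obs] - p @ q_values    # d/dbeta log P(u_obs | beta)
    return np.clip(beta + lr * grad, lo, hi) # projection onto [lo, hi]

rng = np.random.default_rng(0)
q_values = np.array([1.0, 0.2, -0.5])        # utilities of discretized commands
beta_true, beta_hat = 3.0, 1.0
for _ in range(2000):
    p_true = np.exp(beta_true * q_values); p_true /= p_true.sum()
    u_obs = rng.choice(3, p=p_true)          # simulated joystick command
    beta_hat = pgd_step(beta_hat, q_values, u_obs)
print(beta_hat)   # approaches beta_true
```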
☆ Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives NeurIPS 2025
In this paper, we present two effective policy learning algorithms for the multi-agent online coordination (MA-OC) problem. The first one, MA-SPL, not only can achieve the optimal $(1-\frac{c}{e})$-approximation guarantee for the MA-OC problem with submodular objectives but also can handle the unexplored $\alpha$-weakly DR-submodular and $(\gamma,\beta)$-weakly submodular scenarios, where $c$ is the curvature of the investigated submodular functions, $\alpha$ denotes the diminishing-return (DR) ratio and the tuple $(\gamma,\beta)$ represents the submodularity ratios. Subsequently, in order to reduce the reliance on the unknown parameters $\alpha,\gamma,\beta$ inherent in the MA-SPL algorithm, we further introduce a second online algorithm named MA-MPL. The MA-MPL algorithm is entirely parameter-free and simultaneously can maintain the same approximation ratio as the first MA-SPL algorithm. The core of our MA-SPL and MA-MPL algorithms is a novel continuous-relaxation technique termed policy-based continuous extension. Compared with the well-established multi-linear extension, a notable advantage of this new policy-based continuous extension is its ability to provide a lossless rounding scheme for any set function, thereby enabling us to tackle the challenging weakly submodular objectives. Finally, extensive simulations are conducted to validate the effectiveness of our proposed algorithms.
comment: Accepted to NeurIPS 2025
☆ Learning to Price Bundles: A GCN Approach for Mixed Bundling
Bundle pricing refers to designing several product combinations (i.e., bundles) and determining their prices in order to maximize the expected profit. It is a classic problem in revenue management and arises in many industries, such as e-commerce, tourism, and video games. However, the problem is typically intractable due to the exponential number of candidate bundles. In this paper, we explore the use of graph convolutional networks (GCNs) in solving the bundle pricing problem. Specifically, we first develop a graph representation of the mixed bundling model (where every possible bundle is assigned a specific price) and then train a GCN to learn the latent patterns of optimal bundles. Based on the trained GCN, we propose two inference strategies to derive high-quality feasible solutions. A local-search technique is further proposed to improve the solution quality. Numerical experiments validate the effectiveness and efficiency of our proposed GCN-based framework. Using a GCN trained on instances with 5 products, our methods consistently achieve near-optimal solutions (better than 97%) with only a fraction of the computational time for problems of small to medium size. They also achieve superior solutions for larger problem sizes compared with other heuristic methods such as bundle size pricing (BSP). The method can also provide high-quality solutions for instances with more than 30 products, even in the challenging cases where product utilities are non-additive.
☆ Explicit Global Convergence Rates of BFGS without Line Search
This paper studies the convergence rates of the Broyden--Fletcher--Goldfarb--Shanno~(BFGS) method without line search. We show that the BFGS method with an adaptive step size [Gao and Goldfarb, Optimization Methods and Software, 34(1):194-217, 2019] exhibits a two-phase non-asymptotic global convergence behavior when minimizing a strongly convex function, i.e., a linear convergence rate of $\mathcal{O}((1 - 1 / \varkappa)^{k})$ in the first phase and a superlinear convergence rate of $\mathcal{O}((\varkappa / k)^{k})$ in the second phase, where $k$ is the iteration counter and $\varkappa$ is the condition number. In contrast, the existing analysis only establishes asymptotic results. Furthermore, we propose a novel adaptive BFGS method without line search, which allows a larger step size by taking the gradient Lipschitz continuity into the algorithm design. We prove that our method achieves faster convergence when the initial point is far away from the optimal solution.
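A line-search-free BFGS skeleton makes the object of the analysis concrete. The Gao-Goldfarb adaptive step-size rule is not reproduced here; a fixed step 1/L on a strongly convex quadratic stands in for it, and all problem data are made up.

```python
# Line-search-free BFGS skeleton on a strongly convex quadratic. The inverse
# Hessian update is the standard one; a fixed step 1/L stands in for the
# paper's adaptive step-size rule.
import numpy as np

rng = np.random.default_rng(0)
d = 20
M = rng.standard_normal((d, d))
A = M @ M.T / d + np.eye(d)                  # well-conditioned Hessian
b = rng.standard_normal(d)
grad = lambda x: A @ x - b
L = np.linalg.eigvalsh(A).max()

x, H, g = np.zeros(d), np.eye(d), -b
for _ in range(100):
    x_new = x - (1.0 / L) * (H @ g)
    g_new = grad(x_new)
    s, y = x_new - x, g_new - g
    if s @ y > 1e-12:                        # curvature condition (holds here)
        rho = 1.0 / (s @ y)
        V = np.eye(d) - rho * np.outer(s, y)
        H = V @ H @ V.T + rho * np.outer(s, s)   # BFGS inverse-Hessian update
    x, g = x_new, g_new
print(np.linalg.norm(g))                     # small gradient norm after 100 steps
```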
☆ Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
Constrained optimization is a powerful framework for enforcing requirements on neural networks. These constrained deep learning problems are typically solved using first-order methods on their min-max Lagrangian formulation, but such approaches often suffer from oscillations and can fail to find all local solutions. While the Augmented Lagrangian method (ALM) addresses these issues, practitioners often favor dual optimistic ascent schemes (PI control) on the standard Lagrangian, which perform well empirically but lack formal guarantees. In this paper, we establish a previously unknown equivalence between these approaches: dual optimistic ascent on the Lagrangian is equivalent to gradient descent-ascent on the Augmented Lagrangian. This finding allows us to transfer the robust theoretical guarantees of the ALM to the dual optimistic setting, proving it converges linearly to all local solutions. Furthermore, the equivalence provides principled guidance for tuning the optimism hyper-parameter. Our work closes a critical gap between the empirical success of dual optimistic methods and their theoretical foundation.
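One way to see the flavor of the claimed equivalence numerically (a toy equality-constrained problem, under one natural matching of the optimism coefficient to the penalty parameter; the paper's precise statement concerns dual optimistic ascent / PI control and may differ in details): gradient descent-ascent on the Augmented Lagrangian coincides with a Lagrangian scheme whose primal step uses the extrapolated multiplier lam + c*g(x).

```python
# Toy check of the "disguise": min f(x) = x^2  s.t.  g(x) = x - 1 = 0.
# Loop A: gradient descent-ascent on the Augmented Lagrangian with penalty c.
# Loop B: descent-ascent on the ordinary Lagrangian where the primal step uses
# the optimistic multiplier lam + c * g(x). The iterates coincide exactly.
import numpy as np

f_grad = lambda x: 2 * x
g = lambda x: x - 1.0
eta_x, eta_lam, c, T = 0.1, 0.1, 1.0, 200

xa = xb = 0.0
la = lb = 0.0
for _ in range(T):
    # Loop A: Augmented Lagrangian  f + lam*g + (c/2)*g^2  (note grad g = 1)
    xa, la = xa - eta_x * (f_grad(xa) + (la + c * g(xa))), la + eta_lam * g(xa)
    # Loop B: ordinary Lagrangian with optimistic multiplier lam + c*g(x)
    lb_opt = lb + c * g(xb)
    xb, lb = xb - eta_x * (f_grad(xb) + lb_opt), lb + eta_lam * g(xb)

assert np.isclose(xa, xb) and np.isclose(la, lb)
print(xa, la)    # x -> 1, lam -> -2 (the KKT point of the toy problem)
```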
☆ A dynamical formulation of multi-marginal optimal transport
We present a primal-dual dynamical formulation of the multi-marginal optimal transport problem for (semi-)convex cost functions. Even in the two-marginal setting, this formulation applies to cost functions not covered by the classical dynamical approach of Benamou-Brenier. Our dynamical formulation yields a convex optimization problem, enabling the use of convex optimization tools to find quasi-Monge solutions of the static multi-marginal problem for translation-invariant costs. We illustrate our results numerically with proximal splitting methods.
☆ On an optimization framework for damage localization in structures
Efficient structural damage localization remains a challenge in structural health monitoring (SHM), particularly when the problem is coupled with uncertain conditions and complex structures. Traditional methods based solely on experimental data processing are often not sufficiently reliable, while complex models often struggle with computational inefficiency given the tremendous number of model parameters. This paper focuses on closing the gap between data-driven SHM and physics-based model updating by offering a solution for real-world infrastructure. We first concentrate on fusing multi-source damage-sensitive features (DSF) based on experimental modal data into spatially mapped belief masses to pre-screen candidate damage locations. The resulting candidate damage locations are integrated into an inverse Finite Element Method (FEM) model calibration process. We propose an optimization framework to identify the most probable damage scenario for single- and multi-damage cases. We present the corresponding numerical results in this paper, which open the door to extending the application of the framework to a complex real bridge structure.
comment: 32 pages, 17 figures
☆ Learning from Delayed Feedback in Games via Extra Prediction
This study raises and addresses the problem of time-delayed feedback in learning in games. Because learning in games assumes that multiple agents independently learn their strategies, a discrepancy in optimization often emerges among the agents. To overcome this discrepancy, the prediction of the future reward is incorporated into algorithms, typically known as Optimistic Follow-the-Regularized-Leader (OFTRL). However, a time delay in observing past rewards hinders this prediction. Indeed, this study first proves that even a single-step delay worsens the performance of OFTRL in terms of regret and convergence. This study proposes weighted OFTRL (WOFTRL), in which the prediction vector of the next reward in OFTRL is weighted $n$ times. We further capture the intuition that the optimistic weight cancels out the time delay. We prove that when the optimistic weight exceeds the time delay, our WOFTRL recovers good performance: the regret is constant ($O(1)$-regret) in general-sum normal-form games, and the strategies converge to the Nash equilibrium as a subsequence (best-iterate convergence) in poly-matrix zero-sum games. The theoretical results are supported and strengthened by our experiments.
comment: 11 pages, 3 figures (main); 9 pages (appendix)
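The weighted update can be sketched in a two-player zero-sum matrix game with feedback delayed by d steps (game, gains, and delay below are made up): the prediction, taken here to be the most recent observed payoff vector, is weighted n times, and n > d restores the optimistic effect.

```python
# Weighted OFTRL in matching pennies with feedback delayed by d steps.
# Strategy: x_t = softmax(eta * (sum of observed payoff vectors + n * latest
# observed vector)); the prediction weight n should exceed the delay d.
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])     # payoff of player 1
eta, n, d, T = 0.1, 3, 2, 5000

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

hist_x, hist_y = [], []                       # reward vectors, revealed with delay
Sx, Sy = np.zeros(2), np.zeros(2)
mx, my = np.zeros(2), np.zeros(2)             # latest observed vectors (predictions)
avg_x = np.zeros(2)
for t in range(T):
    x = softmax(eta * (Sx + n * mx))
    y = softmax(eta * (Sy + n * my))
    hist_x.append(A @ y); hist_y.append(-A.T @ x)
    if t >= d:                                # delayed observation arrives
        Sx += hist_x[t - d]; mx = hist_x[t - d]
        Sy += hist_y[t - d]; my = hist_y[t - d]
    avg_x += x
print(avg_x / T)   # approaches the equilibrium (0.5, 0.5)
```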
☆ Safe-by-Design: Approximate Nonlinear Model Predictive Control with Real Time Feasibility
This paper establishes relationships between continuous-time, receding horizon, nonlinear model predictive control (MPC) and control Lyapunov and control barrier functions (CLF/CBF). We show that, if the cost function "behaves well" for points in the terminal set, then the optimal value function and the feasible set, respectively, define a compatible CLF/CBF pair on the MPC's region of attraction. We then proceed to prove that any approximation of the value function and the feasible set also defines a CLF/CBF pair, as long as those approximations satisfy the same "well-behavedness" condition, and that a feasible state feedback can be computed by solving an infinitesimal version of the MPC problem. This methodology permits the formulation of continuous-time, small-sized quadratic programs for feedback and enables approximate solutions of the nonlinear model predictive controller with theoretical safety and convergence guarantees. Finally, we demonstrate the effectiveness of the proposed approach when compared to other constrained control techniques through numerical experiments for nonlinear constrained spacecraft control.
comment: This work has been submitted to the IEEE Transactions on Automatic Control for possible publication
☆ Extremal Eigenvalues of Weighted Steklov Problems
We study the optimization of Steklov eigenvalues with respect to a boundary density function $\rho$ on a bounded Lipschitz domain $\Omega \subset \mathbb{R}^N$. We investigate the minimization and maximization of $\lambda_k(\rho)$, the $k$th Steklov eigenvalue, over admissible densities satisfying pointwise bounds and a fixed integral constraint. Our analysis covers both first and higher-order eigenvalues and applies to general, not necessarily convex or simply connected, domains. We establish the existence of optimal solutions and provide structural characterizations: minimizers are bang-bang functions and may have disconnected support, while maximizers are not necessarily bang-bang. On circular domains, the minimization problem admits infinitely many minimizers generated by rotational symmetry, while the maximization problem has infinitely many distinct maximizers that are not symmetry-induced. We also show that the maps $\rho \mapsto \lambda_k(\rho)$ and $\rho \mapsto 1/\lambda_k(\rho)$ are generally neither convex nor concave, limiting the use of classical convex optimization tools. To address these challenges, we analyze the objective functional and introduce a Fréchet differentiable surrogate that enables the derivation of optimality conditions. We further design an efficient numerical algorithm, with experiments illustrating the difficulty of recovering optimal densities when they lack smoothness or exhibit oscillations.
☆ Stability Analysis of An Integrated Multistage Stochastic Programming and Markov Decision Process Problem
In this paper, we consider an integrated MSP-MDP framework which captures features of Markov decision processes (MDP) and multistage stochastic programming (MSP). The integrated framework allows one to study a dynamic decision-making process that involves both transitions of system states and dynamic changes of the stochastic environment, affected respectively by potential endogenous uncertainties and exogenous uncertainties. The integrated model differs from classical MDP models by taking into account the effect of history-dependent exogenous uncertainty, and distinguishes itself from standard MSP models by explicitly considering transitions of states between stages. We begin by deriving a dynamic nested reformulation of the problem and establishing the Lipschitz continuity and convexity of the stage-wise optimal value functions. We then move on to investigate the stability of the problem in terms of the optimal value and the set of optimal solutions under perturbations of the probability distributions of the endogenous uncertainty and the exogenous uncertainty. Specifically, we quantify the effects of the perturbation of the two uncertainties on the optimal values and optimal solutions by deriving error bounds in terms of the Kantorovich metric and the Fortet-Mourier metric of the probability distributions of the respective uncertainties. These results differ from the existing stability results established in terms of the filtration distance [heitsch2009scenario] or the nested distance [pflug2012distance]. We use some examples to explain the differences via the tightness of the error bounds and the applicability of the stability results. The results complement the existing stability results and provide new theoretical grounding for emerging integrated MSP-MDP models.
☆ Slicing Wasserstein Over Wasserstein Via Functional Optimal Transport
Wasserstein distances define a metric between probability measures on arbitrary metric spaces, including meta-measures (measures over measures). The resulting Wasserstein over Wasserstein (WoW) distance is a powerful, but computationally costly, tool for comparing datasets or distributions over images and shapes. Existing sliced WoW accelerations rely on parametric meta-measures or the existence of high-order moments, leading to numerical instability. As an alternative, we propose to leverage the isometry between the 1d Wasserstein space and the quantile functions in the function space $L_2([0,1])$. For this purpose, we introduce a general sliced Wasserstein framework for arbitrary Banach spaces. Due to the 1d Wasserstein isometry, this framework defines a sliced distance between 1d meta-measures via infinite-dimensional $L_2$-projections, parametrized by Gaussian processes. Combining this 1d construction with classical integration over the Euclidean unit sphere yields the double-sliced Wasserstein (DSW) metric for general meta-measures. We show that DSW minimization is equivalent to WoW minimization for discretized meta-measures, while avoiding unstable higher-order moments and offering computational savings. Numerical experiments on datasets, shapes, and images validate DSW as a scalable substitute for the WoW distance.
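The 1d isometry the construction relies on is easy to see in code: for empirical measures with equally many atoms, W_2 equals the L2 distance between quantile functions, which reduces to sorting (a generic illustration, not the paper's DSW pipeline).

```python
# The 1d isometry behind DSW: for empirical measures with equal numbers of
# atoms, W_2(mu, nu) equals the L2 distance between quantile functions, i.e.
# the root-mean-square difference of sorted samples.
import numpy as np

rng = np.random.default_rng(0)
mu_samples = rng.normal(0.0, 1.0, size=5000)
nu_samples = rng.normal(2.0, 1.0, size=5000)

w2 = np.sqrt(np.mean((np.sort(mu_samples) - np.sort(nu_samples)) ** 2))
print(w2)   # approx 2.0 = |mean shift| for two unit-variance Gaussians
```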
☆ A Variational Calculus for Optimal Control of the Generalized Riemann Problem for Hyperbolic Systems of Conservation Laws
We develop a variational calculus for entropy solutions of the Generalized Riemann Problem (GRP) for strictly hyperbolic systems of conservation laws where the control is the initial state. The GRP has a discontinuous initial state with exactly one discontinuity and continuously differentiable ($C^1$) states left and right of it. The control consists of the $C^1$ parts of the initial state and the position of the discontinuity. Solutions of the problem are generally discontinuous since they contain shock curves. We assume the time horizon $T>0$ to be sufficiently small such that no shocks interact and no new shocks are generated. Moreover, we assume that no rarefaction waves occur and that the jump of the initial state is sufficiently small. Since the shock positions depend on the control, a transformation to a reference space is used to fix the shock positions. In the reference space, we prove that the solution of the GRP between the shocks is continuously differentiable from the control space to $C^0$. In physical coordinates, this implies that the shock curves in $C^1$ and the states between the shocks in the topology of $C^0$ depend continuously differentiable on the control. As a consequence, we obtain the differentiability of tracking type objective functionals.
☆ A note on the Davis-Yin three-operator splitting method
This paper addresses the monotone inclusion problem in a real Hilbert space, with the objective of identifying a point \( x \in \mathcal{H} \) such that \( 0 \in Ax + Bx + Cx \), where \( A \) and \(C\) are maximal monotone operators and \( B \) is a \(\beta\)-cocoercive operator. The Davis-Yin three-operator splitting method constitutes a widely used algorithm for solving this problem. This method consists of two subproblems: the first entails a backward step, whereas the second comprises a backward step followed by a forward step. A natural question arises: is it possible to construct an algorithm that incorporates the forward step into the first subproblem? This paper addresses this question by introducing a novel splitting algorithm that integrates the forward step into the first subproblem. The proposed algorithm generalizes the Douglas-Rachford splitting method, the reflected forward-backward splitting method, and the forward-reflected-backward splitting method. We establish the weak convergence of the proposed algorithm in a general real Hilbert space under suitable stepsize conditions.
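For reference, the classical Davis-Yin iteration that the note's algorithm rearranges (in its proximal form for f + g + h with h smooth, the standard optimization instance of the inclusion) looks as follows; the toy problem data are made up.

```python
# Classical Davis-Yin splitting for min f(x) + g(x) + h(x) with h smooth:
#   x = prox_{t f}(z);  y = prox_{t g}(2x - z - t*grad_h(x));  z += y - x.
# Toy instance: f = ||.||_1, g = indicator of the box [0,1]^d,
# h = 0.5*||Ax - b||^2 (its gradient is cocoercive).
import numpy as np

rng = np.random.default_rng(0)
d = 50
Amat = rng.standard_normal((30, d))
b = rng.standard_normal(30)

prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
proj_box = lambda v: np.clip(v, 0.0, 1.0)
grad_h = lambda v: Amat.T @ (Amat @ v - b)

t = 1.0 / np.linalg.norm(Amat, 2) ** 2        # step within the cocoercivity bound
z = np.zeros(d)
for _ in range(2000):
    x = prox_l1(z, t)
    y = proj_box(2 * x - z - t * grad_h(x))
    z = z + y - x
print(np.round(x[:8], 3))   # sparse, box-feasible solution
```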
☆ An Efficient ADMM Method for Ratio-Type Nonconvex and Nonsmooth Minimization in Sparse Recovery
Sparse signal recovery based on nonconvex and nonsmooth optimization problems has significant applications and demonstrates superior performance in signal processing and machine learning. This work deals with a scale-invariant $\ell_{1/2}/\ell_{2}$ sparse minimization with nonconvex, nonseparable, ratio-type regularization to enhance the accuracy and stability of sparse recovery. Within the framework of the null space property, we analyze the conditions for exact and stable recovery in the constrained minimization problem. For the unconstrained regularized minimization problem, we develop an alternating direction method of multipliers (ADMM) based on a splitting strategy and rigorously analyze its global convergence and linear convergence rate under reasonable assumptions. Numerical experiments demonstrate that the proposed method consistently outperforms existing approaches across diverse noise levels and measurement settings. Furthermore, experiments on neural network sparsity and generalization performance demonstrate that the method effectively improves prediction accuracy.
☆ Spiking Neural Networks: a theoretical framework for Universal Approximation and training
Spiking Neural Networks (SNNs) are widely regarded as a biologically-inspired and energy-efficient alternative to classical artificial neural networks. Yet, their theoretical foundations remain only partially understood. In this work, we develop a rigorous mathematical analysis of a representative SNN architecture based on Leaky Integrate-and-Fire (LIF) neurons with threshold-reset dynamics. Our contributions are twofold. First, we establish a universal approximation theorem showing that SNNs can approximate continuous functions on compact domains to arbitrary accuracy. The proof relies on a constructive encoding of target values via spike timing and a careful interplay between idealized $\delta$-driven dynamics and smooth Gaussian-regularized models. Second, we analyze the quantitative behavior of spike times across layers, proving well-posedness of the hybrid dynamics and deriving conditions under which spike counts remain stable, decrease, or in exceptional cases increase due to resonance phenomena or overlapping inputs. Together, these results provide a principled foundation for understanding both the expressive power and the dynamical constraints of SNNs, offering theoretical guarantees for their use in classification and signal processing tasks.
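The LIF threshold-reset dynamics analyzed in the paper can be simulated directly; a minimal Euler discretization with a constant suprathreshold input shows the regular spike times (parameters are illustrative, not taken from the paper).

```python
# Euler simulation of a leaky integrate-and-fire neuron with threshold-reset:
#   tau * dv/dt = -(v - v_rest) + R*I(t);  on v >= v_th: spike, v <- v_reset.
import numpy as np

tau, v_rest, v_reset, v_th, R = 20e-3, 0.0, 0.0, 1.0, 1.0
dt, T, I = 1e-4, 0.2, 1.5                        # constant suprathreshold input

v, spike_times = v_rest, []
for k in range(int(T / dt)):
    v += dt / tau * (-(v - v_rest) + R * I)
    if v >= v_th:                                 # threshold crossing
        spike_times.append(k * dt)
        v = v_reset                               # hard reset
print(np.round(spike_times, 4))                  # regular spiking at a fixed ISI
print("rate [Hz]:", len(spike_times) / T)
```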
☆ A Riemannian Accelerated Proximal Gradient Method
Riemannian accelerated gradient methods have been well studied for smooth optimization, typically treating geodesically convex and geodesically strongly convex cases separately. However, their extension to nonsmooth problems on manifolds with theoretical acceleration remains underexplored. To address this issue, we propose a unified Riemannian accelerated proximal gradient method for problems of the form $F(x) = f(x) + h(x)$ on manifolds, where $f$ can be either geodesically convex or geodesically strongly convex, and $h$ is $\rho$-retraction-convex, possibly nonsmooth. We rigorously establish an accelerated convergence rate under reasonable conditions. Additionally, we introduce a safeguard mechanism to ensure global convergence in non-convex settings. Numerical results validate the theoretical acceleration of the proposed method.
☆ Zubov-Net: Adaptive Stability for Neural ODEs Reconciling Accuracy with Robustness
Despite neural ordinary differential equations (Neural ODEs) exhibiting intrinsic robustness under input perturbations due to their dynamical systems nature, recent approaches often involve imposing Lyapunov-based stability conditions to provide formal robustness guarantees. However, a fundamental challenge remains: the tension between robustness and accuracy, primarily stemming from the difficulty in imposing appropriate stability conditions. To address this, we propose an adaptive stable learning framework named Zubov-Net, which innovatively reformulates Zubov's equation into a consistency characterization between regions of attraction (RoAs) and prescribed RoAs (PRoAs). Building on this consistency, we introduce a new paradigm for actively controlling the geometry of RoAs by directly optimizing PRoAs to reconcile accuracy and robustness. Our approach is realized through tripartite losses (consistency, classification, and separation losses) and a parallel boundary sampling algorithm that co-optimizes the Neural ODE and the Lyapunov function. To enhance the discriminativity of Lyapunov functions, we design an input-attention-based convex neural network via a softmax attention mechanism that focuses on equilibrium-relevant features and also serves as weight normalization to maintain training stability in deep architectures. Theoretically, we prove that minimizing the tripartite loss guarantees consistent alignment of PRoAs-RoAs, trajectory stability, and non-overlapping PRoAs. Moreover, we establish stochastic convex separability with tighter probability bounds and fewer dimensionality requirements to justify the convex design in Lyapunov functions. Experimentally, Zubov-Net maintains high classification accuracy while significantly improving robustness against various stochastic noises and adversarial attacks.
☆ Gamma-Convergence of Convex Functions, Conjugates, and Subdifferentials
This work establishes dual and subdifferential characterizations of $\Gamma$-convergence for sequences of proper convex lower semicontinuous functions in weakly compactly generated Banach spaces. It is shown that such a sequence $\Gamma$-converges in the strong topology to a limit function if and only if the sequence of conjugates $\Gamma$-converges in the $w^*$-topology to the Fenchel conjugate of the limit function. It is further proved that both conditions are equivalent to the graphical convergence of the associated subdifferentials with respect to the strong-$w^*$ product topology. Counterexamples demonstrate that these equivalences break down outside the weakly compactly generated setting. The analysis develops new arguments in separable Banach spaces and extends them to the general framework through separable reduction techniques. Additionally, we introduce several new rich families of convex functions that exhibit various separable reduction properties.
☆ Sharpness-Aware Minimization Can Hallucinate Minimizers
Sharpness-Aware Minimization (SAM) is a widely used method that steers training toward flatter minimizers, which typically generalize better. In this work, however, we show that SAM can converge to hallucinated minimizers -- points that are not minimizers of the original objective. We theoretically prove the existence of such hallucinated minimizers and establish conditions for local convergence to them. We further provide empirical evidence demonstrating that SAM can indeed converge to these points in practice. Finally, we propose a simple yet effective remedy for avoiding hallucinated minimizers.
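For context, the standard SAM update the paper analyzes first ascends to a worst-case neighbor within a $\rho$-ball and then descends using the gradient taken there; a minimal sketch with illustrative function names:

```python
import numpy as np

def sam_step(w, grad, lr=0.1, rho=0.05):
    # One standard SAM step: `grad` is any routine returning the gradient of
    # the training loss at a given point (a placeholder, not a specific API).
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order inner maximization
    return w - lr * grad(w + eps)                # outer descent with the perturbed gradient
```

The paper's observation is that fixed points of this two-step map need not be stationary points of the original loss.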
☆ Distributed Time-Varying Optimization via Unbiased Extremum Seeking
This paper proposes a novel distributed optimization framework that addresses time-varying optimization problems without requiring explicit derivative information of the objective functions. Traditional distributed methods often rely on derivative computations, limiting their applicability when only real-time objective function measurements are available. Leveraging unbiased extremum seeking, we develop continuous-time algorithms that utilize local measurements and neighbor-shared data to collaboratively track time-varying optima. Key advancements include compatibility with directed communication graphs, customizable convergence rates (asymptotic, exponential, or prescribed-time), and the ability to handle dynamically evolving objectives. By integrating chirpy probing signals with time-varying frequencies, our unified framework achieves accelerated convergence while maintaining stability under mild assumptions. Theoretical guarantees are established through Lie bracket averaging and Lyapunov-based analysis, with linear matrix inequality conditions ensuring rigorous convergence. Numerical simulations validate the effectiveness of the algorithms.
comment: 13 pages, extended version
☆ Approximating value functions via corner Benders' cuts
We introduce a novel technique to generate Benders' cuts from a conic relaxation ("corner") derived from a basis of a higher-dimensional polyhedron that we aim to outer approximate in a lower-dimensional space. To generate facet-defining inequalities for the epigraph associated with this corner, we develop a computationally efficient algorithm based on a compact reverse polar formulation and a row generation scheme that handles the redundant inequalities. Via a known connection between arc-flow and path-flow formulations, we show that our method can recover the linear programming bound of a Dantzig-Wolfe formulation using multiple cuts in the projected space. In computational experiments, our generic technique enhances the performance of a problem-specific state-of-the-art algorithm for the vehicle routing problem with stochastic demands, a well-studied variant of the classic capacitated vehicle routing problem that accounts for customer demand uncertainty.
☆ On Suboptimal Safety-Critical Tracking Controller Design
This paper proposes a novel framework for safety-critical optimal trajectory tracking in nonlinear systems based on the state-dependent Riccati equation (SDRE) methodology. By embedding barrier states into the system dynamics, the proposed strategy simultaneously ensures safety and tracking requirements, even in scenarios where these objectives may be inherently conflicting. A discounted pseudo-quadratic cost function is formulated to achieve a suboptimal trade-off between tracking accuracy, control effort, and the safety objective. We present two distinct controller designs: one utilizing a single barrier state to enforce overall safety constraints, and another employing multiple barrier states, providing enhanced flexibility to tune the system's conservatism with respect to each safety constraint individually. We establish sufficient conditions to ensure the solvability of the associated Riccati equations. The proposed safe controller is well-suited for real-time implementation in practical systems, given its reasonable computational requirements and compatibility with widely available embedded microprocessors. This is supported by simulation studies involving a mechanical system and a mobile robot collision avoidance scenario, where the safe SDRE controller consistently maintained safety while achieving trajectory tracking objectives in challenging conditions. Additionally, experimental results on a cable-driven parallel robot further demonstrate the practical applicability and effectiveness of the proposed method in real-world control tasks.
comment: 14 pages, 7 figures. This work has been submitted to Automatica for possible publication
☆ Decentralized Detection with Many Sensors: Optimality of Exchangeable and Identical Encoding Policies
We study a class of binary detection problems involving a single fusion center and a large or countably infinite number of sensors. Each sensor acts under a decentralized information structure, accessing only a local noisy observation related to the hypothesis. Based on this observation, sensors select policies to transmit a quantized signal through their actions to the fusion center, which makes the final decision using only these actions. This paper makes the following contributions: i) In the finitely many sensor setting, we provide a formal proof that an optimal encoding policy exists, and such an optimal policy is independent, deterministic, and of threshold type for the sensors and the maximum \emph{a posteriori} probability type for the fusion center; ii) For the finitely many sensor setting, we further show that an optimal encoding policy exhibits an exchangeability (permutation invariance) property; iii) We establish that an optimal encoding policy exists that is symmetric (identical) and independent across sensors in the infinitely many sensor setting under the error exponent cost; iv) Finally, we show that a symmetric optimal policy for the infinite population regime with the error exponent cost is approximately optimal for the large but finite sensor regime under the same cost criterion. We anticipate that the mathematical program used in the paper will find applications in several other massive communications applications.
♻ ☆ What is the Best Way to Do Something? A Discreet Tour of Discrete Optimization
In mathematical optimization, we want to find the best possible solution for a decision-making problem. Curiously, these problems are harder to solve if they have discrete decisions. Imagine that you would like to buy chocolate: you can buy no chocolate or one chocolate bar, but typically you cannot buy just half of a bar. Now imagine that you could also buy many other items, and that you need to meet nutritional needs while minimizing the grocery bill. With more options and more demands, finding the best solution becomes trickier. But since many real-world settings benefit from mathematical optimization, such as scheduling trains and flights, planning truck deliveries, and making better investment decisions, these problems are widely studied in a branch of mathematics called Operations Research (OR). Sometimes we can simply write the mathematical model and find an optimal solution with OR software, but for larger problems we may need to develop new mathematical models and even write our own algorithms. We explore both cases with a simple and well-known problem (the traveling salesperson problem), some computer programming (in Python), and software that is free for academic use (Gurobi).
♻ ☆ Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
Fine-tuning Large Language Models (LLMs) is essential for adapting pre-trained models to downstream tasks. Yet traditional first-order optimizers such as Stochastic Gradient Descent (SGD) and Adam incur prohibitive memory and computational costs that scale poorly with model size. In this paper, we investigate zero-order (ZO) optimization methods as a memory- and compute-efficient alternative, particularly in the context of parameter-efficient fine-tuning techniques like LoRA. We propose $\texttt{JAGUAR SignSGD}$, a ZO momentum-based algorithm that extends ZO SignSGD, requiring the same number of parameters as the standard ZO SGD and only $\mathcal{O}(1)$ function evaluations per iteration. To the best of our knowledge, this is the first study to establish rigorous convergence guarantees for SignSGD in the stochastic ZO case. We further propose $\texttt{JAGUAR Muon}$, a novel ZO extension of the Muon optimizer that leverages the matrix structure of model parameters, and we provide its convergence rate under arbitrary stochastic noise. Through extensive experiments on challenging LLM fine-tuning benchmarks, we demonstrate that the proposed algorithms meet or exceed the convergence quality of standard first-order methods, achieving significant memory reduction. Our theoretical and empirical results establish new ZO optimization methods as a practical and theoretically grounded approach for resource-constrained LLM adaptation. Our code is available at https://github.com/brain-mmo-lab/ZO_LLM
comment: 26 pages, 5 tables
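To convey the ingredients without reproducing the paper's exact update, a generic zero-order SignSGD with momentum can be sketched as follows; the $\texttt{JAGUAR}$ coordinate-momentum rule itself differs, so treat this purely as an illustration:

```python
import numpy as np

def zo_sign_sgd(f, w, lr=1e-3, mu=1e-4, beta=0.9, iters=1000, seed=0):
    # Generic zero-order SignSGD with momentum (illustrative only). A two-point
    # random-direction estimator keeps the cost at O(1) function evaluations
    # per iteration, as in the paper's setting.
    rng = np.random.default_rng(seed)
    m = np.zeros_like(w)
    for _ in range(iters):
        u = rng.standard_normal(w.shape)
        g_hat = (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u  # directional gradient estimate
        m = beta * m + (1 - beta) * g_hat                       # momentum buffer
        w = w - lr * np.sign(m)                                 # scale-free sign step
    return w

# Example: zo_sign_sgd(lambda w: np.sum((w - 1.0) ** 2), np.zeros(5))
```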
♻ ☆ Optimal Savings with Preference for Wealth
The consumption function maps current wealth and the exogenous state to current consumption. We prove the existence and uniqueness of a consumption function when the agent has a preference for wealth. When the period utility functions are restricted to power functions, we prove that the consumption function is asymptotically linear as wealth tends to infinity and provide a complete characterization of the asymptotic slopes. When the risk aversion with respect to wealth is less than that for consumption, the asymptotic slope is zero regardless of other model parameters, implying wealthy households save a large fraction of their income, consistent with empirical evidence.
♻ ☆ Vibrational Stabilization of Complex Network Systems
Many natural and man-made network systems need to maintain certain patterns, such as working at equilibria or limit cycles, to function properly. Thus, the ability to stabilize such patterns is crucial. Most of the existing studies on stabilization assume that network system states can be measured online so that feedback control strategies can be used. However, in many real-world scenarios, system states, e.g., neuronal activity in the brain, are difficult to measure. In this paper, we take this situation into account and study the stabilization problem of linear network systems with an open-loop control strategy (vibrational control). We derive a graph-theoretic sufficient condition for structural vibrational stabilizability, under which network systems can always be stabilized. We further provide an approach to select the locations in the network for control placement and design corresponding vibrational inputs to stabilize systems that satisfy this condition. Finally, we provide some numerical results that demonstrate the validity of our theoretical findings.
♻ ☆ Convergence, Duality and Well-Posedness in Convex Bilevel Optimization
We consider the convex bilevel optimization problem, also known as simple bilevel programming. There are two challenges in solving convex bilevel optimization problems. Firstly, strong duality is not guaranteed due to the lack of a Slater constraint qualification. Secondly, we demonstrate through an example that convergence of algorithms is not guaranteed even when the usual suboptimality gap bounds are present, due to the possibility of encountering super-optimal solutions. We show that strong duality (but not necessarily dual solvability) is exactly equivalent to ensuring correct asymptotic convergence of both inner and outer function values, and provide a simple condition that guarantees strong duality. Unfortunately, we also show that this simple condition is not sufficient to guarantee convergence to the optimal solution set. We draw connections to Levitin-Polyak well-posedness, and leverage this together with our strong duality equivalence to provide another condition that ensures convergence to the optimal solution set. We also discuss how our conditions have been implicitly present in existing algorithmic work.
♻ ☆ Data-driven Neural Networks for Windkessel Parameter Calibration
In this work, we propose a novel method for calibrating Windkessel (WK) parameters in a dimensionally reduced 1D-0D coupled blood flow model. To this end, we design a data-driven neural network (NN) trained on simulated blood pressures in the left brachial artery. Once trained, the NN emulates the pressure pulse waves across the entire simulated domain, i.e., over time, space, and varying WK parameters, with negligible error and computational effort. To calibrate the WK parameters on a measured pulse wave, the NN is extended by dummy neurons and retrained only on these. The main objective of this work is to assess the effectiveness of the method in various scenarios -- particularly, when the exact measurement location is unknown or the data are affected by noise.
comment: 32 pages, 15 figures, for associated git see https://github.com/bhoock/WKcalNN, submitted to International Journal for Numerical Methods in Biomedical Engineering
♻ ☆ On the control of recurrent neural networks using constant inputs
This paper investigates the controllability of a broad class of recurrent neural networks widely used in theoretical neuroscience, including models of large-scale human brain dynamics. Motivated by emerging applications in non-invasive neurostimulation such as transcranial direct current stimulation (tDCS), we study the control synthesis of these networks using constant and piecewise constant inputs. The neural model considered is a continuous-time Hopfield-type system with nonlinear activation functions and arbitrary input matrices representing inter-regional brain interactions. Our main contribution is the formulation and solution of a control synthesis problem for such nonlinear systems using specific solution representations. These representations yield explicit algebraic conditions for synthesizing constant and piecewise constant controls that solve a two-point boundary value problem in state space up to higher-order corrections with respect to the time horizon. In particular, the input is constructed to satisfy a tractable small-time algebraic relation involving the Jacobian of the nonlinear drift, ensuring that the synthesis reduces to verifying conditions on the system matrices. For canonical input matrices that directly actuate $k$ nodes, this implies that the reachable set (with constant inputs) of a given initial state is an affine subspace whose dimension equals the input rank and whose basis can be computed efficiently using a thin QR factorization. Numerical simulations illustrate the theoretical results and demonstrate the effectiveness of the proposed synthesis in guiding the design of brain stimulation protocols for therapeutic and cognitive applications.
♻ ☆ Constrained Search in Imaginary Time
We introduce an imaginary-time evolution method to evaluate the pure-state constrained-search functional from density-functional theory formulated on finite lattices. Simultaneously, it yields a potential that produces a prescribed density of an eigenstate. Besides being a computational scheme, this allows one to gain theoretical insights into the density-potential mapping. The method can be generalized to the optimization of the expectation value of a general self-adjoint operator on a finite-dimensional Hilbert space under a finite number of expectation-value constraints for commuting self-adjoint operators.
♻ ☆ Markov Perfect Equilibria in Discrete Finite-Player and Mean-Field Games
We study dynamic finite-player and mean-field stochastic games within the framework of Markov perfect equilibria (MPE). Our focus is on discrete time and space structures without monotonicity. Unlike their continuous-time analogues, discrete-time finite-player games generally do not admit unique MPE. However, we show that uniqueness is remarkably recovered when the time steps are sufficiently small, and we provide examples demonstrating the necessity of this assumption. This result, established without relying on any monotonicity conditions, underscores the importance of inertia in dynamic games. In both the finite-player and mean-field settings, we show that MPE correspond to solutions of the Nash-Lasry-Lions equation, which is known as the master equation in the mean-field case. We exploit this connection to establish the convergence of discrete-time finite-player games to their mean-field counterpart in short time. Finally, we prove the convergence of finite-player games to their continuous-time version on every time horizon.
♻ ☆ Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy
Recent advancements in machine learning have improved performance while also increasing computational demands. While federated and distributed setups address these issues, their structure is vulnerable to malicious influences. In this paper, we address a specific threat, Byzantine attacks, where compromised clients inject adversarial updates to derail global convergence. We combine the trust scores concept with trial function methodology to dynamically filter outliers. Our methods address the critical limitations of previous approaches, allowing functionality even when Byzantine nodes are in the majority. Moreover, our algorithms adapt to widely used scaled methods like Adam and RMSProp, as well as practical scenarios, including local training and partial participation. We validate the robustness of our methods by conducting extensive experiments on both synthetic and real ECG data collected from medical institutions. Furthermore, we provide a broad theoretical analysis of our algorithms and their extensions to aforementioned practical setups. The convergence guarantees of our methods are comparable to those of classical algorithms developed without Byzantine interference.
♻ ☆ Neural Operators for Adaptive Control of Freeway Traffic
Uncertainty and delayed reactions in human driving behavior lead to stop-and-go traffic congestion on freeways. The freeway traffic dynamics are governed by the Aw-Rascle-Zhang (ARZ) traffic Partial Differential Equation (PDE) models with unknown relaxation time. Motivated by the adaptive traffic control problem, this paper presents a neural operator (NO) based adaptive boundary control design for the coupled 2$\times$2 hyperbolic systems with uncertain spatially varying in-domain coefficients and boundary parameter. In traditional adaptive control for PDEs, solving the backstepping kernel equations online is computationally intensive, as it requires significant resources at each time step to update the estimates of the coefficients. To address this challenge, we use operator learning, i.e., DeepONet, to learn the mapping from system parameters to the kernel functions. DeepONet, a class of deep neural networks designed for approximating operators, has shown strong potential for approximating PDE backstepping designs in recent studies. Unlike previous works that focus on approximating a single kernel equation associated with a scalar PDE system, we extend this framework to approximate PDE kernels for a class of first-order coupled 2$\times$2 hyperbolic kernel equations. Our approach demonstrates that DeepONet is nearly two orders of magnitude faster than traditional PDE solvers for generating kernel functions, while maintaining a loss on the order of $10^{-3}$.
♻ ☆ WeightLoRA: Keep Only Necessary Adapters
The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation ($\texttt{LoRA}$), which adds trainable adapters to selected layers. Although $\texttt{LoRA}$ may obtain accurate solutions, it requires significant memory to train large models, as well as intuition about which layers should receive adapters. In this paper, we propose a novel method, $\texttt{WeightLoRA}$, which overcomes this issue by adaptive selection of the most critical $\texttt{LoRA}$ heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while maintaining the capability to obtain consistent or even superior metric values. We conduct experiments on a series of competitive benchmarks and DeBERTa, BART, and Llama models, comparing our method with different adaptive approaches. The experimental results demonstrate the efficacy of $\texttt{WeightLoRA}$ and the superior performance of $\texttt{WeightLoRA+}$ in almost all cases.
comment: 13 pages, 9 tables
♻ ☆ Sign-SGD is the Golden Gate between Multi-Node to Single-Node Learning: Significant Boost via Parameter-Free Optimization
Quite recently, large language models have made significant breakthroughs across various disciplines. However, training them is an extremely resource-intensive task, even for major players with vast computing resources. One of the methods gaining popularity in light of these challenges is Sign-SGD. This method can be applied both as a memory-efficient approach in single-node training and as a gradient compression technique in distributed learning. Nevertheless, it is impossible to automatically determine the effective stepsize from a theoretical standpoint. Indeed, it depends on parameters of the dataset to which we do not have access in the real-world learning paradigm. To address this issue, we design several variants of single-node deterministic Sign-SGD. We extend our approaches to practical scenarios: stochastic single-node and multi-node learning, and methods with incorporated momentum. We conduct extensive experiments on real machine learning problems that emphasize the practical applicability of our ideas.
comment: 58 pages, 5 figures, 5 tables
♻ ☆ Online Multi-Agent Control with Adversarial Disturbances
Online multi-agent control problems, where many agents pursue competing and time-varying objectives, are widespread in domains such as autonomous robotics, economics, and energy systems. In these settings, robustness to adversarial disturbances is critical. In this paper, we study online control in multi-agent linear dynamical systems subject to such disturbances. In contrast to most prior work in multi-agent control, which typically assumes noiseless or stochastically perturbed dynamics, we consider an online setting where disturbances can be adversarial, and where each agent seeks to minimize its own sequence of convex losses. Under two feedback models, we analyze online gradient-based controllers with local policy updates. We prove per-agent regret bounds that are sublinear and near-optimal in the time horizon and that highlight different scalings with the number of agents. When agents' objectives are aligned, we further show that the multi-agent control problem induces a time-varying potential game for which we derive equilibrium tracking guarantees. Together, our results take a first step in bridging online control with online learning in games, establishing robust individual and collective performance guarantees in dynamic continuous-state environments.
♻ ☆ Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems
In this work, we address unconstrained finite-sum optimization problems, with particular focus on instances originating in large-scale deep learning scenarios. Our main interest lies in the exploration of the relationship between recent line search approaches for stochastic optimization in the overparametrized regime and momentum directions. First, we point out that combining these two elements with computational benefits is not straightforward. To this aim, we propose a solution based on mini-batch persistency. We then introduce an algorithmic framework that exploits a mix of data persistency, conjugate-gradient type rules for the definition of the momentum parameter, and stochastic line searches. The resulting algorithm provably possesses convergence properties under suitable assumptions and is empirically shown to outperform other popular methods from the literature, obtaining state-of-the-art results in both convex and nonconvex large-scale training problems.
♻ ☆ Analysis of the SQP Method for Hyperbolic PDE-Constrained Optimization in Acoustic Full Waveform Inversion
In this paper, the SQP method applied to a hyperbolic PDE-constrained optimization problem is considered. The model arises from acoustic full waveform inversion in the time domain. The analysis is challenging mainly due to the hyperbolicity involved and the second-order bilinear structure. This notoriously difficult character leads to an undesired loss of regularity in the SQP method, calling for a substantial extension of techniques developed for parabolic problems. We propose and analyze a novel strategy for the well-posedness and convergence analysis based on the use of a smooth-in-time initial condition, a tailored self-mapping operator, and a two-step estimation process along with Stampacchia's method for second-order wave equations. Our final theoretical result is the R-superlinear convergence of the SQP method.
♻ ☆ Data-driven Piecewise Affine Decision Rules for Stochastic Programming with Covariate Information
Focusing on stochastic programming (SP) with covariate information, this paper proposes an empirical risk minimization (ERM) method embedded within a nonconvex piecewise affine decision rule (PADR), which aims to learn the direct mapping from features to optimal decisions. We establish the nonasymptotic consistency result of our PADR-based ERM model for unconstrained problems and asymptotic consistency result for constrained ones. To solve the nonconvex and nondifferentiable ERM problem, we develop an enhanced stochastic majorization-minimization algorithm and establish the asymptotic convergence to (composite strong) directional stationarity along with complexity analysis. We show that the proposed PADR-based ERM method applies to a broad class of nonconvex SP problems with theoretical consistency guarantees and computational tractability. Our numerical study demonstrates the superior performance of PADR-based ERM methods compared to state-of-the-art approaches under various settings, with significantly lower costs, less computation time, and robustness to feature dimensions and nonlinearity of the underlying dependency.
♻ ☆ Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness
This paper studies decentralized optimization problem $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$, where each local function has the form of $f_i(\mathbf{x}) = {\mathbb E}\left[F(\mathbf{x};{\boldsymbol \xi}_i)\right]$ which is $(L_0,L_1)$-smooth but possibly nonconvex and the random variable ${\boldsymbol \xi}_i$ follows distribution ${\mathcal D}_i$. We propose a novel algorithm called decentralized normalized stochastic gradient descent (DNSGD), which can achieve an $\epsilon$-stationary point at each local agent. We present a new framework for analyzing decentralized first-order methods in the relaxed smooth setting, based on the Lyapunov function related to the product of the gradient norm and the consensus error. We show the upper bounds on the sample complexity of ${\mathcal O}(m^{-1}(L_f\sigma^2\Delta_f\epsilon^{-4} + \sigma^2\epsilon^{-2} + L_f^{-2}L_1^3\sigma^2\Delta_f\epsilon^{-1} + L_f^{-2}L_1^2\sigma^2))$ per agent and the communication complexity of $\tilde{\mathcal O}((L_f\epsilon^{-2} + L_1\epsilon^{-1})\gamma^{-1/2}\Delta_f)$, where $L_f=L_0 +L_1\zeta$, $\sigma^2$ is the variance of the stochastic gradient, $\Delta_f$ is the initial optimal function value gap, $\gamma$ is the spectral gap of the network, and $\zeta$ is the degree of the gradient dissimilarity. In the special case of $L_1=0$, the above results (nearly) match the lower bounds of decentralized stochastic nonconvex optimization under the standard smoothness. We also conduct numerical experiments to show the empirical superiority of our method.
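The flavor of such a scheme -- though not necessarily the paper's exact DNSGD update or stepsize schedule -- can be conveyed by one round that combines a gossip (mixing) step with a normalized local stochastic gradient step:

```python
import numpy as np

def decentralized_normalized_step(X, grads, W, lr):
    # One illustrative round for m agents: row i of the (m, d) array X holds
    # agent i's iterate, W is a doubly stochastic mixing matrix respecting the
    # communication network, and grads[i] returns a stochastic gradient of f_i.
    X = W @ X                                            # consensus / mixing step
    for i in range(X.shape[0]):
        g = grads[i](X[i])
        X[i] -= lr * g / (np.linalg.norm(g) + 1e-12)     # normalized gradient step
    return X
```

Normalization makes the step length insensitive to the gradient scale, which is the key device for handling $(L_0, L_1)$-smoothness.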
♻ ☆ Online Resource Allocation with Average Budget Constraints
We consider the problem of online resource allocation with average budget constraints. At each time point, the decision maker makes an irrevocable decision of whether to accept or reject a request before the next request arrives, with the goal of maximizing the cumulative rewards. In contrast to existing literature requiring that the total resource consumption stay below a certain level, we require that the average resource consumption per accepted request not exceed a given threshold. This problem can be cast as an online knapsack problem with exogenous random budget replenishment, and finds applications in various fields such as online anomaly detection, sequential advertising, and per-capita public service provision. We start with general arrival distributions and show that a simple policy achieves a $O(\sqrt{T})$ regret. We complement the result by showing that such a regret growth rate is in general not improvable. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, albeit achieving bounded loss in canonical settings, may incur a $\Omega(\sqrt{T})$ or even a $\Omega(T)$ regret. With the observation that canonical policies tend to be too optimistic and over-accept arrivals, we propose a novel policy that incorporates budget safety buffers. It turns out that a little more safety can greatly enhance efficiency -- small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$. From a practical perspective, we extend the policy to scenarios with continuous arrival distributions, time-dependent information structures, as well as unknown $T$. We conduct both synthetic experiments and empirical applications on time series data of New York City taxi passengers to validate the performance of our proposed policies.
Systems and Control 29
☆ NEO-Grid: A Neural Approximation Framework for Optimization and Control in Distribution Grids
The rise of distributed energy resources (DERs) is reshaping modern distribution grids, introducing new challenges in attaining voltage stability under dynamic and decentralized operating conditions. This paper presents NEO-Grid, a unified learning-based framework for volt-var optimization (VVO) and volt-var control (VVC) that leverages neural network surrogates for power flow and deep equilibrium models (DEQs) for closed-loop control. Our method replaces traditional linear approximations with piecewise-linear ReLU networks trained to capture the nonlinear relationship between power injections and voltage magnitudes. For control, we model the recursive interaction between voltage and inverter response using DEQs, allowing direct fixed-point computation and efficient training via implicit differentiation. We evaluated NEO-Grid on the IEEE 33-bus system, demonstrating that it significantly improves voltage regulation performance compared to standard linear and heuristic baselines in both optimization and control settings. Our results establish NEO-Grid as a scalable, accurate, and interpretable solution for learning-based voltage regulation in distribution grids.
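The closed-loop fixed-point view can be made concrete with a simple iteration: voltages respond to inverter reactive power through a learned power-flow surrogate, and inverters respond to voltages, so the equilibrium is computed directly rather than by unrolling the loop. All function names below are illustrative placeholders rather than the paper's API:

```python
import numpy as np

def vvc_equilibrium(flow, inverter, v0, tol=1e-8, max_iter=1000):
    # Fixed-point computation of the closed-loop volt-var equilibrium:
    # `flow` maps inverter reactive-power setpoints to bus voltage magnitudes
    # (standing in for the learned surrogate) and `inverter` maps voltages to
    # reactive-power responses (standing in for the droop/control curve).
    v = np.array(v0, dtype=float)
    for _ in range(max_iter):
        v_next = flow(inverter(v))        # one round of the recursive interaction
        if np.linalg.norm(v_next - v) < tol:
            break                         # reached the equilibrium the DEQ solves for
        v = v_next
    return v
```

Training a DEQ differentiates through this fixed point implicitly instead of backpropagating through the iterations.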
☆ Mitigation of Active Power Oscillation in Multi-VSG Grids: An Impedance-Based Perspective
Active power oscillations (APOs) frequently arise in inverter-dominated power systems with multiple converters operating under Virtual Synchronous Generator control, posing risks to system stability and protection coordination. While various mitigation strategies have been proposed, many rely on prior knowledge of system parameters, offer limited damping performance, or involve complex models that lack physical interpretability, making them difficult to apply in practice. To address these challenges, this paper first introduces a physically intuitive RLC equivalent circuit model to explain the root causes of APOs in both stand-alone (SA) and grid-connected (GC) modes. By mapping inertia, damping, and feeder impedance to capacitive, resistive, and inductive elements, respectively, the model reveals how mismatches among converters lead to inter-unit oscillations characterized by LC resonance. Building on this insight, we propose two mode-specific mitigation strategies: in SA mode, a graph-theory-based impedance control ensures proportional reactive power sharing and effectively suppresses APOs; in GC mode, adaptive inertia and damping control with feedforward filtering is designed to reshape transient power dynamics while preserving frequency stability. The proposed methods are validated through extensive simulations and real-time hardware-in-the-loop experiments, demonstrating their effectiveness in suppressing oscillations and enhancing the robustness of multi-converter power systems.
☆ A comprehensive equivalent circuit model for high overtone bulk acoustic resonators (HBARs)
This paper presents a new and comprehensive equivalent circuit model for high overtone bulk acoustic resonators (HBARs). HBARs demonstrate several very sharp resonance modes distributed nearly periodically over a very wide frequency range. This spectrum response of HBARs offers unique advantages but poses significant modeling challenges. The proposed circuit incorporates and models the unique physical components of the HBAR: piezoelectric transducer, substrate (a perfectly periodic multimode cavity), piezoelectric coupling, and critically, the imperfectly matched transducer-substrate interface which imparts characteristic aperiodicity to the HBAR mode spectrum. By judicious use of fixed, periodic, or tightly constrained virtual lumped-element branches, and sets of branches, the model retains clear and intuitive links to the physical device, while reducing the complexity needed for fitting dense, broadband datasets. We demonstrate the validity and power of this model by simultaneously fitting measured data for 61 modes of a GaN/NbN/sapphire HBAR over a span of 1 GHz, and extracting modal parameters such as quality factors and coupling coefficients. We show that this new model is compact and yet scalable: by leveraging the inherent internal relationships in an HBAR, the model can be easily expanded to include multiple transducer overtones and envelopes, multiple distinct transducers, and spurious modes. In addition to fitting measured datasets, the new model can also be used to easily analyze various perturbations to the nominal state of the HBAR. We expect the new model to be useful for the design of classical HBAR-based oscillators, filters, and sensors, and for the integration of HBARs into quantum circuits.
comment: 20 pages, 9 figures
☆ Rebuild AC Power Flow Models with Graph Attention Networks
A full power flow (PF) model is a complete representation of the physical power network. Traditional model-based methods rely on the full PF model to implement power flow analysis. In practice, however, some PF model parameters can be inaccurate or even unavailable due to uncertainties or dynamics in the power system. Moreover, because the power network keeps evolving, with possibly changing topology, the generalizability of a PF model to different network sizes and topologies should be considered. In this paper, we propose a PF rebuild model based on graph attention networks (GAT) by constructing a new graph based on the real and imaginary parts of the voltage at each bus. By comparing with two state-of-the-art PF rebuild models on different standard IEEE power system cases and their modified topology variants, we demonstrate the feasibility of our method. Experimental results show that our proposed model achieves better accuracy for a changing network and can generalize to different networks with a smaller loss in accuracy.
☆ An enhanced statistical feature fusion approach using an improved distance evaluation algorithm and weighted K-nearest neighbor for bearing fault diagnosis
Bearings are among the most failure-prone components in rotating machinery, and their condition directly impacts overall performance. Therefore, accurately diagnosing bearing faults is essential for ensuring system stability. However, detecting such malfunctions in noisy environments, where data is collected from multiple sensors, necessitates the extraction and selection of informative features. This paper proposes an improved distance evaluation algorithm combined with a weighted K-nearest neighbor (KNN) classifier for bearing fault diagnosis. The process begins with extracting and integrating statistical features of vibration across the time, frequency, and time-frequency domains. Next, the improved distance evaluation algorithm assigns weights to the extracted features, retaining only the most informative ones by eliminating insensitive features. Finally, the selected features are used to train the weighted KNN classifier. To validate the proposed method, we employ bearing data from the University of Ottawa. The results demonstrate the effectiveness of our approach in accurately identifying bearing faults.
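A minimal sketch of the final two stages, assuming the sensitivity weights produced by the improved distance evaluation algorithm are already available (the weight computation itself is paper-specific, and all shapes and thresholds here are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_weighted_knn(X, y, feature_weights, threshold=0.5, k=5):
    # X: statistical features extracted from the time, frequency, and
    # time-frequency domains; feature_weights: sensitivity scores standing in
    # for the improved distance evaluation output (computed elsewhere).
    keep = feature_weights > threshold            # drop insensitive features
    Xw = X[:, keep] * feature_weights[keep]       # weight features in the distance metric
    clf = KNeighborsClassifier(n_neighbors=k, weights="distance")
    return clf.fit(Xw, y), keep                   # keep mask needed to transform test data
```

Scaling each retained feature by its weight is one simple way to realize a weighted KNN distance; the paper's exact weighting may differ.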
☆ Next-Generation Aerial Robots -- Omniorientational Strategies: Dynamic Modeling, Control, and Comparative Analysis
Conventional multi-rotors are under-actuated systems, hindering them from independently controlling attitude from position. In this study, we present several distinct configurations that incorporate additional control inputs for manipulating the angles of the propeller axes. This addresses the mentioned limitations, making the systems "omniorientational". We comprehensively derived detailed dynamic models for all introduced configurations and validated by a methodology using Simscape Multibody simulations. Two controllers are designed: a sliding mode controller for robust handling of disturbances and a novel PID-based controller with gravity compensation integrating linear and non-linear allocators, designed for computational efficiency. A custom control allocation strategy is implemented to manage the input-non-affine nature of these systems, seeking to maximize battery life by minimizing the "Power Consumption Factor" defined in this study. Moreover, the controllers effectively managed harsh disturbances and uncertainties. Simulations compare and analyze the proposed configurations and controllers, majorly considering their power consumption. Furthermore, we conduct a qualitative comparison to evaluate the impact of different types of uncertainties on the control system, highlighting areas for potential model or hardware improvements. The analysis in this study provides a roadmap for future researchers to design omniorientational drones based on their design objectives, offering practical insights into configuration selection and controller design. This research aligns with the project SAC-1, one of the objectives of Sharif AgRoLab.
☆ Continuous-Time System Identification and OCV Reconstruction of Li-ion Batteries via Regularized Least Squares
Accurate identification of lithium-ion (Li-ion) battery parameters is essential for managing and predicting battery behavior. However, existing discrete-time methods hinder the estimation of physical parameters and face the fast-slow dynamics problem present in the battery. In this paper, we develop a continuous-time approach that enables the estimation of battery parameters directly from sampled data. This method avoids the discretization errors incurred in converting continuous-time models into discrete-time ones, achieving more accurate identification. In addition, we jointly identify the open-circuit voltage (OCV) and the state of charge (SOC) relation of the battery without utilizing offline OCV tests. By modeling the OCV-SOC curve as a cubic B-spline, we achieve a high-fidelity representation of the OCV curve, facilitating its estimation. By solving a rank- and L1-regularized least squares problem, we jointly identify the battery parameters and the OCV-SOC relation from the battery's dynamic data. Simulated and real-life data demonstrate the effectiveness of the developed method.
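The role of the B-spline parameterization is to make the OCV-SOC relation linear in the unknown coefficients, so it can enter a least-squares fit. A minimal sketch with an illustrative knot placement follows; the paper's joint rank- and L1-regularized estimation is not reproduced here.

```python
import numpy as np
from scipy.interpolate import BSpline

deg = 3
# Clamped cubic knot vector on the SOC interval [0, 1]; the knot placement
# here is an illustrative choice, not the paper's.
knots = np.concatenate((np.zeros(deg), np.linspace(0.0, 1.0, 8), np.ones(deg)))
n_coef = len(knots) - deg - 1

def design_matrix(soc):
    # Evaluate every cubic B-spline basis function at the SOC samples, so that
    # OCV(soc) = Phi @ c is linear in the spline coefficients c.
    return np.column_stack([BSpline(knots, np.eye(n_coef)[j], deg)(soc)
                            for j in range(n_coef)])

# Given SOC samples `soc` and OCV-like targets `v` (placeholder names), an
# ordinary least-squares fit of the coefficients would read:
#   c, *_ = np.linalg.lstsq(design_matrix(soc), v, rcond=None)
```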
☆ Direct Continuous-Time LPV System Identification of Li-ion Batteries via L1-Regularized Least Squares
Accurate identification of lithium-ion battery parameters is essential for estimating battery states and managing performance. However, the variation of battery parameters over the state of charge (SOC) and the nonlinear dependence of the open-circuit voltage (OCV) on the SOC complicate the identification process. In this work, we develop a continuous-time LPV system identification approach to identify the SOC-dependent battery parameters and the OCV-SOC mapping. We model parameter variations using cubic B-splines to capture the piecewise nonlinearity of the variations and estimate signal derivatives via state variable filters, facilitating CT-LPV identification. Battery parameters and the OCV-SOC mapping are jointly identified by solving L1-regularized least squares problems. Numerical experiments on a simulated battery and real-life data demonstrate the effectiveness of the developed method in battery identification, presenting improved performance compared to conventional RLS-based methods.
☆ The Use of the Simplex Architecture to Enhance Safety in Deep-Learning-Powered Autonomous Systems
Recently, the outstanding performance reached by neural networks in many tasks has led to their deployment in autonomous systems, such as robots and vehicles. However, neural networks are not yet trustworthy, being prone to different types of misbehavior, such as anomalous samples, distribution shifts, adversarial attacks, and other threats. Furthermore, frameworks for accelerating the inference of neural networks typically run on rich operating systems that are less predictable in terms of timing behavior and present larger surfaces for cyber-attacks. To address these issues, this paper presents a software architecture for enhancing safety, security, and predictability levels of learning-based autonomous systems. It leverages two isolated execution domains, one dedicated to the execution of neural networks under a rich operating system, which is deemed not trustworthy, and one responsible for running safety-critical functions, possibly under a different operating system capable of handling real-time constraints. Both domains are hosted on the same computing platform and isolated through a type-1 real-time hypervisor enabling fast and predictable inter-domain communication to exchange real-time data. The two domains cooperate to provide a fail-safe mechanism based on a safety monitor, which oversees the state of the system and switches to a simpler but safer backup module, hosted in the safety-critical domain, whenever its behavior is considered untrustworthy. The effectiveness of the proposed architecture is illustrated by a set of experiments performed on two control systems: a Furuta pendulum and a rover. The results confirm the utility of the fall-back mechanism in preventing faults due to the learning component.
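The decision logic at the heart of this Simplex-style architecture is compact; a schematic sketch with hypothetical function names (in the real system, the monitor and backup controller live in the isolated safety-critical domain behind the hypervisor):

```python
def simplex_step(state, neural_ctrl, safe_ctrl, is_trustworthy):
    # One decision cycle of a Simplex-style fail-safe switch (schematic; all
    # function names are placeholders). neural_ctrl runs in the rich-OS
    # domain, while the monitor and safe_ctrl run in the safety-critical one.
    u = neural_ctrl(state)
    if is_trustworthy(state, u):   # safety monitor check, e.g. a verified state envelope
        return u                   # keep the high-performance learned controller
    return safe_ctrl(state)        # fall back to the simpler but safer backup module
```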
☆ Quaternionic Pole Placement via Companion Forms and the Ackermann Formula
We present an extension of state-feedback pole placement for quaternionic systems, based on companion forms and the Ackermann formula. For controllable single-input quaternionic LTI models, we define a companion polynomial that right-annihilates its companion matrix, characterize spectra via right-eigenvalue similarity classes, and prove coefficient-matching design in controllable coordinates. We then derive a coordinate-free Ackermann gain expression valid for real target polynomials, and state its scope and limitations. Short examples demonstrate correctness, practical use, and numerical simplicity.
comment: 8 pages. Preprint (submitted version). Submitted to IEEE Transactions on Automatic Control (Technical Note) on 22 Sep 2025. Funding: co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22_008/0004590). Keywords: quaternions; pole placement; companion form; Ackermann formula; right eigenvalues; quaternionic systems
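As a reference point, the classical real-coefficient Ackermann formula that the paper generalizes computes the state-feedback gain from the controllability matrix and the desired characteristic polynomial; a minimal sketch for the real single-input case (not the quaternionic extension):

```python
import numpy as np

def ackermann(A, b, poles):
    # Classical Ackermann formula: returns the row gain k such that A - b k
    # has the prescribed eigenvalues (single-input, controllable pair assumed).
    n = A.shape[0]
    C = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(n)])  # controllability matrix
    p = np.poly(poles)  # desired characteristic polynomial, highest power first
    pA = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(p))    # p(A)
    en = np.zeros(n); en[-1] = 1.0
    return np.linalg.solve(C.T, en) @ pA   # k = e_n^T C^{-1} p(A)

# Double-integrator example: places the poles of A - b k at -1 and -2.
print(ackermann(np.array([[0., 1.], [0., 0.]]), np.array([0., 1.]), [-1., -2.]))  # [2. 3.]
```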
☆ On the convergence of a numerical scheme for a boundary controlled 1D linear parabolic PIDE
We consider a 1D partial integro-differential equation (PIDE) comprising a 1D parabolic partial differential equation (PDE) and a nonlocal integral term. The control input is applied on one of the boundaries of the PIDE. Partitioning the spatial interval into $n+1$ subintervals and approximating the spatial derivatives and the integral term with their finite-difference approximations and a Riemann sum, respectively, we derive an $n^{\rm th}$-order semi-discrete approximation of the PIDE. The $n^{\rm th}$-order semi-discrete approximation of the PIDE is an $n^{\rm th}$-order ordinary differential equation (ODE) in time. We establish some of its salient properties and, using them, prove that the solution of the semi-discrete approximation converges to the solution of the PIDE as $n\to\infty$. We illustrate our convergence results using numerical examples. The results in this work are useful for establishing the null controllability of the PIDE considered.
comment: 6 pages, 3 figures
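The construction of the semi-discrete approximation can be illustrated on a model problem; the kernel and boundary configuration below are assumptions for the sketch, not necessarily the paper's exact PIDE:

```python
import numpy as np

def semi_discretize(n, K, L=1.0):
    # n-th order semi-discretization of the model PIDE
    #   u_t = u_xx + int_0^L K(x, y) u(y) dy,  u(0, t) = 0,  u(L, t) = control,
    # on the n interior grid points (K must be vectorized over arrays).
    h = L / (n + 1)
    x = np.linspace(h, L - h, n)                          # interior nodes
    D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
          + np.diag(np.ones(n - 1), -1)) / h**2           # central differences for u_xx
    Kmat = h * K(x[:, None], x[None, :])                  # Riemann sum for the integral term
    b = np.zeros(n); b[-1] = 1.0 / h**2                   # boundary control enters the last node
    return D2 + Kmat, b                                   # ODE system: dv/dt = A v + b u(t)

# Example: A, b = semi_discretize(50, lambda x, y: np.exp(-(x - y) ** 2))
```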
☆ Dual-Band Flexible Endfire Filtering Antenna With Conformal Capability for Emergency Communication Applications
In this letter, a single-layer dual-band flexible conformal filtering endfire antenna is presented. The proposed antenna is based on two co-designed folded dipoles (FDs) working at two frequencies, where the lower-frequency FD acts as a reflector for the higher-frequency one. Then, by devising an additional reflector for the lower-frequency FD, dual-band endfire radiation is realized. Parasitic strips are deliberately introduced around the FDs to generate electric coupling and magnetic coupling in the two operating bands, resulting in significant filtering performance with four radiation nulls. With its flexible structure and single-layer configuration, the antenna conforms to cylindrical surfaces of diverse diameters, thereby enabling seamless integration into scalable emergency communication systems. To verify our design concept, an antenna prototype is fabricated and measured. The measured working bands range from 1.37 to 1.45 GHz and from 1.89 to 2.07 GHz. Out-of-band radiation suppression of more than 11 dB is achieved under different bending radii. The proposed design offers several advantages, including dual-band endfire filtering radiation, flexible conformability, and low profile.
☆ Revealing Chaotic Dependence and Degree-Structure Mechanisms in Optimal Pinning Control of Complex Networks
Identifying an optimal set of driver nodes to achieve synchronization via pinning control is a fundamental challenge in complex network science, hindered by computational intractability and the lack of a general theory. Here, leveraging a degree-based mean-field (annealed) approximation from statistical physics, we analytically reveal how the structural degree distribution systematically governs synchronization performance, and derive an analytic characterization of the globally optimal pinning set together with constructive algorithms of linear complexity (dominated by degree sorting, O(N+M)). The optimal configuration exhibits a chaotic dependence--a discontinuous sensitivity--on its cardinality, whereby adding a single node can trigger abrupt changes in node composition and control effectiveness. This structural transition fundamentally challenges traditional heuristics that assume monotonic performance gains with budget. Systematic experiments on synthetic and empirical networks confirm that the proposed approach consistently outperforms degree-, betweenness-, and other centrality-based baselines. Furthermore, we quantify how key degree-distribution features--low-degree saturation, high-degree cutoff, and the power-law exponent--govern achievable synchronizability and shape the form of optimal sets. These results offer a systematic understanding of how degree heterogeneity shapes network controllability. Our work establishes a unified link between degree heterogeneity and spectral controllability, offering both mechanistic insights and practical design rules for optimal driver-node selection in diverse complex systems.
comment: 16 pages, 6 figures; primary: eess.SY; cross-lists: cs.SY, math.OC. Submitted to IEEE TAC
☆ Automated Algorithm Design via Nevanlinna-Pick Interpolation NeurIPS 2025
The synthesis of optimization algorithms typically follows a design-first-analyze-later approach, which often obscures fundamental performance limitations and hinders the systematic design of algorithms operating at the achievable theoretical boundaries. Recently, a framework based on frequency-domain techniques from robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating the design and analysis stages, this framework enables the identification of fundamental performance limits. In this paper, we build on this framework and extend it to address algorithms for solving strongly convex problems with equality constraints. As a result, we obtain a new class of algorithms that offers a sharp trade-off between the number of matrix multiplications per iteration and the convergence rate.
comment: Accepted to DynaFront Workshop at NeurIPS 2025
☆ Automated algorithm design for convex optimization problems with linear equality constraints
Synthesis of optimization algorithms typically follows a design-then-analyze approach, which can obscure fundamental performance limits and hinder the systematic development of algorithms that operate near these limits. Recently, a framework grounded in robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating design and analysis stages, fundamental performance bounds are revealed and synthesis of algorithms that achieve them is enabled. In this paper, we apply this framework to design algorithms for solving strongly convex optimization problems with linear equality constraints. Our approach yields a single-loop, gradient-based algorithm whose convergence rate is independent of the condition number of the constraint matrix. This improves upon the best known rate within the same algorithm class, which depends on the product of the condition numbers of the objective function and the constraint matrix.
comment: Accepted to 64th IEEE Conference on Decision Control (CDC), 2025
☆ Frequency Domain Stability Conditions for Hybrid AC/DC Systems
In this article, we investigate small-signal frequency and DC voltage stability of hybrid AC/DC power systems that combine AC and DC transmission, conventional machine-based generation, and converter-interfaced generation. The main contributions of this work are a compact frequency domain representation of hybrid AC/DC systems and associated stability conditions that can be divided into conditions on the individual bus dynamics and conditions on each DC network. The bus-level conditions apply to a wide range of technologies (e.g., synchronous generators, synchronous condensers, grid-forming renewables and energy storage). Moreover, the system-level conditions establish that hybrid AC/DC systems combining a wide range of devices are stable independently of the network topology, provided that the frequency response of converters on each DC network is sufficiently coherent relative to the network coupling strength. Additionally, we develop and validate a novel reduced-order damper winding model for multi-machine systems.
comment: presented at IREP 2025, published in Sustainable Energy, Grids and Networks
♻ ☆ Linear Phase Balancing Scheme using Voltage Unbalance Sensitivities in Multi-phase Power Distribution Grids
Power distribution networks, especially in North America, are often unbalanced due to the mix of single-, two- and three-phase networks as well as the high penetration of single-phase devices at the distribution level, such as electric vehicle (EV) chargers and single-phase solar plants. However, the network operator must keep voltage unbalance within the limits specified by IEEE, IEC, and NEMA standards for the safety of the equipment as well as the efficiency of the network operation. Existing works have proposed active and reactive power control in the network to minimize imbalances. However, these optimization problems are highly nonlinear and nonconvex due to the inherent nonlinearity of the unbalance metrics and the power-flow equations. In this work, we propose a linearization approach for unbalance metrics such as the voltage unbalance factor (VUF), phase voltage unbalance rate (PVUR), and line voltage unbalance rate (LVUR) using a first-order Taylor approximation. This linearization is then applied in a phase balancing control scheme, formulated as a feedback approach in which the linearization is updated successively after each active/reactive power setpoint is actuated, yielding improved voltage balance. We validate the proposed scheme on a standard IEEE benchmark test case, demonstrating its effectiveness.
comment: To appear at 64th IEEE Conference on Decision and Control
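As a worked illustration of the linearization step (our notation, using the standard definition of the voltage unbalance factor as the ratio of negative- to positive-sequence voltage magnitudes): at bus $i$, with $p$ collecting the controllable active/reactive power setpoints,
$$\mathrm{VUF}_i(p) = \frac{|V_i^-(p)|}{|V_i^+(p)|}, \qquad \mathrm{VUF}_i(p) \approx \mathrm{VUF}_i(p^0) + \nabla_p \mathrm{VUF}_i(p^0)^\top (p - p^0),$$
so the unbalance constraint becomes linear in the control variables, and the sensitivity $\nabla_p \mathrm{VUF}_i(p^0)$ is re-evaluated at each feedback iteration after the setpoints are actuated.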
♻ ☆ On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks
This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the $\ell_2$-norm estimator overcomes the challenges as long as the probability that the system is under adversarial attack at a given time is smaller than a certain threshold. We provide an estimation error bound that decays with the input memory length and prove its optimality by constructing a problem instance that suffers from the same bound under adversarial attacks. Our work provides a sharp input-output analysis for a generic nonlinear and partially observed system under significantly generalized assumptions compared to existing works.
comment: 26 pages, 3 figures
♻ ☆ GNN-DT: Graph Neural Network Enhanced Decision Transformer for Efficient Optimization in Dynamic Environments
Reinforcement Learning (RL) methods used for solving real-world optimization problems often involve dynamic state-action spaces, large scale, and sparse rewards, leading to significant challenges in convergence, scalability, and efficient exploration of the solution space. This study introduces GNN-DT, a novel Decision Transformer (DT) architecture that integrates Graph Neural Network (GNN) embedders with a novel residual connection between input and output tokens, which is crucial for handling dynamic environments. By learning from previously collected trajectories, GNN-DT tackles the sparse-reward limitations of online RL algorithms and delivers high-quality solutions in real time. We evaluate GNN-DT on the complex electric vehicle (EV) charging optimization problem and show that it achieves superior performance while requiring significantly fewer training trajectories, thus improving sample efficiency compared to existing DT and offline RL baselines. Furthermore, GNN-DT exhibits robust generalization to unseen environments and larger action spaces, addressing a critical gap in prior offline and online RL approaches.
♻ ☆ Real-time Hybrid System Identification with Online Deterministic Annealing
We introduce a real-time identification method for discrete-time state-dependent switching systems in both the input--output and state-space domains. In particular, we design a system of adaptive algorithms running at two timescales; a stochastic approximation algorithm implements an online deterministic annealing scheme at the slow timescale and estimates the mode-switching signal, while a recursive identification algorithm runs at the faster timescale and updates the parameters of the local models based on the estimate of the switching signal. We first focus on piecewise affine systems and discuss identifiability conditions and convergence properties based on the theory of two-timescale stochastic approximation. In contrast to standard identification algorithms for switched systems, the proposed approach gradually estimates the number of modes and is appropriate for real-time system identification using sequential data acquisition. The progressive nature of the algorithm improves computational efficiency and provides real-time control over the performance-complexity trade-off. Finally, we address specific challenges that arise when applying the proposed methodology to the identification of more general switching systems. Simulation results validate the efficacy of the proposed methodology.
♻ ☆ Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model
Gas turbine engines are complex and highly nonlinear dynamical systems. Deriving their physics-based models can be challenging because it requires performance characteristics that are not always available, often leading to many simplifying assumptions. This paper discusses the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models, and addresses these issues by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics are estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics is mapped into an optimally constructed Koopman eigenfunction space. This process involves eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model is validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator are then designed within the eigenfunction space and compared to traditional and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure enables targeting individual modes during optimization, leading to improved performance tuning. Results demonstrate that the Koopman-based controller surpasses other benchmark controllers in both reference tracking and disturbance rejection under sea-level and varying flight conditions, due to its global nature.
comment: 34 pages, 28 figures. Under review at Springer Nonlinear Dynamics
♻ ☆ Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
♻ ☆ Robust Set Partitioning Strategy for Malicious Information Detection in Large-Scale Internet of Things
With the rapid development of the Internet of Things (IoT), the risks of data tampering and malicious information injection have intensified, making efficient threat detection in large-scale distributed sensor networks a pressing challenge. To address the decline in malicious information detection efficiency as network scale expands, this paper investigates a robust set partitioning strategy and, on this basis, develops a distributed attack detection framework with theoretical guarantees. Specifically, we introduce a gain mutual influence metric to characterize the inter-subset interference arising during gain updates, thereby revealing the fundamental reason for the performance gap between distributed and centralized algorithms. Building on this insight, the set partitioning strategy based on Grassmann distance is proposed, which significantly reduces the computational cost of gain updates while maintaining detection performance, and ensures that the distributed setting under subset partitioning preserves the same theoretical performance bound as the baseline algorithm. Unlike conventional clustering methods, the proposed set partitioning strategy leverages the intrinsic observational features of sensors for robust partitioning, thereby enhancing resilience to noise and interference. Simulation results demonstrate that the proposed method limits the performance gap between distributed and centralized detection to no more than 1.648$\%$, while the computational cost decreases at an order of $O(1/m)$ with the number of subsets $m$. Therefore, the proposed algorithm effectively reduces computational overhead while preserving detection accuracy, offering a practical low-cost and highly reliable security detection solution for edge nodes in large-scale IoT systems.
comment: 24 pages, 5 figures
Discovery Learning accelerates battery design evaluation
Fast and reliable validation of novel designs in complex physical systems such as batteries is critical to accelerating technological innovation. However, battery research and development remain bottlenecked by the prohibitively high time and energy costs required to evaluate numerous new design candidates, particularly in battery prototyping and life testing. Despite recent progress in data-driven battery lifetime prediction, existing methods require labeled data of target designs to improve accuracy and cannot make reliable predictions until after prototyping, thus falling far short of the efficiency needed to enable rapid feedback for battery design. Here, we introduce Discovery Learning (DL), a scientific machine-learning paradigm that integrates active learning, physics-guided learning, and zero-shot learning into a human-like reasoning loop, drawing inspiration from learning theories in educational psychology. DL can learn from historical battery designs and actively reduce the need for prototyping, thus enabling rapid lifetime evaluation for unobserved material-design combinations without requiring additional data labeling. To test DL, we present 123 industrial-grade large-format lithium-ion pouch cells, spanning eight material-design combinations and diverse cycling protocols. Trained solely on public datasets of small-capacity cylindrical cells, DL achieves 7.2% test error in predicting the average cycle life under unknown device variability. This results in savings of 98% in time and 95% in energy compared to industrial practices. This work highlights the potential of uncovering insights from historical designs to inform and accelerate the development of next-generation battery technologies. DL represents a key advance toward efficient data-driven modeling and helps realize the promise of machine learning for accelerating scientific discovery and engineering innovation.
comment: Main text, 20 pages, 5 figures
♻ ☆ Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to ensure safety. The CMPC then constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, further preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
♻ ☆ On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances
We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. First, we consider the scenario where the exogenous variable can be estimated in a controlled environment, and subsequently a more practical and challenging scenario where it is unknown in a stochastic setting. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming (ADP), combined with a Lyapunov-based analytical methodology, to design the algorithms and derive sample-based approximations motivated by Markov decision process (MDP)-based approaches. For the scenario involving non-measurable disturbances, we further establish stability and convergence guarantees for the learned control gains under sample-based approximations. The overall methodology emphasizes simplicity while providing rigorous guarantees. Finally, numerical experiments focus on the intricacies and validation of the design of offline continuous-time LQR with exogenous disturbances.
comment: 17 pages, 3 figures
♻ ☆ Application of Battery Storage to Switching Predictive Control of Power Distribution Systems Including Road Heating
In regions with heavy snowfall, snow accumulation poses a serious problem for the living environment. Road heating is an electrical system that promotes snow melting in such regions by means of a heating cable buried underground as a thermal source. When integrating road heating into power distribution systems, we need to optimize the flow of electric power by appropriately coordinating distributed power sources and conventional power distribution equipment. In this paper, we introduce battery storage into a power distribution system that includes road heating, and extend the predictive switching control of the authors' previous study to the case where battery storage is installed. As the main result, we propose a predictive switching control that effectively utilizes photovoltaic (PV) power generation and surplus power stored in the battery storage, and simultaneously achieves reduced distribution loss, attenuated voltage fluctuation, and efficient snow melting. We verify the effectiveness of the battery storage through numerical simulation using actual time-series data of weather conditions and of the active power of the PV generation and load.
comment: 14 pages, 14 figures
♻ ☆ Contact feedback helps snake robots propel against uneven terrain using vertical bending
Snakes can bend their elongate bodies in various forms to traverse diverse environments. We understand how snakes use lateral bending to push against asperities on flat ground for propulsion, and snake robots can do so effectively. However, snakes can also use vertical bending to push against terrain of large height variation for propulsion, and they can adjust it to adapt to novel terrain, presumably using mechano-sensing feedback control. Although some snake robots can traverse uneven terrain, few have used vertical bending for propulsion, and how to control it in novel environments is poorly understood. Here we systematically studied a snake robot with force sensors pushing against large bumps using vertical bending to understand the role of sensory feedback control. We compared a feedforward controller and four feedback controllers that use different senses and generate distinct bending patterns and body-terrain interactions. We challenged the robot with increasing backward load and novel terrain geometry that break its contact with the terrain. We further varied how much the feedback control modulated bending to conform to or push against the terrain to test these effects. Feedforward propagation of vertical bending generated large propulsion when the shape matched the terrain geometry. However, when perturbations caused loss of contact, the robot easily lost propulsion or suffered motor overload. Contact feedback control resolved these issues by improving contact. Yet excessive conformation interrupted propagation, and excessive pushing stalled motors. Unlike propulsion using lateral bending, in propulsion using vertical bending, body weight can help maintain contact with the environment but may also overload motors. Our results will help snake robots better traverse terrain with large height variation and can inform how snakes use sensory feedback to control vertical bending for propulsion.
comment: 62 pages, 20 figures
♻ ☆ The Asymptotic Behavior of Attention in Transformers
The transformer architecture has become the foundation of modern Large Language Models (LLMs), yet its theoretical properties are still not well understood. As with classic neural networks, a common approach to improve these models is to increase their size and depth. However, such strategies may be suboptimal, as several works have shown that adding more layers yields increasingly diminishing returns. More importantly, prior studies have shown that increasing depth may lead to model collapse, i.e., all the tokens converge to a single cluster, undermining the ability of LLMs to generate diverse outputs. Building on differential equation models for the transformer dynamics, we prove that all the tokens in a transformer asymptotically converge to a cluster as depth increases. At the technical level we leverage tools from control theory, including consensus dynamics on manifolds and input-to-state stability (ISS). We then extend our analysis to autoregressive models, exploiting their structure to further generalize the theoretical guarantees.
Optimization and Control 39
☆ Introducing Clause Cuts: Strong No-Good Cuts for MaxSAT Problems in Mixed Integer Linear Programming
In this paper we introduce Clause Cuts: linear inequalities obtained from clauses that are logically implied by a CNF formula, resembling strengthened no-good cuts. With these cuts, we tighten mixed-integer linear programming (MILP) formulations of random weighted partial MaxSAT problems, which have remained particularly challenging for core-guided complete MaxSAT solvers. Our approaches treat variables that attain integral values at the LP relaxation as partial assignments which are supplied to a SAT solver as assumptions. When infeasible, these assignments are ruled out with Clause Cuts which are further strengthened with the SAT solver. Two separation algorithms are introduced, one that utilizes a SAT-oracle and finds Clause Cuts in the set of variables with integral values, and another that uses the clauses learned by a conflict driven clause learning (CDCL) SAT solver while evaluating the partial assignment. Experiments on SATLIB benchmarks demonstrate substantial performance gains of up to two orders of magnitude compared to the general purpose MILP solver Gurobi 12, taking only 7.8% of Gurobi's runtime for the whole problem set. Results also surpass the specialized MaxSAT solver RC2, taking only 60% of its runtime. In some cases, our optimization takes only slightly longer than a single SAT call on the SAT-formula. We explain the source of these gains and the limitations of standard MILP formulations in this context.
comment: 21 pages, 9 figures
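For context, the standard clause-to-inequality mapping that Clause Cuts build on (example ours): any clause logically implied by the CNF formula must hold in every feasible 0/1 solution, and a clause such as $C = (x_1 \vee \neg x_2 \vee x_3)$ translates to the linear cut
$$x_1 + (1 - x_2) + x_3 \ge 1,$$
which excludes precisely the assignments falsifying $C$ (here $x_1 = 0$, $x_2 = 1$, $x_3 = 0$). A no-good cut is the special case where the clause rules out a single complete assignment; shorter implied clauses therefore yield stronger cuts.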
☆ Regularized Overestimated Newton
We propose Regularized Overestimated Newton (RON), a Newton-type method with low per-iteration cost and strong global and local convergence guarantees for smooth convex optimization. RON interpolates between gradient descent and globally regularized Newton, with behavior determined by the largest Hessian overestimation error. Globally, when the optimality gap of the objective is large, RON achieves an accelerated $O(n^{-2})$ convergence rate; when small, its rate becomes $O(n^{-1})$. Locally, RON converges superlinearly and linearly when the overestimation is exact and inexact, respectively, toward possibly non-isolated minima under the local Quadratic Growth (QG) condition. The linear rate is governed by an improved effective condition number depending on the overestimation error. Leveraging a recent randomized rank-$k$ Hessian approximation algorithm, we obtain a practical variant with $O(\text{dim}\cdot k^2)$ cost per iteration. When the Hessian rank is uniformly below $k$, RON achieves a per-iteration cost comparable to that of first-order methods while retaining the superior convergence rates even in degenerate local landscapes. We validate our theoretical findings through experiments on entropic optimal transport and inverse problems.
☆ Scalable Second-order Riemannian Optimization for $K$-means Clustering
Clustering is a hard discrete optimization problem. Nonconvex approaches such as low-rank semidefinite programming (SDP) have recently demonstrated promising statistical and local algorithmic guarantees for cluster recovery. Due to the combinatorial structure of the $K$-means clustering problem, current relaxation algorithms struggle to balance their constraint feasibility and objective optimality, presenting tremendous challenges in computing the second-order critical points with rigorous guarantees. In this paper, we provide a new formulation of the $K$-means problem as a smooth unconstrained optimization over a submanifold and characterize its Riemannian structures to allow it to be solved using a second-order cubic-regularized Riemannian Newton algorithm. By factorizing the $K$-means manifold into a product manifold, we show how each Newton subproblem can be solved in linear time. Our numerical experiments show that the proposed method converges significantly faster than the state-of-the-art first-order nonnegative low-rank factorization method, while achieving similarly optimal statistical accuracy.
☆ A regret minimization approach to fixed-point iterations
We propose a conversion scheme that turns regret minimizing algorithms into fixed point iterations, with convergence guarantees following from regret bounds. The resulting iterations can be seen as a grand extension of the classical Krasnoselskii--Mann iterations, as the latter are recovered by converting the Online Gradient Descent algorithm. This approach yields new simple iterations for finding fixed points of non-self operators. We also focus on converting algorithms from the AdaGrad family of regret minimizers, and thus obtain fixed point iterations with adaptive guarantees of a new kind. Numerical experiments on various problems demonstrate faster convergence of AdaGrad-based fixed point iterations over Krasnoselskii--Mann iterations.
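To make the starting point concrete, here is a minimal sketch (ours, not the paper's code) of the classical Krasnoselskii--Mann iteration that the conversion scheme recovers from Online Gradient Descent:

```python
import numpy as np

def krasnoselskii_mann(T, x0, alpha=0.5, iters=200):
    """Krasnoselskii--Mann iteration: x_{k+1} = (1 - alpha) x_k + alpha T(x_k).
    Converges to a fixed point when T is nonexpansive with at least one
    fixed point and alpha lies in (0, 1)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = (1 - alpha) * x + alpha * T(x)
    return x

# A planar rotation is nonexpansive (an isometry) with unique fixed point 0;
# the plain iteration x <- T(x) cycles forever, but the averaged KM
# iteration converges.
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(krasnoselskii_mann(lambda v: R @ v, x0=[1.0, 0.0]))  # ~ [0, 0]
```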
☆ Characterizations of Strongly Quasiconvex Functions
We provide new necessary and sufficient conditions for ensuring strong quasiconvexity in the nonsmooth case and, as a consequence, we provide a proof for the differentiable case. Furthermore, we improve the quadratic growth property for strongly quasiconvex functions.
☆ humancompatible.train: Implementing Optimization Algorithms for Stochastically-Constrained Stochastic Optimization Problems NeurIPS
There has been considerable interest recently in constrained training of deep neural networks (DNNs) for applications such as fairness and safety. Several toolkits have been proposed for this task, yet there is still no industry standard. We present humancompatible.train (https://github.com/humancompatible/train), an easily extendable PyTorch-based Python package for training DNNs with stochastic constraints. We implement multiple previously unimplemented algorithms for stochastically constrained stochastic optimization. We demonstrate the toolkit's use by comparing two algorithms on a deep learning task with fairness constraints.
comment: Accepted at NeurIPS workshop COML 2025
☆ Inverse Reinforcement Learning Using Just Classification and a Few Regressions
Inverse reinforcement learning (IRL) aims to explain observed behavior by uncovering an underlying reward. In the maximum-entropy or Gumbel-shocks-to-reward frameworks, this amounts to fitting a reward function and a soft value function that together satisfy the soft Bellman consistency condition and maximize the likelihood of observed actions. While this perspective has had enormous impact in imitation learning for robotics and understanding dynamic choices in economics, practical learning algorithms often involve delicate inner-loop optimization, repeated dynamic programming, or adversarial training, all of which complicate the use of modern, highly expressive function approximators like neural nets and boosting. We revisit softmax IRL and show that the population maximum-likelihood solution is characterized by a linear fixed-point equation involving the behavior policy. This observation reduces IRL to two off-the-shelf supervised learning problems: probabilistic classification to estimate the behavior policy, and iterative regression to solve the fixed point. The resulting method is simple and modular across function approximation classes and algorithms. We provide a precise characterization of the optimal solution, a generic oracle-based algorithm, finite-sample error bounds, and empirical results showing competitive or superior performance to MaxEnt IRL.
☆ On structured condition number of rational matrix functions
We derive the necessary and sufficient conditions for the simple eigenvalues of rational matrix functions with symmetry structure to have the same normwise condition number with respect to arbitrary and structure-preserving perturbations. We obtain an exact expression for the structured condition number of simple eigenvalues of symmetric, skew-symmetric and even/odd rational matrix functions, and tight bounds are obtained for simple eigenvalues of Hermitian, skew-Hermitian, even/odd, and palindromic rational matrix functions.
comment: 22 pages
☆ Machine Learning Powered Feasible Path Framework with Adaptive Sampling for Black-box Optimization
Black-box optimization (BBO) involves functions that are unknown, inexact, and/or expensive to evaluate. Existing BBO algorithms face several challenges, including high computational cost from extensive evaluations, difficulty in handling complex constraints, lack of theoretical convergence guarantees, and instability due to large variation in solution quality. In this work, a machine-learning-powered feasible path optimization framework (MLFP) is proposed for general BBO problems with complex constraints. An adaptive sampling strategy is first proposed to explore optimal regions and pre-filter potentially infeasible points, reducing the number of evaluations. Machine learning algorithms are leveraged to develop surrogates of the black-box functions. The feasible path algorithm is employed to accelerate convergence by updating only the independent variables rather than all of them. Computational studies demonstrate that MLFP converges rapidly and robustly to a neighborhood of the KKT point, even when surrogates are trained on small datasets. MLFP is superior to state-of-the-art BBO algorithms, as it stably obtains the same or better solutions with fewer evaluations on benchmark examples.
☆ A Riemannian Variational and Spectral Framework for High-Dimensional Sphere Packing: Barrier-Dynamics Reconciliation, Periodic Rigidity, and Discrete-Time Guarantees
We develop a unified framework that reconciles a barrier-based geometric model of periodic sphere packings with a provably convergent discrete-time dynamics. First, we introduce a $C^2$ interior barrier $U_\nu$ that is compatible with a strict feasibility safeguard and has a Lipschitz gradient on the iterates' domain. Second, we correct and formalize the discrete update and give explicit step-size and damping rules. Third, we prove barrier-to-KKT consistency and state an interior variant clarifying the role of the quadratic term. Fourth, we show that strict prestress stability implies periodic infinitesimal rigidity of the contact framework. Fifth, we establish a Lyapunov energy descent principle, an energy-nonexpansive feasibility projection (including a joint $x$ and lattice-basis $B$ variant), and local linear convergence for the Spectral Projected Interior Trajectory (Spit) method. We also provide practical Hessian-vector formulas to estimate smoothness and curvature, minimal schematic illustrations, and a short reproducibility stub. The emphasis is on rigorous assumptions and proofs; empirical evaluation is deferred.
comment: 9 pages
☆ Smoothing Binary Optimization: A Primal-Dual Perspective
Binary optimization is a powerful tool for modeling combinatorial problems, yet scalable and theoretically sound solution methods remain elusive. Conventional solvers often rely on heuristic strategies with weak guarantees or struggle with large-scale instances. In this work, we introduce a novel primal-dual framework that reformulates unconstrained binary optimization as a continuous minimax problem, satisfying a strong max-min property. This reformulation effectively smooths the discrete problem, enabling the application of efficient gradient-based methods. We propose a simultaneous gradient descent-ascent algorithm that is highly parallelizable on GPUs and provably converges to a near-optimal solution in linear time. Extensive experiments on large-scale problems--including Max-Cut, MaxSAT, and Maximum Independent Set with up to 50,000 variables--demonstrate that our method identifies high-quality solutions within seconds, significantly outperforming state-of-the-art alternatives.
☆ Feature Augmentation of GNNs for ILPs: Local Uniqueness Suffices
Integer Linear Programs (ILPs) are central to real-world optimizations but notoriously difficult to solve. Learning to Optimize (L2O) has emerged as a promising paradigm, with Graph Neural Networks (GNNs) serving as the standard backbone. However, standard anonymous GNNs are limited in expressiveness for ILPs, and the common enhancement of augmenting nodes with globally unique identifiers (UIDs) typically introduces spurious correlations that severely harm generalization. To address this tradeoff, we propose a parsimonious Local-UID scheme based on d-hop uniqueness coloring, which ensures identifiers are unique only within each node's d-hop neighborhood. Building on this scheme, we introduce ColorGNN, which incorporates color information via color-conditioned embeddings, and ColorUID, a lightweight feature-level variant. We prove that for d-layer networks, Local-UIDs achieve the expressive power of Global-UIDs while offering stronger generalization. Extensive experiments show that our approach (i) yields substantial gains on three ILP benchmarks, (ii) exhibits strong OOD generalization on linear programming datasets, and (iii) further improves a general graph-level task when paired with a state-of-the-art method.
comment: 9 pages, 6 Tables
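A minimal sketch (ours) of one way to realize such identifiers, under the reading that any two nodes within distance $d$ of each other must receive distinct colors, which amounts to a proper coloring of the $d$-th graph power:

```python
import networkx as nx

def local_uid_colors(G, d):
    """Assign each node a color that is unique among all nodes within
    distance d, by greedily coloring the d-th power of G (nodes at
    distance <= d become adjacent and thus get distinct colors)."""
    Gd = nx.power(G, d)
    return nx.coloring.greedy_color(Gd, strategy="largest_first")

# Toy bipartite variable/constraint graph of an ILP
G = nx.Graph([("x1", "c1"), ("x2", "c1"), ("x2", "c2"), ("x3", "c2")])
print(local_uid_colors(G, d=2))  # far-apart nodes may reuse a color
```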
☆ $H^\infty$-control problem for a singular parabolic system with convection
We study the $H^\infty$-control problem for an infinite-dimensional parabolic system with a convection term, perturbed by a singular inverse-square potential, with control distributed in the interior of a domain, extending part of the results by G. Marinoschi (ESAIM: COCV, 2023).
☆ Universal Complexity Bounds for Universal Gradient Methods in Nonlinear Optimization
In this paper, we provide universal first-order methods for composite optimization together with a new complexity analysis. This analysis delivers universal convergence guarantees that are not linked directly to any parametric problem class; however, they can easily be transformed into rates of convergence for particular problem classes by substituting the corresponding upper estimates for the Global Curvature Bound of the objective function. We analyze in this way the simple gradient method for nonconvex minimization, gradient methods for convex composite optimization, and their accelerated variant. For all of them, the only input parameter is the required accuracy of the approximate solution. The accelerated variant of our scheme automatically ensures the best possible rate of convergence simultaneously for all parametric problem classes containing the smooth part of the objective function.
☆ Complexity of Error Bounds for Systems of Linear Inequalities
Error bounds have been studied for over seventy years, beginning with Hoffman's 1952 result [J. Res. Natl. Bur. Standards, 49 (1952), 263-265], which provided a bound on the distance from any point to the solution set of a linear system. However, little is known about the inherent intractability of error bounds. This paper focuses on the complexity of error bounds and the stability of error bounds for systems of linear inequalities. For the complexity of error bounds for linear inequalities, we reformulate the problem as the task of solving finitely many min-max problems, defined by certain rows of the given matrix, over the sphere, and prove that this problem is not in the class P; nevertheless, there exist pseudo-polynomial time algorithms that solve both the error bounds problem and its complement. In particular, the complement is a number problem, but not NP-complete in the strong sense (unless P = NP). For the complexity of the stability of error bounds, we prove that this problem is also not in the class P; however, it becomes polynomially solvable when the dimension is fixed.
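For reference, Hoffman's bound in modern notation (ours): for the solution set $S = \{x : Ax \le b\}$ of a consistent system of linear inequalities, there exists a constant $c = c(A) > 0$ such that
$$\operatorname{dist}(x, S) \le c \,\| (Ax - b)_+ \| \quad \text{for all } x,$$
where $(\cdot)_+$ is the componentwise positive part; the distance to the feasible set is controlled by the constraint-violation residual. The complexity questions above concern deciding and certifying properties of such bounds.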
☆ Gaussian splatting holography
In-line holography offers high space-bandwidth-product imaging with a simplified lens-free optical system. However, in-line holographic reconstruction is troubled by twin images arising from the Hermitian symmetry of complex fields. Twin images disrupt the reconstruction when solving the ill-posed phase retrieval problem: fewer parameters are known than unknown, causing phase ambiguities. State-of-the-art deep-learning and non-learning methods face challenges in balancing data fidelity with twin-image disturbance. We propose Gaussian splatting holography (GSH) for twin-image-suppressed holographic reconstruction. GSH uses Gaussian splatting for optical field representation and compresses the number of unknown parameters by up to 15-fold, transforming the original ill-posed phase retrieval into a well-posed one with reduced phase ambiguities. Additionally, the Gaussian splatting tends to form sharp patterns rather than those with noisy twin-image backgrounds, as each Gaussian has a spatially slowly varying profile. Experiments show that GSH achieves constraint-free recovery for in-line holography with accuracy comparable to state-of-the-art constraint-based methods, with an average peak signal-to-noise ratio of 26 dB and a structural similarity of 0.8. Combined with total variation, GSH can be further improved, obtaining a peak signal-to-noise ratio of 31 dB and a high compression ability of up to 15-fold.
☆ Automated Algorithm Design via Nevanlinna-Pick Interpolation NeurIPS 2025
The synthesis of optimization algorithms typically follows a design-first-analyze-later approach, which often obscures fundamental performance limitations and hinders the systematic design of algorithms operating at the achievable theoretical boundaries. Recently, a framework based on frequency-domain techniques from robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating the design and analysis stages, this framework enables the identification of fundamental performance limits. In this paper, we build on this framework and extend it to address algorithms for solving strongly convex problems with equality constraints. As a result, we obtain a new class of algorithms that offers a sharp trade-off between the number of matrix multiplications per iteration and the convergence rate.
comment: Accepted to DynaFront Workshop at NeurIPS 2025
☆ State-Constrained Chemical Reactions: Discrete-to-Continuous Hamilton--Jacobi Equations and Large Deviations
We study the macroscopic behavior of chemical reactions modeled as random time-changed Poisson processes on discrete state spaces. Using the WKB reformulation, the backward equation of the rescaled process leads to a discrete Hamilton--Jacobi equation with state constraints. As the grid size tends to zero, the limiting solution and its associated variational representation are closely connected to the good rate function of the large deviation principle for state-constrained chemical reactions in the thermodynamic limit. In this work, we focus on the limiting behavior of discrete Hamilton--Jacobi equations defined on bounded domains with state-constraint boundary conditions. For a single chemical reaction, we show that, under a suitable reparametrization, the solution of the discrete Hamilton--Jacobi equation converges to the solution of a continuous Hamilton--Jacobi equation with a Neumann boundary condition. Building on this convergence result and the associated variational representation, we establish the large deviation principle for the rescaled chemical reaction process in bounded domains.
☆ Automated algorithm design for convex optimization problems with linear equality constraints
Synthesis of optimization algorithms typically follows a {\em design-then-analyze\/} approach, which can obscure fundamental performance limits and hinder the systematic development of algorithms that operate near these limits. Recently, a framework grounded in robust control theory has emerged as a powerful tool for automating algorithm synthesis. By integrating design and analysis stages, fundamental performance bounds are revealed and synthesis of algorithms that achieve them is enabled. In this paper, we apply this framework to design algorithms for solving strongly convex optimization problems with linear equality constraints. Our approach yields a single-loop, gradient-based algorithm whose convergence rate is independent of the condition number of the constraint matrix. This improves upon the best known rate within the same algorithm class, which depends on the product of the condition numbers of the objective function and the constraint matrix.
comment: Accepted to the 64th IEEE Conference on Decision and Control (CDC), 2025
☆ Linear Risk Sharing on Networks
Over the past decade, alternatives to traditional insurance and banking have grown in popularity. The desire to encourage local participation has led products such as peer-to-peer insurance, reciprocal contracts, and decentralized finance platforms to increasingly rely on network structures to redistribute risk among participants. In this paper, we develop a comprehensive framework for linear risk sharing (LRS), where random losses are reallocated through nonnegative linear operators that can accommodate a wide range of networks. Building on the theory of stochastic and doubly stochastic matrices, we establish conditions under which constraints such as budget balance, fairness, and diversification are guaranteed. The convex order framework allows us to compare different allocations rigorously, highlighting variance reduction and majorization as natural consequences of doubly stochastic mixing. We then extend the analysis to network-based sharing, showing how network topology shapes risk outcomes in complete, star, ring, random, and scale-free graphs. A second layer of randomness, where the sharing matrix itself is random, is introduced via Erd\H{o}s--R\'enyi and preferential-attachment networks, connecting risk-sharing properties to degree distributions. Finally, we study convex combinations of identity and network-induced operators, capturing the trade-off between self-retention and diversification. Our results provide design principles for fair and efficient peer-to-peer insurance and network-based risk pooling, combining mathematical soundness with economic interpretability.
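A minimal numerical illustration (ours) of the variance-reduction consequence: for doubly stochastic $P$ and i.i.d. losses with variance $\sigma^2$, participant $i$'s reallocated loss $(PX)_i$ has variance $\sigma^2 \sum_j P_{ij}^2 \le \sigma^2$, with full pooling as the extreme case:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 5, 200_000
X = rng.exponential(scale=1.0, size=(n_sims, n))  # i.i.d. losses, variance 1

P_id = np.eye(n)                   # no sharing: keep your own loss
P_pool = np.full((n, n), 1.0 / n)  # full pooling: equal split
# A ring-style doubly stochastic mix: share with two neighbours
P_ring = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, axis=1)
                                   + np.roll(np.eye(n), -1, axis=1))

for name, P in [("identity", P_id), ("ring", P_ring), ("pool", P_pool)]:
    shared = X @ P.T  # participant i receives (P X)_i in each simulation
    print(name, "per-participant variance:", shared.var(axis=0).round(3))
# Means are preserved (rows sum to 1); variance shrinks with more mixing.
```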
♻ ☆ On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks
This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the $\ell_2$-norm estimator overcomes the challenges as long as the probability that the system is under adversarial attack at a given time is smaller than a certain threshold. We provide an estimation error bound that decays with the input memory length and prove its optimality by constructing a problem instance that suffers from the same bound under adversarial attacks. Our work provides a sharp input-output analysis for a generic nonlinear and partially observed system under significantly generalized assumptions compared to existing works.
comment: 26 pages, 3 figures
♻ ☆ A Variational Framework for Residual-Based Adaptivity in Neural PDE Solvers and Operator Learning
Residual-based adaptive strategies are widely used in scientific machine learning but remain largely heuristic. We introduce a unifying variational framework that formalizes these methods by integrating convex transformations of the residual. Different transformations correspond to distinct objective functionals: exponential weights target the minimization of uniform error, while linear weights recover the minimization of quadratic error. Within this perspective, adaptive weighting is equivalent to selecting sampling distributions that optimize the primal objective, thereby linking discretization choices directly to error metrics. This principled approach yields three benefits: (1) it enables systematic design of adaptive schemes across norms, (2) reduces discretization error through variance reduction of the loss estimator, and (3) enhances learning dynamics by improving the gradient signal-to-noise ratio. Extending the framework to operator learning, we demonstrate substantial performance gains across optimizers and architectures. Our results provide a theoretical justification of residual-based adaptivity and establish a foundation for principled discretization and training strategies.
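As a generic sketch of the sampling-distribution view (ours; the paper's exact transforms and normalizations may differ): residual-based adaptivity amounts to drawing collocation points from a distribution proportional to a convex transformation of the residual, for instance linear or exponential weights:

```python
import numpy as np

def residual_adaptive_sample(candidates, residual_fn, n_pick, transform):
    """Draw collocation points with probability proportional to a convex
    transformation of the PDE residual magnitude."""
    r = np.abs(residual_fn(candidates))
    w = transform(r)
    p = w / w.sum()
    idx = np.random.choice(len(candidates), size=n_pick, replace=False, p=p)
    return candidates[idx]

candidates = np.random.uniform(-1.0, 1.0, size=(10_000, 1))
residual_fn = lambda x: np.sin(5.0 * x[:, 0])  # stand-in for a PDE residual

linear_pts = residual_adaptive_sample(candidates, residual_fn, 512,
                                      transform=lambda r: r)             # linear weights
exp_pts = residual_adaptive_sample(candidates, residual_fn, 512,
                                   transform=lambda r: np.exp(4.0 * r))  # exponential weights
```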
♻ ☆ The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm
Computing the polar decomposition and the related matrix sign function has been a well-studied problem in numerical analysis for decades. Recently, it has emerged as an important subroutine within the Muon algorithm for training deep neural networks. However, the requirements of this application differ sharply from classical settings: deep learning demands GPU-friendly algorithms that prioritize high throughput over high precision. We introduce Polar Express, a new method for computing the polar decomposition. Like Newton-Schulz and other classical polynomial methods, our approach uses only matrix-matrix multiplications, making it very efficient on GPUs. Inspired by earlier work of Chen & Chow and Nakatsukasa & Freund, Polar Express adapts the update rule at each iteration by solving a minimax optimization problem. We prove that this strategy minimizes error in a worst-case sense, allowing Polar Express to converge as rapidly as possible both in the early iterations and asymptotically. We also address finite-precision issues, making it practical to use in bfloat16. When integrated into the Muon training framework, our method leads to consistent improvements in validation loss when training a GPT-2 model on one billion tokens from the FineWeb dataset, outperforming recent alternatives across a range of learning rates.
comment: 34 pages, 8 figures, 4 algorithms
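For orientation, a sketch (ours) of the classical Newton--Schulz baseline that Polar Express improves upon, with the textbook coefficients rather than the paper's optimized minimax update rules:

```python
import numpy as np

def newton_schulz_polar(A, iters=30):
    """Classical Newton--Schulz iteration for the orthogonal polar factor
    U of A = U P. Uses only matrix-matrix products, hence GPU-friendly;
    more iterations may be needed for ill-conditioned inputs."""
    X = A / np.linalg.norm(A, ord="fro")  # crude normalization into the
    for _ in range(iters):                # region of convergence
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

A = np.random.randn(64, 64)
U = newton_schulz_polar(A)
print(np.linalg.norm(U.T @ U - np.eye(64)))  # ~ 0 (near-orthogonal)
```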
♻ ☆ The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization NeurIPS 2018
Motivated by applications in Optimization, Game Theory, and the training of Generative Adversarial Networks, the convergence properties of first order methods in min-max problems have received extensive study. It has been recognized that they may cycle, and there is no good understanding of their limit points when they do not. When they converge, do they converge to local min-max solutions? We characterize the limit points of two basic first order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA). We show that both dynamics avoid unstable critical points for almost all initializations. Moreover, for small step sizes and under mild assumptions, the set of OGDA-stable critical points is a superset of GDA-stable critical points, which is a superset of local min-max solutions (strict in some cases). The connecting thread is that the behavior of these dynamics can be studied from a dynamical systems perspective.
comment: Appeared in NeurIPS 2018. The new version fixes typos
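A compact illustration (ours) of the contrast on the bilinear toy problem $f(x, y) = xy$, whose unique min-max point is $(0, 0)$: plain GDA spirals outward, while OGDA converges:

```python
eta = 0.1
# f(x, y) = x * y, so grad_x f = y and grad_y f = x.

# Gradient Descent/Ascent (GDA): x descends, y ascends.
x, y = 1.0, 1.0
# Optimistic GDA (OGDA): uses 2*(current grad) - (previous grad).
ox, oy = 1.0, 1.0
px, py = oy, ox  # previous gradients, initialized at the start point

for _ in range(2000):
    x, y = x - eta * y, y + eta * x  # GDA spirals outward (|1 + i*eta| > 1)

    gx, gy = oy, ox                  # current gradients at (ox, oy)
    ox = ox - eta * (2 * gx - px)
    oy = oy + eta * (2 * gy - py)
    px, py = gx, gy

print("GDA :", round(x, 3), round(y, 3))    # large magnitude (diverged)
print("OGDA:", round(ox, 9), round(oy, 9))  # near (0, 0)
```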
FusedANN: Convexified Hybrid ANN via Attribute-Vector Fusion
Vector search powers transformer technology, but real-world use demands hybrid queries that combine vector similarity with attribute filters (e.g., "top document in category X, from 2023"). Current solutions trade off recall, speed, and flexibility, relying on fragile index hacks that don't scale. We introduce FusedANN (Fused Attribute-Vector Nearest Neighbor), a geometric framework that elevates filtering to ANN optimization constraints and introduces a convex fused space via a Lagrangian-like relaxation. Our method jointly embeds attributes and vectors through transformer-based convexification, turning hard filters into continuous, weighted penalties that preserve top-k semantics while enabling efficient approximate search. We prove that FusedANN reduces to exact filtering under high selectivity, gracefully relaxes to semantically nearest attributes when exact matches are insufficient, and preserves downstream ANN alpha-approximation guarantees. Empirically, FusedANN improves query throughput by eliminating brittle filtering stages, achieving superior recall-latency tradeoffs on standard hybrid benchmarks without specialized index hacks, delivering up to 3 times higher throughput and better recall than state-of-the-art hybrid and graph-based systems. Theoretically, we provide explicit error bounds and parameter selection rules that make FusedANN practical for production. This establishes a principled, scalable, and verifiable bridge between symbolic constraints and vector similarity, unlocking a new generation of filtered retrieval systems for large, hybrid, and dynamic NLP/ML workloads.
comment: 62 pages, 12 figures
♻ ☆ Two-level overlapping additive Schwarz preconditioner for training scientific machine learning applications
We introduce a novel two-level overlapping additive Schwarz preconditioner for accelerating the training of scientific machine learning applications. The design of the proposed preconditioner is motivated by the nonlinear two-level overlapping additive Schwarz preconditioner. The neural network parameters are decomposed into groups (subdomains) with overlapping regions. In addition, the network's feed-forward structure is indirectly imposed through a novel subdomain-wise synchronization strategy and a coarse-level training step. Through a series of numerical experiments, which consider physics-informed neural networks and operator learning approaches, we demonstrate that the proposed two-level preconditioner significantly speeds up the convergence of the standard (LBFGS) optimizer while also yielding more accurate machine learning models. Moreover, the devised preconditioner is designed to take advantage of model-parallel computations, which can further reduce the training time.
comment: 24 pages, 9 figures
♻ ☆ Understanding Optimization in Deep Learning with Central Flows ICLR 2025
Traditional theories of optimization cannot describe the dynamics of optimization in deep learning, even in the simple setting of deterministic training. The challenge is that optimizers typically operate in a complex, oscillatory regime called the "edge of stability." In this paper, we develop theory that can describe the dynamics of optimization in this regime. Our key insight is that while the *exact* trajectory of an oscillatory optimizer may be challenging to analyze, the *time-averaged* (i.e. smoothed) trajectory is often much more tractable. To analyze an optimizer, we derive a differential equation called a "central flow" that characterizes this time-averaged trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these central flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers "adapt" to the local loss landscape; and how adaptive optimizers implicitly navigate towards regions where they can take larger steps. Our results suggest that central flows can be a valuable theoretical tool for reasoning about optimization in deep learning.
comment: First two authors contributed equally; author order determined by coin flip. This is the full version of a paper published at ICLR 2025. We encourage readers to explore the blog version of this paper, with animated optimization trajectories, at https://centralflows.github.io. Our code can be found at https://github.com/centralflows/centralflows
♻ ☆ Data-driven stabilization of polynomial systems using density functions
This paper studies data-driven stabilization of a class of unknown polynomial systems using data corrupted by bounded noise. Existing work addressing this problem has focused on designing a controller and a Lyapunov function so that a certain state-dependent matrix is negative definite, which ensures asymptotic stability of all closed-loop systems compatible with the data. However, as we demonstrate in this paper, considering the negative definiteness of this matrix introduces conservatism, which limits the applicability of current approaches. To tackle this issue, we develop a new method for the data-driven stabilization of polynomial systems using the concept of density functions. The control design consists of two steps. Firstly, a dual Lyapunov theorem is used to formulate a sum of squares program that allows us to compute a rational state feedback controller for all systems compatible with the data. By the dual Lyapunov theorem, this controller ensures that the trajectories of the closed-loop system converge to zero for almost all initial states. Secondly, we propose a method to verify whether the designed controller achieves asymptotic stability of all closed-loop systems compatible with the data. Apart from reducing conservatism of existing methods, the proposed approach can also readily take into account prior knowledge on the system parameters. A key technical result developed in this paper is a new type of S-lemma for a specific class of matrices that, in contrast to the classical S-lemma, avoids the use of multipliers.
♻ ☆ Monotone Lipschitz-Gradient Denoiser: Explainability of Operator Regularization Approaches Free From Lipschitz Constant Control
This paper addresses explainability of the operator-regularization approach under the use of monotone Lipschitz-gradient (MoL-Grad) denoiser -- an operator that can be expressed as the Lipschitz continuous gradient of a differentiable convex function. We prove that an operator is a MoL-Grad denoiser if and only if it is the ``single-valued'' proximity operator of a weakly convex function. An extension of Moreau's decomposition is also shown with respect to a weakly convex function and the conjugate of its convexified function. Under these arguments, two specific algorithms, the forward-backward splitting algorithm and the primal-dual splitting algorithm, are considered, both employing MoL-Grad denoisers. These algorithms generate a sequence of vectors converging weakly, under conditions, to a minimizer of a certain cost function which involves an ``implicit regularizer'' induced by the denoiser. Unlike the previous studies of operator regularization, our framework requires no control of the Lipschitz constant in learning the denoiser. The theoretical findings are supported by simulations.
comment: 17 pages, 6 figures
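For context, the classical identity being extended here is Moreau's decomposition (standard form, not the paper's weakly convex generalization): for a proper closed convex function $f$ with convex conjugate $f^*$,
$$x = \operatorname{prox}_{f}(x) + \operatorname{prox}_{f^{*}}(x) \quad \text{for all } x;$$
the paper's extension replaces $f$ by a weakly convex function and pairs it with the conjugate of its convexified function.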
♻ ☆ A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections or Strong Convexity
We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in the field of reinforcement learning. We are interested in the so-called ``robust'' setting, where the convergence guarantee does not depend on the minimal curvature of the potential function. While prior work has established convergence guarantees in this setting, these results typically rely on the assumption that each iterate is projected onto a bounded set, a condition that is both artificial and does not match the current practice. In this paper, we challenge the necessity of such an assumption and present a refined analysis of TD learning. For the first time, we show that the simple projection-free variant converges with a rate of $\widetilde{\mathcal{O}}(\frac{||\theta^*||^2_2}{\sqrt{T}})$, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
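A minimal sketch (ours) of the projection-free algorithm being analyzed, TD(0) with linear function approximation on a small Markov reward process (tabular features for concreteness; with a constant step size the iterates settle near, rather than exactly at, the fixed point):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3-state Markov reward process (policy evaluation setting)
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, -1.0])
phi = np.eye(3)  # tabular features; any feature map phi(s) in R^d works
gamma, alpha = 0.9, 0.05

theta, s = np.zeros(3), 0
for _ in range(200_000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update; note: no projection of theta onto a bounded set
    delta = r[s] + gamma * theta @ phi[s_next] - theta @ phi[s]
    theta += alpha * delta * phi[s]
    s = s_next

# Compare with the exact solution of the Bellman equation v = r + gamma*P*v
print(theta, np.linalg.solve(np.eye(3) - gamma * P, r))
```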
♻ ☆ Data-driven h2 model reduction for linear discrete-time systems
We present a data-driven framework for $h^{2}$-optimal model reduction of linear discrete-time systems. Our main contribution is to create optimal reduced-order models in the $h^{2}$-norm sense directly from measurement data alone, without using information about the original system. In particular, we exploit the fact that the gradients of the $h^{2}$ model reduction problem are expressed using the discrete-time Lyapunov equation and the discrete-time Sylvester equation, and derive data-driven gradients. The proposed algorithm uses the output of an existing MOR method as the initial point, and convergence to a stationary point is guaranteed under certain assumptions. In numerical experiments, we demonstrate that, for a modeling task in neuroscience, our method constructs a reduced-order model that outperforms DMDc in terms of the $h^{2}$-norm.
comment: 8 pages, 1 figure
♻ ☆ Geometry and factorization of multivariate Markov chains with applications to MCMC acceleration
This paper analyzes the factorizability and geometry of transition matrices of multivariate Markov chains. Specifically, we demonstrate that the induced chains on factors of a product space can be regarded as information projections with respect to the Kullback-Leibler divergence. This perspective yields Han-Shearer type inequalities and submodularity of the entropy rate of Markov chains, as well as applications in the context of large deviations and mixing time comparison. As concrete algorithmic applications in Markov chain Monte Carlo (MCMC), we provide two illustrations based on lifted MCMC and swapping algorithm respectively to demonstrate projection samplers improve mixing over the original samplers. The projection sampler based on the swapping algorithm resamples the highest-temperature coordinate at stationarity at each step, and we prove that such practice accelerates the mixing time by multiplicative factors related to the number of temperatures and the dimension of the underlying state space when compared with the original swapping algorithm. Through simple numerical experiments on a bimodal target distribution, we show that the projection samplers mix effectively, in contrast to lifted MCMC and the swapping algorithm, which mix less well.
comment: 41 pages, 4 figures
♻ ☆ FEM-based A-optimal sensor placement for heat source inversion from final time measurement
Within the field of optimal experimental design, \emph{sensor placement} refers to the act of finding the optimal locations of data collecting sensors, with the aim to optimise reconstruction of an unknown parameter from finite data. In this work, we investigate sensor placement for the inverse problem of reconstructing a heat source given final time measurements. Employing forward and adjoint analysis of this PDE-driven model, we show how one can leverage the first author's recently invented \emph{redundant-dominant $p$-continuation} algorithm to obtain binary A-optimal sensor placements also for this time-dependent model.
comment: To be submitted to the Proceedings of Domain Decomposition Methods in Science and Engineering XXIX
♻ ☆ Isogeometric Topology Optimization Based on Topological Derivatives
Topology optimization is a valuable tool in engineering, facilitating the design of optimized structures. However, topological changes often require a remeshing step, which can become challenging. In this work, we propose an isogeometric approach to topology optimization driven by topological derivatives. The combination of a level-set method together with an immersed isogeometric framework allows seamless geometry updates without the necessity of remeshing. At the same time, topological derivatives provide topological modifications without the need to define initial holes [7]. We investigate the influence of higher-degree basis functions in both the level-set representation and the approximation of the solution. Two numerical examples demonstrate the proposed approach, showing that employing higher-degree basis functions for approximating the solution improves accuracy, while linear basis functions remain sufficient for the level-set function representation.
comment: 19 pages, 11 figures, pre-print
♻ ☆ The Stochastic Steepest Descent Method for Robust Optimization in Banach Spaces
Stochastic gradient methods have been a popular and powerful choice of optimization methods, aimed at minimizing functions. Their advantage lies in the fact that one approximates the gradient as opposed to using the full Jacobian matrix. One related research direction has been the application to infinite-dimensional problems, where one may naturally have a Hilbert space framework. However, there has been limited work on considering this in a more general setup, such as where the natural framework is that of a Banach space. This article aims to address this by introducing a novel stochastic method, the stochastic steepest descent method (SSD). The SSD follows the spirit of stochastic gradient descent, which utilizes Riesz representation to identify gradients and derivatives. Our motivation for such a method is that it naturally lends itself to a Banach space setting, whose benefits recent applications, such as PDE-constrained shape optimization, have exploited. We provide a convergence theory for this method under mild assumptions. Furthermore, we demonstrate its performance on a couple of numerical applications, namely a $p$-Laplacian problem and an optimal control problem. Our assumptions are verified in these applications.
♻ ☆ Primal-dual proximal bundle and conditional gradient methods for convex problems
This paper studies the primal-dual convergence and iteration-complexity of proximal bundle methods for solving nonsmooth problems with convex structures. More specifically, we develop a family of primal-dual proximal bundle methods for solving convex nonsmooth composite optimization problems and establish the iteration-complexity in terms of a primal-dual gap. We also propose a class of proximal bundle methods for solving convex-concave nonsmooth composite saddle-point problems and establish the iteration-complexity to find an approximate saddle-point. This paper places special emphasis on the primal-dual perspective of the proximal bundle method. In particular, we discover an interesting duality between the conditional gradient method and the cutting-plane scheme used within the proximal bundle method. Leveraging this duality, we further develop novel variants of both the conditional gradient method and the cutting-plane scheme. Additionally, we report numerical experiments to demonstrate the effectiveness and efficiency of the proposed proximal bundle methods in comparison with the subgradient method for solving a regularized matrix game.
comment: 41 pages
♻ ☆ Application of Battery Storage to Switching Predictive Control of Power Distribution Systems Including Road Heating
In regions with heavy snowfall, heavy snow accumulation poses a serious problem for the living environment. Road heating is an electrical device which promotes snow melting by burying a heating cable as a thermal source underground in such regions. When integrating road heating into power distribution systems, we need to optimize the flow of electric power by appropriately integrating distributed power sources and conventional power distribution equipment. In this paper, we introduce battery storage to the power distribution system including road heating, and extend the predictive switching control of such systems from the authors' previous study to the case where battery storage is installed. As a main result, we propose a predictive switching control that effectively utilizes photovoltaic (PV) power generation and surplus power stored in the battery storage, and simultaneously achieves reduction of distribution loss, attenuation of voltage fluctuation, and efficient snow melting. We verify the effectiveness of introducing battery storage through numerical simulation using actual time series data of weather conditions and of the active power of the PV generation and load.
comment: 14 pages, 14 figures
♻ ☆ A Linear-Quadratic Stackelberg Differential Game with Mixed Deterministic and Stochastic Controls
This paper is concerned with a linear-quadratic (LQ) leader-follower differential game with mixed deterministic and stochastic controls. In the game, the follower is a random controller, meaning that the follower can choose adapted stochastic processes, while the leader is a deterministic controller, meaning that the leader can choose only deterministic time functions. Such a problem is motivated by a pension fund insurance problem, with the government, supervisor or employer being a deterministic leader and an individual producer or retail investor being a random follower. An open-loop Stackelberg equilibrium solution is considered. First, an optimal control process of the follower is characterized by a stationary condition of a forward-backward stochastic differential equation (FBSDE) and a convexity condition of an SDE. It is then represented as a linear functional of the optimal state variable of the follower and the leader's control variable, via a classical Riccati equation. Next, an optimal control function of the leader is characterized by a convexity condition of an FBSDE and a stationary condition of a mean-field type FBSDE, and it is represented as a functional of the expectation of the optimal state variable of the leader, with the help of a system consisting of two cross-coupled Riccati equations and a two-point boundary value problem of ordinary differential equations (ODEs). The solvability of this new system of Riccati equations and of the two-point boundary value problem is investigated.
comment: 24 pages
♻ ☆ The Asymptotic Behavior of Attention in Transformers
The transformer architecture has become the foundation of modern Large Language Models (LLMs), yet its theoretical properties are still not well understood. As with classic neural networks, a common approach to improve these models is to increase their size and depth. However, such strategies may be suboptimal, as several works have shown that adding more layers yields increasingly diminishing returns. More importantly, prior studies have shown that increasing depth may lead to model collapse, i.e., all the tokens converge to a single cluster, undermining the ability of LLMs to generate diverse outputs. Building on differential equation models for the transformer dynamics, we prove that all the tokens in a transformer asymptotically converge to a cluster as depth increases. At the technical level we leverage tools from control theory, including consensus dynamics on manifolds and input-to-state stability (ISS). We then extend our analysis to autoregressive models, exploiting their structure to further generalize the theoretical guarantees.
Systems and Control 36
☆ Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Large Language Models (LLMs) have demonstrated remarkable capabilities in knowledge acquisition, reasoning, and tool use, making them promising candidates for autonomous agent applications. However, training LLM agents for complex multi-turn task planning faces significant challenges, including sparse episode-wise rewards, credit assignment across long horizons, and the computational overhead of reinforcement learning in multi-turn interaction settings. To this end, this paper introduces a novel approach that transforms multi-turn task planning into single-turn task reasoning problems, enabling efficient policy optimization through Group Relative Policy Optimization (GRPO) with dense and verifiable rewards from expert trajectories. Our theoretical analysis shows that GRPO improvement on single-turn task reasoning results in a higher multi-turn success probability with the minimal number of turns, as well as in generalization to subtasks with shorter horizons. Experimental evaluation on the complex task planning benchmark demonstrates that our 1.5B parameter model trained with single-turn GRPO achieves superior performance compared to larger baseline models up to 14B parameters, with success rates of 70% for long-horizon planning tasks with over 30 steps. We also validate, both theoretically and empirically, strong cross-task generalizability: models trained on complex tasks can successfully complete all of the simpler subtasks.
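The group-relative advantage at the heart of GRPO is simple to state; a minimal sketch follows, assuming verifiable 0/1 rewards against expert steps (the exact reward design here is an illustrative assumption, not the paper's specification).

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group Relative Policy Optimization: advantages are rewards
    standardized within a group of G sampled completions for one prompt."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# e.g. 6 sampled single-turn plans, scored 1 if they match the expert step
adv = grpo_advantages([1, 0, 0, 1, 1, 0])
# The policy-gradient loss then weights each completion's log-probability
# by its advantage, typically with a PPO-style clipped ratio and a KL
# penalty to the reference model.
```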
☆ Data-Driven State Observers for Measure-Preserving Systems
The increasing use of data-driven control strategies gives rise to the problem of learning-based state observation. Motivated by this need, the present work proposes a data-driven approach for the synthesis of state observers for discrete-time nonlinear systems with measure-preserving dynamics. To this end, Kazantzis-Kravaris/Luenberger (KKL) observers are shown to be well-defined, where the observer design boils down to determining a nonlinear injective mapping of states and its pseudo-inverse. For its learning-based construction, the KKL observer is related to the Koopman and Perron-Frobenius operators, defined on a Sobolev-type reproducing kernel Hilbert space (RKHS) on which they are shown to be normal operators and thus have a spectral resolution. Hence, observer synthesis algorithms, based on kernel interpolation/regression routines for the desired injective mapping in the observer and its pseudo-inverse, are proposed for various settings of available data -- (i) many orbits, (ii) a single long orbit, and (iii) snapshots. Theoretical error analyses are provided, and numerical studies on a chaotic Lorenz system are demonstrated.
comment: 40 pages, 11 figures, submitted to Journal of Nonlinear Science
☆ PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models NeurIPS 2025
Diffusion models have demonstrated strong generative capabilities across scientific domains, but often produce outputs that violate physical laws. We propose a new perspective by framing physics-informed generation as a sparse reward optimization problem, where adherence to physical constraints is treated as a reward signal. This formulation unifies prior approaches under a reward-based paradigm and reveals a shared bottleneck: reliance on diffusion posterior sampling (DPS)-style value function approximations, which introduce non-negligible errors and lead to training instability and inference inefficiency. To overcome this, we introduce Physics-Informed Reward Fine-tuning (PIRF), a method that bypasses value approximation by computing trajectory-level rewards and backpropagating their gradients directly. However, a naive implementation suffers from low sample efficiency and compromised data fidelity. PIRF mitigates these issues through two key strategies: (1) a layer-wise truncated backpropagation method that leverages the spatiotemporally localized nature of physics-based rewards, and (2) a weight-based regularization scheme that improves efficiency over traditional distillation-based methods. Across five PDE benchmarks, PIRF consistently achieves superior physical enforcement under efficient sampling regimes, highlighting the potential of reward fine-tuning for advancing scientific generative modeling.
comment: 18 pages, 6 figures; NeurIPS 2025 AI for science workshop
☆ Adaptive Altitude Control of a Tethered Multirotor Autogyro under Varying Wind Speeds using Differential Rotor Braking
A tethered multirotor autogyro can function as an unmanned aerial vehicle for energy-efficient and prolonged deployment, as it uses the available wind energy to sustain flight. This article presents an adaptive altitude control strategy for such a device. At a constant wind speed, the equilibrium altitude can be approximated by a quadratic function of the pitch angle. The proposed adaptive control estimates the coefficients of this quadratic function. The estimates are used for altitude control and to attain the maximum altitude (and minimum horizontal drift) for a given wind speed. A feedback controller based on regenerative differential rotor braking is used as the actuation to modulate the autogyro's pitch angle. Implementation of the controller using a control-oriented, higher-order dynamic model demonstrates the controller's capability to regulate the altitude and maintain stable flights under varying wind speeds. Based on the system's maximum altitude tracking performance, the adaptive control is adjusted to improve performance under substantial changes in wind speeds.
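One plausible reading of the adaptive scheme is recursive least squares on the quadratic pitch-to-altitude map the abstract describes; the sketch below is such an estimator with a forgetting factor for varying wind, offered as an illustration rather than the paper's exact update law.

```python
import numpy as np

class QuadraticRLS:
    """Recursive least squares for h(pitch) ~ c0 + c1*p + c2*p**2."""
    def __init__(self, lam=0.98):
        self.c = np.zeros(3)
        self.P = 1e3 * np.eye(3)   # large initial covariance
        self.lam = lam             # forgetting factor for changing wind

    def update(self, pitch, altitude):
        x = np.array([1.0, pitch, pitch**2])
        k = self.P @ x / (self.lam + x @ self.P @ x)
        self.c += k * (altitude - x @ self.c)
        self.P = (self.P - np.outer(k, x @ self.P)) / self.lam

    def best_pitch(self):
        # vertex of the fitted parabola (assumes c2 < 0 near the optimum)
        return -self.c[1] / (2 * self.c[2]) if self.c[2] < 0 else 0.0
```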
☆ Approximately Optimal Toll Design for Efficiency and Equity in Arc-Based Traffic Assignment Models
Congestion pricing policies have emerged as promising traffic management tools to alleviate traffic congestion caused by travelers' selfish routing behaviors. The core principle behind deploying tolls is to impose monetary costs on frequently overcrowded routes, to incentivize self-interested travelers to select less easily congested routes. Recent literature has focused on toll design based on arc-based traffic assignment models (TAMs), which characterize commuters as traveling through a traffic network by successively selecting an outgoing arc from every intermediate node along their journey. However, existing tolling mechanisms predicated on arc-based TAMs often target the design of a single congestion-minimizing toll, ignoring crucial fairness considerations, such as the financial impact of high congestion fees on low-income travelers. To address these shortcomings, in this paper, we pose the dual considerations of efficiency and equity in traffic routing as bilevel optimization problems. Since such problems are in general computationally intractable to solve precisely, we construct a linear program approximation by introducing a polytope approximation for the set of all tolls that induce congestion-minimizing traffic flow patterns. Finally, we provide numerical results that validate our theoretical conclusions.
☆ Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Conventional multi-agent reinforcement learning (MARL) methods rely on time-triggered execution, where agents sample and communicate actions at fixed intervals. This approach is often computationally expensive and communication-intensive. To address this limitation, we propose ET-MAPG (Event-Triggered Multi-Agent Policy Gradient reinforcement learning), a framework that jointly learns an agent's control policy and its event-triggering policy. Unlike prior work that decouples these mechanisms, ET-MAPG integrates them into a unified learning process, enabling agents to learn not only what action to take but also when to execute it. For scenarios with inter-agent communication, we introduce AET-MAPG, an attention-based variant that leverages a self-attention mechanism to learn selective communication patterns. AET-MAPG empowers agents to determine not only when to trigger an action but also with whom to communicate and what information to exchange, thereby optimizing coordination. Both methods can be integrated with any policy gradient MARL algorithm. Extensive experiments across diverse MARL benchmarks demonstrate that our approaches achieve performance comparable to state-of-the-art, time-triggered baselines while significantly reducing both computational load and communication overhead.
☆ Adversarial Pursuits in Cislunar Space
Cislunar space is becoming a critical domain for future lunar and interplanetary missions, yet its remoteness, sparse infrastructure, and unstable dynamics create single points of failure. Adversaries in cislunar orbits can exploit these vulnerabilities to pursue and jam co-located communication relays, potentially severing communications between lunar missions and the Earth. We study a pursuit-evasion scenario between two spacecraft in a cislunar orbit, where the evader must avoid a pursuer-jammer while remaining close to its nominal trajectory. We model the evader-pursuer interaction as a zero-sum adversarial differential game cast in the circular restricted three-body problem. This formulation incorporates critical aspects of cislunar orbital dynamics, including autonomous adjustment of the reference orbit phasing to enable aggressive evading maneuvers, and shaping of the evader's cost with the orbit's stable and unstable manifolds. We solve the resulting nonlinear game locally using a continuous-time differential dynamic programming variant, which iteratively applies linear-quadratic approximations to the Hamilton-Jacobi-Isaacs equation. We simulate the evader's behavior against both a worst-case and a linear-quadratic pursuer. Our results pave the way for securing future missions in cislunar space against emerging cyber threats.
comment: 17 pages, 9 figures
☆ On Robustness of Consensus over Pseudo-Undirected Path Graphs
Consensus over networked agents is typically studied using undirected or directed communication graphs. Undirected graphs enforce symmetry in information exchange, leading to convergence to the average of initial states, while directed graphs permit asymmetry but make consensus dependent on root nodes and their influence. Both paradigms impose inherent restrictions on achievable consensus values and network robustness. This paper introduces a theoretical framework for achieving consensus over a class of network topologies, termed pseudo-undirected graphs, which retains bidirectional connectivity between node pairs but allows the corresponding edge weights to differ, including the possibility of negative values under bounded conditions. The resulting Laplacian is generally non-symmetric, yet it guarantees consensus under connectivity assumptions while expanding the solution space, which enables the system to achieve a stable consensus value that can lie outside the convex hull of the initial state set. We derive admissibility bounds for negative weights for a pseudo-undirected path graph, and show an application in the simultaneous interception of a moving target.
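A small simulation makes the point tangible: a three-node pseudo-undirected path with unequal reciprocal weights, one of them negative, still reaches consensus, and the agreement value falls outside the convex hull of the initial states. The weights below are illustrative only; the paper derives the actual admissibility bounds.

```python
import numpy as np

# Pseudo-undirected path 1-2-3: both directions of each edge exist, but
# with different weights; one weight is negative (illustrative values).
W = np.array([[0.0,  1.0,  0.0],
              [2.0,  0.0, -0.25],
              [0.0,  1.0,  0.0]])
L = np.diag(W.sum(axis=1)) - W     # non-symmetric Laplacian, zero row sums
# eigenvalues of L here are {0, 1, 2.75}, so consensus is reached

x = np.array([0.0, 0.0, 4.0])      # initial states
for _ in range(2000):              # Euler integration of x_dot = -L x
    x = x - 0.01 * (L @ x)
print(x)  # ~[-0.36, -0.36, -0.36]: agreement outside the hull [0, 4]
```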
☆ Certified Learning-Enabled Noise-Aware Motion Planning for Urban Air Mobility
Urban Air Mobility (UAM) has emerged as a promising solution to alleviate urban congestion and transportation challenges. Nevertheless, the noise generated by eVTOL aircraft poses a significant barrier to public acceptance and regulatory approval, potentially limiting the operational scope and scalability of UAM systems. Hence, the successful adoption of UAM systems hinges on the ability to predict generated noise levels and to develop motion planning strategies that comply with community-level noise regulations while maintaining operational efficiency. To this end, this paper proposes a novel noise-aware motion planning framework for UAM systems that ensures compliance with noise regulations. We first develop a certifiable neural network model to accurately predict eVTOL noise propagation patterns in urban environments, providing provable bounds on its correctness. To achieve a desired level of accuracy, we propose an active sampling strategy to efficiently build the dataset used to train and test the noise model. Next, we develop a noise-aware motion planning algorithm that utilizes the noise model to generate eVTOL trajectories that guarantee compliance with community noise regulations. The algorithm exploits the monotonic structure of the noise model to efficiently sample the configuration space, ensuring that the generated trajectories are both noise-compliant and operationally efficient. We demonstrate the effectiveness of the proposed framework through a number of experiments for Vahana eVTOLs. The results show that the framework can generate noise-compliant flight plans for a fleet of eVTOLs that adhere to community noise regulations while optimizing operational efficiency.
comment: 16 pages, 10 figures
☆ Choose Your Battles: Distributed Learning Over Multiple Tug of War Games
Consider N players and K games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game where increasing the action of one player decreases the reward for all other players. Each player participates in only one game at any given time. At each time step, a player decides which game they wish to participate in and which action to take in that game. Their reward depends on the actions of all players that are in the same game. This system of K games is termed `Meta Tug-of-War' (Meta-ToW) game. These games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm where the action updates are done using a simple stochastic approximation algorithm, and the decision to switch games is made using an infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.
comment: Submitted to IEEE TAC
☆ Koopman-Operator-Based Model Predictive Control for Drag-free Satellite
This paper presents a data-driven modelling method for the nonlinear dynamics of a drag-free satellite based on Koopman operator theory, and designs a model predictive controller based on the identified model. The nonlinear dynamics of the drag-free satellite are identified and controlled using Sparse Identification of Nonlinear Dynamics (SINDy). Using a manually constructed nonlinear function dictionary as observables, the system approximation is obtained by the SINDy algorithm, and a linear Model Predictive Control (MPC) controller is designed for test mass capture based on the SINDy model. Finally, the effectiveness of the MPC control is verified by numerical examples.
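For reference, the SINDy step the paper builds on is sequentially thresholded least squares over a dictionary of observables; a generic sketch follows (the dictionary and threshold are illustrative, not the paper's choices for the satellite dynamics).

```python
import numpy as np

def sindy(X, X_next, dictionary, threshold=0.05, iters=10):
    """Sequentially thresholded least squares (SINDy) for discrete-time
    dynamics x_{k+1} ~ Theta(x_k) @ Xi, given snapshot pairs (X, X_next)."""
    Theta = np.column_stack([f(X) for f in dictionary])   # library matrix
    Xi = np.linalg.lstsq(Theta, X_next, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(Xi.shape[1]):                      # refit active terms
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], X_next[:, j],
                                             rcond=None)[0]
    return Xi

# Example dictionary of observables (constant, linear, bilinear terms)
dictionary = [lambda X: np.ones(len(X)),
              lambda X: X[:, 0], lambda X: X[:, 1],
              lambda X: X[:, 0] * X[:, 1]]
```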
☆ Orbital Stabilization and Time Synchronization of Unstable Periodic Motions in Underactuated Robots
This paper presents a control methodology for achieving orbital stabilization with simultaneous time synchronization of periodic trajectories in underactuated robotic systems. The proposed approach extends the classical transverse linearization framework to explicitly incorporate time-desynchronization dynamics. To stabilize the resulting extended transverse dynamics, we employ a combination of time-varying LQR and sliding-mode control. The theoretical results are validated experimentally through the implementation of both centralized and decentralized control strategies on a group of six Butterfly robots.
☆ Distributed Koopman Operator Learning from Sequential Observations
This paper presents a distributed Koopman operator learning framework for modeling unknown nonlinear dynamics using sequential observations from multiple agents. Each agent estimates a local Koopman approximation based on lifted data and collaborates over a communication graph to reach exponential consensus on a consistent distributed approximation. The approach supports distributed computation under asynchronous and resource-constrained sensing. Its performance is demonstrated through simulation results, validating convergence and predictive accuracy under sensing-constrained scenarios and limited communication.
☆ Voltage-sensitive distribution factors for contingency analysis and topology optimization
Topology optimization is a promising approach for mitigating congestion and managing changing grid conditions, but it is computationally challenging and requires approximations. Conventional distribution factors like PTDFs and LODFs, based on DC power flow, fail to capture voltage variations, reactive power, and losses - limiting their use in detailed optimization tasks such as busbar splitting. This paper introduces generalized distribution factors derived from a voltage-sensitive linearization of the full AC power flow equations. The proposed formulation accurately reflects reactive power flows, Ohmic losses, and voltage deviations while remaining computationally efficient. We derive and evaluate generalized PTDFs, LODFs, and topology modification factors using matrix identities. We discuss potential applications including voltage-aware N-1 security analysis, and topology optimization with a focus on busbar splitting. Numerical experiments demonstrate close agreement with full AC solutions, significantly outperforming the traditional DC approximation.
comment: 7 pages, 4 figures
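For contrast with the proposed generalization, the conventional DC PTDF construction that the paper improves upon can be written in a few lines; the 3-bus network below is a toy example.

```python
import numpy as np

# Classic DC PTDF: branch-flow sensitivities to nodal injections.
branches = [(0, 1), (1, 2), (0, 2)]      # (from bus, to bus)
b = np.array([10.0, 10.0, 10.0])         # branch susceptances
n_bus, slack = 3, 0

A = np.zeros((len(branches), n_bus))     # branch-bus incidence matrix
for k, (i, j) in enumerate(branches):
    A[k, i], A[k, j] = 1.0, -1.0

Bbus = A.T @ np.diag(b) @ A              # bus susceptance matrix
keep = [i for i in range(n_bus) if i != slack]
Binv = np.zeros((n_bus, n_bus))          # inverse on non-slack buses only
Binv[np.ix_(keep, keep)] = np.linalg.inv(Bbus[np.ix_(keep, keep)])

PTDF = np.diag(b) @ A @ Binv             # rows: branches, columns: buses
print(PTDF.round(3))
```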
☆ Control and Navigation of a 2-D Electric Rocket
This work addresses the control and navigation of a simulated two-dimensional electric rocket. The model provides a simplified framework that neglects actuator dynamics and aerodynamic effects while capturing the complexities of underactuation and state coupling. Trajectory tracking is achieved through a modularized and layered control architecture, employing a Linear Quadratic Regulator (LQR) and Lyapunov theory. Full-state estimation is achieved through Kalman filtering techniques as part of the navigation module. The solutions are thoroughly evaluated in a custom-built MATLAB/Simulink testbed, simulating real-world conditions while maintaining a simplified setup. The results reveal limitations along the lateral axis, whose resolution is suggested for future work.
☆ Modeling and Control of Deep Sign-Definite Dynamics with Application to Hybrid Powertrain Control
Deep learning is increasingly used for complex, large-scale systems where first-principles modeling is difficult. However, standard deep learning models often fail to enforce physical structure or preserve convexity in downstream control, leading to physically inconsistent predictions and discontinuous inputs owing to nonconvexity. We introduce sign constraints--sign restrictions on Jacobian entries--that unify monotonicity, positivity, and sign-definiteness; additionally, we develop model-construction methods that enforce them, together with a control-synthesis procedure. In particular, we design exactly linearizable deep models satisfying these constraints and formulate model predictive control as a convex quadratic program, which yields a unique optimizer and a Lipschitz continuous control law. On a two-tank system and a hybrid powertrain, the proposed approach improves prediction accuracy and produces smoother control inputs than existing methods.
comment: Submitted to Automatica
☆ Scalable and Approximation-free Symbolic Control for Unknown Euler-Lagrange Systems
We propose a novel symbolic control framework for enforcing temporal logic specifications in Euler-Lagrange systems that addresses the key limitations of traditional abstraction-based approaches. Unlike existing methods that require exact system models and provide guarantees only at discrete sampling instants, our approach relies only on bounds on system parameters and input constraints, and ensures correctness for the full continuous-time trajectory. The framework combines scalable abstraction of a simplified virtual system with a closed-form, model-free controller that guarantees trajectories satisfy the original specification while respecting input bounds and remaining robust to unknown but bounded disturbances. We provide feasibility conditions for the construction of confinement regions and analyze the trade-off between efficiency and conservatism. Case studies on pendulum dynamics, a two-link manipulator, and multi-agent systems, including hardware experiments, demonstrate that the proposed approach ensures both correctness and safety while significantly reducing computational time and memory requirements. These results highlight its scalability and practicality for real-world robotic systems where precise models are unavailable and continuous-time guarantees are essential.
☆ CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks
The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. However, training LLMs in such environments remains challenging due to heavy computation, high end-to-end latency, and limited model generalization. We introduce CollaPipe, a hybrid distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving intelligent networks. In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks. Then we perform global model update via federated aggregation. To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power. We derive and use a closed-form convergence bound to design a Dynamic Segment Scheduling and Resource Allocation (DSSDA) algorithm based on Lyapunov optimization, ensuring system stability under long-term constraints. Extensive experiments on downstream tasks with Transformer and BERT models show that CollaPipe improves computation efficiency by up to 15.09%, reduces end-to-end latency by at least 48.98%, and cuts single device memory usage by more than half, enabling online learning in heterogeneous and dynamic communication environments.
comment: Submitted to IEEE for review
☆ An early termination strategy for the distributed biased min-consensus protocol under disturbances
The distributed biased min-consensus (DBMC) protocol is an iterative scheme that solves the shortest path problem asymptotically, requiring only local information exchange between neighboring nodes. By appropriately designing the gain function, prior work [1] proposed a DBMC-based system that ensures convergence within a pre-specified time interval. However, this guarantee assumes the absence of disturbances. In this paper, we study the DBMC-based system under disturbances affecting the edge weights. We first establish rigorous error bounds on the resulting state estimates. Building on this analysis, we then propose a practical early termination strategy to prevent potential singularities, specifically, unbounded gain, that may arise in the presence of disturbances, while still ensuring that the shortest paths are correctly identified. Simulations are performed to validate and illustrate the theoretical results.
comment: paper accepted to IEEE ICNSC 2025
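In its simplest discrete-time, disturbance-free form, the biased min-consensus protocol is a local min-update whose fixed point is the shortest-path distance vector; a sketch follows (the paper's continuous-time gain design and early-termination rule are not reproduced here).

```python
import numpy as np

def dbmc(adj, source, iters=50):
    """Discrete-time biased min-consensus: each non-source node repeatedly
    takes the minimum over neighbors of (neighbor state + edge weight);
    states converge to shortest-path distances from the source."""
    n = len(adj)
    x = np.full(n, np.inf)
    x[source] = 0.0                        # bias at the source node
    for _ in range(iters):
        x_new = x.copy()
        for i in range(n):
            if i != source:
                vals = [x[j] + w for j, w in adj[i]]
                x_new[i] = min(vals) if vals else np.inf
        x = x_new
    return x

# adjacency list: adj[i] = [(neighbor, edge weight), ...]
adj = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 2.0)], 2: [(1, 2.0)]}
print(dbmc(adj, source=0))   # [0., 1., 3.]
```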
☆ Zonotope-Based Elastic Tube Model Predictive Control
Tube-based Model Predictive Control (MPC) is a widely adopted robust control framework for constrained linear systems under additive disturbance. The paper is focused on reducing the numerical complexity associated with the tube parameterization, described as a sequence of elastically-scaled zonotopic sets. A new class of scaled-zonotope inclusion conditions is proposed, alleviating the need for a priori specification of certain set-containment constraints and achieving significant reductions in complexity. A comprehensive complexity analysis is provided for both the polyhedral and the zonotopic setting, illustrating the trade-off between an enlarged domain of attraction and the required computational effort. The proposed approach is validated through extensive numerical experiments.
☆ Dispersion Formation Control: from Geometry to Distribution
We introduce and develop the concept of dispersion formation control, bridging a gap between shape-assembly studies in physics and biology and formation control theory. In current formation control studies, the control objectives typically focus on achieving desired local geometric properties, such as inter-agent distances, bearings, or relative positions. In contrast, our dispersion formation control approach enables agents to directly regulate the dispersion of their spatial distribution, a global variable associated with a covariance matrix. Specifically, we introduce the notion of covariance similarity to define the target spatial dispersion of agents. Building on this framework, we propose two control strategies: a centralized approach to illustrate the key ideas, and a distributed approach that enables agents to control the global dispersion but using only local information. Our stability analysis demonstrates that both strategies ensure exponential convergence of the agents' distribution to the desired dispersion. Notably, controlling a global variable rather than multiple local ones enhances the resiliency of the system, particularly against malfunctioning agents. Simulations validate the effectiveness of the proposed dispersion formation control.
comment: 8 pages, 4 figures
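A centralized gradient-flow caricature of the idea: drive the sample covariance of the agents toward a target matrix. The update below is a plain gradient step on $\|C - C_{des}\|_F^2$ and stands in for, rather than reproduces, the paper's control laws.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(20, 2))            # agent positions (N x 2)
C_des = np.array([[2.0, 0.5],           # target dispersion (covariance)
                  [0.5, 1.0]])

gamma = 0.1
for _ in range(300):
    mu = P.mean(axis=0)
    C = (P - mu).T @ (P - mu) / len(P)  # current sample covariance
    # gradient of ||C - C_des||_F^2 w.r.t. agent i is prop. to
    # (C - C_des)(p_i - mu); each agent needs only C and mu, which
    # could themselves be estimated by consensus in a distributed setting
    P = P - gamma * (P - mu) @ (C - C_des)

print(np.cov(P.T, bias=True).round(2))  # approaches C_des
```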
☆ Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization
We present a method for formal safety verification of learning-based generative motion planners. Generative motion planners (GMPs) offer advantages over traditional planners, but verifying the safety and dynamic feasibility of their outputs is difficult since neural network verification (NNV) tools scale only to a few hundred neurons, while GMPs often contain millions. To preserve GMP expressiveness while enabling verification, our key insight is to imitate the GMP by stabilizing references sampled from the GMP with a small neural tracking controller and then applying NNV to the closed-loop dynamics. This yields reachable sets that rigorously certify closed-loop safety, while the controller enforces dynamic feasibility. Building on this, we construct a library of verified GMP references and deploy them online in a way that imitates the original GMP distribution whenever it is safe to do so, improving safety without retraining. We evaluate across diverse planners, including diffusion, flow matching, and vision-language models, improving safety in simulation (on ground robots and quadcopters) and on hardware (differential-drive robot).
comment: 10 pages, 12 figures
☆ SoK: A Systematic Review of Malware Ontologies and Taxonomies and Implications for the Quantum Era
The threat of quantum malware is real and a growing security concern that will have catastrophic scientific and technological impacts, if not addressed early. If weaponised or exploited, especially in the wrong hands, malware will undermine highly sophisticated critical systems supported by next-generation quantum architectures, for example, in defence, communications, energy, and space. This paper explores the fundamental nature and implications of quantum malware to enable the future development of appropriate mitigations and defences, thereby protecting critical infrastructure. By conducting a systematic literature review (SLR) that draws on knowledge frameworks such as ontologies and taxonomies to explore malware, this paper provides insights into how malicious behaviours can be translated into attacks on quantum technologies, thereby providing a lens to analyse the severity of malware against such systems. This study employs the European Competency Framework for Quantum Technologies (CFQT) as a guide to map malware behaviour to several competency layers, creating a foundation in this emerging field.
comment: 40 pages, 9 figures, 5 tables
☆ Universal Dynamics with Globally Controlled Analog Quantum Simulators
Analog quantum simulators with global control fields have emerged as powerful platforms for exploring complex quantum phenomena. Recent breakthroughs, such as the coherent control of thousands of atoms, highlight the growing potential for quantum applications at scale. Despite these advances, a fundamental theoretical question remains unresolved: to what extent can such systems realize universal quantum dynamics under global control? Here we establish a necessary and sufficient condition for universal quantum computation using only global pulse control, proving that a broad class of analog quantum simulators is, in fact, universal. We further extend this framework to fermionic and bosonic systems, including modern platforms such as ultracold atoms in optical superlattices. Crucially, to connect the theoretical possibility with experimental reality, we introduce a new control technique into the experiment - direct quantum optimal control. This method enables the synthesis of complex effective Hamiltonians and allows us to incorporate realistic hardware constraints. To show its practical power, we experimentally engineer three-body interactions outside the blockade regime and demonstrate topological dynamics on a Rydberg atom array. Using the new control framework, we overcome key experimental challenges, including hardware limitations and atom position fluctuations in the non-blockade regime, by identifying smooth, short-duration pulses that achieve high-fidelity dynamics. Experimental measurements reveal dynamical signatures of symmetry-protected-topological edge modes, confirming both the expressivity and feasibility of our approach. Our work opens a new avenue for quantum simulation beyond native hardware Hamiltonians, enabling the engineering of effective multi-body interactions and advancing the frontier of quantum information processing with globally-controlled analog platforms.
comment: 10 pages, 5 figures with Methods. HYH, AMG, and LC contributed equally to this work. Updated acknowledgement
♻ ☆ Systematic Constraint Formulation and Collision-Free Trajectory Planning Using Space-Time Graphs of Convex Sets
In this paper, we create optimal, collision-free, time-dependent trajectories through cluttered dynamic environments. The many spatial and temporal constraints make finding an initial guess for a numerical solver difficult. Graphs of Convex Sets (GCS) and the recently developed Space-Time Graphs of Convex Sets (ST-GCS) enable us to generate minimum distance collision-free trajectories without providing an initial guess to the solver. We also explore the derivation of general GCS-compatible constraints and document an intuitive strategy for adapting general constraints to the framework. We show that ST-GCS produces equivalent trajectories to the standard GCS formulation when the environment is static, as well as globally optimal trajectories in cluttered dynamic environments.
comment: 16 pages with references, 20 figures
♻ ☆ MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees NeurIPS 2025
Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of $2\times$ cost savings compared to existing LLM routing techniques.
comment: NeurIPS 2025. Code: https://github.com/laminair/mess-plus
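A drift-plus-penalty router in the spirit described (a virtual queue for the SLA, a per-request trade-off between cost and predicted satisfaction) can be sketched as below; the scoring rule, constants, and feedback model are hypothetical stand-ins, not the MESS+ specification.

```python
import numpy as np

def route(costs, p_hat, Q, V=10.0):
    """Pick the model minimizing a drift-plus-penalty trade-off between
    operating cost and predicted satisfaction (hypothetical scoring)."""
    scores = V * np.asarray(costs) - Q * np.asarray(p_hat)
    return int(np.argmin(scores))

Q, alpha = 0.0, 0.9          # virtual queue, SLA satisfaction target
costs = [0.2, 1.0, 3.0]      # per-request cost of each model in the zoo
rng = np.random.default_rng(0)
for _ in range(1000):
    p_hat = rng.uniform(0.5, 1.0, 3)      # learned satisfaction estimates
    m = route(costs, p_hat, Q)
    satisfied = rng.random() < p_hat[m]   # simulated user feedback
    Q = max(0.0, Q + alpha - float(satisfied))  # queue grows on SLA debt
```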
♻ ☆ Adaptive Policy Learning to Additional Tasks
This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed in this paper, which combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. This paper provides theoretical analysis which guarantees the convergence rate and sample complexity of $\mathcal{O}(1/T)$ and $\mathcal{O}(1/\epsilon)$, respectively, where $T$ denotes the number of iterations and $\epsilon$ denotes the accuracy of the resulting stationary policy. Furthermore, several challenging numerical simulations, including cartpole, lunar lander, and robot arm, are provided to show that APG obtains similar performance compared to existing deterministic policy gradient methods while utilizing much less data and converging at a faster rate.
comment: The uploaded paper has technical issues, and we have decided to withdraw it
♻ ☆ Probability-One Optimization of Generalized Rayleigh Quotient Sum For Multi-Source Generalized Total Least-Squares
This paper addresses the global optimization of the sum of the Rayleigh quotient and the generalized Rayleigh quotient on the unit sphere. While various methods have been proposed for this problem, they fail to reliably converge to the global maximizer. To overcome this limitation, we propose an extension of the Riemannian Trust Region algorithm based on the probability-one homotopy optimization method, which enhances convergence to a global maximizer and, under certain conditions, ensures convergence to the global maximizer. In addition to the proposed method, existing state-of-the-art approaches are also presented, along with an explanation of their limitations and their connection to the proposed method. The proposed method is evaluated alongside the state-of-the-art approaches through numerical experiments, assessing convergence speed, success in reaching the global maximizer, and scalability with increasing problem dimension. Furthermore, we demonstrate how this ties in with the multi-source Bayesian Generalized Total Least-Squares (B-GTLS) problem, illustrating its applicability.
comment: This is the preprint version prior to peer review
♻ ☆ Data-fused Model Predictive Control with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: 8 pages, 3 figures
♻ ☆ Unlocking Transmission Flexibility under Uncertainty: Getting Dynamic Line Ratings into Electricity Markets
Static transmission line ratings may lead to underutilization of line capacity due to overly conservative assumptions. Grid-enhancing technologies (GETs) such as dynamic line ratings (DLRs), which adjust line capacity based on real-time conditions, are a techno-economically viable alternative to increase the utilization of existing power lines. Nonetheless, their adoption has been slow, partly due to the absence of operational tools that effectively account for simultaneous impacts on dispatch and pricing. In this paper, we represent transmission capacity with DLRs as a stock-like resource with time-variant interdependency, which is modeled via an approximation of the line temperature evolution process, decoupling the impacts of ambient weather conditions and power flow on transmission line temperature and thus capacity. We integrate DLRs into a multi-period DC optimal power flow problem, with chance constraints addressing correlated uncertainty in DLRs and renewable generation. This yields non-convex problems that we transform into a tractable convex form by linearization. We derive locational marginal energy and ancillary services prices consistent with a competitive equilibrium. Numerical experiments on the 11-zone and 1814-node NYISO systems demonstrate its performance, including impacts on dispatch, pricing, and marginal carbon emissions.
♻ ☆ Minimum Time Consensus of Multi-agent System under Fuel Constraints
This work addresses the problem of finding the minimum time consensus point in the state space for a set of $N$ identical double integrator agents with bounded inputs and a fixed fuel budget constraint. To address the problem, the attainable set of each agent subject to bounded inputs and a fixed fuel budget is characterized and shown to be convex. The minimum time to consensus is the least time at which the attainable sets of all agents intersect, and the corresponding consensus state is the point of intersection. Using Helly's theorem, it is shown that the intersection is non-empty at the time when all triplets of agents exhibit a non-empty intersection. Thus, a closed-form expression for the minimum time to consensus for a triplet of agents is obtained. The calculation of the minimum time consensus for each of the triplets is performed independently and is distributed evenly among the $N$ agents. The overall minimum time to consensus of the $N$ agents is then given by the triplet with the greatest minimum time to consensus. In the process, the set of initial conditions from which consensus is possible under these input and fuel budget constraints is also characterized.
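The Helly-based reduction suggests a direct computational pattern: evaluate the closed-form triplet time for every triplet and take the maximum. In the sketch below, `triplet_time` is a hypothetical stand-in for the paper's closed-form expression; in practice the triplets would be partitioned evenly across the $N$ agents.

```python
from itertools import combinations

def min_time_consensus(states, triplet_time):
    """Overall minimum consensus time = max over all agent triplets of the
    triplet's own minimum time (a consequence of Helly's theorem in the
    plane of attainable sets). `triplet_time` stands in for the paper's
    closed-form per-triplet expression."""
    return max(triplet_time(states[i], states[j], states[k])
               for i, j, k in combinations(range(len(states)), 3))
```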
♻ ☆ Coordination Control of Discrete Event Systems under Cyber Attacks
In this paper, coordination control of discrete event systems under joint sensor and actuator attacks is investigated. Sensor attacks are described by a set of attack languages using a proposed ALTER model. Several local supervisors are used to control the system. The goal is to design local supervisors to ensure safety of the system even under cyber attacks (CA). The necessary and sufficient conditions for the existence of such supervisors are derived in terms of conditional decomposability, CA-controllability and CA-observability. A method is developed to calculate local state estimates under sensor attacks. Two methods are also developed to design local supervisors, one for discrete event systems satisfying conditional decomposability, CA-controllability and CA-observability, and one for discrete event systems satisfying conditional decomposability only. The approach works for both stealthy and non-stealthy attacks. A practical example is given to illustrate the results.
comment: 23 pages, attack model was redefined, description was improved
♻ ☆ Exploring Noncommutative Polynomial Equation Methods for Discrete-Time Quaternionic Control
We present new polynomial-based methods for discrete-time quaternionic systems, highlighting how noncommutative multiplication modifies classical control approaches. Defining quaternionic polynomials via a backward-shift operator, we examine left and right fraction representations of transfer functions, showing that right zeros correspond to similarity classes of quaternionic matrix right eigenvalues. We then propose a feedback design procedure that generalizes pole placement to quaternions - a first approach using a genuine quaternionic polynomial equation.
comment: Published in: 2025 33rd Mediterranean Conference on Control and Automation (MED), Tangier, Morocco, 2025. DOI: 10.1109/MED64031.2025.11073531. © 2025 IEEE. Personal use permitted; all other uses by permission of IEEE. Published version: https://doi.org/10.1109/MED64031.2025.11073531
♻ ☆ TuneS: Patient-specific model-based optimization of contact configuration in deep brain stimulation
Objective: The objective of this study is to develop and evaluate a systematic approach to optimize Deep Brain Stimulation (DBS) parameters, addressing the challenge of identifying patient-specific settings and optimal stimulation targets for various neurological and mental disorders. Methods: TuneS, a novel pipeline to predict clinically optimal DBS contact configurations based on predefined targets and constraints, is introduced. The method relies upon patient-specific models of stimulation spread and extends optimization beyond traditional neural structures to include automated, model-based targeting of streamlines. Results: Initial findings show that both the STN motor subdivision and STN motor streamlines are consistently engaged under clinical settings, while regions of avoidance receive minimal stimulation. Given these findings, the value of model-based contact predictions for assessing stimulation targets while observing anatomical constraints is demonstrated at the example of a small cohort of Parkinson's disease patients. The predicted settings were generally found to achieve higher target coverages while providing a better trade-off between maximizing target coverage and minimizing stimulation of regions associated with side effects. Conclusion: TuneS shows promise as a research tool, enabling systematic assessment of DBS target effectiveness and facilitating constraint-aware optimization of stimulation parameters. Significance: The presented pipeline offers a pathway to improve patient-specific DBS therapies and contributes to the broader understanding of effective DBS targeting strategies.
comment: 8 pages, 9 figures, under review for IEEE Transactions on Biomedical Engineering
♻ ☆ Scalable Two-Stage Stochastic Optimal Power Flow via Separable Approximation
This paper proposes a Separable Projective Approximation Routine-Optimal Power Flow (SPAR-OPF) framework for solving two-stage stochastic optimization problems in power systems. The framework utilizes a separable piecewise linear approximation of the value function and learns the function based on sample sub-gradient information. We present two formulations to model the learned value function, and compare their effectiveness. Additionally, an efficient statistical method is introduced to assess the quality of the obtained solutions. The effectiveness of the proposed framework is validated using distributed generation siting and sizing problem in three-phase unbalanced power distribution systems as an example. Results show that the framework approximates the value function with over 98% accuracy and provides high-quality solutions with an optimality gap of less than 1%. The framework scales efficiently with system size, generating high-quality solutions in a short time when applied to a 9500-node distribution system with 1200 scenarios, while the extensive formulations and progressive hedging failed to solve the problem.
♻ ☆ Explicit Steady-State Approximations for Parallel Server Systems with Heterogeneous Servers
We study the steady-state performance of parallel-server systems under an immediate routing architecture with two sources of heterogeneity: servers and job classes, subject to compatibility constraints. We focus on the weighted-workload-task-allocation (WWTA) policy, a load-balancing scheme known to be throughput-optimal for such systems. Under a relaxed complete-resource-pooling (CRP) condition, we prove a "strong form" of state-space collapse in heavy traffic and that the scaled workload of each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Our analysis yields three main insights. First, the conventional heavy-traffic requirement of a unique static allocation plan can be dropped; a relaxed CRP condition suffices. Second, the limiting workload distribution is shown to be independent of the local scheduling policy on the server side, allowing substantial flexibility. Third, the inefficient (non-basic) activities prescribed by the static allocation plan are proved to receive an asymptotically negligible fraction of routing and service, even though WWTA has no prior knowledge of which activities are basic, highlighting its robustness to changing arrival rates.
Optimization and Control 34
☆ Optimal phase change for a generalized Grover's algorithm
We study the generalized Grover's algorithm with an arbitrary amplitude vector to find the optimal phase change for maximizing the gain in probability for the target of each iteration. In the classic setting of Grover's algorithm with a real initial amplitude vector, we find that a phase change of $\pi$ stays optimal until the probability of observing the target is quite close to 1. We provide a formula for identifying this cut-off point based on the size of the data set. When the amplitude is truly complex, we find that the optimal phase change depends non-trivially on the complexity of the amplitude vector. We provide an optimization formula to identify the required optimal phase change.
comment: 21 pages, 3 figures
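A quick numerical probe of the one-iteration picture: apply a generalized Grover step with equal oracle and diffusion phases to a real uniform start and scan the phase. The construction below is a standard generalized Grover operator, used here only as an illustration of the phase-scan idea.

```python
import numpy as np

N, target = 64, 7
psi = np.full(N, 1 / np.sqrt(N), dtype=complex)   # uniform initial amplitude

def grover_step(state, phi):
    """One generalized Grover iteration with equal oracle/diffusion phase:
    O = I + (e^{i phi}-1)|t><t|,  D = I + (e^{i phi}-1)|s><s|,  G = D O."""
    s = np.full(N, 1 / np.sqrt(N), dtype=complex)
    out = state.copy()
    out[target] *= np.exp(1j * phi)               # selective phase on target
    out += (np.exp(1j * phi) - 1) * (s.conj() @ out) * s
    return out

for phi in np.linspace(0.5, np.pi, 6):
    p = abs(grover_step(psi, phi)[target]) ** 2
    print(f"phi = {phi:.2f}: target probability after one step = {p:.4f}")
# For a real uniform start, phi = pi (the classic choice) gives the largest
# one-step gain, consistent with the paper's early-iteration result.
```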
☆ A Recovery Theory for Diffusion Priors: Deterministic Analysis of the Implicit Prior Algorithm
Recovering high-dimensional signals from corrupted measurements is a central challenge in inverse problems. Recent advances in generative diffusion models have shown remarkable empirical success in providing strong data-driven priors, but rigorous recovery guarantees remain limited. In this work, we develop a theoretical framework for analyzing deterministic diffusion-based algorithms for inverse problems, focusing on a deterministic version of the algorithm proposed by Kadkhodaie \& Simoncelli \cite{kadkhodaie2021stochastic}. First, we show that when the underlying data distribution concentrates on a low-dimensional model set, the associated noise-convolved scores can be interpreted as time-varying projections onto such a set. This leads to interpreting previous algorithms using diffusion priors for inverse problems as generalized projected gradient descent methods with varying projections. When the sensing matrix satisfies a restricted isometry property over the model set, we can derive quantitative convergence rates that depend explicitly on the noise schedule. We apply our framework to two instructive data distributions: uniform distributions over low-dimensional compact, convex sets and low-rank Gaussian mixture models. In the latter setting, we can establish global convergence guarantees despite the nonconvexity of the underlying model set.
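Read as generalized projected gradient descent, the deterministic algorithm alternates a data-consistency gradient step with a noise-level-dependent denoising "projection". The sketch below uses soft-thresholding as a crude stand-in for the learned noise-convolved score; everything about it is illustrative, not the paper's construction.

```python
import numpy as np

def recover(y, A, denoise, sigmas, eta=0.1):
    """Gradient step on ||y - Ax||^2 followed by a denoising step viewed
    as a time-varying projection onto the (noise-convolved) model set.
    `denoise(x, sigma)` stands in for the learned score-based denoiser."""
    x = np.zeros(A.shape[1])
    for sigma in sigmas:                  # decreasing noise schedule
        x = x + eta * A.T @ (y - A @ x)   # data-consistency gradient step
        x = denoise(x, sigma)             # 'projection' toward the model set
    return x

# Toy stand-in prior: soft-thresholding, shrinking with the noise level
soft = lambda x, s: np.sign(x) * np.maximum(np.abs(x) - s, 0.0)

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100)) / np.sqrt(40)
x_true = np.zeros(100); x_true[:5] = 1.0
x_hat = recover(A @ x_true, A, soft, sigmas=np.geomspace(0.5, 1e-3, 200))
# x_hat approximately recovers the 5-sparse signal in this noiseless toy
```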
☆ A Recovery Guarantee for Sparse Neural Networks
We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. Specifically, we study structural properties of the sparse network weights for two-layer, scalar-output networks under which a simple iterative hard thresholding algorithm recovers these weights exactly, using memory that grows linearly in the number of nonzero weights. We validate this theoretical result with simple experiments on recovery of sparse planted MLPs, MNIST classification, and implicit neural representations. Experimentally, we find performance that is competitive with, and often exceeds, a high-performing but memory-inefficient baseline based on iterative magnitude pruning.
comment: Code is available at https://github.com/voilalab/MLP-IHT
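For orientation, the iterative hard thresholding template referenced in the abstract looks as follows; this generic version takes an arbitrary gradient oracle over flattened network weights, whereas the paper's analysis concerns two-layer ReLU networks under structural conditions on the sparse weights.

```python
import numpy as np

def iht(grad_fn, n_params, k, steps=500, lr=1e-2):
    """Iterative hard thresholding: a gradient step on the weights, then
    keep only the k largest-magnitude entries (memory is O(k) in a sparse
    implementation; kept dense here for clarity)."""
    w = np.zeros(n_params)
    for _ in range(steps):
        w = w - lr * grad_fn(w)
        idx = np.argpartition(np.abs(w), -k)[-k:]   # top-k support
        mask = np.zeros_like(w)
        mask[idx] = 1.0
        w = w * mask                                # hard-threshold the rest
    return w
```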
☆ On Robustness of Consensus over Pseudo-Undirected Path Graphs
Consensus over networked agents is typically studied using undirected or directed communication graphs. Undirected graphs enforce symmetry in information exchange, leading to convergence to the average of initial states, while directed graphs permit asymmetry but make consensus dependent on root nodes and their influence. Both paradigms impose inherent restrictions on achievable consensus values and network robustness. This paper introduces a theoretical framework for achieving consensus over a class of network topologies, termed pseudo-undirected graphs, which retains bidirectional connectivity between node pairs but allows the corresponding edge weights to differ, including the possibility of negative values under bounded conditions. The resulting Laplacian is generally non-symmetric, yet it guarantees consensus under connectivity assumptions while expanding the solution space, which enables the system to achieve a stable consensus value that can lie outside the convex hull of the initial state set. We derive admissibility bounds for negative weights for a pseudo-undirected path graph, and show an application in the simultaneous interception of a moving target.
☆ Beyond adjacency: Graph encoding with reachability and shortest paths
Graph-structured data is central to many scientific and industrial domains, where the goal is often to optimize objectives defined over graph structures. Given the combinatorial complexity of graph spaces, such optimization problems are typically addressed using heuristic methods, and it remains unclear how to systematically incorporate structural constraints to effectively reduce the search space. This paper introduces explicit optimization formulations for the graph search space that encode properties such as reachability and shortest paths. We provide theoretical guarantees demonstrating the correctness and completeness of our graph encoding. To address the symmetry issues arising from graph isomorphism, we propose lexicographic constraints over neighborhoods to eliminate symmetries, and we prove that adding these constraints does not reduce the original graph space. Our graph encoding, along with the corresponding symmetry-breaking constraints, forms the basis for downstream optimization tasks over graph spaces.
comment: 24 pages, 12 figures, 4 tables
☆ Leaf it to renewal: Improved predictive maintenance policies via renewal theory and decision trees
We propose a general method for deriving prognostics-based predictive maintenance policies. The method takes into account the available decision options, the information on the future state of the system provided by a prognostic model, and the costs of the underlying renewal-reward process. It results in heuristic policies with only a few parameters, which can be determined based on theoretical considerations or by optimization from run-to-failure data. We show the potential of the method in two separate predictive maintenance decision settings, namely preventive replacement and preventive ordering. Numerical investigations show that the derived heuristic policies achieve significantly lower cost ratios than other benchmark heuristic policies, while being more robust against overfitting.
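As a concrete instance of the renewal-reward machinery, the classical age-replacement policy below has a single threshold parameter T optimized from the lifetime distribution; the paper's prognostics-based policies generalize this template, so the Weibull model and cost values here are illustrative assumptions only.

```python
import numpy as np

# Classical age replacement: replace preventively at age T (cost c_p) or at
# failure, whichever comes first (cost c_f > c_p). Renewal-reward theory says
# the long-run cost rate is E[cycle cost] / E[cycle length].
c_p, c_f = 1.0, 5.0
k, lam = 2.5, 10.0                                  # Weibull lifetime, increasing hazard

t = np.linspace(1e-3, 30.0, 3000)
S = np.exp(-(t / lam) ** k)                         # survival function of the lifetime
cycle_len = np.cumsum(S) * (t[1] - t[0])            # E[min(lifetime, T)] for each T = t
cost_rate = (c_p * S + c_f * (1 - S)) / cycle_len   # renewal-reward cost ratio g(T)

T_star = t[np.argmin(cost_rate)]
print(f"optimal replacement age: {T_star:.2f}, cost rate: {cost_rate.min():.4f}")
```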
☆ Ergodic Properties of Quantum Markov Semigroups
In this paper, we study the ergodic theorem for infinite-dimensional quantum Markov semigroups, originally introduced by Frigerio and Verri in 1982, and its latest version developed by Carbone and Girotti in 2021. We provide a sufficient condition that ensures exponential convergence towards the positive recurrent subspace, a well-known result for irreducible quantum Markov semigroups in finite-dimensional Hilbert spaces. Several illustrative examples are presented to demonstrate the application of the ergodic theorem. Moreover, we show that the positive recurrent subspace plays a crucial role in the study of global asymptotic stability.
comment: This will be published in IEEE CDC 2025
☆ Unifying HJB and Riccati equations: A Koopman operator approach to nonlinear optimal control
This paper proposes an operator-theoretic framework that recasts the minimal value function of a nonlinear optimal control problem as an abstract bilinear form on a suitable function space. The resulting bilinear form is shown to satisfy an operator equation with quadratic nonlinearity obtained by formulating the Lyapunov equation for a Koopman lift of the optimal closed-loop dynamics to an infinite-dimensional state space. It is proven that the minimal value function admits a rapidly convergent sum-of-squares expansion, a direct consequence of the fast spectral decay of the bilinear form. The framework thereby establishes a natural link between the Hamilton-Jacobi-Bellman and a Riccati-like operator equation and further motivates numerical low-rank schemes.
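For orientation, the classical link that the abstract's operator equation generalizes can be stated in two lines (standard facts, written in our notation):

```latex
% Stationary HJB for \dot{x} = f(x) + g(x)u with cost \int q(x) + u^\top R u \, dt:
\[
  0 = q(x) + \nabla V(x)^{\top} f(x)
      - \tfrac{1}{4}\, \nabla V(x)^{\top} g(x) R^{-1} g(x)^{\top} \nabla V(x),
\]
% which, for f(x) = Ax, g(x) = B, q(x) = x^\top Q x and the ansatz V(x) = x^\top P x,
% collapses to the algebraic Riccati equation
\[
  A^{\top} P + P A - P B R^{-1} B^{\top} P + Q = 0 .
\]
```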
☆ Reproduction Number and Spatial Connectivity Structure Estimation via Graph Sparsity-Promoting Penalized Functional
During an epidemic outbreak, decision makers crucially need accurate and robust tools to monitor the pathogen propagation. The effective reproduction number, defined as the expected number of secondary infections stemming from one contaminated individual, is a state-of-the-art indicator quantifying the epidemic intensity. Numerous estimators have been developed to precisely track the reproduction number temporal evolution. Yet, COVID-19 pandemic surveillance raised unprecedented challenges due to the poor quality of worldwide reported infection counts. When monitoring the epidemic in different territories simultaneously, leveraging the spatial structure of data significantly enhances both the accuracy and robustness of reproduction number estimates. However, this requires a good estimate of the spatial structure. To tackle this major limitation, the present work proposes a joint estimator of the reproduction number and connectivity structure. The procedure is assessed through intensive numerical simulations on carefully designed synthetic data and illustrated on real COVID-19 spatiotemporal infection counts.
comment: 11 pages, 3 figures
☆ Approximation Analysis of the Entropic Penalty in Quadratic Programming
Quadratic assignment problems are a fundamental class of combinatorial optimization problems which are ubiquitous in applications, yet their exact resolution is NP-hard. To circumvent this impasse, it was proposed to regularize such problems via an entropic penalty, leading to computationally tractable proxies. Indeed, this enabled efficient algorithms, notably in the context of Gromov-Wasserstein (GW) problems, but it is unknown how well solutions of the regularized problem approximate those of the original one for small regularization parameters. Treating the broader framework of general quadratic programs (QPs), we establish that the approximation gap decays exponentially quickly for concave QPs, while the rate for general indefinite or convex QPs can be as slow as linear. Our analysis builds on the study of the entropic penalty in linear programming by leveraging a new representation for concave QPs, which connects them to a family of linear programs with varying costs. Building on these results, we design an algorithm which, given a local solution of the entropic QP, returns a candidate minimizer of the original QP and certifies it. We apply these findings to a general class of discrete GW problems, yielding new variational forms and the first exponentially vanishing entropic approximation bound in the GW literature.
comment: 30 pages, 5 figures
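To fix ideas, a representative entropically penalized QP (notation ours, posed over the probability simplex, which covers assignment-type relaxations) is:

```latex
% Entropic-regularized QP; the paper studies how its minimizers approach those
% of the unregularized problem as \varepsilon -> 0+.
\[
  \min_{x \in \Delta} \; \tfrac{1}{2}\, x^{\top} Q x + c^{\top} x
  + \varepsilon \sum_{i} x_i \log x_i .
\]
```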
☆ Exploring the potential resource integration under passenger-freight shared mobility: collaborative optimization of multi-type bus scheduling and dynamic vehicle capacity allocation for urban-rural bus routes
Against the global background of evolving urban-rural travel patterns, traditional urban-rural public transport systems generally face the serious challenges of passenger loss and operating deficits, leading to reduced bus frequency and service reliability. To break the vicious circle of declining demand and shrinking supply, passenger-freight shared mobility (PFSM), an innovative operating mode, can create synergies between urban-rural logistics and public transport services by integrating public transit network resources and spare vehicle capacity. However, PFSM changes the operating characteristics of urban-rural bus systems, posing new challenges. To extend the relevant theory and address those challenges, this study proposes an economy-efficiency-low-carbon-oriented resource reconfiguration strategy by formulating a collaborative bilevel optimization of multi-type bus scheduling and dynamic vehicle capacity allocation for urban-rural bus routes. An improved jellyfish search algorithm is developed to overcome the premature convergence of traditional algorithms on high-dimensional hybrid discrete-continuous optimization problems. Results for a case of two urban-rural bus lines in Shanxi Province, China, indicate that the proposed scheme can improve operating revenue by 328.45% and reduce freight carbon emissions by 19.12 tons/year, at the cost of a 19.46% increase in average passenger travel time. A sensitivity analysis examines the key PFSM parameters along economic, efficiency, and environmental dimensions. The proposed method offers novel insights and solutions for the sustainable development of urban-rural public transport systems and for the last-kilometer problem of rural logistics, with significant value for both economic growth and carbon reduction.
☆ GPU-accelerated FREDopt package for simultaneous dose and LETd proton radiotherapy plan optimization via superiorization methods
This study presents FREDopt, a newly developed GPU-accelerated open-source optimization software for simultaneous proton dose and dose-averaged LET (LETd) optimization in IMPT treatment planning. FREDopt was implemented entirely in Python, leveraging CuPy for GPU acceleration and incorporating fast Monte Carlo (MC) simulations from the FRED code. The treatment plan optimization workflow includes pre-optimization and optimization, the latter equipped with a novel superiorization of feasibility-seeking algorithms. Feasibility-seeking requires finding a point that satisfies prescribed constraints. Superiorization interlaces computational perturbations into iterative feasibility-seeking steps to steer them toward a superior feasible point, replacing the need for costly full-fledged constrained optimization. The method was validated on two treatment plans of patients treated in a clinical proton therapy center, with dose and LETd distributions compared before and after reoptimization. Simultaneous dose and LETd optimization using FREDopt led to a substantial reduction of LETd and (dose)x(LETd) in organs at risk (OARs) while preserving target dose conformity. Computational performance evaluation showed execution times of 14-50 minutes, depending on the algorithm and target volume size, which is satisfactory for clinical and research applications while enabling further development of the well-tested, documented open-source software.
comment: 29 pages. Open Access at: https://iopscience.iop.org/article/10.1088/1361-6560/ade841
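The superiorization idea itself is simple to sketch: interlace summable perturbations that decrease a secondary objective into a feasibility-seeking projection loop. The toy below (two convex sets, secondary objective phi(x) = ||x||^2, all choices ours) is far from FREDopt's dose/LETd setting but shows the mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
a = rng.normal(size=n)
b = 3.0

def proj_hyperplane(x):                 # projection onto {x : a.x = b}
    return x - (a @ x - b) / (a @ a) * a

def proj_box(x):                        # projection onto {x : 0 <= x <= 2}
    return np.clip(x, 0.0, 2.0)

phi_grad = lambda x: 2.0 * x            # phi(x) = ||x||^2, the "superiority" criterion

x = rng.normal(size=n)
beta, gamma = 1.0, 0.95                 # summable perturbation sizes beta * gamma**k
for k in range(200):
    g = phi_grad(x)
    x = x - beta * gamma**k * g / (np.linalg.norm(g) + 1e-12)  # objective-reducing perturbation
    x = proj_box(proj_hyperplane(x))                            # feasibility-seeking step

print(abs(a @ x - b), np.linalg.norm(x))   # near-feasible point with reduced phi
```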
Integrated Forecasting of Marine Renewable Power: An Adaptively Bayesian-Optimized MVMD-LSTM Framework for Wind-Solar-Wave Energy
Integrated wind-solar-wave marine energy systems hold broad promise for supplying clean electricity in offshore and coastal regions. By leveraging the spatiotemporal complementarity of multiple resources, such systems can effectively mitigate the intermittency and volatility of single-source outputs, thereby substantially improving overall power-generation efficiency and resource utilization. Accurate ultra-short-term forecasting is crucial for ensuring secure operation and optimizing proactive dispatch. However, most existing forecasting methods construct separate models for each energy source, insufficiently account for the complex couplings among multiple energies, struggle to capture the system's nonlinear and nonstationary dynamics, and typically depend on extensive manual parameter tuning; these limitations constrain both predictive performance and practicality. We address this issue using a Bayesian-optimized Multivariate Variational Mode Decomposition-Long Short-Term Memory (MVMD-LSTM) framework. The framework first applies MVMD to jointly decompose wind, solar and wave power series so as to preserve cross-source couplings; it uses Bayesian optimization to automatically search the number of modes and the penalty parameter in the MVMD process to obtain intrinsic mode functions (IMFs); finally, an LSTM models the resulting IMFs to achieve ultra-short-term power forecasting for the integrated system. Experiments based on field measurements from an offshore integrated energy platform in China show that the proposed framework significantly outperforms benchmark models in terms of MAPE, RMSE and MAE. The results demonstrate superior predictive accuracy, robustness, and degree of automation.
☆ Voltage-sensitive distribution factors for contingency analysis and topology optimization
Topology optimization is a promising approach for mitigating congestion and managing changing grid conditions, but it is computationally challenging and requires approximations. Conventional distribution factors like PTDFs and LODFs, based on DC power flow, fail to capture voltage variations, reactive power, and losses, limiting their use in detailed optimization tasks such as busbar splitting. This paper introduces generalized distribution factors derived from a voltage-sensitive linearization of the full AC power flow equations. The proposed formulation accurately reflects reactive power flows, Ohmic losses, and voltage deviations while remaining computationally efficient. We derive and evaluate generalized PTDFs, LODFs, and topology modification factors using matrix identities. We discuss potential applications including voltage-aware N-1 security analysis, and topology optimization with a focus on busbar splitting. Numerical experiments demonstrate close agreement with full AC solutions, significantly outperforming the traditional DC approximation.
comment: 7 pages, 4 figures
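For contrast with the paper's voltage-sensitive factors, the conventional DC PTDFs it improves upon can be computed in a few lines (3-bus example with data of our choosing; the pseudoinverse implies a distributed slack):

```python
import numpy as np

# Lines as (from_bus, to_bus, susceptance); DC power flow assumptions throughout.
lines = [(0, 1, 10.0), (1, 2, 8.0), (0, 2, 5.0)]
n_bus, n_line = 3, len(lines)

A = np.zeros((n_line, n_bus))           # line-bus incidence matrix
b = np.zeros(n_line)
for k, (i, j, bk) in enumerate(lines):
    A[k, i], A[k, j], b[k] = 1.0, -1.0, bk

Bbus = A.T @ np.diag(b) @ A             # DC nodal susceptance matrix (singular)
Bf = np.diag(b) @ A                     # maps bus angles to branch flows
PTDF = Bf @ np.linalg.pinv(Bbus)        # injection-shift factors, distributed slack

print(np.round(PTDF, 3))                # flow change on each line per unit injection at each bus
```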
☆ A twice continuously differentiable penalty function for nonlinear semidefinite programming problems and its application
This paper presents a twice continuously differentiable penalty function for nonlinear semidefinite programming problems. In some optimization methods, such as penalty methods and augmented Lagrangian methods, convergence can be ensured by incorporating a penalty function, and hence several types of penalty functions have been proposed. In particular, these functions are designed so that optimization methods can find first-order stationary points. Meanwhile, second-order sequential optimality conditions, such as the approximate Karush-Kuhn-Tucker 2 (AKKT2) and complementarity AKKT2 (CAKKT2) conditions, have recently been introduced, and methods targeting such second-order stationary points will be required in future research. However, existing well-known penalty functions have low compatibility with such methods because they are not twice continuously differentiable. In contrast, the proposed function is expected to have a high affinity with methods for finding second-order stationary points. To verify this, we also present a practical penalty method that finds points satisfying the AKKT and CAKKT conditions by exploiting the proposed function, and we show its convergence properties.
comment: 17 pages
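The differentiability issue is easy to see in the scalar case: the common quadratic penalty for an inequality constraint g(x) <= 0 is only C^1, while a cubed hinge is C^2. (This scalar example, ours, only illustrates the smoothness requirement; it is not the paper's matrix-valued penalty.)

```latex
% Quadratic penalty p(t) = \max(0,t)^2: p''(t) jumps at t = 0, so p is only C^1.
% Cubed hinge instead:
\[
  p(t) = \max(0, t)^{3}, \qquad
  p'(t) = 3 \max(0, t)^{2}, \qquad
  p''(t) = 6 \max(0, t),
\]
% all continuous, hence p is twice continuously differentiable.
```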
☆ Optimal Control in Infinite Dimensional Spaces and Economic Modeling: State of the Art and Perspectives
This survey collects, within a unified framework, various results (primarily by the authors themselves) on the use of Deterministic Infinite-Dimensional Optimal Control Theory to address applied economic models. The main aim is to illustrate, through several examples, the typical features of such models (including state constraints, non-Lipschitz data, and non-regularizing differential operators) and the corresponding methods needed to handle them. This necessitates developing aspects of the existing Deterministic Infinite-Dimensional Optimal Control Theory (see, e.g., the book by Li and Yong, 2012) in specific and often nontrivial directions. Given the breadth of this area, we emphasize the Dynamic Programming Approach and its application to problems where explicit or quasi-explicit solutions of the associated Hamilton-Jacobi-Bellman (HJB) equations can be obtained. We also provide insights and references for cases where such explicit solutions are not available.
☆ Lyndon Word Transduction to Improve the Efficiency of Computing Chen-Fliess Series
The class of input-output systems representable as Chen-Fliess series arises often in control theory. One well known drawback of this representation, however, is that the iterated integrals which appear in these series are algebraically related by the shuffle product. This becomes relevant when one wants to numerically evaluate these series in applications, as this redundancy leads to unnecessary computational expense. The general goal of this paper is to present a computationally efficient way to evaluate these series by introducing what amounts to a change of basis for the computation. The key idea is to use the fact that the shuffle algebra on (proper) polynomials over a finite alphabet is isomorphic to the polynomial algebra generated by the Lyndon words over this alphabet. The iterated integrals indexed by Lyndon words contain all the input information needed to compute the output. The change of basis is accomplished by applying a transduction, that is, a linear map between formal power series in different alphabets, to re-index the Chen-Fliess series in terms of Lyndon monomials. The method is illustrated using a simulation of a continuously stirred-tank reactor system under a cyber-physical attack.
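Enumerating the Lyndon words that index the reduced set of iterated integrals is itself classical; Duval's algorithm generates them in lexicographic order, as in this sketch:

```python
def lyndon_words(n, alphabet):
    """Duval's algorithm: all Lyndon words of length <= n, in lexicographic order."""
    k = len(alphabet)
    w = [0]
    while w:
        yield "".join(alphabet[c] for c in w)
        w = w * (n // len(w)) + w[: n % len(w)]   # repeat w periodically up to length n
        while w and w[-1] == k - 1:               # drop trailing maximal letters
            w.pop()
        if w:
            w[-1] += 1                            # increment the last letter

print(list(lyndon_words(4, "xy")))
# ['x', 'xxxy', 'xxy', 'xxyy', 'xy', 'xyy', 'xyyy', 'y']
```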
☆ An Alternating Direction Method of Multipliers for Topology Optimization
We consider a class of integer-constrained optimization problems governed by partial differential equation (PDE) constraints and regularized via total variation (TV) in the context of topology optimization. The presence of discrete design variables, nonsmooth regularization, and non-convex objective renders the problem computationally challenging. To address this, we adopt the alternating direction method of multipliers (ADMM) framework, which enables a decomposition of the original problem into simpler subproblems that can be solved efficiently. The augmented Lagrangian formulation ensures consistency across variable updates while facilitating convergence under appropriate conditions.
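In the standard scaled form, with a splitting min f(x) + g(z) subject to Ax + Bz = c (in this paper, f would carry the PDE-constrained objective and g the TV regularizer and integrality), ADMM iterates:

```latex
% Scaled-form ADMM updates with penalty parameter \rho > 0:
\begin{align*}
  x^{k+1} &= \arg\min_{x} \; f(x) + \tfrac{\rho}{2}\,\|Ax + Bz^{k} - c + u^{k}\|^{2}, \\
  z^{k+1} &= \arg\min_{z} \; g(z) + \tfrac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^{k}\|^{2}, \\
  u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c .
\end{align*}
```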
☆ Sparse Regularization by Smooth Non-separable Non-convex Penalty Function Based on Ultra-discretization Formula
In sparse optimization, the $\ell_{1}$ norm is widely adopted for its convexity, yet it often yields solutions with smaller magnitudes than expected. To mitigate this drawback, various non-convex sparse penalties have been proposed. Some employ non-separability, with ordered weighting as an effective example, to retain large components while suppressing small ones. Motivated by these approaches, we propose ULPENS, a non-convex, non-separable sparsity-inducing penalty function that enables control over the suppression of elements. Derived from the ultra-discretization formula, ULPENS can continuously interpolate between the $\ell_{1}$ norm and a non-convex selective suppression function by adjusting parameters inherent to the formula. Owing to this formula, ULPENS is smooth, allowing the use of efficient gradient-based optimization algorithms. We establish key theoretical properties of ULPENS and demonstrate its practical effectiveness through numerical experiments.
comment: This work has been submitted to the IEEE for possible publication
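The ultra-discretization formula referenced here is the log-sum-exp (max-plus) limit; a finite parameter therefore gives a smooth surrogate of the max, which is presumably how ULPENS gains both smoothness and selectivity (the exact penalty is not reproduced here):

```latex
% Ultra-discretization identity; finite \varepsilon > 0 yields a smooth
% interpolation that recovers the max in the limit:
\[
  \max(a, b) \;=\; \lim_{\varepsilon \to 0^{+}}
  \varepsilon \log\!\left( e^{a/\varepsilon} + e^{b/\varepsilon} \right).
\]
```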
☆ Modeling and Control of Deep Sign-Definite Dynamics with Application to Hybrid Powertrain Control
Deep learning is increasingly used for complex, large-scale systems where first-principles modeling is difficult. However, standard deep learning models often fail to enforce physical structure or preserve convexity in downstream control, leading to physically inconsistent predictions and discontinuous inputs owing to nonconvexity. We introduce sign constraints, that is, sign restrictions on Jacobian entries, which unify monotonicity, positivity, and sign-definiteness; we then develop model-construction methods that enforce them, together with a control-synthesis procedure. In particular, we design exactly linearizable deep models satisfying these constraints and formulate model predictive control as a convex quadratic program, which yields a unique optimizer and a Lipschitz continuous control law. On a two-tank system and a hybrid powertrain, the proposed approach improves prediction accuracy and produces smoother control inputs than existing methods.
comment: Submitted to Automatica
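The payoff of preserving convexity is that each MPC step becomes a plain QP with a unique optimizer. A generic sketch with cvxpy (toy double-integrator data of our choosing; the paper's exactly linearizable powertrain model is not reproduced):

```python
import cvxpy as cp
import numpy as np

n, m, T = 2, 1, 20
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed linear(ized) dynamics
B = np.array([[0.0], [0.1]])
x0 = np.array([1.0, 0.0])

x = cp.Variable((n, T + 1))
u = cp.Variable((m, T))
cost, cons = 0, [x[:, 0] == x0]
for t in range(T):
    cost += cp.sum_squares(x[:, t]) + 0.1 * cp.sum_squares(u[:, t])
    cons += [x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
             cp.abs(u[:, t]) <= 1.0]

prob = cp.Problem(cp.Minimize(cost), cons)
prob.solve()                  # a strictly convex QP: unique optimizer
print(u.value[:, 0])          # first input of the receding-horizon law
```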
☆ Minimal time control for the heat equation with multiple impulses
This paper investigates a minimal time control problem for the heat equation with multiple impulse controls. We first establish the maximum principles for this problem and then prove the equivalence between the minimal time impulse control problem and its corresponding minimal norm impulse control problem. Extending our analysis to the minimal time function itself, we examine its analytical properties (particularly continuity and monotonicity) with respect to the control constraint. This leads to a notable discovery: despite being continuous and generally monotonic, the function may fail to be strictly decreasing.
comment: 21 pages, 2 figures
☆ ALNS for Tugboat Scheduling in Inland Waterway
This paper focuses on the barge shipping problem, also known as the tugboat scheduling problem, in a setting where a single tugboat can tow multiple barges and conduct multiple trips in a drop-and-pull mode during a daily work shift. The problem is mathematically formalized as mixed-integer programming models. To tackle problem instances of real-world size, an adaptive large neighborhood search (ALNS) algorithm integrated with a decoding mathematical model is proposed. On large-scale instances, the ALNS algorithm outperforms the strengthened mathematical model.
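For reference, the ALNS loop itself is compact; the problem-specific content lives in the destroy/repair operators and, in this paper, in a decoding mathematical model inside the cost evaluation. A generic skeleton follows (roulette-wheel operator selection and simulated-annealing acceptance are common choices, not necessarily the authors' exact configuration):

```python
import math
import random

def alns(initial, destroy_ops, repair_ops, cost, iters=10_000, T0=100.0, alpha=0.999):
    """Generic adaptive large neighborhood search skeleton."""
    current = best = initial
    weights_d = [1.0] * len(destroy_ops)
    weights_r = [1.0] * len(repair_ops)
    T = T0
    for _ in range(iters):
        di = random.choices(range(len(destroy_ops)), weights_d)[0]   # roulette wheel
        ri = random.choices(range(len(repair_ops)), weights_r)[0]
        candidate = repair_ops[ri](destroy_ops[di](current))
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / T):      # SA acceptance
            current = candidate
        if cost(candidate) < cost(best):
            best, score = candidate, 3.0      # big reward: new global best
        elif delta < 0:
            score = 2.0                       # medium reward: improvement
        else:
            score = 0.5                       # small reward otherwise
        weights_d[di] = 0.8 * weights_d[di] + 0.2 * score            # adaptive weights
        weights_r[ri] = 0.8 * weights_r[ri] + 0.2 * score
        T *= alpha                            # cool the acceptance temperature
    return best
```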
☆ Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization
We present a method for formal safety verification of learning-based generative motion planners. Generative motion planners (GMPs) offer advantages over traditional planners, but verifying the safety and dynamic feasibility of their outputs is difficult since neural network verification (NNV) tools scale only to a few hundred neurons, while GMPs often contain millions. To preserve GMP expressiveness while enabling verification, our key insight is to imitate the GMP by stabilizing references sampled from the GMP with a small neural tracking controller and then applying NNV to the closed-loop dynamics. This yields reachable sets that rigorously certify closed-loop safety, while the controller enforces dynamic feasibility. Building on this, we construct a library of verified GMP references and deploy them online in a way that imitates the original GMP distribution whenever it is safe to do so, improving safety without retraining. We evaluate across diverse planners, including diffusion, flow matching, and vision-language models, improving safety in simulation (on ground robots and quadcopters) and on hardware (differential-drive robot).
comment: 10 pages, 12 figures
☆ A Riemannian AdaGrad-Norm Method
We propose a manifold AdaGrad-Norm method (MAdaGrad), which extends the norm version of AdaGrad (AdaGrad-Norm) to Riemannian optimization. In contrast to line-search schemes, which may require several exponential map computations per iteration, MAdaGrad requires only one. Assuming the objective function $f$ has Lipschitz continuous Riemannian gradient, we show that the method requires at most $\mathcal{O}(\varepsilon^{-2})$ iterations to compute a point $x$ such that $\|\operatorname{grad} f(x)\|\leq \varepsilon$. Under the additional assumptions that $f$ is geodesically convex and the manifold has sectional curvature bounded from below, we show that the method takes at most $\mathcal{O}(\varepsilon^{-1})$ iterations to find $x$ such that $f(x)-f_{low}\leq\varepsilon$, where $f_{low}$ is the optimal value. Moreover, if $f$ satisfies the Polyak-Lojasiewicz (PL) condition globally on the manifold, we establish a complexity bound of $\mathcal{O}(\log(\varepsilon^{-1}))$, provided that the norm of the initial Riemannian gradient is sufficiently large. For the manifold of symmetric positive definite matrices, we construct a family of nonconvex functions satisfying the PL condition. Numerical experiments illustrate the strong performance of MAdaGrad in comparison with Riemannian steepest descent equipped with Armijo line search.
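The Euclidean template being lifted is short; the Riemannian version replaces the subtraction with one exponential-map (or retraction) step per iteration, which is the method's selling point over line searches. A minimal Euclidean sketch:

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, iters=500):
    """Euclidean AdaGrad-Norm: one step size from accumulated gradient norms."""
    x, b2 = np.asarray(x0, dtype=float), b0 ** 2
    for _ in range(iters):
        g = grad(x)
        b2 += g @ g                          # accumulate squared gradient norms
        x = x - (eta / np.sqrt(b2)) * g      # manifold version: one exp-map step here
    return x

x_star = adagrad_norm(lambda x: 2.0 * x, np.array([3.0, -4.0]))
print(x_star)   # approaches the minimizer of ||x||^2 at the origin
```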
☆ Efficient Online Large-Margin Classification via Dual Certificates
Online classification is a central problem in optimization, statistical learning and data science. Classical algorithms such as the perceptron offer efficient updates and finite mistake guarantees on linearly separable data, but they do not exploit the underlying geometric structure of the classification problem. We study the offline maximum margin problem through its dual formulation and use the resulting geometric insights to design a principled and efficient algorithm for the online setting. A key feature of our method is its translation invariance, inherited from the offline formulation, which plays a central role in its performance analysis. Our theoretical analysis yields improved mistake and margin bounds that depend only on translation-invariant quantities, offering stronger guarantees than existing algorithms under the same assumptions in favorable settings. In particular, we identify a parameter regime where our algorithm makes at most two mistakes per sequence, whereas the perceptron can be forced to make arbitrarily many mistakes. Our numerical study on real data further demonstrates that our method matches the computational efficiency of existing online algorithms, while significantly outperforming them in accuracy.
♻ ☆ Bayesian Optimization with Preference Exploration using a Monotonic Neural Network Ensemble
Many real-world black-box optimization problems have multiple conflicting objectives. Rather than attempting to approximate the entire set of Pareto-optimal solutions, interactive preference learning makes it possible to focus the search on the most relevant subset. However, few previous studies have exploited the fact that utility functions are usually monotonic. In this paper, we address the Bayesian Optimization with Preference Exploration (BOPE) problem and propose using a neural network ensemble as a utility surrogate model. This approach naturally integrates monotonicity and supports pairwise comparison data. Our experiments demonstrate that the proposed method outperforms state-of-the-art approaches and exhibits robustness to noise in utility evaluations. An ablation study highlights the critical role of monotonicity in enhancing performance.
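One standard way to hard-wire monotonicity into a network is to parameterize the weights to be nonnegative and compose monotone activations; whether the paper's ensemble uses exactly this construction is not stated in the abstract, so read the sketch below as one plausible realization.

```python
import numpy as np

def monotone_mlp(x, raw_W1, b1, raw_w2, b2):
    """Output is nondecreasing in every input: nonnegative weights + monotone activations."""
    h = np.tanh(x @ np.exp(raw_W1) + b1)      # exp(.) >= 0 keeps all weights nonnegative
    return h @ np.exp(raw_w2) + b2            # composition of nondecreasing maps

rng = np.random.default_rng(3)
raw_W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
raw_w2, b2 = rng.normal(size=16), 0.0
u = lambda x: monotone_mlp(x, raw_W1, b1, raw_w2, b2)
print(u(np.array([0.2, 0.4])) <= u(np.array([0.3, 0.5])))   # True by construction
```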
♻ ☆ Systematic Constraint Formulation and Collision-Free Trajectory Planning Using Space-Time Graphs of Convex Sets
In this paper, we create optimal, collision-free, time-dependent trajectories through cluttered dynamic environments. The many spatial and temporal constraints make finding an initial guess for a numerical solver difficult. Graphs of Convex Sets (GCS) and the recently developed Space-Time Graphs of Convex Sets (ST-GCS) enable us to generate minimum distance collision-free trajectories without providing an initial guess to the solver. We also explore the derivation of general GCS-compatible constraints and document an intuitive strategy for adapting general constraints to the framework. We show that ST-GCS produces equivalent trajectories to the standard GCS formulation when the environment is static, as well as globally optimal trajectories in cluttered dynamic environments.
comment: 16 pages with references, 20 figures
♻ ☆ Fair Clustering with Minimum Representation Constraints
Clustering is a well-studied unsupervised learning task that aims to partition data points into a number of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts, playlists, TV channels), where a group (e.g., social or demographic) benefits only if it reaches a minimum level of representation in the cluster (e.g., 50% to elect their preferred candidate). In this paper, we study the k-means and k-medians clustering problems under the additional fairness constraint that each group must attain a minimum level of representation in at least a specified number of clusters. We formulate this problem as a mixed-integer (nonlinear) optimization problem and propose an alternating minimization algorithm, called MiniReL, to solve it. Although incorporating fairness constraints results in an NP-hard assignment problem within the MiniReL algorithm, we present several heuristic strategies that make the approach practical even for large datasets. Numerical results demonstrate that our method yields fair clusters without increasing clustering cost across standard benchmark datasets.
comment: arXiv admin note: text overlap with arXiv:2302.03151
♻ ☆ Probability-One Optimization of Generalized Rayleigh Quotient Sum For Multi-Source Generalized Total Least-Squares
This paper addresses the global optimization of the sum of the Rayleigh quotient and the generalized Rayleigh quotient on the unit sphere. While various methods have been proposed for this problem, they fail to reliably converge to the global maximizer. To overcome this limitation, we propose an extension of the Riemannian Trust Region algorithm based on the probability-one homotopy optimization method, which enhances convergence to a global maximizer and, under certain conditions, ensures convergence to the global maximizer. In addition to the proposed method, existing state-of-the-art approaches are also presented, along with an explanation of their limitations and their connection to the proposed method. The proposed method is evaluated alongside the state-of-the-art approaches through numerical experiments, assessing convergence speed, success in reaching the global maximizer, and scalability with increasing problem dimension. Furthermore, we demonstrate how this ties in with the multi-source Bayesian Generalized Total Least-Squares (B-GTLS) problem, illustrating its applicability.
comment: This is the preprint version prior to peer review
♻ ☆ Infinite Euclidean Distance Discriminants
We study infinite Euclidean distance discriminants of algebraic varieties, defined as the loci of data points whose fibers under the second projection from the Euclidean distance correspondence are positive-dimensional. In particular, these discriminants contain all data points with infinitely many critical points for the nearest-point problem. We present computer code that computes the infinite Euclidean distance discriminant, and use it to present numerous varieties with nonempty such discriminants. Moreover, we prove that for any data point, the fiber under the second projection is contained in a finite union of hyperspheres centered at that point. For curves, we include a complete characterization; their infinite Euclidean distance discriminants turn out to be affine linear spaces. Finally, we introduce and characterize skew-tube surfaces in three-dimensional space. By construction, these have a one-dimensional infinite Euclidean distance discriminant. We further demonstrate that many skew-tubes have significantly lower Euclidean distance degrees than generic surfaces of the same degree.
♻ ☆ Asymmetric Feedback Learning in Online Convex Games
This paper considers convex games involving multiple agents that aim to minimize their own cost functions using locally available information. A common assumption in the study of such games is that the agents are symmetric, meaning that they have access to the same type of information. Here we lift this assumption, which is often violated in practice, and instead consider asymmetric agents; specifically, we assume some agents have access to first-order gradient information and others have access to the zeroth-order oracles (cost function evaluations). We propose an asymmetric learning algorithm that combines the agent information mechanisms. We analyze the regret and Nash equilibrium convergence of this algorithm for convex and strongly monotone games, respectively. Specifically, we show that our algorithm always performs between pure first- and zeroth-order methods, and can match the performance of these two extremes by adjusting the number of agents with access to zeroth-order oracles. Therefore, our algorithm incorporates the pure first- and zeroth-order methods as special cases. We provide numerical experiments on a market problem for both deterministic and risk-averse games to demonstrate the performance of the proposed algorithm.
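The two information mechanisms differ as follows: first-order agents use the gradient of their cost directly, while zeroth-order agents build a gradient surrogate from cost evaluations alone. A one-point sphere-smoothing estimator is one common construction (the paper's oracle may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(4)

def zo_gradient(J, x, delta=1e-2):
    """One-point estimate of the gradient of a smoothed version of J (high variance)."""
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)
    return (x.size / delta) * J(x + delta * u) * u

J = lambda x: np.sum((x - 1.0) ** 2)      # an agent's local cost, minimized at all-ones
x = np.zeros(3)
for _ in range(20_000):
    x -= 1e-4 * zo_gradient(J, x)         # noisy descent using cost evaluations only
print(x)                                  # drifts toward the minimizer (1, 1, 1)
```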
♻ ☆ Scalable Two-Stage Stochastic Optimal Power Flow via Separable Approximation
This paper proposes a Separable Projective Approximation Routine-Optimal Power Flow (SPAR-OPF) framework for solving two-stage stochastic optimization problems in power systems. The framework utilizes a separable piecewise linear approximation of the value function and learns the function based on sample sub-gradient information. We present two formulations to model the learned value function and compare their effectiveness. Additionally, an efficient statistical method is introduced to assess the quality of the obtained solutions. The effectiveness of the proposed framework is validated using the distributed generation siting and sizing problem in three-phase unbalanced power distribution systems as an example. Results show that the framework approximates the value function with over 98% accuracy and provides high-quality solutions with an optimality gap of less than 1%. The framework scales efficiently with system size, generating high-quality solutions in a short time when applied to a 9500-node distribution system with 1200 scenarios, whereas extensive formulations and progressive hedging failed to solve the problem.
♻ ☆ Explicit Steady-State Approximations for Parallel Server Systems with Heterogeneous Servers
We study the steady-state performance of parallel-server systems under an immediate routing architecture with two sources of heterogeneity: servers and job classes, subject to compatibility constraints. We focus on the weighted-workload-task-allocation (WWTA) policy, a load-balancing scheme known to be throughput-optimal for such systems. Under a relaxed complete-resource-pooling (CRP) condition, we prove a "strong form" of state-space collapse in heavy traffic and that the scaled workload of each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Our analysis yields three main insights. First, the conventional heavy-traffic requirement of a unique static allocation plan can be dropped; a relaxed CRP condition suffices. Second, the limiting workload distribution is shown to be independent of the local scheduling policy on the server side, allowing substantial flexibility. Third, the inefficient (non-basic) activities prescribed by the static allocation plan are proved to receive an asymptotically negligible fraction of routing and service, even though WWTA has no prior knowledge of which activities are basic, highlighting its robustness to changing arrival rates.
♻ ☆ On the inadequacy of nudging data assimilation algorithms for non-dissipative systems
In this work, we study the applicability of the Azouani-Olson-Titi (AOT) nudging algorithm for continuous data assimilation to evolutionary dynamical systems that are not dissipative. Specifically, we apply the AOT algorithm to a partially dissipative variant of the Lorenz 1963 system, the Korteweg-de Vries equation (KdV) in 1D, and the 2D incompressible Euler equations. Our analysis reveals that both the Euler and KdV equations lack the finitely many determining modes property, leading to the construction of infinitely many solutions with exactly the same sparse observational data, which data assimilation methods cannot distinguish between. Simultaneously, we numerically verify that the AOT algorithm successfully recovers these counterexamples for the damped and driven KdV equation, which is dissipative. Additionally, to further support our argument, we present numerical evidence showing that the AOT algorithm is ineffective at accurately recovering solutions for a partially dissipative variant of the Lorenz 1963 system.
Systems and Control 48
☆ Minimalistic Autonomous Stack for High-Speed Time-Trial Racing
Autonomous racing has seen significant advancements, driven by competitions such as the Indy Autonomous Challenge (IAC) and the Abu Dhabi Autonomous Racing League (A2RL). However, developing an autonomous racing stack for a full-scale car is often constrained by limited access to dedicated test tracks, restricting opportunities for real-world validation. While previous work typically requires extended development cycles and significant track time, this paper introduces a minimalistic autonomous racing stack for high-speed time-trial racing that emphasizes rapid deployment and efficient system integration with minimal on-track testing. The proposed stack was validated on real speedways, achieving a top speed of 206 km/h after just 11 hours of on-track practice covering 325 km in total. Additionally, we present a system performance analysis, including tracking accuracy, vehicle dynamics, and safety considerations, offering insights for teams seeking to rapidly develop and deploy an autonomous racing stack with limited track access.
comment: The data associated with this paper is available at https://doi.org/10.5281/zenodo.17187680
☆ Federated Aggregation of Demand Flexibility
This paper proposes a federated framework for demand flexibility aggregation to support grid operations. Unlike existing geometric methods that rely on a static, pre-defined base set as the geometric template for aggregation, our framework establishes a true federated process by enabling the collaborative optimization of this base set without requiring participants to share sensitive data with the aggregator. Specifically, we first formulate the base set optimization problem as a bilevel program. Using optimal solution functions, we then reformulate the bilevel program into a single-level, unconstrained learning task. By exploiting the decomposable structure of the overall gradient, we further design a decentralized gradient-based algorithm to solve this learning task. The entire framework, encompassing base set optimization, aggregation, and disaggregation, operates by design without exchanging raw user data. Numerical results demonstrate that our proposed framework unlocks substantially more flexibility than approaches with static base sets, thus providing a promising, efficient, and privacy-enhancing approach to coordinating demand flexibility at scale.
comment: Longer version of a paper to be submitted to IEEE Transactions on Smart Grid
☆ Modular Machine Learning with Applications to Genetic Circuit Composition
In several applications, including in synthetic biology, one often has input/output data on a system composed of many modules, and although the modules' input/output functions and signals may be unknown, knowledge of the composition architecture can significantly reduce the amount of training data required to learn the system's input/output mapping. Learning the modules' input/output functions is also necessary for designing new systems from different composition architectures. Here, we propose a modular learning framework, which incorporates prior knowledge of the system's compositional structure to (a) identify the composing modules' input/output functions from the system's input/output data and (b) achieve this by using a reduced amount of data compared to what would be required without knowledge of the compositional structure. To achieve this, we introduce the notion of modular identifiability, which allows recovery of modules' input/output functions from a subset of the system's input/output data, and provide theoretical guarantees on a class of systems motivated by genetic circuits. We demonstrate the theory on computational studies showing that a neural network (NNET) that accounts for the compositional structure can learn the composing modules' input/output functions and predict the system's output on inputs outside of the training set distribution. By contrast, a neural network that is agnostic of the structure is unable to predict on inputs that fall outside of the training set distribution. By reducing the need for experimental data and allowing module identification, this framework offers the potential to ease the design of synthetic biological circuits and of multi-module systems more generally.
☆ From Space to Time: Enabling Adaptive Safety with Learned Value Functions via Disturbance Recasting
The widespread deployment of autonomous systems in safety-critical environments such as urban air mobility hinges on ensuring reliable, performant, and safe operation under varying environmental conditions. One such approach, value function-based safety filters, minimally modifies a nominal controller to ensure safety. Recent advances leverage offline learned value functions to scale these safety filters to high-dimensional systems. However, these methods assume detailed priors on all possible sources of model mismatch, in the form of disturbances in the environment -- information that is rarely available in real world settings. Even in well-mapped environments like urban canyons or industrial sites, drones encounter complex, spatially-varying disturbances arising from payload-drone interaction, turbulent airflow, and other environmental factors. We introduce SPACE2TIME, which enables safe and adaptive deployment of offline-learned safety filters under unknown, spatially-varying disturbances. The key idea is to reparameterize spatial variations in disturbance as temporal variations, enabling the use of precomputed value functions during online operation. We validate SPACE2TIME on a quadcopter through extensive simulations and hardware experiments, demonstrating significant improvement over baselines.
comment: The first three authors contributed equally. This work has been accepted for publication at the Conference on Robot Learning
☆ Trust and Transparency in AI: Industry Voices on Data, Ethics, and Compliance
The EU Artificial Intelligence (AI) Act directs businesses to assess their AI systems to ensure they are developed in a way that is human-centered and trustworthy. The rapid adoption of AI in the industry has outpaced ethical evaluation frameworks, leading to significant challenges in accountability, governance, data quality, human oversight, technological robustness, and environmental and societal impacts. Through structured interviews with fifteen industry professionals, paired with a literature review conducted on each of the key interview findings, this paper investigates practical approaches and challenges in the development and assessment of Trustworthy AI (TAI). The findings from participants in our study, and the subsequent literature reviews, reveal complications in risk management, compliance and accountability, which are exacerbated by a lack of transparency, unclear regulatory requirements and a rushed implementation of AI. Participants reported concerns that technological robustness and safety could be compromised by model inaccuracies, security vulnerabilities, and an overreliance on AI without proper safeguards in place. Additionally, the negative environmental and societal impacts of AI, including high energy consumption, political radicalisation, loss of culture and reinforcement of social inequalities, are areas of concern. There is a pressing need not just for risk mitigation and TAI evaluation within AI systems but for a wider approach to developing an AI landscape that aligns with the social and cultural values of the countries adopting those technologies.
☆ Metriplectic Conditional Flow Matching for Dissipative Dynamics
Metriplectic conditional flow matching (MCFM) learns dissipative dynamics without violating first principles. Neural surrogates often inject energy and destabilize long-horizon rollouts; MCFM instead builds the conservative-dissipative split into both the vector field and a structure-preserving sampler. MCFM trains via conditional flow matching on short transitions, avoiding long-rollout adjoints. At inference time, a Strang-prox scheme alternates a symplectic update with a proximal metric step, ensuring discrete energy decay; an optional projection enforces strict decay when a trusted energy is available. We provide continuous- and discrete-time guarantees linking this parameterization and sampler to conservation, monotonic dissipation, and stable rollouts. On a controlled mechanical benchmark, MCFM yields phase portraits closer to ground truth and markedly fewer energy-increase and positive-energy-rate events than an equally expressive unconstrained neural flow, while matching terminal distributional fit.
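For context, a metriplectic (GENERIC-type) system combines a skew-symmetric conservative part with a symmetric positive semidefinite dissipative part, with degeneracy conditions that make conservation and monotone dissipation automatic (standard structure, stated in our notation):

```latex
% Metriplectic (GENERIC) split of the dynamics:
\[
  \dot{z} = L(z)\,\nabla E(z) + M(z)\,\nabla S(z),
  \qquad L = -L^{\top}, \quad M = M^{\top} \succeq 0,
\]
% with degeneracy conditions
\[
  L(z)\,\nabla S(z) = 0, \qquad M(z)\,\nabla E(z) = 0,
\]
% so that dE/dt = 0 and dS/dt = \nabla S^{\top} M \,\nabla S \geq 0 along trajectories.
```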
☆ Four-Transistor Bipolar Series-Parallel Module Structure for Cascaded Bridge and Modular Multilevel Circuits
With their great scalability and flexibility, cascaded-bridge and modular multilevel converters have enabled a variety of energy applications, such as offshore wind power, high-voltage dc power transmission, power-quality management, and cutting-edge medical instrumentation. Incorporating parallel connectivity between modules equips systems with advantages such as sensorless balancing, switched-capacitor energy exchange, and reduced impedance. However, existing topologies require many individual switches: eight transistors per module. Previous efforts to use fewer switches have compromised functionality. We propose a new module topology, named the direction-selective parallel (DiSeP) structure, which requires only four transistors per module (the same as an H bridge) but can achieve bidirectional equilibration, bipolar module output, and inter-module switched-capacitor features. This topology is highly attractive for existing converters with cascaded bridge elements, as the addition of only four diodes enables key features such as sensorless balancing and inter-module energy exchange. Thus, the module can outcompete H bridges in their applications, as it adds parallel modes without any additional transistors. Compared to double-H bridges (CH2B), it saves as many as half of the transistors. We elaborate on its working principles and key design considerations. We validate our theories on an experimental prototype with six modules. This prototype attains a total voltage harmonic distortion plus noise (THD+N) of 10.3% and a peak efficiency of 96.3%. Furthermore, the modules achieve autonomous sensorless balancing under open-loop control.
☆ linrax: A JAX Compatible, Simplex Method Linear Program Solver
We present linrax, the first simplex-based linear program (LP) solver compatible with the JAX ecosystem. In many control algorithms, LPs are automatically generated and frequently solved, either offline or online in the control loop. This motivates the design of linrax, which is especially suited for compilation into a complex JAX-based pipeline as a subroutine. We discuss the challenges associated with implementing a general-purpose LP solver under strict design requirements from JAX. Notably, we can solve general problems which may include dependent constraints, something not possible with existing JAX-compatible LP solvers that use first-order techniques and may fail to converge. We demonstrate the utility of linrax through several examples, including a robust control synthesis pipeline for a nonlinear vehicle model using automatic differentiation through an LP-based reachable set framework.
comment: 8 pages, 3 figures
☆ Robust Near-Optimal Nonlinear Target Enclosing Guidance
This paper proposes a nonlinear optimal guidance law that enables a pursuer to enclose a target within arbitrary geometric patterns, extending beyond conventional circular encirclement. The design operates using only relative state measurements and formulates a target-enclosing guidance law in which the vehicle's lateral acceleration serves as the steering control, making it well-suited for aerial vehicles with turning constraints. Our approach generalizes existing guidance strategies that are limited to target encirclement and provides a degree of optimality, while exact information about the target's maneuver is unnecessary during the design. The guidance law is developed within the framework of a state-dependent Riccati equation (SDRE), which provides a systematic way to handle nonlinear dynamics through a pseudo-linear representation and to design locally optimal feedback guidance commands through state-dependent weighting matrices. While SDRE ensures near-optimal performance in the absence of strong disturbances, we further augment the design with an integral sliding mode manifold that compensates when disturbances push the system away from the nominal trajectory, and we demonstrate that the design is flexible in the sense that the (possibly time-varying) stand-off curvature can also be treated as unknown. Simulations demonstrate the efficacy of the proposed approach.
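The SDRE recipe referenced here is standard: factor the dynamics into pseudo-linear form and solve a state-dependent algebraic Riccati equation pointwise along the trajectory:

```latex
% Pseudo-linear factorization \dot{x} = A(x)\,x + B(x)\,u, then at each state solve
\[
  A(x)^{\top} P(x) + P(x) A(x)
  - P(x) B(x) R^{-1}(x) B(x)^{\top} P(x) + Q(x) = 0,
\]
% and apply the locally optimal feedback
\[
  u = -R^{-1}(x)\, B(x)^{\top} P(x)\, x .
\]
```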
☆ Crater Observing Bio-inspired Rolling Articulator (COBRA)
NASA aims to establish a sustainable human basecamp on the Moon as a stepping stone for future missions to Mars and beyond. The discovery of water ice in the Moon's permanently shadowed craters, which can provide drinking water, oxygen, and rocket fuel, is therefore of critical importance. However, current methods to access lunar ice deposits are limited. While rovers have been used to explore the lunar surface for decades, they face significant challenges in navigating harsh terrains, such as permanently shadowed craters, due to the high risk of immobilization. This report introduces COBRA (Crater Observing Bio-inspired Rolling Articulator), a multi-modal snake-style robot designed to overcome mobility challenges in Shackleton Crater's rugged environment. COBRA combines slithering and tumbling locomotion to adapt to various crater terrains. In snake mode, it uses sidewinding to traverse flat or gently inclined surfaces, while in tumbling mode, it forms a circular barrel by linking its head and tail, enabling rapid movement with minimal energy on steep slopes. Equipped with an onboard computer, stereo camera, inertial measurement unit, and joint encoders, COBRA facilitates real-time data collection and autonomous operation. This report highlights COBRA's robustness and efficiency in navigating extreme terrains through both simulations and experimental validation.
☆ Automatic and Scalable Safety Verification using Interval Reachability with Subspace Sampling
Interval refinement is a technique for reducing the conservatism of traditional interval based reachability methods by lifting the system to a higher dimension using new auxiliary variables and exploiting the introduced structure through a refinement procedure. We present a novel, efficiently scaling, automatic refinement strategy based on a subspace sampling argument and motivated by reducing the number of interval operations through sparsity. Unlike previous methods, we guarantee that refined bounds shrink as additional auxiliary variables are added. This additionally encourages automation of the lifting phase by allowing larger groups of auxiliary variables to be considered. We implement our strategy in JAX, a high-performance computational toolkit for Python and demonstrate its efficacy on several examples, including regulating a multi-agent platoon to the origin while avoiding an obstacle.
comment: 6 pages, 3 figures. Updated to correct a small error in the dynamics presented in equation (12)
☆ Policy Gradient Bounds in Multitask LQR
We analyze the performance of policy gradient in multitask linear quadratic regulation (LQR), where the system and cost parameters differ across tasks. The main goal of multitask LQR is to find a controller with satisfactory performance on every task. Prior analyses in related settings fail to capture closed-loop task similarities, resulting in conservative performance guarantees. To account for such similarities, we propose bisimulation-based measures of task heterogeneity. Our measures employ new bisimulation functions to bound the cost-gradient distance between a pair of tasks in closed loop with a common stabilizing controller. Employing these measures, we derive suboptimality bounds for both the multitask optimal controller and the asymptotic policy gradient controller with respect to each of the tasks. We further provide conditions under which the policy gradient iterates remain stabilizing for every system. For multiple random sets of certain tasks, we observe that our bisimulation-based measures improve dramatically upon baseline measures of task heterogeneity.
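The single-task object behind these bounds is the exact policy gradient of the discrete-time LQR cost, which is available in closed form; we state the standard result here for orientation only:

```latex
% Exact policy gradient of the discrete-time LQR cost C(K) under u_t = -K x_t:
\[
  \nabla C(K) = 2\,\big[(R + B^{\top} P_K B)\,K - B^{\top} P_K A\big]\,\Sigma_K ,
\]
% where P_K solves the Lyapunov equation
% P_K = Q + K^{\top} R K + (A - BK)^{\top} P_K (A - BK),
% and \Sigma_K = \sum_t \mathbb{E}[x_t x_t^{\top}] is the state correlation matrix.
```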
☆ Proactive-reactive detection and mitigation of intermittent faults in robot swarms
Intermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy to detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way, until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.
☆ Watts and Drops: Co-Scheduling Power and Water in Desalination Plants
We develop a mathematical framework to jointly schedule water and electricity in a profit-maximizing renewable colocated water desalination plant that integrates both thermal and membrane based technologies. The price-taking desalination plant sells desalinated water to a water utility at a given price and engages in bidirectional electricity transactions with the grid, purchasing or selling power based on its net electricity demand. We show that the optimal scheduling policy depends on the plant's internal renewable generation and follows a simple threshold structure. Under the optimal policy, thermal based water output decreases monotonically with renewable output, while membrane based water output increases monotonically. We characterize the structure and intuition behind the threshold policy and examine key special properties.
comment: 5 pages, 6 figures. To appear in Proceedings of the 61st Allerton Conference on Communication, Control, and Computing
☆ Robust Synchronous Reference Frame Phase-Locked Loop (PLL) with Feed-Forward Frequency Estimation
Synchronous reference frame phase-locked loop (SRF-PLL) techniques are widely used for interfacing and control in power systems and energy conversion at large. Since a PLL synchronizes its output with an exogenous harmonic signal, often a three-phase voltage or current, locking of the frequency and phase angle depends on the performance of the feedback loop, which contains at least two integrator terms, and on distortions of the measured input quantities. For the conventional SRF-PLL with proportional-integral (PI) feedback control, we provide a robust design that maximizes the phase margin and uses a normalization scheme to render the loop insensitive to input amplitude variations. The main improvement in transient behavior, and also in tracking frequency ramps, is achieved by a robust feed-forward frequency estimator, which is model-free and suitable for noisy, time-varying harmonic signals. The proposed feed-forward-feedback SRF-PLL scheme is experimentally evaluated on three-phase harmonic currents from standard PMSM drives with varying angular speeds and loads. Both the tracked angular frequency and the locked phase angle are assessed as performance metrics of the robust SRF-PLL scheme with feed-forward action.
comment: 8 pages, 9 figures
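For readers unfamiliar with the SRF-PLL structure, the following Python sketch implements the conventional loop that the paper improves upon: Clarke/Park transforms, amplitude normalization, and a PI controller driving the q-axis voltage to zero. The gains are illustrative, and the constant nominal-frequency feed-forward is only a placeholder for the paper's model-free feed-forward estimator.

    import numpy as np

    def srf_pll(va, vb, vc, fs, kp=100.0, ki=5000.0, w_ff=2*np.pi*50):
        """Conventional SRF-PLL with amplitude normalization.
        Gains, sample rate, and the constant feed-forward w_ff are
        illustrative assumptions; the paper replaces w_ff with a
        model-free feed-forward frequency estimator."""
        theta, w_int, dt = 0.0, 0.0, 1.0 / fs
        thetas, freqs = [], []
        for a, b, c in zip(va, vb, vc):
            # Clarke transform (amplitude-invariant), then Park rotation.
            v_alpha = (2*a - b - c) / 3.0
            v_beta = (b - c) / np.sqrt(3.0)
            vd = v_alpha*np.cos(theta) + v_beta*np.sin(theta)
            vq = -v_alpha*np.sin(theta) + v_beta*np.cos(theta)
            # Normalize so the loop gain is insensitive to input amplitude.
            err = vq / max(np.hypot(vd, vq), 1e-9)
            w_int += ki * err * dt
            w = w_ff + kp * err + w_int      # feed-forward + PI feedback
            theta = (theta + w * dt) % (2*np.pi)
            thetas.append(theta); freqs.append(w)
        return np.array(thetas), np.array(freqs)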
☆ A Fast Initialization Method for Neural Network Controllers: A Case Study of Image-based Visual Servoing Control for Multicopter Interception
Reinforcement learning-based controller design methods often require substantial data in the initial training phase. Moreover, the training process tends to exhibit strong randomness and slow convergence, and often demands considerable time or high computational resources. Another class of learning-based methods incorporates Lyapunov stability theory to obtain a control policy with stability guarantees. However, these methods generally require an initially stable neural network control policy at the start of training. Evidently, a stable neural network controller can not only serve as an initial policy for reinforcement learning, allowing the training to focus on improving controller performance, but can also act as an initial state for learning-based Lyapunov control methods. Although stable controllers can be designed using traditional control theory, designers still need a great deal of control design knowledge to address increasingly complicated control problems. The rapid neural network initialization method proposed in this paper accomplishes the initial training of the neural network control policy by constructing datasets that satisfy the stability conditions derived from the system model. Furthermore, using image-based visual servoing control for multicopter interception as a case study, simulations and experiments were conducted to validate the effectiveness and practical performance of the proposed method. In the experiment, the trained control policy attains a final interception velocity of 15 m/s.
☆ AI-Enabled Smart Hygiene System for Real-Time Glucose Detection
This research presents a smart urinary health monitoring system incorporating a coplanar waveguide (CPW)-fed slot-loop antenna biosensor designed to analyse various urine samples. The antenna demonstrates distinct resonant frequency shifts when exposed to five specific urine conditions, deviating from its baseline 1.42 GHz operation. These measurable frequency variations enable the antenna to function as an effective microwave sensor for urinary biomarker detection. A potential artificial intelligence-based Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) framework is also discussed to overcome the limitations of overlapping frequency responses, aiming to improve the accuracy of health condition detection. These components contribute to the development of a smart toilet system that displays real-time health information on a wall-mounted urinal screen, without requiring any user effort or behavioural change.
☆ A Weighted Least Squares Error Hetero-functional Graph State Estimator of the American Multi-modal Energy System
As one of the most pressing challenges of the 21st century, global climate change demands a host of changes across at least four critical energy infrastructures: the electric grid, the natural gas system, the oil system, and the coal system. In the context of the United States, this paper refers to this system-of-systems as "The American Multi-Modal Energy System (AMES)". These combined changes necessitate an understanding of the AMES interdependencies, both structurally and behaviorally, to develop and enact effective policies. This work focuses on behavioral analysis methods to provide examples of how to analyze system behavior and the critical matter and energy flows through the system. Building upon past works, two regions of the AMES are modeled, and their behavior is analyzed using Hetero-functional Graph Theory (HFGT). More specifically, the work presents a weighted least squares error state estimation model of the AMES. State estimation has played a major role in the operation and development of the American Electric Power System. This work extends the state estimation analysis beyond the single-operand electric grid environment into the heterogeneous environment of the AMES. Employing a data-driven and model-based systems engineering approach in combination with HFGT, a Weighted Least Squares Error Hetero-functional Graph State Estimation (WLSEHFGSE) optimization program is developed to estimate the optimal flows of mass and energy through the AMES. This work is the first to integrate state estimation methods with HFGT. Furthermore, it demonstrates how such a WLSEHFGSE recovers the mass and energy flows in a system-of-systems like the AMES with asset-level granularity.
comment: 27 pages, 3 Tables, 11 Figures
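Independently of the hetero-functional graph machinery, the weighted least squares error state estimation core is the classical Gauss-Newton iteration used in power systems. A generic Python sketch follows, with a toy measurement function standing in for the AMES model:

    import numpy as np

    def wls_state_estimate(x0, z, h, H, W, iters=20, tol=1e-8):
        """Gauss-Newton iteration for weighted least squares state
        estimation: minimize (z - h(x))^T W (z - h(x)).
        h(x): measurement function, H(x): its Jacobian,
        W: weight matrix (typically inverse error covariance)."""
        x = np.array(x0, dtype=float)
        for _ in range(iters):
            r = z - h(x)                   # measurement residual
            Hx = H(x)
            dx = np.linalg.solve(Hx.T @ W @ Hx, Hx.T @ W @ r)
            x += dx
            if np.linalg.norm(dx) < tol:
                break
        return x

    # Toy example (not the AMES model): two states, three measurements.
    h = lambda x: np.array([x[0]**2, x[0]*x[1], x[1]])
    H = lambda x: np.array([[2*x[0], 0.0], [x[1], x[0]], [0.0, 1.0]])
    z = np.array([4.02, 5.98, 2.99])       # noisy observations of x = (2, 3)
    W = np.diag([1.0, 1.0, 4.0])           # trust the third sensor more
    print(wls_state_estimate([1.0, 1.0], z, h, H, W))  # converges near (2, 3)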
☆ Adaptive Override Control under High-Relative-Degree Nonovershooting Constraints
This paper considers the problem of adaptively overriding unsafe actions of a nominal controller in the presence of high-relative-degree nonovershooting constraints and parametric uncertainties. To prevent the design from being coupled with high-order derivatives of the parameter estimation error, we adopt a modular design approach in which the controller and the parameter identifier are designed separately. The controller module ensures that any safety violations caused by parametric uncertainties remain bounded, provided that the parameter estimation error and its first-order derivative are either bounded or square-integrable. The identifier module, in turn, guarantees that these requirements on the parameter estimation error are satisfied. Both theoretical analysis and simulation results demonstrate that the closed-loop safety violation is bounded by a tunable function of the initial estimation error. Moreover, as time increases, the parameter estimate converges to the true value, and the amount of safety violation decreases accordingly.
☆ Frequency-Varying Optimization: A Control Framework for New Dynamic Frequency Response Services
To address the variability of renewable generation, initiatives have been launched globally to provide faster and more effective frequency responses. In the UK, the National Energy System Operator (NESO) has introduced a suite of three new dynamic services, where aggregation of assets is expected to play a key role. For an Aggregated Response Unit (ARU), the required level of frequency response varies with grid frequency, resulting in a frequency-varying equality constraint that assets should meet collectively. We show that the optimal coordination of an ARU constitutes a Frequency-Varying Optimization (FVO) problem, in which the optimal trajectory for each asset evolves dynamically. To facilitate online optimization, we reformulate the FVO problem into Tracking of the Optimal Trajectory (TOT) problems, with algorithms proposed for two scenarios: one where the asset dynamics are negligible, and another where they must be accounted for. Under reasonable conditions, the ARU converges to the optimal trajectory within a fixed time, and within the maximum delivery time requested by NESO. The proposed framework can be readily distributed to coordinate a large number of assets. Numerical results verify the effectiveness and scalability of the proposed control framework.
comment: in IEEE Transactions on Power Systems, 2025
☆ On the Boundary of the Robust Admissible Set in State and Input Constrained Nonlinear Systems
In this paper, we consider nonlinear control systems subject to bounded disturbances and to both state and input constraints. We introduce the robust admissible set: the set of all initial states from which the state and input constraints can be satisfied for all times against all admissible disturbances. We focus on its boundary, which can be decomposed into the usable part, on the state constraint boundary, and the barrier, interior to the state constraints. We show that, at the intersection of these two components, the boundary of the admissible set must be tangent to the state constraints, and that it separates the interior of the robust admissible set from its complement. Moreover, we prove that the barrier must satisfy a saddle-point principle on a Hamiltonian, in the spirit of Pontryagin's maximum principle, thus providing a direct computational tool. Lastly, we illustrate our results by calculating the robust admissible set for an adaptive cruise control example.
comment: 18 pages, 4 figures
☆ Integration of Concentrated Solar Power Plants in Renewable-Only VPP with Electrical and Thermal Demands: A Two-Stage Robust Bidding Approach
This paper proposes the integration of a Concentrated Solar Power plant (CSP) into a Renewable-only Virtual Power Plant (RVPP) for bidding in the electricity day-ahead and secondary reserve markets, as well as trading thermal energy through a heat purchase agreement. A reformulated two-stage robust optimization approach is introduced to account for multiple uncertainties, including electricity prices, the electrical production of non-dispatchable renewable energy sources, CSP thermal production, and electrical and thermal demand. The provision of energy and reserve by the thermal storage of the CSP is modeled using an adjustable approach, which allocates a share of energy for up and down reserves based on the profitability of the RVPP. Simulations are conducted for several case studies to demonstrate the effectiveness and computational efficiency of the proposed approach under different RVPP operator decisions against uncertain parameters and various trading strategies for electricity and thermal energy. The simulation results show that integrating CSP into the RVPP enhances its flexibility for both electrical and thermal trading. Furthermore, the results indicate that the profitability of the RVPP increases when all trading options are considered, across different levels of conservatism adopted by the RVPP operator in response to uncertain parameters.
☆ Guaranteed Robust Nonlinear MPC via Disturbance Feedback
Robots must satisfy safety-critical state and input constraints despite disturbances and model mismatch. We introduce a robust model predictive control (RMPC) formulation that is fast, scalable, and compatible with real-time implementation. Our formulation guarantees robust constraint satisfaction, input-to-state stability (ISS) and recursive feasibility. The key idea is to decompose the uncertain nonlinear system into (i) a nominal nonlinear dynamic model, (ii) disturbance-feedback controllers, and (iii) bounds on the model error. These components are optimized jointly using sequential convex programming. The resulting convex subproblems are solved efficiently using a recent disturbance-feedback MPC solver. The approach is validated across multiple dynamics, including a rocket-landing problem with steerable thrust. An open-source implementation is available at https://github.com/antoineleeman/robust-nonlinear-mpc.
comment: Code: https://github.com/antoineleeman/robust-nonlinear-mpc
☆ An Extended Kalman Filter for Systems with Infinite-Dimensional Measurements
This article examines state estimation in discrete-time nonlinear stochastic systems with finite-dimensional states and infinite-dimensional measurements, motivated by real-world applications such as vision-based localization and tracking. We develop an extended Kalman filter (EKF) for real-time state estimation, with the measurement noise modeled as an infinite-dimensional random field. When applied to vision-based state estimation, the measurement Jacobians required to implement the EKF are shown to correspond to image gradients. This result provides a novel system-theoretic justification for the use of image gradients as features for vision-based state estimation, contrasting with their (often heuristic) introduction in many computer-vision pipelines. We demonstrate the practical utility of the EKF on a public real-world dataset involving the localization of an aerial drone using video from a downward-facing monocular camera. The EKF is shown to outperform VINS-MONO, an established visual-inertial odometry algorithm, in some cases achieving mean squared error reductions of up to an order of magnitude.
comment: 8 pages
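For reference, one predict-update cycle of a standard discrete-time EKF is sketched below; in the paper's vision-based setting, the measurement Jacobian would be assembled from image gradients, which is only indicated in a comment here. The models f, F, h, H are user-supplied placeholders.

    import numpy as np

    def ekf_step(x, P, u, z, f, F, h, H, Q, R):
        """One predict-update cycle of a standard extended Kalman filter.
        f, h: process and measurement models; F, H: their Jacobians."""
        # Predict with the process model.
        x_pred = f(x, u)
        Fx = F(x, u)
        P_pred = Fx @ P @ Fx.T + Q
        # Update with the measurement. In the paper's vision-based
        # setting, the rows of H correspond to image gradients evaluated
        # at predicted pixel locations (their system-theoretic result).
        Hx = H(x_pred)
        S = Hx @ P_pred @ Hx.T + R            # innovation covariance
        K = P_pred @ Hx.T @ np.linalg.inv(S)  # Kalman gain
        x_new = x_pred + K @ (z - h(x_pred))
        P_new = (np.eye(len(x_new)) - K @ Hx) @ P_pred
        return x_new, P_new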
☆ Dual Iterative Learning Control for Multiple-Input Multiple-Output Dynamics with Validation in Robotic Systems
Solving motion tasks autonomously and accurately is a core ability for intelligent real-world systems. To achieve genuine autonomy across multiple systems and tasks, key challenges include coping with unknown dynamics and overcoming the need for manual parameter tuning, which is especially crucial in complex Multiple-Input Multiple-Output (MIMO) systems. This paper presents MIMO Dual Iterative Learning Control (DILC), a novel data-driven iterative learning scheme for simultaneous tracking control and model learning, without requiring any prior system knowledge or manual parameter tuning. The method is designed for repetitive MIMO systems and integrates seamlessly with established iterative learning control methods. We provide monotonic convergence conditions for both the reference tracking error and the model error in linear time-invariant systems. The DILC scheme rapidly and autonomously solves various motion tasks in high-fidelity simulations of an industrial robot and in multiple nonlinear real-world MIMO systems, without requiring model knowledge or manual tuning of the algorithm. In our experiments, many reference tracking tasks are solved within 10-20 trials, and even complex motions are learned in less than 100 iterations. We believe that, because of its rapid and autonomous learning capabilities, DILC has the potential to serve as an efficient building block within complex learning frameworks for intelligent real-world systems.
comment: 11 pages, 4 figures
☆ Verification and Synthesis of Discrete-Time Control Barrier Functions
Discrete-time Control Barrier Functions (DTCBFs) have recently attracted interest for guaranteeing safety and synthesizing safe controllers for discrete-time dynamical systems. This paper addresses the open challenges of verifying candidate DTCBFs and synthesizing DTCBFs for general nonlinear discrete-time systems with input constraints and arbitrary safe sets. In particular, we propose a branch-and-bound method, inspired by the $\alpha$BB algorithm, for the verification of candidate DTCBFs in both cases, whether a corresponding control policy is known or unknown. We prove that this method, in a finite number of iterations, either verifies a given candidate function as a valid DTCBF or falsifies it by providing a counterexample (within predefined tolerances). As a second main contribution, we propose a novel bilevel optimization approach to synthesize a DTCBF and a corresponding control policy in finite time. This involves determining the unknown coefficients of a parameterized DTCBF and a parameterized control policy. Furthermore, we introduce various strategies to reduce the computational burden of the bilevel approach. We also demonstrate our methods using numerical case studies.
comment: To appear in IEEE Transactions on Automatic Control
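The paper's verifier is a branch-and-bound method with finite-termination guarantees; as a much weaker point of reference, the standard DTCBF decrease condition can be falsified (but never verified) by naive random sampling. A sketch, in which the dynamics, policy, candidate function, and decrease rate gamma are all illustrative inputs:

    import numpy as np

    def falsify_dtcbf(f, pi, B, sample_safe_set, gamma=0.5,
                      n_samples=10_000, tol=1e-6):
        """Random-sampling falsifier for the DTCBF decrease condition
        B(f(x, pi(x))) >= (1 - gamma) * B(x) on the safe set {B >= 0}.
        Returns a counterexample state if one is found, else None.
        Unlike the paper's branch-and-bound verifier, failing to find a
        counterexample proves nothing."""
        for _ in range(n_samples):
            x = sample_safe_set()          # user-supplied sampler of {B >= 0}
            if B(f(x, pi(x))) - (1.0 - gamma) * B(x) < -tol:
                return x                   # condition violated at x
        return None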
☆ 3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space
Learning robust visuomotor policies that generalize across diverse objects and interaction dynamics remains a central challenge in robotic manipulation. Most existing approaches rely on direct observation-to-action mappings or compress perceptual inputs into global or object-centric features, which often overlook localized motion cues critical for precise and contact-rich manipulation. We present 3D Flow Diffusion Policy (3D FDP), a novel framework that leverages scene-level 3D flow as a structured intermediate representation to capture fine-grained local motion cues. Our approach predicts the temporal trajectories of sampled query points and conditions action generation on these interaction-aware flows, implemented jointly within a unified diffusion architecture. This design grounds manipulation in localized dynamics while enabling the policy to reason about broader scene-level consequences of actions. Extensive experiments on the MetaWorld benchmark show that 3D FDP achieves state-of-the-art performance across 50 tasks, particularly excelling on medium and hard settings. Beyond simulation, we validate our method on eight real-robot tasks, where it consistently outperforms prior baselines in contact-rich and non-prehensile scenarios. These results highlight 3D flow as a powerful structural prior for learning generalizable visuomotor policies, supporting the development of more robust and versatile robotic manipulation. Robot demonstrations, additional results, and code can be found at https://sites.google.com/view/3dfdp/home.
comment: 7 main pages + 2 reference pages
☆ Distributionally Robust Safe Motion Planning with Contextual Information
We present a distributionally robust approach for collision avoidance that incorporates contextual information. Specifically, we embed the conditional distribution of the obstacle's future trajectory, conditioned on the motion of the ego agent, in a reproducing kernel Hilbert space (RKHS) via the conditional kernel mean embedding operator. Then, we define an ambiguity set containing all distributions whose embedding in the RKHS is within a certain distance from the empirical estimate of the conditional mean embedding learnt from past data. Consequently, a distributionally robust collision avoidance constraint is formulated and included in the receding-horizon motion planning formulation of the ego agent. Simulation results show that, in several challenging scenarios, the proposed approach is more successful in avoiding collisions than approaches that do not include contextual information and/or distributional robustness in their formulation.
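Two of the empirical ingredients, a Gaussian-kernel mean embedding and the RKHS distance between two empirical embeddings via the kernel trick, can be sketched compactly. Note this sketch uses marginal rather than conditional embeddings, and the bandwidth and radius are illustrative choices:

    import numpy as np

    def gaussian_kernel(X, Y, bandwidth=1.0):
        """Gram matrix k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
        d2 = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth**2))

    def rkhs_embedding_dist2(X, Y, bandwidth=1.0):
        """Squared RKHS distance between the empirical mean embeddings of
        two sample sets (the squared MMD), computed via the kernel trick."""
        Kxx = gaussian_kernel(X, X, bandwidth)
        Kyy = gaussian_kernel(Y, Y, bandwidth)
        Kxy = gaussian_kernel(X, Y, bandwidth)
        return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

    # Ambiguity set: all distributions whose embedding lies within radius
    # eps of the empirical one; a candidate sample set Y is "inside" when
    # rkhs_embedding_dist2(X_past, Y) <= eps**2 (eps chosen from data).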
☆ Interaction-aware Lane-Changing Early Warning System in Congested Traffic
Lane changes (LCs) in congested traffic are complex, multi-vehicle interactive events that pose significant safety concerns. Providing early warnings can enable more proactive driver assistance systems and support more informed decision-making for drivers during LCs. This paper presents an interaction-aware Lane-Changing Early Warning (LCEW) system designed to issue reliable early warning signals based on future trajectory predictions. We first investigate the stochastic nature of LCs, characterized by (i) variable-size multi-vehicle interactions and (ii) the direct and indirect risks resulting from these interactions. To model these stochastic interactions, a Social Spatio-Temporal Graph Convolutional Neural Network framework informed by mutual information (STGCNN-MI) is introduced to predict multi-vehicle trajectories. By leveraging an MI-based adjacency matrix, the framework enhances trajectory prediction accuracy while providing interpretable representations of vehicle interactions. Then, potential collisions between the LC vehicle and adjacent vehicles (direct risks) or among non-adjacent vehicles (indirect risks) are identified using oriented bounding box detection applied to the predicted trajectories. Finally, a warning signal is generated to inform the LC driver of the locations of potential collisions within the predicted time window. Traffic simulation experiments conducted in SUMO demonstrate that the proposed interaction-aware LCEW improves both vehicle-level safety and overall traffic efficiency, while also promoting more natural behavioral adaptation.
☆ The First Open-Source Framework for Learning Stability Certificates from Data
Before 2025, no open-source system existed that could learn Lyapunov stability certificates directly from noisy, real-world flight data. No tool could answer the critical question: is this controller still stabilizable, especially when its closed-loop system is a total black box? We broke that boundary. This year, we released the first open-source framework that can learn Lyapunov functions from trajectory data under realistic, noise-corrupted conditions. Unlike statistical anomaly detectors, our method does not merely flag deviations; it directly determines whether the system can still be proven stable. Applied to public data from the 2024 SAS severe turbulence incident, our method revealed that, within just 60 seconds of the aircraft's descent becoming abnormal, no Lyapunov function could be constructed to certify system stability. Moreover, this is the first known data-driven stability-theoretic method ever applied to a civil airliner accident. Our approach works with zero access to the controller logic, a breakthrough for commercial aircraft where control laws are proprietary and opaque. The implementation of the proposed framework is open-sourced and available at: https://github.com/HansOersted/stability
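The abstract does not spell out the learning procedure, so the following is only a toy illustration of the general idea: fit a quadratic candidate V(x) = x^T P x to observed state transitions by penalizing violations of the decrease condition, and report how many transitions still violate it. A persistently nonzero violation count plays, in a much weaker heuristic sense, the role of the paper's "no Lyapunov function could be constructed" diagnostic.

    import numpy as np

    def fit_quadratic_lyapunov(X, Xn, margin=1e-3, lr=1e-2, steps=5000):
        """Toy data-driven Lyapunov search (not the released framework):
        find P = L L^T (positive semidefinite) such that V(x) = x^T P x
        decreases along every observed transition x -> x_next, using
        subgradient descent on a hinge loss over violations."""
        n = X.shape[1]
        rng = np.random.default_rng(0)
        L = np.eye(n) + 0.01 * rng.standard_normal((n, n))
        for _ in range(steps):
            P = L @ L.T
            v = np.einsum('ij,jk,ik->i', X, P, X)       # V(x_k)
            vn = np.einsum('ij,jk,ik->i', Xn, P, Xn)    # V(x_{k+1})
            gap = vn - v + margin * (X**2).sum(1)       # want gap <= 0
            active = gap > 0
            if not active.any():
                break
            # Subgradient of sum(max(gap, 0)) with respect to L.
            gP = Xn[active].T @ Xn[active] - X[active].T @ X[active]
            L -= lr * (gP + gP.T) @ L / max(active.sum(), 1)
        P = L @ L.T
        v = np.einsum('ij,jk,ik->i', X, P, X)
        vn = np.einsum('ij,jk,ik->i', Xn, P, Xn)
        return P, int((vn - v >= 0).sum())   # (candidate P, #violations)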
☆ VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation
Rapid adaptation in unseen environments is essential for scalable real-world autonomy, yet existing approaches rely on exhaustive exploration or rigid navigation policies that fail to generalize. We present VLN-Zero, a two-phase vision-language navigation framework that leverages vision-language models to efficiently construct symbolic scene graphs and enable zero-shot neurosymbolic navigation. In the exploration phase, structured prompts guide VLM-based search toward informative and diverse trajectories, yielding compact scene graph representations. In the deployment phase, a neurosymbolic planner reasons over the scene graph and environmental observations to generate executable plans, while a cache-enabled execution module accelerates adaptation by reusing previously computed task-location trajectories. By combining rapid exploration, symbolic reasoning, and cache-enabled execution, the proposed framework overcomes the computational inefficiency and poor generalization of prior vision-language navigation methods, enabling robust and scalable decision-making in unseen environments. VLN-Zero achieves 2x higher success rate compared to state-of-the-art zero-shot models, outperforms most fine-tuned baselines, and reaches goal locations in half the time with 55% fewer VLM calls on average compared to state-of-the-art models across diverse environments. Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/.
comment: Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/
☆ AI Agent Access (A^3) Network: An Embodied, Communication-Aware Multi-Agent Framework for 6G Coverage
The vision of 6G communication demands autonomous and resilient networking in environments without fixed infrastructure. Yet most multi-agent reinforcement learning (MARL) approaches focus on isolated stages (exploration, relay formation, or access) under static deployments and centralized control, limiting adaptability. We propose the AI Agent Access (A^3) Network, a unified, embodied intelligence-driven framework that transforms multi-agent networking into a dynamic, decentralized, and end-to-end system. Unlike prior schemes, the A^3 Network integrates exploration, target user access, and backhaul maintenance within a single learning process, while supporting on-demand agent addition at runtime. Its decentralized policies ensure that even a single agent can operate independently with limited observations, while coordinated agents achieve scalable, communication-optimized coverage. By embedding link-level communication metrics into actor-critic learning, the A^3 Network couples topology formation with robust decision-making. Numerical simulations demonstrate that the A^3 Network not only balances exploration and communication efficiency but also delivers system-level adaptability absent in existing MARL frameworks, offering a new paradigm for 6G multi-agent networks.
☆ Refined Barrier Conditions for Finite-Time Safety and Reach-Avoid Guarantees in Stochastic Systems
Providing finite-time probabilistic safety and reach-avoid guarantees is crucial for safety-critical stochastic systems. Existing barrier certificate methods often rely on a restrictive boundedness assumption for auxiliary functions, limiting their applicability. This paper presents refined barrier-like conditions that remove this assumption. Specifically, we establish conditions for deriving upper bounds on finite-time safety probabilities in discrete-time systems and lower bounds on finite-time reach-avoid probabilities in continuous-time systems. This key relaxation significantly expands the class of verifiable systems, especially those with unbounded state spaces, and facilitates the application of advanced optimization techniques, such as semi-definite programming with polynomial functions. The efficacy of our approach is validated through numerical examples.
♻ ☆ Energy Management for Renewable-Colocated Artificial Intelligence Data Centers
We develop an energy management system (EMS) for artificial intelligence (AI) data centers with colocated renewable generation. Under a cost-minimizing framework, the EMS of renewable-colocated data center (RCDC) co-optimizes AI workload scheduling, on-site renewable utilization, and electricity market participation. Within both wholesale and retail market participation models, the economic benefit of the RCDC operation is maximized. Empirical evaluations using real-world traces of electricity prices, data center power consumption, and renewable generation demonstrate significant electricity cost reduction from renewable and AI data center colocations.
♻ ☆ Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures
Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space, spanning both the latent representation and the epistemic uncertainty, we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions. Video results can be found on the project website at https://cmu-intentlab.github.io/UNISafe
comment: Conference on Robot Learning (CoRL 2025)
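The conformal calibration step mentioned in the abstract is standard and compact: given nonconformity scores (here, the world model's epistemic-uncertainty values on held-out in-distribution data), the OOD threshold is an adjusted empirical quantile. A sketch with our own variable names:

    import numpy as np

    def conformal_threshold(calibration_scores, alpha=0.05):
        """Split conformal calibration: with probability >= 1 - alpha, a
        fresh in-distribution score falls below the returned threshold.
        `calibration_scores`: epistemic-uncertainty values of the world
        model on held-out in-distribution data. Assumes the calibration
        set is large enough that the adjusted index stays in range."""
        s = np.sort(np.asarray(calibration_scores))
        n = len(s)
        k = int(np.ceil((n + 1) * (1 - alpha)))   # adjusted quantile index
        return s[min(k, n) - 1]

    # Deployment: treat a world-model prediction as OOD (possibly unsafe)
    # whenever its uncertainty exceeds the calibrated threshold.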
♻ ☆ The Bayesian Separation Principle for Data-driven Control
In this paper we investigate the existence of a separation principle between model identification and control design in the context of model predictive control. First, we clarify that such a separation principle holds asymptotically in the number of data in a Fisherian context, and show that it holds universally, i.e. regardless of the data size, in a Bayesian context. Then, by formulating model predictive control within a Gaussian regression framework, we describe how the Bayesian separation principle can be used to derive computable, uncertainty-aware expressions for the control cost and optimal input sequence, thereby bridging direct and indirect data-driven approaches. Numerical results in both linear and nonlinear scenarios illustrate that the proposed approach outperforms nominal methods that neglect uncertainty, highlighting the advantages of incorporating uncertainty in the control design process.
comment: 16 pages, 2 figures
♻ ☆ Drift Plus Optimistic Penalty -- A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds
We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off; however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-armed bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
♻ ☆ Stratified Topological Autonomy for Long-Range Coordination (STALC)
In this paper, we present Stratified Topological Autonomy for Long-Range Coordination (STALC), a hierarchical planning approach for coordinated multi-robot maneuvering in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.
comment: This work has been submitted to the IEEE for possible publication
♻ ☆ Low-pass sampling in Model Predictive Path Integral Control
Model Predictive Path Integral (MPPI) control is a widely used sampling-based approach for real-time control, offering flexibility in handling arbitrary dynamics and cost functions. However, the original MPPI suffers from high-frequency noise in the sampled control trajectories, leading to actuator wear and inefficient exploration. In this work, we introduce Low-Pass Model Predictive Path Integral Control (LP-MPPI), which integrates low-pass filtering into the sampling process to eliminate detrimental high-frequency components and improve the effectiveness of control-trajectory exploration. Unlike prior approaches, LP-MPPI provides direct and interpretable control over the frequency spectrum of sampled trajectories, enhancing sampling efficiency and control smoothness. Through extensive evaluations in Gymnasium environments, simulated quadruped locomotion, and real-world F1TENTH autonomous racing, we demonstrate that LP-MPPI consistently outperforms state-of-the-art MPPI variants, achieving significant performance improvements while reducing control signal chattering.
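A compact sketch of the core modification: a vanilla MPPI update in which the sampled control perturbations are passed through a low-pass filter along the horizon before rollout. The paper shapes the frequency spectrum more directly; the first-order filter, temperature, and sampling constants below are simplifications of ours.

    import numpy as np

    def lp_mppi_step(u_nom, dynamics, cost, x0, n_samples=256, sigma=0.5,
                     alpha=0.3, temperature=1.0, rng=None):
        """One LP-MPPI-style update. White-noise perturbations are
        low-pass filtered along the horizon (first-order filter with
        coefficient `alpha`) before rollout, suppressing the
        high-frequency chatter of vanilla MPPI. Constants illustrative."""
        rng = rng or np.random.default_rng()
        H, m = u_nom.shape
        eps = sigma * rng.standard_normal((n_samples, H, m))
        for t in range(1, H):              # first-order low-pass in time
            eps[:, t] = alpha * eps[:, t] + (1 - alpha) * eps[:, t - 1]
        costs = np.empty(n_samples)
        for i in range(n_samples):         # roll out each perturbed sequence
            x, c = x0, 0.0
            for t in range(H):
                u = u_nom[t] + eps[i, t]
                c += cost(x, u)
                x = dynamics(x, u)
            costs[i] = c
        w = np.exp(-(costs - costs.min()) / temperature)
        w /= w.sum()                       # softmax weights over rollouts
        return u_nom + np.einsum('i,ihm->hm', w, eps)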
♻ ☆ Sliding Speed Influences Electrovibration-Induced Finger Friction Dynamics on Touchscreens
Electrovibration technology enables tactile texture rendering on capacitive touchscreens by modulating friction between the finger and the screen through electrostatic attraction forces, generated by applying an alternating voltage signal to the screen. Accurate signal calibration is essential for robust texture rendering but remains challenging due to variations in sliding speed, applied force, and individual skin mechanics, all of which unpredictably affect frictional behavior. Here, we investigate how exploration conditions affect electrovibration-induced finger friction on touchscreens and the role of skin mechanics in this process. Ten participants slid their index fingers across an electrovibration-enabled touchscreen at five sliding speeds ($20\sim100$ mm/s) and applied force levels ($0.2\sim0.6$ N). Contact forces and skin accelerations were measured while amplitude modulated voltage signals spanning the tactile frequency range were applied to the screen. We modeled the finger-touchscreen friction response as a first-order system and the skin mechanics as a mass-spring-damper system. Results showed that sliding speed influenced the friction response's cutoff frequency, along with the estimated finger moving mass and stiffness. For every $1$ mm/s increase in speed, the cutoff frequency, the finger moving mass, and stiffness increased by $13.8$ Hz, $3.23\times 10^{-5}$ kg, and $4.04$ N/m, respectively. Correlation analysis revealed that finger stiffness had a greater impact on the cutoff frequency than moving mass. Notably, we observed a substantial inter-participant variability in both finger-display interaction and skin mechanics parameters. Finally, we developed a speed-dependent friction model to support consistent and perceptually stable electrovibration-based haptic feedback across varying user conditions.
comment: 26 pages, 14 figures, Tribology journal
♻ ☆ Addressing Model Inaccuracies in Transmission Network Reconfiguration via Diverse Alternatives
The ongoing energy transition places significant pressure on the transmission network due to increasing shares of renewables and electrification. To mitigate grid congestion, transmission system operators need decision support tools to suggest remedial actions, such as transmission network reconfigurations or redispatch. However, these tools are prone to model inaccuracies and may not provide relevant suggestions with regard to important unmodeled constraints or operator preferences. We propose a human-in-the-loop modeling-to-generate alternatives (HITL-MGA) approach to address these shortcomings by generating diverse topology reconfiguration alternatives. Case studies on the IEEE 57-bus and IEEE 118-bus systems show the method can leverage expert feedback and improve the quality of the suggested remedial actions.
comment: This preprint is currently under peer review
♻ ☆ Optimization-Based Ramping Reserve Allocation of BESS for AGC Enhancement
The transient behavior of Automatic Generation Control (AGC) systems is a critical aspect of power system operation. Therefore, fully extracting the potential of Battery Energy Storage Systems (BESSs) for AGC enhancement is of paramount importance. In light of the challenges posed by diverse resource interconnections and the associated variability, we propose an online optimization scheme that can adapt to changes in an unknown and variable environment. To leverage the synergy between BESSs and Conventional Generators (CGs), we devise a variant of the Area Injection Error (AIE) as a measure to quantify the ramping needs. Based on this measure, we develop a distributed optimization algorithm with adaptive learning rates for the allocation of the ramping reserve. The algorithm restores a larger learning rate for compliance with the ramping needs upon detecting a potentially destabilizing event. We demonstrate the effectiveness and scalability of the proposed scheme through comprehensive case studies. It is shown that the proposed scheme can improve the transient behavior of the AGC system by bridging the gap in ramping capability.
♻ ☆ DANSE: Data-driven Non-linear State Estimation of Model-free Process in Unsupervised Learning Setup
We address the tasks of Bayesian state estimation and forecasting for a model-free process in an unsupervised learning setup. For a model-free process, we do not have any a-priori knowledge of the process dynamics. In this article, we propose DANSE, a Data-driven Nonlinear State Estimation method. DANSE provides a closed-form posterior of the state of the model-free process, given linear measurements of the state, as well as a closed-form posterior for forecasting. A data-driven recurrent neural network (RNN) is used in DANSE to provide the parameters of a prior of the state. The prior depends on the past measurements as input, and the closed-form posterior of the state is then found using the current measurement as input. The data-driven RNN captures the underlying nonlinear dynamics of the model-free process. The training of DANSE, mainly learning the parameters of the RNN, is executed using an unsupervised learning approach. In unsupervised learning, we have access to a training dataset comprising only a set of measurement data trajectories, without any access to the state trajectories. Therefore, DANSE has no state information in the training data and cannot use supervised learning. Using simulated linear and nonlinear process models (Lorenz attractor and Chen attractor), we evaluate the unsupervised learning-based DANSE. We show that the proposed DANSE, without knowledge of the process model and without supervised learning, provides a competitive performance against model-driven methods, such as the Kalman filter (KF), extended KF (EKF), unscented KF (UKF), a data-driven deep Markov model (DMM) and a recently proposed hybrid method called KalmanNet. In addition, we show that DANSE works for high-dimensional state estimation.
comment: 12 pages, Accepted for publication in IEEE Transactions on Signal Processing. Added a fix in the latest arXiv version for Fig. 4 (and related figures) due to a slight misalignment in MSE calculations. Relative performance trends are unchanged and consistent
♻ ☆ Recent advances in data-driven methods for degradation modelling across applications
Understanding degradation is crucial for ensuring the longevity and performance of materials, systems, and organisms. To illustrate the similarities across applications, this article provides a review of data-based methods in materials science, engineering, and medicine. The methods analyzed in this paper include regression analysis, factor analysis, cluster analysis, Markov Chain Monte Carlo, Bayesian statistics, hidden Markov models, nonparametric Bayesian modeling of time series, supervised learning, and deep learning. The review provides an overview of degradation models, referencing books and methods, and includes detailed tables highlighting the applications and insights offered in medicine, power engineering, and material science. It also discusses the classification of methods, emphasizing statistical inference, dynamic prediction, machine learning, and hybrid modeling techniques. Overall, this review enhances understanding of degradation modelling across diverse domains.
♻ ☆ A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
Recently, Kvalheim and Sontag provided a generalized global Hartman-Grobman theorem for equilibria under asymptotically stable continuous vector fields. By leveraging topological properties of Lyapunov functions, their theorem works without assuming hyperbolicity. We extend their theorem to a class of possibly discontinuous vector fields, in particular, to vector fields generating asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277. Newest version: 6 pages, 5 figures, the only change is the presentation. Under review
Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data
We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. First, we introduce the concept of the random Tikhonov regularization path and decompose the tracking error of the algorithm's output for the regularization path into random difference equations in the RKHS, whose non-homogeneous terms are martingale difference sequences. Investigating the mean square asymptotic stability of these equations, we show that if the regularization path is slowly time-varying, then the algorithm's output achieves mean square consistency with the regularization path. Leveraging operator theory, particularly the monotonicity of the inverses of operators and the spectral decomposition of compact operators, we introduce the RKHS persistence of excitation condition (i.e., there exists a fixed-length time period such that the conditional expectation of the operators induced by the input data accumulated over every period has a uniformly strictly positive compact lower bound) and develop a dominated convergence method to prove the mean square consistency between the algorithm's output and an unknown function. Finally, for independent and non-identically distributed data streams, the algorithm achieves mean square consistency if the input data's marginal probability measures are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
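A recursive regularized learning rule of the type covered by this analysis is f_{t+1} = f_t - a_t((f_t(x_t) - y_t)K(x_t, .) + lambda f_t), which can be maintained through a growing list of kernel coefficients. A Python sketch with an illustrative Gaussian kernel and step-size schedule:

    import numpy as np

    class OnlineRegularizedKernelLearner:
        """Recursive Tikhonov-regularized learning in an RKHS, stored as
        kernel-expansion coefficients. The Gaussian kernel and decaying
        step size are illustrative choices, not the paper's."""

        def __init__(self, bandwidth=1.0, lam=0.1):
            self.bw, self.lam = bandwidth, lam
            self.centers, self.coeffs, self.t = [], [], 0

        def _k(self, x, y):
            return np.exp(-np.sum((np.asarray(x) - np.asarray(y))**2)
                          / (2 * self.bw**2))

        def predict(self, x):
            # f_t(x) = sum_i c_i K(x_i, x)
            return sum(c * self._k(xc, x)
                       for c, xc in zip(self.coeffs, self.centers))

        def update(self, x, y):
            self.t += 1
            a = 1.0 / self.t**0.75            # slowly decaying step size
            err = self.predict(x) - y         # uses f_t before the update
            # f_{t+1} = (1 - a*lam) f_t - a*err*K(x, .)
            self.coeffs = [(1 - a * self.lam) * c for c in self.coeffs]
            self.centers.append(x)
            self.coeffs.append(-a * err)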
♻ ☆ LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics
Large language models (LLMs) have shown promise in zero-shot and single-step reasoning and decision-making problems, but in long-horizon sequential planning tasks, their errors compound, often leading to unreliable or inefficient behavior. We introduce LogicGuard, a modular actor-critic architecture in which an LLM actor is guided by a trajectory-level LLM critic that communicates through Linear Temporal Logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. LogicGuard supports both fixed safety rules and adaptive, learned constraints, and is model-agnostic: any LLM-based planner can serve as the actor, with LogicGuard acting as a logic-generating wrapper. We formalize planning as graph traversal under symbolic constraints, allowing LogicGuard to analyze failed or suboptimal trajectories and generate new temporal logic rules that improve future behavior. To demonstrate generality, we evaluate LogicGuard across two distinct settings: short-horizon general tasks and long-horizon specialist tasks. On the Behavior benchmark of 100 household tasks, LogicGuard increases task completion rates by 25% over a baseline InnerMonologue planner. On the Minecraft diamond-mining task, which is long-horizon and requires multiple interdependent subgoals, LogicGuard improves both efficiency and safety compared to SayCan and InnerMonologue. These results show that enabling LLMs to supervise each other through temporal logic yields more reliable, efficient, and safe decision-making for embodied agents.
comment: Modified version of prior LTLCrit work with new robotics dataset
♻ ☆ Spectrum Opportunities for the Wireless Future: From Direct-to-Device Satellite Applications to 6G Cellular
For next-generation wireless networks, both the upper mid-band and terahertz spectra are gaining global attention. This article provides an in-depth analysis of recent regulatory rulings and spectrum preferences issued by international standard bodies. We highlight promising frequency bands earmarked for 6G and beyond, and offer examples that illuminate the passive service protections and spectrum feasibility for coexistence between terrestrial and non-terrestrial wireless networks.
Optimization and Control 41
☆ Green Inventory Management: Leveraging Multiobjective Reverse Logistics
The paper proposes a novel Economic Production Quantity (EPQ) inventory model within a reverse logistics framework, addressing new and repaired products with varying quality and demand patterns. The model integrates production and remanufacturing rates as functions of lot sizes and cycle numbers to develop a feasible inventory cost function. A key contribution of the study is the formulation of a multiobjective optimization framework that simultaneously minimizes inventory costs and accounts for environmental sustainability by considering greenhouse gas (GHG) emissions and energy consumption during production processes. The problem is formulated as a mixed-integer nonlinear programming (MINLP) model, with integer constraints on lot sizes and cycle counts and a continuous return rate. Numerical case studies, based on test problems from the existing literature, are used to validate the model through extensive sensitivity analyses. Both mathematical and heuristic optimization methods are applied to solve the multiobjective optimization problems, and Pareto fronts are illustrated along with an interpretation of the results. The results, obtained using solvers in MATLAB and AMPL, highlight the model's ability to balance operational efficiency and environmental responsibility. Pareto frontiers generated from the analysis provide strategic insights for decision-makers seeking to optimize cost and sustainability in inventory systems.
comment: 27 pages, 16 figures
☆ A Multiobjective Mathematical Model for Optimal Irrigation Water Allocation
Water resource management is vital for sustainable irrigation and food security under growing climatic and economic pressures. In this study, we develop two single-objective optimization models for irrigation planning, one that maximizes net benefit and the other that minimizes environmental flow deficiency, and benchmark them against previous works. We then extend the analysis to a multiobjective linear programming formulation solved through different approaches to evaluate trade-offs. Numerical experiments on the Muhuri Irrigation Project reveal three outcomes: (i) a complete scenario view with profits ranging from $0.2 \times 10^9$ to $1.497\times 10^9$ and environmental flow deficits between 0 and 1200 GL, where the 1200 GL represents the theoretical annual maximum under a 100 GL uniform monthly target; (ii) explicit trade-offs showing higher profits correspond to greater ecological shortfalls; and (iii) an integration-based approach producing nearly 1000 Pareto-optimal solutions within seconds, greatly outperforming earlier studies.
comment: 21 pages, 1 figure
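The abstract does not detail the integration-based approach, but the generic weighted-sum device for tracing a Pareto front of a two-objective LP is easy to state. A sketch with placeholder data (the real model's constraints and coefficients are not reproduced); note the weighted sum recovers only supported Pareto points:

    import numpy as np
    from scipy.optimize import linprog

    # Placeholder two-objective LP: maximize profit c_profit @ x while
    # minimizing environmental-flow deficit c_deficit @ x, subject to
    # A_ub x <= b_ub, x >= 0. All numbers below are illustrative.
    c_profit = np.array([-3.0, -5.0])     # negated: linprog minimizes
    c_deficit = np.array([1.0, 0.5])
    A_ub = np.array([[1.0, 2.0], [3.0, 1.0]])
    b_ub = np.array([100.0, 150.0])

    pareto = []
    for w in np.linspace(0.0, 1.0, 21):   # sweep the objective weights
        res = linprog(w * c_profit + (1 - w) * c_deficit,
                      A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
        if res.success:
            pareto.append((-c_profit @ res.x, c_deficit @ res.x))
    # `pareto` now holds (profit, deficit) trade-off points.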
☆ Automatic Structure Identification for Highly Nonlinear MIMO Volterra Tensor Networks
The Volterra Tensor Network lifts the curse of dimensionality for truncated, discrete-time Volterra models, enabling scalable representation of highly nonlinear systems. This scalability comes at the cost of introducing randomness through initialization, and leaves open the challenge of how to efficiently determine the hyperparameters: model order and memory length. In this paper, we present a unified framework that simultaneously addresses both challenges: we derive two algorithms that incrementally increase the model order and the memory length. Further, we prove that the updates are performed along conjugate directions by establishing a mathematical equivalence between our proposed algorithms and equality-constrained least squares systems. We present several strategies for using our proposed algorithms for initialization and hyperparameter selection. In numerical experiments, we demonstrate that our proposed algorithms are more accurate and efficient than the state-of-the-art Volterra Tensor Network and achieve results competitive with several state-of-the-art Volterra models.
☆ Inexact and Stochastic Gradient Optimization Algorithms with Inertia and Hessian Driven Damping
In a real Hilbert space setting, we study the convergence properties of an inexact gradient algorithm featuring both viscous and Hessian-driven damping for convex differentiable optimization. In this algorithm, the gradient evaluation can be subject to deterministic and stochastic perturbations. In the deterministic case, we show that under appropriate summability assumptions on the perturbation, our algorithm enjoys fast convergence of the objective values and of the gradients, and weak convergence of the iterates toward a minimizer of the objective. In the stochastic case, assuming the perturbation is zero-mean, we can weaken our summability assumptions on the error variance and establish fast convergence of the values both in expectation and almost surely. We also improve the convergence rates from $\mathcal{O}(\cdot)$ to $o(\cdot)$ in the almost sure sense. In addition, we prove an almost sure summability property of the gradients, which implies the almost sure fast convergence of the gradients toward zero. We highlight the trade-off between fast convergence and the applicable regime on the sequence of errors in the gradient computations. Finally, we report some numerical results to support our findings.
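For intuition, a common discrete-time scheme from this literature combines a Nesterov-style inertial (viscous) term with a gradient-difference term that emulates Hessian-driven damping; the sketch below adds optional zero-mean noise to each gradient call. The parameter choices follow standard schemes and are not necessarily the paper's exact algorithm.

    import numpy as np

    def inertial_hessian_damped(grad, x0, s=0.01, beta=1.0, n_iters=500,
                                noise_std=0.0, rng=None):
        """Inertial gradient sketch with viscous damping (alpha_k) and a
        Hessian-driven correction (the gradient-difference term), under
        an optionally noisy gradient oracle. Illustrative parameters."""
        rng = rng or np.random.default_rng(0)

        def g(x):  # gradient oracle with zero-mean stochastic perturbation
            return grad(x) + noise_std * rng.standard_normal(np.shape(x))

        x_prev = np.array(x0, dtype=float)
        x_cur = x_prev.copy()
        g_prev = g(x_cur)
        for k in range(1, n_iters + 1):
            g_cur = g(x_cur)
            alpha = k / (k + 3.0)                          # viscous damping
            y = (x_cur + alpha * (x_cur - x_prev)
                 - beta * np.sqrt(s) * (g_cur - g_prev))   # Hessian-driven term
            x_prev, x_cur, g_prev = x_cur, y - s * g(y), g_cur
        return x_cur

    # Example: minimize f(x) = 0.5 * ||x||^2 with noisy gradients.
    print(inertial_hessian_damped(lambda x: x, np.ones(3), noise_std=0.01))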
☆ Design, modeling and control of a hybrid grid-connected photovoltaic-wind system for the region of Adrar, Algeria
The use of fossil energy for electricity production is an evident source of pollution, global warming and climate change. Consequently, researchers have been working to shift toward sustainable and clean energy by exploiting renewable and environmentally friendly resources such as wind and solar energies. On the other hand, energy security can only be achieved by considering multiple resources. Large-scale renewable energy power plants are a key solution for diversifying the total energy mix and ensuring energy security. This paper presents a contribution to diversifying the energy mix in Algeria, helping to mitigate power shortages and improve grid performance. In particular, the paper aims at designing and modeling a large-scale, grid-connected hybrid photovoltaic-wind system. An innovative control approach using improved particle swarm optimized PI controllers is proposed to control the hybrid system and extract the maximum power from the available wind and solar energy resources. Furthermore, economic, environmental and feasibility studies of the project were conducted using HOMER software to assess the viability of the system and its contribution to reducing greenhouse gas emissions. Code can be found here: https://github.com/yassinekebbati/Hybrid_PV_WIND_System
☆ Nearly Optimal Chaotic Desynchronization of Neural Oscillators
Motivated by deep brain stimulation treatment of neural disorders such as Parkinson's disease, it has been proposed that desynchronization of neural oscillators can be achieved by maximizing the Lyapunov exponent of the phase difference between pairs of oscillators. Here we consider two approximations to optimal stimuli for chaotic desynchronization of neural oscillators. These approximations are based on the oscillators' phase response curve, and unlike previous approaches do not require numerical solution of a two-point boundary value problem. It is shown that these approximations can achieve nearly optimal desynchronization, and can be used with an event-based control scheme to desynchronize populations of noisy, coupled neurons.
comment: 6 pages, 8 figures. Accepted for presentation at 2025 64th IEEE Conference on Decision and Control (CDC)
☆ Coordinated PSO-PID based longitudinal control with LPV-MPC based lateral control for autonomous vehicles
Autonomous driving is achieved by controlling the coupled nonlinear longitudinal and lateral vehicle dynamics. Longitudinal control greatly affects lateral dynamics and must preserve lateral stability conditions, while lateral controllers must take into account actuator limits and ride comfort. This work deals with the coordinated longitudinal and lateral control for autonomous driving. An improved particle swarm optimized PID (PSO-PID) is proposed to handle the task of speed tracking based on nonlinear longitudinal dynamics. An enhanced linear parameter varying model predictive controller (LPV-MPC) is also designed to control lateral dynamics, the latter is formulated with an adaptive LPV model in which the tire cornering stiffness coefficients are estimated by a recursive estimator. The proposed LPV-MPC is enhanced with an improved cost function to provide better performance and stability. Matlab/Carsim co-simulations are carried out to validate the proposed controllers. Code can be found here: https://github.com/yassinekebbati/PSO_PID-with-LPV_MPC-for-autonomous-driving
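To make the PSO-PID idea concrete: tune PID gains by particle swarm optimization against a simulated speed-tracking cost. The toy first-order longitudinal model, the cost, and the textbook PSO update below are illustrative assumptions; the paper uses an improved PSO variant and a full nonlinear vehicle model.

    import numpy as np

    def speed_tracking_cost(gains, setpoint=20.0, dt=0.01, T=5.0):
        """Integral of absolute error for a PID speed controller on a toy
        longitudinal model dv/dt = (u - b*v) / m (parameters illustrative)."""
        kp, ki, kd = gains
        m, b = 1200.0, 50.0
        v, integ, e_prev, cost = 0.0, 0.0, setpoint, 0.0
        for _ in range(int(T / dt)):
            e = setpoint - v
            integ += e * dt
            u = kp * e + ki * integ + kd * (e - e_prev) / dt
            e_prev = e
            v += dt * (u - b * v) / m
            cost += abs(e) * dt
        return cost

    def pso(cost, bounds, n_particles=30, n_iters=60, w=0.7, c1=1.5, c2=1.5):
        """Vanilla PSO (the paper's improved variant is not reproduced)."""
        rng = np.random.default_rng(0)
        lo, hi = np.array(bounds, dtype=float).T
        x = rng.uniform(lo, hi, (n_particles, len(lo)))
        v = np.zeros_like(x)
        pbest, pbest_f = x.copy(), np.array([cost(p) for p in x])
        gbest = pbest[pbest_f.argmin()]
        for _ in range(n_iters):
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, lo, hi)
            f = np.array([cost(p) for p in x])
            better = f < pbest_f
            pbest[better], pbest_f[better] = x[better], f[better]
            gbest = pbest[pbest_f.argmin()]
        return gbest

    print(pso(speed_tracking_cost, bounds=[(0, 500), (0, 100), (0, 50)]))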
☆ Autonomous driving using an optimized neural network based adaptive LPV-MPC controller
Driverless vehicles are complex systems operating in constantly changing environments. Automated driving is achieved by controlling the coupled longitudinal and lateral vehicle dynamics. Model predictive control is one of the most promising tools for this type of application due to its optimal performance and ability to handle constraints. This paper addresses autonomous driving with an adaptive linear parameter varying model predictive controller (LPV-MPC), which is adapted by a neural network and optimized by an improved Genetic Algorithm. The proposed controller is evaluated on a challenging track under variable wind disturbance. Code can be found here: https://github.com/yassinekebbati/GA-optimized-MLP-based-LPV_MPC
☆ Hierarchical null controllability of a degenerate parabolic equation with nonlocal coefficient
In this paper we use a Stackelberg-Nash strategy to show the local null controllability of a parabolic equation whose diffusion coefficient is the product of a function degenerating in space and a nonlocal term. We consider one control, called the leader, and two controls, called the followers. To each leader we associate a Nash equilibrium corresponding to a bi-objective optimal control problem; then, we find a leader that solves the null controllability problem. The linearized degenerate system is treated by adapting Carleman estimates for degenerate systems from Demarque, Límaco and Viana (2020), and the local controllability of the nonlinear system is obtained using Liusternik's inverse function theorem. The nonlocal coefficient introduces a multiplicative coupling in the optimality system, which gives rise to interesting calculations in the application of the inverse function theorem.
☆ Robust Near-Optimal Nonlinear Target Enclosing Guidance
This paper proposes a nonlinear optimal guidance law that enables a pursuer to enclose a target within arbitrary geometric patterns, extending beyond conventional circular encirclement. The design operates using only relative state measurements and formulates a target-enclosing guidance law in which the vehicle's lateral acceleration serves as the steering control, making it well-suited for aerial vehicles with turning constraints. Our approach generalizes and extends existing guidance strategies that are limited to target encirclement and provides a degree of optimality, while exact information about the target's maneuver is not required during the design. The guidance law is developed within the framework of a state-dependent Riccati equation (SDRE), which provides a systematic way to handle nonlinear dynamics through a pseudo-linear representation and to design locally optimal feedback guidance commands through state-dependent weighting matrices. While the SDRE ensures near-optimal performance in the absence of strong disturbances, we further augment the design with an integral sliding mode manifold to compensate when disturbances push the system away from the nominal trajectory, and demonstrate that the design is flexible in the sense that the (possibly time-varying) stand-off curvature can also be treated as unknown. Simulations demonstrate the efficacy of the proposed approach.
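The generic SDRE recipe is short: factor the dynamics as x' = A(x)x + B(x)u, freeze the matrices at the current state, solve the continuous algebraic Riccati equation, and apply u = -K(x)x. The pseudo-linear factorization below is a toy, not the paper's engagement geometry:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    def sdre_control(x, A_of_x, B_of_x, Q_of_x, R_of_x):
        """State-dependent Riccati equation feedback: at each state, solve
        the CARE for the frozen pseudo-linear system and apply u = -K(x)x."""
        A, B = A_of_x(x), B_of_x(x)
        Q, R = Q_of_x(x), R_of_x(x)
        P = solve_continuous_are(A, B, Q, R)
        K = np.linalg.solve(R, B.T @ P)     # K = R^{-1} B^T P
        return -K @ x

    # Toy factorization of dx1 = x2, dx2 = -x1**3 + u, written as
    # A(x) = [[0, 1], [-x1**2, 0]] (one of many valid choices).
    A_of_x = lambda x: np.array([[0.0, 1.0], [-x[0]**2, 0.0]])
    B_of_x = lambda x: np.array([[0.0], [1.0]])
    Q_of_x = lambda x: np.eye(2)
    R_of_x = lambda x: np.array([[1.0]])
    print(sdre_control(np.array([1.0, 0.5]), A_of_x, B_of_x, Q_of_x, R_of_x))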
☆ Policy Gradient Bounds in Multitask LQR
We analyze the performance of policy gradient in multitask linear quadratic regulation (LQR), where the system and cost parameters differ across tasks. The main goal of multitask LQR is to find a controller with satisfactory performance on every task. Prior analyses in relevant contexts fail to capture closed-loop task similarities, resulting in conservative performance guarantees. To account for such similarities, we propose bisimulation-based measures of task heterogeneity. Our measures employ new bisimulation functions to bound the cost gradient distance between a pair of tasks in closed loop with a common stabilizing controller. Employing these measures, we derive suboptimality bounds for both the multitask optimal controller and the asymptotic policy gradient controller with respect to each of the tasks. We further provide conditions under which the policy gradient iterates remain stabilizing for every system. For multiple random sets of certain tasks, we observe that our bisimulation-based measures improve upon baseline measures of task heterogeneity dramatically.
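For intuition about the single-task building block: the LQR cost of a stabilizing gain K can be evaluated exactly via a discrete Lyapunov equation, and a policy-gradient step on the average cost over tasks can then be taken, here with a finite-difference gradient for brevity (exact gradient expressions exist). Everything below is the standard construction, not the paper's analysis:

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    def lqr_cost(K, A, B, Q, R, Sigma0=None):
        """Infinite-horizon discrete LQR cost of u = -K x with x0 ~ N(0, Sigma0),
        via the Lyapunov equation for the closed-loop cost matrix P_K."""
        n = A.shape[0]
        Sigma0 = np.eye(n) if Sigma0 is None else Sigma0
        Acl = A - B @ K
        if max(abs(np.linalg.eigvals(Acl))) >= 1.0:
            return np.inf                    # K is not stabilizing
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        return np.trace(P @ Sigma0)

    def multitask_pg_step(K, tasks, lr=1e-3, fd=1e-5):
        """One policy-gradient step on the average cost over tasks, with a
        finite-difference gradient. `tasks` is a list of (A, B, Q, R)
        tuples; K must remain stabilizing for every task (costs are
        infinite otherwise, breaking the finite differences)."""
        grad = np.zeros_like(K)
        base = np.mean([lqr_cost(K, *t) for t in tasks])
        for idx in np.ndindex(K.shape):
            Kp = K.copy(); Kp[idx] += fd
            grad[idx] = (np.mean([lqr_cost(Kp, *t) for t in tasks]) - base) / fd
        return K - lr * grad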
☆ Hybrid Adaptive Robust Stochastic Optimization Model for the Design of a Photovoltaic Battery Energy Storage System
Future energy projections and their inherent uncertainty play a key role in the design of photovoltaic-battery energy storage systems (PV-BESS) for household use. In this study, both stochastic and robust optimization techniques are simultaneously integrated into a Hybrid Adaptive Robust-Stochastic Optimization (HARSO) model. Uncertainty in future PV generation is addressed using a stochastic approach, while uncertainty in power demand is handled through robust optimization. To solve the tri-level structure emerging from the hybrid approach, a Column-and-Constraint Generation (CCG) algorithm is implemented. The model also accounts for battery degradation by considering multiple commercially available battery chemistries, enabling a more realistic evaluation of long-term system costs and performance. To demonstrate its applicability, the model is applied to a case study involving the optimal design of a PV-BESS system for a household in Spain. The empirical analysis includes both first-life (FL) and second-life (SL) batteries with different chemistries, providing a comprehensive evaluation of design alternatives under uncertainty. Results indicate that the optimal solution is highly dependent on the level of robustness considered, leading to a shift in design strategy. Under less conservative settings, robustness is achieved by increasing battery capacity, while higher levels of conservatism favor expanding PV capacity to meet demand.
☆ A failure mode dependent continuum damage model for laminated composites with optimized model parameters: Application to curved beams
In this article, a failure-mode-dependent and thermodynamically consistent continuum damage model with polynomial-based damage hardening functions is proposed for continuum damage modeling of laminated composite panels. The damage model parameters are characterized from all uniaxial/shear experimental stress-strain curves. A steepest-descent optimization algorithm is used to minimize the difference between model-predicted and experimental stress-strain curves to obtain the optimized model parameters. The fully characterized damage evolution equations are used for damage prediction of a moderately thick laminated composite curved beam modeled using first-order shear deformation theory. The finite element method with load control is used to obtain the nonlinear algebraic equations, which are solved using the Newton-Raphson method. The developed model is compared with existing failure-mode-dependent and failure-mode-independent damage models. The results show the efficacy of the proposed model in capturing the nonlinearity in the load-deflection curve due to stiffness degradation, and the different damage in tension and compression, consistent with the uniaxial/shear stress-strain response and strength properties of the material.
☆ A Weighted Least Squares Error Hetero-functional Graph State Estimator of the American Multi-modal Energy System
As one of the most pressing challenges of the 21st century, global climate change demands a host of changes across at least four critical energy infrastructures: the electric grid, the natural gas system, the oil system, and the coal system. In the context of the United States, this paper refers to this system-of-systems as ``The American Multi-Modal Energy System (AMES)". These combined changes necessitate an understanding of the AMES interdependencies, both structurally and behaviorally, to develop and enact effective policies. This work focuses on behavioral analysis methods to provide examples of how to analyze system behavior and the critical matter and energy flows through the system. Building upon past works, two regions of the AMES are modeled, and their behavior is analyzed using Hetero-functional Graph Theory (HFGT). More specifically, the work presents a weighted least square error state estimation model of the AMES. State estimation has played a major role in the operation and development of the American Electric Power System. This work extends the state estimation analysis beyond the single-operand electric grid environment into the heterogeneous environment of the AMES. Employing a data-driven and model-based systems engineering approach in combination with HFGT, a Weighted Least Squares Error Hetero-functional Graph State Estimation (WLSEHFGSE) optimization program is developed to estimate the optimal flows of mass and energy through the AMES. This work is the first to integrate state estimation methods with HFGT. Furthermore, it demonstrates how such a WLSEHFGSE recovers the mass and energy flows in a system-of-systems like the AMES with asset-level granularity.
comment: 27 pages, 3 Tables, 11 Figures
☆ Clapping: Removing Per-sample Storage for Pipeline Parallel Distributed Optimization with Communication Compression
Pipeline-parallel distributed optimization is essential for large-scale machine learning but is challenged by significant communication overhead from transmitting high-dimensional activations and gradients between workers. Existing approaches often depend on impractical unbiased gradient assumptions or incur sample-size memory overhead. This paper introduces Clapping, a Communication compression algorithm with LAzy samPling for Pipeline-parallel learnING. Clapping adopts a lazy sampling strategy that reuses data samples across steps, breaking the sample-wise memory barrier and supporting convergence in few-epoch or online training regimes. Clapping comprises two variants, Clapping-FC and Clapping-FU, both of which achieve convergence without the unbiased gradient assumption, effectively addressing compression error propagation in multi-worker settings. Numerical experiments validate the performance of Clapping across different learning tasks.
comment: 60 pages
☆ Central Limit Theorems for Asynchronous Averaged Q-Learning
This paper establishes central limit theorems for Polyak-Ruppert averaged Q-learning under asynchronous updates. We present a non-asymptotic central limit theorem, where the convergence rate in Wasserstein distance explicitly reflects the dependence on the number of iterations, state-action space size, the discount factor, and the quality of exploration. In addition, we derive a functional central limit theorem, showing that the partial-sum process converges weakly to a Brownian motion.
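The theorems concern distributional limits of the averaged iterates; the estimator itself is compact. A minimal sketch of asynchronous Q-learning with Polyak-Ruppert iterate averaging on a randomly generated MDP (all sizes, step-size exponent, and the behavior policy are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # random transition kernel P[s, a, s']
Rw = rng.random((nS, nA))                        # random rewards

Q = np.zeros((nS, nA))
Qbar = np.zeros((nS, nA))                        # Polyak-Ruppert average of the iterates
s = 0
for t in range(1, 100_001):
    a = rng.integers(nA)                         # uniform behavior policy (asynchronous:
    s2 = rng.choice(nS, p=P[s, a])               # only the visited pair is updated)
    alpha = 1.0 / t**0.7                         # polynomially decaying step size
    Q[s, a] += alpha * (Rw[s, a] + gamma * Q[s2].max() - Q[s, a])
    Qbar += (Q - Qbar) / t                       # running average: (1/t) * sum of Q_1..Q_t
    s = s2
print(np.round(Qbar, 2))
```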
☆ Improved Trial and Error Learning for Random Games
When a game involves many agents or when communication between agents is not possible, it is useful to resort to distributed learning, where each agent acts in complete autonomy without any information on the other agents' situations. Perturbation-based algorithms have already been used for such tasks. We propose modifications, motivated by practical observations, that improve the performance of these algorithms. We show that introducing these changes preserves their theoretical convergence properties towards states that maximize the average reward, and improves them in the case where optimal states exist. Moreover, we show that these algorithms can be made robust to the addition of randomness to the rewards, achieving similar convergence guarantees. Finally, we discuss the possibility of letting the perturbation factor of the algorithm decrease during the learning process, akin to simulated annealing.
☆ Diffusive Stochastic Master Equation (SME) with dispersive qubit/cavity coupling
A detailed analysis of the diffusive Stochastic Master Equation (SME) for qubit/cavity systems with dispersive coupling is provided. This analysis incorporates classical input signals and output signals (measurement outcomes through homodyne detection). The dynamics of the qubit/cavity density operator is shown to converge exponentially towards a slow invariant manifold, parameterized via a time-varying deterministic Kraus map by the density operator of a fictitious qubit. This fictitious qubit is governed by an SME incorporating the classical input/output signals. An extension is given where the qubit is replaced by any qudit dispersively coupled to an arbitrary set of modes with collective classical input/output signals.
comment: Submitted
☆ Solving Sparse MIQCQPs: Application to the Unit Commitment Problem with ACOPF Constraints
Mixed-Integer Quadratically Constrained Quadratic Programs (MIQCQPs) arise in a variety of applications, particularly in energy, water, and gas systems, where discrete decisions interact with nonconvex quadratic constraints. These problems are computationally challenging due to the combination of combinatorial complexity and nonconvexity, often rendering traditional exact methods ineffective for large-scale instances. In this paper, we propose a solution framework for sparse MIQCQPs that integrates semidefinite programming relaxations with chordal decomposition techniques to exploit both term and correlative sparsity. By leveraging problem structure, we decompose the large semidefinite constraints into smaller, tractable blocks, improving the scalability of the relaxation and the overall branch-and-bound procedure. We evaluate our framework on the Unit Commitment problem with AC Optimal Power Flow constraints and show that our method produces strong bounds and high-quality solutions on standard IEEE test cases with up to 118 buses, demonstrating its effectiveness and scalability in solving MIQCQPs.
☆ On the Boundary of the Robust Admissible Set in State and Input Constrained Nonlinear Systems
In this paper, we consider nonlinear control systems subject to bounded disturbances and to both state and input constraints. We introduce the robust admissible set - the set of all initial states from which the state and input constraints can be satisfied for all times against all admissible disturbances. We focus on its boundary, which can be decomposed into the usable part, on the state constraint boundary, and the barrier, interior to the state constraints. We show that, at the intersection of these two components, the boundary of the admissible set must be tangent to the state constraints and must separate the interior of the robust admissible set from its complement. Moreover, we prove that the barrier must satisfy a saddle-point principle on a Hamiltonian, in the spirit of Pontryagin's maximum principle, thus providing a direct computational tool. Lastly, we illustrate our results by calculating the robust admissible set for an adaptive cruise control example.
comment: 18 pages, 4 figures
☆ On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation
Policy mirror descent (PMD) is a general policy optimization framework in reinforcement learning, which covers a wide range of typical policy optimization methods by specifying different mirror maps. Existing analyses of PMD require exact or approximate evaluation (for example, unbiased estimation via Monte Carlo simulation) of action values based solely on the policy. In this paper, we consider policy mirror descent with temporal difference evaluation (TD-PMD). It is shown that, given access to exact policy evaluation, the dimension-free $O(1/T)$ sublinear convergence still holds for TD-PMD with any constant step size and any initialization. To achieve this result, new monotonicity and shift-invariance arguments are developed. The dimension-free $\gamma$-rate linear convergence of TD-PMD is also established provided the step size is selected adaptively. For the two common instances of TD-PMD (i.e., TD-PQA and TD-NPG), it is further shown that they enjoy convergence in the policy domain. Additionally, we investigate TD-PMD in the inexact setting and give the sample complexity for it to achieve last-iterate $\varepsilon$-optimality under a generative model, which improves the last-iterate sample complexity of PMD in terms of the dependence on $1/(1-\gamma)$.
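With the KL mirror map, a PMD step reduces to a multiplicative (softmax) policy update driven by the current action-value estimates. A toy sketch of this instance, assuming a randomly generated MDP and a SARSA-style TD(0) evaluator standing in for the paper's TD evaluation (step sizes, horizons, and the evaluator are illustrative, not the paper's exact scheme):

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # random MDP
Rw = rng.random((nS, nA))

def td_q_eval(pi, n_steps=10_000, alpha=0.05):
    """SARSA-style TD(0) evaluation of Q^pi along a single on-policy trajectory."""
    Q = np.zeros((nS, nA))
    s = 0
    a = rng.choice(nA, p=pi[s])
    for _ in range(n_steps):
        s2 = rng.choice(nS, p=P[s, a])
        a2 = rng.choice(nA, p=pi[s2])            # on-policy successor action
        Q[s, a] += alpha * (Rw[s, a] + gamma * Q[s2, a2] - Q[s, a])
        s, a = s2, a2
    return Q

pi = np.full((nS, nA), 1.0 / nA)                 # uniform initial policy
eta = 1.0
for _ in range(20):                              # PMD step with the KL mirror map:
    Q = td_q_eval(pi)                            # pi <- pi * exp(eta * Q), renormalized
    pi = pi * np.exp(eta * Q)
    pi /= pi.sum(axis=1, keepdims=True)
print("greedy actions:", pi.argmax(axis=1))
```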
☆ Entropy Regularization in Mean-Field Games of Optimal Stopping
We study mean-field games of optimal stopping (OS-MFGs) and introduce an entropy-regularized framework to enable learning-based solution methods. By utilizing randomized stopping times, we reformulate the OS-MFG as a mean-field game of singular stochastic controls (SC-MFG) with entropy regularization. We establish the existence of equilibria and prove their stability as the entropy parameter vanishes. Fictitious play algorithms tailored for the regularized setting are introduced, and we show their convergence under both Lasry-Lions monotonicity and supermodular assumptions on the reward functional. Our work lays the theoretical foundation for model-free learning approaches to OS-MFGs.
☆ A New Approach to the Data-Driven Output-Based LQR Problem of Continuous-Time Linear Systems
A promising method for constructing a data-driven output-feedback control law involves the construction of a model-free observer. The Linear Quadratic Regulator (LQR) optimal control policy can then be obtained by both policy-iteration (PI) and value-iteration (VI) algorithms. However, this method requires a certain unknown parameterization matrix to have full row rank and needs to solve a sequence of high-dimensional linear equations for either the PI or the VI algorithm. In this paper, we first show that this matrix has full row rank under the standard controllability assumption on the plant, thus removing the main hurdle to applying this method. We then modify the existing method by defining an ancillary system whose LQR solution leads to a data-driven output-feedback control law. With this new method, the rank condition on the unknown parameterization matrix is no longer needed. Moreover, we derive a new sequence of linear equations for either the PI or the VI algorithm whose number of unknown variables is significantly smaller than in the existing PI or VI algorithm, thus not only improving the computational efficiency but also relaxing the solvability conditions of the existing PI or VI algorithm. Further, since the existing PI or VI algorithm only applies to the case where a certain Riccati equation admits a unique positive definite solution, we prove that both the PI algorithm and the VI algorithm in the literature can be generalized to the case where the Riccati equation only admits a unique positive semi-definite solution.
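For orientation, the PI algorithm referenced here is, in the model-based limit, Kleinman's iteration: evaluate the current gain via a Lyapunov equation, then improve it via the input-weighted gradient. A sketch with a hypothetical plant is below; the paper's contribution is the data-driven, output-feedback version, which replaces the model knowledge used here with measured data.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, 1.0]])      # hypothetical unstable plant
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[0.0, 3.0]])                   # any initial stabilizing gain
for _ in range(10):                          # Kleinman policy iteration
    Acl = A - B @ K
    # policy evaluation: Acl' P + P Acl + Q + K' R K = 0
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    K = np.linalg.solve(R, B.T @ P)          # policy improvement
print("PI gain:     ", K)
print("Riccati gain:", np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R)))
```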
☆ Diagonal Linear Networks and the Lasso Regularization Path
Diagonal linear networks are neural networks with linear activation and diagonal weight matrices. Their theoretical interest is that their implicit regularization can be rigorously analyzed: from a small initialization, the training of diagonal linear networks converges to the linear predictor with minimal 1-norm among minimizers of the training loss. In this paper, we deepen this analysis by showing that the full training trajectory of diagonal linear networks is closely related to the lasso regularization path. In this connection, the training time plays the role of an inverse regularization parameter. Both rigorous results and simulations are provided to illustrate this conclusion. Under a monotonicity assumption on the lasso regularization path the connection is exact, while in the general case we show an approximate connection.
comment: 29 pages, 1 figure
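A quick numerical illustration of the claimed connection, assuming a toy sparse regression problem: a diagonal linear network $w = u \odot v$ trained by gradient descent from a small initialization, compared against a small-penalty lasso solution computed by ISTA (sizes, step sizes, and the penalty are arbitrary choices; this is not the paper's experiment).

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 30, 10
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:2] = [2.0, -1.0]      # sparse ground truth
y = X @ w_true

# Diagonal linear network: predictor w = u * v, trained by plain gradient descent
alpha = 1e-4                                        # small init => implicit 1-norm bias
u = np.full(d, alpha); v = np.full(d, alpha)
lr = 1e-3
for _ in range(100_000):
    g = X.T @ (X @ (u * v) - y) / n                 # gradient of the squared loss in w
    u, v = u - lr * g * v, v - lr * g * u           # chain rule through w = u * v
w_net = u * v

# Lasso via ISTA, with a small penalty (a late point on the regularization path)
lam = 1e-3
w = np.zeros(d)
L = np.linalg.norm(X, 2) ** 2 / n                   # Lipschitz constant of the gradient
for _ in range(50_000):
    w -= (1 / L) * X.T @ (X @ w - y) / n
    w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)   # soft-thresholding
print("network w:", np.round(w_net, 3))
print("lasso   w:", np.round(w, 3))
```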
☆ Guaranteed Robust Nonlinear MPC via Disturbance Feedback
Robots must satisfy safety-critical state and input constraints despite disturbances and model mismatch. We introduce a robust model predictive control (RMPC) formulation that is fast, scalable, and compatible with real-time implementation. Our formulation guarantees robust constraint satisfaction, input-to-state stability (ISS) and recursive feasibility. The key idea is to decompose the uncertain nonlinear system into (i) a nominal nonlinear dynamic model, (ii) disturbance-feedback controllers, and (iii) bounds on the model error. These components are optimized jointly using sequential convex programming. The resulting convex subproblems are solved efficiently using a recent disturbance-feedback MPC solver. The approach is validated across multiple dynamics, including a rocket-landing problem with steerable thrust. An open-source implementation is available at https://github.com/antoineleeman/robust-nonlinear-mpc.
comment: Code: https://github.com/antoineleeman/robust-nonlinear-mpc
☆ Bound tightening in lifted formulations: (sub)solver-dependent impact on performance in RLT-based algorithms
In this paper we explore a relevant aspect of the interplay between two core elements of global optimization algorithms for nonconvex nonlinear programming problems, which we believe has been overlooked by past literature. The first is the reformulation of the original problem, which requires the introduction of auxiliary variables with the goal of defining convex relaxations that can be solved both reliably and efficiently on a node-by-node basis. The second, bound tightening or, more generally, domain reduction, makes it possible to shrink the search space to be explored by the branch-and-bound algorithm. We are interested in the performance implications of propagating the bounds of the original variables to the auxiliary ones in the lifted space: Does this propagation reduce the overall size of the tree? Does it improve the efficiency of solving the node relaxations? To better understand the above interplay, we focus on the reformulation-linearization technique (RLT) for polynomial optimization. In this setting we obtain a theoretical result on the implicit bounds of the auxiliary variables in the RLT relaxations, which sets the stage for the ensuing computational study, whose goal is to assess to what extent the performance of an RLT-based algorithm may be affected by the decision to explicitly propagate the bounds on the original variables to the auxiliary ones.
☆ Learning When to Restart: Nonstationary Newsvendor from Uncensored to Censored Demand
We study nonstationary newsvendor problems under nonparametric demand models and general distributional measures of nonstationarity, addressing the practical challenges of unknown degree of nonstationarity and demand censoring. We propose a novel distributional-detection-and-restart framework for learning in nonstationary environments, and instantiate it through two efficient algorithms for the uncensored and censored demand settings. The algorithms are fully adaptive, requiring no prior knowledge of the degree and type of nonstationarity, and offer a flexible yet powerful approach to handling both abrupt and gradual changes in nonstationary environments. We establish a comprehensive optimality theory for our algorithms by deriving matching regret upper and lower bounds under both general and refined structural conditions with nontrivial proof techniques that are of independent interest. Numerical experiments using real-world datasets, including nurse staffing data for emergency departments and COVID-19 test demand data, showcase the algorithms' superior and robust empirical performance. While motivated by the newsvendor problem, the distributional-detection-and-restart framework applies broadly to a wide class of nonstationary stochastic optimization problems. Managerially, our framework provides a practical, easy-to-deploy, and theoretically grounded solution for decision-making under nonstationarity.
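The algorithmic details are in the paper; the following is only a toy instantiation of the distributional-detection-and-restart idea for the uncensored case, using a two-sample Kolmogorov-Smirnov statistic as the detector and a plug-in empirical quantile as the newsvendor order. The costs, thresholds, detection cadence, and change-point are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
h, b = 1.0, 3.0                          # holding / backlog cost per unit
tau = b / (b + h)                        # critical fractile: q* = F^{-1}(tau)

T = 4000                                 # abrupt mean shift halfway through
demand = np.concatenate([rng.normal(50, 5, T // 2), rng.normal(80, 5, T // 2)])

def ks_stat(a, c):                       # two-sample Kolmogorov-Smirnov distance
    grid = np.sort(np.concatenate([a, c]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fc = np.searchsorted(np.sort(c), grid, side="right") / len(c)
    return np.abs(Fa - Fc).max()

hist, cost = [], 0.0
for t in range(T):
    q = np.quantile(hist, tau) if len(hist) >= 10 else 60.0   # plug-in order quantity
    d = demand[t]
    cost += h * max(q - d, 0.0) + b * max(d - q, 0.0)
    hist.append(d)
    half = len(hist) // 2                # detection: compare old vs recent half
    if t % 50 == 0 and half >= 50 and \
            ks_stat(np.array(hist[:half]), np.array(hist[half:])) > 0.2:
        hist = hist[half:]               # restart: discard pre-change data
print("average cost per period:", round(cost / T, 2))
```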
♻ ☆ Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond
This paper aims to develop and provide a rigorous treatment of the problem of entropy regularized fine-tuning in the context of continuous-time diffusion models, which was recently proposed by Uehara et al. (arXiv:2402.15194, 2024). The idea is to use stochastic control for sample generation, where the entropy regularizer is introduced to mitigate reward collapse. We also show how the analysis can be extended to fine-tuning with a general $f$-divergence regularizer. Numerical experiments on a large-scale text-to-image model (Stable Diffusion v1.5) are conducted to validate our approach.
comment: 17 pages
♻ ☆ Energy Management for Renewable-Colocated Artificial Intelligence Data Centers
We develop an energy management system (EMS) for artificial intelligence (AI) data centers with colocated renewable generation. Under a cost-minimizing framework, the EMS of renewable-colocated data center (RCDC) co-optimizes AI workload scheduling, on-site renewable utilization, and electricity market participation. Within both wholesale and retail market participation models, the economic benefit of the RCDC operation is maximized. Empirical evaluations using real-world traces of electricity prices, data center power consumption, and renewable generation demonstrate significant electricity cost reduction from renewable and AI data center colocations.
♻ ☆ Introducing the method of ellipcenters, a new first order technique for unconstrained optimization
In this paper, we introduce the Method of Ellipcenters (ME) for unconstrained minimization. At the cost of two gradients per iteration and a line search, we compute the next iterate as the center of an elliptical interpolation. The idea behind the ellipse built in each step is to emulate the level curve of the objective function restricted to a suitable two-dimensional affine space, which is determined by the current iterate and two appropriate gradient vectors. We present the method for general unconstrained minimization and carry out a convergence analysis for the case where the objective function is quadratic. In this context, ME enjoys linear convergence with a rate at least as good as the linear rate of the steepest descent (gradient) method with optimal step. In our experiments, however, ME was much faster than the gradient method with optimal step size. Moreover, ME seems highly competitive in comparison to several well-established algorithms, including Nesterov's accelerated gradient, Barzilai-Borwein, and conjugate gradient. The efficiency in terms of both time and number of iterations is even more pronounced for ill-conditioned problems. A theoretical feature that might explain this is that ME coincides with Newton's method for quadratic programs in two-dimensional Euclidean spaces, solving them in a single step. In our numerical tests, convergence in a single iteration was also observed for much larger problem sizes.
♻ ☆ Operator Splitting Covariance Steering for Safe Stochastic Nonlinear Control
This paper presents a novel algorithm for solving distribution steering problems featuring nonlinear dynamics and chance constraints. Covariance steering (CS) is an emerging methodology in stochastic optimal control that poses constraints on the first two moments of the state distribution, thereby being more tractable than full distributional control. Nevertheless, a significant limitation of current approaches for solving nonlinear CS problems, such as sequential convex programming (SCP), is that they often generate infeasible or poor results due to the large number of constraints. In this paper, we address these challenges by proposing an operator splitting CS approach that temporarily decouples the full problem into subproblems that can be solved in parallel. This relaxation does not require intermediate iterates to satisfy all constraints simultaneously prior to convergence, which enhances exploration and improves feasibility in such non-convex settings. Simulation results across a variety of robotics applications verify the ability of the proposed method to find better solutions, even under stricter safety constraints, than standard SCP. Finally, the applicability of our framework to real systems is confirmed through hardware demonstrations.
♻ ☆ DRAG: Distributed Resource Allocation Games over Multiple Interacting Coalitions
Although many distributed resource allocation (DRA) algorithms have been reported in the literature, it is still unknown how to allocate resources optimally over multiple interacting coalitions. One major challenge in solving such a problem is that each coalition's resource-allocation decision affects the benefit of the others, which may lead to conflicts of interest among the coalitions. In this context, a new type of multi-coalition game is formulated in this paper, termed the resource allocation game, where each coalition contains multiple agents that cooperate to maximize the coalition-level benefit subject to a resource constraint described by a coupled equality. Inspired by techniques such as variable replacement, gradient tracking, and leader-following consensus, two new kinds of DRA algorithms are developed, respectively, for the scenario where the individual benefit of each agent explicitly depends on the states of itself and some agents in other coalitions, and for the scenario where it depends on the states of all game participants. It is shown that the proposed algorithms converge linearly to the Nash equilibrium (NE) of the multi-coalition game while satisfying the resource constraint throughout the NE-seeking process. Finally, the validity of the proposed allocation algorithms is verified by numerical simulations.
♻ ☆ On the Linear Speedup of the Push-Pull Method for Decentralized Optimization over Digraphs
The linear speedup property is essential for demonstrating the advantage of distributed algorithms over their single-node counterparts. In this paper, we study the stochastic Push-Pull method, a widely adopted decentralized optimization algorithm over directed graphs (digraphs). Unlike methods that rely solely on row-stochastic or column-stochastic mixing matrices, Push-Pull avoids nonlinear correction and has shown superior empirical performance across a variety of settings. However, its theoretical analysis remains challenging, and the linear speedup property has not been established in general, revealing a significant gap between empirical success and limited theoretical understanding. To bridge this gap, we propose a novel analysis framework and prove that Push-Pull achieves linear speedup over arbitrary strongly connected digraphs. Our results provide a comprehensive theoretical understanding of stochastic Push-Pull, aligning its theory with empirical performance. Code: https://github.com/pkumelon/PushPull.
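For reference, the deterministic Push-Pull recursion combines a row-stochastic matrix $R$ for decision averaging with a column-stochastic matrix $C$ for gradient tracking: $x^{k+1} = R(x^k - \alpha y^k)$, $y^{k+1} = C y^k + \nabla F(x^{k+1}) - \nabla F(x^k)$. A sketch on scalar quadratics over a directed ring follows (weights, step size, and objectives are illustrative; the paper analyzes the stochastic-gradient case):

```python
import numpy as np

n = 4                                    # directed ring: node i receives from i-1
R = np.zeros((n, n)); C = np.zeros((n, n))
for i in range(n):
    R[i, i] = R[i, (i - 1) % n] = 0.5    # row-stochastic ("pull" for the decisions)
    C[i, i] = C[(i + 1) % n, i] = 0.5    # column-stochastic ("push" for the gradients)

q = np.array([1.0, 2.0, 3.0, 4.0])       # local objectives f_i(x) = q_i/2 * (x - b_i)^2
b = np.array([1.0, -2.0, 0.5, 3.0])
grad = lambda x: q * (x - b)             # stacked local gradients

alpha = 0.1
x = np.zeros(n)
y = grad(x)                              # gradient tracker, initialized at grad f(x^0)
for _ in range(300):
    x_new = R @ (x - alpha * y)
    y = C @ y + grad(x_new) - grad(x)
    x = x_new
print("agents:", np.round(x, 4), " optimum:", (q * b).sum() / q.sum())
```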
♻ ☆ A strongly convergent inertial inexact proximal-point algorithm for monotone inclusions with applications to variational inequalities
We propose an inertial variant of the strongly convergent inexact proximal-point (PP) method of Solodov and Svaiter (2000) for monotone inclusions. We prove strong convergence of our main algorithm under less restrictive assumptions on the inertial parameters than in previous analyses of inertial PP-type algorithms, which makes our method of interest even in finite-dimensional settings. We also perform an iteration-complexity analysis and apply our main algorithm to variational inequalities for monotone operators, obtaining strongly convergent (inertial) variants of Korpelevich's extragradient, forward-backward, and Tseng's modified forward-backward methods. Preliminary numerical experiments indicate that our strongly convergent variant of Tseng's modified forward-backward method performs well on certain matrix game problems.
comment: 30 pages, 1 figure
♻ ☆ Impossibility Results of Card-Based Protocols via Mathematical Optimization
This paper introduces mathematical optimization as a new method for proving impossibility results in the field of card-based cryptography. While previous impossibility proofs were often limited to cases involving a small number of cards, this new approach establishes results that hold for a large number of cards. The research focuses on single-cut full-open (SCFO) protocols, which consist of performing one random cut and then revealing all cards. The main contribution is that for any three-variable Boolean function, no new SCFO protocols exist beyond those already known, under the condition that all additional cards have the same color. The significance of this work is that it provides a new framework for proving impossibility results and delivers a proof that is valid for any number of cards, as long as all additional cards have the same color.
♻ ☆ On some applications of the Boundary Control method to spectral estimation and inverse problems
We consider applications of the Boundary Control (BC) method to generalized spectral estimation problems and to inverse source problems. We derive the equations of the BC method for these problems and show that their solvability crucially depends on the controllability properties of the corresponding dynamical system and on the properties of the corresponding families of exponentials.
♻ ☆ Lyapunov-like Stability Inequality with an Asymmetric Matrix and Application to Suboptimal LQ Control Design
The Lyapunov inequality is an indispensable tool for stability analysis in linear control theory. It provides a necessary and sufficient condition for the stability of an autonomous linear time-invariant system in terms of the existence of a symmetric positive-definite Lyapunov matrix. This work proposes a new variant of this inequality in which the constituent Lyapunov matrix is allowed to be asymmetric. After analysing the properties of the proposed inequality for a class of matrices, we derive new results for the stabilisation of linear systems. Subsequently, we utilize these results to obtain sufficient conditions for the suboptimal linear quadratic control design problem, where, in addition to having an asymmetric Lyapunov matrix that serves as a design matrix, we characterize the cost associated with the computed stabilizing suboptimal control laws by deriving an upper bound on the cost in terms of the initial conditions of the system. We demonstrate the applicability of the proposed results using two numerical examples -- one on suboptimal control design for a linear time-invariant system, and another on consensus (state-agreement) protocol design for a multi-agent system, wherein we see how the asymmetry of the design matrix emerges as an inherent requirement of the problem.
♻ ☆ Error Bound Analysis for the Regularized Loss of Deep Linear Neural Networks
The optimization foundations of deep linear networks have recently received significant attention. However, due to their inherent non-convexity and hierarchical structure, analyzing the loss functions of deep linear networks remains a challenging task. In this work, we study the local geometric landscape of the regularized squared loss of deep linear networks around each critical point. Specifically, we derive a closed-form characterization of the critical point set and establish an error bound for the regularized loss under mild conditions on network width and regularization parameters. Notably, this error bound quantifies the distance from a point to the critical point set in terms of the current gradient norm, which can be used to derive linear convergence of first-order methods. To support our theoretical findings, we conduct numerical experiments and demonstrate that gradient descent converges linearly to a critical point when optimizing the regularized loss of deep linear networks.
comment: 33 pages, 2 figures
♻ ☆ On exactness of SDP relaxation for the maximum cut problem
Semidefinite programming (SDP) provides a powerful relaxation of the maximum cut problem. For a graph $\mathcal{G}$ with rational weights, the decision problem of whether this relaxation is exact has been known to be $NP$-complete since Delorme-Poljak (1993), but its complexity was unresolved for unweighted graphs. In this work, we extend the $NP$-completeness result to unweighted graphs. We then investigate and present several classes of graphs, which we call exact graphs, for which the semidefinite relaxation of the maximum cut problem is exact. For each exact graph class, we determine the optimal objective value by explicitly constructing a maximum cut, and we prove uniqueness of the exact solution for two of these classes. We complement these findings by showing that lexicographic products preserve exactness whenever the first factor is exact, with rank-$r$ solutions extending accordingly, and that a graph is exact if and only if all of its splits are exact, with splitting also preserving the rank of optimal solutions. Addressing two open problems posed by Mirka and Williamson (2024), we demonstrate via counterexamples that exact semidefinite relaxations can admit higher-rank optimal solutions beyond the convex hull of rank-1 reference solutions. We then explain this phenomenon by proving the existence of rank-$(n-1)$ optimal solutions for particular classes of graphs. Finally, we give necessary and sufficient conditions under which the solution to the exact relaxation is unique for particular classes of complete graphs and complete $k$-partite graphs.
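A compact way to probe (non-)exactness numerically, assuming cvxpy with an SDP-capable solver is installed: solve the max cut SDP on the 5-cycle, check the rank of the optimal matrix, and round with a random hyperplane. The 5-cycle is a standard non-exact instance (SDP value about 4.52 versus a maximum cut of 4); this snippet is illustrative and not taken from the paper.

```python
import numpy as np
import cvxpy as cp   # assumes cvxpy with an SDP-capable solver is installed

n = 5                                    # 5-cycle, a standard non-exact instance
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - X))),
                  [X >> 0, cp.diag(X) == 1])
prob.solve()

eigvals = np.linalg.eigvalsh(X.value)
print("SDP value:", round(prob.value, 4))               # about 4.5225 > 4 = max cut
print("numerical rank:", int((eigvals > 1e-6).sum()))   # rank > 1 => not exact

# Goemans-Williamson hyperplane rounding to extract a feasible cut
V = np.linalg.cholesky(X.value + 1e-6 * np.eye(n))      # rows are (near-)unit vectors v_i
signs = np.sign(V @ np.random.default_rng(0).standard_normal(n))
print("rounded cut value:", 0.25 * np.sum(W * (1 - np.outer(signs, signs))))
```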
♻ ☆ Consensus-Based Optimization Beyond Finite-Time Analysis
We analyze a zeroth-order particle algorithm for the global optimization of a non-convex function, focusing on a variant of Consensus-Based Optimization (CBO) with small but fixed noise intensity. Unlike most previous studies, which are restricted to finite horizons, we investigate its long-time behavior with fixed parameters. In the mean-field limit, a quantitative Laplace principle shows exponential convergence to a neighborhood of the minimizer $x^*$. For finitely many particles, a block-wise analysis yields explicit error bounds: individual particles achieve long-time consistency near $x^*$, and the global best particle converges to $x^*$. The proof technique combines a quantitative Laplace principle with block-wise control of Wasserstein distances, avoiding the exponential blow-up typical of Gr{\"o}nwall-based estimates.
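For concreteness, a minimal numpy sketch of (anisotropic) CBO dynamics with small, fixed noise intensity on the Rastrigin function: the Gibbs-weighted consensus point implements the Laplace principle that the analysis quantifies. All parameters below are illustrative choices, not the paper's.

```python
import numpy as np

def f(x):                                        # Rastrigin, global minimum at 0
    return 10 * x.shape[1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=1)

rng = np.random.default_rng(4)
N, d = 200, 2
x = rng.uniform(-4, 4, size=(N, d))              # particle ensemble
lam, sigma, beta, dt = 1.0, 0.7, 30.0, 0.05      # fixed (not vanishing) noise level

for _ in range(500):
    fx = f(x)
    w = np.exp(-beta * (fx - fx.min()))          # Gibbs weights (Laplace principle)
    x_bar = (w[:, None] * x).sum(axis=0) / w.sum()   # weighted consensus point
    noise = rng.standard_normal((N, d))
    x = x - lam * (x - x_bar) * dt + sigma * (x - x_bar) * np.sqrt(dt) * noise
print("consensus point:", np.round(x_bar, 3))    # near the global minimizer 0
```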
♻ ☆ A generalized global Hartman-Grobman theorem for asymptotically stable semiflows
Recently, Kvalheim and Sontag provided a generalized global Hartman-Grobman theorem for equilibria under asymptotically stable continuous vector fields. By leveraging topological properties of Lyapunov functions, their theorem works without assuming hyperbolicity. We extend their theorem to a class of possibly discontinuous vector fields, in particular, to vector fields generating asymptotically stable semiflows.
comment: Technical note related to the update of 10.48550/arXiv.2411.03277. Newest version: 6 pages, 5 figures, the only change is the presentation. Under review
Systems and Control 41
☆ Compositional System Dynamics: The Higher Mathematics Underlying System Dynamics Diagrams & Practice
This work establishes a robust mathematical foundation for compositional System Dynamics modeling, leveraging category theory to formalize and enhance the representation, analysis, and composition of system models. Here, System Dynamics diagrams, such as stock & flow diagrams, system structure diagrams, and causal loop diagrams, are formulated as categorical constructs, enabling scalable, transparent, and systematic reasoning. By encoding these diagrams as data using attributed C-sets and utilizing advanced categorical tools like structured cospans, pushouts, pullbacks, and functor mappings, the framework supports modular composition, stratification, and seamless mapping between syntax and semantics. The approach underwrites traditional practice with firm mathematical structure; it facilitates the identification of certain forms of pathways and feedback loops, the detection of simple patterns within complex diagrams, the discovery of common structure between diagrams, and structure-preserving mappings between diverse diagram types. Additionally, this framework supports alternative semantics, such as stochastic transition dynamics, extending beyond traditional ordinary differential equation (ODE) representations. Applications in compositional modeling, modularity, and team-based collaboration demonstrate the practical advantages of this advanced framework. Future directions include integrating dimensional annotations, supporting hybrid and agent-based modeling paradigms, and expanding the framework's applicability to global and local temporal reasoning through temporal sheaves. By revealing and formalizing the hidden mathematical structure of System Dynamics diagrams, this work empowers practitioners to tackle complex systems with clarity, scalability, and rigor.
☆ A Counterfactual Reasoning Framework for Fault Diagnosis in Robot Perception Systems
Perception systems provide a rich understanding of the environment for autonomous systems, shaping decisions in all downstream modules. Hence, accurate detection and isolation of faults in perception systems is important. Faults in perception systems pose particular challenges: faults are often tied to the perceptual context of the environment, and errors in their multi-stage pipelines can propagate across modules. To address this, we adopt a counterfactual reasoning approach to propose a framework for fault detection and isolation (FDI) in perception systems. As opposed to relying on physical redundancy (i.e., having extra sensors), our approach utilizes analytical redundancy with counterfactual reasoning to construct perception reliability tests as causal outcomes influenced by system states and fault scenarios. Counterfactual reasoning generates reliability test results under hypothesized faults to update the belief over fault hypotheses. We derive both passive and active FDI methods. While the passive FDI can be achieved by belief updates, the active FDI approach is defined as a causal bandit problem, where we utilize Monte Carlo Tree Search (MCTS) with upper confidence bound (UCB) to find control inputs that maximize a detection and isolation metric, designated as Effective Information (EI). The mentioned metric quantifies the informativeness of control inputs for FDI. We demonstrate the approach in a robot exploration scenario, where a space robot performing vision-based navigation actively adjusts its attitude to increase EI and correctly isolate faults caused by sensor damage, dynamic scenes, and perceptual degradation.
☆ Policy Gradient with Self-Attention for Model-Free Distributed Nonlinear Multi-Agent Games
Multi-agent games in dynamic nonlinear settings are challenging due to the time-varying interactions among the agents and the non-stationarity of the (potential) Nash equilibria. In this paper we consider model-free games, where agent transitions and costs are observed without knowledge of the transition and cost functions that generate them. We propose a policy gradient approach to learn distributed policies that follow the communication structure in multi-team games, with multiple agents per team. Our formulation is inspired by the structure of distributed policies in linear quadratic games, which take the form of time-varying linear feedback gains. In the nonlinear case, we model the policies as nonlinear feedback gains, parameterized by self-attention layers to account for the time-varying multi-agent communication topology. We demonstrate that our distributed policy gradient approach achieves strong performance in several settings, including distributed linear and nonlinear regulation, and simulated and real multi-robot pursuit-and-evasion games.
☆ Optimal Service Mode Assignment in a Simple Computation Offloading System: Extended Version
We consider a simple computation offloading model where jobs can either be fully processed in the cloud or be partially processed at a local server before being sent to the cloud to complete processing. Our goal is to design a policy for assigning jobs to service modes, i.e., full offloading or partial offloading, based on the state of the system, in order to minimize delay in the system. We show that when the cloud server is idle, the optimal policy is to assign the next job in the system queue to the cloud for processing. However, when the cloud server is busy, we show that, under mild assumptions, the optimal policy is of a threshold type, that sends the next job in the system queue to the local server if the queue exceeds a certain threshold. Finally, we demonstrate this policy structure through simulations.
☆ On the Dynamics of Acceleration in First order Gradient Methods
Ever since the original algorithm by Nesterov (1983), the true nature of the acceleration phenomenon has remained elusive, with various interpretations of why the method is actually faster. Diagnosing the algorithm through the lens of Ordinary Differential Equations (ODEs) and the corresponding dynamical-system formulations that explain the underlying dynamics has a rich history. In the literature, the ODEs that explain algorithms are typically derived by considering the limiting case of the algorithm maps themselves; that is, an ODE formulation follows the development of an algorithm. This obfuscates the underlying higher-order principles and thus provides little evidence of the working of the algorithm. Such has been the case with the Nesterov algorithm and the various analogies used to describe the acceleration phenomenon, viz., the momentum associated with a heavy ball rolling down a slope, Hessian damping, etc. The main focus of our work is to trace the genesis of the Nesterov algorithm from the viewpoint of dynamical systems, demystifying the mathematics behind the algorithm. Instead of reverse engineering ODEs from discrete algorithms, this work explores tools from the recently developed control paradigm known as the Passivity and Immersion approach and from Geometric Singular Perturbation theory, which are applied to arrive at a dynamical system that explains and models the acceleration phenomenon. This perspective yields insights into the various terms and the sequence of steps used in Nesterov's accelerated algorithm for the smooth strongly convex and convex cases. The framework also extends to the acceleration achieved by the triple momentum method and provides a justification for the non-convergence to the optimal solution of the Heavy-Ball method.
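To make the algorithm-ODE correspondence concrete: below, Nesterov's accelerated gradient for the convex case runs next to a forward-Euler discretization of its limiting ODE $\ddot{x} + (3/t)\dot{x} + \nabla f(x) = 0$ (Su-Boyd-Candes) on an ill-conditioned quadratic. Note that this is the reverse-engineering route the paper argues against; step sizes and the test problem are illustrative.

```python
import numpy as np

A = np.diag([1.0, 100.0])                 # ill-conditioned quadratic f(x) = x'Ax/2
grad = lambda x: A @ x
L = 100.0                                 # smoothness constant
x0 = np.array([1.0, 1.0])

x = x0.copy()                             # plain gradient descent, for reference
for _ in range(200):
    x -= (1 / L) * grad(x)
gd = x

x, x_prev = x0.copy(), x0.copy()          # Nesterov's accelerated gradient (convex case)
for k in range(1, 201):
    ymom = x + (k - 1) / (k + 2) * (x - x_prev)
    x_prev, x = x, ymom - (1 / L) * grad(ymom)
nag = x

t, z, v = 0.1, x0.copy(), np.zeros(2)     # Euler scheme for x'' + (3/t) x' + grad f(x) = 0
h = 0.1                                   # ODE time step, h ~ sqrt(1/L)
for _ in range(200):
    v -= h * ((3.0 / t) * v + grad(z))
    z += h * v
    t += h
print("GD: ", gd)
print("NAG:", nag)
print("ODE:", z)
```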
☆ Fully Distributed State Estimation for Multi-agent Systems and its Application in Cooperative Localization
In this paper, we investigate the distributed state estimation problem for a continuous-time linear multi-agent system (MAS) composed of $\mathit{m}$ agents and monitored by the agents themselves. To address this problem, we propose a distributed observer that enables each agent to reconstruct the state of the MAS. The main idea is to let each agent $\mathit{i}$ recover the state of agent $\mathit{j}$ by using leader-follower consensus rules to track agent $\mathit{j}$'s state estimate, which is generated by agent $\mathit{j}$ itself using a Luenberger-like estimation rule. Under the assumptions of node-level observability and topological ordering consistency, we show that the estimation error dynamics are stabilizable if and only if the communication graph is strongly connected. Moreover, we discuss the fully distributed design of the proposed observer, assuming that the agents only know basic MAS configuration information, such as the homogeneity and the maximum number of allowable agents. This design ensures that the proposed observer functions correctly when agents are added or removed. Building on this, we consider cooperative localization as a distributed estimation problem and develop two fully distributed localization algorithms that allow agents to track their own and other agents' positions (and velocities) within the MAS. Finally, we conduct simulations to demonstrate the effectiveness of our proposed theoretical results.
☆ HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba
End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. A concise six-term reward balances contact quality, swing smoothness, foot placement, posture, and body stability while implicitly promoting energy saving. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.
comment: 10 pages
☆ Prepare Before You Act: Learning From Humans to Rearrange Initial States
Imitation learning (IL) has proven effective across a wide range of manipulation tasks. However, IL policies often struggle when faced with out-of-distribution observations; for instance, when the target object is in a previously unseen position or occluded by other objects. In these cases, extensive demonstrations are needed for current IL methods to reach robust and generalizable behaviors. But when humans are faced with these sorts of atypical initial states, we often rearrange the environment for more favorable task execution. For example, a person might rotate a coffee cup so that it is easier to grasp the handle, or push a box out of the way so they can directly grasp their target object. In this work we seek to equip robot learners with the same capability: enabling robots to prepare the environment before executing their given policy. We propose ReSET, an algorithm that takes initial states -- which are outside the policy's distribution -- and autonomously modifies object poses so that the restructured scene is similar to training data. Theoretically, we show that this two step process (rearranging the environment before rolling out the given policy) reduces the generalization gap. Practically, our ReSET algorithm combines action-agnostic human videos with task-agnostic teleoperation data to i) decide when to modify the scene, ii) predict what simplifying actions a human would take, and iii) map those predictions into robot action primitives. Comparisons with diffusion policies, VLAs, and other baselines show that using ReSET to prepare the environment enables more robust task execution with equal amounts of total training data. See videos at our project website: https://reset2025paper.github.io/
☆ Guided Multi-Fidelity Bayesian Optimization for Data-driven Controller Tuning with Digital Twins
We propose a \textit{guided multi-fidelity Bayesian optimization} framework for data-efficient controller tuning that integrates corrected digital twin (DT) simulations with real-world measurements. The method targets closed-loop systems with limited-fidelity simulations or inexpensive approximations. To address model mismatch, we build a multi-fidelity surrogate with a learned correction model that refines DT estimates from real data. An adaptive cost-aware acquisition function balances expected improvement, fidelity, and sampling cost. Our method ensures adaptability as new measurements arrive. The accuracy of DTs is re-estimated, dynamically adapting both cross-source correlations and the acquisition function. This ensures that accurate DTs are used more frequently, while inaccurate DTs are appropriately downweighted. Experiments on robotic drive hardware and supporting numerical studies demonstrate that our method enhances tuning efficiency compared to standard Bayesian optimization (BO) and multi-fidelity methods.
comment: This preprint is intended for submission to IEEE Robotics and Automation Letters (RA-L)
☆ Developing a Dynamic Mobility Model for Backcasting Applications: A Case Study with Shared Autonomous Vehicles
This study proposes the application of a backcasting approach to a mobility model with the aim of defining an optimal decarbonization roadmap. The selected decision variable is the introduction of a fleet of shared autonomous vehicles. The mobility model developed is composed of six interconnected sub-models. After presenting each of these models in detail, a method is introduced to analyze the direct and indirect effects of the measure, and a necessary condition for the occurrence of an undesirable effect is identified. Simulations in both forecasting and backcasting frameworks are then conducted, demonstrating the relevance of backcasting: it enables a 10% reduction in operator costs compared to forecasting results, while maintaining the same level of emissions.
☆ Lipschitz-Based Robustness Certification for Recurrent Neural Networks via Convex Relaxation
Robustness certification against bounded input noise or adversarial perturbations is increasingly important for deploying recurrent neural networks (RNNs) in safety-critical control applications. To address this challenge, we present RNN-SDP, a relaxation-based method that models the RNN's layer interactions as a convex problem and computes a certified upper bound on the Lipschitz constant via semidefinite programming (SDP). We also explore an extension that incorporates known input constraints to further tighten the resulting Lipschitz bounds. RNN-SDP is evaluated on a synthetic multi-tank system, with upper bounds compared to empirical estimates. While incorporating input constraints yields only modest improvements, the general method produces reasonably tight and certifiable bounds, even as sequence length increases. The results also underscore the often underestimated impact of initialization errors, an important consideration for applications where models are frequently re-initialized, such as model predictive control (MPC).
comment: 10 pages, 3 figures
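The SDP certificate itself needs a semidefinite solver, but the crude baseline such methods tighten is easy to state: since tanh is 1-Lipschitz, the sensitivity of the final state to the input at step $t$ is bounded by $\|W\|_2^{T-1-t}\|U\|_2$. A sketch of this norm-product bound against an empirical directional sensitivity follows; the RNN, its sizes, and weights are invented for illustration, and RNN-SDP would produce a tighter certified constant.

```python
import numpy as np

rng = np.random.default_rng(5)
nx, nu, T = 8, 3, 20
W = 0.4 * rng.standard_normal((nx, nx)) / np.sqrt(nx)   # hypothetical recurrent weights
U = rng.standard_normal((nx, nu)) / np.sqrt(nu)          # hypothetical input weights

# tanh is 1-Lipschitz, so ||dx_T/du_t||_2 <= ||W||_2^(T-1-t) * ||U||_2
nW, nU = np.linalg.norm(W, 2), np.linalg.norm(U, 2)
per_step = np.array([nW ** (T - 1 - t) * nU for t in range(T)])
lip_bound = np.sqrt(np.sum(per_step**2))   # bound w.r.t. the l2 norm of stacked inputs
print(f"||W||_2 = {nW:.3f}, crude Lipschitz bound = {lip_bound:.3f}")

def rollout(us):                           # x_{t+1} = tanh(W x_t + U u_t), x_0 = 0
    x = np.zeros(nx)
    for u in us:
        x = np.tanh(W @ x + U @ u)
    return x

us = rng.standard_normal((T, nu))          # empirical check along a random direction
d = rng.standard_normal((T, nu)); d /= np.linalg.norm(d)
emp = np.linalg.norm(rollout(us + 1e-5 * d) - rollout(us)) / 1e-5
print(f"empirical directional sensitivity = {emp:.3f} (<= bound)")
```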
☆ On Fast Attitude Filtering Based on Matrix Fisher Distribution with Stability Guarantee
This paper addresses two interrelated problems: the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal, from a theoretical viewpoint, two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions. Benefiting from this new understanding of the filtering mechanism associated with MFDs, two closed-form filters with MFDs are then proposed. These filters avoid the burdensome computations of previous MFD-based filters by introducing linearized error systems with right-invariant errors while retaining the two advantageous properties. Moreover, we leverage the two properties and the closed-form filtering iteration to prove almost-global exponential stability of the proposed filter with right-invariant error for single-axis rotation, which, to our knowledge, has not been achieved by existing directional-statistics-based filters. Numerical simulations demonstrate that the proposed filters are significantly more accurate than the classic invariant Kalman filter. Moreover, they are as accurate as recent MFD-based Bayesian filters in challenging circumstances with large initial errors and measurement uncertainty, while consuming far less computation time (about 1/5 to 1/100 of that of previous MFD-based attitude filters).
☆ An Optimal Control Interpretation of Augmented Distributed Optimization Algorithms
Distributed optimization algorithms are used in a wide variety of problems involving complex network systems, where the goal is for a set of agents in the network to solve a network-wide optimization problem via distributed update rules. In many applications, such as communication networks and power systems, the transient performance of the algorithms is just as critical as convergence, as the algorithms are linked to physical processes that must behave well. Primal-dual algorithms have a long history in distributed optimization, with augmented Lagrangian methods leading to important classes of widely used algorithms, which have been observed in simulations to improve transient performance. Here we show that such algorithms can be seen as the optimal solution to an appropriately formulated optimal control problem, i.e., a cost functional associated with the transient behavior of the algorithm is minimized, penalizing deviations from optimality during algorithm transients. This is shown for broad classes of algorithm dynamics, including the more involved setting where inequality constraints are present. The results presented improve our understanding of the performance of distributed optimization algorithms and can be used as a basis for improved formulations.
☆ EigenSafe: A Spectral Framework for Learning-Based Stochastic Safety Filtering
We present EigenSafe, an operator-theoretic framework for learning-enabled safety-critical control for stochastic systems. In many robotic systems where dynamics are best modeled as stochastic systems due to factors such as sensing noise and environmental disturbances, it is challenging for conventional methods such as Hamilton-Jacobi reachability and control barrier functions to provide a holistic measure of safety. We derive a linear operator governing the dynamic programming principle for safety probability, and find that its dominant eigenpair provides information about safety for both individual states and the overall closed-loop system. The proposed learning framework, called EigenSafe, jointly learns this dominant eigenpair and a safe backup policy in an offline manner. The learned eigenfunction is then used to construct a safety filter that detects potentially unsafe situations and falls back to the backup policy. The framework is validated in three simulated stochastic safety-critical control tasks.
comment: Workshop on Safe and Robust Robot Learning for Operation in the Real World (SAFE-ROL) at CoRL 2025
☆ Existence and Synthesis of Multi-Resolution Approximate Bisimulations for Continuous-State Dynamical Systems
We present a fully automatic framework for synthesising compact, finite-state deterministic abstractions of deterministic, continuous-state autonomous systems under locally specified resolution requirements. Our approach builds on multi-resolution approximate bisimulations, a generalisation of classical $\epsilon$-approximate bisimulations, that support state-dependent error bounds and subsumes both variable- and uniform-resolution relations. We show that some systems admit multi-resolution bisimulations but no $\epsilon$-approximate bisimulation. We prove the existence of multi-resolution approximately bisimilar abstractions for all incrementally uniformly bounded ($\delta$-UB) systems, thereby broadening the applicability of symbolic verification to a larger class of dynamics; as a trivial special case, this result also covers incrementally globally asymptotically stable ($\delta$-GAS) systems. The Multi-resolution Abstraction Synthesis Problem (MRASP) is solved via a scalable Counterexample-Guided Inductive Synthesis (CEGIS) loop, combining mesh refinement with counterexample-driven refinement. This ensures soundness for all $\delta$-UB systems, and ensures termination in certain special cases. Experiments on linear and nonlinear benchmarks, including non-$\delta$-GAS and non-differentiable cases, demonstrate that our algorithm yields abstractions up to 50\% smaller than Lyapunov-based grids while enforcing tighter, location-dependent error guarantees.
☆ RSU-Assisted Resource Allocation for Collaborative Perception
As a pivotal technology for autonomous driving, collaborative perception enables vehicular agents to exchange perceptual data through vehicle-to-everything (V2X) communications, thereby enhancing perception accuracy of all collaborators. However, existing collaborative perception frameworks often assume ample communication resources, which is usually impractical in real-world vehicular networks. To address this challenge, this paper investigates the problem of communication resource allocation for collaborative perception and proposes RACooper, a novel RSU-assisted resource allocation framework that maximizes perception accuracy under constrained communication resources. RACooper leverages a hierarchical reinforcement learning model to dynamically allocate communication resources while accounting for real-time sensing data and channel dynamics induced by vehicular mobility. By jointly optimizing spatial confidence metrics and channel state information, our approach ensures efficient feature transmission, enhancing the effectiveness of collaborative perception. Simulation results demonstrate that compared to conventional baseline algorithms, RACooper achieves significant improvements in perception accuracy, especially under bandwidth-constrained scenarios.
☆ Holistic Grid-Forming Control for HVDC-Connected Offshore Wind Power Plants to Provide Frequency Response
HVDC-connected offshore wind power plants (OWPPs) are expected to provide inertial response and frequency containment reserve (FCR) to help address the frequency control challenges caused by the growing penetration of power electronics in power systems. While early solutions were dominated by communication-based and grid-following (GFL) control, recent efforts have shifted towards incorporating communication-free and grid-forming (GFM) control into HVDC-OWPP systems to enhance their frequency response capability. This paper proposes a holistic GFM control based on dual-port GFM control to improve coordination across the entire AC-DC-AC dynamics. A frequency response model of a typical HVDC-OWPP system is developed for GFM control design. Dual-port GFM control and virtual synchronous generator control are then implemented on the HVDC system and the OWPP of the typical system, respectively, revealing the asynchronism of onshore and offshore frequencies. Next, holistic GFM control is proposed to improve synchronization and DC voltage regulation. Finally, simulations on the delivery of FCR and inertial response verify the feasibility and effectiveness of the proposed control.
comment: This is the author's version of the paper accepted for presentation at IEEE PowerTech 2025. The final version will be available via IEEE Xplore
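The virtual-synchronous-generator building block mentioned above reduces to swing-equation dynamics that fit in a few lines. The sketch below is our minimal per-unit illustration of that mechanism, not the paper's holistic dual-port controller; the inertia and damping constants are invented.

```python
# Minimal VSG sketch: per-unit swing dynamics M*dw/dt = P_ref - P_e - D*(w - 1)
# give a converter synthetic inertia (M) and frequency droop/damping (D).
M, D = 4.0, 25.0            # invented per-unit constants
P_ref, w, dt = 0.8, 1.0, 1e-3
for k in range(5000):
    P_e = 0.8 if k < 1000 else 1.0          # load step at t = 1 s
    w += dt / M * (P_ref - P_e - D * (w - 1.0))
print(f"post-step frequency deviation: {50 * (w - 1.0):+.3f} Hz")  # 50 Hz base
```

At steady state the deviation settles at (P_ref - P_e)/D in per-unit, i.e. -0.4 Hz here, which is the droop behavior the FCR simulations in the paper exercise at system scale.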
☆ On continuous-time sparse identification of nonlinear polynomial systems
This paper leverages recent advances in high-order derivative reconstruction from noisy time series and in sparse multivariate polynomial identification to improve the process of parsimoniously identifying, from a small amount of data, unknown single-input/single-output nonlinear dynamics of relative degree up to 4. The methodology is illustrated on an electronic-throttle-control automotive system.
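The two ingredients (derivative reconstruction, then sparse polynomial regression) can be sketched on a toy scalar ODE. The snippet below is our illustration of the general pipeline under invented data, using a crude finite-difference derivative and sequentially thresholded least squares (as in SINDy-style methods), not the paper's specific estimators.

```python
import numpy as np

# Sketch: reconstruct xdot from a noisy series, then sparsely regress it onto
# a polynomial library via sequentially thresholded least squares.
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 2001)
x = 1.0 / (1.0 + 9.0 * np.exp(-t))            # logistic ODE: xdot = x - x^2
x_noisy = x + 1e-4 * rng.standard_normal(x.size)
xdot = np.gradient(x_noisy, t)                # crude derivative reconstruction

Theta = np.column_stack([np.ones_like(x), x_noisy, x_noisy**2, x_noisy**3])
xi = np.linalg.lstsq(Theta, xdot, rcond=None)[0]
for _ in range(10):                           # threshold, then refit on support
    small = np.abs(xi) < 0.05
    xi[small] = 0.0
    if small.all():
        break
    xi[~small] = np.linalg.lstsq(Theta[:, ~small], xdot, rcond=None)[0]
print("coefficients of [1, x, x^2, x^3]:", np.round(xi, 3))  # approx [0, 1, -1, 0]
```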
☆ Coordinated Battery Electric Vehicle Charging Scheduling across Multiple Charging Stations
The uptake of battery electric vehicles (BEVs) is increasing to reduce greenhouse gas emissions in the transport sector. The rapid adoption of BEVs depends significantly on coordinated charging/discharging infrastructure. Without it, uncontrolled and erratic charging patterns could lead to increased power losses and voltage fluctuations beyond acceptable thresholds. BEV charge scheduling presents a multi-objective optimization (MOO) challenge, demanding a balance between minimizing network impact and maximizing the benefits for electric vehicle charging station (EVCS) operators and BEV owners. In this paper, we develop an MOO framework incorporating a carbon emission program and a dynamic economic dispatch problem, allowing BEV users to respond by charging and discharging through grid-to-vehicle (G2V) and vehicle-to-grid (V2G) technologies according to the optimal electricity price and compensation. Furthermore, we integrate dynamic economic dispatch with time-of-use tariffs to obtain optimal market electricity prices and reduce total costs over 24 hours. Our experimental results on a sample network show that the proposed scheduling increases participation in V2G services by over 10%, increases EVCS benefits by over 20%, and reduces network losses. Moreover, increased charging/discharging rates, coupled with greater carbon revenue benefits for BEV users and EVCSs, contribute to better offsetting battery degradation costs.
comment: 28 pages, 13 figures
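The economic core of G2V/V2G scheduling under time-of-use tariffs can be shown with a toy linear program. This is our single-vehicle, cost-only sketch with invented prices and power limits; the paper's framework is multi-objective and network-aware, and state-of-charge and degradation constraints are omitted here.

```python
import numpy as np
from scipy.optimize import linprog

# Toy G2V/V2G schedule: minimize one BEV's 24-h charging cost under ToU prices,
# allowing discharge (negative power), subject to a net energy target.
price = np.array([0.10] * 6 + [0.25] * 12 + [0.10] * 6)  # $/kWh, hypothetical ToU
T = 24
res = linprog(c=price,                       # cost = sum(price_t * p_t)
              A_eq=np.ones((1, T)), b_eq=[20.0],   # deliver 20 kWh net
              bounds=[(-3.0, 7.0)] * T)            # V2G up to 3 kW, G2V up to 7 kW
p = res.x
print("charge in cheap hours:   ", np.round(p[:6], 1))
print("discharge in peak hours: ", np.round(p[6:18], 1))
```

The optimizer charges in off-peak hours and discharges at the V2G limit during peak hours, which is the price-responsive behavior the paper's scheduling incentivizes.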
☆ Reversible Kalman Filter for state estimation with Manifold
This work introduces an algorithm for state estimation on manifolds within the framework of the Kalman filter. Its primary objective is to provide a methodology for evaluating the precision of existing Kalman filter variants with arbitrary accuracy on synthetic data, something that, to the best of our knowledge, has not been addressed in prior work. To this end, we develop a new filter with favorable numerical properties, thereby correcting the divergences observed in previous Kalman filter variants. In this formulation, the achievable precision is no longer constrained by the small-velocity assumption and is determined solely by sensor noise. The new filter assumes high-precision sensors, which in real scenarios requires a detection step that we define heuristically; this allows the approach to be extended to scenarios using either a 9-axis IMU or a combination of odometry, accelerometer, and pressure sensors. The latter configuration is designed for the reconstruction of trajectories in underwater environments.
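Manifold Kalman filters of this kind rest on a small amount of shared machinery: corrections live in the tangent space and are applied through a retraction, so the estimate never leaves the manifold. The snippet below shows that standard mechanism for attitude on SO(3) (textbook Rodrigues formula); it is our generic illustration, not the paper's reversible filter, and the correction vector is invented.

```python
import numpy as np

# Error-state update on SO(3): apply a tangent-space Kalman correction via the
# exponential map (Rodrigues formula), keeping the estimate on the manifold.
def hat(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def exp_so3(w):
    th = np.linalg.norm(w)
    if th < 1e-9:
        return np.eye(3) + hat(w)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * K @ K

R = np.eye(3)                             # current attitude estimate
delta = np.array([0.01, -0.02, 0.005])    # tangent-space correction (invented)
R = R @ exp_so3(delta)                    # retraction: stays exactly on SO(3)
print("orthogonality error:", np.linalg.norm(R.T @ R - np.eye(3)))
```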
☆ GPS Denied IBVS-Based Navigation and Collision Avoidance of UAV Using a Low-Cost RGB Camera
This paper proposes an image-based visual servoing (IBVS) framework for UAV navigation and collision avoidance using only an RGB camera. While UAV navigation has been extensively studied, it remains challenging to apply IBVS in missions involving multiple visual targets and collision avoidance. The proposed method achieves navigation without explicit path planning, and collision avoidance is realized through AI-based monocular depth estimation from RGB images. Unlike approaches that rely on stereo cameras or external workstations, our framework runs fully onboard a Jetson platform, ensuring a self-contained and deployable system. Experimental results validate that the UAV can navigate across multiple AprilTags and avoid obstacles effectively in GPS-denied environments.
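The classical IBVS machinery the paper builds on is compact enough to show directly: for a normalized image point at depth Z, the interaction matrix maps the camera twist to the feature velocity, and the control law drives the feature error to zero through its pseudoinverse. This is the textbook formulation, not the authors' exact controller; the feature coordinates and depth below are invented, with Z standing in for the paper's monocular depth estimate.

```python
import numpy as np

# Classical IBVS: v = -lambda * pinv(L) @ e for stacked point features.
def interaction_matrix(x, y, Z):
    return np.array([
        [-1 / Z,      0, x / Z,     x * y, -(1 + x * x),  y],
        [     0, -1 / Z, y / Z, 1 + y * y,      -x * y,  -x],
    ])

feats   = np.array([[0.10, 0.05], [-0.12, 0.08]])   # current normalized points
targets = np.array([[0.00, 0.00], [-0.05, 0.00]])   # desired positions
Z = 1.5                                             # depth (e.g. from a monocular estimate)
L = np.vstack([interaction_matrix(x, y, Z) for x, y in feats])
e = (feats - targets).ravel()
v = -0.5 * np.linalg.pinv(L) @ e                    # 6-DoF camera twist command
print("commanded twist [vx vy vz wx wy wz]:", np.round(v, 3))
```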
☆ A Fundamental Study for Multiobjective Optimization Problems in Nonlinear Dynamical Systems
Multiobjective optimization problems are important in the analysis and application of nonlinear dynamical systems. As a first step, this paper studies a biobjective optimization problem in a simple nonlinear switched dynamical system: a piecewise linear system based on a boost converter with photovoltaic input. The piecewise linearity enables us to analyze the nonlinear dynamics exactly. In the biobjective optimization problem, the first objective evaluates the stability of the circuit operation and the second evaluates the average input power. The main task is the analysis of the trade-off between the two objectives. Using the piecewise exact solutions, the two objectives are formulated theoretically. Using these theoretical formulae, the existence of a trade-off between the two objectives is established exactly, and the relationship between the trade-off and the system parameters is also considered. The results provide fundamental information for analyzing multiobjective optimization problems in various nonlinear systems and for realizing their engineering applications.
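The paper derives its trade-off in closed form; numerically, the same kind of analysis amounts to sweeping a parameter, evaluating both objectives, and extracting the non-dominated set. The sketch below is a generic Pareto-front extraction with placeholder objectives (f1 and f2 are invented stand-ins, not the paper's stability and power formulae).

```python
import numpy as np

# Generic biobjective trade-off analysis: sweep a design parameter, evaluate
# both objectives (to be minimized), and keep the Pareto-optimal samples.
k = np.linspace(0.1, 2.0, 100)            # swept parameter (illustrative)
f1 = (k - 1.2) ** 2                       # stand-in for an instability measure
f2 = -1.0 / (1.0 + k)                     # stand-in for negated avg. input power
pts = np.column_stack([f1, f2])
pareto = [i for i, p in enumerate(pts)
          if not any((q <= p).all() and (q < p).any() for q in pts)]
print(f"{len(pareto)} of {len(k)} sampled designs are Pareto-optimal")
```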
☆ Methods for Multi-objective Optimization PID Controller for quadrotor UAVs
Integrating unmanned aerial vehicles into daily use requires controllers that ensure stable flight, efficient energy use, and reduced noise. Proportional-integral-derivative controllers remain standard but are highly sensitive to gain selection, with manual tuning often yielding suboptimal trade-offs. This paper studies different optimization techniques for the automated tuning of quadrotor proportional-integral-derivative gains under a unified simulation that couples a blade-element-momentum-based aerodynamic model with a fast deep-neural-network surrogate, six-degrees-of-freedom rigid-body dynamics, turbulence, and a data-driven acoustic surrogate model that predicts third-octave spectra and propagates them to ground receivers. We compare three families of gradient-free optimizers: metaheuristics, Bayesian optimization, and deep reinforcement learning. Candidate controllers are evaluated using a composite cost function that simultaneously incorporates multiple metrics, such as noise footprint and power consumption. Metaheuristics improve performance consistently, with Grey Wolf Optimization producing the best results. Bayesian optimization is sample-efficient but carries higher per-iteration overhead and depends on the design domain. The reinforcement learning agents do not surpass the baseline in the current setup, suggesting that the problem formulation requires further refinement. On unseen missions, the best-tuned controller maintains accurate tracking while reducing oscillations, power demand, and acoustic emissions. These results show that noise-aware proportional-integral-derivative tuning through black-box search can deliver quieter and more efficient flight without hardware changes.
comment: 47 pages, 13 figures
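Grey Wolf Optimization itself is a short algorithm, so the black-box tuning loop can be sketched end to end. Our toy version below tunes PID gains for a discretized second-order plant with an ISE-plus-effort cost; the plant, cost weights, and population settings are all invented stand-ins for the paper's aeroacoustic simulation.

```python
import numpy as np

# Toy PID cost: track a unit step on a lightly damped 2nd-order plant.
def pid_cost(gains, dt=0.01, T=5.0):
    kp, ki, kd = gains
    y = yd = integ = e_prev = 0.0
    cost = 0.0
    for _ in range(int(T / dt)):
        e = 1.0 - y
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - e_prev) / dt
        e_prev = e
        ydd = u - 2 * 0.1 * 2.0 * yd - 2.0 ** 2 * y   # wn = 2, zeta = 0.1
        yd += ydd * dt
        y += yd * dt
        cost += (e * e + 1e-4 * u * u) * dt           # error + control effort
    return cost

# Grey Wolf Optimization: wolves move toward the three best solutions
# (alpha, beta, delta), with the exploration radius `a` shrinking over time.
rng = np.random.default_rng(2)
wolves = rng.uniform(0.0, 20.0, (12, 3))              # population of PID gains
n_iter = 40
for it in range(n_iter):
    fit = np.array([pid_cost(w) for w in wolves])
    alpha, beta, delta = wolves[np.argsort(fit)[:3]]
    a = 2 * (1 - it / n_iter)
    for i in range(len(wolves)):
        candidates = []
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(3), rng.random(3)
            A, C = 2 * a * r1 - a, 2 * r2
            candidates.append(leader - A * np.abs(C * leader - wolves[i]))
        wolves[i] = np.clip(np.mean(candidates, axis=0), 0.0, 20.0)

best = wolves[np.argmin([pid_cost(w) for w in wolves])]
print("best PID gains (kp, ki, kd):", np.round(best, 2))
```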
☆ Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR
Ensuring the safety of neural networks under input uncertainty is a fundamental challenge in safety-critical applications. This paper extends Fazlyab's quadratic-constraint (QC) and semidefinite-programming (SDP) framework for neural network verification to a distributionally robust, tail-risk-aware setting by integrating worst-case Conditional Value-at-Risk (WC-CVaR) over a moment-based ambiguity set with fixed mean and covariance. The resulting conditions remain SDP-checkable and explicitly account for tail risk. This integration broadens the admissible input-uncertainty geometry (covering ellipsoids, polytopes, and hyperplanes) and extends applicability to safety-critical domains where the severity of tail events matters. Applications to closed-loop reachability of control systems and to classification are demonstrated through numerical experiments, illustrating how the risk level $\varepsilon$ trades conservatism for tolerance to tail events, while preserving the computational structure of prior QC/SDP methods for neural network verification and robustness analysis.
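For intuition, the moment-based worst-case CVaR admits a well-known closed form for a purely linear loss (a standard result from the distributionally robust optimization literature; the paper's SDP machinery is what generalizes this to neural-network constraints via quadratic relaxations):

```latex
% Moment-based worst-case CVaR of a linear loss a' xi + b, over all
% distributions with mean mu and covariance Sigma (known closed form;
% shown here for intuition only).
\[
\sup_{\mathbb{P} \in \mathcal{P}(\mu, \Sigma)}
\operatorname{CVaR}_{\varepsilon}\!\left(a^{\top}\xi + b\right)
= a^{\top}\mu + b
+ \sqrt{\tfrac{1-\varepsilon}{\varepsilon}}\,\sqrt{a^{\top}\Sigma\, a}.
\]
```

The $\sqrt{(1-\varepsilon)/\varepsilon}$ factor makes explicit how shrinking the risk level $\varepsilon$ inflates the safety margin, which is the conservatism-versus-tolerance trade-off the experiments illustrate.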
☆ AERO-MPPI: Anchor-Guided Ensemble Trajectory Optimization for Agile Mapless Drone Navigation
Agile mapless navigation in cluttered 3D environments poses significant challenges for autonomous drones. Conventional mapping-planning-control pipelines incur high computational cost and propagate estimation errors. We present AERO-MPPI, a fully GPU-accelerated framework that unifies perception and planning through an anchor-guided ensemble of Model Predictive Path Integral (MPPI) optimizers. Specifically, we design a multi-resolution LiDAR point-cloud representation that rapidly extracts spatially distributed "anchors" as look-ahead intermediate endpoints, from which we construct polynomial trajectory guides to explore distinct homotopy path classes. At each planning step, we run multiple MPPI instances in parallel and evaluate them with a two-stage multi-objective cost that balances collision avoidance and goal reaching. Implemented entirely with NVIDIA Warp GPU kernels, AERO-MPPI achieves real-time onboard operation and mitigates the local-minima failures of single-MPPI approaches. Extensive simulations in forest, vertical, and inclined environments demonstrate sustained reliable flight above 7 m/s, with success rates above 80% and smoother trajectories than state-of-the-art baselines. Real-world experiments on a LiDAR-equipped quadrotor with an NVIDIA Jetson Orin NX 16G confirm that AERO-MPPI runs in real time onboard and consistently achieves safe, agile, and robust flight in complex cluttered environments. The code will be open-sourced upon acceptance of the paper.
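A single MPPI instance, the unit that AERO-MPPI runs many of in parallel, is compact enough to show in full. The sketch below uses the standard information-theoretic weighted update on a toy point-mass reaching problem; the dynamics, cost, and hyperparameters are invented, and the paper's GPU ensemble and anchor guidance are not reproduced.

```python
import numpy as np

# Minimal single-instance MPPI step on a 2-D point mass.
rng = np.random.default_rng(3)
H, K, lam, sigma = 20, 256, 1.0, 0.3       # horizon, rollouts, temperature, noise

def rollout_cost(U):                        # toy reaching cost to goal (1, 1)
    x = np.zeros(2)
    cost = 0.0
    for u in U:
        x = x + 0.1 * u                     # dt = 0.1 single-integrator dynamics
        cost += np.sum((x - np.array([1.0, 1.0])) ** 2) + 0.01 * np.sum(u ** 2)
    return cost

U = np.zeros((H, 2))                        # nominal control sequence
for _ in range(30):                         # MPPI iterations
    eps = sigma * rng.standard_normal((K, H, 2))
    costs = np.array([rollout_cost(U + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    U = U + np.tensordot(w, eps, axes=1)    # information-theoretic weighted update
print("terminal state:", np.round(0.1 * U.sum(0), 3))   # approaches (1, 1)
```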
♻ ☆ A Vision for Trustworthy, Fair, and Efficient Socio-Technical Control using Karma Economies
Control systems will play a pivotal role in addressing societal-scale challenges as they drive the development of sustainable future smart cities. At the heart of these challenges is the trustworthy, fair, and efficient allocation of scarce public resources, including renewable energy, transportation, data, computation, etc. Historical evidence suggests that monetary control -- the prototypical mechanism for managing resource scarcity -- is not always well-accepted in socio-technical resource contexts. In this vision article, we advocate for karma economies as an emerging non-monetary mechanism for socio-technical control. Karma leverages the repetitive nature of many socio-technical resources to jointly attain trustworthy, fair, and efficient allocations, by budgeting resource consumption over time and letting resource users "play against their future selves." To motivate karma, we review related concepts in economics through a control systems lens, and make a case for a) shifting the viewpoint of resource allocations from single-shot and static to repeated and dynamic games; and b) adopting long-run Nash welfare as the formalization of "fairness and efficiency" in socio-technical contexts. We show that in many dynamic resource settings, karma Nash equilibria maximize long-run Nash welfare. Moreover, we discuss implications for a future smart city built on multi-karma economies: by choosing whether to combine different socio-technical resources, e.g., electricity and transportation, in a single karma economy, or to separate them into resource-specific economies, karma provides new flexibility to design the scope of fairness and efficiency.
♻ ☆ Agentic DDQN-Based Scheduling for Licensed and Unlicensed Band Allocation in Sidelink Networks
In this paper, we present an agentic double deep Q-network (DDQN) scheduler for licensed/unlicensed band allocation in New Radio (NR) sidelink (SL) networks. Beyond conventional reward-seeking reinforcement learning (RL), the agent perceives and reasons over a multi-dimensional context that jointly captures queueing delay, link quality, coexistence intensity, and switching stability. A capacity-aware, quality of service (QoS)-constrained reward aligns the agent with goal-oriented scheduling rather than static thresholding. Under constrained bandwidth, the proposed design reduces blocking by up to 87.5% versus threshold policies while preserving throughput, highlighting the value of context-driven decisions in coexistence-limited NR SL networks. The proposed scheduler is an embodied agent (E-agent) tailored for task-specific, resource-efficient operation at the network edge.
comment: 6 pages, 3 figures, accepted by 2025 IEEE Globecom Workshops
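The double-DQN decoupling at the core of the scheduler is a one-line change over vanilla DQN: the online network selects the next action, while the target network evaluates it. The snippet shows that generic target computation with invented Q-values for a two-action licensed/unlicensed choice; it is not the paper's full agentic pipeline.

```python
import numpy as np

# Double-DQN TD target: action selection by the online net, evaluation by the
# target net, which reduces the overestimation bias of vanilla DQN.
def ddqn_target(r, q_next_online, q_next_target, gamma=0.99, done=False):
    a_star = np.argmax(q_next_online)        # select with the online network
    return r + (0.0 if done else gamma * q_next_target[a_star])

# hypothetical next-state Q-values for actions {licensed, unlicensed}
q_online = np.array([1.2, 1.5])
q_target = np.array([1.1, 1.3])
print("TD target:", ddqn_target(r=0.4, q_next_online=q_online,
                                q_next_target=q_target))
```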
♻ ☆ Distributed Continuous-Time Control via System Level Synthesis
This paper focuses on the design of H2 and H-Infinity distributed controllers with local communication and local disturbance rejection. We propose a two-step controller design procedure: (1) select closed-loop poles, then (2) optimize over parameterized controllers. Our parameterization builds on the system level synthesis (SLS) formulation -- primarily used in the discrete-time setting -- and extends it to the general continuous-time setting. We verify our approach in simulation on a 9-node grid governed by linearized swing equations, where our distributed controllers (with local communication) achieve performance comparable to that of optimal centralized controllers while facilitating local disturbance rejection.
comment: 8 pages, in submission to conference
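For readers familiar with discrete-time SLS, the continuous-time analogue of the system-response parameterization can be written compactly. The transcription below reflects the general SLS idea rather than the paper's exact formulation:

```latex
% Continuous-time system-level parameterization (our transcription of the
% general SLS idea; the paper's exact conditions may differ): for the plant
% \dot{x} = Ax + Bu + w, with closed-loop responses x = \Phi_x w and
% u = \Phi_u w, the responses are achievable iff
\[
  s\,\Phi_x(s) \;=\; A\,\Phi_x(s) + B\,\Phi_u(s) + I,
\]
% with the controller recovered as K = \Phi_u \Phi_x^{-1}. Local communication
% then corresponds to sparsity constraints imposed on (\Phi_x, \Phi_u).
```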
♻ ☆ Online Slip Detection and Friction Coefficient Estimation for Autonomous Racing
Accurate knowledge of the tire-road friction coefficient (TRFC) is essential for vehicle safety, stability, and performance, especially in autonomous racing, where vehicles often operate at the friction limit. However, the TRFC cannot be directly measured with standard sensors, and existing estimation methods either depend on vehicle or tire models with uncertain parameters or require large training datasets. In this paper, we present a lightweight approach for online slip detection and TRFC estimation. Our approach relies solely on IMU and LiDAR measurements and the control actions, without special dynamical or tire models, parameter identification, or training data. Slip events are detected in real time by comparing commanded and measured motions, and the TRFC is then estimated directly from observed accelerations under no-slip conditions. Experiments with a 1:10-scale autonomous racing car across different friction levels demonstrate that the proposed approach achieves accurate and consistent slip detection and friction-coefficient estimates, closely matching ground-truth measurements. These findings highlight the potential of our simple, deployable, and computationally efficient approach for real-time slip monitoring and friction-coefficient estimation in autonomous driving.
comment: Equal contribution by the first three authors
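The abstract's two ingredients reduce to simple formulas: flag slip when commanded and measured motion disagree beyond a tolerance, and bound the friction coefficient from the accelerations the tires demonstrably transmit while grip holds. The sketch below is our minimal reading of that idea; the thresholds and sensor values are invented.

```python
import numpy as np

g = 9.81

def slip_detected(v_cmd, v_meas, yaw_rate_cmd, yaw_rate_meas,
                  tol_v=0.3, tol_w=0.2):
    # slip: commanded vs. measured motion disagree beyond tolerance
    return abs(v_cmd - v_meas) > tol_v or abs(yaw_rate_cmd - yaw_rate_meas) > tol_w

def trfc_lower_bound(ax, ay):
    # under no-slip, the tires transmit this acceleration, so mu >= |a| / g
    return np.hypot(ax, ay) / g

print(slip_detected(3.0, 2.4, 1.0, 0.95))          # True: longitudinal slip
print(f"mu >= {trfc_lower_bound(4.0, 6.5):.2f}")   # from IMU accelerations
```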
♻ ☆ Equality Constrained Diffusion for Direct Trajectory Optimization
The recent success of diffusion-based generative models in image and natural language processing has ignited interest in diffusion-based trajectory optimization for nonlinear control systems. Existing methods cannot, however, handle the nonlinear equality constraints necessary for direct trajectory optimization. As a result, diffusion-based trajectory optimizers are currently limited to shooting methods, where the nonlinear dynamics are enforced by forward rollouts. This precludes many of the benefits enjoyed by direct methods, including flexible state constraints, reduced numerical sensitivity, and easy initial guess specification. In this paper, we present a method for diffusion-based optimization with equality constraints. This allows us to perform direct trajectory optimization, enforcing dynamic feasibility with constraints rather than rollouts. To the best of our knowledge, this is the first diffusion-based optimization algorithm that supports the general nonlinear equality constraints required for direct trajectory optimization.
comment: ACC 2025, fixed typo in equations (11)-(12)
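The key extra step versus unconstrained diffusion is re-projecting each denoised iterate onto the equality-constraint manifold. The sketch below shows a generic Gauss-Newton projection inside a toy "denoising" loop; the constraint, the update rule, and all constants are invented stand-ins, not the paper's algorithm.

```python
import numpy as np

# After each denoising update, project the iterate back onto g(x) = 0 with a
# Gauss-Newton step: x <- x - J' (J J')^{-1} g(x).
def g(x):                       # toy equality constraint (e.g. dynamics residual)
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0])

def G(x):                       # constraint Jacobian
    return np.array([[2 * x[0], 2 * x[1]]])

rng = np.random.default_rng(4)
x = rng.standard_normal(2)
for _ in range(100):            # stand-in for the reverse-diffusion iterations
    x = 0.95 * x + 0.05 * rng.standard_normal(2)    # toy "denoising" update
    for _ in range(3):          # Gauss-Newton projection onto g(x) = 0
        J = G(x)
        x = x - J.T @ np.linalg.solve(J @ J.T, g(x))
print("constraint violation after sampling:", abs(g(x)[0]))
```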
♻ ☆ Locomotor transitions in the potential energy landscape-dominated regime
To traverse complex three-dimensional terrain with large obstacles, animals and robots must transition across different modes. However, most mechanistic understanding of terrestrial locomotion concerns how to generate and stabilize near-steady-state, single-mode locomotion (e.g. walk, run). We know little about how to use physical interaction to make robust locomotor transitions. Here, we review our progress towards filling this gap by discovering terradynamic principles of multi-legged locomotor transitions, using simplified model systems representing distinct challenges in complex three-dimensional terrain. Remarkably, general physical principles emerge across diverse model systems when locomotor-terrain interaction is modelled using a potential energy landscape approach. The animal's and robots' stereotyped locomotor modes are constrained by physical interaction. Locomotor transitions are stochastic, destabilizing, barrier-crossing transitions on the landscape. They can be induced by feed-forward self-propulsion and are facilitated by feedback-controlled active adjustment. General physical principles and strategies from our systematic studies have already advanced robot performance in simple model systems. Efforts remain to better understand the intelligence aspect of locomotor transitions and how to compose larger-scale potential energy landscapes of complex three-dimensional terrains from simple landscapes of abstracted challenges. This will elucidate how the neuromechanical control system mediates physical interaction to generate multi-pathway locomotor transitions, and will lead to advancements in biology, physics, robotics and dynamical systems theory.
♻ ☆ Shape-induced obstacle attraction and repulsion during dynamic locomotion
Robots still struggle to dynamically traverse complex 3-D terrain with many large obstacles, an ability required for many critical applications. Body-obstacle interaction is often inevitable and induces perturbation and uncertainty in motion that challenge closed-form dynamic modeling. Here, inspired by the recent discovery of a terradynamic streamlined shape, we studied how two body shapes interacting with obstacles affect the turning and pitching motions of an open-loop multi-legged robot and of cockroaches during dynamic locomotion. With a common cuboidal body, the robot was attracted towards obstacles, resulting in pitching up and flipping over. By contrast, with an elliptical body, the robot was repelled by obstacles and readily traversed them. The animal displayed qualitatively similar turning and pitching motions induced by these two body shapes. However, unlike the cuboidal robot, the cuboidal animal was capable of escaping obstacle attraction and the subsequent high pitching and flipping over, which inspired us to develop an empirical pitch-and-turn strategy for cuboidal robots. Considering the similarity of our self-propelled body-obstacle interaction to part-feeder interaction in robotic part manipulation, we developed a quasi-static potential energy landscape model to explain the dependence of dynamic locomotion on body shape. Our experimental and modeling results also demonstrated that obstacle attraction or repulsion is an inherent property of locomotor body shape and is insensitive to obstacle geometry and size. Our study expands the concept and usefulness of terradynamic shapes for passive control of robot locomotion to traverse large obstacles using physical interaction. It is also a step towards establishing an energy landscape approach to locomotor transitions.
♻ ☆ Randomness in appendage coordination facilitates strenuous ground self-righting
Randomness is common in biological and artificial systems, resulting either from stochasticity of the environment or noise in organisms or devices themselves. In locomotor control, randomness is typically considered a nuisance. For example, during dynamic walking, randomness in stochastic terrain leads to metastable dynamics, which must be mitigated to stabilize the system around limit cycles. Here, we studied whether randomness in motion is beneficial for strenuous locomotor tasks. Our study used robotic simulation modeling of strenuous, leg-assisted, winged ground self-righting observed in cockroaches, in which unusually large randomness in wing and leg motions is present. We developed a simplified simulation robot capable of generating similar self-righting behavior and varied the randomness level in wing-leg coordination. During each wing opening attempt, the more randomness added to the time delay between wing opening and leg swinging, the more likely it was for the naive robot (which did not know what coordination is best) to self-right within a finite time. Wing-leg coordination, measured by the phase between wing and leg oscillations, had a crucial impact on self-righting outcome. Without randomness, periodic wing and leg oscillations often limited the system to visit a few bad phases, leading to failure to escape from the metastable state. With randomness, the system explored phases thoroughly and had a better chance of encountering good phases to self-right. Our study complements previous work by demonstrating that randomness helps destabilize locomotor systems from being trapped in undesired metastable states, a situation common in strenuous locomotion.
♻ ☆ Information-Theoretic Bounds and Task-Centric Learning Complexity for Real-World Dynamic Nonlinear Systems
Dynamic nonlinear systems exhibit distortions arising from coupled static and dynamic effects. Their intertwined nature poses major challenges for data-driven modeling. This paper presents a theoretical framework grounded in structured decomposition, variance analysis, and task-centric complexity bounds. The framework employs a directional lower bound on interactions between measurable system components, extending orthogonality in inner product spaces to structurally asymmetric settings. This bound supports variance inequalities for decomposed systems. Key behavioral indicators are introduced along with a memory finiteness index. A rigorous power-based condition establishes a measurable link between finite memory in realizable systems and the First Law of Thermodynamics. This offers a more foundational perspective than classical bounds based on the Second Law. Building on this foundation, we formulate a "Behavioral Uncertainty Principle," demonstrating that static and dynamic distortions cannot be minimized simultaneously. We identify that real-world systems seem to resist complete deterministic decomposition due to entangled static and dynamic effects. We also present two general-purpose theorems linking function variance to mean-squared Lipschitz continuity and learning complexity. This yields a model-agnostic, task-aware complexity metric, showing that lower-variance components are inherently easier to learn. These insights explain the empirical benefits of structured residual learning, including improved generalization, reduced parameter count, and lower training cost, as previously observed in power amplifier linearization experiments. The framework is broadly applicable and offers a scalable, theoretically grounded approach to modeling complex dynamic nonlinear systems.
comment: 15 pages, 1 figure, 2 photographs
♻ ☆ Prescribed-Time Event-Triggered Control for Matrix-Scaled Networks
This article proposes a distributed control method for matrix-scaled multi-agent networks aimed at achieving convergence within a user-defined time frame. The control law of each individual agent relies only on information from neighboring agents and is updated at discrete intervals determined by state-dependent triggering functions, reducing the frequency of agent interactions. To this end, first, the controller is augmented with a time-varying gain. Then, the dynamics of the closed-loop system over the finite-time interval is transformed into an infinite-time frame using time scaling. Lyapunov-based analysis is employed to derive suitable triggering conditions that guarantee the asymptotic convergence of the time-transformed system, thereby ensuring the prescribed-time convergence of the original system.
comment: 11 pages
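For readers unfamiliar with the time-scaling step, a standard prescribed-time construction illustrates the mechanism the article describes. This is our generic example; the authors' specific gain and transformation may differ:

```latex
% A standard prescribed-time construction (illustrative, not necessarily the
% authors' exact choice): with user-defined convergence time T, take
\[
  \mu(t) = \frac{T}{T - t}, \qquad
  \tau(t) = \ln\frac{T}{T - t}, \qquad t \in [0, T),
\]
% so the finite window [0, T) maps onto \tau \in [0, \infty). Asymptotic
% convergence of the time-transformed system in \tau then implies convergence
% of the original system by the prescribed time T.
```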
♻ ☆ On-Policy Reinforcement-Learning Control for Optimal Energy Sharing and Temperature Regulation in District Heating Systems
We address the problem of temperature regulation and optimal energy sharing in district heating systems (DHSs) where the demand and system parameters are unknown. We propose a temperature regulation scheme that employs data-driven on-policy updates to achieve these objectives. In particular, we show that the proposed control scheme converges to an optimal equilibrium point of the system, while also guaranteeing convergence to an optimal LQR control policy, thus providing good transient performance. The efficiency of our approach is demonstrated through extensive simulations.
comment: To appear at CDC 2025
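The model-based analogue of what such on-policy schemes learn from data is Kleinman's policy iteration for LQR: alternate policy evaluation (a Lyapunov equation) with policy improvement. The sketch below shows that classical iteration on an invented stable plant; the paper's contribution is to carry out the equivalent updates from data, without the model equations used here.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Kleinman policy iteration for continuous-time LQR.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # invented stable plant
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))                        # initial stabilizing gain (A is stable)
for _ in range(10):
    Acl = A - B @ K
    # policy evaluation: Acl' P + P Acl = -(Q + K' R K)
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    K = np.linalg.solve(R, B.T @ P)         # policy improvement
print("converged LQR gain:", np.round(K, 4))
```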
♻ ☆ Handling Infinite Domain Parameters in Planning Through Best-First Search with Delayed Partial Expansions
In automated planning, control parameters extend standard action representations through the introduction of continuous numeric decision variables. Existing state-of-the-art approaches have primarily handled control parameters as embedded constraints alongside other temporal and numeric restrictions, and thus have implicitly treated them as additional constraints rather than as decision points in the search space. In this paper, we propose an efficient alternative that explicitly handles control parameters as true decision points within a systematic search scheme. We develop a best-first, heuristic search algorithm that operates over infinite decision spaces defined by control parameters and prove a notion of completeness in the limit under certain conditions. Our algorithm leverages the concept of delayed partial expansion, where a state is not fully expanded but instead incrementally expands a subset of its successors. Our results demonstrate that this novel search algorithm is a competitive alternative to existing approaches for solving planning problems involving control parameters.
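The delayed-partial-expansion idea can be sketched as a greedy best-first search whose frontier holds successor streams instead of fully expanded nodes: popping a node materializes only its next successor and pushes the node back so the rest of its (possibly infinite) successor stream remains pending. The toy below is our generic version over an invented infinite successor set; it does not reproduce the paper's completeness-in-the-limit guarantees.

```python
import heapq
from itertools import count

def best_first(start, successors, is_goal, h):
    # greedy best-first on h; each frontier entry carries a lazy successor stream
    tie = count()
    frontier = [(h(start), next(tie), start, successors(start))]
    while frontier:
        f, _, node, succ_iter = heapq.heappop(frontier)
        if is_goal(node):
            return node
        child = next(succ_iter, None)        # materialize ONE successor only
        if child is not None:
            heapq.heappush(frontier, (f, next(tie), node, succ_iter))
            heapq.heappush(frontier, (h(child), next(tie), child,
                                      successors(child)))
    return None

# toy infinite branching: a real-valued "control parameter" u = 0.1 * k
def successors(x):
    return ((x + 0.1 * k) for k in count(1))  # infinitely many children

print(best_first(0.0, successors,
                 lambda x: abs(x - 0.7) < 1e-9,   # goal test
                 lambda x: abs(x - 0.7)))         # heuristic
```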
♻ ☆ Network-Realised Model Predictive Control Part II: Distributed Constraint Management
A two-layer control architecture is proposed, which promotes scalable implementations for model predictive controllers. The top layer acts both as a reference governor for the bottom layer and as a feedback controller for the regulated network. By employing set-based methods, global theoretical guarantees are obtained by enforcing local constraints upon the network's variables and upon those of the first layer's implementation. The proposed technique offers recursive feasibility guarantees as one of its central features, and the expressions of the resulting predictive strategies bear a striking resemblance to classical formulations from the model predictive control literature, allowing for flexible and easily customizable implementations.
comment: 20 pages, 9 figures, 4 tables
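The reference-governor role of the top layer can be illustrated with the classical scalar scheme: advance the applied reference toward the request only as far as a constraint-admissibility check allows. The sketch below is our textbook-style illustration with an invented stable closed-loop model, not the paper's set-based distributed design.

```python
import numpy as np

def admissible(g, x, A, B, C, y_max, horizon=50):
    # would holding the reference at g keep the output within limits?
    for _ in range(horizon):
        if abs(C @ x) > y_max:
            return False
        x = A @ x + B * g
    return True

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # invented stable closed-loop model
B = np.array([0.0, 0.2])
C = np.array([1.0, 0.0])
x = np.zeros(2)
g, r, y_max = 0.0, 5.0, 1.2              # applied ref, request, output limit
for _ in range(30):
    kappa = 1.0                          # largest admissible step toward r
    while kappa > 1e-4 and not admissible(g + kappa * (r - g), x, A, B, C, y_max):
        kappa *= 0.5
    if kappa > 1e-4:
        g += kappa * (r - g)
    x = A @ x + B * g
print(f"applied reference settles near the output limit: g = {g:.3f}")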
♻ ☆ Virtual Contraction Approach to Decentralized Adaptive Stabilization of Nonlinear Time-Delayed Networks
In this paper, we exploit a diagonally dominant structure for the decentralized stabilization of unknown nonlinear time-delayed networks. To this end, we first introduce a novel generalization of virtual contraction analysis to diagonally dominant time-delayed control systems. We then show that nonlinear time-delayed networks can be stabilized using diagonal high-gains, provided that the input matrices satisfy certain generalized (column/row) diagonally dominant conditions. To enable stabilization of unknown networks, we further propose a distributed adaptive tuning rule for each individual gain function, guaranteeing that all closed-loop trajectories converge to the origin while the gains converge to finite values. The effectiveness of the proposed decentralized adaptive control is illustrated through a case study on epidemic spreading control in SIS networks with transmission delays.
♻ ☆ Asynchronous Federated Learning: A Scalable Approach for Decentralized Machine Learning
Federated Learning (FL) has emerged as a powerful paradigm for decentralized machine learning, enabling collaborative model training across diverse clients without sharing raw data. However, traditional FL approaches often face limitations in scalability and efficiency due to their reliance on synchronous client updates, which can result in significant delays and increased communication overhead, particularly in heterogeneous and dynamic environments. To address these challenges, in this paper we propose an Asynchronous Federated Learning (AFL) algorithm, which allows clients to update the global model independently and asynchronously. Our key contributions include a comprehensive convergence analysis of AFL in the presence of client delays and model staleness. By leveraging martingale difference sequence theory and variance bounds, we ensure robust convergence despite asynchronous updates. Assuming strongly convex local objective functions, we establish bounds on gradient variance under random client sampling and derive a recursion formula quantifying the impact of client delays on convergence. Furthermore, we demonstrate the practical applicability of the AFL algorithm by training decentralized linear regression and Support Vector Machine (SVM) classifiers, and compare its results with a synchronous FL algorithm, showing that AFL effectively handles non-IID data distributed among clients. The proposed AFL algorithm addresses key limitations of traditional FL methods, such as inefficiency due to global synchronization and susceptibility to client drift, and it enhances scalability, robustness, and efficiency in real-world settings with heterogeneous client populations and dynamic network conditions. Our results underscore the potential of AFL to drive advancements in distributed learning systems, particularly for large-scale, privacy-preserving applications in resource-constrained environments.
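The server-side mechanics of asynchronous FL are simple to sketch: each arriving client model is merged immediately, weighted down by its staleness (global steps elapsed since the client fetched the model). The snippet below is our generic formulation with an invented staleness weighting, not the paper's analyzed update rule.

```python
import numpy as np

# Asynchronous server: merge each client update on arrival, discounting stale ones.
def server_update(w_global, w_client, t_now, t_fetch, eta=0.5):
    staleness = t_now - t_fetch
    alpha = eta / (1.0 + staleness)          # stale updates count for less
    return (1 - alpha) * w_global + alpha * w_client

w = np.zeros(3)                              # global model parameters
updates = [(np.array([1.0, 0.5, -0.2]), 0),  # (client model, step it fetched at)
           (np.array([0.8, 0.7,  0.1]), 1),
           (np.array([0.2, 0.1,  0.0]), 0)]  # this one trained on a stale model
for t, (w_c, t_fetch) in enumerate(updates, start=1):
    w = server_update(w, w_c, t_now=t, t_fetch=t_fetch)
    print(f"step {t}: global model = {np.round(w, 3)}")
```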
♻ ☆ Quasi Steady-State Frequency
Accurate frequency estimation is critical for the control, monitoring and protection of electrical power systems, in particular, of systems with a high penetration of power electronics. This paper introduces the novel concept of Quasi Steady-State (QSS) frequency as a quantity that fills the gap between stationary and instantaneous frequency. QSS frequency coincides with the fundamental frequency of an AC voltage in any stationary conditions, including unbalanced and non-sinusoidal, and is able to capture the time-varying fundamental frequency in transient conditions. The paper also proposes a metric borrowed from fluid dynamics, namely, the time derivative of the circulation, to define the scope of validity of the QSS frequency. Analytical examples as well as a case study based on a fully-fledged EMT model of the IEEE 39-bus system serve to illustrate, respectively, the properties of the QSS frequency and its behavior in transient conditions.
Systems and Control 1
☆ Privacy-Preserving State Estimation with Crowd Sensors: An Information-Theoretic Perspective
Privacy-preserving state estimation for linear time-invariant dynamical systems with crowd sensors is considered. At any time step, the estimator has access to measurements from a randomly selected sensor from a pool of sensors with pre-specified models and noise profiles. A Luenberger-like observer fuses the measurements with the underlying model of the system to recursively generate the state estimates. Additive privacy-preserving noise is used to constrain information leakage. Information leakage is measured via the mutual information between the identity of the sensors and the state estimate, conditioned on the actual state of the system. This captures an omnipotent adversary that can not only access state estimates but also gather direct high-quality state measurements. Any prescribed level of information leakage is shown to be achievable by appropriately selecting the variance of the privacy-preserving noise. The privacy-utility trade-off can therefore be fine-tuned.
comment: Accepted for presentation at the 17th IEEE International Workshop on Information Forensics and Security (WIFS2025)
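The estimator described above fits in a few lines: a Luenberger observer driven by a randomly selected crowd sensor, with Gaussian noise added to the released estimate. The sketch below is our reading of the abstract with invented system matrices and noise levels; larger privacy noise variance means less leakage about which sensor was selected, at the cost of a noisier released estimate.

```python
import numpy as np

# Luenberger observer with a randomly selected crowd sensor and additive
# privacy-preserving noise on the released estimate.
rng = np.random.default_rng(5)
A = np.array([[1.0, 0.1], [0.0, 1.0]])       # invented LTI system
L = np.array([[0.3], [0.2]])                 # observer gain (A - L C stable here)
C_pool = [np.array([[1.0, 0.0]]),            # crowd sensor models (invented)
          np.array([[1.0, 0.5]])]
sigma_priv = 0.2                             # privacy-preserving noise std

x = np.array([1.0, -0.5])                    # true state
xhat = np.zeros(2)
for _ in range(100):
    i = rng.integers(len(C_pool))            # randomly selected crowd sensor
    C = C_pool[i]
    y = C @ x + 0.05 * rng.standard_normal(1)
    xhat = A @ xhat + L @ (y - C @ xhat)     # Luenberger fusion
    x = A @ x
    release = xhat + sigma_priv * rng.standard_normal(2)  # what the public sees
print("estimation error:", np.round(x - xhat, 3))
print("released (privatized) estimate:", np.round(release, 3))
```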