MyArxiv
Robotics 41
☆ Whole-Body Conditioned Egocentric Video Prediction
We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model's embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human.
comment: Project Page: https://dannytran123.github.io/PEVA
☆ SAM4D: Segment Anything in Camera and LiDAR Streams ICCV2025
We present SAM4D, a multi-modal and temporal foundation model designed for promptable segmentation across camera and LiDAR streams. Unified Multi-modal Positional Encoding (UMPE) is introduced to align camera and LiDAR features in a shared 3D space, enabling seamless cross-modal prompting and interaction. Additionally, we propose Motion-aware Cross-modal Memory Attention (MCMA), which leverages ego-motion compensation to enhance temporal consistency and long-horizon feature retrieval, ensuring robust segmentation across dynamically changing autonomous driving scenes. To avoid annotation bottlenecks, we develop a multi-modal automated data engine that synergizes VFM-driven video masklets, spatiotemporal 4D reconstruction, and cross-modal masklet fusion. This framework generates camera-LiDAR aligned pseudo-labels at a speed orders of magnitude faster than human annotation while preserving VFM-derived semantic fidelity in point cloud representations. We conduct extensive experiments on the constructed Waymo-4DSeg, which demonstrate the powerful cross-modal segmentation ability and great potential in data annotation of proposed SAM4D.
comment: Accepted by ICCV2025, Project Page: https://SAM4D-Project.github.io
☆ WorldVLA: Towards Autoregressive Action World Model
We present WorldVLA, an autoregressive action world model that unifies action and image understanding and generation. Our WorldVLA intergrates Vision-Language-Action (VLA) model and world model in one single framework. The world model predicts future images by leveraging both action and image understanding, with the purpose of learning the underlying physics of the environment to improve action generation. Meanwhile, the action model generates the subsequent actions based on image observations, aiding in visual understanding and in turn helps visual generation of the world model. We demonstrate that WorldVLA outperforms standalone action and world models, highlighting the mutual enhancement between the world model and the action model. In addition, we find that the performance of the action model deteriorates when generating sequences of actions in an autoregressive manner. This phenomenon can be attributed to the model's limited generalization capability for action prediction, leading to the propagation of errors from earlier actions to subsequent ones. To address this issue, we propose an attention mask strategy that selectively masks prior actions during the generation of the current action, which shows significant performance improvement in the action chunk generation task.
comment: Code: https://github.com/alibaba-damo-academy/WorldVLA
☆ Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient propagation across sampling steps. We propose the \textit{Single-Step Completion Policy} (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, enabling accurate, one-shot action generation. In an off-policy actor-critic framework, SSCP combines the expressiveness of generative models with the training and inference efficiency of unimodal policies, without requiring long backpropagation chains. Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability over diffusion-based baselines. We further extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference. SSCP achieves strong results across standard offline RL and behavior cloning benchmarks, positioning it as a versatile, expressive, and efficient framework for deep RL and sequential decision-making.
☆ EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting
Efficient three-dimensional reconstruction and real-time visualization are critical in surgical scenarios such as endoscopy. In recent years, 3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in efficient 3D reconstruction and rendering. Most 3DGS-based Simultaneous Localization and Mapping (SLAM) methods only rely on the appearance constraints for optimizing both 3DGS and camera poses. However, in endoscopic scenarios, the challenges include photometric inconsistencies caused by non-Lambertian surfaces and dynamic motion from breathing affects the performance of SLAM systems. To address these issues, we additionally introduce optical flow loss as a geometric constraint, which effectively constrains both the 3D structure of the scene and the camera motion. Furthermore, we propose a depth regularisation strategy to mitigate the problem of photometric inconsistencies and ensure the validity of 3DGS depth rendering in endoscopic scenes. In addition, to improve scene representation in the SLAM system, we improve the 3DGS refinement strategy by focusing on viewpoints corresponding to Keyframes with suboptimal rendering quality frames, achieving better rendering results. Extensive experiments on the C3VD static dataset and the StereoMIS dynamic dataset demonstrate that our method outperforms existing state-of-the-art methods in novel view synthesis and pose estimation, exhibiting high performance in both static and dynamic surgical scenes. The source code will be publicly available upon paper acceptance.
☆ ToosiCubix: Monocular 3D Cuboid Labeling via Vehicle Part Annotations
Many existing methods for 3D cuboid annotation of vehicles rely on expensive and carefully calibrated camera-LiDAR or stereo setups, limiting their accessibility for large-scale data collection. We introduce ToosiCubix, a simple yet powerful approach for annotating ground-truth cuboids using only monocular images and intrinsic camera parameters. Our method requires only about 10 user clicks per vehicle, making it highly practical for adding 3D annotations to existing datasets originally collected without specialized equipment. By annotating specific features (e.g., wheels, car badge, symmetries) across different vehicle parts, we accurately estimate each vehicle's position, orientation, and dimensions up to a scale ambiguity (8 DoF). The geometric constraints are formulated as an optimization problem, which we solve using a coordinate descent strategy, alternating between Perspective-n-Points (PnP) and least-squares subproblems. To handle common ambiguities such as scale and unobserved dimensions, we incorporate probabilistic size priors, enabling 9 DoF cuboid placements. We validate our annotations against the KITTI and Cityscapes3D datasets, demonstrating that our method offers a cost-effective and scalable solution for high-quality 3D cuboid annotation.
☆ Real-time Terrain Analysis for Off-road Autonomous Vehicles
This research addresses critical autonomous vehicle control challenges arising from road roughness variation, which induces course deviations and potential loss of road contact during steering operations. We present a novel real-time road roughness estimation system employing Bayesian calibration methodology that processes axle accelerations to predict terrain roughness with quantifiable confidence measures. The technical framework integrates a Gaussian process surrogate model with a simulated half-vehicle model, systematically processing vehicle velocity and road surface roughness parameters to generate corresponding axle acceleration responses. The Bayesian calibration routine performs inverse estimation of road roughness from observed accelerations and velocities, yielding posterior distributions that quantify prediction uncertainty for adaptive risk management. Training data generation utilizes Latin Hypercube sampling across comprehensive velocity and roughness parameter spaces, while the calibrated model integrates seamlessly with a Simplex controller architecture to dynamically adjust velocity limits based on real-time roughness predictions. Experimental validation on stochastically generated surfaces featuring varying roughness regions demonstrates robust real-time characterization capabilities, with the integrated Simplex control strategy effectively enhancing autonomous vehicle operational safety through proactive surface condition response. This innovative Bayesian framework establishes a comprehensive foundation for mitigating roughness-related operational risks while simultaneously improving efficiency and safety margins in autonomous vehicle systems.
☆ "Who Should I Believe?": User Interpretation and Decision-Making When a Family Healthcare Robot Contradicts Human Memory
Advancements in robotic capabilities for providing physical assistance, psychological support, and daily health management are making the deployment of intelligent healthcare robots in home environments increasingly feasible in the near future. However, challenges arise when the information provided by these robots contradicts users' memory, raising concerns about user trust and decision-making. This paper presents a study that examines how varying a robot's level of transparency and sociability influences user interpretation, decision-making and perceived trust when faced with conflicting information from a robot. In a 2 x 2 between-subjects online study, 176 participants watched videos of a Furhat robot acting as a family healthcare assistant and suggesting a fictional user to take medication at a different time from that remembered by the user. Results indicate that robot transparency influenced users' interpretation of information discrepancies: with a low transparency robot, the most frequent assumption was that the user had not correctly remembered the time, while with the high transparency robot, participants were more likely to attribute the discrepancy to external factors, such as a partner or another household member modifying the robot's information. Additionally, participants exhibited a tendency toward overtrust, often prioritizing the robot's recommendations over the user's memory, even when suspecting system malfunctions or third-party interference. These findings highlight the impact of transparency mechanisms in robotic systems, the complexity and importance associated with system access control for multi-user robots deployed in home environments, and the potential risks of users' over reliance on robots in sensitive domains such as healthcare.
comment: 8 pages
☆ Active Disturbance Rejection Control for Trajectory Tracking of a Seagoing USV: Design, Simulation, and Field Experiments IROS 2025
Unmanned Surface Vessels (USVs) face significant control challenges due to uncertain environmental disturbances like waves and currents. This paper proposes a trajectory tracking controller based on Active Disturbance Rejection Control (ADRC) implemented on the DUS V2500. A custom simulation incorporating realistic waves and current disturbances is developed to validate the controller's performance, supported by further validation through field tests in the harbour of Scheveningen, the Netherlands, and at sea. Simulation results demonstrate that ADRC significantly reduces cross-track error across all tested conditions compared to a baseline PID controller but increases control effort and energy consumption. Field trials confirm these findings while revealing a further increase in energy consumption during sea trials compared to the baseline.
comment: Accepted for presentation at IROS 2025. Submitted version
☆ ACTLLM: Action Consistency Tuned Large Language Model
This paper introduces ACTLLM (Action Consistency Tuned Large Language Model), a novel approach for robot manipulation in dynamic environments. Traditional vision-based systems often struggle to learn visual representations that excel in both task execution and spatial reasoning, thereby limiting their adaptability in dynamic environments. ACTLLM addresses these challenges by harnessing language to craft structured scene descriptors, providing a uniform interface for both spatial understanding and task performance through flexible language instructions. Moreover, we introduce a novel action consistency constraint that aligns visual perception with corresponding actions, thereby enhancing the learning of actionable visual representations. Additionally, we have reformulated the Markov decision process for manipulation tasks into a multi-turn visual dialogue framework. This approach enables the modeling of long-term task execution with enhanced contextual relevance derived from the history of task execution. During our evaluation, ACTLLM excels in diverse scenarios, proving its effectiveness on challenging vision-based robot manipulation tasks.
☆ Real-Time ESFP: Estimating, Smoothing, Filtering, and Pose-Mapping
This paper presents ESFP, an end-to-end pipeline that converts monocular RGB video into executable joint trajectories for a low-cost 4-DoF desktop arm. ESFP comprises four sequential modules. (1) Estimating: ROMP lifts each frame to a 24-joint 3-D skeleton. (2) Smoothing: the proposed HPSTM-a sequence-to-sequence Transformer with self-attention-combines long-range temporal context with a differentiable forward-kinematics decoder, enforcing constant bone lengths and anatomical plausibility while jointly predicting joint means and full covariances. (3) Filtering: root-normalized trajectories are variance-weighted according to HPSTM's uncertainty estimates, suppressing residual noise. (4) Pose-Mapping: a geometric retargeting layer transforms shoulder-elbow-wrist triples into the uArm's polar workspace, preserving wrist orientation.
☆ World-aware Planning Narratives Enhance Large Vision-Language Model Planner
Large Vision-Language Models (LVLMs) show promise for embodied planning tasks but struggle with complex scenarios involving unfamiliar environments and multi-step goals. Current approaches rely on environment-agnostic imitation learning that disconnects instructions from environmental contexts, causing models to struggle with context-sensitive instructions and rely on supplementary cues rather than visual reasoning during long-horizon interactions. In this work, we propose World-Aware Planning Narrative Enhancement (WAP), a framework that infuses LVLMs with comprehensive environmental understanding through four cognitive capabilities (visual appearance modeling, spatial reasoning, functional abstraction, and syntactic grounding) while developing and evaluating models using only raw visual observations through curriculum learning. Evaluations on the EB-ALFRED benchmark demonstrate substantial improvements, with Qwen2.5-VL achieving a 60.7 absolute improvement in task success rates, particularly in commonsense reasoning (+60.0) and long-horizon planning (+70.0). Notably, our enhanced open-source models outperform proprietary systems like GPT-4o and Claude-3.5-Sonnet by a large margin.
☆ Dynamic Risk-Aware MPPI for Mobile Robots in Crowds via Efficient Monte Carlo Approximations IROS 2025
Deploying mobile robots safely among humans requires the motion planner to account for the uncertainty in the other agents' predicted trajectories. This remains challenging in traditional approaches, especially with arbitrarily shaped predictions and real-time constraints. To address these challenges, we propose a Dynamic Risk-Aware Model Predictive Path Integral control (DRA-MPPI), a motion planner that incorporates uncertain future motions modelled with potentially non-Gaussian stochastic predictions. By leveraging MPPI's gradient-free nature, we propose a method that efficiently approximates the joint Collision Probability (CP) among multiple dynamic obstacles for several hundred sampled trajectories in real-time via a Monte Carlo (MC) approach. This enables the rejection of samples exceeding a predefined CP threshold or the integration of CP as a weighted objective within the navigation cost function. Consequently, DRA-MPPI mitigates the freezing robot problem while enhancing safety. Real-world and simulated experiments with multiple dynamic obstacles demonstrate DRA-MPPI's superior performance compared to state-of-the-art approaches, including Scenario-based Model Predictive Control (S-MPC), Frenet planner, and vanilla MPPI.
comment: Accepted for presentation at IROS 2025. Submitted Version
☆ Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation ICCV 2025
Panoramic image processing is essential for omni-context perception, yet faces constraints like distortions, perspective occlusions, and limited annotations. Previous unsupervised domain adaptation methods transfer knowledge from labeled pinhole data to unlabeled panoramic images, but they require access to source pinhole data. To address these, we introduce a more practical task, i.e., Source-Free Occlusion-Aware Seamless Segmentation (SFOASS), and propose its first solution, called UNconstrained Learning Omni-Context Knowledge (UNLOCK). Specifically, UNLOCK includes two key modules: Omni Pseudo-Labeling Learning and Amodal-Driven Context Learning. While adapting without relying on source data or target labels, this framework enhances models to achieve segmentation with 360{\deg} viewpoint coverage and occlusion-aware reasoning. Furthermore, we benchmark the proposed SFOASS task through both real-to-real and synthetic-to-real adaptation settings. Experimental results show that our source-free method achieves performance comparable to source-dependent methods, yielding state-of-the-art scores of 10.9 in mAAP and 11.6 in mAP, along with an absolute improvement of +4.3 in mAPQ over the source-only method. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK.
comment: Accepted to ICCV 2025. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK
☆ Out-of-Distribution Semantic Occupancy Prediction
3D Semantic Occupancy Prediction is crucial for autonomous driving, providing a dense, semantically rich environmental representation. However, existing methods focus on in-distribution scenes, making them susceptible to Out-of-Distribution (OoD) objects and long-tail distributions, which increases the risk of undetected anomalies and misinterpretations, posing safety hazards. To address these challenges, we introduce Out-of-Distribution Semantic Occupancy Prediction, targeting OoD detection in 3D voxel space. To fill the gaps in the dataset, we propose a Synthetic Anomaly Integration Pipeline that injects synthetic anomalies while preserving realistic spatial and occlusion patterns, enabling the creation of two datasets: VAA-KITTI and VAA-KITTI-360. We introduce OccOoD, a novel framework integrating OoD detection into 3D semantic occupancy prediction, with Voxel-BEV Progressive Fusion (VBPF) leveraging an RWKV-based branch to enhance OoD detection via geometry-semantic fusion. Experimental results demonstrate that OccOoD achieves state-of-the-art OoD detection with an AuROC of 67.34% and an AuPRCr of 29.21% within a 1.2m region, while maintaining competitive occupancy prediction performance. The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD.
comment: The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD
☆ UAIbot: Beginner-friendly web-based simulator for interactive robotics learning and research
This paper presents UAIbot, a free and open-source web-based robotics simulator designed to address the educational and research challenges conventional simulation platforms generally face. The Python and JavaScript interfaces of UAIbot enable accessible hands-on learning experiences without cumbersome installations. By allowing users to explore fundamental mathematical and physical principles interactively, ranging from manipulator kinematics to pedestrian flow dynamics, UAIbot provides an effective tool for deepening student understanding, facilitating rapid experimentation, and enhancing research dissemination.
comment: 12 pages, 8 figures, submitted to Springer proceedings
GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction ICML 2025
Trajectory prediction for surrounding agents is a challenging task in autonomous driving due to its inherent uncertainty and underlying multimodality. Unlike prevailing data-driven methods that primarily rely on supervised learning, in this paper, we introduce a novel Graph-oriented Inverse Reinforcement Learning (GoIRL) framework, which is an IRL-based predictor equipped with vectorized context representations. We develop a feature adaptor to effectively aggregate lane-graph features into grid space, enabling seamless integration with the maximum entropy IRL paradigm to infer the reward distribution and obtain the policy that can be sampled to induce multiple plausible plans. Furthermore, conditioned on the sampled plans, we implement a hierarchical parameterized trajectory generator with a refinement module to enhance prediction accuracy and a probability fusion strategy to boost prediction confidence. Extensive experimental results showcase our approach not only achieves state-of-the-art performance on the large-scale Argoverse & nuScenes motion forecasting benchmarks but also exhibits superior generalization abilities compared to existing supervised models.
comment: Accepted by ICML 2025
☆ CURL-SLAM: Continuous and Compact LiDAR Mapping
This paper studies 3D LiDAR mapping with a focus on developing an updatable and localizable map representation that enables continuity, compactness and consistency in 3D maps. Traditional LiDAR Simultaneous Localization and Mapping (SLAM) systems often rely on 3D point cloud maps, which typically require extensive storage to preserve structural details in large-scale environments. In this paper, we propose a novel paradigm for LiDAR SLAM by leveraging the Continuous and Ultra-compact Representation of LiDAR (CURL) introduced in [1]. Our proposed LiDAR mapping approach, CURL-SLAM, produces compact 3D maps capable of continuous reconstruction at variable densities using CURL's spherical harmonics implicit encoding, and achieves global map consistency after loop closure. Unlike popular Iterative Closest Point (ICP)-based LiDAR odometry techniques, CURL-SLAM formulates LiDAR pose estimation as a unique optimization problem tailored for CURL and extends it to local Bundle Adjustment (BA), enabling simultaneous pose refinement and map correction. Experimental results demonstrate that CURL-SLAM achieves state-of-the-art 3D mapping quality and competitive LiDAR trajectory accuracy, delivering sensor-rate real-time performance (10 Hz) on a CPU. We will release the CURL-SLAM implementation to the community.
☆ Control of Marine Robots in the Era of Data-Driven Intelligence
The control of marine robots has long relied on model-based methods grounded in classical and modern control theory. However, the nonlinearity and uncertainties inherent in robot dynamics, coupled with the complexity of marine environments, have revealed the limitations of conventional control methods. The rapid evolution of machine learning has opened new avenues for incorporating data-driven intelligence into control strategies, prompting a paradigm shift in the control of marine robots. This paper provides a review of recent progress in marine robot control through the lens of this emerging paradigm. The review covers both individual and cooperative marine robotic systems, highlighting notable achievements in data-driven control of marine robots and summarizing open-source resources that support the development and validation of advanced control methods. Finally, several future perspectives are outlined to guide research toward achieving high-level autonomy for marine robots in real-world applications. This paper aims to serve as a roadmap toward the next-generation control framework of marine robots in the era of data-driven intelligence.
☆ Knowledge-Driven Imitation Learning: Enabling Generalization Across Diverse Conditions IROS 2025
Imitation learning has emerged as a powerful paradigm in robot manipulation, yet its generalization capability remains constrained by object-specific dependencies in limited expert demonstrations. To address this challenge, we propose knowledge-driven imitation learning, a framework that leverages external structural semantic knowledge to abstract object representations within the same category. We introduce a novel semantic keypoint graph as a knowledge template and develop a coarse-to-fine template-matching algorithm that optimizes both structural consistency and semantic similarity. Evaluated on three real-world robotic manipulation tasks, our method achieves superior performance, surpassing image-based diffusion policies with only one-quarter of the expert demonstrations. Extensive experiments further demonstrate its robustness across novel objects, backgrounds, and lighting conditions. This work pioneers a knowledge-driven approach to data-efficient robotic learning in real-world settings. Code and more materials are available on https://knowledge-driven.github.io/.
comment: IROS 2025
☆ V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling
Ensuring robust planning and decision-making under rare, diverse, and visually degraded long-tail scenarios remains a fundamental challenge for autonomous driving in urban environments. This issue becomes more critical in cooperative settings, where vehicles and infrastructure jointly perceive and reason across complex environments. To address this challenge, we propose V2X-REALM, a vision-language model (VLM)-based framework with adaptive multimodal learning for robust cooperative autonomous driving under long-tail scenarios. V2X-REALM introduces three core innovations: (i) a prompt-driven long-tail scenario generation and evaluation pipeline that leverages foundation models to synthesize realistic long-tail conditions such as snow and fog across vehicle- and infrastructure-side views, enriching training diversity efficiently; (ii) a gated multi-scenario adaptive attention module that modulates the visual stream using scenario priors to recalibrate ambiguous or corrupted features; and (iii) a multi-task scenario-aware contrastive learning objective that improves multimodal alignment and promotes cross-scenario feature separability. Extensive experiments demonstrate that V2X-REALM significantly outperforms existing baselines in robustness, semantic reasoning, safety, and planning accuracy under complex, challenging driving conditions, advancing the scalability of end-to-end cooperative autonomous driving.
☆ STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner
The ability to perform reliable long-horizon task planning is crucial for deploying robots in real-world environments. However, directly employing Large Language Models (LLMs) as action sequence generators often results in low success rates due to their limited reasoning ability for long-horizon embodied tasks. In the STEP framework, we construct a subgoal tree through a pair of closed-loop models: a subgoal decomposition model and a leaf node termination model. Within this framework, we develop a hierarchical tree structure that spans from coarse to fine resolutions. The subgoal decomposition model leverages a foundation LLM to break down complex goals into manageable subgoals, thereby spanning the subgoal tree. The leaf node termination model provides real-time feedback based on environmental states, determining when to terminate the tree spanning and ensuring each leaf node can be directly converted into a primitive action. Experiments conducted in both the VirtualHome WAH-NL benchmark and on real robots demonstrate that STEP achieves long-horizon embodied task completion with success rates up to 34% (WAH-NL) and 25% (real robot) outperforming SOTA methods.
☆ Fault-Tolerant Spacecraft Attitude Determination using State Estimation Techniques
The extended and unscented Kalman filter, and the particle filter provide a robust framework for fault-tolerant attitude estimation on spacecraft. This paper explores how each filter performs for a large satellite in a low earth orbit. Additionally, various techniques, built on these filters, for fault detection, isolation and recovery from erroneous sensor measurements, are analyzed. Key results from this analysis include filter performance for various fault modes.
comment: 8 pages, 19 figures
☆ Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers
Finding balanced ways to employ Large Language Models (LLMs) in education is a challenge due to inherent risks of poor understanding of the technology and of a susceptible audience. This is particularly so with younger children, who are known to have difficulties with pervasive screen time. Working with a tangible programming robot called Cubetto, we propose an approach to benefit from the capabilities of LLMs by employing such models in the preparation of personalised storytelling, necessary for preschool children to get accustomed to the practice of commanding the robot. We engage in action research to develop an early version of a formalised process to rapidly prototype game stories for Cubetto. Our approach has both reproducible results, because it employs open weight models, and is model-agnostic, because we test it with 5 different LLMs. We document on one hand the process, the used materials and prompts, and on the other the learning experience and outcomes. We deem the generation successful for the intended purposes of using the results as a teacher aid. Testing the models on 4 different task scenarios, we encounter issues of consistency and hallucinations and document the corresponding evaluation process and attempts (some successful and some not) to overcome these issues. Importantly, the process does not expose children to LLMs directly. Rather, the technology is used to help teachers easily develop personalised narratives on children's preferred topics. We believe our method is adequate for preschool classes and we are planning to further experiment in real-world educational settings.
comment: accepted at D-SAIL Workshop - Transformative Curriculum Design: Digitalization, Sustainability, and AI Literacy for 21st Century Learning
☆ ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation ICRA 2025
Autonomous systems rely on sensors to estimate the environment around them. However, cameras, LiDARs, and RADARs have their own limitations. In nighttime or degraded environments such as fog, mist, or dust, thermal cameras can provide valuable information regarding the presence of objects of interest due to their heat signature. They make it easy to identify humans and vehicles that are usually at higher temperatures compared to their surroundings. In this paper, we focus on the adaptation of thermal cameras for robotics and automation, where the biggest hurdle is the lack of data. Several multi-modal datasets are available for driving robotics research in tasks such as scene segmentation, object detection, and depth estimation, which are the cornerstone of autonomous systems. However, they are found to be lacking in thermal imagery. Our paper proposes a solution to augment these datasets with synthetic thermal data to enable widespread and rapid adaptation of thermal cameras. We explore the use of conditional diffusion models to convert existing RGB images to thermal images using self-attention to learn the thermal properties of real-world objects.
comment: Accepted at Thermal Infrared in Robotics (TIRO) Workshop, ICRA 2025
☆ Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends
Vision-language-action (VLA) models extend vision-language models (VLM) by integrating action generation modules for robotic manipulation. Leveraging strengths of VLM in vision perception and instruction understanding, VLA models exhibit promising generalization across diverse manipulation tasks. However, applications demanding high precision and accuracy reveal performance gaps without further adaptation. Evidence from multiple domains highlights the critical role of post-training to align foundational models with downstream applications, spurring extensive research on post-training VLA models. VLA model post-training aims to address the challenge of improving an embodiment's ability to interact with the environment for the given tasks, analogous to the process of humans motor skills acquisition. Accordingly, this paper reviews post-training strategies for VLA models through the lens of human motor learning, focusing on three dimensions: environments, embodiments, and tasks. A structured taxonomy is introduced aligned with human learning mechanisms: (1) enhancing environmental perception, (2) improving embodiment awareness, (3) deepening task comprehension, and (4) multi-component integration. Finally, key challenges and trends in post-training VLA models are identified, establishing a conceptual framework to guide future research. This work delivers both a comprehensive overview of current VLA model post-training methods from a human motor learning perspective and practical insights for VLA model development. (Project website: https://github.com/AoqunJin/Awesome-VLA-Post-Training)
Cooperative Circumnavigation for Multi-Quadrotor Systems via Onboard Sensing
A cooperative circumnavigation framework is proposed for multi-quadrotor systems to enclose and track a moving target without reliance on external localization systems. The distinct relationships between quadrotor-quadrotor and quadrotor-target interactions are evaluated using a heterogeneous perception strategy and corresponding state estimation algorithms. A modified Kalman filter is developed to fuse visual-inertial odometry with range measurements to enhance the accuracy of inter-quadrotor relative localization. An event-triggered distributed Kalman filter is designed to achieve robust target state estimation under visual occlusion by incorporating neighbor measurements and estimated inter-quadrotor relative positions. Using the estimation results, a cooperative circumnavigation controller is constructed, leveraging an oscillator-based autonomous formation flight strategy. We conduct extensive indoor and outdoor experiments to validate the efficiency of the proposed circumnavigation framework in occluded environments. Furthermore, a quadrotor failure experiment highlights the inherent fault tolerance property of the proposed framework, underscoring its potential for deployment in search-and-rescue operations.
comment: 8 Pages, 7 figures. Accepted by RA-L
☆ Effect of Haptic Feedback on Avoidance Behavior and Visual Exploration in Dynamic VR Pedestrian Environment
Human crowd simulation in virtual reality (VR) is a powerful tool with potential applications including emergency evacuation training and assessment of building layout. While haptic feedback in VR enhances immersive experience, its effect on walking behavior in dense and dynamic pedestrian flows is unknown. Through a user study, we investigated how haptic feedback changes user walking motion in crowded pedestrian flows in VR. The results indicate that haptic feedback changed users' collision avoidance movements, as measured by increased walking trajectory length and change in pelvis angle. The displacements of users' lateral position and pelvis angle were also increased in the instantaneous response to a collision with a non-player character (NPC), even when the NPC was inside the field of view. Haptic feedback also enhanced users' awareness and visual exploration when an NPC approached from the side and back. Furthermore, variation in walking speed was increased by the haptic feedback. These results suggested that the haptic feedback enhanced users' sensitivity to a collision in VR environment.
♻ ☆ ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG
Surface electromyography (sEMG) signals show promise for effective human-computer interfaces, particularly in rehabilitation and prosthetics. However, challenges remain in developing systems that respond quickly and reliably to user intent, across different subjects and without requiring time-consuming calibration. In this work, we propose a framework for EMG-based intent detection that addresses these challenges. Unlike traditional gesture recognition models that wait until a gesture is completed before classifying it, our approach uses a segmentation strategy to assign intent labels at every timestep as the gesture unfolds. We introduce a novel masked modeling strategy that aligns muscle activations with their corresponding user intents, enabling rapid onset detection and stable tracking of ongoing gestures. In evaluations against baseline methods, considering both accuracy and stability for device control, our approach surpasses state-of-the-art performance in zero-shot transfer conditions, demonstrating its potential for wearable robotics and next-generation prosthetic systems. Our project page is available at: https://reactemg.github.io
♻ ☆ Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception IROS 2025
Deep object pose estimators are notoriously overconfident. A grasping agent that both estimates the 6-DoF pose of a target object and predicts the uncertainty of its own estimate could avoid task failure by choosing not to act under high uncertainty. Even though object pose estimation improves and uncertainty quantification research continues to make strides, few studies have connected them to the downstream task of robotic grasping. We propose a method for training lightweight, deep networks to predict whether a grasp guided by an image-based pose estimate will succeed before that grasp is attempted. We generate training data for our networks via object pose estimation on real images and simulated grasping. We also find that, despite high object variability in grasping trials, networks benefit from training on all objects jointly, suggesting that a diverse variety of objects can nevertheless contribute to the same goal.
comment: Accepted to IROS 2025
♻ ☆ Learning Efficient and Robust Language-conditioned Manipulation using Textual-Visual Relevancy and Equivariant Language Mapping
Controlling robots through natural language is pivotal for enhancing human-robot collaboration and synthesizing complex robot behaviors. Recent works that are trained on large robot datasets show impressive generalization abilities. However, such pretrained methods are (1) often fragile to unseen scenarios, and (2) expensive to adapt to new tasks. This paper introduces Grounded Equivariant Manipulation (GEM), a robust yet efficient approach that leverages pretrained vision-language models with equivariant language mapping for language-conditioned manipulation tasks. Our experiments demonstrate GEM's high sample efficiency and generalization ability across diverse tasks in both simulation and the real world. GEM achieves similar or higher performance with orders of magnitude fewer robot data compared with major data-efficient baselines such as CLIPort and VIMA. Finally, our approach demonstrates greater robustness compared to large VLA model, e.g, OpenVLA, at correctly interpreting natural language commands on unseen objects and poses. Code, data, and training details are available https://saulbatman.github.io/gem_page/
♻ ☆ 3D Hierarchical Panoptic Segmentation in Real Orchard Environments Across Different Sensors IROS 2025
Crop yield estimation is a relevant problem in agriculture, because an accurate yield estimate can support farmers' decisions on harvesting or precision intervention. Robots can help to automate this process. To do so, they need to be able to perceive the surrounding environment to identify target objects such as trees and plants. In this paper, we introduce a novel approach to address the problem of hierarchical panoptic segmentation of apple orchards on 3D data from different sensors. Our approach is able to simultaneously provide semantic segmentation, instance segmentation of trunks and fruits, and instance segmentation of trees (a trunk with its fruits). This allows us to identify relevant information such as individual plants, fruits, and trunks, and capture the relationship among them, such as precisely estimate the number of fruits associated to each tree in an orchard. To efficiently evaluate our approach for hierarchical panoptic segmentation, we provide a dataset designed specifically for this task. Our dataset is recorded in Bonn, Germany, in a real apple orchard with a variety of sensors, spanning from a terrestrial laser scanner to a RGB-D camera mounted on different robots platforms. The experiments show that our approach surpasses state-of-the-art approaches in 3D panoptic segmentation in the agricultural domain, while also providing full hierarchical panoptic segmentation. Our dataset is publicly available at https://www.ipb.uni-bonn.de/data/hops/. The open-source implementation of our approach is available at https://github.com/PRBonn/hapt3D.
comment: Accepted to IROS 2025
♻ ☆ Rapid Gyroscope Calibration: A Deep Learning Approach
Low-cost gyroscope calibration is essential for ensuring the accuracy and reliability of gyroscope measurements. Stationary calibration estimates the deterministic parts of measurement errors. To this end, a common practice is to average the gyroscope readings during a predefined period and estimate the gyroscope bias. Calibration duration plays a crucial role in performance, therefore, longer periods are preferred. However, some applications require quick startup times and calibration is therefore allowed only for a short time. In this work, we focus on reducing low-cost gyroscope calibration time using deep learning methods. We propose an end-to-end convolutional neural network for the application of gyroscope calibration. We explore the possibilities of using multiple real and virtual gyroscopes to improve the calibration performance of single gyroscopes. To train and validate our approach, we recorded a dataset consisting of 186.6 hours of gyroscope readings, using 36 gyroscopes of four different brands. We also created a virtual dataset consisting of simulated gyroscope readings. The six datasets were used to evaluate our proposed approach. One of our key achievements in this work is reducing gyroscope calibration time by up to 89% using three low-cost gyroscopes. Our dataset is publicly available to allow reproducibility of our work and to increase research in the field.
comment: 10 Pages, 14 Figures
♻ ☆ PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp
The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on the point clouds (2.5D points) generated from single-view depth images. These point clouds only have one surface side of the object providing incomplete geometry information, which mislead the grasping algorithm to judge the shape of the target object, resulting in low grasping accuracy. Humans can accurately grasp objects from a single view by leveraging their geometry experience to estimate object shapes. Inspired by humans, we propose a novel 6-DoF grasping framework that converts the point completion results as object shape features to train the 6-DoF grasp network. Here, point completion can generate approximate complete points from the 2.5D points similar to the human geometry experience, and converting it as shape features is the way to utilize it to improve grasp efficiency. Furthermore, due to the gap between the network generation and actual execution, we integrate a score filter into our framework to select more executable grasp proposals for the real robot. This enables our method to maintain a high grasp quality in any camera viewpoint. Extensive experiments demonstrate that utilizing complete point features enables the generation of significantly more accurate grasp proposals and the inclusion of a score filter greatly enhances the credibility of real-world robot grasping. Our method achieves a 17.8\% success rate higher than the state-of-the-art method in real-world experiments.
RAMBO: RL-augmented Model-based Whole-body Control for Loco-manipulation
Loco-manipulation, physical interaction of various objects that is concurrently coordinated with locomotion, remains a major challenge for legged robots due to the need for both precise end-effector control and robustness to unmodeled dynamics. While model-based controllers provide precise planning via online optimization, they are limited by model inaccuracies. In contrast, learning-based methods offer robustness, but they struggle with precise modulation of interaction forces. We introduce RAMBO, a hybrid framework that integrates model-based whole-body control within a feedback policy trained with reinforcement learning. The model-based module generates feedforward torques by solving a quadratic program, while the policy provides feedback corrective terms to enhance robustness. We validate our framework on a quadruped robot across a diverse set of real-world loco-manipulation tasks, such as pushing a shopping cart, balancing a plate, and holding soft objects, in both quadrupedal and bipedal walking. Our experiments demonstrate that RAMBO enables precise manipulation capabilities while achieving robust and dynamic locomotion.
♻ ☆ The Starlink Robot: A Platform and Dataset for Mobile Satellite Communication
The integration of satellite communication into mobile devices represents a paradigm shift in connectivity, yet the performance characteristics under motion and environmental occlusion remain poorly understood. We present the Starlink Robot, the first mobile robotic platform equipped with Starlink satellite internet, comprehensive sensor suite including upward-facing camera, LiDAR, and IMU, designed to systematically study satellite communication performance during movement. Our multi-modal dataset captures synchronized communication metrics, motion dynamics, sky visibility, and 3D environmental context across diverse scenarios including steady-state motion, variable speeds, and different occlusion conditions. This platform and dataset enable researchers to develop motion-aware communication protocols, predict connectivity disruptions, and optimize satellite communication for emerging mobile applications from smartphones to autonomous vehicles. The project is available at https://github.com/StarlinkRobot.
♻ ☆ CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
We introduce CREStE, a scalable learning-based mapless navigation framework to address the open-world generalization and robustness challenges of outdoor urban navigation. Key to achieving this is learning perceptual representations that generalize to open-set factors (e.g. novel semantic classes, terrains, dynamic entities) and inferring expert-aligned navigation costs from limited demonstrations. CREStE addresses both these issues, introducing 1) a visual foundation model (VFM) distillation objective for learning open-set structured bird's-eye-view perceptual representations, and 2) counterfactual inverse reinforcement learning (IRL), a novel active learning formulation that uses counterfactual trajectory demonstrations to reason about the most important cues when inferring navigation costs. We evaluate CREStE on the task of kilometer-scale mapless navigation in a variety of city, offroad, and residential environments and find that it outperforms all state-of-the-art approaches with 70% fewer human interventions, including a 2-kilometer mission in an unseen environment with just 1 intervention; showcasing its robustness and effectiveness for long-horizon mapless navigation. Videos and additional materials can be found on the project page: https://amrl.cs.utexas.edu/creste
comment: 18 pages, 10 figures, 5 tables
♻ ☆ Finding the Easy Way Through -- the Probabilistic Gap Planner for Social Robot Navigation
In Social Robot Navigation, autonomous agents need to resolve many sequential interactions with other agents. State-of-the art planners can efficiently resolve the next, imminent interaction cooperatively and do not focus on longer planning horizons. This makes it hard to maneuver scenarios where the agent needs to select a good strategy to find gaps or channels in the crowd. We propose to decompose trajectory planning into two separate steps: Conflict avoidance for finding good, macroscopic trajectories, and cooperative collision avoidance (CCA) for resolving the next interaction optimally. We propose the Probabilistic Gap Planner (PGP) as a conflict avoidance planner. PGP modifies an established probabilistic collision risk model to include a general assumption of cooperativity. PGP biases the short-term CCA planner to head towards gaps in the crowd. In extensive simulations with crowds of varying density, we show that using PGP in addition to state-of-the-art CCA planners improves the agents' performance: On average, agents keep more space to others, create less tension, and cause fewer collisions. This typically comes at the expense of slightly longer paths. PGP runs in real-time on WaPOCHI mobile robot by Honda R&D.
♻ ☆ IMPACT: Behavioral Intention-aware Multimodal Trajectory Prediction with Adaptive Context Trimming
While most prior research has focused on improving the precision of multimodal trajectory predictions, the explicit modeling of multimodal behavioral intentions (e.g., yielding, overtaking) remains relatively underexplored. This paper proposes a unified framework that jointly predicts both behavioral intentions and trajectories to enhance prediction accuracy, interpretability, and efficiency. Specifically, we employ a shared context encoder for both intention and trajectory predictions, thereby reducing structural redundancy and information loss. Moreover, we address the lack of ground-truth behavioral intention labels in mainstream datasets (Waymo, Argoverse) by auto-labeling these datasets, thus advancing the community's efforts in this direction. We further introduce a vectorized occupancy prediction module that infers the probability of each map polyline being occupied by the target vehicle's future trajectory. By leveraging these intention and occupancy prediction priors, our method conducts dynamic, modality-dependent pruning of irrelevant agents and map polylines in the decoding stage, effectively reducing computational overhead and mitigating noise from non-critical elements. Our approach ranks first among LiDAR-free methods on the Waymo Motion Dataset and achieves first place on the Waymo Interactive Prediction Dataset. Remarkably, even without model ensembling, our single-model framework improves the soft mean average precision (softmAP) by 10 percent compared to the second-best method in the Waymo Interactive Prediction Leaderboard. Furthermore, the proposed framework has been successfully deployed on real vehicles, demonstrating its practical effectiveness in real-world applications.
comment: under review
♻ ☆ EFEAR-4D: Ego-Velocity Filtering for Efficient and Accurate 4D radar Odometry
Odometry is a crucial component for successfully implementing autonomous navigation, relying on sensors such as cameras, LiDARs and IMUs. However, these sensors may encounter challenges in extreme weather conditions, such as snowfall and fog. The emergence of FMCW radar technology offers the potential for robust perception in adverse conditions. As the latest generation of FWCW radars, the 4D mmWave radar provides point cloud with range, azimuth, elevation, and Doppler velocity information, despite inherent sparsity and noises in the point cloud. In this paper, we propose EFEAR-4D, an accurate, highly efficient, and learning-free method for large-scale 4D radar odometry estimation. EFEAR-4D exploits Doppler velocity information delicately for robust ego-velocity estimation, resulting in a highly accurate prior guess. EFEAR-4D maintains robustness against point-cloud sparsity and noises across diverse environments through dynamic object removal and effective region-wise feature extraction. Extensive experiments on two publicly available 4D radar datasets demonstrate state-of-the-art reliability and localization accuracy of EFEAR-4D under various conditions. Furthermore, we have collected a dataset following the same route but varying installation heights of the 4D radar, emphasizing the significant impact of radar height on point cloud quality - a crucial consideration for real-world deployments. Our algorithm and dataset will be available soon at https://github.com/CLASS-Lab/EFEAR-4D.
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to auto driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. What's more, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain.
Computer Vision and Pattern Recognition 150
☆ Whole-Body Conditioned Egocentric Video Prediction
We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model's embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human.
comment: Project Page: https://dannytran123.github.io/PEVA
☆ SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
We propose SiM3D, the first benchmark considering the integration of multiview and multimodal information for comprehensive 3D anomaly detection and segmentation (ADS), where the task is to produce a voxel-based Anomaly Volume. Moreover, SiM3D focuses on a scenario of high interest in manufacturing: single-instance anomaly detection, where only one object, either real or synthetic, is available for training. In this respect, SiM3D stands out as the first ADS benchmark that addresses the challenge of generalising from synthetic training data to real test data. SiM3D includes a novel multimodal multiview dataset acquired using top-tier industrial sensors and robots. The dataset features multiview high-resolution images (12 Mpx) and point clouds (7M points) for 333 instances of eight types of objects, alongside a CAD model for each type. We also provide manually annotated 3D segmentation GTs for anomalous test samples. To establish reference baselines for the proposed multiview 3D ADS task, we adapt prominent singleview methods and assess their performance using novel metrics that operate on Anomaly Volumes.
☆ SAM4D: Segment Anything in Camera and LiDAR Streams ICCV2025
We present SAM4D, a multi-modal and temporal foundation model designed for promptable segmentation across camera and LiDAR streams. Unified Multi-modal Positional Encoding (UMPE) is introduced to align camera and LiDAR features in a shared 3D space, enabling seamless cross-modal prompting and interaction. Additionally, we propose Motion-aware Cross-modal Memory Attention (MCMA), which leverages ego-motion compensation to enhance temporal consistency and long-horizon feature retrieval, ensuring robust segmentation across dynamically changing autonomous driving scenes. To avoid annotation bottlenecks, we develop a multi-modal automated data engine that synergizes VFM-driven video masklets, spatiotemporal 4D reconstruction, and cross-modal masklet fusion. This framework generates camera-LiDAR aligned pseudo-labels at a speed orders of magnitude faster than human annotation while preserving VFM-derived semantic fidelity in point cloud representations. We conduct extensive experiments on the constructed Waymo-4DSeg, which demonstrate the powerful cross-modal segmentation ability and great potential in data annotation of proposed SAM4D.
comment: Accepted by ICCV2025, Project Page: https://SAM4D-Project.github.io
☆ HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.
comment: Project webpage: https://plan-lab.github.io/hallusegbench/
☆ DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion
Reconstructing 3D objects from a single image is a long-standing challenge, especially under real-world occlusions. While recent diffusion-based view synthesis models can generate consistent novel views from a single RGB image, they generally assume fully visible inputs and fail when parts of the object are occluded. This leads to inconsistent views and degraded 3D reconstruction quality. To overcome this limitation, we propose an end-to-end framework for occlusion-aware multi-view generation. Our method directly synthesizes six structurally consistent novel views from a single partially occluded image, enabling downstream 3D reconstruction without requiring prior inpainting or manual annotations. We construct a self-supervised training pipeline using the Pix2Gestalt dataset, leveraging occluded-unoccluded image pairs and pseudo-ground-truth views to teach the model structure-aware completion and view consistency. Without modifying the original architecture, we fully fine-tune the view synthesis model to jointly learn completion and multi-view generation. Additionally, we introduce the first benchmark for occlusion-aware reconstruction, encompassing diverse occlusion levels, object categories, and mask patterns. This benchmark provides a standardized protocol for evaluating future methods under partial occlusions. Our code is available at https://github.com/Quyans/DeOcc123.
☆ StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning ICCV 2025
Recently, Mamba-based methods have demonstrated impressive performance in point cloud representation learning by leveraging State Space Model (SSM) with the efficient context modeling ability and linear complexity. However, these methods still face two key issues that limit the potential of SSM: Destroying the adjacency of 3D points during SSM processing and failing to retain long-sequence memory as the input length increases in downstream tasks. To address these issues, we propose StruMamba3D, a novel paradigm for self-supervised point cloud representation learning. It enjoys several merits. First, we design spatial states and use them as proxies to preserve spatial dependencies among points. Second, we enhance the SSM with a state-wise update strategy and incorporate a lightweight convolution to facilitate interactions between spatial states for efficient structure modeling. Third, our method reduces the sensitivity of pre-trained Mamba-based models to varying input lengths by introducing a sequence length-adaptive strategy. Experimental results across four downstream tasks showcase the superior performance of our method. In addition, our method attains the SOTA 95.1% accuracy on ModelNet40 and 92.75% accuracy on the most challenging split of ScanObjectNN without voting strategy.
comment: Accepted by ICCV 2025
☆ Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent semantics of each sample, but struggle to capture nuanced and diverse relationships that can exist across modalities. Set-based approaches, which represent each sample with multiple embeddings, offer a promising alternative, as they can capture richer and more diverse relationships. In this paper, we show that, despite their promise, these set-based representations continue to face issues including sparse supervision and set collapse, which limits their effectiveness. To address these challenges, we propose Maximal Pair Assignment Similarity to optimize one-to-one matching between embedding sets which preserve semantic diversity within the set. We also introduce two loss functions to further enhance the representations: Global Discriminative Loss to enhance distinction among embeddings, and Intra-Set Divergence Loss to prevent collapse within each set. Our method achieves state-of-the-art performance on MS-COCO and Flickr30k without relying on external data.
comment: Accepted at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025 Main)
☆ ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers ICCV
Research in quantum machine learning has recently proliferated due to the potential of quantum computing to accelerate machine learning. An area of machine learning that has not yet been explored is neural ordinary differential equation (neural ODE) based residual neural networks (ResNets), which aim to improve the effectiveness of neural networks using the principles of ordinary differential equations. In this work, we present our insights about why analog Rydberg atom quantum computers are especially well-suited for ResNets. We also introduce ResQ, a novel framework to optimize the dynamics of Rydberg atom quantum computers to solve classification problems in machine learning using analog quantum neural ODEs.
comment: ResQ will appear in the Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2025
☆ Exploring the Design Space of 3D MLLMs for CT Report Generation
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10\%, achieving the 2nd place on the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of LLM under the same training protocol. We also show that larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
☆ WAFT: Warping-Alone Field Transforms for Optical Flow
We introduce Warping-Alone Field Transforms (WAFT), a simple and effective method for optical flow. WAFT is similar to RAFT but replaces cost volume with high-resolution warping, achieving better accuracy with lower memory cost. This design challenges the conventional wisdom that constructing cost volumes is necessary for strong performance. WAFT is a simple and flexible meta-architecture with minimal inductive biases and reliance on custom designs. Compared with existing methods, WAFT ranks 1st on Spring and KITTI benchmarks, achieves the best zero-shot generalization on KITTI, while being up to 4.1x faster than methods with similar performance. Code and model weights are available at https://github.com/princeton-vl/WAFT.
☆ MADrive: Memory-Augmented Driving Scene Modeling
Recent advances in scene reconstruction have pushed toward highly realistic modeling of autonomous driving (AD) environments using 3D Gaussian splatting. However, the resulting reconstructions remain closely tied to the original observations and struggle to support photorealistic synthesis of significantly altered or novel driving scenarios. This work introduces MADrive, a memory-augmented reconstruction framework designed to extend the capabilities of existing scene reconstruction methods by replacing observed vehicles with visually similar 3D assets retrieved from a large-scale external memory bank. Specifically, we release MAD-Cars, a curated dataset of ${\sim}70$K 360{\deg} car videos captured in the wild and present a retrieval module that finds the most similar car instances in the memory bank, reconstructs the corresponding 3D assets from video, and integrates them into the target scene through orientation alignment and relighting. The resulting replacements provide complete multi-view representations of vehicles in the scene, enabling photorealistic synthesis of substantially altered configurations, as demonstrated in our experiments. Project page: https://yandex-research.github.io/madrive/
☆ G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation ICCV 2025
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representation and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided Distillation (G$^{2}$D), a knowledge distillation framework that optimizes the multimodal model with a custom-built loss function that fuses both unimodal and multimodal objectives. G$^{2}$D further incorporates a dynamic sequential modality prioritization (SMP) technique in the learning process to ensure each modality leads the learning process, avoiding the pitfall of stronger modalities overshadowing weaker ones. We validate G$^{2}$D on multiple real-world datasets and show that G$^{2}$D amplifies the significance of weak modalities while training and outperforms state-of-the-art methods in classification and regression tasks. Our code is available at https://github.com/rAIson-Lab/G2D.
comment: Accepted at ICCV 2025
☆ GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation ICCV 2025
Creating high-quality, generalizable speech-driven 3D talking heads remains a persistent challenge. Previous methods achieve satisfactory results for fixed viewpoints and small-scale audio variations, but they struggle with large head rotations and out-of-distribution (OOD) audio. Moreover, they are constrained by the need for time-consuming, identity-specific training. We believe the core issue lies in the lack of sufficient 3D priors, which limits the extrapolation capabilities of synthesized talking heads. To address this, we propose GGTalker, which synthesizes talking heads through a combination of generalizable priors and identity-specific adaptation. We introduce a two-stage Prior-Adaptation training strategy to learn Gaussian head priors and adapt to individual characteristics. We train Audio-Expression and Expression-Visual priors to capture the universal patterns of lip movements and the general distribution of head textures. During the Customized Adaptation, individual speaking styles and texture details are precisely modeled. Additionally, we introduce a color MLP to generate fine-grained, motion-aligned textures and a Body Inpainter to blend rendered results with the background, producing indistinguishable, photorealistic video frames. Comprehensive experiments show that GGTalker achieves state-of-the-art performance in rendering quality, 3D consistency, lip-sync accuracy, and training efficiency.
comment: ICCV 2025, Project page: https://vincenthu19.github.io/GGTalker/
☆ Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration
Large Vision-Language Models (LVLMs) have demonstrated significant advancements in multimodal understanding, yet they are frequently hampered by hallucination-the generation of text that contradicts visual input. Existing training-free decoding strategies exhibit critical limitations, including the use of static constraints that do not adapt to semantic drift during generation, inefficiency stemming from the need for multiple forward passes, and degradation of detail due to overly rigid intervention rules. To overcome these challenges, this paper introduces Dynamic Logits Calibration (DLC), a novel training-free decoding framework designed to dynamically align text generation with visual evidence at inference time. At the decoding phase, DLC step-wise employs CLIP to assess the semantic alignment between the input image and the generated text sequence. Then, the Relative Visual Advantage (RVA) of candidate tokens is evaluated against a dynamically updated contextual baseline, adaptively adjusting output logits to favor tokens that are visually grounded. Furthermore, an adaptive weighting mechanism, informed by a real-time context alignment score, carefully balances the visual guidance while ensuring the overall quality of the textual output. Extensive experiments conducted across diverse benchmarks and various LVLM architectures (such as LLaVA, InstructBLIP, and MiniGPT-4) demonstrate that DLC significantly reduces hallucinations, outperforming current methods while maintaining high inference efficiency by avoiding multiple forward passes. Overall, we present an effective and efficient decoding-time solution to mitigate hallucinations, thereby enhancing the reliability of LVLMs for more practices. Code will be released on Github.
☆ Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising
Ultrasound Coherent Plane Wave Compounding (CPWC) enhances image contrast by combining echoes from multiple steered transmissions. While increasing the number of angles generally improves image quality, it drastically reduces the frame rate and can introduce blurring artifacts in fast-moving targets. Moreover, compounded images remain susceptible to noise, particularly when acquired with a limited number of transmissions. We propose a zero-shot denoising framework tailored for low-angle CPWC acquisitions, which enhances contrast without relying on a separate training dataset. The method divides the available transmission angles into two disjoint subsets, each used to form compound images that include higher noise levels. The new compounded images are then used to train a deep model via a self-supervised residual learning scheme, enabling it to suppress incoherent noise while preserving anatomical structures. Because angle-dependent artifacts vary between the subsets while the underlying tissue response is similar, this physics-informed pairing allows the network to learn to disentangle the inconsistent artifacts from the consistent tissue signal. Unlike supervised methods, our model requires no domain-specific fine-tuning or paired data, making it adaptable across anatomical regions and acquisition setups. The entire pipeline supports efficient training with low computational cost due to the use of a lightweight architecture, which comprises only two convolutional layers. Evaluations on simulation, phantom, and in vivo data demonstrate superior contrast enhancement and structure preservation compared to both classical and deep learning-based denoising methods.
☆ Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
Deep neural networks have set the state-of-the-art in computer vision tasks such as bounding box detection and semantic segmentation. Object detectors and segmentation models assign confidence scores to predictions, reflecting the model's uncertainty in object detection or pixel-wise classification. However, these confidence estimates are often miscalibrated, as their architectures and loss functions are tailored to task performance rather than probabilistic foundation. Even with well calibrated predictions, object detectors fail to quantify uncertainty outside detected bounding boxes, i.e., the model does not make a probability assessment of whether an area without detected objects is truly free of obstacles. This poses a safety risk in applications such as automated driving, where uncertainty in empty areas remains unexplored. In this work, we propose an object detection model grounded in spatial statistics. Bounding box data matches realizations of a marked point process, commonly used to describe the probabilistic occurrence of spatial point events identified as bounding box centers, where marks are used to describe the spatial extension of bounding boxes and classes. Our statistical framework enables a likelihood-based training and provides well-defined confidence estimates for whether a region is drivable, i.e., free of objects. We demonstrate the effectiveness of our method through calibration assessments and evaluation of performance.
comment: 15 pages, 4 figures, 3 tables
☆ TITAN: Query-Token based Domain Adaptive Adversarial Learning ICCV 2025
We focus on the source-free domain adaptive object detection (SF-DAOD) problem when source data is unavailable during adaptation and the model must adapt to an unlabeled target domain. The majority of approaches for the problem employ a self-supervised approach using a student-teacher (ST) framework where pseudo-labels are generated via a source-pretrained model for further fine-tuning. We observe that the performance of a student model often degrades drastically, due to the collapse of the teacher model, primarily caused by high noise in pseudo-labels, resulting from domain bias, discrepancies, and a significant domain shift across domains. To obtain reliable pseudo-labels, we propose a Target-based Iterative Query-Token Adversarial Network (TITAN), which separates the target images into two subsets: those similar to the source (easy) and those dissimilar (hard). We propose a strategy to estimate variance to partition the target domain. This approach leverages the insight that higher detection variances correspond to higher recall and greater similarity to the source domain. Also, we incorporate query-token-based adversarial modules into a student-teacher baseline framework to reduce the domain gaps between two feature representations. Experiments conducted on four natural imaging datasets and two challenging medical datasets have substantiated the superior performance of TITAN compared to existing state-of-the-art (SOTA) methodologies. We report an mAP improvement of +22.7, +22.2, +21.1, and +3.7 percent over the current SOTA on C2F, C2B, S2C, and K2C benchmarks, respectively.
comment: ICCV 2025
☆ Global and Local Entailment Learning for Natural World Imagery ICCV 2025
Learning the hierarchical structure of data in vision-language models is a significant challenge. Previous works have attempted to address this challenge by employing entailment learning. However, these approaches fail to model the transitive nature of entailment explicitly, which establishes the relationship between order and semantics within a representation space. In this work, we introduce Radial Cross-Modal Embeddings (RCME), a framework that enables the explicit modeling of transitivity-enforced entailment. Our proposed framework optimizes for the partial order of concepts within vision-language models. By leveraging our framework, we develop a hierarchical vision-language foundation model capable of representing the hierarchy in the Tree of Life. Our experiments on hierarchical species classification and hierarchical retrieval tasks demonstrate the enhanced performance of our models compared to the existing state-of-the-art models. Our code and models are open-sourced at https://vishu26.github.io/RCME/index.html.
comment: Accepted at ICCV 2025
☆ Logios : An open source Greek Polytonic Optical Character Recognition system
In this paper, we present an Optical Character Recognition (OCR) system specifically designed for the accurate recognition and digitization of Greek polytonic texts. By leveraging the combined strengths of convolutional layers for feature extraction and recurrent layers for sequence learning, our system addresses the unique challenges posed by Greek polytonic scripts. This approach aims to overcome the limitations of traditional OCR methods, offering significant improvements in accuracy and efficiency. We release the underlying model as an open-source library and make our OCR platform available for academic use.
☆ Evaluation of Traffic Signals for Daily Traffic Pattern
The turning movement count data is crucial for traffic signal design, intersection geometry planning, traffic flow, and congestion analysis. This work proposes three methods called dynamic, static, and hybrid configuration for TMC-based traffic signals. A vision-based tracking system is developed to estimate the TMC of six intersections in Las Vegas using traffic cameras. The intersection design, route (e.g. vehicle movement directions), and signal configuration files with compatible formats are synthesized and imported into Simulation of Urban MObility for signal evaluation with realistic data. The initial experimental results based on estimated waiting times indicate that the cycle time of 90 and 120 seconds works best for all intersections. In addition, four intersections show better performance for dynamic signal timing configuration, and the other two with lower performance have a lower ratio of total vehicle count to total lanes of the intersection leg. Since daily traffic flow often exhibits a bimodal pattern, we propose a hybrid signal method that switches between dynamic and static methods, adapting to peak and off-peak traffic conditions for improved flow management. So, a built-in traffic generator module creates vehicle routes for 4 hours, including peak hours, and a signal design module produces signal schedule cycles according to static, dynamic, and hybrid methods. Vehicle count distributions are weighted differently for each zone (i.e., West, North, East, South) to generate diverse traffic patterns. The extended experimental results for 6 intersections with 4 hours of simulation time imply that zone-based traffic pattern distributions affect signal design selection. Although the static method works great for evenly zone-based traffic distribution, the hybrid method works well for highly weighted traffic at intersection pairs of the West-East and North-South zones.
☆ Spatial Mental Modeling from Limited Views
Can Vision Language Models (VLMs) imagine the full scene from just a few views, like humans do? Humans form spatial mental models, internal representations of unseen space, to reason about layout, perspective, and motion. Our new MindCube benchmark with 21,154 questions across 3,268 images exposes this critical gap, where existing VLMs exhibit near-random performance. Using MindCube, we systematically evaluate how well VLMs build robust spatial mental models through representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for "what-if" movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps. The significant improvement comes from a synergistic approach, "map-then-reason", that jointly trains the model to first generate a cognitive map and then reason upon it. By training models to reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of unobservable space.
comment: Preprint version
☆ Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency
Classifier-free guidance (CFG) succeeds in condition diffusion models that use a guidance scale to balance the influence of conditional and unconditional terms. A high guidance scale is used to enhance the performance of the conditional term. However, the high guidance scale often results in oversaturation and unrealistic artifacts. In this paper, we introduce a new perspective based on low-frequency signals, identifying the accumulation of redundant information in these signals as the key factor behind oversaturation and unrealistic artifacts. Building on this insight, we propose low-frequency improved classifier-free guidance (LF-CFG) to mitigate these issues. Specifically, we introduce an adaptive threshold-based measurement to pinpoint the locations of redundant information. We determine a reasonable threshold by analyzing the change rate of low-frequency information between prior and current steps. We then apply a down-weight strategy to reduce the impact of redundant information in the low-frequency signals. Experimental results demonstrate that LF-CFG effectively alleviates oversaturation and unrealistic artifacts across various diffusion models, including Stable Diffusion-XL, Stable Diffusion 2.1, 3.0, 3.5, and SiT-XL.
☆ A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario
Underground mining operations face significant safety challenges that make emergency response capabilities crucial. While robots have shown promise in assisting with search and rescue operations, their effectiveness depends on reliable miner detection capabilities. Deep learning algorithms offer potential solutions for automated miner detection, but require comprehensive training datasets, which are currently lacking for underground mining environments. This paper presents a novel thermal imaging dataset specifically designed to enable the development and validation of miner detection systems for potential emergency applications. We systematically captured thermal imagery of various mining activities and scenarios to create a robust foundation for detection algorithms. To establish baseline performance metrics, we evaluated several state-of-the-art object detection algorithms including YOLOv8, YOLOv10, YOLO11, and RT-DETR on our dataset. While not exhaustive of all possible emergency situations, this dataset serves as a crucial first step toward developing reliable thermal-based miner detection systems that could eventually be deployed in real emergency scenarios. This work demonstrates the feasibility of using thermal imaging for miner detection and establishes a foundation for future research in this critical safety application.
☆ ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
While end-to-end video-to-audio generation has greatly improved, producing high-fidelity audio that authentically captures the nuances of visual content remains challenging. Like professionals in the creative industries, such generation requires sophisticated reasoning about items such as visual dynamics, acoustic environments, and temporal relationships. We present \textbf{ThinkSound}, a novel framework that leverages Chain-of-Thought (CoT) reasoning to enable stepwise, interactive audio generation and editing for videos. Our approach decomposes the process into three complementary stages: foundational foley generation that creates semantically coherent soundscapes, interactive object-centric refinement through precise user interactions, and targeted editing guided by natural language instructions. At each stage, a multimodal large language model generates contextually aligned CoT reasoning that guides a unified audio foundation model. Furthermore, we introduce \textbf{AudioCoT}, a comprehensive dataset with structured reasoning annotations that establishes connections between visual content, textual descriptions, and sound synthesis. Experiments demonstrate that ThinkSound achieves state-of-the-art performance in video-to-audio generation across both audio metrics and CoT metrics and excels in out-of-distribution Movie Gen Audio benchmark. The demo page is available at https://ThinkSound-Demo.github.io.
☆ Controllable 3D Placement of Objects with Scene-Aware Diffusion Models
Image editing approaches have become more powerful and flexible with the advent of powerful text-conditioned generative models. However, placing objects in an environment with a precise location and orientation still remains a challenge, as this typically requires carefully crafted inpainting masks or prompts. In this work, we show that a carefully designed visual map, combined with coarse object masks, is sufficient for high quality object placement. We design a conditioning signal that resolves ambiguities, while being flexible enough to allow for changing of shapes or object orientations. By building on an inpainting model, we leave the background intact by design, in contrast to methods that model objects and background jointly. We demonstrate the effectiveness of our method in the automotive setting, where we compare different conditioning signals in novel object placement tasks. These tasks are designed to measure edit quality not only in terms of appearance, but also in terms of pose and location accuracy, including cases that require non-trivial shape changes. Lastly, we show that fine location control can be combined with appearance control to place existing objects in precise locations in a scene.
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation
Atypical mitoses mark a deviation in the cell division process that can be an independent prognostically relevant marker for tumor malignancy. However, their identification remains challenging due to low prevalence, at times subtle morphological differences from normal mitoses, low inter-rater agreement among pathologists, and class imbalance in datasets. Building on the Atypical Mitosis dataset for Breast Cancer (AMi-Br), this study presents a comprehensive benchmark comparing deep learning approaches for automated atypical mitotic figure (AMF) classification, including baseline models, foundation models with linear probing, and foundation models fine-tuned with low-rank adaptation (LoRA). For rigorous evaluation, we further introduce two new hold-out AMF datasets - AtNorM-Br, a dataset of mitoses from the The TCGA breast cancer cohort, and AtNorM-MD, a multi-domain dataset of mitoses from the MIDOG++ training set. We found average balanced accuracy values of up to 0.8135, 0.7696, and 0.7705 on the in-domain AMi-Br and the out-of-domain AtNorm-Br and AtNorM-MD datasets, respectively, with the results being particularly good for LoRA-based adaptation of the Virchow-line of foundation models. Our work shows that atypical mitosis classification, while being a challenging problem, can be effectively addressed through the use of recent advances in transfer learning and model fine-tuning techniques. We make available all code and data used in this paper in this github repository: https://github.com/DeepMicroscopy/AMi-Br_Benchmark.
☆ HyperSORT: Self-Organising Robust Training with hyper-networks
Medical imaging datasets often contain heterogeneous biases ranging from erroneous labels to inconsistent labeling styles. Such biases can negatively impact deep segmentation networks performance. Yet, the identification and characterization of such biases is a particularly tedious and challenging task. In this paper, we introduce HyperSORT, a framework using a hyper-network predicting UNets' parameters from latent vectors representing both the image and annotation variability. The hyper-network parameters and the latent vector collection corresponding to each data sample from the training set are jointly learned. Hence, instead of optimizing a single neural network to fit a dataset, HyperSORT learns a complex distribution of UNet parameters where low density areas can capture noise-specific patterns while larger modes robustly segment organs in differentiated but meaningful manners. We validate our method on two 3D abdominal CT public datasets: first a synthetically perturbed version of the AMOS dataset, and TotalSegmentator, a large scale dataset containing real unknown biases and errors. Our experiments show that HyperSORT creates a structured mapping of the dataset allowing the identification of relevant systematic biases and erroneous samples. Latent space clusters yield UNet parameters performing the segmentation task in accordance with the underlying learned systematic bias. The code and our analysis of the TotalSegmentator dataset are made available: https://github.com/ImFusionGmbH/HyperSORT
comment: Accepted at MICCAI 2025
☆ EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting
Efficient three-dimensional reconstruction and real-time visualization are critical in surgical scenarios such as endoscopy. In recent years, 3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in efficient 3D reconstruction and rendering. Most 3DGS-based Simultaneous Localization and Mapping (SLAM) methods only rely on the appearance constraints for optimizing both 3DGS and camera poses. However, in endoscopic scenarios, the challenges include photometric inconsistencies caused by non-Lambertian surfaces and dynamic motion from breathing affects the performance of SLAM systems. To address these issues, we additionally introduce optical flow loss as a geometric constraint, which effectively constrains both the 3D structure of the scene and the camera motion. Furthermore, we propose a depth regularisation strategy to mitigate the problem of photometric inconsistencies and ensure the validity of 3DGS depth rendering in endoscopic scenes. In addition, to improve scene representation in the SLAM system, we improve the 3DGS refinement strategy by focusing on viewpoints corresponding to Keyframes with suboptimal rendering quality frames, achieving better rendering results. Extensive experiments on the C3VD static dataset and the StereoMIS dynamic dataset demonstrate that our method outperforms existing state-of-the-art methods in novel view synthesis and pose estimation, exhibiting high performance in both static and dynamic surgical scenes. The source code will be publicly available upon paper acceptance.
☆ XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Achieving fine-grained control over subject identity and semantic attributes (pose, style, lighting) in text-to-image generation, particularly for multiple subjects, often undermines the editability and coherence of Diffusion Transformers (DiTs). Many approaches introduce artifacts or suffer from attribute entanglement. To overcome these challenges, we propose a novel multi-subject controlled generation model XVerse. By transforming reference images into offsets for token-specific text-stream modulation, XVerse allows for precise and independent control for specific subject without disrupting image latents or features. Consequently, XVerse offers high-fidelity, editable multi-subject image synthesis with robust control over individual subject characteristics and semantic attributes. This advancement significantly improves personalized and complex scene generation capabilities.
comment: Project Page: https://bytedance.github.io/XVerse Github Link: https://github.com/bytedance/XVerse
☆ Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction ICCV 2025
This paper presents an end-to-end framework for reconstructing 3D parametric curves directly from multi-view edge maps. Contrasting with existing two-stage methods that follow a sequential ``edge point cloud reconstruction and parametric curve fitting'' pipeline, our one-stage approach optimizes 3D parametric curves directly from 2D edge maps, eliminating error accumulation caused by the inherent optimization gap between disconnected stages. However, parametric curves inherently lack suitability for rendering-based multi-view optimization, necessitating a complementary representation that preserves their geometric properties while enabling differentiable rendering. We propose a novel bi-directional coupling mechanism between parametric curves and edge-oriented Gaussian components. This tight correspondence formulates a curve-aware Gaussian representation, \textbf{CurveGaussian}, that enables differentiable rendering of 3D curves, allowing direct optimization guided by multi-view evidence. Furthermore, we introduce a dynamically adaptive topology optimization framework during training to refine curve structures through linearization, merging, splitting, and pruning operations. Comprehensive evaluations on the ABC dataset and real-world benchmarks demonstrate our one-stage method's superiority over two-stage alternatives, particularly in producing cleaner and more robust reconstructions. Additionally, by directly optimizing parametric curves, our method significantly reduces the parameter count during training, achieving both higher efficiency and superior performance compared to existing approaches.
comment: Code: https://github.com/zhirui-gao/Curve-Gaussian Accepted by ICCV 2025
☆ FastRef:Fast Prototype Refinement for Few-Shot Industrial Anomaly Detection
Few-shot industrial anomaly detection (FS-IAD) presents a critical challenge for practical automated inspection systems operating in data-scarce environments. While existing approaches predominantly focus on deriving prototypes from limited normal samples, they typically neglect to systematically incorporate query image statistics to enhance prototype representativeness. To address this issue, we propose FastRef, a novel and efficient prototype refinement framework for FS-IAD. Our method operates through an iterative two-stage process: (1) characteristic transfer from query features to prototypes via an optimizable transformation matrix, and (2) anomaly suppression through prototype alignment. The characteristic transfer is achieved through linear reconstruction of query features from prototypes, while the anomaly suppression addresses a key observation in FS-IAD that unlike conventional IAD with abundant normal prototypes, the limited-sample setting makes anomaly reconstruction more probable. Therefore, we employ optimal transport (OT) for non-Gaussian sampled features to measure and minimize the gap between prototypes and their refined counterparts for anomaly suppression. For comprehensive evaluation, we integrate FastRef with three competitive prototype-based FS-IAD methods: PatchCore, FastRecon, WinCLIP, and AnomalyDINO. Extensive experiments across four benchmark datasets of MVTec, ViSA, MPDD and RealIAD demonstrate both the effectiveness and computational efficiency of our approach under 1/2/4-shots.
comment: 18pages, 7figures, 6tables
☆ GenFlow: Interactive Modular System for Image Generation
Generative art unlocks boundless creative possibilities, yet its full potential remains untapped due to the technical expertise required for advanced architectural concepts and computational workflows. To bridge this gap, we present GenFlow, a novel modular framework that empowers users of all skill levels to generate images with precision and ease. Featuring a node-based editor for seamless customization and an intelligent assistant powered by natural language processing, GenFlow transforms the complexity of workflow creation into an intuitive and accessible experience. By automating deployment processes and minimizing technical barriers, our framework makes cutting-edge generative art tools available to everyone. A user study demonstrated GenFlow's ability to optimize workflows, reduce task completion times, and enhance user understanding through its intuitive interface and adaptive features. These results position GenFlow as a groundbreaking solution that redefines accessibility and efficiency in the realm of generative art.
☆ CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection ICCV 2025
Detection-free methods typically follow a coarse-to-fine pipeline, extracting image and point cloud features for patch-level matching and refining dense pixel-to-point correspondences. However, differences in feature channel attention between images and point clouds may lead to degraded matching results, ultimately impairing registration accuracy. Furthermore, similar structures in the scene could lead to redundant correspondences in cross-modal matching. To address these issues, we propose Channel Adaptive Adjustment Module (CAA) and Global Optimal Selection Module (GOS). CAA enhances intra-modal features and suppresses cross-modal sensitivity, while GOS replaces local selection with global optimization. Experiments on RGB-D Scenes V2 and 7-Scenes demonstrate the superiority of our method, achieving state-of-the-art performance in image-to-point cloud registration.
comment: ICCV 2025 accepted
☆ ToosiCubix: Monocular 3D Cuboid Labeling via Vehicle Part Annotations
Many existing methods for 3D cuboid annotation of vehicles rely on expensive and carefully calibrated camera-LiDAR or stereo setups, limiting their accessibility for large-scale data collection. We introduce ToosiCubix, a simple yet powerful approach for annotating ground-truth cuboids using only monocular images and intrinsic camera parameters. Our method requires only about 10 user clicks per vehicle, making it highly practical for adding 3D annotations to existing datasets originally collected without specialized equipment. By annotating specific features (e.g., wheels, car badge, symmetries) across different vehicle parts, we accurately estimate each vehicle's position, orientation, and dimensions up to a scale ambiguity (8 DoF). The geometric constraints are formulated as an optimization problem, which we solve using a coordinate descent strategy, alternating between Perspective-n-Points (PnP) and least-squares subproblems. To handle common ambiguities such as scale and unobserved dimensions, we incorporate probabilistic size priors, enabling 9 DoF cuboid placements. We validate our annotations against the KITTI and Cityscapes3D datasets, demonstrating that our method offers a cost-effective and scalable solution for high-quality 3D cuboid annotation.
☆ CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations
2D scene graphs provide a structural and explainable framework for scene understanding. However, current work still struggles with the lack of accurate scene graph data. To overcome this data bottleneck, we present CoPa-SG, a synthetic scene graph dataset with highly precise ground truth and exhaustive relation annotations between all objects. Moreover, we introduce parametric and proto-relations, two new fundamental concepts for scene graphs. The former provides a much more fine-grained representation than its traditional counterpart by enriching relations with additional parameters such as angles or distances. The latter encodes hypothetical relations in a scene graph and describes how relations would form if new objects are placed in the scene. Using CoPa-SG, we compare the performance of various scene graph generation models. We demonstrate how our new relation types can be integrated in downstream applications to enhance planning and reasoning capabilities.
☆ ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Cinematography, the fundamental visual language of film, is essential for conveying narrative, emotion, and aesthetic quality. While recent Vision-Language Models (VLMs) demonstrate strong general visual understanding, their proficiency in comprehending the nuanced cinematic grammar embedded within individual shots remains largely unexplored and lacks robust evaluation. This critical gap limits both fine-grained visual comprehension and the precision of AI-assisted video generation. To address this, we introduce \textbf{ShotBench}, a comprehensive benchmark specifically designed for cinematic language understanding. It features over 3.5k expert-annotated QA pairs from images and video clips, meticulously curated from over 200 acclaimed (predominantly Oscar-nominated) films and spanning eight key cinematography dimensions. Our evaluation of 24 leading VLMs on ShotBench reveals their substantial limitations: even the top-performing model achieves less than 60\% average accuracy, particularly struggling with fine-grained visual cues and complex spatial reasoning. To catalyze advancement in this domain, we construct \textbf{ShotQA}, a large-scale multimodal dataset comprising approximately 70k cinematic QA pairs. Leveraging ShotQA, we develop \textbf{ShotVL} through supervised fine-tuning and Group Relative Policy Optimization. ShotVL significantly outperforms all existing open-source and proprietary models on ShotBench, establishing new \textbf{state-of-the-art} performance. We open-source our models, data, and code to foster rapid progress in this crucial area of AI-driven cinematic understanding and generation.
☆ Generalizable Neural Electromagnetic Inverse Scattering
Solving Electromagnetic Inverse Scattering Problems (EISP) is fundamental in applications such as medical imaging, where the goal is to reconstruct the relative permittivity from scattered electromagnetic field. This inverse process is inherently ill-posed and highly nonlinear, making it particularly challenging. A recent machine learning-based approach, Img-Interiors, shows promising results by leveraging continuous implicit functions. However, it requires case-specific optimization, lacks generalization to unseen data, and fails under sparse transmitter setups (e.g., with only one transmitter). To address these limitations, we revisit EISP from a physics-informed perspective, reformulating it as a two stage inverse transmission-scattering process. This formulation reveals the induced current as a generalizable intermediate representation, effectively decoupling the nonlinear scattering process from the ill-posed inverse problem. Built on this insight, we propose the first generalizable physics-driven framework for EISP, comprising a current estimator and a permittivity solver, working in an end-to-end manner. The current estimator explicitly learns the induced current as a physical bridge between the incident and scattered field, while the permittivity solver computes the relative permittivity directly from the estimated induced current. This design enables data-driven training and generalizable feed-forward prediction of relative permittivity on unseen data while maintaining strong robustness to transmitter sparsity. Extensive experiments show that our method outperforms state-of-the-art approaches in reconstruction accuracy, generalization, and robustness. This work offers a fundamentally new perspective on electromagnetic inverse scattering and represents a major step toward cost-effective practical solutions for electromagnetic imaging.
☆ PanSt3R: Multi-view Consistent Panoptic Segmentation ICCV 2025
Panoptic segmentation of 3D scenes, involving the segmentation and classification of object instances in a dense 3D reconstruction of a scene, is a challenging problem, especially when relying solely on unposed 2D images. Existing approaches typically leverage off-the-shelf models to extract per-frame 2D panoptic segmentations, before optimizing an implicit geometric representation (often based on NeRF) to integrate and fuse the 2D predictions. We argue that relying on 2D panoptic segmentation for a problem inherently 3D and multi-view is likely suboptimal as it fails to leverage the full potential of spatial relationships across views. In addition to requiring camera parameters, these approaches also necessitate computationally expensive test-time optimization for each scene. Instead, in this work, we propose a unified and integrated approach PanSt3R, which eliminates the need for test-time optimization by jointly predicting 3D geometry and multi-view panoptic segmentation in a single forward pass. Our approach builds upon recent advances in 3D reconstruction, specifically upon MUSt3R, a scalable multi-view version of DUSt3R, and enhances it with semantic awareness and multi-view panoptic segmentation capabilities. We additionally revisit the standard post-processing mask merging procedure and introduce a more principled approach for multi-view segmentation. We also introduce a simple method for generating novel-view predictions based on the predictions of PanSt3R and vanilla 3DGS. Overall, the proposed PanSt3R is conceptually simple, yet fast and scalable, and achieves state-of-the-art performance on several benchmarks, while being orders of magnitude faster than existing methods.
comment: Accepted at ICCV 2025
☆ Automatic Reviewers Assignment to a Research Paper Based on Allied References and Publications Weight
Everyday, a vast stream of research documents is submitted to conferences, anthologies, journals, newsletters, annual reports, daily papers, and various periodicals. Many such publications use independent external specialists to review submissions. This process is called peer review, and the reviewers are called referees. However, it is not always possible to pick the best referee for reviewing. Moreover, new research fields are emerging in every sector, and the number of research papers is increasing dramatically. To review all these papers, every journal assigns a small team of referees who may not be experts in all areas. For example, a research paper in communication technology should be reviewed by an expert from the same field. Thus, efficiently selecting the best reviewer or referee for a research paper is a big challenge. In this research, we propose and implement program that uses a new strategy to automatically select the best reviewers for a research paper. Every research paper contains references at the end, usually from the same area. First, we collect the references and count authors who have at least one paper in the references. Then, we automatically browse the web to extract research topic keywords. Next, we search for top researchers in the specific topic and count their h-index, i10-index, and citations for the first n authors. Afterward, we rank the top n authors based on a score and automatically browse their homepages to retrieve email addresses. We also check their co-authors and colleagues online and discard them from the list. The remaining top n authors, generally professors, are likely the best referees for reviewing the research paper.
comment: IEEE Conference Proceedings (5 Pages)
☆ Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models
Surgical workflow analysis is essential in robot-assisted surgeries, yet the long duration of such procedures poses significant challenges for comprehensive video analysis. Recent approaches have predominantly relied on transformer models; however, their quadratic attention mechanism restricts efficient processing of lengthy surgical videos. In this paper, we propose a novel hierarchical input-dependent state space model that leverages the linear scaling property of state space models to enable decision making on full-length videos while capturing both local and global dynamics. Our framework incorporates a temporally consistent visual feature extractor, which appends a state space model head to a visual feature extractor to propagate temporal information. The proposed model consists of two key modules: a local-aggregation state space model block that effectively captures intricate local dynamics, and a global-relation state space model block that models temporal dependencies across the entire video. The model is trained using a hybrid discrete-continuous supervision strategy, where both signals of discrete phase labels and continuous phase progresses are propagated through the network. Experiments have shown that our method outperforms the current state-of-the-art methods by a large margin (+2.8% on Cholec80, +4.3% on MICCAI2016, and +12.9% on Heichole datasets). Code will be publicly available after paper acceptance.
☆ Multimodal LLMs for Visualization Reconstruction and Understanding
Visualizations are crucial for data communication, yet understanding them requires comprehension of both visual elements and their underlying data relationships. Current multimodal large models, while effective in natural image understanding, struggle with visualization due to their inability to decode the data-to-visual mapping rules and extract structured information. To address these challenges, we present a novel dataset and train multimodal visualization LLMs specifically designed for understanding. Our approach combines chart images with their corresponding vectorized representations, encoding schemes, and data features. The proposed vector format enables compact and accurate reconstruction of visualization content. Experimental results demonstrate significant improvements in both data extraction accuracy and chart reconstruction quality.
☆ LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning
Current vision-language models (VLMs) are well-adapted for general visual understanding tasks. However, they perform inadequately when handling complex visual tasks related to human poses and actions due to the lack of specialized vision-language instruction-following data. We introduce a method for generating such data by integrating human keypoints with traditional visual features such as captions and bounding boxes, enabling more precise understanding of human-centric scenes. Our approach constructs a dataset comprising 200,328 samples tailored to fine-tune models for human-centric tasks, focusing on three areas: conversation, detailed description, and complex reasoning. We establish an Extended Human Pose and Action Understanding Benchmark (E-HPAUB) to assess model performance on human pose and action understanding. We fine-tune the LLaVA-1.5-7B model using this dataset and evaluate our resulting LLaVA-Pose model on the benchmark, achieving significant improvements. Experimental results show an overall improvement of 33.2% compared to the original LLaVA-1.5-7B model. These findings highlight the effectiveness of keypoint-integrated data in enhancing multimodal models for human-centric visual understanding. Code is available at https://github.com/Ody-trek/LLaVA-Pose.
comment: arXiv admin note: substantial text overlap with arXiv:2409.09306
☆ DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images
Visual grounding in text-rich document images is a critical yet underexplored challenge for document intelligence and visual question answering (VQA) systems. We present \drishtikon, a multi-granular visual grounding framework designed to enhance interpretability and trust in VQA for complex, multilingual documents. Our approach integrates robust multi-lingual OCR, large language models, and a novel region matching algorithm to accurately localize answer spans at block, line, word, and point levels. We curate a new benchmark from the CircularsVQA test set, providing fine-grained, human-verified annotations across multiple granularities. Extensive experiments demonstrate that our method achieves state-of-the-art grounding accuracy, with line-level granularity offering the best trade-off between precision and recall. Ablation studies further highlight the benefits of multi-block and multi-line reasoning. Comparative evaluations with leading vision-language models reveal the limitations of current VLMs in precise localization, underscoring the effectiveness of our structured, alignment-based approach. Our findings pave the way for more robust and interpretable document understanding systems in real-world, text-centric scenarios. Code and dataset has been made available at https://github.com/kasuba-badri-vishal/DhrishtiKon.
comment: Work in progress
☆ Continual Self-Supervised Learning with Masked Autoencoders in Remote Sensing
The development of continual learning (CL) methods, which aim to learn new tasks in a sequential manner from the training data acquired continuously, has gained great attention in remote sensing (RS). The existing CL methods in RS, while learning new tasks, enhance robustness towards catastrophic forgetting. This is achieved by using a large number of labeled training samples, which is costly and not always feasible to gather in RS. To address this problem, we propose a novel continual self-supervised learning method in the context of masked autoencoders (denoted as CoSMAE). The proposed CoSMAE consists of two components: i) data mixup; and ii) model mixup knowledge distillation. Data mixup is associated with retaining information on previous data distributions by interpolating images from the current task with those from the previous tasks. Model mixup knowledge distillation is associated with distilling knowledge from past models and the current model simultaneously by interpolating their model weights to form a teacher for the knowledge distillation. The two components complement each other to regularize the MAE at the data and model levels to facilitate better generalization across tasks and reduce the risk of catastrophic forgetting. Experimental results show that CoSMAE achieves significant improvements of up to 4.94% over state-of-the-art CL methods applied to MAE. Our code is publicly available at: https://git.tu-berlin.de/rsim/CoSMAE.
comment: Accepted to IEEE Geoscience and Remote Sensing Letters. Our code is available at https://git.tu-berlin.de/rsim/CoSMAE
☆ HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation
Surgical Video Synthesis has emerged as a promising research direction following the success of diffusion models in general-domain video generation. Although existing approaches achieve high-quality video generation, most are unconditional and fail to maintain consistency with surgical actions and phases, lacking the surgical understanding and fine-grained guidance necessary for factual simulation. We address these challenges by proposing HieraSurg, a hierarchy-aware surgical video generation framework consisting of two specialized diffusion models. Given a surgical phase and an initial frame, HieraSurg first predicts future coarse-grained semantic changes through a segmentation prediction model. The final video is then generated by a second-stage model that augments these temporal segmentation maps with fine-grained visual features, leading to effective texture rendering and integration of semantic information in the video space. Our approach leverages surgical information at multiple levels of abstraction, including surgical phase, action triplets, and panoptic segmentation maps. The experimental results on Cholecystectomy Surgical Video Generation demonstrate that the model significantly outperforms prior work both quantitatively and qualitatively, showing strong generalization capabilities and the ability to generate higher frame-rate videos. The model exhibits particularly fine-grained adherence when provided with existing segmentation maps, suggesting its potential for practical surgical applications.
comment: Accepted at MICCAI 2025
☆ HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
With the rapid evolution of multimodal large language models, the capacity to deeply understand and interpret human intentions has emerged as a critical capability, which demands detailed and thoughtful reasoning. In recent studies, Reinforcement Learning (RL) has demonstrated potential in enhancing the reasoning capabilities of Large Language Models (LLMs). Nonetheless, the challenges associated with adapting RL to multimodal data and formats remain largely unaddressed. In this paper, we identify two issues in existing multimodal reasoning models: insufficient global context understanding and shortcut problems. Insufficient context understanding can happen when a model misinterprets multimodal context, resulting in incorrect answers. The shortcut problem occurs when the model overlooks crucial clues in multimodal inputs, directly addressing the query without considering the multimodal information. To tackle these issues, we emphasize the necessity for the model to reason with a clear understanding of the global context within multimodal inputs. This global context understanding can effectively prevent the model from overlooking key multimodal cues and ensure a thorough reasoning process. To ensure the accurate interpretation of multimodal context information, we implement a context reward judged by a large language model, alongside format and accuracy rewards. Additionally, to improve complex reasoning capability, we employ the LLM to assess the logical reward, determining whether the reasoning process successfully integrates multimodal information with logical methods. We also introduce a reasoning omni-modal benchmark, IntentBench, aimed at evaluating models in understanding complex human intentions and emotions. Our proposed method demonstrates advanced performance across multiple omni-modal benchmarks compared to other open-source omni-modal models.
WordCon: Word-level Typography Control in Scene Text Rendering
Achieving precise word-level typography control within generated images remains a persistent challenge. To address it, we newly construct a word-level controlled scene text dataset and introduce the Text-Image Alignment (TIA) framework. This framework leverages cross-modal correspondence between text and local image regions provided by grounding models to enhance the Text-to-Image (T2I) model training. Furthermore, we propose WordCon, a hybrid parameter-efficient fine-tuning (PEFT) method. WordCon reparameterizes selective key parameters, improving both efficiency and portability. This allows seamless integration into diverse pipelines, including artistic text rendering, text editing, and image-conditioned text rendering. To further enhance controllability, the masked loss at the latent level is applied to guide the model to concentrate on learning the text region in the image, and the joint-attention loss provides feature-level supervision to promote disentanglement between different words. Both qualitative and quantitative results demonstrate the superiority of our method to the state of the art. The datasets and source code will be available for academic use.
☆ FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
We propose FairyGen, an automatic system for generating story-driven cartoon videos from a single child's drawing, while faithfully preserving its unique artistic style. Unlike previous storytelling methods that primarily focus on character consistency and basic motion, FairyGen explicitly disentangles character modeling from stylized background generation and incorporates cinematic shot design to support expressive and coherent storytelling. Given a single character sketch, we first employ an MLLM to generate a structured storyboard with shot-level descriptions that specify environment settings, character actions, and camera perspectives. To ensure visual consistency, we introduce a style propagation adapter that captures the character's visual style and applies it to the background, faithfully retaining the character's full visual identity while synthesizing style-consistent scenes. A shot design module further enhances visual diversity and cinematic quality through frame cropping and multi-view synthesis based on the storyboard. To animate the story, we reconstruct a 3D proxy of the character to derive physically plausible motion sequences, which are then used to fine-tune an MMDiT-based image-to-video diffusion model. We further propose a two-stage motion customization adapter: the first stage learns appearance features from temporally unordered frames, disentangling identity from motion; the second stage models temporal dynamics using a timestep-shift strategy with frozen identity weights. Once trained, FairyGen directly renders diverse and coherent video scenes aligned with the storyboard. Extensive experiments demonstrate that our system produces animations that are stylistically faithful, narratively structured natural motion, highlighting its potential for personalized and engaging story animation. The code will be available at https://github.com/GVCLab/FairyGen
comment: Project Page: https://jayleejia.github.io/FairyGen/ ; Code: https://github.com/GVCLab/FairyGen
☆ Video Virtual Try-on with Conditional Diffusion Transformer Inpainter
Video virtual try-on aims to naturally fit a garment to a target person in consecutive video frames. It is a challenging task, on the one hand, the output video should be in good spatial-temporal consistency, on the other hand, the details of the given garment need to be preserved well in all the frames. Naively using image-based try-on methods frame by frame can get poor results due to severe inconsistency. Recent diffusion-based video try-on methods, though very few, happen to coincide with a similar solution: inserting temporal attention into image-based try-on model to adapt it for video try-on task, which have shown improvements but there still exist inconsistency problems. In this paper, we propose ViTI (Video Try-on Inpainter), formulate and implement video virtual try-on as a conditional video inpainting task, which is different from previous methods. In this way, we start with a video generation problem instead of an image-based try-on problem, which from the beginning has a better spatial-temporal consistency. Specifically, at first we build a video inpainting framework based on Diffusion Transformer with full 3D spatial-temporal attention, and then we progressively adapt it for video garment inpainting, with a collection of masking strategies and multi-stage training. After these steps, the model can inpaint the masked garment area with appropriate garment pixels according to the prompt with good spatial-temporal consistency. Finally, as other try-on methods, garment condition is added to the model to make sure the inpainted garment appearance and details are as expected. Both quantitative and qualitative experimental results show that ViTI is superior to previous works.
comment: 10 pages, 6 figures
☆ DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic ICCV 2025
Real-world object detection systems, such as those in autonomous driving and surveillance, must continuously learn new object categories and simultaneously adapt to changing environmental conditions. Existing approaches, Class Incremental Object Detection (CIOD) and Domain Incremental Object Detection (DIOD) only address one aspect of this challenge. CIOD struggles in unseen domains, while DIOD suffers from catastrophic forgetting when learning new classes, limiting their real-world applicability. To overcome these limitations, we introduce Dual Incremental Object Detection (DuIOD), a more practical setting that simultaneously handles class and domain shifts in an exemplar-free manner. We propose DuET, a Task Arithmetic-based model merging framework that enables stable incremental learning while mitigating sign conflicts through a novel Directional Consistency Loss. Unlike prior methods, DuET is detector-agnostic, allowing models like YOLO11 and RT-DETR to function as real-time incremental object detectors. To comprehensively evaluate both retention and adaptation, we introduce the Retention-Adaptability Index (RAI), which combines the Average Retention Index (Avg RI) for catastrophic forgetting and the Average Generalization Index for domain adaptability into a common ground. Extensive experiments on the Pascal Series and Diverse Weather Series demonstrate DuET's effectiveness, achieving a +13.12% RAI improvement while preserving 89.3% Avg RI on the Pascal Series (4 tasks), as well as a +11.39% RAI improvement with 88.57% Avg RI on the Diverse Weather Series (3 tasks), outperforming existing methods.
comment: Accepted at ICCV 2025
☆ Temporal Rate Reduction Clustering for Human Motion Segmentation ICCV 2025
Human Motion Segmentation (HMS), which aims to partition videos into non-overlapping human motions, has attracted increasing research attention recently. Existing approaches for HMS are mainly dominated by subspace clustering methods, which are grounded on the assumption that high-dimensional temporal data align with a Union-of-Subspaces (UoS) distribution. However, the frames in video capturing complex human motions with cluttered backgrounds may not align well with the UoS distribution. In this paper, we propose a novel approach for HMS, named Temporal Rate Reduction Clustering ($\text{TR}^2\text{C}$), which jointly learns structured representations and affinity to segment the frame sequences in video. Specifically, the structured representations learned by $\text{TR}^2\text{C}$ maintain temporally consistent and align well with a UoS structure, which is favorable for the HMS task. We conduct extensive experiments on five benchmark HMS datasets and achieve state-of-the-art performances with different feature extractors.
comment: The paper is accepted by ICCV 2025. The first two authors are equally contributed
☆ GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models
This work introduces a novel framework for brain tumor segmentation leveraging pre-trained GANs and Unet architectures. By combining a global anomaly detection module with a refined mask generation network, the proposed model accurately identifies tumor-sensitive regions and iteratively enhances segmentation precision using adversarial loss constraints. Multi-modal MRI data and synthetic image augmentation are employed to improve robustness and address the challenge of limited annotated datasets. Experimental results on the BraTS dataset demonstrate the effectiveness of the approach, achieving high sensitivity and accuracy in both lesion-wise Dice and HD95 metrics than the baseline. This scalable method minimizes the dependency on fully annotated data, paving the way for practical real-world applications in clinical settings.
☆ DiMPLe -- Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
We introduce DiMPLe (Disentangled Multi-Modal Prompt Learning), a novel approach to disentangle invariant and spurious features across vision and language modalities in multi-modal learning. Spurious correlations in visual data often hinder out-of-distribution (OOD) performance. Unlike prior methods focusing solely on image features, DiMPLe disentangles features within and across modalities while maintaining consistent alignment, enabling better generalization to novel classes and robustness to distribution shifts. Our method combines three key objectives: (1) mutual information minimization between invariant and spurious features, (2) spurious feature regularization, and (3) contrastive learning on invariant features. Extensive experiments demonstrate DiMPLe demonstrates superior performance compared to CoOp-OOD, when averaged across 11 diverse datasets, and achieves absolute gains of 15.27 in base class accuracy and 44.31 in novel class accuracy.
☆ Real-Time ESFP: Estimating, Smoothing, Filtering, and Pose-Mapping
This paper presents ESFP, an end-to-end pipeline that converts monocular RGB video into executable joint trajectories for a low-cost 4-DoF desktop arm. ESFP comprises four sequential modules. (1) Estimating: ROMP lifts each frame to a 24-joint 3-D skeleton. (2) Smoothing: the proposed HPSTM-a sequence-to-sequence Transformer with self-attention-combines long-range temporal context with a differentiable forward-kinematics decoder, enforcing constant bone lengths and anatomical plausibility while jointly predicting joint means and full covariances. (3) Filtering: root-normalized trajectories are variance-weighted according to HPSTM's uncertainty estimates, suppressing residual noise. (4) Pose-Mapping: a geometric retargeting layer transforms shoulder-elbow-wrist triples into the uArm's polar workspace, preserving wrist orientation.
☆ ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation ICCV 2025
Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning. Existing solutions often explore attention mechanisms of pre-trained models, such as CLIP, or generate synthetic data and design complex retrieval processes to perform OVS. However, their performance is limited by the capability of reliant models or the suboptimal quality of reference sets. In this work, we investigate the largely overlooked data quality problem for this challenging dense scene understanding task, and identify that a high-quality reference set can significantly benefit training-free OVS. With this observation, we introduce a data-quality-oriented framework, comprising a data pipeline to construct a reference set with well-paired segment-text embeddings and a simple similarity-based retrieval to unveil the essential effect of data. Remarkably, extensive evaluations on ten benchmark datasets demonstrate that our method outperforms all existing training-free OVS approaches, highlighting the importance of data-centric design for advancing OVS without training. Our code is available at https://github.com/xiweix/ReME .
comment: Accepted to ICCV 2025
☆ BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models
State-of-the-art text-to-image models like Infinity generate photorealistic images at an unprecedented speed. These models operate in a bitwise autoregressive manner over a discrete set of tokens that is practically infinite in size. However, their impressive generative power comes with a growing risk: as their outputs increasingly populate the Internet, they are likely to be scraped and reused as training data-potentially by the very same models. This phenomenon has been shown to lead to model collapse, where repeated training on generated content, especially from the models' own previous versions, causes a gradual degradation in performance. A promising mitigation strategy is watermarking, which embeds human-imperceptible yet detectable signals into generated images-enabling the identification of generated content. In this work, we introduce BitMark, a robust bitwise watermarking framework for Infinity. Our method embeds a watermark directly at the bit level of the token stream across multiple scales (also referred to as resolutions) during Infinity's image generation process. Our bitwise watermark subtly influences the bits to preserve visual fidelity and generation speed while remaining robust against a spectrum of removal techniques. Furthermore, it exhibits high radioactivity, i.e., when watermarked generated images are used to train another image generative model, this second model's outputs will also carry the watermark. The radioactive traces remain detectable even when only fine-tuning diffusion or image autoregressive models on images watermarked with our BitMark. Overall, our approach provides a principled step toward preventing model collapse in image generative models by enabling reliable detection of generated outputs.
☆ MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification
Current medical image analysis systems are typically task-specific, requiring separate models for classification and segmentation, and lack the flexibility to support user-defined workflows. To address these challenges, we introduce MedPrompt, a unified framework that combines a few-shot prompted Large Language Model (Llama-4-17B) for high-level task planning with a modular Convolutional Neural Network (DeepFusionLab) for low-level image processing. The LLM interprets user instructions and generates structured output to dynamically route task-specific pretrained weights. This weight routing approach avoids retraining the entire framework when adding new tasks-only task-specific weights are required, enhancing scalability and deployment. We evaluated MedPrompt across 19 public datasets, covering 12 tasks spanning 5 imaging modalities. The system achieves a 97% end-to-end correctness in interpreting and executing prompt-driven instructions, with an average inference latency of 2.5 seconds, making it suitable for near real-time applications. DeepFusionLab achieves competitive segmentation accuracy (e.g., Dice 0.9856 on lungs) and strong classification performance (F1 0.9744 on tuberculosis). Overall, MedPrompt enables scalable, prompt-driven medical imaging by combining the interpretability of LLMs with the efficiency of modular CNNs.
comment: 40 pages, 8 Tables, 9 Figures
☆ Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation ICCV 2025
Panoramic image processing is essential for omni-context perception, yet faces constraints like distortions, perspective occlusions, and limited annotations. Previous unsupervised domain adaptation methods transfer knowledge from labeled pinhole data to unlabeled panoramic images, but they require access to source pinhole data. To address these, we introduce a more practical task, i.e., Source-Free Occlusion-Aware Seamless Segmentation (SFOASS), and propose its first solution, called UNconstrained Learning Omni-Context Knowledge (UNLOCK). Specifically, UNLOCK includes two key modules: Omni Pseudo-Labeling Learning and Amodal-Driven Context Learning. While adapting without relying on source data or target labels, this framework enhances models to achieve segmentation with 360{\deg} viewpoint coverage and occlusion-aware reasoning. Furthermore, we benchmark the proposed SFOASS task through both real-to-real and synthetic-to-real adaptation settings. Experimental results show that our source-free method achieves performance comparable to source-dependent methods, yielding state-of-the-art scores of 10.9 in mAAP and 11.6 in mAP, along with an absolute improvement of +4.3 in mAPQ over the source-only method. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK.
comment: Accepted to ICCV 2025. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK
☆ GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
Sequential grounding in 3D point clouds (SG3D) refers to locating sequences of objects by following text instructions for a daily activity with detailed steps. Current 3D visual grounding (3DVG) methods treat text instructions with multiple steps as a whole, without extracting useful temporal information from each step. However, the instructions in SG3D often contain pronouns such as "it", "here" and "the same" to make language expressions concise. This requires grounding methods to understand the context and retrieve relevant information from previous steps to correctly locate object sequences. Due to the lack of an effective module for collecting related historical information, state-of-the-art 3DVG methods face significant challenges in adapting to the SG3D task. To fill this gap, we propose GroundFlow -- a plug-in module for temporal reasoning on 3D point cloud sequential grounding. Firstly, we demonstrate that integrating GroundFlow improves the task accuracy of 3DVG baseline methods by a large margin (+7.5\% and +10.2\%) in the SG3D benchmark, even outperforming a 3D large language model pre-trained on various datasets. Furthermore, we selectively extract both short-term and long-term step information based on its relevance to the current instruction, enabling GroundFlow to take a comprehensive view of historical information and maintain its temporal understanding advantage as step counts increase. Overall, our work introduces temporal reasoning capabilities to existing 3DVG models and achieves state-of-the-art performance in the SG3D benchmark across five datasets.
☆ Out-of-Distribution Semantic Occupancy Prediction
3D Semantic Occupancy Prediction is crucial for autonomous driving, providing a dense, semantically rich environmental representation. However, existing methods focus on in-distribution scenes, making them susceptible to Out-of-Distribution (OoD) objects and long-tail distributions, which increases the risk of undetected anomalies and misinterpretations, posing safety hazards. To address these challenges, we introduce Out-of-Distribution Semantic Occupancy Prediction, targeting OoD detection in 3D voxel space. To fill the gaps in the dataset, we propose a Synthetic Anomaly Integration Pipeline that injects synthetic anomalies while preserving realistic spatial and occlusion patterns, enabling the creation of two datasets: VAA-KITTI and VAA-KITTI-360. We introduce OccOoD, a novel framework integrating OoD detection into 3D semantic occupancy prediction, with Voxel-BEV Progressive Fusion (VBPF) leveraging an RWKV-based branch to enhance OoD detection via geometry-semantic fusion. Experimental results demonstrate that OccOoD achieves state-of-the-art OoD detection with an AuROC of 67.34% and an AuPRCr of 29.21% within a 1.2m region, while maintaining competitive occupancy prediction performance. The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD.
comment: The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD
☆ Task-Aware KV Compression For Cost-Effective Long Video Understanding
Long-video understanding (LVU) remains a severe challenge for existing multimodal large language models (MLLMs), primarily due to the prohibitive computational cost. Recent approaches have explored KV compression to mitigate this issue, but they often suffer from significant information loss at high compression ratios. In this paper, we introduce Video-X^2L, which flexibly preserves critical video information for each LVU task. Video-X^2L involves two key operations. The first one is called bi-level KV compression. During the MLLM's pre-filling stage, Video-X^2L generates two types of compressed KVs: low-compression KVs (L-KVs) to capture fine-grained video details and high-compression KVs (H-KVs) to offer compact video representations. The second one is called selective KV re-loading. During the MLLM's decoding stage, Video-X^2L selectively re-loads L-KVs for the most critical video chunks while using H-KVs for other less important ones. This allows the MLLM to fully utilize task-specific information while maintaining the overall compactness. Video-X^2L is simple yet effective: it is free from additional training and directly compatible with existing KV-compressible MLLMs. We evaluate Video-X^2L with a variety of popular LVU benchmarks, including VideoMME, MLVU, LongVideoBench, and VNBench. Our experiment result shows that Video-X^2L outperforms existing KV-compression methods by a huge advantage while substantially saving the computation cost.
comment: 14 pages, 3 figures, 6 tables
☆ Uncover Treasures in DCT: Advancing JPEG Quality Enhancement by Exploiting Latent Correlations
Joint Photographic Experts Group (JPEG) achieves data compression by quantizing Discrete Cosine Transform (DCT) coefficients, which inevitably introduces compression artifacts. Most existing JPEG quality enhancement methods operate in the pixel domain, suffering from the high computational costs of decoding. Consequently, direct enhancement of JPEG images in the DCT domain has gained increasing attention. However, current DCT-domain methods often exhibit limited performance. To address this challenge, we identify two critical types of correlations within the DCT coefficients of JPEG images. Building on this insight, we propose an Advanced DCT-domain JPEG Quality Enhancement (AJQE) method that fully exploits these correlations. The AJQE method enables the adaptation of numerous well-established pixel-domain models to the DCT domain, achieving superior performance with reduced computational complexity. Compared to the pixel-domain counterparts, the DCT-domain models derived by our method demonstrate a 0.35 dB improvement in PSNR and a 60.5% increase in enhancement throughput on average.
☆ Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition
Learning semantic representations from point sets of 3D object shapes is often challenged by significant geometric variations, primarily due to differences in data acquisition methods. Typically, training data is generated using point simulators, while testing data is collected with distinct 3D sensors, leading to a simulation-to-reality (Sim2Real) domain gap that limits the generalization ability of point classifiers. Current unsupervised domain adaptation (UDA) techniques struggle with this gap, as they often lack robust, domain-insensitive descriptors capable of capturing global topological information, resulting in overfitting to the limited semantic patterns of the source domain. To address this issue, we introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures, and by modeling the topological relations of local geometric features through a novel self-supervised learning task. Additionally, we propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training, effectively reducing the impact of noisy pseudo-labels and enhancing the robustness of the adaptation process. Experimental results on three public Sim2Real benchmarks validate the effectiveness of our TAM framework, showing consistent improvements over state-of-the-art methods across all evaluated tasks. The source code of this work will be available at https://github.com/zou-longkun/TAG.git.
☆ Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image
Generating realistic 3D objects from single-view images requires natural appearance, 3D consistency, and the ability to capture multiple plausible interpretations of unseen regions. Existing approaches often rely on fine-tuning pretrained 2D diffusion models or directly generating 3D information through fast network inference or 3D Gaussian Splatting, but their results generally suffer from poor multiview consistency and lack geometric detail. To takle these issues, we present a novel method that seamlessly integrates geometry and perception priors without requiring additional model training to reconstruct detailed 3D objects from a single image. Specifically, we train three different Gaussian branches initialized from the geometry prior, perception prior and Gaussian noise, respectively. The geometry prior captures the rough 3D shapes, while the perception prior utilizes the 2D pretrained diffusion model to enhance multiview information. Subsequently, we refine 3D Gaussian branches through mutual interaction between geometry and perception priors, further enhanced by a reprojection-based strategy that enforces depth consistency. Experiments demonstrate the higher-fidelity reconstruction results of our method, outperforming existing methods on novel view synthesis and 3D reconstruction, demonstrating robust and consistent 3D object generation.
comment: 10 pages, 5 figures
☆ Robust Deep Learning for Myocardial Scar Segmentation in Cardiac MRI with Noisy Labels
The accurate segmentation of myocardial scars from cardiac MRI is essential for clinical assessment and treatment planning. In this study, we propose a robust deep-learning pipeline for fully automated myocardial scar detection and segmentation by fine-tuning state-of-the-art models. The method explicitly addresses challenges of label noise from semi-automatic annotations, data heterogeneity, and class imbalance through the use of Kullback-Leibler loss and extensive data augmentation. We evaluate the model's performance on both acute and chronic cases and demonstrate its ability to produce accurate and smooth segmentations despite noisy labels. In particular, our approach outperforms state-of-the-art models like nnU-Net and shows strong generalizability in an out-of-distribution test set, highlighting its robustness across various imaging conditions and clinical tasks. These results establish a reliable foundation for automated myocardial scar quantification and support the broader clinical adoption of deep learning in cardiac imaging.
comment: MICCAI 2025
☆ Tree-based Semantic Losses: Application to Sparsely-supervised Large Multi-class Hyperspectral Segmentation
Hyperspectral imaging (HSI) shows great promise for surgical applications, offering detailed insights into biological tissue differences beyond what the naked eye can perceive. Refined labelling efforts are underway to train vision systems to distinguish large numbers of subtly varying classes. However, commonly used learning methods for biomedical segmentation tasks penalise all errors equivalently and thus fail to exploit any inter-class semantics in the label space. In this work, we introduce two tree-based semantic loss functions which take advantage of a hierarchical organisation of the labels. We further incorporate our losses in a recently proposed approach for training with sparse, background-free annotations. Extensive experiments demonstrate that our proposed method reaches state-of-the-art performance on a sparsely annotated HSI dataset comprising $107$ classes organised in a clinically-defined semantic tree structure. Furthermore, our method enables effective detection of out-of-distribution (OOD) pixels without compromising segmentation performance on in-distribution (ID) pixels.
☆ Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion
Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data, but is challenged by heterogeneity in data, computation, and communication. Pretrained vision-language models (VLMs), with their strong generalization and lightweight tuning via prompts, offer a promising solution. However, existing federated prompt-learning methods rely only on text prompts and overlook joint label-domain distribution shifts. In this paper, we propose a personalized FL framework based on dual-prompt learning and cross fusion, termed pFedDC. Specifically, each client maintains both global and local prompts across vision and language modalities: global prompts capture common knowledge shared across the federation, while local prompts encode client-specific semantics and domain characteristics. Meanwhile, a cross-fusion module is designed to adaptively integrate prompts from different levels, enabling the model to generate personalized representations aligned with each client's unique data distribution. Extensive experiments across nine datasets with various types of heterogeneity show that pFedDC consistently outperforms state-of-the-art methods.
☆ YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection
Surface defect detection in industrial scenarios is both crucial and technically demanding due to the wide variability in defect types, irregular shapes and sizes, fine-grained requirements, and complex material textures. Although recent advances in AI-based detectors have improved performance, existing methods often suffer from redundant features, limited detail sensitivity, and weak robustness under multiscale conditions. To address these challenges, we propose YOLO-FDA, a novel YOLO-based detection framework that integrates fine-grained detail enhancement and attention-guided feature fusion. Specifically, we adopt a BiFPN-style architecture to strengthen bidirectional multilevel feature aggregation within the YOLOv5 backbone. To better capture fine structural changes, we introduce a Detail-directional Fusion Module (DDFM) that introduces a directional asymmetric convolution in the second-lowest layer to enrich spatial details and fuses the second-lowest layer with low-level features to enhance semantic consistency. Furthermore, we propose two novel attention-based fusion strategies, Attention-weighted Concatenation (AC) and Cross-layer Attention Fusion (CAF) to improve contextual representation and reduce feature noise. Extensive experiments on benchmark datasets demonstrate that YOLO-FDA consistently outperforms existing state-of-the-art methods in terms of both accuracy and robustness across diverse types of defects and scales.
comment: 14 pages, 6 figures. Submitted to The 8th Chinese Conference on Pattern Recognition and Computer Vision
☆ Learning to See in the Extremely Dark ICCV 2025
Learning-based methods have made promising advances in low-light RAW image enhancement, while their capability to extremely dark scenes where the environmental illuminance drops as low as 0.0001 lux remains to be explored due to the lack of corresponding datasets. To this end, we propose a paired-to-paired data synthesis pipeline capable of generating well-calibrated extremely low-light RAW images at three precise illuminance ranges of 0.01-0.1 lux, 0.001-0.01 lux, and 0.0001-0.001 lux, together with high-quality sRGB references to comprise a large-scale paired dataset named See-in-the-Extremely-Dark (SIED) to benchmark low-light RAW image enhancement approaches. Furthermore, we propose a diffusion-based framework that leverages the generative ability and intrinsic denoising property of diffusion models to restore visually pleasing results from extremely low-SNR RAW inputs, in which an Adaptive Illumination Correction Module (AICM) and a color consistency loss are introduced to ensure accurate exposure correction and color restoration. Extensive experiments on the proposed SIED and publicly available benchmarks demonstrate the effectiveness of our method. The code and dataset are available at https://github.com/JianghaiSCU/SIED.
comment: Accepted by ICCV 2025
GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction ICML 2025
Trajectory prediction for surrounding agents is a challenging task in autonomous driving due to its inherent uncertainty and underlying multimodality. Unlike prevailing data-driven methods that primarily rely on supervised learning, in this paper, we introduce a novel Graph-oriented Inverse Reinforcement Learning (GoIRL) framework, which is an IRL-based predictor equipped with vectorized context representations. We develop a feature adaptor to effectively aggregate lane-graph features into grid space, enabling seamless integration with the maximum entropy IRL paradigm to infer the reward distribution and obtain the policy that can be sampled to induce multiple plausible plans. Furthermore, conditioned on the sampled plans, we implement a hierarchical parameterized trajectory generator with a refinement module to enhance prediction accuracy and a probability fusion strategy to boost prediction confidence. Extensive experimental results showcase our approach not only achieves state-of-the-art performance on the large-scale Argoverse & nuScenes motion forecasting benchmarks but also exhibits superior generalization abilities compared to existing supervised models.
comment: Accepted by ICML 2025
☆ CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization ICCV 2025
In dynamic 3D environments, accurately updating scene representations over time is crucial for applications in robotics, mixed reality, and embodied AI. As scenes evolve, efficient methods to incorporate changes are needed to maintain up-to-date, high-quality reconstructions without the computational overhead of re-optimizing the entire scene. This paper introduces CL-Splats, which incrementally updates Gaussian splatting-based 3D representations from sparse scene captures. CL-Splats integrates a robust change-detection module that segments updated and static components within the scene, enabling focused, local optimization that avoids unnecessary re-computation. Moreover, CL-Splats supports storing and recovering previous scene states, facilitating temporal segmentation and new scene-analysis applications. Our extensive experiments demonstrate that CL-Splats achieves efficient updates with improved reconstruction quality over the state-of-the-art. This establishes a robust foundation for future real-time adaptation in 3D scene reconstruction tasks.
comment: ICCV 2025, Project Page: https://cl-splats.github.io
☆ IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
Video Large Language Models (VideoLLMs) have demonstrated remarkable understanding capabilities, but are found struggling to tackle multi-shot scenarios,e.g., video clips with varying camera angles or scene changes. This challenge can render failures such as instance identity forgetting and key frame negligence. In this work, we first attribute the challenge to the lack of multi-shot annotations among existing datasets and therefore we introduce a new dataset termed MultiClip-Bench, featuring dense descriptions and instruction-based question-answering pairs tailored for multi-shot scenarios. We empirically find that the training set significantly boosts the multi-shot performance, while the testing benchmark provides a reliable measure of the model capability in multi-shot scenarios. By further analyzing and discovering that current models only encode instance features in a discrete or lossy manner, at the risk of missing identity information, we then contribute a new model IPFormer-VideoLLM. Its key idea is the injection of instance-level features as instance prompts through an efficient attention-based connector. This allows for the aggregation of instance-specific information across scenes. Experiments demonstrate that our proposed dataset and model not only enhance the multi-scene video understanding significantly, but also offer distinct advantages across various video benchmarks.
☆ Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection
Remote sensing change detection is essential for monitoring urban expansion, disaster assessment, and resource management, offering timely, accurate, and large-scale insights into dynamic landscape transformations. While deep learning has revolutionized change detection, the increasing complexity and computational demands of modern models have not necessarily translated into significant accuracy gains. Instead of following this trend, this study explores a more efficient approach, focusing on lightweight models that maintain high accuracy while minimizing resource consumption, which is an essential requirement for on-satellite processing. To this end, we propose FlickCD, which means quick flick then get great results, pushing the boundaries of the performance-resource trade-off. FlickCD introduces an Enhanced Difference Module (EDM) to amplify critical feature differences between temporal phases while suppressing irrelevant variations such as lighting and weather changes, thereby reducing computational costs in the subsequent change decoder. Additionally, the FlickCD decoder incorporates Local-Global Fusion Blocks, leveraging Shifted Window Self-Attention (SWSA) and Enhanced Global Self-Attention (EGSA) to efficiently capture semantic information at multiple scales, preserving both coarse- and fine-grained changes. Extensive experiments on four benchmark datasets demonstrate that FlickCD reduces computational and storage overheads by more than an order of magnitude while achieving state-of-the-art (SOTA) performance or incurring only a minor (<1\% F1) accuracy trade-off. The implementation code is publicly available at https://github.com/xulsh8/FlickCD.
comment: 12 pages
☆ OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography ICCV 2025
As one of the earliest ancient languages, Oracle Bone Script (OBS) encapsulates the cultural records and intellectual expressions of ancient civilizations. Despite the discovery of approximately 4,500 OBS characters, only about 1,600 have been deciphered. The remaining undeciphered ones, with their complex structure and abstract imagery, pose significant challenges for interpretation. To address these challenges, this paper proposes a novel two-stage semantic typography framework, named OracleFusion. In the first stage, this approach leverages the Multimodal Large Language Model (MLLM) with enhanced Spatial Awareness Reasoning (SAR) to analyze the glyph structure of the OBS character and perform visual localization of key components. In the second stage, we introduce Oracle Structural Vector Fusion (OSVF), incorporating glyph structure constraints and glyph maintenance constraints to ensure the accurate generation of semantically enriched vector fonts. This approach preserves the objective integrity of the glyph structure, offering visually enhanced representations that assist experts in deciphering OBS. Extensive qualitative and quantitative experiments demonstrate that OracleFusion outperforms state-of-the-art baseline models in terms of semantics, visual appeal, and glyph maintenance, significantly enhancing both readability and aesthetic quality. Furthermore, OracleFusion provides expert-like insights on unseen oracle characters, making it a valuable tool for advancing the decipherment of OBS.
comment: Accepted to ICCV 2025
☆ ESMStereo: Enhanced ShuffleMixer Disparity Upsampling for Real-Time and Accurate Stereo Matching
Stereo matching has become an increasingly important component of modern autonomous systems. Developing deep learning-based stereo matching models that deliver high accuracy while operating in real-time continues to be a major challenge in computer vision. In the domain of cost-volume-based stereo matching, accurate disparity estimation depends heavily on large-scale cost volumes. However, such large volumes store substantial redundant information and also require computationally intensive aggregation units for processing and regression, making real-time performance unattainable. Conversely, small-scale cost volumes followed by lightweight aggregation units provide a promising route for real-time performance, but lack sufficient information to ensure highly accurate disparity estimation. To address this challenge, we propose the Enhanced Shuffle Mixer (ESM) to mitigate information loss associated with small-scale cost volumes. ESM restores critical details by integrating primary features into the disparity upsampling unit. It quickly extracts features from the initial disparity estimation and fuses them with image features. These features are mixed by shuffling and layer splitting then refined through a compact feature-guided hourglass network to recover more detailed scene geometry. The ESM focuses on local contextual connectivity with a large receptive field and low computational cost, leading to the reconstruction of a highly accurate disparity map at real-time. The compact version of ESMStereo achieves an inference speed of 116 FPS on high-end GPUs and 91 FPS on the AGX Orin.
comment: Under peer review
☆ EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception ICCV 2025
Modern perception models, particularly those designed for multisensory egocentric tasks, have achieved remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across different egocentric perception tasks, including egocentric action recognition, active speaker localization, and behavior anticipation. Our proposed policy module is adaptable to task-specific action spaces, making it broadly applicable. Experimental results on three challenging egocentric datasets EPIC-Kitchens, EasyCom, and Aria Everyday Activities demonstrate that our method significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters up to 82.02%, and energy up to 9.6x, while still on-par and in many cases outperforming, the performance of corresponding state-of-the-art models.
comment: Accepted at ICCV 2025
☆ PoseMaster: Generating 3D Characters in Arbitrary Poses from a Single Image
3D characters play a crucial role in our daily entertainment. To improve the efficiency of 3D character modeling, recent image-based methods use two separate models to achieve pose standardization and 3D reconstruction of the A-pose character. However, these methods are prone to generating distorted and degraded images in the pose standardization stage due to self-occlusion and viewpoints, which further affects the geometric quality of the subsequent reconstruction process. To tackle these problems, we propose PoseMaster, an end-to-end controllable 3D character generation framework. Specifically, we unify pose transformation and 3D character generation into a flow-based 3D native generation framework. To achieve accurate arbitrary-pose control, we propose to leverage the 3D body bones existing in the skeleton of an animatable character as the pose condition. Furthermore, considering the specificity of multi-condition control, we randomly empty the pose condition and the image condition during training to improve the effectiveness and generalizability of pose control. Finally, we create a high-quality pose-control dataset derived from realistic character animation data to make the model learning the implicit relationships between skeleton and skinning weights. Extensive experiments show that PoseMaster outperforms current state-of-the-art techniques in both qualitative and quantitative evaluations for A-pose character generation while demonstrating its powerful ability to achieve precise control for arbitrary poses.
☆ SAMURAI: Shape-Aware Multimodal Retrieval for 3D Object Identification
Retrieving 3D objects in complex indoor environments using only a masked 2D image and a natural language description presents significant challenges. The ROOMELSA challenge limits access to full 3D scene context, complicating reasoning about object appearance, geometry, and semantics. These challenges are intensified by distorted viewpoints, textureless masked regions, ambiguous language prompts, and noisy segmentation masks. To address this, we propose SAMURAI: Shape-Aware Multimodal Retrieval for 3D Object Identification. SAMURAI integrates CLIP-based semantic matching with shape-guided re-ranking derived from binary silhouettes of masked regions, alongside a robust majority voting strategy. A dedicated preprocessing pipeline enhances mask quality by extracting the largest connected component and removing background noise. Our hybrid retrieval framework leverages both language and shape cues, achieving competitive performance on the ROOMELSA private test set. These results highlight the importance of combining shape priors with language understanding for robust open-world 3D object retrieval.
☆ Class-Agnostic Region-of-Interest Matching in Document Images
Document understanding and analysis have received a lot of attention due to their widespread application. However, existing document analysis solutions, such as document layout analysis and key information extraction, are only suitable for fixed category definitions and granularities, and cannot achieve flexible applications customized by users. Therefore, this paper defines a new task named ``Class-Agnostic Region-of-Interest Matching'' (``RoI-Matching'' for short), which aims to match the customized regions in a flexible, efficient, multi-granularity, and open-set manner. The visual prompt of the reference document and target document images are fed into our model, while the output is the corresponding bounding boxes in the target document images. To meet the above requirements, we construct a benchmark RoI-Matching-Bench, which sets three levels of difficulties following real-world conditions, and propose the macro and micro metrics to evaluate. Furthermore, we also propose a new framework RoI-Matcher, which employs a siamese network to extract multi-level features both in the reference and target domains, and cross-attention layers to integrate and align similar semantics in different domains. Experiments show that our method with a simple procedure is effective on RoI-Matching-Bench, and serves as the baseline for further research. The code is available at https://github.com/pd162/RoI-Matching.
comment: Accepted by ICDAR2025
☆ Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features ICCV 2025
The ability of deep neural networks (DNNs) come from extracting and interpreting features from the data provided. By exploiting intermediate features in DNNs instead of relying on hard labels, we craft adversarial perturbation that generalize more effectively, boosting black-box transferability. These features ubiquitously come from supervised learning in previous work. Inspired by the exceptional synergy between self-supervised learning and the Transformer architecture, this paper explores whether exploiting self-supervised Vision Transformer (ViT) representations can improve adversarial transferability. We present dSVA -- a generative dual self-supervised ViT features attack, that exploits both global structural features from contrastive learning (CL) and local textural features from masked image modeling (MIM), the self-supervised learning paradigm duo for ViTs. We design a novel generative training framework that incorporates a generator to create black-box adversarial examples, and strategies to train the generator by exploiting joint features and the attention mechanism of self-supervised ViTs. Our findings show that CL and MIM enable ViTs to attend to distinct feature tendencies, which, when exploited in tandem, boast great adversarial generalizability. By disrupting dual deep features distilled by self-supervised ViTs, we are rewarded with remarkable black-box transferability to models of various architectures that outperform state-of-the-arts. Code available at https://github.com/spencerwooo/dSVA.
comment: 14 pages, 9 figures, to appear in ICCV 2025
☆ Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling
Text-guided diffusion models have become essential for high-quality image synthesis, enabling dynamic image editing. In image editing, two crucial aspects are editability, which determines the extent of modification, and faithfulness, which reflects how well unaltered elements are preserved. However, achieving optimal results is challenging because of the inherent trade-off between editability and faithfulness. To address this, we propose Faithfulness Guidance and Scheduling (FGS), which enhances faithfulness with minimal impact on editability. FGS incorporates faithfulness guidance to strengthen the preservation of input image information and introduces a scheduling strategy to resolve misalignment between editability and faithfulness. Experimental results demonstrate that FGS achieves superior faithfulness while maintaining editability. Moreover, its compatibility with various editing methods enables precise, high-quality image edits across diverse tasks.
comment: preprint
☆ Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability ICCV2025
Detectors often suffer from performance drop due to domain gap between training and testing data. Recent methods explore diffusion models applied to domain generalization (DG) and adaptation (DA) tasks, but still struggle with large inference costs and have not yet fully leveraged the capabilities of diffusion models. We propose to tackle these problems by extracting intermediate features from a single-step diffusion process, improving feature collection and fusion to reduce inference time by 75% while enhancing performance on source domains (i.e., Fitness). Then, we construct an object-centered auxiliary branch by applying box-masked images with class prompts to extract robust and domain-invariant features that focus on object. We also apply consistency loss to align the auxiliary and ordinary branch, balancing fitness and generalization while preventing overfitting and improving performance on target domains (i.e., Generalization). Furthermore, within a unified framework, standard detectors are guided by diffusion detectors through feature-level and object-level alignment on source domains (for DG) and unlabeled target domains (for DA), thereby improving cross-domain detection performance (i.e., Transferability). Our method achieves competitive results on 3 DA benchmarks and 5 DG benchmarks. Additionally, experiments on COCO generalization benchmark demonstrate that our method maintains significant advantages and show remarkable efficiency in large domain shifts and low-data scenarios. Our work shows the superiority of applying diffusion models to domain generalized and adaptive detection tasks and offers valuable insights for visual perception tasks across diverse domains. The code is available at \href{https://github.com/heboyong/Fitness-Generalization-Transferability}{Fitness-Generalization-Transferability}.
comment: Accepted by ICCV2025. arXiv admin note: text overlap with arXiv:2503.02101
☆ V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling
Ensuring robust planning and decision-making under rare, diverse, and visually degraded long-tail scenarios remains a fundamental challenge for autonomous driving in urban environments. This issue becomes more critical in cooperative settings, where vehicles and infrastructure jointly perceive and reason across complex environments. To address this challenge, we propose V2X-REALM, a vision-language model (VLM)-based framework with adaptive multimodal learning for robust cooperative autonomous driving under long-tail scenarios. V2X-REALM introduces three core innovations: (i) a prompt-driven long-tail scenario generation and evaluation pipeline that leverages foundation models to synthesize realistic long-tail conditions such as snow and fog across vehicle- and infrastructure-side views, enriching training diversity efficiently; (ii) a gated multi-scenario adaptive attention module that modulates the visual stream using scenario priors to recalibrate ambiguous or corrupted features; and (iii) a multi-task scenario-aware contrastive learning objective that improves multimodal alignment and promotes cross-scenario feature separability. Extensive experiments demonstrate that V2X-REALM significantly outperforms existing baselines in robustness, semantic reasoning, safety, and planning accuracy under complex, challenging driving conditions, advancing the scalability of end-to-end cooperative autonomous driving.
☆ RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment ICCV 2025
Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more data-efficient training paradigms. Data selection has shown promise to mitigate redundancy by identifying the most representative samples, thereby reducing training costs without compromising performance. Existing methods typically rely on static scoring metrics or pretrained models, overlooking the combined effect of selected samples and their evolving dynamics during training. We introduce the concept of epsilon-sample cover, which quantifies sample redundancy based on inter-sample relationships, capturing the intrinsic structure of the dataset. Based on this, we reformulate data selection as a reinforcement learning (RL) process and propose RL-Selector, where a lightweight RL agent optimizes the selection policy by leveraging epsilon-sample cover derived from evolving dataset distribution as a reward signal. Extensive experiments across benchmark datasets and diverse architectures demonstrate that our method consistently outperforms existing state-of-the-art baselines. Models trained with our selected datasets show enhanced generalization performance with improved training efficiency.
comment: ICCV 2025
☆ DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation
Commercial RGB-D cameras often produce noisy, incomplete depth maps for non-Lambertian objects. Traditional depth completion methods struggle to generalize due to the limited diversity and scale of training data. Recent advances exploit visual priors from pre-trained text-to-image diffusion models to enhance generalization in dense prediction tasks. However, we find that biases arising from training-inference mismatches in the vanilla diffusion framework significantly impair depth completion performance. Additionally, the lack of distinct visual features in non-Lambertian regions further hinders precise prediction. To address these issues, we propose \textbf{DidSee}, a diffusion-based framework for depth completion on non-Lambertian objects. First, we integrate a rescaled noise scheduler enforcing a zero terminal signal-to-noise ratio to eliminate signal leakage bias. Second, we devise a noise-agnostic single-step training formulation to alleviate error accumulation caused by exposure bias and optimize the model with a task-specific loss. Finally, we incorporate a semantic enhancer that enables joint depth completion and semantic segmentation, distinguishing objects from backgrounds and yielding precise, fine-grained depth maps. DidSee achieves state-of-the-art performance on multiple benchmarks, demonstrates robust real-world generalization, and effectively improves downstream tasks such as category-level pose estimation and robotic grasping.Project page: https://wenzhoulyu.github.io/DidSee/
☆ Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
Image tokenization plays a critical role in reducing the computational demands of modeling high-resolution images, significantly improving the efficiency of image and multimodal understanding and generation. Recent advances in 1D latent spaces have reduced the number of tokens required by eliminating the need for a 2D grid structure. In this paper, we further advance compact discrete image representation by introducing 1D binary image latents. By representing each image as a sequence of binary vectors, rather than using traditional one-hot codebook tokens, our approach preserves high-resolution details while maintaining the compactness of 1D latents. To the best of our knowledge, our text-to-image models are the first to achieve competitive performance in both diffusion and auto-regressive generation using just 128 discrete tokens for images up to 1024x1024, demonstrating up to a 32-fold reduction in token numbers compared to standard VQ-VAEs. The proposed 1D binary latent space, coupled with simple model architectures, achieves marked improvements in speed training and inference speed. Our text-to-image models allow for a global batch size of 4096 on a single GPU node with 8 AMD MI300X GPUs, and the training can be completed within 200 GPU days. Our models achieve competitive performance compared to modern image generation models without any in-house private training data or post-training refinements, offering a scalable and efficient alternative to conventional tokenization methods.
☆ LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection
Effective deep feature extraction via feature-level fusion is crucial for multimodal object detection. However, previous studies often involve complex training processes that integrate modality-specific features by stacking multiple feature-level fusion units, leading to significant computational overhead. To address this issue, we propose a new fusion detection baseline that uses a single feature-level fusion unit to enable high-performance detection, thereby simplifying the training process. Based on this approach, we propose a lightweight attention-guided self-modulation feature fusion network (LASFNet), which introduces a novel attention-guided self-modulation feature fusion (ASFF) module that adaptively adjusts the responses of fusion features at both global and local levels based on attention information from different modalities, thereby promoting comprehensive and enriched feature generation. Additionally, a lightweight feature attention transformation module (FATM) is designed at the neck of LASFNet to enhance the focus on fused features and minimize information loss. Extensive experiments on three representative datasets demonstrate that, compared to state-of-the-art methods, our approach achieves a favorable efficiency-accuracy trade-off, reducing the number of parameters and computational cost by as much as 90% and 85%, respectively, while improving detection accuracy (mAP) by 1%-3%. The code will be open-sourced at https://github.com/leileilei2000/LASFNet.
☆ Multimodal Prompt Alignment for Facial Expression Recognition ICCV2025
Prompt learning has been widely adopted to efficiently adapt vision-language models (VLMs) like CLIP for various downstream tasks. Despite their success, current VLM-based facial expression recognition (FER) methods struggle to capture fine-grained textual-visual relationships, which are essential for distinguishing subtle differences between facial expressions. To address this challenge, we propose a multimodal prompt alignment framework for FER, called MPA-FER, that provides fine-grained semantic guidance to the learning process of prompted visual features, resulting in more precise and interpretable representations. Specifically, we introduce a multi-granularity hard prompt generation strategy that utilizes a large language model (LLM) like ChatGPT to generate detailed descriptions for each facial expression. The LLM-based external knowledge is injected into the soft prompts by minimizing the feature discrepancy between the soft prompts and the hard prompts. To preserve the generalization abilities of the pretrained CLIP model, our approach incorporates prototype-guided visual feature alignment, ensuring that the prompted visual features from the frozen image encoder align closely with class-specific prototypes. Additionally, we propose a cross-modal global-local alignment module that focuses on expression-relevant facial features, further improving the alignment between textual and visual features. Extensive experiments demonstrate our framework outperforms state-of-the-art methods on three FER benchmark datasets, while retaining the benefits of the pretrained model and minimizing computational costs.
comment: To appear in ICCV2025
☆ HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation
Machine learning-assisted diagnosis is gaining traction in skin disease detection, but training effective models requires large amounts of high-quality data. Skin disease datasets often suffer from class imbalance, privacy concerns, and object bias, making data augmentation essential. While classical generative models are widely used, they demand extensive computational resources and lengthy training time. Quantum computing offers a promising alternative, but existing quantum-based image generation methods can only yield grayscale low-quality images. Through a novel classical-quantum latent space fusion technique, our work overcomes this limitation and introduces the first classical-quantum generative adversarial network (GAN) capable of generating color medical images. Our model outperforms classical deep convolutional GANs and existing hybrid classical-quantum GANs in both image generation quality and classification performance boost when used as data augmentation. Moreover, the performance boost is comparable with that achieved using state-of-the-art classical generative models, yet with over 25 times fewer parameters and 10 times fewer training epochs. Such results suggest a promising future for quantum image generation as quantum hardware advances. Finally, we demonstrate the robust performance of our model on real IBM quantum machine with hardware noise.
☆ FedSC: Federated Learning with Semantic-Aware Collaboration
Federated learning (FL) aims to train models collaboratively across clients without sharing data for privacy-preserving. However, one major challenge is the data heterogeneity issue, which refers to the biased labeling preferences at multiple clients. A number of existing FL methods attempt to tackle data heterogeneity locally (e.g., regularizing local models) or globally (e.g., fine-tuning global model), often neglecting inherent semantic information contained in each client. To explore the possibility of using intra-client semantically meaningful knowledge in handling data heterogeneity, in this paper, we propose Federated Learning with Semantic-Aware Collaboration (FedSC) to capture client-specific and class-relevant knowledge across heterogeneous clients. The core idea of FedSC is to construct relational prototypes and consistent prototypes at semantic-level, aiming to provide fruitful class underlying knowledge and stable convergence signals in a prototype-wise collaborative way. On the one hand, FedSC introduces an inter-contrastive learning strategy to bring instance-level embeddings closer to relational prototypes with the same semantics and away from distinct classes. On the other hand, FedSC devises consistent prototypes via a discrepancy aggregation manner, as a regularization penalty to constrain the optimization region of the local model. Moreover, a theoretical analysis for FedSC is provided to ensure a convergence guarantee. Experimental results on various challenging scenarios demonstrate the effectiveness of FedSC and the efficiency of crucial components.
comment: 12 pages, KDD 2025
☆ Bridging Video Quality Scoring and Justification via Large Multimodal Models
Classical video quality assessment (VQA) methods generate a numerical score to judge a video's perceived visual fidelity and clarity. Yet, a score fails to describe the video's complex quality dimensions, restricting its applicability. Benefiting from the linguistic output, adapting video large multimodal models (LMMs) to VQA via instruction tuning has the potential to address this issue. The core of the approach lies in the video quality-centric instruction data. Previous explorations mainly focus on the image domain, and their data generation processes heavily rely on human quality annotations and proprietary systems, limiting data scalability and effectiveness. To address these challenges, we propose the Score-based Instruction Generation (SIG) pipeline. Specifically, SIG first scores multiple quality dimensions of an unlabeled video and maps scores to text-defined levels. It then explicitly incorporates a hierarchical Chain-of-Thought (CoT) to model the correlation between specific dimensions and overall quality, mimicking the human visual system's reasoning process. The automated pipeline eliminates the reliance on expert-written quality descriptions and proprietary systems, ensuring data scalability and generation efficiency. To this end, the resulting Score2Instruct (S2I) dataset contains over 320K diverse instruction-response pairs, laying the basis for instruction tuning. Moreover, to advance video LMMs' quality scoring and justification abilities simultaneously, we devise a progressive tuning strategy to fully unleash the power of S2I. Built upon SIG, we further curate a benchmark termed S2I-Bench with 400 open-ended questions to better evaluate the quality justification capacity of video LMMs. Experimental results on the S2I-Bench and existing benchmarks indicate that our method consistently improves quality scoring and justification capabilities across multiple video LMMs.
comment: 15 pages, 4 figures, 8 tables
☆ User-in-the-Loop View Sampling with Error Peaking Visualization
Augmented reality (AR) provides ways to visualize missing view samples for novel view synthesis. Existing approaches present 3D annotations for new view samples and task users with taking images by aligning the AR display. This data collection task is known to be mentally demanding and limits capture areas to pre-defined small areas due to the ideal but restrictive underlying sampling theory. To free users from 3D annotations and limited scene exploration, we propose using locally reconstructed light fields and visualizing errors to be removed by inserting new views. Our results show that the error-peaking visualization is less invasive, reduces disappointment in final results, and is satisfactory with fewer view samples in our mobile view synthesis system. We also show that our approach can contribute to recent radiance field reconstruction for larger scenes, such as 3D Gaussian splatting.
comment: Accepted at IEEE ICIP 2025, Project Page: https://mediated-reality.github.io/projects/yasunaga_icip25/
☆ The Aging Multiverse: Generating Condition-Aware Facial Aging Tree via Training-Free Diffusion
We introduce the Aging Multiverse, a framework for generating multiple plausible facial aging trajectories from a single image, each conditioned on external factors such as environment, health, and lifestyle. Unlike prior methods that model aging as a single deterministic path, our approach creates an aging tree that visualizes diverse futures. To enable this, we propose a training-free diffusion-based method that balances identity preservation, age accuracy, and condition control. Our key contributions include attention mixing to modulate editing strength and a Simulated Aging Regularization strategy to stabilize edits. Extensive experiments and user studies demonstrate state-of-the-art performance across identity preservation, aging realism, and conditional alignment, outperforming existing editing and age-progression models, which often fail to account for one or more of the editing criteria. By transforming aging into a multi-dimensional, controllable, and interpretable process, our approach opens up new creative and practical avenues in digital storytelling, health education, and personalized visualization.
☆ Detection of Breast Cancer Lumpectomy Margin with SAM-incorporated Forward-Forward Contrastive Learning
Complete removal of cancer tumors with a negative specimen margin during lumpectomy is essential in reducing breast cancer recurrence. However, 2D specimen radiography (SR), the current method used to assess intraoperative specimen margin status, has limited accuracy, resulting in nearly a quarter of patients requiring additional surgery. To address this, we propose a novel deep learning framework combining the Segment Anything Model (SAM) with Forward-Forward Contrastive Learning (FFCL), a pre-training strategy leveraging both local and global contrastive learning for patch-level classification of SR images. After annotating SR images with regions of known maligancy, non-malignant tissue, and pathology-confirmed margins, we pre-train a ResNet-18 backbone with FFCL to classify margin status, then reconstruct coarse binary masks to prompt SAM for refined tumor margin segmentation. Our approach achieved an AUC of 0.8455 for margin classification and segmented margins with a 27.4% improvement in Dice similarity over baseline models, while reducing inference time to 47 milliseconds per image. These results demonstrate that FFCL-SAM significantly enhances both the speed and accuracy of intraoperative margin assessment, with strong potential to reduce re-excision rates and improve surgical outcomes in breast cancer treatment. Our code is available at https://github.com/tbwa233/FFCL-SAM/.
comment: 19 pages, 7 figures, 3 tables
☆ VisionGuard: Synergistic Framework for Helmet Violation Detection
Enforcing helmet regulations among motorcyclists is essential for enhancing road safety and ensuring the effectiveness of traffic management systems. However, automatic detection of helmet violations faces significant challenges due to environmental variability, camera angles, and inconsistencies in the data. These factors hinder reliable detection of motorcycles and riders and disrupt consistent object classification. To address these challenges, we propose VisionGuard, a synergistic multi-stage framework designed to overcome the limitations of frame-wise detectors, especially in scenarios with class imbalance and inconsistent annotations. VisionGuard integrates two key components: Adaptive Labeling and Contextual Expander modules. The Adaptive Labeling module is a tracking-based refinement technique that enhances classification consistency by leveraging a tracking algorithm to assign persistent labels across frames and correct misclassifications. The Contextual Expander module improves recall for underrepresented classes by generating virtual bounding boxes with appropriate confidence scores, effectively addressing the impact of data imbalance. Experimental results show that VisionGuard improves overall mAP by 3.1% compared to baseline detectors, demonstrating its effectiveness and potential for real-world deployment in traffic surveillance systems, ultimately promoting safety and regulatory compliance.
☆ Inverse Scene Text Removal
Scene text removal (STR) aims to erase textual elements from images. It was originally intended for removing privacy-sensitiveor undesired texts from natural scene images, but is now also appliedto typographic images. STR typically detects text regions and theninpaints them. Although STR has advanced through neural networksand synthetic data, misuse risks have increased. This paper investi-gates Inverse STR (ISTR), which analyzes STR-processed images andfocuses on binary classification (detecting whether an image has un-dergone STR) and localizing removed text regions. We demonstrate inexperiments that these tasks are achievable with high accuracies, en-abling detection of potential misuse and improving STR. We also at-tempt to recover the removed text content by training a text recognizerto understand its difficulty.
comment: 17 pages
☆ Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology
Challenges such as the lack of high-quality annotations, long-tailed data distributions, and inconsistent staining styles pose significant obstacles to training neural networks to detect abnormal cells in cytopathology robustly. This paper proposes a style-aligned image composition (SAIC) method that composes high-fidelity and style-preserved pathological images to enhance the effectiveness and robustness of detection models. Without additional training, SAIC first selects an appropriate candidate from the abnormal cell bank based on attribute guidance. Then, it employs a high-frequency feature reconstruction to achieve a style-aligned and high-fidelity composition of abnormal cells and pathological backgrounds. Finally, it introduces a large vision-language model to filter high-quality synthesis images. Experimental results demonstrate that incorporating SAIC-synthesized images effectively enhances the performance and robustness of abnormal cell detection for tail categories and styles, thereby improving overall detection performance. The comprehensive quality evaluation further confirms the generalizability and practicality of SAIC in clinical application scenarios. Our code will be released at https://github.com/Joey-Qi/SAIC.
comment: MIDL 2025 Oral
☆ DBMovi-GS: Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting CVPR
Novel view synthesis is a task of generating scenes from unseen perspectives; however, synthesizing dynamic scenes from blurry monocular videos remains an unresolved challenge that has yet to be effectively addressed. Existing novel view synthesis methods are often constrained by their reliance on high-resolution images or strong assumptions about static geometry and rigid scene priors. Consequently, their approaches lack robustness in real-world environments with dynamic object and camera motion, leading to instability and degraded visual fidelity. To address this, we propose Motion-aware Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting (DBMovi-GS), a method designed for dynamic view synthesis from blurry monocular videos. Our model generates dense 3D Gaussians, restoring sharpness from blurry videos and reconstructing detailed 3D geometry of the scene affected by dynamic motion variations. Our model achieves robust performance in novel view synthesis under dynamic blurry scenes and sets a new benchmark in realistic novel view synthesis for blurry monocular video inputs.
comment: CVPRW 2025, Neural Fields Beyond Conventional Cameras
♻ ☆ Learning to Be a Transformer to Pinpoint Anomalies
To efficiently deploy strong, often pre-trained feature extractors, recent Industrial Anomaly Detection and Segmentation (IADS) methods process low-resolution images, e.g., 224x224 pixels, obtained by downsampling the original input images. However, while numerous industrial applications demand the identification of both large and small defects, downsampling the input image to a low resolution may hinder a method's ability to pinpoint tiny anomalies. We propose a novel Teacher--Student paradigm to leverage strong pre-trained features while processing high-resolution input images very efficiently. The core idea concerns training two shallow MLPs (the Students) by nominal images so as to mimic the mappings between the patch embeddings induced by the self-attention layers of a frozen vision Transformer (the Teacher). Indeed, learning these mappings sets forth a challenging pretext task that small-capacity models are unlikely to accomplish on out-of-distribution data such as anomalous images. Our method can spot anomalies from high-resolution images and runs way faster than competitors, achieving state-of-the-art performance on MVTec AD and the best segmentation results on VisA. We also propose novel evaluation metrics to capture robustness to defect size, i.e., the ability to preserve good localisation from large anomalies to tiny ones. Evaluating our method also by these metrics reveals its neatly superior performance.
comment: Accepted at IEEE Access
♻ ☆ CanFields: Consolidating Diffeomorphic Flows for Non-Rigid 4D Interpolation from Arbitrary-Length Sequences ICCV2025
We introduce Canonical Consolidation Fields (CanFields). This novel method interpolates arbitrary-length sequences of independently sampled 3D point clouds into a unified, continuous, and coherent deforming shape. Unlike prior methods that oversmooth geometry or produce topological and geometric artifacts, CanFields optimizes fine-detailed geometry and deformation jointly in an unsupervised fitting with two novel bespoke modules. First, we introduce a dynamic consolidator module that adjusts the input and assigns confidence scores, balancing the optimization of the canonical shape and its motion. Second, we represent the motion as a diffeomorphic flow parameterized by a smooth velocity field. We have validated our robustness and accuracy on more than 50 diverse sequences, demonstrating its superior performance even with missing regions, noisy raw scans, and sparse data. Our project page is at: https://wangmiaowei.github.io/CanFields.github.io/.
comment: ICCV2025 Accepted
♻ ☆ SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
With the rapid advancement of autonomous driving technology, a lack of data has become a major obstacle to enhancing perception model accuracy. Researchers are now exploring controllable data generation using world models to diversify datasets. However, previous work has been limited to studying image generation quality on specific public datasets. There is still relatively little research on how to build data generation engines for real-world application scenes to achieve large-scale data generation for challenging scenes. In this paper, a simulator-conditioned scene generation engine based on world model is proposed. By constructing a simulation system consistent with real-world scenes, simulation data and labels, which serve as the conditions for data generation in the world model, for any scenes can be collected. It is a novel data generation pipeline by combining the powerful scene simulation capabilities of the simulation engine with the robust data generation capabilities of the world model. In addition, a benchmark with proportionally constructed virtual and real data, is provided for exploring the capabilities of world models in real-world scenes. Quantitative results show that these generated images significantly improve downstream perception models performance. Finally, we explored the generative performance of the world model in urban autonomous driving scenarios. All the data and code will be available at https://github.com/Li-Zn-H/SimWorld.
comment: 8 pages, 4 figures
♻ ☆ Chain-of-Sketch: Enabling Global Visual Reasoning
Modern vision models have achieved remarkable success in benchmarks where local features provide critical information about the target. There is now a growing interest in tackling tasks requiring more global reasoning, where local features do not provide significant information. Minsky and Papert put forward such tasks in 1969 with their connectivity study, exposing the limitations of the perceptron model. In this paper, we introduce an expanded set of global visual datasets involving graphs, strings, mazes, and image grids. We show that large vision models still struggle to learn these tasks efficiently. Similarly, state-of-the-art multi-modal LLMs perform poorly on these datasets. We explain this learning inefficiency by means of the 'globality degree' measure. To mitigate this, we propose a method called chain-of-sketch (CoS). Similar to the chain-of-thought and scratchpad techniques used in language models, CoS breaks the original task into intermediate visual steps to help learn a complex task. In addition, we show that not all CoS strategies perform equally well. Our key insight is to impose a Markovian structure on the CoS frames. This leads to the introduction of 'inductive CoS' which achieves better out-of-distribution generalization and performs well even with smaller models compared to non-inductive variants.
comment: additional experiments added, title changed from "Visual Scratchpads: Enabling Global Reasoning in Vision" to "Chain-of-Sketch: Enabling Global Visual Reasoning"
♻ ☆ QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning ICCV 2025
The practical deployment of diffusion models is still hindered by the high memory and computational overhead. Although quantization paves a way for model compression and acceleration, existing methods face challenges in achieving low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose to adjust these distributions through weight finetuning to be more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate performance degradation while enhancing quantization efficiency. Our method demonstrates its efficacy across three high-resolution image generation tasks, obtaining state-of-the-art performance across multiple bit-width settings.
comment: ICCV 2025. Code is available at https://github.com/hatchetProject/QuEST
♻ ☆ AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration ICCV 2025
We present AnyCalib, a method for calibrating the intrinsic parameters of a camera from a single in-the-wild image, that is agnostic to the camera model. Current methods are predominantly tailored to specific camera models and/or require extrinsic cues, such as the direction of gravity, to be visible in the image. In contrast, we argue that the perspective and distortion cues inherent in images are sufficient for model-agnostic camera calibration. To demonstrate this, we frame the calibration process as the regression of the rays corresponding to each pixel. We show, for the first time, that this intermediate representation allows for a closed-form recovery of the intrinsics for a wide range of camera models, including but not limited to: pinhole, Brown-Conrady and Kannala-Brandt. Our approach also applies to edited -- cropped and stretched -- images. Experimentally, we demonstrate that AnyCalib consistently outperforms alternative methods, including 3D foundation models, despite being trained on orders of magnitude less data. Code is available at https://github.com/javrtg/AnyCalib.
comment: Accepted to ICCV 2025
♻ ☆ EgoM2P: Egocentric Multimodal Multitask Pretraining ICCV 2025
Understanding multimodal signals in egocentric vision, such as RGB video, depth, camera poses, and gaze, is essential for applications in augmented reality, robotics, and human-computer interaction, enabling systems to better interpret the camera wearer's actions, intentions, and surrounding environment. However, building large-scale egocentric multimodal and multitask models presents unique challenges. Egocentric data are inherently heterogeneous, with large variations in modality coverage across devices and settings. Generating pseudo-labels for missing modalities, such as gaze or head-mounted camera trajectories, is often infeasible, making standard supervised learning approaches difficult to scale. Furthermore, dynamic camera motion and the complex temporal and spatial structure of first-person video pose additional challenges for the direct application of existing multimodal foundation models. To address these challenges, we introduce a set of efficient temporal tokenizers and propose EgoM2P, a masked modeling framework that learns from temporally-aware multimodal tokens to train a large, general-purpose model for egocentric 4D understanding. This unified design supports multitasking across diverse egocentric perception and synthesis tasks, including gaze prediction, egocentric camera tracking, and monocular depth estimation from egocentric video, and also serves as a generative model for conditional egocentric video synthesis. Across these tasks, EgoM2P matches or outperforms specialist models while being an order of magnitude faster. We will fully open-source EgoM2P to support the community and advance egocentric vision research. Project page: https://egom2p.github.io/.
comment: Accepted by ICCV 2025
♻ ☆ Fake it till You Make it: Reward Modeling as Discriminative Prediction
An effective reward model plays a pivotal role in reinforcement learning for post-training enhancement of visual generative models. However, current approaches of reward modeling suffer from implementation complexity due to their reliance on extensive human-annotated preference data or meticulously engineered quality dimensions that are often incomplete and engineering-intensive. Inspired by adversarial training in generative adversarial networks (GANs), this paper proposes GAN-RM, an efficient reward modeling framework that eliminates manual preference annotation and explicit quality dimension engineering. Our method trains the reward model through discrimination between a small set of representative, unpaired target samples(denoted as Preference Proxy Data) and model-generated ordinary outputs, requiring only a few hundred target samples. Comprehensive experiments demonstrate our GAN-RM's effectiveness across multiple key applications including test-time scaling implemented as Best-of-N sample filtering, post-training approaches like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Code and data will be released at https://github.com/Visualignment/GAN-RM.
♻ ☆ Materialist: Physically Based Editing Using Single-Image Inverse Rendering
Achieving physically consistent image editing remains a significant challenge in computer vision. Existing image editing methods typically rely on neural networks, which struggle to accurately handle shadows and refractions. Conversely, physics-based inverse rendering often requires multi-view optimization, limiting its practicality in single-image scenarios. In this paper, we propose Materialist, a method combining a learning-based approach with physically based progressive differentiable rendering. Given an image, our method leverages neural networks to predict initial material properties. Progressive differentiable rendering is then used to optimize the environment map and refine the material properties with the goal of closely matching the rendered result to the input image. Our approach enables a range of applications, including material editing, object insertion, and relighting, while also introducing an effective method for editing material transparency without requiring full scene geometry. Furthermore, Our envmap estimation method also achieves state-of-the-art performance, further enhancing the accuracy of image editing task. Experiments demonstrate strong performance across synthetic and real-world datasets, excelling even on challenging out-of-domain images. Project website: https://lez-s.github.io/materialist_project/
comment: Add acknowledgements, more authors and more results. Project website: https://lez-s.github.io/materialist_project/
♻ ☆ DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection ICCV 2025
Out-of-distribution (OOD) detection holds significant importance across many applications. While semantic and domain-shift OOD problems are well-studied, this work focuses on covariate shifts - subtle variations in the data distribution that can degrade machine learning performance. We hypothesize that detecting these subtle shifts can improve our understanding of in-distribution boundaries, ultimately improving OOD detection. In adversarial discriminators trained with Batch Normalization (BN), real and adversarial samples form distinct domains with unique batch statistics - a property we exploit for OOD detection. We introduce DisCoPatch, an unsupervised Adversarial Variational Autoencoder (VAE) framework that harnesses this mechanism. During inference, batches consist of patches from the same image, ensuring a consistent data distribution that allows the model to rely on batch statistics. DisCoPatch uses the VAE's suboptimal outputs (generated and reconstructed) as negative samples to train the discriminator, thereby improving its ability to delineate the boundary between in-distribution samples and covariate shifts. By tightening this boundary, DisCoPatch achieves state-of-the-art results in public OOD detection benchmarks. The proposed model not only excels in detecting covariate shifts, achieving 95.5% AUROC on ImageNet-1K(-C) but also outperforms all prior methods on public Near-OOD (95.0%) benchmarks. With a compact model size of 25MB, it achieves high OOD detection performance at notably lower latency than existing methods, making it an efficient and practical solution for real-world OOD detection applications. The code is publicly available.
comment: ICCV 2025
♻ ☆ Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling ICCV 2025
Masked Image Modeling (MIM) has become an essential method for building foundational visual models in remote sensing (RS). However, the limitations in size and diversity of existing RS datasets restrict the ability of MIM methods to learn generalizable representations. Additionally, conventional MIM techniques, which require reconstructing all tokens, introduce unnecessary computational overhead. To address these issues, we present a new pre-training pipeline for RS models, featuring the creation of a large-scale RS dataset and an efficient MIM approach. We curated a high-quality dataset named \textbf{OpticalRS-13M} by collecting publicly available RS datasets and processing them through exclusion, slicing, and deduplication. OpticalRS-13M comprises 13 million optical images covering various RS tasks, such as object detection and pixel segmentation. To enhance efficiency, we propose \textbf{SelectiveMAE}, a pre-training method that dynamically encodes and reconstructs semantically rich patch tokens, thereby reducing the inefficiencies of traditional MIM models caused by redundant background pixels in RS images. Extensive experiments show that OpticalRS-13M significantly improves classification, detection, and segmentation performance, while SelectiveMAE increases training efficiency over 2$\times$ times. This highlights the effectiveness and scalability of our pipeline in developing RS foundational models. The dataset, source code, and trained models will be released at https://github.com/MiliLab/SelectiveMAE.
comment: ICCV 2025
♻ ☆ OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Text-to-image (T2I) models have garnered significant attention for generating high-quality images aligned with text prompts. However, rapid T2I model advancements reveal limitations in early benchmarks, lacking comprehensive evaluations, for example, the evaluation on reasoning, text rendering and style. Notably, recent state-of-the-art models, with their rich knowledge modeling capabilities, show promising results on the image generation problems requiring strong reasoning ability, yet existing evaluation systems have not adequately addressed this frontier. To systematically address these gaps, we introduce OneIG-Bench, a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including prompt-image alignment, text rendering precision, reasoning-generated content, stylization, and diversity. By structuring the evaluation, this benchmark enables in-depth analysis of model performance, helping researchers and practitioners pinpoint strengths and bottlenecks in the full pipeline of image generation. Specifically, OneIG-Bench enables flexible evaluation by allowing users to focus on a particular evaluation subset. Instead of generating images for the entire set of prompts, users can generate images only for the prompts associated with the selected dimension and complete the corresponding evaluation accordingly. Our codebase and dataset are now publicly available to facilitate reproducible evaluation studies and cross-model comparisons within the T2I research community.
♻ ☆ Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. Unlike prior methods that require dense posed images or pose-embedded generative models limited to in-domain views, our method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images, and formulates novel-view synthesis as an inpainting task for both image and geometry. To ensure accurate alignment between generated images and geometry, we propose cross-modal attention distillation, where attention maps from the image diffusion branch are injected into a parallel geometry diffusion branch during both training and inference. This multi-task approach achieves synergistic effects, facilitating geometrically robust image synthesis as well as well-defined geometry prediction. We further introduce proximity-based mesh conditioning to integrate depth and normal cues, interpolating between point cloud and filtering erroneously predicted geometry from influencing the generation process. Empirically, our method achieves high-fidelity extrapolative view synthesis on both image and geometry across a range of unseen scenes, delivers competitive reconstruction quality under interpolation settings, and produces geometrically aligned colored point clouds for comprehensive 3D completion. Project page is available at https://cvlab-kaist.github.io/MoAI.
comment: Project page at https://cvlab-kaist.github.io/MoAI
♻ ☆ STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
The use of Multimodal Large Language Models (MLLMs) as an end-to-end solution for Embodied AI and Autonomous Driving has become a prevailing trend. While MLLMs have been extensively studied for visual semantic understanding tasks, their ability to perform precise and quantitative spatial-temporal understanding in real-world applications remains largely unexamined, leading to uncertain prospects. To evaluate models' Spatial-Temporal Intelligence, we introduce STI-Bench, a benchmark designed to evaluate MLLMs' spatial-temporal understanding through challenging tasks such as estimating and predicting the appearance, pose, displacement, and motion of objects. Our benchmark encompasses a wide range of robot and vehicle operations across desktop, indoor, and outdoor scenarios. The extensive experiments reveals that the state-of-the-art MLLMs still struggle in real-world spatial-temporal understanding, especially in tasks requiring precise distance estimation and motion analysis.
♻ ☆ Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception IROS 2025
Deep object pose estimators are notoriously overconfident. A grasping agent that both estimates the 6-DoF pose of a target object and predicts the uncertainty of its own estimate could avoid task failure by choosing not to act under high uncertainty. Even though object pose estimation improves and uncertainty quantification research continues to make strides, few studies have connected them to the downstream task of robotic grasping. We propose a method for training lightweight, deep networks to predict whether a grasp guided by an image-based pose estimate will succeed before that grasp is attempted. We generate training data for our networks via object pose estimation on real images and simulated grasping. We also find that, despite high object variability in grasping trials, networks benefit from training on all objects jointly, suggesting that a diverse variety of objects can nevertheless contribute to the same goal.
comment: Accepted to IROS 2025
♻ ☆ Tackling fluffy clouds: robust field boundary delineation across global agricultural landscapes with Sentinel-1 and Sentinel-2 Time Series
Accurate delineation of agricultural field boundaries is essential for effective crop monitoring and resource management. However, competing methodologies often face significant challenges, particularly in their reliance on extensive manual efforts for cloud-free data curation and limited adaptability to diverse global conditions. In this paper, we introduce PTAViT3D, a deep learning architecture specifically designed for processing three-dimensional time series of satellite imagery from either Sentinel-1 (S1) or Sentinel-2 (S2). Additionally, we present PTAViT3D-CA, an extension of the PTAViT3D model incorporating cross-attention mechanisms to fuse S1 and S2 datasets, enhancing robustness in cloud-contaminated scenarios. The proposed methods leverage spatio-temporal correlations through a memory-efficient 3D Vision Transformer architecture, facilitating accurate boundary delineation directly from raw, cloud-contaminated imagery. We comprehensively validate our models through extensive testing on various datasets, including Australia's ePaddocks - CSIRO's national agricultural field boundary product - alongside public benchmarks Fields-of-the-World, PASTIS, and AI4SmallFarms. Our results consistently demonstrate state-of-the-art performance, highlighting excellent global transferability and robustness. Crucially, our approach significantly simplifies data preparation workflows by reliably processing cloud-affected imagery, thereby offering strong adaptability across diverse agricultural environments. Our code and models are publicly available at https://github.com/feevos/tfcl.
comment: revision 1, under review
♻ ☆ Mr. DETR++: Instructive Multi-Route Training for Detection Transformers with Mixture-of-Experts CVPR 2025
Existing methods enhance the training of detection transformers by incorporating an auxiliary one-to-many assignment. In this work, we treat the model as a multi-task framework, simultaneously performing one-to-one and one-to-many predictions. We investigate the roles of each component in the transformer decoder across these two training targets, including self-attention, cross-attention, and feed-forward network. Our empirical results demonstrate that any independent component in the decoder can effectively learn both targets simultaneously, even when other components are shared. This finding leads us to propose a multi-route training mechanism, featuring a primary route for one-to-one prediction and two auxiliary training routes for one-to-many prediction. We propose a novel instructive self-attention mechanism, integrated into the first auxiliary route, which dynamically and flexibly guides object queries for one-to-many prediction. For the second auxiliary route, we introduce a route-aware Mixture-of-Experts (MoE) to facilitate knowledge sharing while mitigating potential conflicts between routes. Additionally, we apply an MoE to low-scale features in the encoder, optimizing the balance between efficiency and effectiveness. The auxiliary routes are discarded during inference. We conduct extensive experiments across various object detection baselines, achieving consistent improvements as demonstrated in Fig. 1. Our method is highly flexible and can be readily adapted to other tasks. To demonstrate its versatility, we conduct experiments on both instance segmentation and panoptic segmentation, further validating its effectiveness. Project page: https://visual-ai.github.io/mrdetr/
comment: Under review. Extended version of our CVPR 2025 paper, see arXiv:2412.10028v3
PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks
Black-box query-based attacks constitute significant threats to Machine Learning as a Service (MLaaS) systems since they can generate adversarial examples without accessing the target model's architecture and parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise the test accuracy of non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost. These models leverage the local implicit function and rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
♻ ☆ Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes
While the human visual system employs distinct mechanisms to perceive salient and camouflaged objects, existing models struggle to disentangle these tasks. Specifically, salient object detection (SOD) models frequently misclassify camouflaged objects as salient, while camouflaged object detection (COD) models conversely misinterpret salient objects as camouflaged. We hypothesize that this can be attributed to two factors: (i) the specific annotation paradigm of current SOD and COD datasets, and (ii) the lack of explicit attribute relationship modeling in current models. Prevalent SOD/COD datasets enforce a mutual exclusivity constraint, assuming scenes contain either salient or camouflaged objects, which poorly aligns with the real world. Furthermore, current SOD/COD methods are primarily designed for these highly constrained datasets and lack explicit modeling of the relationship between salient and camouflaged objects. In this paper, to promote the development of unconstrained salient and camouflaged object detection, we construct a large-scale dataset, USC12K, which features comprehensive labels and four different scenes that cover all possible logical existence scenarios of both salient and camouflaged objects. To explicitly model the relationship between salient and camouflaged objects, we propose a model called USCNet, which introduces two distinct prompt query mechanisms for modeling inter-sample and intra-sample attribute relationships. Additionally, to assess the model's ability to distinguish between salient and camouflaged objects, we design an evaluation metric called CSCS. The proposed method achieves state-of-the-art performance across all scenes in various metrics. The code and dataset will be available at https://github.com/ssecv/USCNet.
comment: 18 pages, 11 figures
♻ ☆ Recall and Refine: A Simple but Effective Source-free Open-set Domain Adaptation Framework
Open-set Domain Adaptation (OSDA) aims to adapt a model from a labeled source domain to an unlabeled target domain, where novel classes - also referred to as target-private unknown classes - are present. Source-free Open-set Domain Adaptation (SF-OSDA) methods address OSDA without accessing labeled source data, making them particularly relevant under privacy constraints. However, SF-OSDA presents significant challenges due to distribution shifts and the introduction of novel classes. Existing SF-OSDA methods typically rely on thresholding the prediction entropy of a sample to identify it as either a known or unknown class, but fail to explicitly learn discriminative features for the target-private unknown classes. We propose Recall and Refine (RRDA), a novel SF-OSDA framework designed to address these limitations by explicitly learning features for target-private unknown classes. RRDA employs a two-stage process. First, we enhance the model's capacity to recognize unknown classes by training a target classifier with an additional decision boundary,guided by synthetic samples generated from target domain features. This enables the classifier to effectively separate known and unknown classes. Second, we adapt the entire model to the target domain, addressing both domain shifts and distinguishability to unknown classes. Any off-the-shelf source-free domain adaptation method (e.g. SHOT, AaD) can be seamlessly integrated into our framework at this stage. Extensive experiments on three benchmark datasets demonstrate that RRDA significantly outperforms existing SF-OSDA and OSDA methods.
comment: Accepted at TMLR 2025
♻ ☆ Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Finding correspondences between semantically similar points across images and object instances is one of the everlasting challenges in computer vision. While large pre-trained vision models have recently been demonstrated as effective priors for semantic matching, they still suffer from ambiguities for symmetric objects or repeated object parts. We propose to improve semantic correspondence estimation via 3D-aware pseudo-labeling. Specifically, we train an adapter to refine off-the-shelf features using pseudo-labels obtained via 3D-aware chaining, filtering wrong labels through relaxed cyclic consistency, and 3D spherical prototype mapping constraints. While reducing the need for dataset specific annotations compared to prior work, we set a new state-of-the-art on SPair-71k by over 4% absolute gain and by over 7% against methods with similar supervision requirements. The generality of our proposed approach simplifies extension of training to other data sources, which we demonstrate in our experiments.
comment: Project page: https://genintel.github.io/DIY-SC
♻ ☆ Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance
Understanding medical ultrasound imaging remains a long-standing challenge due to significant visual variability caused by differences in imaging and acquisition parameters. Recent advancements in large language models (LLMs) have been used to automatically generate terminology-rich summaries orientated to clinicians with sufficient physiological knowledge. Nevertheless, the increasing demand for improved ultrasound interpretability and basic scanning guidance among non-expert users, e.g., in point-of-care settings, has not yet been explored. In this study, we first introduce the scene graph (SG) for ultrasound images to explain image content to ordinary and provide guidance for ultrasound scanning. The ultrasound SG is first computed using a transformer-based one-stage method, eliminating the need for explicit object detection. To generate a graspable image explanation for ordinary, the user query is then used to further refine the abstract SG representation through LLMs. Additionally, the predicted SG is explored for its potential in guiding ultrasound scanning toward missing anatomies within the current imaging view, assisting ordinary users in achieving more standardized and complete anatomical exploration. The effectiveness of this SG-based image explanation and scanning guidance has been validated on images from the left and right neck regions, including the carotid and thyroid, across five volunteers. The results demonstrate the potential of the method to maximally democratize ultrasound by enhancing its interpretability and usability for ordinaries.
♻ ☆ Enhancing Dynamic CT Image Reconstruction with Neural Fields and Optical Flow
In this paper, we investigate image reconstruction for dynamic Computed Tomography. The motion of the target with respect to the measurement acquisition rate leads to highly resolved in time but highly undersampled in space measurements. Such problems pose a major challenge: not accounting for the dynamics of the process leads to a poor reconstruction with non-realistic motion. Variational approaches that penalize time evolution have been proposed to relate subsequent frames and improve image quality based on classical grid-based discretizations. Neural fields have emerged as a novel way to parameterize the quantity of interest using a neural network with a low-dimensional input, benefiting from being lightweight, continuous, and biased towards smooth representations. The latter property has been exploited when solving dynamic inverse problems with neural fields by minimizing a data-fidelity term only. We investigate and show the benefits of introducing explicit motion regularizers for dynamic inverse problems based on partial differential equations, namely, the optical flow equation, for the optimization of neural fields. We compare it against its unregularized counterpart and show the improvements in the reconstruction. We also compare neural fields against a grid-based solver and show that the former outperforms the latter in terms of PSNR in this task.
♻ ☆ TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography
Music-driven dance generation has garnered significant attention due to its wide range of industrial applications, particularly in the creation of group choreography. During the group dance generation process, however, most existing methods still face three primary issues: multi-dancer collisions, single-dancer foot sliding and abrupt swapping in the generation of long group dance. In this paper, we propose TCDiff++, a music-driven end-to-end framework designed to generate harmonious group dance. Specifically, to mitigate multi-dancer collisions, we utilize a dancer positioning embedding to better maintain the relative positioning among dancers. Additionally, we incorporate a distance-consistency loss to ensure that inter-dancer distances remain within plausible ranges. To address the issue of single-dancer foot sliding, we introduce a swap mode embedding to indicate dancer swapping patterns and design a Footwork Adaptor to refine raw motion, thereby minimizing foot sliding. For long group dance generation, we present a long group diffusion sampling strategy that reduces abrupt position shifts by injecting positional information into the noisy input. Furthermore, we integrate a Sequence Decoder layer to enhance the model's ability to selectively process long sequences. Extensive experiments demonstrate that our TCDiff++ achieves state-of-the-art performance, particularly in long-duration scenarios, ensuring high-quality and coherent group dance generation.
♻ ☆ 3D Hierarchical Panoptic Segmentation in Real Orchard Environments Across Different Sensors IROS 2025
Crop yield estimation is a relevant problem in agriculture, because an accurate yield estimate can support farmers' decisions on harvesting or precision intervention. Robots can help to automate this process. To do so, they need to be able to perceive the surrounding environment to identify target objects such as trees and plants. In this paper, we introduce a novel approach to address the problem of hierarchical panoptic segmentation of apple orchards on 3D data from different sensors. Our approach is able to simultaneously provide semantic segmentation, instance segmentation of trunks and fruits, and instance segmentation of trees (a trunk with its fruits). This allows us to identify relevant information such as individual plants, fruits, and trunks, and capture the relationship among them, such as precisely estimate the number of fruits associated to each tree in an orchard. To efficiently evaluate our approach for hierarchical panoptic segmentation, we provide a dataset designed specifically for this task. Our dataset is recorded in Bonn, Germany, in a real apple orchard with a variety of sensors, spanning from a terrestrial laser scanner to a RGB-D camera mounted on different robots platforms. The experiments show that our approach surpasses state-of-the-art approaches in 3D panoptic segmentation in the agricultural domain, while also providing full hierarchical panoptic segmentation. Our dataset is publicly available at https://www.ipb.uni-bonn.de/data/hops/. The open-source implementation of our approach is available at https://github.com/PRBonn/hapt3D.
comment: Accepted to IROS 2025
♻ ☆ Cell Tracking according to Biological Needs -- Strong Mitosis-aware Multi-Hypothesis Tracker with Aleatoric Uncertainty
Cell tracking and segmentation assist biologists in extracting insights from large-scale microscopy time-lapse data. Driven by local accuracy metrics, current tracking approaches often suffer from a lack of long-term consistency and the ability to reconstruct lineage trees correctly. To address this issue, we introduce an uncertainty estimation technique for motion estimation frameworks and extend the multi-hypothesis tracking framework. Our uncertainty estimation lifts motion representations into probabilistic spatial densities using problem-specific test-time augmentations. Moreover, we introduce a novel mitosis-aware assignment problem formulation that allows multi-hypothesis trackers to model cell splits and to resolve false associations and mitosis detections based on long-term conflicts. In our framework, explicit biological knowledge is modeled in assignment costs. We evaluate our approach on nine competitive datasets and demonstrate that we outperform the current state-of-the-art on biologically inspired metrics substantially, achieving improvements by a factor of approximately 6 and uncover new insights into the behavior of motion estimation uncertainty.
comment: 13 pages, 4 figures, 4 tables. This work has been accepted to the IEEE for publication
♻ ☆ SA-Person: Text-Based Person Retrieval with Scene-aware Re-ranking
Text-based person retrieval aims to identify a target individual from a gallery of images based on a natural language description. It presents a significant challenge due to the complexity of real-world scenes and the ambiguity of appearance-related descriptions. Existing methods primarily emphasize appearance-based cross-modal retrieval, often neglecting the contextual information embedded within the scene, which can offer valuable complementary insights for retrieval. To address this, we introduce SCENEPERSON-13W, a large-scale dataset featuring over 100,000 scenes with rich annotations covering both pedestrian appearance and environmental cues. Based on this, we propose SA-Person, a two-stage retrieval framework. In the first stage, it performs discriminative appearance grounding by aligning textual cues with pedestrian-specific regions. In the second stage, it introduces SceneRanker, a training-free, scene-aware re-ranking method leveraging multimodal large language models to jointly reason over pedestrian appearance and the global scene context. Experiments on SCENEPERSON-13W validate the effectiveness of our framework in challenging scene-level retrieval scenarios. The code and dataset will be made publicly available.
comment: 22 pages, 7 figures. Under review
♻ ☆ Variational Supervised Contrastive Learning
Contrastive learning has proven to be highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) Without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide pair selection, and (2) excessive reliance on large in-batch negatives and tailored augmentations hinders generalization. To address these limitations, we propose Variational Supervised Contrastive Learning (VarCon), which reformulates supervised contrastive learning as variational inference over latent class variables and maximizes a posterior-weighted evidence lower bound (ELBO) that replaces exhaustive pair-wise comparisons for efficient class-aware matching and grants fine-grained control over intra-class dispersion in the embedding space. Trained exclusively on image data, our experiments on CIFAR-10, CIFAR-100, ImageNet-100, and ImageNet-1K show that VarCon (1) achieves state-of-the-art performance for contrastive learning frameworks, reaching 79.36% Top-1 accuracy on ImageNet-1K and 78.29% on CIFAR-100 with a ResNet-50 encoder while converging in just 200 epochs; (2) yields substantially clearer decision boundaries and semantic organization in the embedding space, as evidenced by KNN classification, hierarchical clustering results, and transfer-learning assessments; and (3) demonstrates superior performance in few-shot learning than supervised baseline and superior robustness across various augmentation strategies.
♻ ☆ Structure-Preserving Patch Decoding for Efficient Neural Video Representation
Implicit neural representations (INRs) are the subject of extensive research, particularly in their application to modeling complex signals by mapping spatial and temporal coordinates to corresponding values. When handling videos, mapping compact inputs to entire frames or spatially partitioned patch images is an effective approach. This strategy better preserves spatial relationships, reduces computational overhead, and improves reconstruction quality compared to coordinate-based mapping. However, predicting entire frames often limits the reconstruction of high-frequency visual details. Additionally, conventional patch-based approaches based on uniform spatial partitioning tend to introduce boundary discontinuities that degrade spatial coherence. We propose a neural video representation method based on Structure-Preserving Patches (SPPs) to address such limitations. Our method separates each video frame into patch images of spatially aligned frames through a deterministic pixel-based splitting similar to PixelUnshuffle. This operation preserves the global spatial structure while allowing patch-level decoding. We train the decoder to reconstruct these structured patches, enabling a global-to-local decoding strategy that captures the global layout first and refines local details. This effectively reduces boundary artifacts and mitigates distortions from naive upsampling. Experiments on standard video datasets demonstrate that our method achieves higher reconstruction quality and better compression performance than existing INR-based baselines.
♻ ☆ StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
World models have recently become promising tools for predicting realistic visuals based on actions in complex environments. However, their reliance on only a few recent observations leads them to lose track of the long-term context. Consequently, in just a few steps the generated scenes drift from what was previously observed, undermining the temporal coherence of the sequence. This limitation of the state-of-the-art world models, most of which rely on diffusion, comes from their lack of a lasting environment state. To address this problem, we introduce StateSpaceDiffuser, where a diffusion model is enabled to perform long-context tasks by integrating features from a state-space model, representing the entire interaction history. This design restores long-term memory while preserving the high-fidelity synthesis of diffusion models. To rigorously measure temporal consistency, we develop an evaluation protocol that probes a model's ability to reinstantiate seen content in extended rollouts. Comprehensive experiments show that StateSpaceDiffuser significantly outperforms a strong diffusion-only baseline, maintaining a coherent visual context for an order of magnitude more steps. It delivers consistent views in both a 2D maze navigation and a complex 3D environment. These results establish that bringing state-space representations into diffusion models is highly effective in demonstrating both visual details and long-term memory.
♻ ☆ Moderating the Generalization of Score-based Generative Model
Score-based Generative Models (SGMs) have demonstrated remarkable generalization abilities, e.g. generating unseen, but natural data. However, the greater the generalization power, the more likely the unintended generalization, and the more dangerous the abuse. Research on moderated generalization in SGMs remains limited. To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find it does not work in SGMs. Further analysis of score functions reveals that the MU 'gold standard' does not alter the original score function, which explains its ineffectiveness. Based on this insight, we propose the first Moderated Score-based Generative Model (MSGM), which introduces a novel score adjustment strategy that redirects the score function away from undesirable data during the continuous-time stochastic differential equation process. Extensive experimental results demonstrate that MSGM significantly reduces the likelihood of generating undesirable content while preserving high visual quality for normal image generation. Albeit designed for SGMs, MSGM is a general and flexible MU framework that is compatible with diverse diffusion architectures (SGM and DDPM) and training strategies (re-training and fine-tuning), and enables zero-shot transfer of the pre-trained models to downstream tasks, e.g. image inpainting and reconstruction. The code will be shared upon acceptance.
♻ ☆ Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Recent advancements in large language models (LLMs) have witnessed a surge in the development of advanced reasoning paradigms, which are now being integrated into multimodal large language models (MLLMs). However, existing approaches often fall short: methods solely employing reinforcement learning (RL) can struggle with sample inefficiency and activating entirely absent reasoning capabilities, while conventional pipelines that initiate with a cold-start supervised fine-tuning (SFT) phase before RL may restrict the model's exploratory capacity and face suboptimal convergence. In this work, we introduce \textbf{Metis-RISE} (\textbf{R}L \textbf{I}ncentivizes and \textbf{S}FT \textbf{E}nhances) for multimodal reasoning model learning. Unlike conventional approaches, Metis-RISE distinctively omits an initial SFT stage, beginning instead with an RL phase (e.g., using a Group Relative Policy Optimization variant) to incentivize and activate the model's latent reasoning capacity. Subsequently, the targeted SFT stage addresses two key challenges identified during RL: (1) \textit{inefficient trajectory sampling} for tasks where the model possesses but inconsistently applies correct reasoning, which we tackle using self-distilled reasoning trajectories from the RL model itself; and (2) \textit{fundamental capability absence}, which we address by injecting expert-augmented knowledge for prompts where the model entirely fails. This strategic application of RL for incentivization followed by SFT for enhancement forms the core of Metis-RISE, leading to two versions of our MLLMs (7B and 72B parameters). Evaluations on the OpenCompass Multimodal Reasoning Leaderboard demonstrate that both models achieve state-of-the-art performance among similar-sized models, with the 72B version ranking fourth overall. Please refer to our project page for open-source information.
comment: Project Page: https://github.com/MM-Thinking/Metis-RISE
♻ ☆ Self-Regulated Neurogenesis for Online Data-Incremental Learning
Neural networks often struggle with catastrophic forgetting when learning sequences of tasks or data streams, unlike humans who can continuously learn and consolidate new concepts even in the absence of explicit cues. Online data-incremental learning seeks to emulate this capability by processing each sample only once, without having access to task or stream cues at any point in time since this is more realistic compared to offline setups, where all data from novel class(es) is assumed to be readily available. However, existing methods typically rely on storing the subsets of data in memory or expanding the initial model architecture, resulting in significant computational overhead. Drawing inspiration from 'self-regulated neurogenesis'-brain's mechanism for creating specialized regions or circuits for distinct functions-we propose a novel approach SERENA which encodes each concept in a specialized network path called 'concept cell', integrated into a single over-parameterized network. Once a concept is learned, its corresponding concept cell is frozen, effectively preventing the forgetting of previously acquired information. Furthermore, we introduce two new continual learning scenarios that more closely reflect real-world conditions, characterized by gradually changing sample sizes. Experimental results show that our method not only establishes new state-of-the-art results across ten benchmarks but also remarkably surpasses offline supervised batch learning performance. The code is available at https://github.com/muratonuryildirim/serena.
comment: Published at Conference on Lifelong Learning Agents (CoLLAs) 2025
♻ ☆ Referring Expression Instance Retrieval and A Strong End-to-End Baseline
Using natural language to query visual information is a fundamental need in real-world applications. Text-Image Retrieval (TIR) retrieves a target image from a gallery based on an image-level description, while Referring Expression Comprehension (REC) localizes a target object within a given image using an instance-level description. However, real-world applications often present more complex demands. Users typically query an instance-level description across a large gallery and expect to receive both relevant image and the corresponding instance location. In such scenarios, TIR struggles with fine-grained descriptions and object-level localization, while REC is limited in its ability to efficiently search large galleries and lacks an effective ranking mechanism. In this paper, we introduce a new task called \textbf{Referring Expression Instance Retrieval (REIR)}, which supports both instance-level retrieval and localization based on fine-grained referring expressions. First, we propose a large-scale benchmark for REIR, named REIRCOCO, constructed by prompting advanced vision-language models to generate high-quality referring expressions for instances in the MSCOCO and RefCOCO datasets. Second, we present a baseline method, Contrastive Language-Instance Alignment with Relation Experts (CLARE), which employs a dual-stream architecture to address REIR in an end-to-end manner. Given a referring expression, the textual branch encodes it into a query embedding. The visual branch detects candidate objects and extracts their instance-level visual features. The most similar candidate to the query is selected for bounding box prediction. CLARE is first trained on object detection and REC datasets to establish initial grounding capabilities, then optimized via Contrastive Language-Instance Alignment (CLIA) for improved retrieval across images. We will release our code and benchmark publicly.
♻ ☆ ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection IROS 2025
Vision-based Bird's-Eye-View (BEV) 3D object detection has recently become popular in autonomous driving. However, objects with a high similarity to the background from a camera perspective cannot be detected well by existing methods. In this paper, we propose a BEV-based 3D Object Detection Network with 2D Region-Oriented Attention (ROA-BEV), which enables the backbone to focus more on feature learning of the regions where objects exist. Moreover, our method further enhances the information feature learning ability of ROA through multi-scale structures. Each block of ROA utilizes a large kernel to ensure that the receptive field is large enough to catch information about large objects. Experiments on nuScenes show that ROA-BEV improves the performance based on BEVDepth. The source codes of this work will be available at https://github.com/DFLyan/ROA-BEV.
comment: accepted by IROS 2025
♻ ☆ Is my Data in your AI Model? Membership Inference Test with Application to Face Images
This article introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess if given data was used during the training of AI/ML models. Specifically, we propose two MINT architectures designed to learn the distinct activation patterns that emerge when an Audited Model is exposed to data used during its training process. These architectures are based on Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). The experimental framework focuses on the challenging task of Face Recognition, considering three state-of-the-art Face Recognition systems. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total. Different experimental scenarios are considered depending on the context of the AI model to test. Our proposed MINT approach achieves promising results, with up to 90\% accuracy, indicating the potential to recognize if an AI model has been trained with specific data. The proposed MINT approach can serve to enforce privacy and fairness in several AI applications, e.g., revealing if sensitive or private data was used for training or tuning Large Language Models (LLMs).
comment: 26 pages main text and 2 pages appendix
♻ ☆ HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis
Pathology is essential for cancer diagnosis, with multiple instance learning (MIL) widely used for whole slide image (WSI) analysis. WSIs exhibit a natural hierarchy -- patches, regions, and slides -- with distinct semantic associations. While some methods attempt to leverage this hierarchy for improved representation, they predominantly rely on Euclidean embeddings, which struggle to fully capture semantic hierarchies. To address this limitation, we propose HyperPath, a novel method that integrates knowledge from textual descriptions to guide the modeling of semantic hierarchies of WSIs in hyperbolic space, thereby enhancing WSI classification. Our approach adapts both visual and textual features extracted by pathology vision-language foundation models to the hyperbolic space. We design an Angular Modality Alignment Loss to ensure robust cross-modal alignment, while a Semantic Hierarchy Consistency Loss further refines feature hierarchies through entailment and contradiction relationships and thus enhance semantic coherence. The classification is performed with geodesic distance, which measures the similarity between entities in the hyperbolic semantic hierarchy. This eliminates the need for linear classifiers and enables a geometry-aware approach to WSI analysis. Extensive experiments show that our method achieves superior performance across tasks compared to existing methods, highlighting the potential of hyperbolic embeddings for WSI analysis.
♻ ☆ HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ICCV 2025
Long-form video understanding presents unique challenges that extend beyond traditional short-video analysis approaches, particularly in capturing long-range dependencies, processing redundant information efficiently, and extracting high-level semantic concepts. To address these challenges, we propose a novel approach that more accurately reflects human cognition. This paper introduces HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics, featuring two versatile modules that can enhance existing video-language models or operate as a standalone system. Our Episodic COmpressor (ECO) efficiently aggregates representations from micro to semi-macro levels, reducing computational overhead while preserving temporal dependencies. Our Semantics ReTRiever (SeTR) enriches these representations with semantic information by focusing on broader context, dramatically reducing feature dimensionality while preserving relevant macro-level information. We demonstrate that these modules can be seamlessly integrated into existing SOTA models, consistently improving their performance while reducing inference latency by up to 43% and memory usage by 46%. As a standalone system, HERMES achieves state-of-the-art performance across multiple long-video understanding benchmarks in both zero-shot and fully-supervised settings.
comment: Accepted for ICCV 2025. Project page: https://joslefaure.github.io/assets/html/hermes.html
♻ ☆ ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring ICCV 2025
Motion deblurring addresses the challenge of image blur caused by camera or scene movement. Event cameras provide motion information that is encoded in the asynchronous event streams. To efficiently leverage the temporal information of event streams, we employ Spiking Neural Networks (SNNs) for motion feature extraction and Artificial Neural Networks (ANNs) for color information processing. Due to the non-uniform distribution and inherent redundancy of event data, existing cross-modal feature fusion methods exhibit certain limitations. Inspired by the visual attention mechanism in the human visual system, this study introduces a bioinspired dual-drive hybrid network (BDHNet). Specifically, the Neuron Configurator Module (NCM) is designed to dynamically adjusts neuron configurations based on cross-modal features, thereby focusing the spikes in blurry regions and adapting to varying blurry scenarios dynamically. Additionally, the Region of Blurry Attention Module (RBAM) is introduced to generate a blurry mask in an unsupervised manner, effectively extracting motion clues from the event features and guiding more accurate cross-modal feature fusion. Extensive subjective and objective evaluations demonstrate that our method outperforms current state-of-the-art methods on both synthetic and real-world datasets.
comment: Accepted by ICCV 2025
♻ ☆ ToMiE: Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars
In this paper, we highlight a critical yet often overlooked factor in most 3D human tasks, namely modeling complicated 3D human with with hand-held objects or loose-fitting clothing. It is known that the parameterized formulation of SMPL is able to fit human skin; while hand-held objects and loose-fitting clothing, are difficult to get modeled within the unified framework, since their movements are usually decoupled with the human body. To enhance the capability of SMPL skeleton in response to this situation, we propose a growth strategy that enables the joint tree of the skeleton to expand adaptively. Specifically, our method, called ToMiE, consists of parent joints localization and external joints optimization. For parent joints localization, we employ a gradient-based approach guided by both LBS blending weights and motion kernels. Once the external joints are obtained, we proceed to optimize their transformations in SE(3) across different frames, enabling rendering and explicit animation. ToMiE manages to outperform other methods across various cases with hand-held objects and loose-fitting clothing, not only in rendering quality but also by offering free animation of grown joints, thereby enhancing the expressive ability of SMPL skeleton for a broader range of applications.
♻ ☆ RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS ICCV 2025
3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle with accurately modeling scenes affected by transient objects, leading to artifacts in the rendered images. We identify that the Gaussian densification process, while enhancing scene detail capture, unintentionally contributes to these artifacts by growing additional Gaussians that model transient disturbances. To address this, we propose RobustSplat, a robust solution based on two critical designs. First, we introduce a delayed Gaussian growth strategy that prioritizes optimizing static scene structure before allowing Gaussian splitting/cloning, mitigating overfitting to transient objects in early optimization. Second, we design a scale-cascaded mask bootstrapping approach that first leverages lower-resolution feature similarity supervision for reliable initial transient mask estimation, taking advantage of its stronger semantic consistency and robustness to noise, and then progresses to high-resolution supervision to achieve more precise mask prediction. Extensive experiments on multiple challenging datasets show that our method outperforms existing methods, clearly demonstrating the robustness and effectiveness of our method. Our project page is https://fcyycf.github.io/RobustSplat/.
comment: ICCV 2025. Project page: https://fcyycf.github.io/RobustSplat/
♻ ☆ 2D Triangle Splatting for Direct Differentiable Mesh Training
Differentiable rendering with 3D Gaussian primitives has emerged as a powerful method for reconstructing high-fidelity 3D scenes from multi-view images. While it offers improvements over NeRF-based methods, this representation still encounters challenges with rendering speed and advanced rendering effects, such as relighting and shadow rendering, compared to mesh-based models. In this paper, we propose 2D Triangle Splatting (2DTS), a novel method that replaces 3D Gaussian primitives with 2D triangle facelets. This representation naturally forms a discrete mesh-like structure while retaining the benefits of continuous volumetric modeling. By incorporating a compactness parameter into the triangle primitives, we enable direct training of photorealistic meshes. Our experimental results demonstrate that our triangle-based method, in its vanilla version (without compactness tuning), achieves higher fidelity compared to state-of-the-art Gaussian-based methods. Furthermore, our approach produces reconstructed meshes with superior visual quality compared to existing mesh reconstruction methods. Please visit our project page at https://gaoderender.github.io/triangle-splatting.
comment: 13 pages, 8 figures
♻ ☆ High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight CVPR2025
Semantic segmentation from RGB cameras is essential to the perception of autonomous flying vehicles. The stability of predictions through the captured videos is paramount to their reliability and, by extension, to the trustworthiness of the agents. In this paper, we propose a lightweight video semantic segmentation approach-suited to onboard real-time inference-achieving high temporal consistency on aerial data through Semantic Similarity Propagation across frames. SSP temporally propagates the predictions of an efficient image segmentation model with global registration alignment to compensate for camera movements. It combines the current estimation and the prior prediction with linear interpolation using weights computed from the features similarities of the two frames. Because data availability is a challenge in this domain, we propose a consistency-aware Knowledge Distillation training procedure for sparsely labeled datasets with few annotations. Using a large image segmentation model as a teacher to train the efficient SSP, we leverage the strong correlations between labeled and unlabeled frames in the same training videos to obtain high-quality supervision on all frames. KD-SSP obtains a significant temporal consistency increase over the base image segmentation model of 12.5% and 6.7% TC on UAVid and RuralScapes respectively, with higher accuracy and comparable inference speed. On these aerial datasets, KD-SSP provides a superior segmentation quality and inference speed trade-off than other video methods proposed for general applications and shows considerably higher consistency. Project page: https://github.com/FraunhoferIVI/SSP.
comment: Accepted by CVPR2025
♻ ☆ CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
We introduce CREStE, a scalable learning-based mapless navigation framework to address the open-world generalization and robustness challenges of outdoor urban navigation. Key to achieving this is learning perceptual representations that generalize to open-set factors (e.g. novel semantic classes, terrains, dynamic entities) and inferring expert-aligned navigation costs from limited demonstrations. CREStE addresses both these issues, introducing 1) a visual foundation model (VFM) distillation objective for learning open-set structured bird's-eye-view perceptual representations, and 2) counterfactual inverse reinforcement learning (IRL), a novel active learning formulation that uses counterfactual trajectory demonstrations to reason about the most important cues when inferring navigation costs. We evaluate CREStE on the task of kilometer-scale mapless navigation in a variety of city, offroad, and residential environments and find that it outperforms all state-of-the-art approaches with 70% fewer human interventions, including a 2-kilometer mission in an unseen environment with just 1 intervention; showcasing its robustness and effectiveness for long-horizon mapless navigation. Videos and additional materials can be found on the project page: https://amrl.cs.utexas.edu/creste
comment: 18 pages, 10 figures, 5 tables
♻ ☆ Generate the Forest before the Trees -- A Hierarchical Diffusion model for Climate Downscaling
Downscaling is essential for generating the high-resolution climate data needed for local planning, but traditional methods remain computationally demanding. Recent years have seen impressive results from AI downscaling models, particularly diffusion models, which have attracted attention due to their ability to generate ensembles and overcome the smoothing problem common in other AI methods. However, these models typically remain computationally intensive. We introduce a Hierarchical Diffusion Downscaling (HDD) model, which introduces an easily-extensible hierarchical sampling process to the diffusion framework. A coarse-to-fine hierarchy is imposed via a simple downsampling scheme. HDD achieves competitive accuracy on ERA5 reanalysis datasets and CMIP6 models, significantly reducing computational load by running on up to half as many pixels with competitive results. Additionally, a single model trained at 0.25{\deg} resolution transfers seamlessly across multiple CMIP6 models with much coarser resolution. HDD thus offers a lightweight alternative for probabilistic climate downscaling, facilitating affordable large-ensemble high-resolution climate projections. See a full code implementation at: https://github.com/HDD-Hierarchical-Diffusion-Downscaling/HDD-Hierarchical-Diffusion-Downscaling.
comment: 8 pages
♻ ☆ A Multi-Source Data Fusion-based Semantic Segmentation Model for Relic Landslide Detection
As a natural disaster, landslide often brings tremendous losses to human lives, so it urgently demands reliable detection of landslide risks. When detecting relic landslides that present important information for landslide risk warning, problems such as visual blur and small-sized dataset cause great challenges when using remote sensing images. To extract accurate semantic features, a hyper-pixel-wise contrastive learning augmented segmentation network (HPCL-Net) is proposed, which augments the local salient feature extraction from boundaries of landslides through HPCL and fuses heterogeneous information in the semantic space from high-resolution remote sensing images and digital elevation model data. For full utilization of precious samples, a global hyper-pixel-wise sample pair queues-based contrastive learning method is developed, which includes the construction of global queues that store hyper-pixel-wise samples and the updating scheme of a momentum encoder, reliably enhancing the extraction ability of semantic features. The proposed HPCL-Net is evaluated on the Loess Plateau relic landslide dataset and experimental results verify that the proposed HPCL-Net greatly outperforms existing models, where the mIoU is increased from 0.620 to 0.651, the Landslide IoU is improved from 0.334 to 0.394 and the F1score is enhanced from 0.501 to 0.565.
♻ ☆ Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion ICCV 2025
Ultra-high-definition (UHD) image restoration often faces computational bottlenecks and information loss due to its extremely high resolution. Existing studies based on Variational Autoencoders (VAE) improve efficiency by transferring the image restoration process from pixel space to latent space. However, degraded components are inherently coupled with background elements in degraded images, both information loss during compression and information gain during compensation remain uncontrollable. These lead to restored images often exhibiting image detail loss and incomplete degradation removal. To address this issue, we propose a Controlled Differential Disentangled VAE, which utilizes Hierarchical Contrastive Disentanglement Learning and an Orthogonal Gated Projection Module to guide the VAE to actively discard easily recoverable background information while encoding more difficult-to-recover degraded information into the latent space. Additionally, we design a Complex Invertible Multiscale Fusion Network to handle background features, ensuring their consistency, and utilize a latent space restoration network to transform the degraded latent features, leading to more accurate restoration results. Extensive experimental results demonstrate that our method effectively alleviates the information loss problem in VAE models while ensuring computational efficiency, significantly improving the quality of UHD image restoration, and achieves state-of-the-art results in six UHD restoration tasks with only 1M parameters.
comment: Accepted by ICCV 2025
♻ ☆ JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers ICCV
We present JointDiT, a diffusion transformer that models the joint distribution of RGB and depth. By leveraging the architectural benefit and outstanding image prior of the state-of-the-art diffusion transformer, JointDiT not only generates high-fidelity images but also produces geometrically plausible and accurate depth maps. This solid joint distribution modeling is achieved through two simple yet effective techniques that we propose, i.e., adaptive scheduling weights, which depend on the noise levels of each modality, and the unbalanced timestep sampling strategy. With these techniques, we train our model across all noise levels for each modality, enabling JointDiT to naturally handle various combinatorial generation tasks, including joint generation, depth estimation, and depth-conditioned image generation by simply controlling the timestep of each branch. JointDiT demonstrates outstanding joint generation performance. Furthermore, it achieves comparable results in depth estimation and depth-conditioned image generation, suggesting that joint distribution modeling can serve as a replaceable alternative to conditional generation. The project page is available at https://byungki-k.github.io/JointDiT/.
comment: Accepted to IEEE/CVF International Conference on Computer Vision (ICCV) 2025. Project page: https://byungki-k.github.io/JointDiT/ Code: https://github.com/ByungKi-K/JointDiT-code
♻ ☆ HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes ICCV
3DGS is an emerging and increasingly popular technology in the field of novel view synthesis. Its highly realistic rendering quality and real-time rendering capabilities make it promising for various applications. However, when applied to large-scale aerial urban scenes, 3DGS methods suffer from issues such as excessive memory consumption, slow training times, prolonged partitioning processes, and significant degradation in rendering quality due to the increased data volume. To tackle these challenges, we introduce \textbf{HUG}, a novel approach that enhances data partitioning and reconstruction quality by leveraging a hierarchical neural Gaussian representation. We first propose a visibility-based data partitioning method that is simple yet highly efficient, significantly outperforming existing methods in speed. Then, we introduce a novel hierarchical weighted training approach, combined with other optimization strategies, to substantially improve reconstruction quality. Our method achieves state-of-the-art results on one synthetic dataset and four real-world datasets.
comment: An improved version has recently been accepted to ICCV, manuscript, not camera-ready
♻ ☆ ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model
Speech-driven 3D facial animation aims to generate realistic lip movements and facial expressions for 3D head models from arbitrary audio clips. Although existing diffusion-based methods are capable of producing natural motions, their slow generation speed limits their application potential. In this paper, we introduce a novel autoregressive model that achieves real-time generation of highly synchronized lip movements and realistic head poses and eye blinks by learning a mapping from speech to a multi-scale motion codebook. Furthermore, our model can adapt to unseen speaking styles, enabling the creation of 3D talking avatars with unique personal styles beyond the identities seen during training. Extensive evaluations and user studies demonstrate that our method outperforms existing approaches in lip synchronization accuracy and perceived quality.
comment: More video demonstrations, code, models and data can be found on our project website: http://xg-chu.site/project_artalk/
♻ ☆ Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model
In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect due to privacy concerns and labor consumption. Text-guided video generation (T2V) emerges as a promising solution to overcome this issue by generating ophthalmic surgical videos based on surgeon instructions. In this paper, we present Ophora, a pioneering model that can generate ophthalmic surgical videos following natural language instructions. To construct Ophora, we first propose a Comprehensive Data Curation pipeline to convert narrative ophthalmic surgical videos into a large-scale, high-quality dataset comprising over 160K video-instruction pairs, Ophora-160K. Then, we propose a Progressive Video-Instruction Tuning scheme to transfer rich spatial-temporal knowledge from a T2V model pre-trained on natural video-text datasets for privacy-preserved ophthalmic surgical video generation based on Ophora-160K. Experiments on video quality evaluation via quantitative analysis and ophthalmologist feedback demonstrate that Ophora can generate realistic and reliable ophthalmic surgical videos based on surgeon instructions. We also validate the capability of Ophora for empowering downstream tasks of ophthalmic surgical workflow understanding. Code is available at https://github.com/mar-cry/Ophora.
comment: Early accepted in MICCAI25
♻ ☆ Efficient Image Generation with Variadic Attention Heads CVPR
While the integration of transformers in vision models have yielded significant improvements on vision tasks they still require significant amounts of computation for both training and inference. Restricted attention mechanisms significantly reduce these computational burdens but come at the cost of losing either global or local coherence. We propose a simple, yet powerful method to reduce these trade-offs: allow the attention heads of a single transformer to attend to multiple receptive fields. We demonstrate our method utilizing Neighborhood Attention (NA) and integrate it into a StyleGAN based architecture for image generation. With this work, dubbed StyleNAT, we are able to achieve a FID of 2.05 on FFHQ, a 6% improvement over StyleGAN-XL, while utilizing 28% fewer parameters and with 4$\times$ the throughput capacity. StyleNAT achieves the Pareto Frontier on FFHQ-256 and demonstrates powerful and efficient image generation on other datasets. Our code and model checkpoints are publicly available at: https://github.com/SHI-Labs/StyleNAT
comment: Published in eLVM @ CVPR (https://openaccess.thecvf.com/content/CVPR2025W/eLVM/html/Walton_Efficient_Image_Generation_with_Variadic_Attention_Heads_CVPRW_2025_paper) | Formerly named StyleNAT: Giving Each Head a New Perspective |
Machine Learning 150
☆ Whole-Body Conditioned Egocentric Video Prediction
We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model's embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human.
comment: Project Page: https://dannytran123.github.io/PEVA
☆ mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale
Multivariate time series anomaly detection (MTS-AD) is critical in domains like healthcare, cybersecurity, and industrial monitoring, yet remains challenging due to complex inter-variable dependencies, temporal dynamics, and sparse anomaly labels. We introduce mTSBench, the largest benchmark to date for MTS-AD and unsupervised model selection, spanning 344 labeled time series across 19 datasets and 12 diverse application domains. mTSBench evaluates 24 anomaly detection methods, including large language model (LLM)-based detectors for multivariate time series, and systematically benchmarks unsupervised model selection techniques under standardized conditions. Consistent with prior findings, our results confirm that no single detector excels across datasets, underscoring the importance of model selection. However, even state-of-the-art selection methods remain far from optimal, revealing critical gaps. mTSBench provides a unified evaluation suite to enable rigorous, reproducible comparisons and catalyze future advances in adaptive anomaly detection and robust model selection.
☆ Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Grokking, i.e., test performance keeps improving long after training loss converged, has been recently witnessed in neural network training, making the mechanism of generalization and other emerging capabilities such as reasoning mysterious. While prior studies usually train small models on a few toy or highly-specific tasks for thousands of epochs, we conduct the first study of grokking on checkpoints during one-pass pretraining of a 7B large language model (LLM), i.e., OLMoE. We compute the training loss and evaluate generalization on diverse benchmark tasks, including math reasoning, code generation, and commonsense/domain-specific knowledge retrieval tasks. Our study, for the first time, verifies that grokking still happens in the pretraining of large-scale foundation models, though different data may enter grokking stages asynchronously. We further demystify grokking's "emergence of generalization" by investigating LLM internal dynamics. Specifically, we find that training samples' pathways (i.e., expert choices across layers) evolve from random, instance-specific to more structured and shareable between samples during grokking. Also, the complexity of a sample's pathway reduces despite the converged loss. These indicate a memorization-to-generalization conversion, providing a mechanistic explanation of delayed generalization. In the study, we develop two novel metrics to quantify pathway distance and the complexity of a single pathway. We show their ability to predict the generalization improvement on diverse downstream tasks. They are efficient, simple to compute and solely dependent on training data. Hence, they have practical value for pretraining, enabling us to monitor the generalization performance without finetuning and test. Theoretically, we show that more structured pathways reduce model complexity and improve the generalization bound.
☆ HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.
comment: Project webpage: https://plan-lab.github.io/hallusegbench/
☆ Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent semantics of each sample, but struggle to capture nuanced and diverse relationships that can exist across modalities. Set-based approaches, which represent each sample with multiple embeddings, offer a promising alternative, as they can capture richer and more diverse relationships. In this paper, we show that, despite their promise, these set-based representations continue to face issues including sparse supervision and set collapse, which limits their effectiveness. To address these challenges, we propose Maximal Pair Assignment Similarity to optimize one-to-one matching between embedding sets which preserve semantic diversity within the set. We also introduce two loss functions to further enhance the representations: Global Discriminative Loss to enhance distinction among embeddings, and Intra-Set Divergence Loss to prevent collapse within each set. Our method achieves state-of-the-art performance on MS-COCO and Flickr30k without relying on external data.
comment: Accepted at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025 Main)
☆ Exploring the Design Space of 3D MLLMs for CT Report Generation
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10\%, achieving the 2nd place on the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of LLM under the same training protocol. We also show that larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
☆ Gaussian Invariant Markov Chain Monte Carlo
We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy to use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high dimensional targets in latent Gaussian models where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
comment: 29, 2 figures
☆ skLEP: A Slovak General Language Understanding Benchmark
In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we also release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hopes of fostering reproducibility and drive future research in Slovak NLU.
comment: ACL 2025 Findings
☆ Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems
Fault diagnosis in Cyber-Physical Systems (CPSs) is essential for ensuring system dependability and operational efficiency by accurately detecting anomalies and identifying their root causes. However, the manual modeling of faulty behaviors often demands extensive domain expertise and produces models that are complex, error-prone, and difficult to interpret. To address this challenge, we present a novel unsupervised fault diagnosis methodology that integrates collective anomaly detection in multivariate time series, process mining, and stochastic simulation. Initially, collective anomalies are detected from low-level sensor data using multivariate time-series analysis. These anomalies are then transformed into structured event logs, enabling the discovery of interpretable process models through process mining. By incorporating timing distributions into the extracted Petri nets, the approach supports stochastic simulation of faulty behaviors, thereby enhancing root cause analysis and behavioral understanding. The methodology is validated using the Robotic Arm Dataset (RoAD), a widely recognized benchmark in smart manufacturing. Experimental results demonstrate its effectiveness in modeling, simulating, and classifying faulty behaviors in CPSs. This enables the creation of comprehensive fault dictionaries that support predictive maintenance and the development of digital twins for industrial environments.
☆ Devising a solution to the problems of Cancer awareness in Telangana
According to the data, the percent of women who underwent screening for cervical cancer, breast and oral cancer in Telangana in the year 2020 was 3.3 percent, 0.3 percent and 2.3 percent respectively. Although early detection is the only way to reduce morbidity and mortality, people have very low awareness about cervical and breast cancer signs and symptoms and screening practices. We developed an ML classification model to predict if a person is susceptible to breast or cervical cancer based on demographic factors. We devised a system to provide suggestions for the nearest hospital or Cancer treatment centres based on the users location or address. In addition to this, we can integrate the health card to maintain medical records of all individuals and conduct awareness drives and campaigns. For ML classification models, we used decision tree classification and support vector classification algorithms for cervical cancer susceptibility and breast cancer susceptibility respectively. Thus, by devising this solution we come one step closer to our goal which is spreading cancer awareness, thereby, decreasing the cancer mortality and increasing cancer literacy among the people of Telangana.
☆ Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
Deep neural networks have set the state-of-the-art in computer vision tasks such as bounding box detection and semantic segmentation. Object detectors and segmentation models assign confidence scores to predictions, reflecting the model's uncertainty in object detection or pixel-wise classification. However, these confidence estimates are often miscalibrated, as their architectures and loss functions are tailored to task performance rather than probabilistic foundation. Even with well calibrated predictions, object detectors fail to quantify uncertainty outside detected bounding boxes, i.e., the model does not make a probability assessment of whether an area without detected objects is truly free of obstacles. This poses a safety risk in applications such as automated driving, where uncertainty in empty areas remains unexplored. In this work, we propose an object detection model grounded in spatial statistics. Bounding box data matches realizations of a marked point process, commonly used to describe the probabilistic occurrence of spatial point events identified as bounding box centers, where marks are used to describe the spatial extension of bounding boxes and classes. Our statistical framework enables a likelihood-based training and provides well-defined confidence estimates for whether a region is drivable, i.e., free of objects. We demonstrate the effectiveness of our method through calibration assessments and evaluation of performance.
comment: 15 pages, 4 figures, 3 tables
☆ Evaluation of Traffic Signals for Daily Traffic Pattern
The turning movement count data is crucial for traffic signal design, intersection geometry planning, traffic flow, and congestion analysis. This work proposes three methods called dynamic, static, and hybrid configuration for TMC-based traffic signals. A vision-based tracking system is developed to estimate the TMC of six intersections in Las Vegas using traffic cameras. The intersection design, route (e.g. vehicle movement directions), and signal configuration files with compatible formats are synthesized and imported into Simulation of Urban MObility for signal evaluation with realistic data. The initial experimental results based on estimated waiting times indicate that the cycle time of 90 and 120 seconds works best for all intersections. In addition, four intersections show better performance for dynamic signal timing configuration, and the other two with lower performance have a lower ratio of total vehicle count to total lanes of the intersection leg. Since daily traffic flow often exhibits a bimodal pattern, we propose a hybrid signal method that switches between dynamic and static methods, adapting to peak and off-peak traffic conditions for improved flow management. So, a built-in traffic generator module creates vehicle routes for 4 hours, including peak hours, and a signal design module produces signal schedule cycles according to static, dynamic, and hybrid methods. Vehicle count distributions are weighted differently for each zone (i.e., West, North, East, South) to generate diverse traffic patterns. The extended experimental results for 6 intersections with 4 hours of simulation time imply that zone-based traffic pattern distributions affect signal design selection. Although the static method works great for evenly zone-based traffic distribution, the hybrid method works well for highly weighted traffic at intersection pairs of the West-East and North-South zones.
☆ Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage
Extended Stability Runge-Kutta (ESRK) methods are crucial for solving large-scale computational problems in science and engineering, including weather forecasting, aerodynamic analysis, and complex biological modelling. However, balancing accuracy, stability, and computational efficiency remains challenging, particularly for high-order, low-storage schemes. This study introduces a hybrid Genetic Algorithm (GA) and Reinforcement Learning (RL) approach for automated heuristic discovery, optimising low-storage ESRK methods. Unlike traditional approaches that rely on manually designed heuristics or exhaustive numerical searches, our method leverages GA-driven mutations for search-space exploration and an RL-inspired state transition mechanism to refine heuristic selection dynamically. This enables systematic parameter reduction, preserving fourth-order accuracy while significantly improving computational efficiency.The proposed GA-RL heuristic optimisation framework is validated through rigorous testing on benchmark problems, including the 1D and 2D Brusselator systems and the steady-state Navier-Stokes equations. The best-performing heuristic achieves a 25\% reduction in IPOPT runtime compared to traditional ESRK optimisation processes while maintaining numerical stability and accuracy. These findings demonstrate the potential of adaptive heuristic discovery to improve resource efficiency in high-fidelity simulations and broaden the applicability of low-storage Runge-Kutta methods in real-world computational fluid dynamics, physics simulations, and other demanding fields. This work establishes a new paradigm in heuristic optimisation for numerical methods, opening pathways for further exploration using Deep RL and AutoML-based heuristic search
☆ Aligning Spoken Dialogue Models from User Interactions ICML 2025
We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns.We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual, safer and more contextually aligned interactions. We deploy the finetuned model and conduct holistic human evaluations to assess the impact beyond single-turn conversations. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
comment: Accepted at ICML 2025
☆ A Keyword-Based Technique to Evaluate Broad Question Answer Script
Evaluation is the method of assessing and determining the educational system through various techniques such as verbal or viva-voice test, subjective or objective written test. This paper presents an efficient solution to evaluate the subjective answer script electronically. In this paper, we proposed and implemented an integrated system that examines and evaluates the written answer script. This article focuses on finding the keywords from the answer script and then compares them with the keywords that have been parsed from both open and closed domain. The system also checks the grammatical and spelling errors in the answer script. Our proposed system tested with answer scripts of 100 students and gives precision score 0.91.
comment: ACM Conference Proceedings (9 Pages)
☆ Wild refitting for black box prediction
We describe and analyze a computionally efficient refitting procedure for computing high-probability upper bounds on the instance-wise mean-squared prediction error of penalized nonparametric estimates based on least-squares minimization. Requiring only a single dataset and black box access to the prediction method, it consists of three steps: computing suitable residuals, symmetrizing and scaling them with a pre-factor $\rho$, and using them to define and solve a modified prediction problem recentered at the current estimate. We refer to it as wild refitting, since it uses Rademacher residual symmetrization as in a wild bootstrap variant. Under relatively mild conditions allowing for noise heterogeneity, we establish a high probability guarantee on its performance, showing that the wild refit with a suitably chosen wild noise scale $\rho$ gives an upper bound on prediction error. This theoretical analysis provides guidance into the design of such procedures, including how the residuals should be formed, the amount of noise rescaling in the wild sub-problem needed for upper bounds, and the local stability properties of the block-box procedure. We illustrate the applicability of this procedure to various problems, including non-rigid structure-from-motion recovery with structured matrix penalties; plug-and-play image restoration with deep neural network priors; and randomized sketching with kernel methods.
☆ Towards an Optimal Control Perspective of ResNet Training ICML 2025
We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden states corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamic biases the weights of the unnecessary deeper residual layers to vanish. This indicates the potential for a theory-grounded layer pruning strategy.
comment: Accepted for presentation at the High-dimensional Learning Dynamics (HiLD) workshop at ICML 2025
☆ A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario
Underground mining operations face significant safety challenges that make emergency response capabilities crucial. While robots have shown promise in assisting with search and rescue operations, their effectiveness depends on reliable miner detection capabilities. Deep learning algorithms offer potential solutions for automated miner detection, but require comprehensive training datasets, which are currently lacking for underground mining environments. This paper presents a novel thermal imaging dataset specifically designed to enable the development and validation of miner detection systems for potential emergency applications. We systematically captured thermal imagery of various mining activities and scenarios to create a robust foundation for detection algorithms. To establish baseline performance metrics, we evaluated several state-of-the-art object detection algorithms including YOLOv8, YOLOv10, YOLO11, and RT-DETR on our dataset. While not exhaustive of all possible emergency situations, this dataset serves as a crucial first step toward developing reliable thermal-based miner detection systems that could eventually be deployed in real emergency scenarios. This work demonstrates the feasibility of using thermal imaging for miner detection and establishes a foundation for future research in this critical safety application.
☆ Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform
The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.
comment: DSTFT, STFT, spectrogram, time-frequency, IEEE Transactions on Signal Processing, 10 pages
☆ Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort
This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions, focusing on the integration of data from both the deceiver and the deceived. We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information - across all possible combinations of modalities and participants. Our dataset, newly collected from Swedish native speakers engaged in truth or lie scenarios on emotionally relevant topics, serves as the basis for our analysis. The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches. Moreover, including data from both participants significantly enhances deception detection accuracy, with the best performance (71%) achieved using a late fusion strategy applied to both modalities and participants. These findings align with psychological theories suggesting differential control of facial and vocal expressions during initial interactions. As the first study of its kind on a Scandinavian cohort, this research lays the groundwork for future investigations into dyadic interactions, particularly within psychotherapy settings.
comment: 40 pages, 2 figures, 2 tables. To be submitted in Behavior Research Methods
☆ Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient propagation across sampling steps. We propose the \textit{Single-Step Completion Policy} (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, enabling accurate, one-shot action generation. In an off-policy actor-critic framework, SSCP combines the expressiveness of generative models with the training and inference efficiency of unimodal policies, without requiring long backpropagation chains. Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability over diffusion-based baselines. We further extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference. SSCP achieves strong results across standard offline RL and behavior cloning benchmarks, positioning it as a versatile, expressive, and efficient framework for deep RL and sequential decision-making.
☆ Distributed Cross-Channel Hierarchical Aggregation for Foundation Models
Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources such as varying physical groundings or data acquisition systems and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, significantly improving computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated with tensor parallelism and model sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.
☆ Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference
Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model. While effective, these approaches struggle to scale to larger LLMs due to requiring further additional parameters compared to LoRA. In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL). We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$. By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM. This allows us to learn all the parameters of our approach using stochastic variational inference. Despite the low dimensionality of our subspace, we are able to achieve competitive performance with state-of-the-art approaches while only requiring ${\sim}1000$ additional parameters. Furthermore, it allows us to scale up to the largest Bayesian LLM to date, with four times as a many base parameters as prior work.
comment: Accepted at UAI 2025
☆ Early Stopping Tabular In-Context Learning ICML
Tabular foundation models have shown strong performance across various tabular learning tasks via in-context learning, offering robust generalization without any downstream finetuning. However, their inference-time costs remain high, particularly for larger datasets. To address this, we propose early-stopping the in-context learning process. We achieve this by dynamically evaluating whether to stop in-context learning after each Transformer encoder layer. Once stopped, we decode the embedding using a pre-trained layer-wise decoder. Experiments across 34 small classification tasks size show that early stopping in-context learning accelerates inference by up to x1.3 with negligible degradation in predictive performance. To assess scalability, we further evaluate our method on five larger classification tasks, achieving speedups of up to x2.2. Our results demonstrate the potential of early exiting as an effective and practical strategy for improving the efficiency of tabular in-context learning.
comment: ICML Workshop Paper
☆ Temporal-Aware Graph Attention Network for Cryptocurrency Transaction Fraud Detection
Cryptocurrency transaction fraud detection faces the dual challenges of increasingly complex transaction patterns and severe class imbalance. Traditional methods rely on manual feature engineering and struggle to capture temporal and structural dependencies in transaction networks. This paper proposes an Augmented Temporal-aware Graph Attention Network (ATGAT) that enhances detection performance through three modules: (1) designing an advanced temporal embedding module that fuses multi-scale time difference features with periodic position encoding; (2) constructing a temporal-aware triple attention mechanism that jointly optimizes structural, temporal, and global context attention; (3) employing weighted BCE loss to address class imbalance. Experiments on the Elliptic++ cryptocurrency dataset demonstrate that ATGAT achieves an AUC of 0.9130, representing a 9.2% improvement over the best traditional method XGBoost, 12.0% over GCN, and 10.0% over standard GAT. This method not only validates the enhancement effect of temporal awareness and triple attention mechanisms on graph neural networks, but also provides financial institutions with more reliable fraud detection tools, with its design principles generalizable to other temporal graph anomaly detection tasks.
☆ Pay Attention to Small Weights
Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; thirdly, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
☆ MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
Nowadays, the rapid growth of Deep Neural Network (DNN) architectures has established them as the defacto approach for providing advanced Machine Learning tasks with excellent accuracy. Targeting low-power DNN computing, this paper examines the interplay of fine-grained error resilience of DNN workloads in collaboration with hardware approximation techniques, to achieve higher levels of energy efficiency. Utilizing the state-of-the-art ROUP approximate multipliers, we systematically explore their fine-grained distribution across the network according to our layer-, filter-, and kernel-level approaches, and examine their impact on accuracy and energy. We use the ResNet-8 model on the CIFAR-10 dataset to evaluate our approximations. The proposed solution delivers up to 54% energy gains in exchange for up to 4% accuracy loss, compared to the baseline quantized model, while it provides 2x energy gains with better accuracy versus the state-of-the-art DNN approximations.
comment: Presented at the 13th IEEE LASCAS Conference
☆ rQdia: Regularizing Q-Value Distributions With Image Augmentation
rQdia regularizes Q-value distributions with augmented images in pixel-based deep reinforcement learning. With a simple auxiliary loss, that equalizes these distributions via MSE, rQdia boosts DrQ and SAC on 9/12 and 10/12 tasks respectively in the MuJoCo Continuous Control Suite from pixels, and Data-Efficient Rainbow on 18/26 Atari Arcade environments. Gains are measured in both sample efficiency and longer-term training. Moreover, the addition of rQdia finally propels model-free continuous control from pixels over the state encoding baseline.
☆ SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning
Multimodal in-context learning (ICL) remains underexplored despite significant potential for domains such as medicine. Clinicians routinely encounter diverse, specialized tasks requiring adaptation from limited examples, such as drawing insights from a few relevant prior cases or considering a constrained set of differential diagnoses. While multimodal large language models (MLLMs) have shown advances in medical visual question answering (VQA), their ability to learn multimodal tasks from context is largely unknown. We introduce SMMILE, the first expert-driven multimodal ICL benchmark for medical tasks. Eleven medical experts curated problems, each including a multimodal query and multimodal in-context examples as task demonstrations. SMMILE encompasses 111 problems (517 question-image-answer triplets) covering 6 medical specialties and 13 imaging modalities. We further introduce SMMILE++, an augmented variant with 1038 permuted problems. A comprehensive evaluation of 15 MLLMs demonstrates that most models exhibit moderate to poor multimodal ICL ability in medical tasks. In open-ended evaluations, ICL contributes only 8% average improvement over zero-shot on SMMILE and 9.4% on SMMILE++. We observe a susceptibility for irrelevant in-context examples: even a single noisy or irrelevant example can degrade performance by up to 9.5%. Moreover, example ordering exhibits a recency bias, i.e., placing the most relevant example last can lead to substantial performance improvements by up to 71%. Our findings highlight critical limitations and biases in current MLLMs when learning multimodal medical tasks from context.
☆ Lipschitz Bounds for Persistent Laplacian Eigenvalues under One-Simplex Insertions
Persistent Laplacians are matrix operators that track how the shape and structure of data transform across scales and are popularly adopted in biology, physics, and machine learning. Their eigenvalues are concise descriptors of geometric and topological features in a filtration. Although earlier work established global algebraic stability for these operators, the precise change in a single eigenvalue when one simplex, such as a vertex, edge, or triangle, is added has remained unknown. This is important because downstream tools, including heat-kernel signatures and spectral neural networks, depend directly on these eigenvalues. We close this gap by proving a uniform Lipschitz bound: after inserting one simplex, every up-persistent Laplacian eigenvalue can vary by at most twice the Euclidean norm of that simplex's boundary, independent of filtration scale and complex size. This result delivers the first eigenvalue-level robustness guarantee for spectral topological data analysis. It guarantees that spectral features remain stable under local updates and enables reliable error control in dynamic data settings.
comment: 16 pages, 4 figures
☆ DynamicBench: Evaluating Real-Time Report Generation in Large Language Models
Traditional benchmarks for large language models (LLMs) typically rely on static evaluations through storytelling or opinion expression, which fail to capture the dynamic requirements of real-time information processing in contemporary applications. To address this limitation, we present DynamicBench, a benchmark designed to evaluate the proficiency of LLMs in storing and processing up-to-the-minute data. DynamicBench utilizes a dual-path retrieval pipeline, integrating web searches with local report databases. It necessitates domain-specific knowledge, ensuring accurate responses report generation within specialized fields. By evaluating models in scenarios that either provide or withhold external documents, DynamicBench effectively measures their capability to independently process recent information or leverage contextual enhancements. Additionally, we introduce an advanced report generation system adept at managing dynamic information synthesis. Our experimental results confirm the efficacy of our approach, with our method achieving state-of-the-art performance, surpassing GPT4o in document-free and document-assisted scenarios by 7.0% and 5.8%, respectively. The code and data will be made publicly available.
☆ AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification
Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints. While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals. This study addresses this gap by introducing the attentive graph-temporal convolutional network (AGTCNet), a novel graph-temporal model for motor imagery EEG (MI-EEG) classification. Specifically, AGTCNet leverages the topographic configuration of EEG electrodes as an inductive bias and integrates graph convolutional attention network (GCAT) to jointly learn expressive spatiotemporal EEG representations. The proposed model significantly outperformed existing MI-EEG classifiers, achieving state-of-the-art performance while utilizing a compact architecture, underscoring its effectiveness and practicality for BCI deployment. With a 49.87% reduction in model size, 64.65% faster inference time, and shorter input EEG signal, AGTCNet achieved a moving average accuracy of 66.82% for subject-independent classification on the BCI Competition IV Dataset 2a, which further improved to 82.88% when fine-tuned for subject-specific classification. On the EEG Motor Movement/Imagery Dataset, AGTCNet achieved moving average accuracies of 64.14% and 85.22% for 4-class and 2-class subject-independent classifications, respectively, with further improvements to 72.13% and 90.54% for subject-specific classifications.
comment: This work has been submitted to the IEEE for possible publication
☆ Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts
Mixture-of-Experts (MoE) architectures have emerged as a key strategy for scaling large language models (LLMs) efficiently. However, current MoE systems suffer from severe load imbalance, where only a small subset of experts is consistently activated during training and inference, leading to significant underutilization of model capacity and computational resources. In this work, we revisit expert routing through a clustering perspective and propose Latent Prototype Routing (LPR), a novel routing framework that generalizes existing approaches while promoting balanced expert utilization without compromising downstream performance. Extensive experiments across multiple open-source MoE models -- including DeepSeek-V3, Qwen3-MoE, and Mixtral -- demonstrate that LPR reduces the Gini coefficient of expert load from 0.70 to 0.035 on average, improves the min-max expert load ratio from 1e-6 to 0.70, achieving near-perfect load balancing.
comment: 15 pages,4 figures
☆ Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning
Neuromorphic and quantum computing have recently emerged as promising paradigms for advancing artificial intelligence, each offering complementary strengths. Neuromorphic systems built on spiking neurons excel at processing time-series data efficiently through sparse, event-driven computation, consuming energy only upon input events. Quantum computing, on the other hand, leverages superposition and entanglement to explore feature spaces that are exponentially large in the number of qubits. Hybrid approaches combining these paradigms have begun to show potential, but existing quantum spiking models have important limitations. Notably, prior quantum spiking neuron implementations rely on classical memory mechanisms on single qubits, requiring repeated measurements to estimate firing probabilities, and they use conventional backpropagation on classical simulators for training. Here we propose a stochastic quantum spiking (SQS) neuron model that addresses these challenges. The SQS neuron uses multi-qubit quantum circuits to realize a spiking unit with internal quantum memory, enabling event-driven probabilistic spike generation in a single shot. Furthermore, we outline how networks of SQS neurons -- dubbed SQS neural networks (SQSNNs) -- can be trained via a hardware-friendly local learning rule, eliminating the need for global classical backpropagation. The proposed SQSNN model fuses the time-series efficiency of neuromorphic computing with the exponentially large inner state space of quantum computing, paving the way for quantum spiking neural networks that are modular, scalable, and trainable on quantum hardware.
☆ On Uniform Weighted Deep Polynomial approximation
It is a classical result in rational approximation theory that certain non-smooth or singular functions, such as $|x|$ and $x^{1/p}$, can be efficiently approximated using rational functions with root-exponential convergence in terms of degrees of freedom \cite{Sta, GN}. In contrast, polynomial approximations admit only algebraic convergence by Jackson's theorem \cite{Lub2}. Recent work shows that composite polynomial architectures can recover exponential approximation rates even without smoothness \cite{KY}. In this work, we introduce and analyze a class of weighted deep polynomial approximants tailored for functions with asymmetric behavior-growing unbounded on one side and decaying on the other. By multiplying a learnable deep polynomial with a one-sided weight, we capture both local non-smoothness and global growth. We show numerically that this framework outperforms Taylor, Chebyshev, and standard deep polynomial approximants, even when all use the same number of parameters. To optimize these approximants in practice, we propose a stable graph-based parameterization strategy building on \cite{Jar}.
☆ Exploring Adapter Design Tradeoffs for Low Resource Music Generation
Fine-tuning large-scale music generation models, such as MusicGen and Mustango, is a computationally expensive process, often requiring updates to billions of parameters and, therefore, significant hardware resources. Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly adapter-based methods, have emerged as a promising alternative, enabling adaptation with minimal trainable parameters while preserving model performance. However, the design choices for adapters, including their architecture, placement, and size, are numerous, and it is unclear which of these combinations would produce optimal adapters and why, for a given case of low-resource music genre. In this paper, we attempt to answer this question by studying various adapter configurations for two AI music models, MusicGen and Mustango, on two genres: Hindustani Classical and Turkish Makam music. Our findings reveal distinct trade-offs: convolution-based adapters excel in capturing fine-grained local musical details such as ornamentations and short melodic phrases, while transformer-based adapters better preserve long-range dependencies crucial for structured improvisation. Additionally, we analyze computational resource requirements across different adapter scales, demonstrating how mid-sized adapters (40M parameters) achieve an optimal balance between expressivity and quality. Furthermore, we find that Mustango, a diffusion-based model, generates more diverse outputs with better adherence to the description in the input prompt while lacking in providing stability in notes, rhythm alignment, and aesthetics. Also, it is computationally intensive and requires significantly more time to train. In contrast, autoregressive models like MusicGen offer faster training and are more efficient, and can produce better quality output in comparison, but have slightly higher redundancy in their generations.
comment: 9 pages, 5 figures
☆ Improved seeding strategies for k-means and k-GMM
We revisit the randomized seeding techniques for k-means clustering and k-GMM (Gaussian Mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of candidate seeds, and the metric used for seed selection. This analysis yields novel families of initialization methods exploiting a lookahead principle--conditioning the seed selection to an enhanced coherence with the final metric used to assess the algorithm, and a multipass strategy to tame down the effect of randomization. Experiments show a consistent constant factor improvement over classical contenders in terms of the final metric (SSE for k-means, log-likelihood for k-GMM), at a modest overhead. In particular, for k-means, our methods improve on the recently designed multi-swap strategy, which was the first one to outperform the greedy k-means++ seeding. Our experimental analysis also shed light on subtle properties of k-means often overlooked, including the (lack of) correlations between the SSE upon seeding and the final SSE, the variance reduction phenomena observed in iterative seeding methods, and the sensitivity of the final SSE to the pool size for greedy methods. Practically, our most effective seeding methods are strong candidates to become one of the--if not the--standard techniques. From a theoretical perspective, our formalization of seeding opens the door to a new line of analytical approaches.
comment: 13 pages
☆ Small Encoders Can Rival Large Decoders in Detecting Groundedness
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness - generating responses strictly supported by the context - is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at : https://github.com/chandarlab/Hallucinate-less
☆ Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution
We propose a novel variational autoencoder (VAE) architecture that employs a spherical Cauchy (spCauchy) latent distribution. Unlike traditional Gaussian latent spaces or the widely used von Mises-Fisher (vMF) distribution, spCauchy provides a more natural hyperspherical representation of latent variables, better capturing directional data while maintaining flexibility. Its heavy-tailed nature prevents over-regularization, ensuring efficient latent space utilization while offering a more expressive representation. Additionally, spCauchy circumvents the numerical instabilities inherent to vMF, which arise from computing normalization constants involving Bessel functions. Instead, it enables a fully differentiable and efficient reparameterization trick via M\"obius transformations, allowing for stable and scalable training. The KL divergence can be computed through a rapidly converging power series, eliminating concerns of underflow or overflow associated with evaluation of ratios of hypergeometric functions. These properties make spCauchy a compelling alternative for VAEs, offering both theoretical advantages and practical efficiency in high-dimensional generative modeling.
☆ DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper, we propose DiLoCoX, a low-communication large-scale decentralized cluster training framework. It combines Pipeline Parallelism with Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme. This combination significantly improves the scale of parameters and the speed of model pre-training. We justify the benefits of one-step-delay overlap of communication and local training, as well as the adaptive gradient compression scheme, through a theoretical analysis of convergence. Empirically, we demonstrate that DiLoCoX is capable of pre-training a 107B foundation model over a 1Gbps network. Compared to vanilla AllReduce, DiLoCoX can achieve a 357x speedup in distributed training while maintaining negligible degradation in model convergence. To the best of our knowledge, this is the first decentralized training framework successfully applied to models with over 100 billion parameters.
☆ From On-chain to Macro: Assessing the Importance of Data Source Diversity in Cryptocurrency Market Forecasting
This study investigates the impact of data source diversity on the performance of cryptocurrency forecasting models by integrating various data categories, including technical indicators, on-chain metrics, sentiment and interest metrics, traditional market indices, and macroeconomic indicators. We introduce the Crypto100 index, representing the top 100 cryptocurrencies by market capitalization, and propose a novel feature reduction algorithm to identify the most impactful and resilient features from diverse data sources. Our comprehensive experiments demonstrate that data source diversity significantly enhances the predictive performance of forecasting models across different time horizons. Key findings include the paramount importance of on-chain metrics for both short-term and long-term predictions, the growing relevance of traditional market indices and macroeconomic indicators for longer-term forecasts, and substantial improvements in model accuracy when diverse data sources are utilized. These insights help demystify the short-term and long-term driving factors of the cryptocurrency market and lay the groundwork for developing more accurate and resilient forecasting models.
☆ Zero-Shot Learning for Obsolescence Risk Forecasting
Component obsolescence poses significant challenges in industries reliant on electronic components, causing increased costs and disruptions in the security and availability of systems. Accurate obsolescence risk prediction is essential but hindered by a lack of reliable data. This paper proposes a novel approach to forecasting obsolescence risk using zero-shot learning (ZSL) with large language models (LLMs) to address data limitations by leveraging domain-specific knowledge from tabular datasets. Applied to two real-world datasets, the method demonstrates effective risk prediction. A comparative evaluation of four LLMs underscores the importance of selecting the right model for specific forecasting tasks.
☆ Complexity-aware fine-tuning
General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to enhance performance in specific domains. Better results can be achieved by distilling the chain-of-thought of a larger model at the cost of numerous expensive calls and a much greater amount of data. We propose a novel blueprint for efficient fine-tuning that uses reasoning only for complex data identified by entropy. Specifically, across two small open models ($\approx 3B$) we split the training data into complexity categories by a single token answer entropy (ROC AUC $0.73$), fine-tune large language models (LLMs) via SFT and distillation, and show that our pipeline significantly outperforms the standard SFT approach ($0.55$ vs $0.43$ average accuracy) and provides comparable with distillation performance while using $62\%$ less data ($0.55$ average accuracy for both). We publish our code and data to facilitate further research in this direction.
☆ Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? NeurIPS 2024
Causal reasoning capability is critical in advancing large language models (LLMs) toward strong artificial intelligence. While versatile LLMs appear to have demonstrated capabilities in understanding contextual causality and providing responses that obey the laws of causality, it remains unclear whether they perform genuine causal reasoning akin to humans. However, current evidence indicates the contrary. Specifically, LLMs are only capable of performing shallow (level-1) causal reasoning, primarily attributed to the causal knowledge embedded in their parameters, but they lack the capacity for genuine human-like (level-2) causal reasoning. To support this hypothesis, methodologically, we delve into the autoregression mechanism of transformer-based LLMs, revealing that it is not inherently causal. Empirically, we introduce a new causal Q&A benchmark called CausalProbe-2024, whose corpora are fresh and nearly unseen for the studied LLMs. The LLMs exhibit a significant performance drop on CausalProbe-2024 compared to earlier benchmarks, indicating the fact that they primarily engage in level-1 causal reasoning. To bridge the gap towards level-2 causal reasoning, we draw inspiration from the fact that human reasoning is usually facilitated by general knowledge and intended goals. We propose G^2-Reasoner, a method that incorporates general knowledge and goal-oriented prompts into LLMs' causal reasoning processes. Experiments demonstrate that G^2-Reasoner significantly enhances LLMs' causal reasoning capability, particularly in fresh and counterfactual contexts. This work sheds light on a new path for LLMs to advance towards genuine causal reasoning, going beyond level-1 and making strides towards level-2.
comment: 24 pages, accepted at NeurIPS 2024
☆ Artificial Delegates Resolve Fairness Issues in Perpetual Voting with Partial Turnout
Perpetual voting addresses fairness in sequential collective decision-making by evaluating representational equity over time. However, existing perpetual voting rules rely on full participation and complete approval information, assumptions that rarely hold in practice, where partial turnout is the norm. In this work, we study the integration of Artificial Delegates, preference-learning agents trained to represent absent voters, into perpetual voting systems. We examine how absenteeism affects fairness and representativeness under various voting methods and evaluate the extent to which Artificial Delegates can compensate for missing participation. Our findings indicate that while absenteeism significantly affects fairness, Artificial Delegates reliably mitigate these effects and enhance robustness across diverse scenarios.
comment: The paper has been accepted at the ACM Collective Intelligence Conference (CI 2025), August 4 to 6, 2025, San Diego, CA, USA
☆ Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4
This technical report presents submission systems for Task 4 of the DCASE 2025 Challenge. This model incorporates additional audio features (spectral roll-off and chroma features) into the embedding feature extracted from the mel-spectral feature to im-prove the classification capabilities of an audio-tagging model in the spatial semantic segmentation of sound scenes (S5) system. This approach is motivated by the fact that mixed audio often contains subtle cues that are difficult to capture with mel-spectrograms alone. Thus, these additional features offer alterna-tive perspectives for the model. Second, an agent-based label correction system is applied to the outputs processed by the S5 system. This system reduces false positives, improving the final class-aware signal-to-distortion ratio improvement (CA-SDRi) metric. Finally, we refine the training dataset to enhance the classi-fication accuracy of low-performing classes by removing irrele-vant samples and incorporating external data. That is, audio mix-tures are generated from a limited number of data points; thus, even a small number of out-of-class data points could degrade model performance. The experiments demonstrate that the submit-ted systems employing these approaches relatively improve CA-SDRi by up to 14.7% compared to the baseline of DCASE 2025 Challenge Task 4.
comment: DCASE 2025 challenge Task4, 5 pages
☆ Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design
In many real-world applications, evaluating the goodness of instances is often costly and time-consuming, e.g., human feedback and physics simulations, in contrast to proposing new instances. In particular, this is even more critical in reinforcement learning, as new interactions with the environment (i.e., new instances) need to be evaluated to provide a reward signal to learn from. As sufficient exploration is crucial, learning from a diverse mini-batch can have a large impact and help mitigate mode collapse. In this paper, we introduce diverse mini-batch selection for reinforcement learning and propose to use determinantal point processes for this task. We study this framework in the context of a real-world problem, namely drug discovery. We experimentally study how our proposed framework can improve the effectiveness of chemical exploration in de novo drug design, where finding diverse and high-quality solutions is essential. We conduct a comprehensive evaluation with three well-established molecular generation oracles over numerous generative steps. Our experiments conclude that our diverse mini-batch selection framework can substantially improve the diversity of the solutions, while still obtaining solutions of high quality. In drug discovery, such outcome can potentially lead to fulfilling unmet medication needs faster.
☆ Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation ICML 2025
The real world naturally has dimensions of time and space. Therefore, estimating the counterfactual outcomes with spatial-temporal attributes is a crucial problem. However, previous methods are based on classical statistical models, which still have limitations in performance and generalization. This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability. Under mild assumptions, the proposed estimator within this framework is consistent and asymptotically normal. To validate the effectiveness of our approach, we conduct simulation experiments and real data experiments. Simulation experiments show that our estimator has a stronger estimation capability than baseline methods. Real data experiments provide a valuable conclusion to the causal effect of conflicts on forest loss in Colombia. The source code is available at https://github.com/lihe-maxsize/DeppSTCI_Release_Version-master.
comment: 24 pages, accepted at ICML 2025
☆ Linearity-based neural network compression
In neural network compression, most current methods reduce unnecessary parameters by measuring importance and redundancy. To augment already highly optimized existing solutions, we propose linearity-based compression as a novel way to reduce weights in a neural network. It is based on the intuition that with ReLU-like activation functions, neurons that are almost always activated behave linearly, allowing for merging of subsequent layers. We introduce the theory underlying this compression and evaluate our approach experimentally. Our novel method achieves a lossless compression down to 1/4 of the original model size in over the majority of tested models. Applying our method on already importance-based pruned models shows very little interference between different types of compression, demonstrating the option of successful combination of techniques. Overall, our work lays the foundation for a new type of compression method that enables smaller and ultimately more efficient neural network models.
☆ Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion
Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data, but is challenged by heterogeneity in data, computation, and communication. Pretrained vision-language models (VLMs), with their strong generalization and lightweight tuning via prompts, offer a promising solution. However, existing federated prompt-learning methods rely only on text prompts and overlook joint label-domain distribution shifts. In this paper, we propose a personalized FL framework based on dual-prompt learning and cross fusion, termed pFedDC. Specifically, each client maintains both global and local prompts across vision and language modalities: global prompts capture common knowledge shared across the federation, while local prompts encode client-specific semantics and domain characteristics. Meanwhile, a cross-fusion module is designed to adaptively integrate prompts from different levels, enabling the model to generate personalized representations aligned with each client's unique data distribution. Extensive experiments across nine datasets with various types of heterogeneity show that pFedDC consistently outperforms state-of-the-art methods.
☆ Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks
The growing integration of UAVs into civilian airspace underscores the need for resilient and intelligent intrusion detection systems (IDS), as traditional anomaly detection methods often fail to identify novel threats. A common approach treats unfamiliar attacks as out-of-distribution (OOD) samples; however, this leaves systems vulnerable when mitigation is inadequate. Moreover, conventional OOD detectors struggle to distinguish stealthy adversarial attacks from genuine OOD events. This paper introduces a conditional generative adversarial network (cGAN)-based framework for crafting stealthy adversarial attacks that evade IDS mechanisms. We first design a robust multi-class IDS classifier trained on benign UAV telemetry and known cyber-attacks, including Denial of Service (DoS), false data injection (FDI), man-in-the-middle (MiTM), and replay attacks. Using this classifier, our cGAN perturbs known attacks to generate adversarial samples that misclassify as benign while retaining statistical resemblance to OOD distributions. These adversarial samples are iteratively refined to achieve high stealth and success rates. To detect such perturbations, we implement a conditional variational autoencoder (CVAE), leveraging negative log-likelihood to separate adversarial inputs from authentic OOD samples. Comparative evaluation shows that CVAE-based regret scores significantly outperform traditional Mahalanobis distance-based detectors in identifying stealthy adversarial threats. Our findings emphasize the importance of advanced probabilistic modeling to strengthen IDS capabilities against adaptive, generative-model-based cyber intrusions.
☆ DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding
Electroencephalography (EEG)-based brain-computer interfaces (BCIs) transform spontaneous/evoked neural activity into control commands for external communication. While convolutional neural networks (CNNs) remain the mainstream backbone for EEG decoding, their inherently short receptive field makes it difficult to capture long-range temporal dependencies and global inter-channel relationships. Recent CNN-Transformer (Conformers) hybrids partially address this issue, but most adopt a serial design, resulting in suboptimal integration of local and global features, and often overlook explicit channel-wise modeling. To address these limitations, we propose DBConformer, a dual-branch convolutional Transformer network tailored for EEG decoding. It integrates a temporal Conformer to model long-range temporal dependencies and a spatial Conformer to extract inter-channel interactions, capturing both temporal dynamics and spatial patterns in EEG signals. A lightweight channel attention module further refines spatial representations by assigning data-driven importance to EEG channels. Extensive experiments on five motor imagery (MI) datasets and two seizure detection datasets under three evaluation settings demonstrate that DBConformer consistently outperforms 10 competitive baseline models, with over eight times fewer parameters than the high-capacity EEG Conformer baseline. Further, the visualization results confirm that the features extracted by DBConformer are physiologically interpretable and aligned with sensorimotor priors in MI. The superior performance and interpretability of DBConformer make it reliable for robust and explainable EEG decoding. Code is publicized at https://github.com/wzwvv/DBConformer.
comment: 12 pages, 6 figures
☆ NaLaFormer: Norm-Aware Linear Attention for Transformer Models
Linear attention has emerged as a viable alternative to softmax attention by reducing complexity from quadratic to linear in sequence length. To preserve two fundamental properties of softmax, non-negativity and entropy reduction, current works employ various linearly separatable kernel functions with $L1$ normalization instead of softmax operator. However, query norms are neglected by the normalization operation in linear attention, such degradation heavily leads to an entropy gap. Meanwhile, existing works inhibit negative values of query and key vectors resulting in a missing inner-product interactions after being mapped. To address these dual challenges, we propose a novel Norm-Aware Linear Attention mechanism serving to restore norm-guided dynamic spikiness and recover kernel-perturbed norm distributions. Specifically, we first decouple query and key matrices into two components: norm and direction, to achieve norm-aware spikiness control and norm consistency, respectively. We mathematically reveal that the extent of entropy reduction varies with the query norm in softmax normalization, motivating a query-norm aware kernel function for dynamic control over entropy reduction. Furthermore, to ensure norm consistency and enforce non-negativity constraints, we employ a norm-preserving mapping to project all elements of the angular matrix into positive values, leveraging cosine similarity to inhibit dimensions with opposite directions. We conduct extensive experiments demonstrating that the NaLaFormer improves performance on vision and language tasks, enhancing both expressiveness and efficiency by up to 4.2\%.
☆ Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks
Reinforcement learning (RL) policies deployed in safety-critical systems, such as unmanned aerial vehicle (UAV) navigation in dynamic airspace, are vulnerable to out-ofdistribution (OOD) adversarial attacks in the observation space. These attacks induce distributional shifts that significantly degrade value estimation, leading to unsafe or suboptimal decision making rendering the existing policy fragile. To address this vulnerability, we propose an antifragile RL framework designed to adapt against curriculum of incremental adversarial perturbations. The framework introduces a simulated attacker which incrementally increases the strength of observation-space perturbations which enables the RL agent to adapt and generalize across a wider range of OOD observations and anticipate previously unseen attacks. We begin with a theoretical characterization of fragility, formally defining catastrophic forgetting as a monotonic divergence in value function distributions with increasing perturbation strength. Building on this, we define antifragility as the boundedness of such value shifts and derive adaptation conditions under which forgetting is stabilized. Our method enforces these bounds through iterative expert-guided critic alignment using Wasserstein distance minimization across incrementally perturbed observations. We empirically evaluate the approach in a UAV deconfliction scenario involving dynamic 3D obstacles. Results show that the antifragile policy consistently outperforms standard and robust RL baselines when subjected to both projected gradient descent (PGD) and GPS spoofing attacks, achieving up to 15% higher cumulative reward and over 30% fewer conflict events. These findings demonstrate the practical and theoretical viability of antifragile reinforcement learning for secure and resilient decision-making in environments with evolving threat scenarios.
☆ Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments
The increasing automation of navigation for unmanned aerial vehicles (UAVs) has exposed them to adversarial attacks that exploit vulnerabilities in reinforcement learning (RL) through sensor manipulation. Although existing robust RL methods aim to mitigate such threats, their effectiveness has limited generalization to out-of-distribution shifts from the optimal value distribution, as they are primarily designed to handle fixed perturbation. To address this limitation, this paper introduces an antifragile RL framework that enhances adaptability to broader distributional shifts by incorporating a switching mechanism based on discounted Thompson sampling (DTS). This mechanism dynamically selects among multiple robust policies to minimize adversarially induced state-action-value distribution shifts. The proposed approach first derives a diverse ensemble of action robust policies by accounting for a range of perturbations in the policy space. These policies are then modeled as a multiarmed bandit (MAB) problem, where DTS optimally selects policies in response to nonstationary Bernoulli rewards, effectively adapting to evolving adversarial strategies. Theoretical framework has also been provided where by optimizing the DTS to minimize the overall regrets due to distributional shift, results in effective adaptation against unseen adversarial attacks thus inducing antifragility. Extensive numerical simulations validate the effectiveness of the proposed framework in complex navigation environments with multiple dynamic three-dimensional obstacles and with stronger projected gradient descent (PGD) and spoofing attacks. Compared to conventional robust, non-adaptive RL methods, the antifragile approach achieves superior performance, demonstrating shorter navigation path lengths and a higher rate of conflict-free navigation trajectories compared to existing robust RL techniques
☆ Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection
Remote sensing change detection is essential for monitoring urban expansion, disaster assessment, and resource management, offering timely, accurate, and large-scale insights into dynamic landscape transformations. While deep learning has revolutionized change detection, the increasing complexity and computational demands of modern models have not necessarily translated into significant accuracy gains. Instead of following this trend, this study explores a more efficient approach, focusing on lightweight models that maintain high accuracy while minimizing resource consumption, which is an essential requirement for on-satellite processing. To this end, we propose FlickCD, which means quick flick then get great results, pushing the boundaries of the performance-resource trade-off. FlickCD introduces an Enhanced Difference Module (EDM) to amplify critical feature differences between temporal phases while suppressing irrelevant variations such as lighting and weather changes, thereby reducing computational costs in the subsequent change decoder. Additionally, the FlickCD decoder incorporates Local-Global Fusion Blocks, leveraging Shifted Window Self-Attention (SWSA) and Enhanced Global Self-Attention (EGSA) to efficiently capture semantic information at multiple scales, preserving both coarse- and fine-grained changes. Extensive experiments on four benchmark datasets demonstrate that FlickCD reduces computational and storage overheads by more than an order of magnitude while achieving state-of-the-art (SOTA) performance or incurring only a minor (<1\% F1) accuracy trade-off. The implementation code is publicly available at https://github.com/xulsh8/FlickCD.
comment: 12 pages
☆ Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges
Estimating single-cell responses across various perturbations facilitates the identification of key genes and enhances drug screening, significantly boosting experimental efficiency. However, single-cell sequencing is a destructive process, making it impossible to capture the same cell's phenotype before and after perturbation. Consequently, data collected under perturbed and unperturbed conditions are inherently unpaired. Existing methods either attempt to forcibly pair unpaired data using random sampling, or neglect the inherent relationship between unperturbed and perturbed cells during the modeling. In this work, we propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions, effectively addressing the challenge of unpaired data. We further interpret this framework as a form of data augmentation. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way, and further incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles. Moreover, gene expression under the same perturbation often varies significantly across cells, frequently exhibiting a bimodal distribution that reflects intrinsic heterogeneity. To capture this, we introduce a more suitable evaluation metric. We propose Unlasting, dual conditional diffusion models that overcome the problem of unpaired single-cell perturbation data and strengthen the model's insight into perturbations under the guidance of the GRN, with a dedicated mask model designed to improve generation quality by predicting silent genes. In addition, we introduce a biologically grounded evaluation metric that better reflects the inherent heterogeneity in single-cell responses.
☆ Learning to Skip the Middle Layers of Transformers
Conditional computation is a popular strategy to make Transformers more efficient. Existing methods often target individual modules (e.g., mixture-of-experts layers) or skip layers independently of one another. However, interpretability research has demonstrated that the middle layers of Transformers exhibit greater redundancy, and that early layers aggregate information into token positions. Guided by these insights, we propose a novel architecture that dynamically skips a variable number of layers from the middle outward. In particular, a learned gating mechanism determines whether to bypass a symmetric span of central blocks based on the input, and a gated attention mechanism prevents subsequent tokens from attending to skipped token positions. Residual norms are controlled with a 'sandwich' or 'perilayernorm' scheme and gate sparsity with an adaptive regularization loss. We had aimed to reduce compute requirements for 'simpler' tokens and potentially foster an emergent multi-level representational hierarchy but, at the scales investigated, our approach does not achieve improvements in the trade-off between validation cross-entropy and estimated FLOPs compared to dense baselines with fewer layers. We release our code at https://github.com/tim-lawson/skip-middle.
comment: 11 pages, 2 figures
☆ Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning
Concept-Based Models (CBMs) are a class of deep learning models that provide interpretability by explaining predictions through high-level concepts. These models first predict concepts and then use them to perform a downstream task. However, current CBMs offer interpretability only for the final task prediction, while the concept predictions themselves are typically made via black-box neural networks. To address this limitation, we propose Hierarchical Concept Memory Reasoner (H-CMR), a new CBM that provides interpretability for both concept and task predictions. H-CMR models relationships between concepts using a learned directed acyclic graph, where edges represent logic rules that define concepts in terms of other concepts. During inference, H-CMR employs a neural attention mechanism to select a subset of these rules, which are then applied hierarchically to predict all concepts and the final task. Experimental results demonstrate that H-CMR matches state-of-the-art performance while enabling strong human interaction through concept and model interventions. The former can significantly improve accuracy at inference time, while the latter can enhance data efficiency during training when background knowledge is available.
☆ FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
Federated Learning (FL) enables collaborative model training across multiple clients without sharing clients' private data. However, fairness remains a key concern, as biases in local clients' datasets can impact the entire federated system. Heterogeneous data distributions across clients may lead to models that are fairer for some clients than others. Although several fairness-enhancing solutions are present in the literature, most focus on mitigating bias for a single sensitive attribute, typically binary, overlooking the diverse and sometimes conflicting fairness needs of different clients. This limited perspective can limit the effectiveness of fairness interventions for the different clients. To support more robust and reproducible fairness research in FL, we aim to enable a consistent benchmarking of fairness-aware FL methods at both the global and client levels. In this paper, we contribute in three ways: (1) We introduce FeDa4Fair, a library to generate tabular datasets tailored to evaluating fair FL methods under heterogeneous client bias; (2) we release four bias-heterogeneous datasets and corresponding benchmarks to compare fairness mitigation methods in a controlled environment; (3) we provide ready-to-use functions for evaluating fairness outcomes for these datasets.
☆ Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection
Transformers have shown potential in solving wireless communication problems, particularly via in-context learning (ICL), where models adapt to new tasks through prompts without requiring model updates. However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs. In this work, we propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models (1-2 layers) without increasing model depth. This design enables lightweight Transformers to achieve detection performance comparable to much deeper models, making them well-suited for deployment on resource-constrained mobile devices. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers, while maintaining storage and computational efficiency. This represents a promising direction for implementing Transformer-based algorithms in wireless receivers with limited computational resources.
☆ CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions
Molecular docking plays a crucial role in predicting the binding mode of ligands to target proteins, and covalent interactions, which involve the formation of a covalent bond between the ligand and the target, are particularly valuable due to their strong, enduring binding nature. However, most existing docking methods and deep learning approaches hardly account for the formation of covalent bonds and the associated structural changes. To address this gap, we introduce a comprehensive benchmark for covalent docking, CovDocker, which is designed to better capture the complexities of covalent binding. We decompose the covalent docking process into three main tasks: reactive location prediction, covalent reaction prediction, and covalent docking. By adapting state-of-the-art models, such as Uni-Mol and Chemformer, we establish baseline performances and demonstrate the effectiveness of the benchmark in accurately predicting interaction sites and modeling the molecular transformations involved in covalent binding. These results confirm the role of the benchmark as a rigorous framework for advancing research in covalent drug design. It underscores the potential of data-driven approaches to accelerate the discovery of selective covalent inhibitors and addresses critical challenges in therapeutic development.
comment: Accepted to KDD 2025 Research Track
☆ EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception ICCV 2025
Modern perception models, particularly those designed for multisensory egocentric tasks, have achieved remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across different egocentric perception tasks, including egocentric action recognition, active speaker localization, and behavior anticipation. Our proposed policy module is adaptable to task-specific action spaces, making it broadly applicable. Experimental results on three challenging egocentric datasets EPIC-Kitchens, EasyCom, and Aria Everyday Activities demonstrate that our method significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters up to 82.02%, and energy up to 9.6x, while still on-par and in many cases outperforming, the performance of corresponding state-of-the-art models.
comment: Accepted at ICCV 2025
☆ Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games
This paper introduces a new approach for approximating the learning dynamics of multiple reinforcement learning (RL) agents interacting in a finite-state Markov game. The idea is to rescale the learning process by simultaneously reducing the learning rate and increasing the update frequency, effectively treating the agent's parameters as a slow-evolving variable influenced by the fast-mixing game state. Under mild assumptions-ergodicity of the state process and continuity of the updates-we prove the convergence of this rescaled process to an ordinary differential equation (ODE). This ODE provides a tractable, deterministic approximation of the agent's learning dynamics. An implementation of the framework is available at\,: https://github.com/yannKerzreho/MarkovGameApproximation
☆ Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph
Teaching large language models (LLMs) to use tools is crucial for improving their problem-solving abilities and expanding their applications. However, effectively using tools is challenging because it requires a deep understanding of tool functionalities and user intentions. Previous methods relied mainly on LLMs to generate instruction data, but the quality of these data was often insufficient. In this paper, we propose a new method that uses knowledge graphs to generate high-quality instruction data for LLMs. Knowledge graphs are manually curated datasets rich in semantic information. We begin by extracting various query pathways from a given knowledge graph, which are transformed into a broad spectrum of user queries. We then translate the relationships between entities into actionable tools and parse the pathways of each query into detailed solution steps, thereby creating high-quality instruction data. Our experiments show that fine-tuning on just a small sample of this synthetic data can significantly improve the tool utilization and overall capabilities of LLMs.
comment: 20 pages, 12 figures
☆ FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning
In federated learning (FL), the data distribution of each client may change over time, introducing both temporal and spatial data heterogeneity, known as concept drift. Data heterogeneity arises from three drift sources: real drift (a shift in the conditional distribution P(y|x)), virtual drift (a shift in the input distribution P(x)), and label drift (a shift in the label distribution P(y)). However, most existing FL methods addressing concept drift primarily focus on real drift. When clients experience virtual or label drift, these methods often fail to selectively retain useful historical knowledge, leading to catastrophic forgetting. A key challenge lies in distinguishing different sources of drift, as they require distinct adaptation strategies: real drift calls for discarding outdated data, while virtual or label drift benefits from retaining historical data. Without explicitly identifying the drift sources, a general adaptation strategy is suboptimal and may harm generalization. To address this challenge, we propose FedDAA, a dynamic clustered FL framework designed to adapt to multi-source concept drift while preserving valuable historical knowledge. Specifically, FedDAA integrates three modules: a cluster number determination module to find the optimal number of clusters; a real drift detection module to distinguish real drift from virtual/label drift; and a concept drift adaptation module to adapt to new data while retaining useful historical information. We provide theoretical convergence guarantees, and experiments show that FedDAA achieves 7.84% to 8.52% accuracy improvements over state-of-the-art methods on Fashion-MNIST, CIFAR-10, and CIFAR-100.
☆ Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling
Text-guided diffusion models have become essential for high-quality image synthesis, enabling dynamic image editing. In image editing, two crucial aspects are editability, which determines the extent of modification, and faithfulness, which reflects how well unaltered elements are preserved. However, achieving optimal results is challenging because of the inherent trade-off between editability and faithfulness. To address this, we propose Faithfulness Guidance and Scheduling (FGS), which enhances faithfulness with minimal impact on editability. FGS incorporates faithfulness guidance to strengthen the preservation of input image information and introduces a scheduling strategy to resolve misalignment between editability and faithfulness. Experimental results demonstrate that FGS achieves superior faithfulness while maintaining editability. Moreover, its compatibility with various editing methods enables precise, high-quality image edits across diverse tasks.
comment: preprint
☆ Efficient Skill Discovery via Regret-Aware Optimization
Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. For existing methods, they focus on improving diversity through pure exploration, mutual information optimization, and learning temporal representation. Despite that they perform well on exploration, they remain limited in terms of efficiency, especially for the high-dimensional situations. In this work, we frame skill discovery as a min-max game of skill generation and policy learning, proposing a regret-aware method on top of temporal representation learning that expands the discovered skill space along the direction of upgradable policy strength. The key insight behind the proposed method is that the skill discovery is adversarial to the policy learning, i.e., skills with weak strength should be further explored while less exploration for the skills with converged strength. As an implementation, we score the degree of strength convergence with regret, and guide the skill discovery with a learnable skill generator. To avoid degeneration, skill generation comes from an up-gradable population of skill generators. We conduct experiments on environments with varying complexities and dimension sizes. Empirical results show that our method outperforms baselines in both efficiency and diversity. Moreover, our method achieves a 15% zero shot improvement in high-dimensional environments, compared to existing methods.
☆ Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement, which refines graph-based planning by dynamically adjusting edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
comment: 9 technical page followed by references and appendix
☆ RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment ICCV 2025
Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more data-efficient training paradigms. Data selection has shown promise to mitigate redundancy by identifying the most representative samples, thereby reducing training costs without compromising performance. Existing methods typically rely on static scoring metrics or pretrained models, overlooking the combined effect of selected samples and their evolving dynamics during training. We introduce the concept of epsilon-sample cover, which quantifies sample redundancy based on inter-sample relationships, capturing the intrinsic structure of the dataset. Based on this, we reformulate data selection as a reinforcement learning (RL) process and propose RL-Selector, where a lightweight RL agent optimizes the selection policy by leveraging epsilon-sample cover derived from evolving dataset distribution as a reward signal. Extensive experiments across benchmark datasets and diverse architectures demonstrate that our method consistently outperforms existing state-of-the-art baselines. Models trained with our selected datasets show enhanced generalization performance with improved training efficiency.
comment: ICCV 2025
☆ An Information-Theoretic Analysis for Federated Learning under Concept Drift
Recent studies in federated learning (FL) commonly train models on static datasets. However, real-world data often arrives as streams with shifting distributions, causing performance degradation known as concept drift. This paper analyzes FL performance under concept drift using information theory and proposes an algorithm to mitigate the performance degradation. We model concept drift as a Markov chain and introduce the \emph{Stationary Generalization Error} to assess a model's capability to capture characteristics of future unseen data. Its upper bound is derived using KL divergence and mutual information. We study three drift patterns (periodic, gradual, and random) and their impact on FL performance. Inspired by this, we propose an algorithm that regularizes the empirical risk minimization approach with KL divergence and mutual information, thereby enhancing long-term performance. We also explore the performance-cost tradeoff by identifying a Pareto front. To validate our approach, we build an FL testbed using Raspberry Pi4 devices. Experimental results corroborate with theoretical findings, confirming that drift patterns significantly affect performance. Our method consistently outperforms existing approaches for these three patterns, demonstrating its effectiveness in adapting concept drift in FL.
☆ Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning
Continual learning (CL) with large pre-trained models is challenged by catastrophic forgetting and task interference. Existing LoRA-based Mixture-of-Experts (MoE) approaches mitigate forgetting by assigning and freezing task-specific adapters, but suffer from interference, redundancy, and ambiguous routing due to coarse adapter-level selection. However, this design introduces three key challenges: 1) Interference: Activating full LoRA experts per input leads to subspace interference and prevents selective reuse of useful components across tasks. 2) Redundancy: Newly added experts often duplicate or contradict existing knowledge due to unnecessary activation of unrelated ranks and insufficient reuse of relevant ones. 3) Ambiguity: Overlapping features across tasks confuse the router, resulting in unstable expert assignments. As more experts accumulate, earlier task routing degrades, accelerating forgetting. We propose MoRA, a Mixture-of-Rank Adaptive learning approach with self-activated and sparse rank activation for CL. Unlike mixing multiple low-rank matrices, MoRA decomposes each rank-r update into r rank-1 components, each treated as an independent expert, enabling fine-grained mixture of rank-1 expert utilization while mitigating interference and redundancy. To avoid ambiguous routing, we propose that each rank-1 expert can infer its own relevance via intermediate activations. Coupled with our proposed rank pruning and activation budgets, MoRA adaptively selects a sparse mixture of ranks per input. We validate MoRA on continual learning tasks with CLIP and large language models (LLMs), analyzing both in-domain learning and out-of-domain forgetting/generalization during fine-tuning. MoRA shows significant effectiveness on enhancing CL with PTMs, and improving generalization while mitigating forgetting.
comment: Preprint
☆ TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence
Molecular property prediction aims to learn representations that map chemical structures to functional properties. While multimodal learning has emerged as a powerful paradigm to learn molecular representations, prior works have largely overlooked textual and taxonomic information of molecules for representation learning. We introduce TRIDENT, a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. To achieve this, we curate a comprehensive dataset of molecule-text pairs with structured, multi-level functional annotations. Instead of relying on conventional contrastive loss, TRIDENT employs a volume-based alignment objective to jointly align tri-modal features at the global level, enabling soft, geometry-aware alignment across modalities. Additionally, TRIDENT introduces a novel local alignment objective that captures detailed relationships between molecular substructures and their corresponding sub-textual descriptions. A momentum-based mechanism dynamically balances global and local alignment, enabling the model to learn both broad functional semantics and fine-grained structure-function mappings. TRIDENT achieves state-of-the-art performance on 11 downstream tasks, demonstrating the value of combining SMILES, textual, and taxonomic functional annotations for molecular property prediction.
☆ HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation
Machine learning-assisted diagnosis is gaining traction in skin disease detection, but training effective models requires large amounts of high-quality data. Skin disease datasets often suffer from class imbalance, privacy concerns, and object bias, making data augmentation essential. While classical generative models are widely used, they demand extensive computational resources and lengthy training time. Quantum computing offers a promising alternative, but existing quantum-based image generation methods can only yield grayscale low-quality images. Through a novel classical-quantum latent space fusion technique, our work overcomes this limitation and introduces the first classical-quantum generative adversarial network (GAN) capable of generating color medical images. Our model outperforms classical deep convolutional GANs and existing hybrid classical-quantum GANs in both image generation quality and classification performance boost when used as data augmentation. Moreover, the performance boost is comparable with that achieved using state-of-the-art classical generative models, yet with over 25 times fewer parameters and 10 times fewer training epochs. Such results suggest a promising future for quantum image generation as quantum hardware advances. Finally, we demonstrate the robust performance of our model on real IBM quantum machine with hardware noise.
☆ Distilling Normalizing Flows CVPR
Explicit density learners are becoming an increasingly popular technique for generative models because of their ability to better model probability distributions. They have advantages over Generative Adversarial Networks due to their ability to perform density estimation and having exact latent-variable inference. This has many advantages, including: being able to simply interpolate, calculate sample likelihood, and analyze the probability distribution. The downside of these models is that they are often more difficult to train and have lower sampling quality. Normalizing flows are explicit density models, that use composable bijective functions to turn an intractable probability function into a tractable one. In this work, we present novel knowledge distillation techniques to increase sampling quality and density estimation of smaller student normalizing flows. We seek to study the capacity of knowledge distillation in Compositional Normalizing Flows to understand the benefits and weaknesses provided by these architectures. Normalizing flows have unique properties that allow for a non-traditional forms of knowledge transfer, where we can transfer that knowledge within intermediate layers. We find that through this distillation, we can make students significantly smaller while making substantial performance gains over a non-distilled student. With smaller models there is a proportionally increased throughput as this is dependent upon the number of bijectors, and thus parameters, in the network.
comment: Published in eLVM @ CVPR (https://openaccess.thecvf.com/content/CVPR2025W/eLVM/html/Walton_Distilling_Normalizing_Flows_CVPRW_2025_paper)
☆ Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance
We propose a novel step-by-step video-to-audio generation method that sequentially produces individual audio tracks, each corresponding to a specific sound event in the video. Our approach mirrors traditional Foley workflows, aiming to capture all sound events induced by a given video comprehensively. Each generation step is formulated as a guided video-to-audio synthesis task, conditioned on a target text prompt and previously generated audio tracks. This design is inspired by the idea of concept negation from prior compositional generation frameworks. To enable this guided generation, we introduce a training framework that leverages pre-trained video-to-audio models and eliminates the need for specialized paired datasets, allowing training on more accessible data. Experimental results demonstrate that our method generates multiple semantically distinct audio tracks for a single input video, leading to higher-quality composite audio synthesis than existing baselines.
☆ SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes
Fine-tuning vision language models (VLMs) has achieved remarkable performance across various downstream tasks; yet, it requires access to model gradients through backpropagation (BP), making them unsuitable for memory-constrained, inference-only edge devices. To address this limitation, previous work has explored various BP-free fine-tuning methods. However, these approaches often rely on high-variance evolutionary strategies (ES) or zeroth-order (ZO) optimization, and often fail to achieve satisfactory performance. In this paper, we propose a hybrid Sharpness-aware Zeroth-order optimization (SharpZO) approach, specifically designed to enhance the performance of ZO VLM fine-tuning via a sharpness-aware warm-up training. SharpZO features a two-stage optimization process: a sharpness-aware ES stage that globally explores and smooths the loss landscape to construct a strong initialization, followed by a fine-grained local search via sparse ZO optimization. The entire optimization relies solely on forward passes. Detailed theoretical analysis and extensive experiments on CLIP models demonstrate that SharpZO significantly improves accuracy and convergence speed, achieving up to 7% average gain over state-of-the-art forward-only methods.
☆ Can Gradient Descent Simulate Prompting?
There are two primary ways of incorporating new information into a language model (LM): changing its prompt or changing its parameters, e.g. via fine-tuning. Parameter updates incur no long-term storage cost for model changes. However, for many model updates, prompting is significantly more effective: prompted models can generalize robustly from single examples and draw logical inferences that do not occur under standard fine-tuning. Can models be modified so that fine-tuning does emulate prompting? This paper describes a method for meta-training LMs such that gradient updates emulate the effects of conditioning on new information. Our approach uses tools from gradient-based meta-learning but uses an LM's own prompted predictions as targets, eliminating the need for ground-truth labels. Subsequent gradient descent training recovers some (and occasionally all) of prompted model performance -- showing improvement on the ``reversal curse'' tasks, and answering questions about text passages after a single gradient update. These results suggest that, with appropriate initialization, gradient descent can be surprisingly expressive. Our results suggest new avenues for long-context modeling and offer insight into the generalization capabilities of gradient-based learning.
comment: 14 pages, 2 figures
☆ EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora
Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRag achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy performance. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.
comment: Under review
☆ Antibody Design and Optimization with Multi-scale Equivariant Graph Diffusion Models for Accurate Complex Antigen Binding
Antibody design remains a critical challenge in therapeutic and diagnostic development, particularly for complex antigens with diverse binding interfaces. Current computational methods face two main limitations: (1) capturing geometric features while preserving symmetries, and (2) generalizing novel antigen interfaces. Despite recent advancements, these methods often fail to accurately capture molecular interactions and maintain structural integrity. To address these challenges, we propose \textbf{AbMEGD}, an end-to-end framework integrating \textbf{M}ulti-scale \textbf{E}quivariant \textbf{G}raph \textbf{D}iffusion for antibody sequence and structure co-design. Leveraging advanced geometric deep learning, AbMEGD combines atomic-level geometric features with residue-level embeddings, capturing local atomic details and global sequence-structure interactions. Its E(3)-equivariant diffusion method ensures geometric precision, computational efficiency, and robust generalizability for complex antigens. Furthermore, experiments using the SAbDab database demonstrate a 10.13\% increase in amino acid recovery, 3.32\% rise in improvement percentage, and a 0.062~\AA\ reduction in root mean square deviation within the critical CDR-H3 region compared to DiffAb, a leading antibody design model. These results highlight AbMEGD's ability to balance structural integrity with improved functionality, establishing a new benchmark for sequence-structure co-design and affinity optimization. The code is available at: https://github.com/Patrick221215/AbMEGD.
comment: 9 pages, 4 figures, accepted at IJCAI 2025
☆ Model State Arithmetic for Machine Unlearning
Large language models are trained on massive corpora of web data, which may include private data, copyrighted material, factually inaccurate data, or data that degrades model performance. Eliminating the influence of such problematic datapoints through complete retraining -- by repeatedly pretraining the model on datasets that exclude these specific instances -- is computationally prohibitive. For this reason, unlearning algorithms have emerged that aim to eliminate the influence of particular datapoints, while otherwise preserving the model -- at a low computational cost. However, precisely estimating and undoing the influence of individual datapoints has proved to be challenging. In this work, we propose a new algorithm, MSA, for estimating and undoing the influence of datapoints -- by leveraging model checkpoints i.e. artifacts capturing model states at different stages of pretraining. Our experimental results demonstrate that MSA consistently outperforms existing machine unlearning algorithms across multiple benchmarks, models, and evaluation metrics, suggesting that MSA could be an effective approach towards more flexible large language models that are capable of data erasure.
comment: Preprint. Work in progress
♻ ☆ Chain-of-Sketch: Enabling Global Visual Reasoning
Modern vision models have achieved remarkable success in benchmarks where local features provide critical information about the target. There is now a growing interest in tackling tasks requiring more global reasoning, where local features do not provide significant information. Minsky and Papert put forward such tasks in 1969 with their connectivity study, exposing the limitations of the perceptron model. In this paper, we introduce an expanded set of global visual datasets involving graphs, strings, mazes, and image grids. We show that large vision models still struggle to learn these tasks efficiently. Similarly, state-of-the-art multi-modal LLMs perform poorly on these datasets. We explain this learning inefficiency by means of the 'globality degree' measure. To mitigate this, we propose a method called chain-of-sketch (CoS). Similar to the chain-of-thought and scratchpad techniques used in language models, CoS breaks the original task into intermediate visual steps to help learn a complex task. In addition, we show that not all CoS strategies perform equally well. Our key insight is to impose a Markovian structure on the CoS frames. This leads to the introduction of 'inductive CoS' which achieves better out-of-distribution generalization and performs well even with smaller models compared to non-inductive variants.
comment: additional experiments added, title changed from "Visual Scratchpads: Enabling Global Reasoning in Vision" to "Chain-of-Sketch: Enabling Global Visual Reasoning"
♻ ☆ Mesh-Informed Neural Operator : A Transformer Generative Approach
Generative models in function spaces, situated at the intersection of generative modeling and operator learning, are attracting increasing attention due to their immense potential in diverse scientific and engineering applications. While functional generative models are theoretically domain- and discretization-agnostic, current implementations heavily rely on the Fourier Neural Operator (FNO), limiting their applicability to regular grids and rectangular domains. To overcome these critical limitations, we introduce the Mesh-Informed Neural Operator (MINO). By leveraging graph neural operators and cross-attention mechanisms, MINO offers a principled, domain- and discretization-agnostic backbone for generative modeling in function spaces. This advancement significantly expands the scope of such models to more diverse applications in generative, inverse, and regression tasks. Furthermore, MINO provides a unified perspective on integrating neural operators with general advanced deep learning architectures. Finally, we introduce a suite of standardized evaluation metrics that enable objective comparison of functional generative models, addressing another critical gap in the field.
♻ ☆ Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
We study the optimization of non-convex functions that are not necessarily smooth (gradient and/or Hessian are Lipschitz) using first order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice, motivating significant recent work on finding first order stationary points of functions satisfying generalizations of smoothness with first order methods. We develop a novel framework that lets us systematically study the convergence of a large class of first-order optimization algorithms (which we call decrease procedures) under generalizations of smoothness. We instantiate our framework to analyze the convergence of first order optimization algorithms to first and \textit{second} order stationary points under generalizations of smoothness. As a consequence, we establish the first convergence guarantees for first order methods to second order stationary points under generalizations of smoothness. We demonstrate that several canonical examples fall under our framework, and highlight practical implications.
♻ ☆ NY Real Estate Racial Equity Analysis via Applied Machine Learning
This study analyzes tract-level real estate ownership patterns in New York State (NYS) and New York City (NYC) to uncover racial disparities. We use an advanced race/ethnicity imputation model (LSTM+Geo with XGBoost filtering, validated at 89.2% accuracy) to compare the predicted racial composition of property owners to the resident population from census data. We examine both a Full Model (statewide) and a Name-Only LSTM Model (NYC) to assess how incorporating geospatial context affects our predictions and disparity estimates. The results reveal significant inequities: White individuals hold a disproportionate share of properties and property value relative to their population, while Black, Hispanic, and Asian communities are underrepresented as property owners. These disparities are most pronounced in minority-majority neighborhoods, where ownership is predominantly White despite a predominantly non-White population. Corporate ownership (LLCs, trusts, etc.) exacerbates these gaps by reducing owner-occupied opportunities in urban minority communities. We provide a breakdown of ownership vs. population by race for majority-White, -Black, -Hispanic, and -Asian tracts, identify those with extreme ownership disparities, and compare patterns in urban, suburban, and rural contexts. The findings underscore persistent racial inequity in property ownership, reflecting broader historical and socio-economic forces, and highlight the importance of data-driven approaches to address these issues.
comment: updated/replaced stale reference links. Added narrative covering gentrification, racial capitalism, financialization of housing, and segregation. Moved model details to appendices. Added Nivea
♻ ☆ Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment AAAI 2026
While large-scale unsupervised language models (LMs) capture broad world knowledge and reasoning capabilities, steering their behavior toward desired objectives remains challenging due to the lack of explicit supervision. Existing alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on training a reward model and performing reinforcement learning to align with human preferences. However, RLHF is often computationally intensive, unstable, and sensitive to hyperparameters. To address these limitations, Direct Preference Optimization (DPO) was introduced as a lightweight and stable alternative, enabling direct alignment of language models with pairwise preference data via classification loss. However, DPO and its extensions generally assume a single static preference distribution, limiting flexibility in multi-objective or dynamic alignment settings. In this paper, we propose a novel framework: Multi-Preference Lambda-weighted Listwise DPO, which extends DPO to incorporate multiple human preference dimensions (e.g., helpfulness, harmlessness, informativeness) and enables dynamic interpolation through a controllable simplex-weighted formulation. Our method supports both listwise preference feedback and flexible alignment across varying user intents without re-training. Empirical and theoretical analysis demonstrates that our method is as effective as traditional DPO on static objectives while offering greater generality and adaptability for real-world deployment.
comment: 10 pages, 4 figures, appendix included. To appear in Proceedings of AAAI 2026. Code: https://github.com/yuhui15/Multi-Preference-Lambda-weighted-DPO
♻ ☆ One Model to Forecast Them All and in Entity Distributions Bind Them
Probabilistic forecasting in power systems often involves multi-entity datasets like households, feeders, and wind turbines, where generating reliable entity-specific forecasts presents significant challenges. Traditional approaches require training individual models for each entity, making them inefficient and hard to scale. This study addresses this problem using GUIDE-VAE, a conditional variational autoencoder that allows entity-specific probabilistic forecasting using a single model. GUIDE-VAE provides flexible outputs, ranging from interpretable point estimates to full probability distributions, thanks to its advanced covariance composition structure. These distributions capture uncertainty and temporal dependencies, offering richer insights than traditional methods. To evaluate our GUIDE-VAE-based forecaster, we use household electricity consumption data as a case study due to its multi-entity and highly stochastic nature. Experimental results demonstrate that GUIDE-VAE outperforms conventional quantile regression techniques across key metrics while ensuring scalability and versatility. These features make GUIDE-VAE a powerful and generalizable tool for probabilistic forecasting tasks, with potential applications beyond household electricity consumption.
♻ ☆ Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script Languages
Although multilingual LLMs have achieved remarkable performance across benchmarks, we find they continue to underperform on non-Latin script languages across contemporary LLM families. This discrepancy arises from the fact that LLMs are pretrained with orthographic scripts, which are dominated by Latin characters that obscure their shared phonology with non-Latin scripts. We propose leveraging phonemic transcriptions as complementary signals to induce script-invariant representations. Our study demonstrates that integrating phonemic signals improves performance across both non-Latin and Latin script languages, with a particularly significant impact on closing the performance gap between the two. Through detailed experiments, we show that phonemic and orthographic scripts retrieve distinct examples for in-context learning (ICL). This motivates our proposed Mixed-ICL retrieval strategy, where further aggregation from both leads to our significant performance improvements for both Latin script languages (up to 12.6%) and non-Latin script languages (up to 15.1%) compared to randomized ICL retrieval.
comment: Accepted to NAACL 2025 (Main Conference). This version contains minor improvements to the camera-ready
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents
Information retrieval is a cornerstone of modern knowledge acquisition, enabling billions of queries each day across diverse domains. However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. These systems transcend conventional information search techniques by tightly integrating autonomous reasoning, iterative retrieval, and information synthesis into a dynamic feedback loop. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We also introduce a test-time scaling law to formalize the impact of computational depth on reasoning and search. Supported by benchmark results and the rise of open-source implementations, we demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking. All the related resources, including industry products, research papers, benchmark datasets, and open-source implementations, are collected for the community in https://github.com/DavidZWZ/Awesome-Deep-Research.
♻ ☆ In-Context Learning Strategies Emerge Rationally
Recent work analyzing in-context learning (ICL) has identified a broad set of strategies that describe model behavior in different experimental conditions. We aim to unify these findings by asking why a model learns these disparate strategies in the first place. Specifically, we start with the observation that when trained to learn a mixture of tasks, as is popular in the literature, the strategies learned by a model for performing ICL can be captured by a family of Bayesian predictors: a memorizing predictor, which assumes a discrete prior on the set of seen tasks, and a generalizing predictor, where the prior matches the underlying task distribution. Adopting the normative lens of rational analysis, where a learner's behavior is explained as an optimal adaptation to data given computational constraints, we develop a hierarchical Bayesian framework that almost perfectly predicts Transformer next-token predictions throughout training -- without assuming access to its weights. Under this framework, pretraining is viewed as a process of updating the posterior probability of different strategies, and inference-time behavior as a posterior-weighted average over these strategies' predictions. Our framework draws on common assumptions about neural network learning dynamics, which make explicit a tradeoff between loss and complexity among candidate strategies: beyond how well it explains the data, a model's preference towards implementing a strategy is dictated by its complexity. This helps explain well-known ICL phenomena, while offering novel predictions: e.g., we show a superlinear trend in the timescale for transitioning from generalization to memorization as task diversity increases. Overall, our work advances an explanatory and predictive account of ICL grounded in tradeoffs between strategy loss and complexity.
comment: Preprint
♻ ☆ Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs
We study online learning with oblivious losses and delays under a novel ``capacity constraint'' that limits how many past rounds can be tracked simultaneously for delayed feedback. Under ``clairvoyance'' (i.e., delay durations are revealed upfront each round) and/or ``preemptibility'' (i.e., we can stop tracking previously chosen round feedback), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the ``optimal capacity'' needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding ${\widetilde{O}(d_{\max})}$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta(\sqrt{TK(1+d/C)+Td\log(K)})$ and the optimal capacity is $\Theta(\min\{K/\log(K),d\})$ in the bandit setting, while in the full-information feedback setting, the minimax regret is $\Theta(\sqrt{T(d+1)\log(K)})$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel preemptive and non-preemptive scheduling policies, based on Pareto-distributed proxy delays, and batching techniques, respectively. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
♻ ☆ Fake it till You Make it: Reward Modeling as Discriminative Prediction
An effective reward model plays a pivotal role in reinforcement learning for post-training enhancement of visual generative models. However, current approaches of reward modeling suffer from implementation complexity due to their reliance on extensive human-annotated preference data or meticulously engineered quality dimensions that are often incomplete and engineering-intensive. Inspired by adversarial training in generative adversarial networks (GANs), this paper proposes GAN-RM, an efficient reward modeling framework that eliminates manual preference annotation and explicit quality dimension engineering. Our method trains the reward model through discrimination between a small set of representative, unpaired target samples(denoted as Preference Proxy Data) and model-generated ordinary outputs, requiring only a few hundred target samples. Comprehensive experiments demonstrate our GAN-RM's effectiveness across multiple key applications including test-time scaling implemented as Best-of-N sample filtering, post-training approaches like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Code and data will be released at https://github.com/Visualignment/GAN-RM.
♻ ☆ Measurement to Meaning: A Validity-Centered Framework for AI Evaluation
While the capabilities and utility of AI systems have advanced, rigorous norms for evaluating these systems have lagged. Grand claims, such as models achieving general reasoning capabilities, are supported with model performance on narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach for reasoning about the types of evaluative claims that can be made given the available evidence. For instance, our framework helps determine whether performance on a mathematical benchmark is an indication of the ability to solve problems on math tests or instead indicates a broader ability to reason. Our framework is well-suited for the contemporary paradigm in machine learning, where various stakeholders provide measurements and evaluations that downstream users use to validate their claims and decisions. At the same time, our framework also informs the construction of evaluations designed to speak to the validity of the relevant claims. By leveraging psychometrics' breakdown of validity, evaluations can prioritize the most critical facets for a given claim, improving empirical utility and decision-making efficacy. We illustrate our framework through detailed case studies of vision and language model evaluations, highlighting how explicitly considering validity strengthens the connection between evaluation evidence and the claims being made.
comment: Correspondence to olawale@mit.edu
♻ ☆ PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries
LLM serving systems typically treat user prompts as monolithic inputs, optimizing inference through decoding tricks or inter-query batching. However, many real-world prompts contain latent semantic parallelism--decomposable structures where subtasks can be executed independently to reduce latency while preserving meaning. We introduce PARALLELPROMPT, the first benchmark for measuring intra-query parallelism in natural user prompts. Our dataset comprises over 37,000 real-world prompts from public LLM chat logs, each annotated with a structured schema capturing task templates, shared context, and iteration inputs. These schemas are extracted using LLM-assisted prompting with rule-based multilingual validation. To evaluate the benefits of decomposition, we provide an execution suite that benchmarks serial vs. parallel strategies, measuring latency, structural adherence, and semantic fidelity. Our results show that intra-query parallelism can be successfully parsed in over 75% of curated datasets, unlocking up to 5x speedups on tasks like translation, comprehension, and comparative analysis, with minimal quality degradation. By releasing this benchmark, curation pipeline, and evaluation suite, we provide the first standardized testbed for studying structure-aware execution in LLM serving pipelines.
comment: In Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning
♻ ☆ New Bounds for Sparse Variational Gaussian Processes
Sparse variational Gaussian processes (GPs) construct tractable posterior approximations to GP models. At the core of these methods is the assumption that the true posterior distribution over training function values ${\bf f}$ and inducing variables ${\bf u}$ is approximated by a variational distribution that incorporates the conditional GP prior $p({\bf f} | {\bf u})$ in its factorization. While this assumption is considered as fundamental, we show that for model training we can relax it through the use of a more general variational distribution $q({\bf f} | {\bf u})$ that depends on $N$ extra parameters, where $N$ is the number of training examples. In GP regression, we can analytically optimize the evidence lower bound over the extra parameters and express a tractable collapsed bound that is tighter than the previous bound. The new bound is also amenable to stochastic optimization and its implementation requires minor modifications to existing sparse GP code. Further, we also describe extensions to non-Gaussian likelihoods. On several datasets we demonstrate that our method can reduce bias when learning the hyperparameters and can lead to better predictive performance.
comment: 18 pages, 5 figures
♻ ☆ Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations
Large language models like GPT, LLAMA, and Claude have become incredibly powerful at generating text, but they are still black boxes, so it is hard to understand how they decide what to say. That lack of transparency can be problematic, especially in fields where trust and accountability matter. To help with this, we introduce SMILE, a new method that explains how these models respond to different parts of a prompt. SMILE is model-agnostic and works by slightly changing the input, measuring how the output changes, and then highlighting which words had the most impact. Create simple visual heat maps showing which parts of a prompt matter the most. We tested SMILE on several leading LLMs and used metrics such as accuracy, consistency, stability, and fidelity to show that it gives clear and reliable explanations. By making these models easier to understand, SMILE brings us one step closer to making AI more transparent and trustworthy.
comment: The submission contains incorrect references that require substantial revision
♻ ☆ Graph Neural Network for Neutrino Physics Event Reconstruction
Liquid Argon Time Projection Chamber (LArTPC) detector technology offers a wealth of high-resolution information on particle interactions, and leveraging that information to its full potential requires sophisticated automated reconstruction techniques. This article describes NuGraph2, a Graph Neural Network (GNN) for low-level reconstruction of simulated neutrino interactions in a LArTPC detector. Simulated neutrino interactions in the MicroBooNE detector geometry are described as heterogeneous graphs, with energy depositions on each detector plane forming nodes on planar subgraphs. The network utilizes a multi-head attention message-passing mechanism to perform background filtering and semantic labelling on these graph nodes, identifying those associated with the primary physics interaction with 98.0\% efficiency and labelling them according to particle type with 94.9\% efficiency. The network operates directly on detector observables across multiple 2D representations, but utilizes a 3D-context-aware mechanism to encourage consistency between these representations. Model inference takes 0.12~s/event on a CPU, and 0.005s/event batched on a GPU. This architecture is designed to be a general-purpose solution for particle reconstruction in neutrino physics, with the potential for deployment across a broad range of detector technologies, and offers a core convolution engine that can be leveraged for a variety of tasks beyond the two described in this article.
comment: 18 pages, 14 figures, published in Physical Review D
♻ ☆ The Sample Complexity of Learning Lipschitz Operators with respect to Gaussian Measures
Operator learning, the approximation of mappings between infinite-dimensional function spaces using machine learning, has gained increasing research attention in recent years. Approximate operators, learned from data, can serve as efficient surrogate models for problems in computational science and engineering, complementing traditional methods. However, despite their empirical success, our understanding of the underlying mathematical theory is in large part still incomplete. In this paper, we study the approximation of Lipschitz operators with respect to Gaussian measures. We prove higher Gaussian Sobolev regularity of Lipschitz operators and establish lower and upper bounds on the Hermite polynomial approximation error. We then study general reconstruction strategies of Lipschitz operators from $m$ arbitrary (potentially adaptive) linear samples. As a key finding, we tightly characterize the corresponding sample complexity, that is, the smallest achievable worst-case error among all possible choices of (adaptive) sampling and reconstruction strategies in terms of $m$. As a consequence, we identify an inherent curse of sample complexity: No method to approximate Lipschitz operators based on $m$ linear samples can achieve algebraic convergence rates in $m$. On the positive side, we prove that a sufficiently fast spectral decay of the covariance operator of the underlying Gaussian measure guarantees convergence rates which are arbitrarily close to any algebraic rate. Overall, by tightly characterizing the sample complexity, our work confirms the intrinsic difficulty of learning Lipschitz operators, regardless of the data or learning technique.
comment: Section 6 about pointwise sampling in v2 of this paper has been cut and will appear elsewhere
♻ ☆ TracLLM: A Generic Framework for Attributing Long Context LLMs
Long context large language models (LLMs) are deployed in many real-world applications such as RAG, agent, and broad LLM-integrated applications. Given an instruction and a long context (e.g., documents, PDF files, webpages), a long context LLM can generate an output grounded in the provided context, aiming to provide more accurate, up-to-date, and verifiable outputs while reducing hallucinations and unsupported claims. This raises a research question: how to pinpoint the texts (e.g., sentences, passages, or paragraphs) in the context that contribute most to or are responsible for the generated output by an LLM? This process, which we call context traceback, has various real-world applications, such as 1) debugging LLM-based systems, 2) conducting post-attack forensic analysis for attacks (e.g., prompt injection attack, knowledge corruption attacks) to an LLM, and 3) highlighting knowledge sources to enhance the trust of users towards outputs generated by LLMs. When applied to context traceback for long context LLMs, existing feature attribution methods such as Shapley have sub-optimal performance and/or incur a large computational cost. In this work, we develop TracLLM, the first generic context traceback framework tailored to long context LLMs. Our framework can improve the effectiveness and efficiency of existing feature attribution methods. To improve the efficiency, we develop an informed search based algorithm in TracLLM. We also develop contribution score ensemble/denoising techniques to improve the accuracy of TracLLM. Our evaluation results show TracLLM can effectively identify texts in a long context that lead to the output of an LLM. Our code and data are at: https://github.com/Wang-Yanting/TracLLM.
comment: To appear in USENIX Security Symposium 2025. The code and data are at: https://github.com/Wang-Yanting/TracLLM
♻ ☆ Continual Learning as Computationally Constrained Reinforcement Learning
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.
♻ ☆ Improving Stochastic Cubic Newton with Momentum
We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.
♻ ☆ Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional ICML 2025
Transition path sampling (TPS), which involves finding probable paths connecting two points on an energy landscape, remains a challenge due to the complexity of real-world atomistic systems. Current machine learning approaches use expensive, task-specific, and data-free training procedures, limiting their ability to benefit from high-quality datasets and large-scale pre-trained models. In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of pre-trained generative models, specifically denoising diffusion and flow matching. Under these dynamics, finding high-likelihood transition paths becomes equivalent to minimizing the Onsager-Machlup (OM) action functional. This enables us to repurpose pre-trained generative models for TPS in a zero-shot manner, in contrast with bespoke, task-specific approaches in previous work. We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways and generalizing beyond the pre-trained model's original training dataset. Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale and improve with increased data availability. Code is available at github.com/ASK-Berkeley/OM-TPS.
comment: ICML 2025
♻ ☆ Representation Learning of Lab Values via Masked AutoEncoders
Accurate imputation of missing laboratory values in electronic health records (EHRs) is critical to enable robust clinical predictions and reduce biases in AI systems in healthcare. Existing methods, such as XGBoost, softimpute, GAIN, Expectation Maximization (EM), and MICE, struggle to model the complex temporal and contextual dependencies in EHR data, particularly in underrepresented groups. In this work, we propose Lab-MAE, a novel transformer-based masked autoencoder framework that leverages self-supervised learning for the imputation of continuous sequential lab values. Lab-MAE introduces a structured encoding scheme that jointly models laboratory test values and their corresponding timestamps, enabling explicit capturing temporal dependencies. Empirical evaluation on the MIMIC-IV dataset demonstrates that Lab-MAE significantly outperforms state-of-the-art baselines such as XGBoost, softimpute, GAIN, EM, and MICE across multiple metrics, including root mean square error (RMSE), R-squared (R2), and Wasserstein distance (WD). Notably, Lab-MAE achieves equitable performance across demographic groups of patients, advancing fairness in clinical predictions. We further investigate the role of follow-up laboratory values as potential shortcut features, revealing Lab-MAE's robustness in scenarios where such data is unavailable. The findings suggest that our transformer-based architecture, adapted to the characteristics of EHR data, offers a foundation model for more accurate and fair clinical imputation. In addition, we measure and compare the carbon footprint of Lab-MAE with the a XGBoost model, highlighting its environmental requirements.
comment: 14 pages of main text, 11 appendix
♻ ☆ HARPT: A Corpus for Analyzing Consumers' Trust and Privacy Concerns in Mobile Health Apps
We present HARPT, a large-scale annotated corpus of mobile health app store reviews aimed at advancing research in user privacy and trust. The dataset comprises over 480,000 user reviews labeled into seven categories that capture critical aspects of trust in applications, trust in providers and privacy concerns. Creating HARPT required addressing multiple complexities, such as defining a nuanced label schema, isolating relevant content from large volumes of noisy data, and designing an annotation strategy that balanced scalability with accuracy. This strategy integrated rule-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers to accelerate coverage. In parallel, a carefully curated subset of 7,000 reviews was manually annotated to support model development and evaluation. We benchmark a broad range of classification models, demonstrating that strong performance is achievable and providing a baseline for future research. HARPT is released as a public resource to support work in health informatics, cybersecurity, and natural language processing.
♻ ☆ Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application
In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed, to enhance the robustness against both channel noise and transmission data distribution shifts. A theoretical foundation is established using stochastic differential equations (SDEs), from which a closed-form mapping between any signal-to-noise ratio (SNR) and the optimal denoising timestep is derived. Moreover, to address distribution mismatch, a mathematical scaling method is introduced to align received semantic features with the training distribution of the GAI. Built on this theoretical foundation, a latent diffusion model (LDM)-based semantic communication framework is proposed that combines a variational autoencoder for semantic features extraction, where a pretrained diffusion model is used for denoising. The proposed system is a training-free framework that supports zero-shot generalization, and achieves superior performance under low-SNR and out-of-distribution conditions, offering a scalable and robust solution for future 6G semantic communication systems. Experimental results demonstrate that the proposed semantic communication framework achieves state-of-the-art performance in both pixel-level accuracy and semantic perceptual quality, consistently outperforming baselines across a wide range of SNRs and data distributions without any fine-tuning or post-training.
♻ ☆ On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds considerable promise for improving predictions in machine learning. In this work, we aim to understand when and how deep networks -- with standard architectures trained in a standard, supervised way -- learn symmetries from data. Inspired by real-world scenarios, we study a classification paradigm where data symmetries are only partially observed during training: some classes include all transformations of a cyclic group, while others -- only a subset. In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning. The group-cyclic nature of the dataset allows us to analyze the Gram matrix of neural kernels in the Fourier domain; here we find a simple characterization of the generalization error as a function of class separation (signal) and class-orbit density (noise). This characterization reveals that generalization can only be successful when the local structure of the data prevails over its non-local, symmetry-induced structure, in the kernel space defined by the architecture. We extend our theoretical treatment to any finite group, including non-abelian groups. Our framework also applies to equivariant architectures (e.g., CNNs), and recovers their success in the special case where the architecture matches the inherent symmetry of the data. Empirically, our theory reproduces the generalization failure of finite-width networks (MLP, CNN, ViT) trained on partially observed versions of rotated-MNIST. We conclude that conventional deep networks lack a mechanism to learn symmetries that have not been explicitly embedded in their architecture a priori. Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.
comment: JMLR accepted version, including an extension of the theory to general finite groups (including non-abelian groups)
♻ ☆ Learning Value of Information towards Joint Communication and Control in 6G V2X
As Cellular Vehicle-to-Everything (C-V2X) evolves towards future sixth-generation (6G) networks, Connected Autonomous Vehicles (CAVs) are emerging to become a key application. Leveraging data-driven Machine Learning (ML), especially Deep Reinforcement Learning (DRL), is expected to significantly enhance CAV decision-making in both vehicle control and V2X communication under uncertainty. These two decision-making processes are closely intertwined, with the value of information (VoI) acting as a crucial bridge between them. In this paper, we introduce Sequential Stochastic Decision Process (SSDP) models to define and assess VoI, demonstrating their application in optimizing communication systems for CAVs. Specifically, we formally define the SSDP model and demonstrate that the MDP model is a special case of it. The SSDP model offers a key advantage by explicitly representing the set of information that can enhance decision-making when available. Furthermore, as current research on VoI remains fragmented, we propose a systematic VoI modeling framework grounded in the MDP, Reinforcement Learning (RL) and Optimal Control theories. We define different categories of VoI and discuss their corresponding estimation methods. Finally, we present a structured approach to leverage the various VoI metrics for optimizing the ``When", ``What", and ``How" to communicate problems. For this purpose, SSDP models are formulated with VoI-associated reward functions derived from VoI-based optimization objectives. While we use a simple vehicle-following control problem to illustrate the proposed methodology, it holds significant potential to facilitate the joint optimization of stochastic, sequential control and communication decisions in a wide range of networked control systems.
PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks
Black-box query-based attacks constitute significant threats to Machine Learning as a Service (MLaaS) systems since they can generate adversarial examples without accessing the target model's architecture and parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise the test accuracy of non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost. These models leverage the local implicit function and rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
♻ ☆ Regret Bounds for Robust Online Decision Making
We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner, that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve state-of-the-art (except that we do not address computational efficiency).
♻ ☆ A Scalable Quantum Neural Network for Approximate SRBB-Based Unitary Synthesis
In this work, a scalable quantum neural network is introduced as a means to approximate any unitary evolution through the Standard Recursive Block Basis (SRBB) and, subsequently, redesigned with a number of CNOTs asymptotically reduced by an exponential contribution. This algebraic approach to the problem of unitary synthesis exploits Lie algebras and their topological features to obtain scalable parameterizations of unitary operators. First, the original SRBB-based scalability scheme, already known in the literature only from a theoretical point of view, is reformulated for efficient algorithm implementation and complexity management. Remarkably, 2-qubit operators emerge as a special case outside the original scaling scheme. Furthermore, an algorithm is proposed to reduce the number of CNOTs, thus deriving a new implementable scaling scheme that requires only one layer of approximation. The scalable CNOT-reduced quantum neural network is implemented and its performance is assessed with a variety of different unitary matrices, both sparse and dense, up to 6 qubits via the PennyLane library. The effectiveness of the approximation is measured with different metrics in relation to two optimizers: a gradient-based method and the Nelder-Mead method. The approximate CNOT-reduced SRBB-based synthesis algorithm is also tested on real hardware and compared with other valid approximation and decomposition methods available in the literature.
♻ ☆ ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion
Graph Neural Networks (GNNs) have demonstrated impressive performance across diverse graph-based tasks by leveraging message passing to capture complex node relationships. However, when applied to large-scale real-world graphs, GNNs face two major challenges: First, it becomes increasingly difficult to ensure both scalability and efficiency, as the repeated aggregation of large neighborhoods leads to significant computational overhead; Second, the over-smoothing problem arises, where excessive or deep propagation makes node representations indistinguishable, severely hindering model expressiveness. To tackle these issues, we propose ScaleGNN, a novel framework that adaptively fuses multi-hop node features for both scalable and effective graph learning. First, we construct per-hop pure neighbor matrices that capture only the exclusive structural information at each hop, avoiding the redundancy of conventional aggregation. Then, an enhanced feature fusion strategy significantly balances low-order and high-order information, preserving both local detail and global correlations without incurring excessive complexity. To further reduce redundancy and over-smoothing, we introduce a Local Contribution Score (LCS)-based masking mechanism to filter out less relevant high-order neighbors, ensuring that only the most meaningful information is aggregated. In addition, learnable sparse constraints selectively integrate multi-hop valuable features, emphasizing the most informative high-order neighbors. Extensive experiments on real-world datasets demonstrate that ScaleGNN consistently outperforms state-of-the-art GNNs in both predictive accuracy and computational efficiency, highlighting its practical value for large-scale graph learning.
♻ ☆ Context-Aware Doubly-Robust Semi-Supervised Learning
The widespread adoption of artificial intelligence (AI) in next-generation communication systems is challenged by the heterogeneity of traffic and network conditions, which call for the use of highly contextual, site-specific, data. A promising solution is to rely not only on real-world data, but also on synthetic pseudo-data generated by a network digital twin (NDT). However, the effectiveness of this approach hinges on the accuracy of the NDT, which can vary widely across different contexts. To address this problem, this paper introduces context-aware doubly-robust (CDR) learning, a novel semi-supervised scheme that adapts its reliance on the pseudo-data to the different levels of fidelity of the NDT across contexts. CDR is evaluated on the task of downlink beamforming where it outperforms previous state-of-the-art approaches, providing a 24% loss decrease when compared to doubly-robust (DR) semi-supervised learning in regimes with low labeled data availability.
comment: This work has been accepted for publication in IEEE Signal Processing Letters
♻ ☆ Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance
Understanding medical ultrasound imaging remains a long-standing challenge due to significant visual variability caused by differences in imaging and acquisition parameters. Recent advancements in large language models (LLMs) have been used to automatically generate terminology-rich summaries orientated to clinicians with sufficient physiological knowledge. Nevertheless, the increasing demand for improved ultrasound interpretability and basic scanning guidance among non-expert users, e.g., in point-of-care settings, has not yet been explored. In this study, we first introduce the scene graph (SG) for ultrasound images to explain image content to ordinary and provide guidance for ultrasound scanning. The ultrasound SG is first computed using a transformer-based one-stage method, eliminating the need for explicit object detection. To generate a graspable image explanation for ordinary, the user query is then used to further refine the abstract SG representation through LLMs. Additionally, the predicted SG is explored for its potential in guiding ultrasound scanning toward missing anatomies within the current imaging view, assisting ordinary users in achieving more standardized and complete anatomical exploration. The effectiveness of this SG-based image explanation and scanning guidance has been validated on images from the left and right neck regions, including the carotid and thyroid, across five volunteers. The results demonstrate the potential of the method to maximally democratize ultrasound by enhancing its interpretability and usability for ordinaries.
♻ ☆ Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols
Graph neural networks (GNNs) have achieved significant success in graph representation learning and have been applied to various domains. However, many real-world graphs contain sensitive personal information, such as user profiles in social networks, raising serious privacy concerns when graph learning is performed using GNNs. To address this issue, locally private graph learning protocols have gained considerable attention. These protocols leverage the privacy advantages of local differential privacy (LDP) and the effectiveness of GNN's message-passing in calibrating noisy data, offering strict privacy guarantees for users' local data while maintaining high utility (e.g., node classification accuracy) for graph learning. Despite these advantages, such protocols may be vulnerable to data poisoning attacks, a threat that has not been considered in previous research. Identifying and addressing these threats is crucial for ensuring the robustness and security of privacy-preserving graph learning frameworks. This work introduces the first data poisoning attack targeting locally private graph learning protocols. The attacker injects fake users into the protocol, manipulates these fake users to establish links with genuine users, and sends carefully crafted data to the server, ultimately compromising the utility of private graph learning. The effectiveness of the attack is demonstrated both theoretically and empirically. In addition, several defense strategies have also been explored, but their limited effectiveness highlights the need for more robust defenses.
♻ ☆ Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
The most widely used generative models map noise and data distributions by matching flows or scores. However, they struggle to incorporate partial observations and additional priors--something energy-based models (EBMs) handle elegantly by simply adding corresponding scalar energy terms. We address this issue by proposing Energy Matching, a framework that endows flow-based approaches with the flexibility of EBMs. Far from the data manifold, samples move along curl-free, optimal transport paths from noise to data. As they approach the data manifold, an entropic energy term guides the system into a Boltzmann equilibrium distribution, explicitly capturing the underlying likelihood structure of the data. We parameterize this dynamic with a single time-independent scalar field, which serves as both a powerful generator and a flexible prior for effective regularization of inverse problems. Our method substantially outperforms existing EBMs on CIFAR-10 and ImageNet generation in terms of fidelity, while retaining simulation-free training of transport-based approaches away from the data manifold. Furthermore, we leverage the method's flexibility to introduce an interaction energy that supports diverse mode exploration, which we demonstrate in a controlled protein-generation setting. Our approach focuses on learning a scalar potential energy--without time-conditioning, auxiliary generators, or additional networks--which marks a significant departure from recent EBM methods. We believe that this simplified framework significantly advances EBMs capabilities and paves the way for their wider adoption in generative modeling across diverse domains.
♻ ☆ Lagrangian Index Policy for Restless Bandits with Average Reward
We study the Lagrange Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrange index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give a new proof of asymptotic optimality in case of homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti's theorem.
♻ ☆ A GREAT Architecture for Edge-Based Graph Problems Like TSP
In the last years, many learning-based approaches have been proposed to tackle combinatorial optimization problems such as routing problems. Many of these approaches are based on graph neural networks (GNNs) or related transformers, operating on the Euclidean coordinates representing the routing problems. However, models operating on Euclidean coordinates are ill-suited for non-Euclidean, asymmetric problem instances that are often found in real-world settings. To overcome this limitation, we propose a novel GNN-based and edge-focused neural model called Graph Edge Attention Network (GREAT). Using GREAT as an encoder to capture the properties of a routing problem instance, we build a reinforcement learning framework which we apply to Euclidean and non-Euclidean variants of vehicle routing problems such as Traveling Salesman Problem, Capacitated Vehicle Routing Problem and Orienteering Problem. Our framework is among the first to tackle non-Euclidean variants of these problems and achieves competitive results among learning-based solvers.
comment: 15 pages, 7 figures
♻ ☆ These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining
Transfer learning is a cornerstone of modern machine learning, promising a way to adapt models pretrained on a broad mix of data to new tasks with minimal new data. However, a significant challenge remains in ensuring that transferred features are sufficient to handle unseen datasets, amplified by the difficulty of quantifying whether two tasks are "related". To address these challenges, we evaluate model transfer from a pretraining mixture to each of its component tasks, assessing whether pretrained features can match the performance of task-specific direct training. We identify a fundamental limitation in deep learning models -- an "information saturation bottleneck" -- where networks fail to learn new features once they encode similar competing features during training. When restricted to learning only a subset of key features during pretraining, models will permanently lose critical features for transfer and perform inconsistently on data distributions, even components of the training mixture. Empirical evidence from published studies suggests that this phenomenon is pervasive in deep learning architectures -- factors such as data distribution or ordering affect the features that current representation learning methods can learn over time. This study suggests that relying solely on large-scale networks may not be as effective as focusing on task-specific training, when available. We propose richer feature representations as a potential solution to better generalize across new datasets and, specifically, present existing methods alongside a novel approach, the initial steps towards addressing this challenge.
comment: 10 pages, 7 figures, Preprint. Under review
♻ ☆ Simulating Hard Attention Using Soft Attention
We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several subclasses of languages recognized by hard-attention transformers, which can be defined in variants of linear temporal logic. We demonstrate how soft-attention transformers can compute formulas of these logics using unbounded positional embeddings or temperature scaling. Second, we demonstrate how temperature scaling allows softmax transformers to simulate general hard-attention transformers, using a temperature that depends on the minimum gap between the maximum attention scores and other attention scores.
comment: 19 pages
♻ ☆ Wavelet Diffusion Neural Operator
Simulating and controlling physical systems described by partial differential equations (PDEs) are crucial tasks across science and engineering. Recently, diffusion generative models have emerged as a competitive class of methods for these tasks due to their ability to capture long-term dependencies and model high-dimensional states. However, diffusion models typically struggle with handling system states with abrupt changes and generalizing to higher resolutions. In this work, we propose Wavelet Diffusion Neural Operator (WDNO), a novel PDE simulation and control framework that enhances the handling of these complexities. WDNO comprises two key innovations. Firstly, WDNO performs diffusion-based generative modeling in the wavelet domain for the entire trajectory to handle abrupt changes and long-term dependencies effectively. Secondly, to address the issue of poor generalization across different resolutions, which is one of the fundamental tasks in modeling physical systems, we introduce multi-resolution training. We validate WDNO on five physical systems, including 1D advection equation, three challenging physical systems with abrupt changes (1D Burgers' equation, 1D compressible Navier-Stokes equation and 2D incompressible fluid), and a real-world dataset ERA5, which demonstrates superior performance on both simulation and control tasks over state-of-the-art methods, with significant improvements in long-term and detail prediction accuracy. Remarkably, in the challenging context of the 2D high-dimensional and indirect control task aimed at reducing smoke leakage, WDNO reduces the leakage by 78% compared to the second-best baseline. The code can be found at https://github.com/AI4Science-WestlakeU/wdno.git.
♻ ☆ Radio Map Estimation via Latent Domain Plug-and-Play Denoising
Radio map estimation (RME), also known as spectrum cartography, aims to reconstruct the strength of radio interference across different domains (e.g., space and frequency) from sparsely sampled measurements. To tackle this typical inverse problem, state-of-the-art RME methods rely on handcrafted or data-driven structural information of radio maps. However, the former often struggles to model complex radio frequency (RF) environments and the latter requires excessive training -- making it hard to quickly adapt to in situ sensing tasks. This work presents a spatio-spectral RME approach based on plug-and-play (PnP) denoising, a technique from computational imaging. The idea is to leverage the observation that the denoising operations of signals like natural images and radio maps are similar -- despite the nontrivial differences of the signals themselves. Hence, sophisticated denoisers designed for or learned from natural images can be directly employed to assist RME, avoiding using radio map data for training. Unlike conventional PnP methods that operate directly in the data domain, the proposed method exploits the underlying physical structure of radio maps and proposes an ADMM algorithm that denoises in a latent domain. This design significantly improves computational efficiency and enhances noise robustness. Theoretical aspects, e.g., recoverability of the complete radio map and convergence of the ADMM algorithm are analyzed. Synthetic and real data experiments are conducted to demonstrate the effectiveness of our approach.
♻ ☆ Capturing Style in Author and Document Representation
A wide range of Deep Natural Language Processing (NLP) models integrates continuous and low dimensional representations of words and documents. Surprisingly, very few models study representation learning for authors. These representations can be used for many NLP tasks, such as author identification and classification, or in recommendation systems. A strong limitation of existing works is that they do not explicitly capture writing style, making them hardly applicable to literary data. We therefore propose a new architecture based on Variational Information Bottleneck (VIB) that learns embeddings for both authors and documents with a stylistic constraint. Our model fine-tunes a pre-trained document encoder. We stimulate the detection of writing style by adding predefined stylistic features making the representation axis interpretable with respect to writing style indicators. We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus and IMDb62, for which we show that it matches or outperforms strong/recent baselines in authorship attribution while capturing much more accurately the authors stylistic aspects.
♻ ☆ Rapid Gyroscope Calibration: A Deep Learning Approach
Low-cost gyroscope calibration is essential for ensuring the accuracy and reliability of gyroscope measurements. Stationary calibration estimates the deterministic parts of measurement errors. To this end, a common practice is to average the gyroscope readings during a predefined period and estimate the gyroscope bias. Calibration duration plays a crucial role in performance, therefore, longer periods are preferred. However, some applications require quick startup times and calibration is therefore allowed only for a short time. In this work, we focus on reducing low-cost gyroscope calibration time using deep learning methods. We propose an end-to-end convolutional neural network for the application of gyroscope calibration. We explore the possibilities of using multiple real and virtual gyroscopes to improve the calibration performance of single gyroscopes. To train and validate our approach, we recorded a dataset consisting of 186.6 hours of gyroscope readings, using 36 gyroscopes of four different brands. We also created a virtual dataset consisting of simulated gyroscope readings. The six datasets were used to evaluate our proposed approach. One of our key achievements in this work is reducing gyroscope calibration time by up to 89% using three low-cost gyroscopes. Our dataset is publicly available to allow reproducibility of our work and to increase research in the field.
comment: 10 Pages, 14 Figures
♻ ☆ Balancing Privacy, Robustness, and Efficiency in Machine Learning
This position paper argues that achieving robustness, privacy, and efficiency simultaneously in machine learning systems is infeasible under prevailing threat models. The tension between these goals arises not from algorithmic shortcomings but from structural limitations imposed by worst-case adversarial assumptions. We advocate for a systematic research agenda aimed at formalizing the robustness-privacy-efficiency trilemma, exploring how principled relaxations of threat models can unlock better trade-offs, and designing benchmarks that expose rather than obscure the compromises made. By shifting focus from aspirational universal guarantees to context-aware system design, the machine learning community can build models that are truly appropriate for real-world deployment.
♻ ☆ Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs
Optimal transport between graphs, based on Gromov-Wasserstein and other extensions, is a powerful tool for comparing and aligning graph structures. However, solving the associated non-convex optimization problems is computationally expensive, which limits the scalability of these methods to large graphs. In this work, we present Unbalanced Learning of Optimal Transport (ULOT), a deep learning method that predicts optimal transport plans between two graphs. Our method is trained by minimizing the fused unbalanced Gromov-Wasserstein (FUGW) loss. We propose a novel neural architecture with cross-attention that is conditioned on the FUGW tradeoff hyperparameters. We evaluate ULOT on synthetic stochastic block model (SBM) graphs and on real cortical surface data obtained from fMRI. ULOT predicts transport plans with competitive loss up to two orders of magnitude faster than classical solvers. Furthermore, the predicted plan can be used as a warm start for classical solvers to accelerate their convergence. Finally, the predicted transport plan is fully differentiable with respect to the graph inputs and FUGW hyperparameters, enabling the optimization of functionals of the ULOT plan.
♻ ☆ LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey
Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability and safety. These human-agent collaboration systems enable humans and LLM-based agents to collaborate effectively by leveraging their complementary strengths. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment & profiling, human feedback, interaction types, orchestration and communication, explores emerging applications, and discusses unique challenges and opportunities arising from human-AI collaboration. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems.
comment: Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems
♻ ☆ Seal Your Backdoor with Variational Defense ICCV 2025
We propose VIBE, a model-agnostic framework that trains classifiers resilient to backdoor attacks. The key concept behind our approach is to treat malicious inputs and corrupted labels from the training dataset as observed random variables, while the actual clean labels are latent. VIBE then recovers the corresponding latent clean label posterior through variational inference. The resulting training procedure follows the expectation-maximization (EM) algorithm. The E-step infers the clean pseudolabels by solving an entropy-regularized optimal transport problem, while the M-step updates the classifier parameters via gradient descent. Being modular, VIBE can seamlessly integrate with recent advancements in self-supervised representation learning, which enhance its ability to resist backdoor attacks. We experimentally validate the method effectiveness against contemporary backdoor attacks on standard datasets, a large-scale setup with 1$k$ classes, and a dataset poisoned with multiple attacks. VIBE consistently outperforms previous defenses across all tested scenarios.
comment: Accepted to ICCV 2025
♻ ☆ PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp
The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on the point clouds (2.5D points) generated from single-view depth images. These point clouds only have one surface side of the object providing incomplete geometry information, which mislead the grasping algorithm to judge the shape of the target object, resulting in low grasping accuracy. Humans can accurately grasp objects from a single view by leveraging their geometry experience to estimate object shapes. Inspired by humans, we propose a novel 6-DoF grasping framework that converts the point completion results as object shape features to train the 6-DoF grasp network. Here, point completion can generate approximate complete points from the 2.5D points similar to the human geometry experience, and converting it as shape features is the way to utilize it to improve grasp efficiency. Furthermore, due to the gap between the network generation and actual execution, we integrate a score filter into our framework to select more executable grasp proposals for the real robot. This enables our method to maintain a high grasp quality in any camera viewpoint. Extensive experiments demonstrate that utilizing complete point features enables the generation of significantly more accurate grasp proposals and the inclusion of a score filter greatly enhances the credibility of real-world robot grasping. Our method achieves a 17.8\% success rate higher than the state-of-the-art method in real-world experiments.
♻ ☆ Variational Supervised Contrastive Learning
Contrastive learning has proven to be highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) Without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide pair selection, and (2) excessive reliance on large in-batch negatives and tailored augmentations hinders generalization. To address these limitations, we propose Variational Supervised Contrastive Learning (VarCon), which reformulates supervised contrastive learning as variational inference over latent class variables and maximizes a posterior-weighted evidence lower bound (ELBO) that replaces exhaustive pair-wise comparisons for efficient class-aware matching and grants fine-grained control over intra-class dispersion in the embedding space. Trained exclusively on image data, our experiments on CIFAR-10, CIFAR-100, ImageNet-100, and ImageNet-1K show that VarCon (1) achieves state-of-the-art performance for contrastive learning frameworks, reaching 79.36% Top-1 accuracy on ImageNet-1K and 78.29% on CIFAR-100 with a ResNet-50 encoder while converging in just 200 epochs; (2) yields substantially clearer decision boundaries and semantic organization in the embedding space, as evidenced by KNN classification, hierarchical clustering results, and transfer-learning assessments; and (3) demonstrates superior performance in few-shot learning than supervised baseline and superior robustness across various augmentation strategies.
♻ ☆ Moderating the Generalization of Score-based Generative Model
Score-based Generative Models (SGMs) have demonstrated remarkable generalization abilities, e.g. generating unseen, but natural data. However, the greater the generalization power, the more likely the unintended generalization, and the more dangerous the abuse. Research on moderated generalization in SGMs remains limited. To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find it does not work in SGMs. Further analysis of score functions reveals that the MU 'gold standard' does not alter the original score function, which explains its ineffectiveness. Based on this insight, we propose the first Moderated Score-based Generative Model (MSGM), which introduces a novel score adjustment strategy that redirects the score function away from undesirable data during the continuous-time stochastic differential equation process. Extensive experimental results demonstrate that MSGM significantly reduces the likelihood of generating undesirable content while preserving high visual quality for normal image generation. Albeit designed for SGMs, MSGM is a general and flexible MU framework that is compatible with diverse diffusion architectures (SGM and DDPM) and training strategies (re-training and fine-tuning), and enables zero-shot transfer of the pre-trained models to downstream tasks, e.g. image inpainting and reconstruction. The code will be shared upon acceptance.
♻ ☆ Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Recent advancements in large language models (LLMs) have witnessed a surge in the development of advanced reasoning paradigms, which are now being integrated into multimodal large language models (MLLMs). However, existing approaches often fall short: methods solely employing reinforcement learning (RL) can struggle with sample inefficiency and activating entirely absent reasoning capabilities, while conventional pipelines that initiate with a cold-start supervised fine-tuning (SFT) phase before RL may restrict the model's exploratory capacity and face suboptimal convergence. In this work, we introduce \textbf{Metis-RISE} (\textbf{R}L \textbf{I}ncentivizes and \textbf{S}FT \textbf{E}nhances) for multimodal reasoning model learning. Unlike conventional approaches, Metis-RISE distinctively omits an initial SFT stage, beginning instead with an RL phase (e.g., using a Group Relative Policy Optimization variant) to incentivize and activate the model's latent reasoning capacity. Subsequently, the targeted SFT stage addresses two key challenges identified during RL: (1) \textit{inefficient trajectory sampling} for tasks where the model possesses but inconsistently applies correct reasoning, which we tackle using self-distilled reasoning trajectories from the RL model itself; and (2) \textit{fundamental capability absence}, which we address by injecting expert-augmented knowledge for prompts where the model entirely fails. This strategic application of RL for incentivization followed by SFT for enhancement forms the core of Metis-RISE, leading to two versions of our MLLMs (7B and 72B parameters). Evaluations on the OpenCompass Multimodal Reasoning Leaderboard demonstrate that both models achieve state-of-the-art performance among similar-sized models, with the 72B version ranking fourth overall. Please refer to our project page for open-source information.
comment: Project Page: https://github.com/MM-Thinking/Metis-RISE
♻ ☆ Self-Regulated Neurogenesis for Online Data-Incremental Learning
Neural networks often struggle with catastrophic forgetting when learning sequences of tasks or data streams, unlike humans who can continuously learn and consolidate new concepts even in the absence of explicit cues. Online data-incremental learning seeks to emulate this capability by processing each sample only once, without having access to task or stream cues at any point in time since this is more realistic compared to offline setups, where all data from novel class(es) is assumed to be readily available. However, existing methods typically rely on storing the subsets of data in memory or expanding the initial model architecture, resulting in significant computational overhead. Drawing inspiration from 'self-regulated neurogenesis'-brain's mechanism for creating specialized regions or circuits for distinct functions-we propose a novel approach SERENA which encodes each concept in a specialized network path called 'concept cell', integrated into a single over-parameterized network. Once a concept is learned, its corresponding concept cell is frozen, effectively preventing the forgetting of previously acquired information. Furthermore, we introduce two new continual learning scenarios that more closely reflect real-world conditions, characterized by gradually changing sample sizes. Experimental results show that our method not only establishes new state-of-the-art results across ten benchmarks but also remarkably surpasses offline supervised batch learning performance. The code is available at https://github.com/muratonuryildirim/serena.
comment: Published at Conference on Lifelong Learning Agents (CoLLAs) 2025
♻ ☆ A Novel Federated Learning-Based IDS for Enhancing UAVs Privacy and Security
Unmanned aerial vehicles (UAVs) operating within Flying Ad-hoc Networks (FANETs) encounter security challenges due to the dynamic and distributed nature of these networks. Previous studies focused predominantly on centralized intrusion detection, assuming a central entity responsible for storing and analyzing data from all devices. However, these approaches face challenges including computation and storage costs, along with a single point of failure risk, threatening data privacy and availability. The widespread dispersion of data across interconnected devices underscores the need for decentralized approaches. This paper introduces the Federated Learning-based Intrusion Detection System (FL-IDS), addressing challenges encountered by centralized systems in FANETs. FL-IDS reduces computation and storage costs for both clients and the central server, which is crucial for resource-constrained UAVs. Operating in a decentralized manner, FL-IDS enables UAVs to collaboratively train a global intrusion detection model without sharing raw data, thus avoiding delay in decisions based on collected data, as is often the case with traditional methods. Experimental results demonstrate FL-IDS's competitive performance with Central IDS (C-IDS) while mitigating privacy concerns, with the Bias Towards Specific Clients (BTSC) method further enhancing FL-IDS performance even at lower attacker ratios. Comparative analysis with traditional intrusion detection methods, including Local IDS (L-IDS), sheds light on the strengths of FL-IDS. This study significantly contributes to UAV security by introducing a privacy-aware, decentralized intrusion detection approach tailored to UAV networks. Moreover, by introducing a realistic dataset for FANETs and federated learning, our approach differs from others lacking high dynamism and 3D node movements or accurate federated data federations.
comment: Published in Internet of Things, Volume 25, 2025, Article 101592
♻ ☆ Multi-convex Programming for Discrete Latent Factor Models Prototyping
Discrete latent factor models (DLFMs) are widely used in various domains such as machine learning, economics, neuroscience, psychology, etc. Currently, fitting a DLFM to some dataset relies on a customized solver for individual models, which requires lots of effort to implement and is limited to the targeted specific instance of DLFMs. In this paper, we propose a generic framework based on CVXPY, which allows users to specify and solve the fitting problem of a wide range of DLFMs, including both regression and classification models, within a very short script. Our framework is flexible and inherently supports the integration of regularization terms and constraints on the DLFM parameters and latent factors, such that the users can easily prototype the DLFM structure according to their dataset and application scenario. We introduce our open-source Python implementation and illustrate the framework in several examples.
♻ ☆ Solving Inverse Problem for Multi-armed Bandits via Convex Optimization
We consider the inverse problem of multi-armed bandits (IMAB) that are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition where our method provides global solution to the IMAB problem with certificate, as well as approximations to further save computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can achieve the performance of Monte Carlo methods within a significantly decreased running time. We provide the implementation of our method based on CVXPY, which allows straightforward application by users not well versed in convex optimization.
♻ ☆ Inverse Reinforcement Learning via Convex Optimization
We consider the inverse reinforcement learning (IRL) problem, where an unknown reward function of some Markov decision process is estimated based on observed expert demonstrations. In most existing approaches, IRL is formulated and solved as a nonconvex optimization problem, posing challenges in scenarios where robustness and reproducibility are critical. We discuss a convex formulation of the IRL problem (CIRL) initially proposed by Ng and Russel, and reformulate the problem such that the domain-specific language CVXPY can be applied directly to specify and solve the convex problem. We also extend the CIRL problem to scenarios where the expert policy is not given analytically but by trajectory as state-action pairs, which can be strongly inconsistent with optimality, by augmenting some of the constraints. Theoretical analysis and practical implementation for hyperparameter auto-selection are introduced. This note helps the users to easily apply CIRL for their problems, without background knowledge on convex optimization.
♻ ☆ SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations
The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which depend on simulation and backpropagation through approximate SDE solutions, which limit scalability. In this work, we propose SDE Matching, a new simulation-free method for training Latent SDEs. Inspired by modern Score- and Flow Matching algorithms for learning generative dynamics, we extend these ideas to the domain of stochastic dynamics for time series and sequence modeling, eliminating the need for costly numerical simulations. Our results demonstrate that SDE Matching achieves performance comparable to adjoint sensitivity methods while drastically reducing computational complexity.
♻ ☆ Sharp concentration of uniform generalization errors in binary linear classification
We examine the concentration of uniform generalization errors around their expectation in binary linear classification problems via an isoperimetric argument. In particular, we establish Poincar\'{e} and log-Sobolev inequalities for the joint distribution of the output labels and the label-weighted input vectors, which we apply to derive concentration bounds. The derived concentration bounds are sharp up to moderate multiplicative constants by those under well-balanced labels. In asymptotic analysis, we also show that almost sure convergence of uniform generalization errors to their expectation occurs in very broad settings, such as proportionally high-dimensional regimes. Using this convergence, we establish uniform laws of large numbers under dimension-free conditions.
comment: 26 pages, 1 figure; minor edits to improve readability
♻ ☆ SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
The modeling of industrial scenes is essential for simulations in industrial manufacturing. While large language models (LLMs) have shown significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge due to their demand for precise measurements and positioning, requiring complex planning over spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent for generating industrial scenes through C# code. SceneGenAgent ensures precise layout planning through a structured and calculable format, layout verification, and iterative refinement to meet the quantitative requirements of industrial scenarios. Experiment results demonstrate that LLMs powered by SceneGenAgent exceed their original performance, reaching up to 81.0% success rate in real-world industrial scene generation tasks and effectively meeting most scene generation requirements. To further enhance accessibility, we construct SceneInstruct, a dataset designed for fine-tuning open-source LLMs to integrate into SceneGenAgent. Experiments show that fine-tuning open-source LLMs on SceneInstruct yields significant performance improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Our code and data are available at https://github.com/THUDM/SceneGenAgent .
comment: Accepted to ACL 2025
♻ ☆ PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling
Large Language Models (LLMs) face significant challenges in edge deployment due to their massive parameter scale. Vector Quantization (VQ), a clustering-based quantization method, serves as a prevalent solution to this issue for its extremely low-bit (even at 2-bit) and considerable accuracy. Since a vector is a quantity in mathematics and physics that has both direction and magnitude, existing VQ works typically quantize them in a coupled manner. However, we find that direction exhibits significantly greater sensitivity to quantization compared to the magnitude. For instance, when separately clustering the directions and magnitudes of weight vectors in LLaMA-2-7B, the accuracy drop of zero-shot tasks are 46.5\% and 2.3\%, respectively. This gap even increases with the reduction of clustering centers. Further, Euclidean distance, a common metric to access vector similarities in current VQ works, places greater emphasis on reducing the magnitude error. This property is contrary to the above finding, unavoidably leading to larger quantization errors. To these ends, this paper proposes Polar Coordinate Decoupled Vector Quantization (PCDVQ), an effective and efficient VQ framework consisting of two key modules: 1) Polar Coordinate Decoupling (PCD), which transforms vectors into their polar coordinate representations and perform independent quantization of the direction and magnitude parameters.2) Distribution Aligned Codebook Construction (DACC), which optimizes the direction and magnitude codebooks in accordance with the source distribution. Experimental results show that PCDVQ outperforms baseline methods at 2-bit level by at least 1.5\% zero-shot accuracy, establishing a novel paradigm for accurate and highly compressed LLMs.
♻ ☆ Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems
In the realm of activity detection for massive machine-type communications, intelligent reflecting surfaces (IRS) have shown significant potential in enhancing coverage for devices lacking direct connections to the base station (BS). However, traditional activity detection methods are typically designed for a single type of channel model, which does not reflect the complexities of real-world scenarios, particularly in systems incorporating IRS. To address this challenge, this paper introduces a novel approach that combines model-driven deep unfolding with a mixture of experts (MoE) framework. By automatically selecting one of three expert designs and applying it to the unfolded projected gradient method, our approach eliminates the need for prior knowledge of channel types between devices and the BS. Simulation results demonstrate that the proposed MoE-augmented deep unfolding method surpasses the traditional covariance-based method and black-box neural network design, delivering superior detection performance under mixed channel fading conditions.
comment: 5 pages, 5 figures, Accepted in IEEE Wireless Communications Letters
♻ ☆ Efficient Image Generation with Variadic Attention Heads CVPR
While the integration of transformers in vision models have yielded significant improvements on vision tasks they still require significant amounts of computation for both training and inference. Restricted attention mechanisms significantly reduce these computational burdens but come at the cost of losing either global or local coherence. We propose a simple, yet powerful method to reduce these trade-offs: allow the attention heads of a single transformer to attend to multiple receptive fields. We demonstrate our method utilizing Neighborhood Attention (NA) and integrate it into a StyleGAN based architecture for image generation. With this work, dubbed StyleNAT, we are able to achieve a FID of 2.05 on FFHQ, a 6% improvement over StyleGAN-XL, while utilizing 28% fewer parameters and with 4$\times$ the throughput capacity. StyleNAT achieves the Pareto Frontier on FFHQ-256 and demonstrates powerful and efficient image generation on other datasets. Our code and model checkpoints are publicly available at: https://github.com/SHI-Labs/StyleNAT
comment: Published in eLVM @ CVPR (https://openaccess.thecvf.com/content/CVPR2025W/eLVM/html/Walton_Efficient_Image_Generation_with_Variadic_Attention_Heads_CVPRW_2025_paper) | Formerly named StyleNAT: Giving Each Head a New Perspective |
♻ ☆ Proximal Point Method for Online Saddle Point Problem
This paper focuses on the online saddle point problem, which involves a sequence of two-player time-varying convex-concave games. Considering the nonstationarity of the environment, we adopt the duality gap and the dynamic Nash equilibrium regret as performance metrics for algorithm design. We present three variants of the proximal point method: the Online Proximal Point Method (OPPM), the Optimistic OPPM (OptOPPM), and the OptOPPM with multiple predictors. Each algorithm guarantees upper bounds for both the duality gap and dynamic Nash equilibrium regret, achieving near-optimality when measured against the duality gap. Specifically, in certain benign environments, such as sequences of stationary payoff functions, these algorithms maintain a nearly constant metric bound. Experimental results further validate the effectiveness of these algorithms. Lastly, this paper discusses potential reliability concerns associated with using dynamic Nash equilibrium regret as a performance metric. The technical appendix and code can be found at https://github.com/qingxin6174/PPM-for-OSP.
♻ ☆ Review learning: Real world validation of privacy preserving continual learning across medical institutions
When a deep learning model is trained sequentially on different datasets, it often forgets the knowledge learned from previous data, a problem known as catastrophic forgetting. This damages the model's performance on diverse datasets, which is critical in privacy-preserving deep learning (PPDL) applications based on transfer learning (TL). To overcome this, we introduce "review learning" (RevL), a low cost continual learning algorithm for diagnosis prediction using electronic health records (EHR) within a PPDL framework. RevL generates data samples from the model which are used to review knowledge from previous datasets. Six simulated institutional experiments and one real-world experiment involving three medical institutions were conducted to validate RevL, using three binary classification EHR data. In the real-world experiment with data from 106,508 patients, the mean global area under the receiver operating curve was 0.710 for RevL and 0.655 for TL. These results demonstrate RevL's ability to retain previously learned knowledge and its effectiveness in real-world PPDL scenarios. Our work establishes a realistic pipeline for PPDL research based on model transfers across institutions and highlights the practicality of continual learning in real-world medical settings using private EHR data.
♻ ☆ Genetic Algorithm with Innovative Chromosome Patterns in the Breeding Process
This paper proposes Genetic Algorithm with Border Trades (GAB), a novel modification of the standard genetic algorithm that enhances exploration by incorporating new chromosome patterns in the breeding process. This approach significantly mitigates premature convergence and improves search diversity. Empirically, GAB achieves up to 8x higher fitness and 10x faster convergence on complex job scheduling problems compared to standard Genetic Algorithms, reaching average fitness scores of 888 versus 106 in under 20 seconds. On the classic Flip-Flop problem, GAB consistently finds optimal or near-optimal solutions in fewer generations, even as input sizes scale to thousands of bits. These results highlight GAB as a highly effective and computationally efficient alternative for solving large-scale combinatorial optimization problems.
♻ ☆ Pretrained Reversible Generation as Unsupervised Visual Representation Learning ICCV 2025
Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored. Previous approaches, such as generative classifiers, have not fully leveraged the capabilities of these models for discriminative tasks due to their intricate designs. We propose Pretrained Reversible Generation (PRG), which extracts unsupervised representations by reversing the generative process of a pretrained continuous generation model. PRG effectively reuses unsupervised generative models, leveraging their high capacity to serve as robust and generalizable feature extractors for downstream tasks. This framework enables the flexible selection of feature hierarchies tailored to specific downstream tasks. Our method consistently outperforms prior approaches across multiple benchmarks, achieving state-of-the-art performance among generative model based methods, including 78% top-1 accuracy on ImageNet at a resolution of 64*64. Extensive ablation studies, including out-of-distribution evaluations, further validate the effectiveness of our approach. Code is available at https://github.com/opendilab/PRG.
comment: Accepted by ICCV 2025
♻ ☆ Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity
The foundations of deep learning are supported by the seemingly opposing perspectives of approximation or learning theory. The former advocates for large/expressive models that need not generalize, while the latter considers classes that generalize but may be too small/constrained to be universal approximators. Motivated by real-world deep learning implementations that are both expressive and statistically reliable, we ask: "Is there a class of neural networks that is both large enough to be universal but structured enough to generalize?" This paper constructively provides a positive answer to this question by identifying a highly structured class of ReLU multilayer perceptions (MLPs), which are optimal function approximators and are statistically well-behaved. We show that any $(L,\alpha)$-H\"{o}lder function from $[0,1]^d$ to $[-n,n]$ can be approximated to a uniform $\mathcal{O}(1/n)$ error on $[0,1]^d$ with a sparsely connected ReLU MLP with the same H\"{o}lder exponent $\alpha$ and coefficient $L$, of width $\mathcal{O}(dn^{d/\alpha})$, depth $\mathcal{O}(\log(d))$, with $\mathcal{O}(dn^{d/\alpha})$ nonzero parameters, and whose weights and biases take values in $\{0,\pm 1/2\}$ except in the first and last layers which instead have magnitude at-most $n$. Further, our class of MLPs achieves a near-optimal sample complexity of $\mathcal{O}(\log(N)/\sqrt{N})$ when given $N$ i.i.d. normalized sub-Gaussian training samples. We achieve this through a new construction that perfectly fits together linear pieces using Kuhn triangulations, along with a new proof technique which shows that our construction preserves the regularity of not only the H\"{o}lder functions, but also any uniformly continuous function. Our results imply that neural networks can solve the McShane extension problem on suitable finite sets.
comment: 16 pages main body, 40 pages proofs, 10 figures, 1 table
♻ ☆ Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem
The computation of the dominant eigenvector of symmetric positive semidefinite matrices is a cornerstone operation in numerous optimization-driven applications. Traditional methods, typically based on the \textit{Quotient} formulation, often suffer from challenges related to computational efficiency and reliance on prior spectral knowledge. In this work, we leverage the alternative \textit{Difference} formulation to reinterpret the classical power method as a first-order optimization algorithm. This perspective allows for a novel convergence analysis and facilitates the development of accelerated variants with larger step-sizes, achieving faster convergence without additional computational cost. Building on this insight, we introduce a generalized family of Difference-based methods, with the power method as a special case. Within this family, we propose Split-Merge, an algorithm that attains accelerated convergence without requiring spectral knowledge and operates solely via matrix-vector products. Extensive experiments on both synthetic and real-world datasets demonstrate that Split-Merge consistently outperforms state-of-the-art methods in both efficiency and scalability. In particular, it achieves more than a $\boldsymbol{10\times}$ speedup over the classical power method, underscoring its practical effectiveness for large-scale problems.
♻ ☆ Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations ICCV
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. However, the wide range of tasks and high computational costs make full fine-tuning impractical. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. Despite the success of these methods, they are primarily designed for linear layers, focusing on two-dimensional matrices while largely ignoring higher-dimensional parameter spaces like convolutional kernels. Moreover, directly applying these methods to higher-dimensional parameter spaces often disrupts their structural relationships. Given the rapid advancements in matrix-based PEFT methods, rather than designing a specialized strategy, we propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties. Specifically, we treat parameters as elements of a Lie group, with updates modeled as perturbations in the corresponding Lie algebra. These perturbations are mapped back to the Lie group through the exponential map, ensuring smooth, consistent updates that preserve the inherent structure of the parameter space. Extensive experiments on computer vision and natural language processing validate the effectiveness and versatility of our approach, demonstrating clear improvements over existing methods.
comment: 2025 ICCV
♻ ☆ Explainable quantum regression algorithm with encoded data structure
Hybrid variational quantum algorithms (VQAs) are promising for solving practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers. However, with typical random ansatz or quantum alternating operator ansatz, derived variational quantum algorithms become a black box that cannot be trusted for model interpretation, not to mention deploying as applications in informing critical decisions: the results of these variational parameters are just rotational angles for the quantum gates and have nothing to do with interpretable values that a model can provide directly. In this paper, we construct the first interpretable quantum regression algorithm, in which the quantum state exactly encodes the classical data table and the variational parameters correspond directly to the regression coefficients, which are real numbers by construction, providing a high degree of model interpretability and minimal cost to optimize due to the right expressiveness. We also take advantage of the encoded data structure to reduce the time complexity of computing the regression map. To shorten the circuit depth for nonlinear regression, our algorithm can be extended by building nonlinear features by classical preprocessing as the independent encoded column vectors. Even though the realization of compressed encoding in superconducting qubits has been achieved by the less noisy compressed encoding recently by the authors, we envision potential quantum utilities with multi-qubit gates implemented in neutral cold atoms and ions.
Systems and Control 16
☆ Joint Scheduling of DER under Demand Charges: Structure and Approximation
We study the joint scheduling of behind-the-meter distributed energy resources (DERs), including flexible loads, renewable generation, and battery energy storage systems, under net energy metering frameworks with demand charges. The problem is formulated as a stochastic dynamic program aimed at maximizing expected operational surplus while accounting for renewable generation uncertainty. We analytically characterize the structure of the optimal control policy and show that it admits a threshold-based form. However, due to the strong temporal coupling of the storage and demand charge constraints, the number of conditional branches in the policy scales combinatorially with the scheduling horizon, as it requires a look-ahead over future states. To overcome the high computational complexity in the general formulation, an efficient approximation algorithm is proposed, which searches for the peak demand under a mildly relaxed problem. We show that the algorithm scales linearly with the scheduling horizon. Extensive simulations using two open-source datasets validate the proposed algorithm and compare its performance against different DER control strategies, including a reinforcement learning-based one. Under varying storage and tariff parameters, the results show that the proposed algorithm outperforms various benchmarks in achieving a relatively small solution gap compared to the theoretical upper bound.
comment: 15 pages, 4 tables, 4 figures
☆ Towards an Optimal Control Perspective of ResNet Training ICML 2025
We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden states corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamic biases the weights of the unnecessary deeper residual layers to vanish. This indicates the potential for a theory-grounded layer pruning strategy.
comment: Accepted for presentation at the High-dimensional Learning Dynamics (HiLD) workshop at ICML 2025
☆ Plasmonically Enhanced Flexural-Mode AlScN Nanoplate Resonator as Uncooled and Ultrafast IR Detector with High Responsivity
This letter introduces a novel class of miniaturized, uncooled, and ultra-fast infrared (IR) resonant thermal detectors (RTDs) based on 30%-doped Aluminum Scandium Nitride (AlScN) nanoplates. Exploiting high electromechanical coupling, good thermal properties, and enhanced and selective IR absorption, the presented device aims to demonstrate significant advancements over the state-of-the-art IR RTDs. This single pixel combines compact footprint, high spectral selectivity and responsivity, reduced noise, and fast thermal response, allowing for the potential development of innovative IR thermal imagers through multi-pixel integration. The flexural nature of the actuated resonance mode eventually enables an interferometric optical readout, paving the way towards achieving extremely low Noise Equivalent Power levels. These results demonstrate a high IR responsivity of around 130 ppt/pW, a thermal time constant of around 330 us, and a large out-of-plane displacement. This work represents the first experimental integration on a resonating platform of plasmonic absorbers that utilize AlScN as dielectric layer.
comment: This manuscript has been submitted to ACS Nano Letters for consideration
☆ Estimating Technical Loss without Power Flows: A Practical, Data-Driven Approach for Loss Estimation in Distribution Grids
Electric grids in low- and middle-income countries (LMICs) across the world face an acute challenge. To support global decarbonisation efforts and raise millions from energy poverty, these grids must shoulder substantial load growth while integrating distributed renewable generation. However, decades of rapid and poorly funded infrastructure expansions have led to national grids in many LMICs that are strained and weak, composed of aging, faulty, and undersized infrastructure. A cause and symptom of this weakness is excessive technical loss within the grid infrastructure during energy delivery, particularly at the distribution level; network losses are regularly estimated to be well over 20 percent, compared to a baseline of 5 percent in higher-income nations. Addressing technical loss through targeted interventions is essential for bolstering grids' physical and economic strength. Unfortunately, current approaches for estimating and localizing technical loss require expensive, extensive power flow sensing, which is essentially absent in LMIC distribution systems. We present a novel approach to technical loss estimation without power flows, which leverages more readily available voltage magnitude measurements at sparse locations in the grid. This estimator puts loss estimation and localization within reach for LMIC grids globally, and provides a critical tool for the effective design, implementation, and evaluation of loss-reduction interventions.
comment: 6 pages, 3 figures
☆ Coordinated Control of Autonomous Vehicles for Traffic Density Reduction at a Signalized Junction: An MPC Approach
The effective and safe management of traffic is a key issue due to the rapid advancement of the urban transportation system. Connected autonomous vehicles (CAVs) possess the capability to connect with each other and adjacent infrastructure, presenting novel opportunities for enhancing traffic flow and coordination. This work proposes a dual-mode model predictive control (MPC) architecture that tackles two interrelated issues: mitigating traffic density at signalized junctions and facilitating seamless, cooperative lane changes in high-density traffic conditions. The objective of this work is to facilitate responsive decision-making for CAVs, thereby enhancing the efficiency and safety of urban mobility. Moreover, we ensure recursive feasibility and convergence of the proposed MPC scheme by the integration of an online-calculated maximal control invariant terminal set. Finally, the efficacy of the proposed approach is validated through numerical simulation.
☆ Active Disturbance Rejection Control for Trajectory Tracking of a Seagoing USV: Design, Simulation, and Field Experiments IROS 2025
Unmanned Surface Vessels (USVs) face significant control challenges due to uncertain environmental disturbances like waves and currents. This paper proposes a trajectory tracking controller based on Active Disturbance Rejection Control (ADRC) implemented on the DUS V2500. A custom simulation incorporating realistic waves and current disturbances is developed to validate the controller's performance, supported by further validation through field tests in the harbour of Scheveningen, the Netherlands, and at sea. Simulation results demonstrate that ADRC significantly reduces cross-track error across all tested conditions compared to a baseline PID controller but increases control effort and energy consumption. Field trials confirm these findings while revealing a further increase in energy consumption during sea trials compared to the baseline.
comment: Accepted for presentation at IROS 2025. Submitted version
☆ Exact Time-Varying Turnpikes for Dynamic Operation of District Heating Networks
District heating networks (DHNs) are crucial for decarbonizing the heating sector. Yet, their efficient and reliable operation requires the coordination of multiple heat producers and the consideration of future demands. Predictive and optimization-based control is commonly used to address this task, but existing results for DHNs do not account for time-varying problem aspects. Since the turnpike phenomenon can serve as a basis for model predictive control design and analysis, this paper examines its role in DHN optimization by analyzing the underlying optimal control problem with time-varying prices and demands. That is, we derive conditions for the existence of a unique time-varying singular arc, which constitutes the time varying turnpike, and we provide its closed-form expression. Additionally, we present converse turnpike results showing a exact time-varying case implies strict dissipativity of the optimal control problem. A numerical example illustrates our findings.
☆ Estimation of superconducting cavity bandwidth and detuning using a Luenberger observer
Enabled by progress in superconducting technology, several continuous wave linear accelerators are foreseen in the next decade. For these machines, it is of crucial importance to track the main cavity parameters, such as the resonator bandwidth and detuning. The bandwidth yields information on the superconducting state of the cavity. The detuning should be minimized to limit the required power to operate the cavity. The estimation of these parameters is commonly implemented in the digital electronics of the Low-Level RF control system to minimize the computation delay. In this proceeding, we present a way to compute the bandwidth and detuning using a Luenberger observer. In contrast to previous methods, a state observer yields estimations at the native control system sample rate without explicitly filtering the input signals. Additionally, the error convergence properties of the estimations can be controlled intuitively by adjusting gain parameters. Implementation considerations and test results on the derived observer are presented in the manuscript.
comment: 10 pages, 4 figures, to be published in APS Physical Review - Accelerator and Beams
☆ Fault-Tolerant Spacecraft Attitude Determination using State Estimation Techniques
The extended and unscented Kalman filter, and the particle filter provide a robust framework for fault-tolerant attitude estimation on spacecraft. This paper explores how each filter performs for a large satellite in a low earth orbit. Additionally, various techniques, built on these filters, for fault detection, isolation and recovery from erroneous sensor measurements, are analyzed. Key results from this analysis include filter performance for various fault modes.
comment: 8 pages, 19 figures
☆ Optimal Parameter Design for Power Electronic Converters Using a Probabilistic Learning-Based Stochastic Surrogate Model
The selection of optimal design for power electronic converter parameters involves balancing efficiency and thermal constraints to ensure high performance without compromising safety. This paper introduces a probabilistic-learning-based stochastic surrogate modeling framework to address this challenge and significantly reduce the time required during the design phase. The approach begins with a neural network classifier that evaluates the feasibility of parameter configurations, effectively filtering out unsafe and/or impractical inputs. Subsequently, a probabilistic prediction model estimates the converter's efficiency and temperature while quantifying prediction uncertainty, providing both performance insights and reliability metrics. Finally, a heuristic optimization-based model is employed to optimize a multi-objective function that maximizes efficiency while adhering to thermal constraints. The optimization process incorporates penalty terms to discourage solutions that violate practical thresholds, ensuring actionable and realistic recommendations. An advanced heuristic optimization method is used to find the optimal solution and is compared with several well-known search algorithms, including Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Simulated Annealing (SA), Tabu-Search (TS), and Stochastic Hill Climbing (SHC). The results demonstrate significant improvements in predictive accuracy and optimization outcomes, offering a robust solution for advancing power electronics design.
♻ ☆ One Model to Forecast Them All and in Entity Distributions Bind Them
Probabilistic forecasting in power systems often involves multi-entity datasets like households, feeders, and wind turbines, where generating reliable entity-specific forecasts presents significant challenges. Traditional approaches require training individual models for each entity, making them inefficient and hard to scale. This study addresses this problem using GUIDE-VAE, a conditional variational autoencoder that allows entity-specific probabilistic forecasting using a single model. GUIDE-VAE provides flexible outputs, ranging from interpretable point estimates to full probability distributions, thanks to its advanced covariance composition structure. These distributions capture uncertainty and temporal dependencies, offering richer insights than traditional methods. To evaluate our GUIDE-VAE-based forecaster, we use household electricity consumption data as a case study due to its multi-entity and highly stochastic nature. Experimental results demonstrate that GUIDE-VAE outperforms conventional quantile regression techniques across key metrics while ensuring scalability and versatility. These features make GUIDE-VAE a powerful and generalizable tool for probabilistic forecasting tasks, with potential applications beyond household electricity consumption.
♻ ☆ Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application
In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed, to enhance the robustness against both channel noise and transmission data distribution shifts. A theoretical foundation is established using stochastic differential equations (SDEs), from which a closed-form mapping between any signal-to-noise ratio (SNR) and the optimal denoising timestep is derived. Moreover, to address distribution mismatch, a mathematical scaling method is introduced to align received semantic features with the training distribution of the GAI. Built on this theoretical foundation, a latent diffusion model (LDM)-based semantic communication framework is proposed that combines a variational autoencoder for semantic features extraction, where a pretrained diffusion model is used for denoising. The proposed system is a training-free framework that supports zero-shot generalization, and achieves superior performance under low-SNR and out-of-distribution conditions, offering a scalable and robust solution for future 6G semantic communication systems. Experimental results demonstrate that the proposed semantic communication framework achieves state-of-the-art performance in both pixel-level accuracy and semantic perceptual quality, consistently outperforming baselines across a wide range of SNRs and data distributions without any fine-tuning or post-training.
♻ ☆ Exploring the Effects of Load Altering Attacks on Load Frequency Control through Python and RTDS
The modern power grid increasingly depends on advanced information and communication technology (ICT) systems to enhance performance and reliability through real-time monitoring, intelligent control, and bidirectional communication. However, ICT integration also exposes the grid to cyber-threats. Load altering attacks (LAAs), which use botnets of high-wattage devices to manipulate load profiles, are a notable threat to grid stability. While previous research has examined LAAs, their specific impact on load frequency control (LFC), critical for maintaining nominal frequency during load fluctuations, still needs to be explored. Even minor frequency deviations can jeopardize grid operations. This study bridges the gap by analyzing LAA effects on LFC through simulations of static and dynamic scenarios using Python and RTDS. The results highlight LAA impacts on frequency stability and present an eigenvalue-based stability assessment for dynamic LAAs (DLAAs), identifying key parameters influencing grid resilience.
comment: 2025 IEEE Kiel PowerTech
♻ ☆ Towards Event-Triggered NMPC for Efficient 6G Communications: Experimental Results and Open Problems
Networked control systems enable real-time control and coordination of distributed systems, leveraging the low latency, high reliability, and massive connectivity offered by 5G and future 6G networks. Applications include autonomous vehicles, robotics, industrial automation, and smart grids. Despite networked control algorithms admitting nominal stability guarantees even in the presence of delays and packet dropouts, their practical performance still heavily depends on the specific characteristics and conditions of the underlying network. To achieve the desired performance while efficiently using communication resources, co-design of control and communication is pivotal. Although periodic schemes, where communication instances are fixed, can provide reliable control performance, unnecessary transmissions, when updates are not needed, result in inefficient usage of network resources. In this paper, we investigate the potential for co-design of model predictive control and network communication. To this end, we design and implement an event-triggered nonlinear model predictive controller for stabilizing a Furuta pendulum communicating over a tailored open radio access network 6G research platform. We analyze the control performance as well as network utilization under varying channel conditions and event-triggering criteria. Additionally, we analyze the network-induced delay pattern and its interaction with the event-triggered controller. Our results show that the event-triggered control scheme achieves similar performance to periodic control with reduced communication demand.
comment: Accepted for presentation at IEEE ICCA 2025
♻ ☆ LoRaConnect: Unlocking HTTP Potential on LoRa Backbones for Remote Areas and Ad-Hoc Networks
Minimal infrastructure requirements make LoRa suitable for service delivery in remote areas. Additionally, web applications have become a de-facto standard for modern service delivery. However, Long Range (LoRa) fails to enable HTTP access due to its limited bandwidth, payload size limitations, and high collisions in multi-user setups. We propose LoRaConnect to enable HTTP access over LoRa. The LoRaWeb hardware tethers a WiFi hotspot to which client devices connect and access HTTP resources over LoRa backhaul. It implements caching and synchronization mechanisms to address LoRa's aforementioned limitations. It also implements a message-slicing method in the application layer to overcome LoRa's payload limitations. We evaluate the proposed system using actual hardware in three experimental setups to assess the baseline performance, ideal scenario, and practical application scenario with Frequency Hopping Spread Spectrum (FHSS). Additionally, it implements a ping operation to demonstrate Internet capability and extensible nature. LoRaWeb achieves an average throughput of 1.18 KB/S approximately, with an access delay of only 1.3 S approximately for a 1.5KB webpage in the baseline setup. Moreover, it achieves an access delay of approximately 6.7 S for a 10KB webpage in the ideal case and an average end-to-end delay of only 612 ms approximately in the FHSS-based setup. Comparison with benchmark suggests multi-fold improvement.
comment: 10 pages
♻ ☆ A Review of Safe Reinforcement Learning Methods for Modern Power Systems
Given the availability of more comprehensive measurement data in modern power systems, reinforcement learning (RL) has gained significant interest in operation and control. Conventional RL relies on trial-and-error interactions with the environment and reward feedback, which often leads to exploring unsafe operating regions and executing unsafe actions, especially when deployed in real-world power systems. To address these challenges, safe RL has been proposed to optimize operational objectives while ensuring safety constraints are met, keeping actions and states within safe regions throughout both training and deployment. Rather than relying solely on manually designed penalty terms for unsafe actions, as is common in conventional RL, safe RL methods reviewed here primarily leverage advanced and proactive mechanisms. These include techniques such as Lagrangian relaxation, safety layers, and theoretical guarantees like Lyapunov functions to rigorously enforce safety boundaries. This paper provides a comprehensive review of safe RL methods and their applications across various power system operations and control domains, including security control, real-time operation, operational planning, and emerging areas. It summarizes existing safe RL techniques, evaluates their performance, analyzes suitable deployment scenarios, and examines algorithm benchmarks and application environments. The paper also highlights real-world implementation cases and identifies critical challenges such as scalability in large-scale systems and robustness under uncertainty, providing potential solutions and outlining future directions to advance the reliable integration and deployment of safe RL in modern power systems.
Optimization and Control 26
☆ An equation-based batch distillation simulation to evaluate the effect of multiplicities in thermodynamic activity coefficients
In this paper, we investigate the influence of multiplicities in activity coefficients on batch distillation processes. In order to do so, we develop a rigorous simulation of batch distillation processes based on the MESH equations. In particular, we propose a novel index reduction method to transform the original index-$2$ system into a well-posed differential-algebraic system of index $1$. With the help of this simulation, we then explore whether the alternative NRTL parameters, which yield indistinguishable activity coefficients and VLE diagrams when compared to the reference data, can produce distinguishable profiles in dynamic simulations. As it turns out, this can happen in general and we explain the reasons behind the dynamic distinguishability.
comment: 56 pages, 27 figures
☆ Linear and Second-order-cone Valid Inequalities for Problems with Storage
Batteries are playing an increasingly central role as distributed energy resources in the shift toward power systems dominated by renewable energy sources. However, existing battery models must invariably rely on complementarity constraints to prevent simultaneous charging and discharging, rendering models of a disjunctive nature and NP-hard. In this paper, we analyze the disjunctive structure of the battery's feasible operational set and uncover a submodularity property in its extreme power trajectories. Leveraging this structure, we propose a systematic approach to derive linear valid inequalities that define facets of the convex hull of the battery's feasible operational set, including a distinguished family that generalizes and dominates existing formulations in the literature. To evaluate the practical utility of these inequalities, we conduct computational experiments on two representative problems whose continuous relaxations frequently result in simultaneous charge and discharge: energy arbitrage under negative prices and set-point tracking. For the latter, we further introduce second-order cone inequalities that remarkably and efficiently reduce simultaneous charging and discharging. Our results highlight the potential of structured relaxations to enhance the tractability and fidelity of battery models in optimization-based energy system tools.
☆ Towards an Optimal Control Perspective of ResNet Training ICML 2025
We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden states corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamic biases the weights of the unnecessary deeper residual layers to vanish. This indicates the potential for a theory-grounded layer pruning strategy.
comment: Accepted for presentation at the High-dimensional Learning Dynamics (HiLD) workshop at ICML 2025
☆ Proximal aiming in weak KAM theory with nonsmooth Lagrangian
This work extends weak KAM theory to the case of a nonsmooth Lagrangian satisfying a superlinear growth condition. Using the solution of a weak KAM equation that is a stationary Hamilton-Jacobi equation and the proximal aiming method, we construct a family of discontinuous feedback strategies that are nearly optimal for every time interval. This result leads to an analogue of the weak KAM theorem. Additionally, as in classical weak KAM theory, we demonstrate that the effective Hamiltonian (Ma\~{n}\'{e} critical value) can be determined by solving a linear programming problem in the class of probability measures.
comment: 21 pages
☆ Exact Time-Varying Turnpikes for Dynamic Operation of District Heating Networks
District heating networks (DHNs) are crucial for decarbonizing the heating sector. Yet, their efficient and reliable operation requires the coordination of multiple heat producers and the consideration of future demands. Predictive and optimization-based control is commonly used to address this task, but existing results for DHNs do not account for time-varying problem aspects. Since the turnpike phenomenon can serve as a basis for model predictive control design and analysis, this paper examines its role in DHN optimization by analyzing the underlying optimal control problem with time-varying prices and demands. That is, we derive conditions for the existence of a unique time-varying singular arc, which constitutes the time varying turnpike, and we provide its closed-form expression. Additionally, we present converse turnpike results showing a exact time-varying case implies strict dissipativity of the optimal control problem. A numerical example illustrates our findings.
☆ Block Coordinate Descent Network Simplex for Optimal Transport
We propose the Block Coordinate Descent Network Simplex (BCDNS) method for solving large-scale discrete Optimal Transport (OT) problems. BCDNS integrates the Network Simplex (NS) algorithm with a block coordinate descent (BCD) strategy, decomposing the full problem into smaller subproblems per iteration and reusing basis variables to ensure feasibility. We prove that BCDNS terminates in a finite number of iterations with an exact optimal solution, and we characterize its per-iteration complexity as O(s N), where s is a user-defined parameter in (0,1) and N is the total number of variables. Numerical experiments demonstrate that BCDNS matches the classical NS method in solution accuracy, reduces memory footprint compared to the Sinkhorn algorithm, achieves speed-ups of up to tens of times over the classical NS method, and exhibits runtime comparable to a high-precision Sinkhorn implementation.
☆ Quantum Adaptive Search: A Hybrid Quantum-Classical Algorithm for Global Optimization of Multivariate Functions
This work presents Quantum Adaptive Search (QAGS), a hybrid quantum-classical algorithm for the global optimization of multivariate functions. The method employs an adaptive mechanism that dynamically narrows the search space based on a quantum-estimated probability distribution of the objective function. A quantum state encodes information about solution quality through an appropriate complex amplitude mapping, enabling the identification of the most promising regions, and thus progressively tightening the search bounds; then a classical optimizer performs local refinement of the solution. The analysis demonstrates that QAGS ensures a contraction of the search space toward global optima, with controlled computational complexity. The numerical results on the benchmark functions show that, compared to the classical methods, QAGS achieves higher accuracy while offering advantages in both time and space complexity.
☆ Faster Fixed-Point Methods for Multichain MDPs
We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to all average-reward problems posed by the lack of contractivity and non-uniqueness of solutions to the Bellman operator, in the multichain setting an optimal policy must solve the navigation subproblem of steering towards the best connected component, in addition to optimizing long-run performance within each component. We develop algorithms which better solve this navigational subproblem in order to achieve faster convergence for multichain MDPs, obtaining improved rates of convergence and sharper measures of complexity relative to prior work. Many key components of our results are of potential independent interest, including novel connections between average-reward and discounted problems, optimal fixed-point methods for discounted VI which extend to general Banach spaces, new sublinear convergence rates for the discounted value error, and refined suboptimality decompositions for multichain MDPs. Overall our results yield faster convergence rates for discounted and average-reward problems and expand the theoretical foundations of VI approaches.
☆ Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL
We study offline reinforcement learning in average-reward MDPs, which presents increased challenges from the perspectives of distribution shift and non-uniform coverage, and has been relatively underexamined from a theoretical perspective. While previous work obtains performance guarantees under single-policy data coverage assumptions, such guarantees utilize additional complexity measures which are uniform over all policies, such as the uniform mixing time. We develop sharp guarantees depending only on the target policy, specifically the bias span and a novel policy hitting radius, yielding the first fully single-policy sample complexity bound for average-reward offline RL. We are also the first to handle general weakly communicating MDPs, contrasting restrictive structural assumptions made in prior work. To achieve this, we introduce an algorithm based on pessimistic discounted value iteration enhanced by a novel quantile clipping technique, which enables the use of a sharper empirical-span-based penalty function. Our algorithm also does not require any prior parameter knowledge for its implementation. Remarkably, we show via hard examples that learning under our conditions requires coverage assumptions beyond the stationary distribution of the target policy, distinguishing single-policy complexity measures from previously examined cases. We also develop lower bounds nearly matching our main result.
♻ ☆ Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
We study the optimization of non-convex functions that are not necessarily smooth (gradient and/or Hessian are Lipschitz) using first order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice, motivating significant recent work on finding first order stationary points of functions satisfying generalizations of smoothness with first order methods. We develop a novel framework that lets us systematically study the convergence of a large class of first-order optimization algorithms (which we call decrease procedures) under generalizations of smoothness. We instantiate our framework to analyze the convergence of first order optimization algorithms to first and \textit{second} order stationary points under generalizations of smoothness. As a consequence, we establish the first convergence guarantees for first order methods to second order stationary points under generalizations of smoothness. We demonstrate that several canonical examples fall under our framework, and highlight practical implications.
♻ ☆ Improving Stochastic Cubic Newton with Momentum
We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.
♻ ☆ Extended mean field control: a global numerical solution via finite-dimensional approximation
We present a finite-dimensional numerical approximation for a class of extended mean field control problems. Our algorithm learns the value function on the whole Wasserstein domain, as opposed to a fixed initial condition. We leverage the approximation of the mean field problem by a finite-player cooperative optimisation problem, due to the propagation of chaos, together with the usage of finite-dimensional solvers. This avoids the need to directly approximate functions on an infinite-dimensional domain, and allows for more efficient memory usage and faster computation.
♻ ☆ Stochastic homogenisation of nonlinear minimum-cost flow problems
This paper deals with the large-scale behaviour of nonlinear minimum-cost flow problems on random graphs. In such problems, a random nonlinear cost functional is minimised among all flows (discrete vector-fields) with a prescribed net flux through each vertex. On a stationary random graph embedded in $\mathbb{R}^d$, our main result asserts that these problems converge, in the large-scale limit, to a continuous minimisation problem where an effective cost functional is minimised among all vector fields with prescribed divergence. Our main result is formulated using $\Gamma$-convergence and applies to multi-species problems. The proof employs the blow-up technique by Fonseca and M\"uller in a discrete setting. One of the main challenges overcome is the construction of the homogenised energy density on random graphs without a periodic structure.
♻ ☆ Quantum optimal transport with convex regularization
The goal of this paper is to settle the study of non-commutative optimal transport problems with convex regularization, in their static and finite-dimensional formulations. We consider both the balanced and unbalanced problem and show in both cases a duality result, characterizations of minimizers (for the primal) and maximizers (for the dual). An important tool we define is a non-commutative version of the classical $(c,\psi)$-transforms associated with a general convex regularization, which we employ to prove the convergence of Sinkhorn iterations in the balanced case. Finally, we show the convergence of the unbalanced transport problems towards the balanced one, as well as the convergence of transforms, as the marginal penalization parameters go to $+\infty$.
♻ ☆ Towards Event-Triggered NMPC for Efficient 6G Communications: Experimental Results and Open Problems
Networked control systems enable real-time control and coordination of distributed systems, leveraging the low latency, high reliability, and massive connectivity offered by 5G and future 6G networks. Applications include autonomous vehicles, robotics, industrial automation, and smart grids. Despite networked control algorithms admitting nominal stability guarantees even in the presence of delays and packet dropouts, their practical performance still heavily depends on the specific characteristics and conditions of the underlying network. To achieve the desired performance while efficiently using communication resources, co-design of control and communication is pivotal. Although periodic schemes, where communication instances are fixed, can provide reliable control performance, unnecessary transmissions, when updates are not needed, result in inefficient usage of network resources. In this paper, we investigate the potential for co-design of model predictive control and network communication. To this end, we design and implement an event-triggered nonlinear model predictive controller for stabilizing a Furuta pendulum communicating over a tailored open radio access network 6G research platform. We analyze the control performance as well as network utilization under varying channel conditions and event-triggering criteria. Additionally, we analyze the network-induced delay pattern and its interaction with the event-triggered controller. Our results show that the event-triggered control scheme achieves similar performance to periodic control with reduced communication demand.
comment: Accepted for presentation at IEEE ICCA 2025
♻ ☆ Fast convergence of a primal-dual dynamical system with implicit Hessian damping and Tikhonov regularization
This paper proposes two primal-dual dynamical systems for solving linear equality constrained convex optimization problems: one with implicit Hessian damping only, and the other further incorporating Tikhonov regularization. We analyze the fast convergence properties of both dynamical systems and show that they achieve the same convergence rates. Moreover, we show that the trajectory generated by the dynamical system with Tikhonov regularization converges strongly to the minimum-norm solution of the underlying problem. Finally, numerical experiments are conducted to validate the theoretical findings. Interestingly, the trajectories exhibit smooth behavior even when the objective function is only continuously differentiable.
comment: 25 pages, 9 figures
♻ ☆ Parallel Polyhedral Projection Method for the Convex Feasibility Problem
In this paper, we introduce and study the Parallel Polyhedral Projection Method (3PM) and the Approximate Parallel Polyhedral Projection Method (A3PM) for finding a point in the intersection of finitely many closed convex sets. Each iteration has two phases: parallel projections onto the target sets (exact in 3PM, approximate in A3PM), followed by an exact or approximate projection onto a polyhedron defined by supporting half-spaces. These strategies appear novel, as existing methods largely focus on parallel schemes like Cimmino's method. Numerical experiments demonstrate that A3PM often outperforms both classical and recent projection-based methods when the number of sets is greater than two. Theoretically, we establish global convergence for both 3PM and A3PM without regularity assumptions. Under a Slater condition or error bound, we prove linear convergence, even with inexact projections. Additionally, we show that 3PM achieves superlinear convergence under suitable geometric assumptions.
♻ ☆ Lagrangian Index Policy for Restless Bandits with Average Reward
We study the Lagrange Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrange index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give a new proof of asymptotic optimality in case of homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti's theorem.
♻ ☆ On the Characteristics of the Conjugate Function Enabling Effective Dual Decomposition Methods
We investigate a novel characteristic of the conjugate function associated to a generic convex optimization problem, which can subsequently be leveraged for efficient dual decomposition methods. In particular, under mild assumptions, we show that there is a specific region in the domain of the conjugate function such that for any point in the region, there is always a ray originating from that point along which the gradients of the conjugate remain constant. We refer to this characteristic as a fixed gradient over rays (FGOR). We further show that this characteristic is inherited by the corresponding dual function. Then we provide a thorough exposition of the application of the FGOR characteristic to dual subgradient methods. More importantly, we leverage FGOR to devise a simple stepsize rule that can be prepended with state-of-the-art stepsize methods enabling them to be more efficient. Furthermore, we investigate how the FGOR characteristic is used when solving the global consensus problem, a prevalent formulation in diverse application domains. We show that FGOR can be exploited not only to expedite the convergence of the dual decomposition methods but also to reduce the communication overhead. Numerical experiments using quadratic objectives and a regularized least squares regression with a real dataset are conducted. The results show that FGOR can significantly improve the performance of existing stepsize methods and outperform the state-of-the-art splitting methods on average in terms of both convergence behavior and communication efficiency.
comment: This work has been submitted to the IEEE for possible publication
♻ ☆ Multi-convex Programming for Discrete Latent Factor Models Prototyping
Discrete latent factor models (DLFMs) are widely used in various domains such as machine learning, economics, neuroscience, psychology, etc. Currently, fitting a DLFM to some dataset relies on a customized solver for individual models, which requires lots of effort to implement and is limited to the targeted specific instance of DLFMs. In this paper, we propose a generic framework based on CVXPY, which allows users to specify and solve the fitting problem of a wide range of DLFMs, including both regression and classification models, within a very short script. Our framework is flexible and inherently supports the integration of regularization terms and constraints on the DLFM parameters and latent factors, such that the users can easily prototype the DLFM structure according to their dataset and application scenario. We introduce our open-source Python implementation and illustrate the framework in several examples.
♻ ☆ Solving Inverse Problem for Multi-armed Bandits via Convex Optimization
We consider the inverse problem of multi-armed bandits (IMAB) that are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition where our method provides global solution to the IMAB problem with certificate, as well as approximations to further save computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can achieve the performance of Monte Carlo methods within a significantly decreased running time. We provide the implementation of our method based on CVXPY, which allows straightforward application by users not well versed in convex optimization.
♻ ☆ Inverse Reinforcement Learning via Convex Optimization
We consider the inverse reinforcement learning (IRL) problem, where an unknown reward function of some Markov decision process is estimated based on observed expert demonstrations. In most existing approaches, IRL is formulated and solved as a nonconvex optimization problem, posing challenges in scenarios where robustness and reproducibility are critical. We discuss a convex formulation of the IRL problem (CIRL) initially proposed by Ng and Russel, and reformulate the problem such that the domain-specific language CVXPY can be applied directly to specify and solve the convex problem. We also extend the CIRL problem to scenarios where the expert policy is not given analytically but by trajectory as state-action pairs, which can be strongly inconsistent with optimality, by augmenting some of the constraints. Theoretical analysis and practical implementation for hyperparameter auto-selection are introduced. This note helps the users to easily apply CIRL for their problems, without background knowledge on convex optimization.
♻ ☆ Proximal Point Method for Online Saddle Point Problem
This paper focuses on the online saddle point problem, which involves a sequence of two-player time-varying convex-concave games. Considering the nonstationarity of the environment, we adopt the duality gap and the dynamic Nash equilibrium regret as performance metrics for algorithm design. We present three variants of the proximal point method: the Online Proximal Point Method (OPPM), the Optimistic OPPM (OptOPPM), and the OptOPPM with multiple predictors. Each algorithm guarantees upper bounds for both the duality gap and dynamic Nash equilibrium regret, achieving near-optimality when measured against the duality gap. Specifically, in certain benign environments, such as sequences of stationary payoff functions, these algorithms maintain a nearly constant metric bound. Experimental results further validate the effectiveness of these algorithms. Lastly, this paper discusses potential reliability concerns associated with using dynamic Nash equilibrium regret as a performance metric. The technical appendix and code can be found at https://github.com/qingxin6174/PPM-for-OSP.
♻ ☆ Split-Merge: A Difference-based Approach for Dominant Eigenvalue Problem
The computation of the dominant eigenvector of symmetric positive semidefinite matrices is a cornerstone operation in numerous optimization-driven applications. Traditional methods, typically based on the \textit{Quotient} formulation, often suffer from challenges related to computational efficiency and reliance on prior spectral knowledge. In this work, we leverage the alternative \textit{Difference} formulation to reinterpret the classical power method as a first-order optimization algorithm. This perspective allows for a novel convergence analysis and facilitates the development of accelerated variants with larger step-sizes, achieving faster convergence without additional computational cost. Building on this insight, we introduce a generalized family of Difference-based methods, with the power method as a special case. Within this family, we propose Split-Merge, an algorithm that attains accelerated convergence without requiring spectral knowledge and operates solely via matrix-vector products. Extensive experiments on both synthetic and real-world datasets demonstrate that Split-Merge consistently outperforms state-of-the-art methods in both efficiency and scalability. In particular, it achieves more than a $\boldsymbol{10\times}$ speedup over the classical power method, underscoring its practical effectiveness for large-scale problems.
♻ ☆ On the equivalence between functionally affine LPV state-space representations and LFT models
We propose a transformation algorithm for a class of Linear Parameter-Varying (LPV) systems with functional affine dependence on parameters, where the system matrices depend affinely on nonlinear functions of the scheduling varable, into Linear Fractional Transformation (LFT) systems. The transformation preserves input-output behavior and minimality, and the uncertainity block of the resulting LFT system is linear in the scheduling variables of the LPV system.
♻ ☆ Cornell University Uses Integer Programming to Optimize Final Exam Scheduling
This paper presents an integer programming-based optimization framework designed to effectively address the complex final exam scheduling challenges encountered at Cornell University. With high flexibility, the framework is specifically tailored to accommodate a variety of different constraints, including the front-loading of large courses and the exclusion of specific time slots during the exam period. By generating multiple scheduling model variants and incorporating heuristic approaches, our framework enables comprehensive comparisons of different schedules. This empowers the University Registrar to make informed decisions, considering trade-offs in terms of schedule comfort measured by different levels of exam conflicts. Our results demonstrate significant advantage over the historical lecture time-based approach, providing time and effort savings for the university administration while enhancing student and faculty satisfaction.
Robotics 46
☆ Model-Based Real-Time Pose and Sag Estimation of Overhead Power Lines Using LiDAR for Drone Inspection
Drones can inspect overhead power lines while they remain energized, significantly simplifying the inspection process. However, localizing a drone relative to all conductors using an onboard LiDAR sensor presents several challenges: (1) conductors provide minimal surface for LiDAR beams limiting the number of conductor points in a scan, (2) not all conductors are consistently detected, and (3) distinguishing LiDAR points corresponding to conductors from other objects, such as trees and pylons, is difficult. This paper proposes an estimation approach that minimizes the error between LiDAR measurements and a single geometric model representing the entire conductor array, rather than tracking individual conductors separately. Experimental results, using data from a power line drone inspection, demonstrate that this method achieves accurate tracking, with a solver converging under 50 ms per frame, even in the presence of partial observations, noise, and outliers. A sensitivity analysis shows that the estimation approach can tolerate up to twice as many outlier points as valid conductors measurements.
comment: Submitted to IEEE case 2025
☆ Online Planning for Cooperative Air-Ground Robot Systems with Unknown Fuel Requirements RSS
We consider an online variant of the fuel-constrained UAV routing problem with a ground-based mobile refueling station (FCURP-MRS), where targets incur unknown fuel costs. We develop a two-phase solution: an offline heuristic-based planner computes initial UAV and UGV paths, and a novel online planning algorithm that dynamically adjusts rendezvous points based on real-time fuel consumption during target processing. Preliminary Gazebo simulations demonstrate the feasibility of our approach in maintaining UAV-UGV path validity, ensuring mission completion. Link to video: https://youtu.be/EmpVj-fjqNY
comment: Submitted to RSS (MRS Workshop)
☆ IMA-Catcher: An IMpact-Aware Nonprehensile Catching Framework based on Combined Optimization and Learning
Robotic catching of flying objects typically generates high impact forces that might lead to task failure and potential hardware damages. This is accentuated when the object mass to robot payload ratio increases, given the strong inertial components characterizing this task. This paper aims to address this problem by proposing an implicitly impact-aware framework that accomplishes the catching task in both pre- and post-catching phases. In the first phase, a motion planner generates optimal trajectories that minimize catching forces, while in the second, the object's energy is dissipated smoothly, minimizing bouncing. In particular, in the pre-catching phase, a real-time optimal planner is responsible for generating trajectories of the end-effector that minimize the velocity difference between the robot and the object to reduce impact forces during catching. In the post-catching phase, the robot's position, velocity, and stiffness trajectories are generated based on human demonstrations when catching a series of free-falling objects with unknown masses. A hierarchical quadratic programming-based controller is used to enforce the robot's constraints (i.e., joint and torque limits) and create a stack of tasks that minimizes the reflected mass at the end-effector as a secondary objective. The initial experiments isolate the problem along one dimension to accurately study the effects of each contribution on the metrics proposed. We show how the same task, without velocity matching, would be infeasible due to excessive joint torques resulting from the impact. The addition of reflected mass minimization is then investigated, and the catching height is increased to evaluate the method's robustness. Finally, the setup is extended to catching along multiple Cartesian axes, to prove its generalization in space.
comment: 25 pages, 17 figures, accepted by International Journal of Robotics Research (IJRR)
☆ How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction?
Gestures enable non-verbal human-robot communication, especially in noisy environments like agile production. Traditional deep learning-based gesture recognition relies on task-specific architectures using images, videos, or skeletal pose estimates as input. Meanwhile, Vision Foundation Models (VFMs) and Vision Language Models (VLMs) with their strong generalization abilities offer potential to reduce system complexity by replacing dedicated task-specific modules. This study investigates adapting such models for dynamic, full-body gesture recognition, comparing V-JEPA (a state-of-the-art VFM), Gemini Flash 2.0 (a multimodal VLM), and HD-GCN (a top-performing skeleton-based approach). We introduce NUGGET, a dataset tailored for human-robot communication in intralogistics environments, to evaluate the different gesture recognition approaches. In our experiments, HD-GCN achieves best performance, but V-JEPA comes close with a simple, task-specific classification head - thus paving a possible way towards reducing system complexity, by using it as a shared multi-task model. In contrast, Gemini struggles to differentiate gestures based solely on textual descriptions in the zero-shot setting, highlighting the need of further research on suitable input representations for gestures.
☆ ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations
Vision and touch are two fundamental sensory modalities for robots, offering complementary information that enhances perception and manipulation tasks. Previous research has attempted to jointly learn visual-tactile representations to extract more meaningful information. However, these approaches often rely on direct combination, such as feature addition and concatenation, for modality fusion, which tend to result in poor feature integration. In this paper, we propose ConViTac, a visual-tactile representation learning network designed to enhance the alignment of features during fusion using contrastive representations. Our key contribution is a Contrastive Embedding Conditioning (CEC) mechanism that leverages a contrastive encoder pretrained through self-supervised contrastive learning to project visual and tactile inputs into unified latent embeddings. These embeddings are used to couple visual-tactile feature fusion through cross-modal attention, aiming at aligning the unified representations and enhancing performance on downstream tasks. We conduct extensive experiments to demonstrate the superiority of ConViTac in real world over current state-of-the-art methods and the effectiveness of our proposed CEC mechanism, which improves accuracy by up to 12.0% in material classification and grasping prediction tasks.
☆ DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy
We propose DemoDiffusion, a simple and scalable method for enabling robots to perform manipulation tasks in natural environments by imitating a single human demonstration. Our approach is based on two key insights. First, the hand motion in a human demonstration provides a useful prior for the robot's end-effector trajectory, which we can convert into a rough open-loop robot motion trajectory via kinematic retargeting. Second, while this retargeted motion captures the overall structure of the task, it may not align well with plausible robot actions in-context. To address this, we leverage a pre-trained generalist diffusion policy to modify the trajectory, ensuring it both follows the human motion and remains within the distribution of plausible robot actions. Our approach avoids the need for online reinforcement learning or paired human-robot data, enabling robust adaptation to new tasks and scenes with minimal manual effort. Experiments in both simulation and real-world settings show that DemoDiffusion outperforms both the base policy and the retargeted trajectory, enabling the robot to succeed even on tasks where the pre-trained generalist policy fails entirely. Project page: https://demodiffusion.github.io/
comment: Preprint(17 pages). Under Review
☆ A Computationally Aware Multi Objective Framework for Camera LiDAR Calibration
Accurate extrinsic calibration between LiDAR and camera sensors is important for reliable perception in autonomous systems. In this paper, we present a novel multi-objective optimization framework that jointly minimizes the geometric alignment error and computational cost associated with camera-LiDAR calibration. We optimize two objectives: (1) error between projected LiDAR points and ground-truth image edges, and (2) a composite metric for computational cost reflecting runtime and resource usage. Using the NSGA-II \cite{deb2002nsga2} evolutionary algorithm, we explore the parameter space defined by 6-DoF transformations and point sampling rates, yielding a well-characterized Pareto frontier that exposes trade-offs between calibration fidelity and resource efficiency. Evaluations are conducted on the KITTI dataset using its ground-truth extrinsic parameters for validation, with results verified through both multi-objective and constrained single-objective baselines. Compared to existing gradient-based and learned calibration methods, our approach demonstrates interpretable, tunable performance with lower deployment overhead. Pareto-optimal configurations are further analyzed for parameter sensitivity and innovation insights. A preference-based decision-making strategy selects solutions from the Pareto knee region to suit the constraints of the embedded system. The robustness of calibration is tested across variable edge-intensity weighting schemes, highlighting optimal balance points. Although real-time deployment on embedded platforms is deferred to future work, this framework establishes a scalable and transparent method for calibration under realistic misalignment and resource-limited conditions, critical for long-term autonomy, particularly in SAE L3+ vehicles receiving OTA updates.
comment: 16 pages, 10 figures
☆ Task Allocation of UAVs for Monitoring Missions via Hardware-in-the-Loop Simulation and Experimental Validation
This study addresses the optimisation of task allocation for Unmanned Aerial Vehicles (UAVs) within industrial monitoring missions. The proposed methodology integrates a Genetic Algorithms (GA) with a 2-Opt local search technique to obtain a high-quality solution. Our approach was experimentally validated in an industrial zone to demonstrate its efficacy in real-world scenarios. Also, a Hardware-in-the-loop (HIL) simulator for the UAVs team is introduced. Moreover, insights about the correlation between the theoretical cost function and the actual battery consumption and time of flight are deeply analysed. Results show that the considered costs for the optimisation part of the problem closely correlate with real-world data, confirming the practicality of the proposed approach.
☆ Learning-Based Distance Estimation for 360° Single-Sensor Setups
Accurate distance estimation is a fundamental challenge in robotic perception, particularly in omnidirectional imaging, where traditional geometric methods struggle with lens distortions and environmental variability. In this work, we propose a neural network-based approach for monocular distance estimation using a single 360{\deg} fisheye lens camera. Unlike classical trigonometric techniques that rely on precise lens calibration, our method directly learns and infers the distance of objects from raw omnidirectional inputs, offering greater robustness and adaptability across diverse conditions. We evaluate our approach on three 360{\deg} datasets (LOAF, ULM360, and a newly captured dataset Boat360), each representing distinct environmental and sensor setups. Our experimental results demonstrate that the proposed learning-based model outperforms traditional geometry-based methods and other learning baselines in both accuracy and robustness. These findings highlight the potential of deep learning for real-time omnidirectional distance estimation, making our approach particularly well-suited for low-cost applications in robotics, autonomous navigation, and surveillance.
comment: Submitted to ECMR 2025
☆ Communication-Aware Map Compression for Online Path-Planning: A Rate-Distortion Approach
This paper addresses the problem of collaborative navigation in an unknown environment, where two robots, referred to in the sequel as the Seeker and the Supporter, traverse the space simultaneously. The Supporter assists the Seeker by transmitting a compressed representation of its local map under bandwidth constraints to support the Seeker's path-planning task. We introduce a bit-rate metric based on the expected binary codeword length to quantify communication cost. Using this metric, we formulate the compression design problem as a rate-distortion optimization problem that determines when to communicate, which regions of the map should be included in the compressed representation, and at what resolution (i.e., quantization level) they should be encoded. Our formulation allows different map regions to be encoded at varying quantization levels based on their relevance to the Seeker's path-planning task. We demonstrate that the resulting optimization problem is convex, and admits a closed-form solution known in the information theory literature as reverse water-filling, enabling efficient, low-computation, and real-time implementation. Additionally, we show that the Seeker can infer the compression decisions of the Supporter independently, requiring only the encoded map content and not the encoding policy itself to be transmitted, thereby reducing communication overhead. Simulation results indicate that our method effectively constructs compressed, task-relevant map representations, both in content and resolution, that guide the Seeker's planning decisions even under tight bandwidth limitations.
☆ HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
Real-time human perception is crucial for effective human-robot interaction (HRI). Large vision-language models (VLMs) offer promising generalizable perceptual capabilities but often suffer from high latency, which negatively impacts user experience and limits VLM applicability in real-world scenarios. To systematically study VLM capabilities in human perception for HRI and performance-latency trade-offs, we introduce HRIBench, a visual question-answering (VQA) benchmark designed to evaluate VLMs across a diverse set of human perceptual tasks critical for HRI. HRIBench covers five key domains: (1) non-verbal cue understanding, (2) verbal instruction understanding, (3) human-robot object relationship understanding, (4) social navigation, and (5) person identification. To construct HRIBench, we collected data from real-world HRI environments to curate questions for non-verbal cue understanding, and leveraged publicly available datasets for the remaining four domains. We curated 200 VQA questions for each domain, resulting in a total of 1000 questions for HRIBench. We then conducted a comprehensive evaluation of both state-of-the-art closed-source and open-source VLMs (N=11) on HRIBench. Our results show that, despite their generalizability, current VLMs still struggle with core perceptual capabilities essential for HRI. Moreover, none of the models within our experiments demonstrated a satisfactory performance-latency trade-off suitable for real-time deployment, underscoring the need for future research on developing smaller, low-latency VLMs with improved human perception capabilities. HRIBench and our results can be found in this Github repository: https://github.com/interaction-lab/HRIBench.
comment: Accepted to the 19th International Symposium on Experimental Robotics (ISER 2025)
☆ Leveraging Correlation Across Test Platforms for Variance-Reduced Metric Estimation
Learning-based robotic systems demand rigorous validation to assure reliable performance, but extensive real-world testing is often prohibitively expensive, and if conducted may still yield insufficient data for high-confidence guarantees. In this work, we introduce a general estimation framework that leverages paired data across test platforms, e.g., paired simulation and real-world observations, to achieve better estimates of real-world metrics via the method of control variates. By incorporating cheap and abundant auxiliary measurements (for example, simulator outputs) as control variates for costly real-world samples, our method provably reduces the variance of Monte Carlo estimates and thus requires significantly fewer real-world samples to attain a specified confidence bound on the mean performance. We provide theoretical analysis characterizing the variance and sample-efficiency improvement, and demonstrate empirically in autonomous driving and quadruped robotics settings that our approach achieves high-probability bounds with markedly improved sample efficiency. Our technique can lower the real-world testing burden for validating the performance of the stack, thereby enabling more efficient and cost-effective experimental evaluation of robotic systems.
☆ Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos
Modern image-based object detection models, such as YOLOv7, primarily process individual frames independently, thus ignoring valuable temporal context naturally present in videos. Meanwhile, existing video-based detection methods often introduce complex temporal modules, significantly increasing model size and computational complexity. In practical applications such as surveillance and autonomous driving, transient challenges including motion blur, occlusions, and abrupt appearance changes can severely degrade single-frame detection performance. To address these issues, we propose a straightforward yet highly effective strategy: stacking multiple consecutive frames as input to a YOLO-based detector while supervising only the output corresponding to a single target frame. This approach leverages temporal information with minimal modifications to existing architectures, preserving simplicity, computational efficiency, and real-time inference capability. Extensive experiments on the challenging MOT20Det and our BOAT360 datasets demonstrate that our method improves detection robustness, especially for lightweight models, effectively narrowing the gap between compact and heavy detection networks. Additionally, we contribute the BOAT360 benchmark dataset, comprising annotated fisheye video sequences captured from a boat, to support future research in multi-frame video object detection in challenging real-world scenarios.
comment: Submitted to ECMR 2025
☆ Critical Anatomy-Preserving & Terrain-Augmenting Navigation (CAPTAiN): Application to Laminectomy Surgical Education
Surgical training remains a crucial milestone in modern medicine, with procedures such as laminectomy exemplifying the high risks involved. Laminectomy drilling requires precise manual control to mill bony tissue while preserving spinal segment integrity and avoiding breaches in the dura: the protective membrane surrounding the spinal cord. Despite unintended tears occurring in up to 11.3% of cases, no assistive tools are currently utilized to reduce this risk. Variability in patient anatomy further complicates learning for novice surgeons. This study introduces CAPTAiN, a critical anatomy-preserving and terrain-augmenting navigation system that provides layered, color-coded voxel guidance to enhance anatomical awareness during spinal drilling. CAPTAiN was evaluated against a standard non-navigated approach through 110 virtual laminectomies performed by 11 orthopedic residents and medical students. CAPTAiN significantly improved surgical completion rates of target anatomy (87.99% vs. 74.42%) and reduced cognitive load across multiple NASA-TLX domains. It also minimized performance gaps across experience levels, enabling novices to perform on par with advanced trainees. These findings highlight CAPTAiN's potential to optimize surgical execution and support skill development across experience levels. Beyond laminectomy, it demonstrates potential for broader applications across various surgical and drilling procedures, including those in neurosurgery, otolaryngology, and other medical fields.
Behavior Foundation Model: Towards Next-Generation Whole-Body Control System of Humanoid Robots
Humanoid robots are drawing significant attention as versatile platforms for complex motor control, human-robot interaction, and general-purpose physical intelligence. However, achieving efficient whole-body control (WBC) in humanoids remains a fundamental challenge due to sophisticated dynamics, underactuation, and diverse task requirements. While learning-based controllers have shown promise for complex tasks, their reliance on labor-intensive and costly retraining for new scenarios limits real-world applicability. To address these limitations, behavior(al) foundation models (BFMs) have emerged as a new paradigm that leverages large-scale pretraining to learn reusable primitive skills and behavioral priors, enabling zero-shot or rapid adaptation to a wide range of downstream tasks. In this paper, we present a comprehensive overview of BFMs for humanoid WBC, tracing their development across diverse pre-training pipelines. Furthermore, we discuss real-world applications, current limitations, urgent challenges, and future opportunities, positioning BFMs as a key approach toward scalable and general-purpose humanoid intelligence. Finally, we provide a curated and long-term list of BFM papers and projects to facilitate more subsequent research, which is available at https://github.com/yuanmingqi/awesome-bfm-papers.
comment: 19 pages, 8 figures
☆ EANS: Reducing Energy Consumption for UAV with an Environmental Adaptive Navigation Strategy
Unmanned Aerial Vehicles (UAVS) are limited by the onboard energy. Refinement of the navigation strategy directly affects both the flight velocity and the trajectory based on the adjustment of key parameters in the UAVS pipeline, thus reducing energy consumption. However, existing techniques tend to adopt static and conservative strategies in dynamic scenarios, leading to inefficient energy reduction. Dynamically adjusting the navigation strategy requires overcoming the challenges including the task pipeline interdependencies, the environmental-strategy correlations, and the selecting parameters. To solve the aforementioned problems, this paper proposes a method to dynamically adjust the navigation strategy of the UAVS by analyzing its dynamic characteristics and the temporal characteristics of the autonomous navigation pipeline, thereby reducing UAVS energy consumption in response to environmental changes. We compare our method with the baseline through hardware-in-the-loop (HIL) simulation and real-world experiments, showing our method 3.2X and 2.6X improvements in mission time, 2.4X and 1.6X improvements in energy, respectively.
☆ A Review of Personalisation in Human-Robot Collaboration and Future Perspectives Towards Industry 5.0
The shift in research focus from Industry 4.0 to Industry 5.0 (I5.0) promises a human-centric workplace, with social and well-being values at the centre of technological implementation. Human-Robot Collaboration (HRC) is a core aspect of I5.0 development, with an increase in adaptive and personalised interactions and behaviours. This review investigates recent advancements towards personalised HRC, where user-centric adaption is key. There is a growing trend for adaptable HRC research, however there lacks a consistent and unified approach. The review highlights key research trends on which personal factors are considered, workcell and interaction design, and adaptive task completion. This raises various key considerations for future developments, particularly around the ethical and regulatory development of personalised systems, which are discussed in detail.
comment: Accepted by the 2025 34th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
☆ Learn to Position -- A Novel Meta Method for Robotic Positioning
Absolute positioning accuracy is a vital specification for robots. Achieving high position precision can be challenging due to the presence of various sources of errors. Meanwhile, accurately depicting these errors is difficult due to their stochastic nature. Vision-based methods are commonly integrated to guide robotic positioning, but their performance can be highly impacted by inevitable occlusions or adverse lighting conditions. Drawing on the aforementioned considerations, a vision-free, model-agnostic meta-method for compensating robotic position errors is proposed, which maximizes the probability of accurate robotic position via interactive feedback. Meanwhile, the proposed method endows the robot with the capability to learn and adapt to various position errors, which is inspired by the human's instinct for grasping under uncertainties. Furthermore, it is a self-learning and self-adaptive method able to accelerate the robotic positioning process as more examples are incorporated and learned. Empirical studies validate the effectiveness of the proposed method. As of the writing of this paper, the proposed meta search method has already been implemented in a robotic-based assembly line for odd-form electronic components.
☆ Multimodal Behaviour Trees for Robotic Laboratory Task Automation ICRA 2025
Laboratory robotics offer the capability to conduct experiments with a high degree of precision and reproducibility, with the potential to transform scientific research. Trivial and repeatable tasks; e.g., sample transportation for analysis and vial capping are well-suited for robots; if done successfully and reliably, chemists could contribute their efforts towards more critical research activities. Currently, robots can perform these tasks faster than chemists, but how reliable are they? Improper capping could result in human exposure to toxic chemicals which could be fatal. To ensure that robots perform these tasks as accurately as humans, sensory feedback is required to assess the progress of task execution. To address this, we propose a novel methodology based on behaviour trees with multimodal perception. Along with automating robotic tasks, this methodology also verifies the successful execution of the task, a fundamental requirement in safety-critical environments. The experimental evaluation was conducted on two lab tasks: sample vial capping and laboratory rack insertion. The results show high success rate, i.e., 88% for capping and 92% for insertion, along with strong error detection capabilities. This ultimately proves the robustness and reliability of our approach and that using multimodal behaviour trees should pave the way towards the next generation of robotic chemists.
comment: 7 pages, 5 figures, accepted and presented in ICRA 2025
☆ SPARK: Graph-Based Online Semantic Integration System for Robot Task Planning
The ability to update information acquired through various means online during task execution is crucial for a general-purpose service robot. This information includes geometric and semantic data. While SLAM handles geometric updates on 2D maps or 3D point clouds, online updates of semantic information remain unexplored. We attribute the challenge to the online scene graph representation, for its utility and scalability. Building on previous works regarding offline scene graph representations, we study online graph representations of semantic information in this work. We introduce SPARK: Spatial Perception and Robot Knowledge Integration. This framework extracts semantic information from environment-embedded cues and updates the scene graph accordingly, which is then used for subsequent task planning. We demonstrate that graph representations of spatial relationships enhance the robot system's ability to perform tasks in dynamic environments and adapt to unconventional spatial cues, like gestures.
☆ Enhanced Robotic Navigation in Deformable Environments using Learning from Demonstration and Dynamic Modulation IROS 2025
This paper presents a novel approach for robot navigation in environments containing deformable obstacles. By integrating Learning from Demonstration (LfD) with Dynamical Systems (DS), we enable adaptive and efficient navigation in complex environments where obstacles consist of both soft and hard regions. We introduce a dynamic modulation matrix within the DS framework, allowing the system to distinguish between traversable soft regions and impassable hard areas in real-time, ensuring safe and flexible trajectory planning. We validate our method through extensive simulations and robot experiments, demonstrating its ability to navigate deformable environments. Additionally, the approach provides control over both trajectory and velocity when interacting with deformable objects, including at intersections, while maintaining adherence to the original DS trajectory and dynamically adapting to obstacles for smooth and reliable navigation.
comment: Accepted to IROS 2025
☆ CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition
We introduce CARMA, a system for situational grounding in human-robot group interactions. Effective collaboration in such group settings requires situational awareness based on a consistent representation of present persons and objects coupled with an episodic abstraction of events regarding actors and manipulated objects. This calls for a clear and consistent assignment of instances, ensuring that robots correctly recognize and track actors, objects, and their interactions over time. To achieve this, CARMA uniquely identifies physical instances of such entities in the real world and organizes them into grounded triplets of actors, objects, and actions. To validate our approach, we conducted three experiments, where multiple humans and a robot interact: collaborative pouring, handovers, and sorting. These scenarios allow the assessment of the system's capabilities as to role distinction, multi-actor awareness, and consistent instance identification. Our experiments demonstrate that the system can reliably generate accurate actor-action-object triplets, providing a structured and robust foundation for applications requiring spatiotemporal reasoning and situated decision-making in collaborative settings.
☆ PIMBS: Efficient Body Schema Learning for Musculoskeletal Humanoids with Physics-Informed Neural Networks
Musculoskeletal humanoids are robots that closely mimic the human musculoskeletal system, offering various advantages such as variable stiffness control, redundancy, and flexibility. However, their body structure is complex, and muscle paths often significantly deviate from geometric models. To address this, numerous studies have been conducted to learn body schema, particularly the relationships among joint angles, muscle tension, and muscle length. These studies typically rely solely on data collected from the actual robot, but this data collection process is labor-intensive, and learning becomes difficult when the amount of data is limited. Therefore, in this study, we propose a method that applies the concept of Physics-Informed Neural Networks (PINNs) to the learning of body schema in musculoskeletal humanoids, enabling high-accuracy learning even with a small amount of data. By utilizing not only data obtained from the actual robot but also the physical laws governing the relationship between torque and muscle tension under the assumption of correct joint structure, more efficient learning becomes possible. We apply the proposed method to both simulation and an actual musculoskeletal humanoid and discuss its effectiveness and characteristics.
comment: Accepted at IEEE Robotics and Automation Letters
Building Forest Inventories with Autonomous Legged Robots -- System, Lessons, and Challenges Ahead
Legged robots are increasingly being adopted in industries such as oil, gas, mining, nuclear, and agriculture. However, new challenges exist when moving into natural, less-structured environments, such as forestry applications. This paper presents a prototype system for autonomous, under-canopy forest inventory with legged platforms. Motivated by the robustness and mobility of modern legged robots, we introduce a system architecture which enabled a quadruped platform to autonomously navigate and map forest plots. Our solution involves a complete navigation stack for state estimation, mission planning, and tree detection and trait estimation. We report the performance of the system from trials executed over one and a half years in forests in three European countries. Our results with the ANYmal robot demonstrate that we can survey plots up to 1 ha plot under 30 min, while also identifying trees with typical DBH accuracy of 2cm. The findings of this project are presented as five lessons and challenges. Particularly, we discuss the maturity of hardware development, state estimation limitations, open problems in forest navigation, future avenues for robotic forest inventory, and more general challenges to assess autonomous systems. By sharing these lessons and challenges, we offer insight and new directions for future research on legged robots, navigation systems, and applications in natural environments. Additional videos can be found in https://dynamic.robots.ox.ac.uk/projects/legged-robots
comment: 20 pages, 13 figures. Pre-print version of the accepted paper for IEEE Transactions on Field Robotics (T-FR)
☆ Near Time-Optimal Hybrid Motion Planning for Timber Cranes ICRA 2025
Efficient, collision-free motion planning is essential for automating large-scale manipulators like timber cranes. They come with unique challenges such as hydraulic actuation constraints and passive joints-factors that are seldom addressed by current motion planning methods. This paper introduces a novel approach for time-optimal, collision-free hybrid motion planning for a hydraulically actuated timber crane with passive joints. We enhance the via-point-based stochastic trajectory optimization (VP-STO) algorithm to include pump flow rate constraints and develop a novel collision cost formulation to improve robustness. The effectiveness of the enhanced VP-STO as an optimal single-query global planner is validated by comparison with an informed RRT* algorithm using a time-optimal path parameterization (TOPP). The overall hybrid motion planning is formed by combination with a gradient-based local planner that is designed to follow the global planner's reference and to systematically consider the passive joint dynamics for both collision avoidance and sway damping.
comment: Accepted at ICRA 2025
☆ Real-Time Obstacle Avoidance Algorithms for Unmanned Aerial and Ground Vehicles
The growing use of mobile robots in sectors such as automotive, agriculture, and rescue operations reflects progress in robotics and autonomy. In unmanned aerial vehicles (UAVs), most research emphasizes visual SLAM, sensor fusion, and path planning. However, applying UAVs to search and rescue missions in disaster zones remains underexplored, especially for autonomous navigation. This report develops methods for real-time and secure UAV maneuvering in complex 3D environments, crucial during forest fires. Building upon past research, it focuses on designing navigation algorithms for unfamiliar and hazardous environments, aiming to improve rescue efficiency and safety through UAV-based early warning and rapid response. The work unfolds in phases. First, a 2D fusion navigation strategy is explored, initially for mobile robots, enabling safe movement in dynamic settings. This sets the stage for advanced features such as adaptive obstacle handling and decision-making enhancements. Next, a novel 3D reactive navigation strategy is introduced for collision-free movement in forest fire simulations, addressing the unique challenges of UAV operations in such scenarios. Finally, the report proposes a unified control approach that integrates UAVs and unmanned ground vehicles (UGVs) for coordinated rescue missions in forest environments. Each phase presents challenges, proposes control models, and validates them with mathematical and simulation-based evidence. The study offers practical value and academic insights for improving the role of UAVs in natural disaster rescue operations.
☆ Why Robots Are Bad at Detecting Their Mistakes: Limitations of Miscommunication Detection in Human-Robot Dialogue
Detecting miscommunication in human-robot interaction is a critical function for maintaining user engagement and trust. While humans effortlessly detect communication errors in conversations through both verbal and non-verbal cues, robots face significant challenges in interpreting non-verbal feedback, despite advances in computer vision for recognizing affective expressions. This research evaluates the effectiveness of machine learning models in detecting miscommunications in robot dialogue. Using a multi-modal dataset of 240 human-robot conversations, where four distinct types of conversational failures were systematically introduced, we assess the performance of state-of-the-art computer vision models. After each conversational turn, users provided feedback on whether they perceived an error, enabling an analysis of the models' ability to accurately detect robot mistakes. Despite using state-of-the-art models, the performance barely exceeds random chance in identifying miscommunication, while on a dataset with more expressive emotional content, they successfully identified confused states. To explore the underlying cause, we asked human raters to do the same. They could also only identify around half of the induced miscommunications, similarly to our model. These results uncover a fundamental limitation in identifying robot miscommunications in dialogue: even when users perceive the induced miscommunication as such, they often do not communicate this to their robotic conversation partner. This knowledge can shape expectations of the performance of computer vision models and can help researchers to design better human-robot conversations by deliberately eliciting feedback where needed.
comment: Accepted at the 34th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2025)
☆ Generating and Customizing Robotic Arm Trajectories using Neural Networks
We introduce a neural network approach for generating and customizing the trajectory of a robotic arm, that guarantees precision and repeatability. To highlight the potential of this novel method, we describe the design and implementation of the technique and show its application in an experimental setting of cognitive robotics. In this scenario, the NICO robot was characterized by the ability to point to specific points in space with precise linear movements, increasing the predictability of the robotic action during its interaction with humans. To achieve this goal, the neural network computes the forward kinematics of the robot arm. By integrating it with a generator of joint angles, another neural network was developed and trained on an artificial dataset created from suitable start and end poses of the robotic arm. Through the computation of angular velocities, the robot was characterized by its ability to perform the movement, and the quality of its action was evaluated in terms of shape and accuracy. Thanks to its broad applicability, our approach successfully generates precise trajectories that could be customized in their shape and adapted to different settings.
comment: The code is released at https://github.com/andylucny/nico2/tree/main/generate
☆ Personalized Mental State Evaluation in Human-Robot Interaction using Federated Learning
With the advent of Industry 5.0, manufacturers are increasingly prioritizing worker well-being alongside mass customization. Stress-aware Human-Robot Collaboration (HRC) plays a crucial role in this paradigm, where robots must adapt their behavior to human mental states to improve collaboration fluency and safety. This paper presents a novel framework that integrates Federated Learning (FL) to enable personalized mental state evaluation while preserving user privacy. By leveraging physiological signals, including EEG, ECG, EDA, EMG, and respiration, a multimodal model predicts an operator's stress level, facilitating real-time robot adaptation. The FL-based approach allows distributed on-device training, ensuring data confidentiality while improving model generalization and individual customization. Results demonstrate that the deployment of an FL approach results in a global model with performance in stress prediction accuracy comparable to a centralized training approach. Moreover, FL allows for enhancing personalization, thereby optimizing human-robot interaction in industrial settings, while preserving data privacy. The proposed framework advances privacy-preserving, adaptive robotics to enhance workforce well-being in smart manufacturing.
☆ PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models
We propose PSALM-V, the first autonomous neuro-symbolic learning system able to induce symbolic action semantics (i.e., pre- and post-conditions) in visual environments through interaction. PSALM-V bootstraps reliable symbolic planning without expert action definitions, using LLMs to generate heuristic plans and candidate symbolic semantics. Previous work has explored using large language models to generate action semantics for Planning Domain Definition Language (PDDL)-based symbolic planners. However, these approaches have primarily focused on text-based domains or relied on unrealistic assumptions, such as access to a predefined problem file, full observability, or explicit error messages. By contrast, PSALM-V dynamically infers PDDL problem files and domain action semantics by analyzing execution outcomes and synthesizing possible error explanations. The system iteratively generates and executes plans while maintaining a tree-structured belief over possible action semantics for each action, iteratively refining these beliefs until a goal state is reached. Simulated experiments of task completion in ALFRED demonstrate that PSALM-V increases the plan success rate from 37% (Claude-3.7) to 74% in partially observed setups. Results on two 2D game environments, RTFM and Overcooked-AI, show that PSALM-V improves step efficiency and succeeds in domain induction in multi-agent settings. PSALM-V correctly induces PDDL pre- and post-conditions for real-world robot BlocksWorld tasks, despite low-level manipulation failures from the robot.
♻ ☆ Using Explainable AI and Hierarchical Planning for Outreach with Robots
Understanding how robots plan and execute tasks is crucial in today's world, where they are becoming more prevalent in our daily lives. However, teaching non-experts, such as K-12 students, the complexities of robot planning can be challenging. This work presents an open-source platform, JEDAI.Ed, that simplifies the process using a visual interface that abstracts the details of various planning processes that robots use for performing complex mobile manipulation tasks. Using principles developed in the field of explainable AI, this intuitive platform enables students to use a high-level intuitive instruction set to perform complex tasks, visualize them on an in-built simulator, and to obtain helpful hints and natural language explanations for errors. Finally, JEDAI.Ed, includes an adaptive curriculum generation method that provides students with customized learning ramps. This platform's efficacy was tested through a user study with university students who had little to no computer science background. Our results show that JEDAI.Ed is highly effective in increasing student engagement, teaching robotics programming, and decreasing the time need to solve tasks as compared to baselines.
♻ ☆ ReLink: Computational Circular Design of Planar Linkage Mechanisms Using Available Standard Parts
The Circular Economy framework emphasizes sustainability by reducing resource consumption and waste through the reuse of components and materials. This paper presents ReLink, a computational framework for the circular design of planar linkage mechanisms using available standard parts. Unlike most mechanism design methods, which assume the ability to create custom parts and infinite part availability, ReLink prioritizes the reuse of discrete, standardized components, thus minimizing the need for new parts. The framework consists of two main components: design generation, where a generative design algorithm generates mechanisms from an inventory of available parts, and inverse design, which uses optimization methods to identify designs that match a user-defined trajectory curve. The paper also examines the trade-offs between kinematic performance and CO2 footprint when incorporating new parts. Challenges such as the combinatorial nature of the design problem and the enforcement of valid solutions are addressed. By combining sustainability principles with kinematic synthesis, ReLink lays the groundwork for further research into computational circular design to support the development of systems that integrate reused components into mechanical products.
comment: 29 pages, 18 figures, Submitted
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior -- an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. In this work we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. Focusing in particular on diffusion policies -- a state-of-the-art BC methodology -- we propose diffusion steering via reinforcement learning (DSRL): adapting the BC policy by running RL over its latent-noise space. We show that DSRL is highly sample efficient, requires only black-box access to the BC policy, and enables effective real-world autonomous policy improvement. Furthermore, DSRL avoids many of the challenges associated with finetuning diffusion policies, obviating the need to modify the weights of the base policy at all. We demonstrate DSRL on simulated benchmarks, real-world robotic tasks, and for adapting pretrained generalist policies, illustrating its sample efficiency and effective performance at real-world policy improvement.
♻ ☆ FORTE: Tactile Force and Slip Sensing on Compliant Fingers for Delicate Manipulation
Handling delicate and fragile objects remains a major challenge for robotic manipulation, especially for rigid parallel grippers. While the simplicity and versatility of parallel grippers have led to widespread adoption, these grippers are limited by their heavy reliance on visual feedback. Tactile sensing and soft robotics can add responsiveness and compliance. However, existing methods typically involve high integration complexity or suffer from slow response times. In this work, we introduce FORTE, a tactile sensing system embedded in compliant gripper fingers. FORTE uses 3D-printed fin-ray grippers with internal air channels to provide low-latency force and slip feedback. FORTE applies just enough force to grasp objects without damaging them, while remaining easy to fabricate and integrate. We find that FORTE can accurately estimate grasping forces from 0-8 N with an average error of 0.2 N, and detect slip events within 100 ms of occurring. We demonstrate FORTE's ability to grasp a wide range of slippery, fragile, and deformable objects. In particular, FORTE grasps fragile objects like raspberries and potato chips with a 98.6% success rate, and achieves 93% accuracy in detecting slip events. These results highlight FORTE's potential as a robust and practical solution for enabling delicate robotic manipulation. Project page: https://merge-lab.github.io/FORTE
♻ ☆ BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles
This article introduces BEVPlace++, a novel, fast, and robust LiDAR global localization method for unmanned ground vehicles. It uses lightweight convolutional neural networks (CNNs) on Bird's Eye View (BEV) image-like representations of LiDAR data to achieve accurate global localization through place recognition, followed by 3-DoF pose estimation. Our detailed analyses reveal an interesting fact that CNNs are inherently effective at extracting distinctive features from LiDAR BEV images. Remarkably, keypoints of two BEV images with large translations can be effectively matched using CNN-extracted features. Building on this insight, we design a Rotation Equivariant Module (REM) to obtain distinctive features while enhancing robustness to rotational changes. A Rotation Equivariant and Invariant Network (REIN) is then developed by cascading REM and a descriptor generator, NetVLAD, to sequentially generate rotation equivariant local features and rotation invariant global descriptors. The global descriptors are used first to achieve robust place recognition, and then local features are used for accurate pose estimation. \revise{Experimental results on seven public datasets and our UGV platform demonstrate that BEVPlace++, even when trained on a small dataset (3000 frames of KITTI) only with place labels, generalizes well to unseen environments, performs consistently across different days and years, and adapts to various types of LiDAR scanners.} BEVPlace++ achieves state-of-the-art performance in multiple tasks, including place recognition, loop closure detection, and global localization. Additionally, BEVPlace++ is lightweight, runs in real-time, and does not require accurate pose supervision, making it highly convenient for deployment. \revise{The source codes are publicly available at https://github.com/zjuluolun/BEVPlace2.
comment: Accepted to IEEE Transactions on Robotics
♻ ☆ A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Robotic manipulation faces critical challenges in understanding spatial affordances--the "where" and "how" of object interactions--essential for complex manipulation tasks like wiping a board or stacking objects. Existing methods, including modular-based and end-to-end approaches, often lack robust spatial reasoning capabilities. Unlike recent point-based and flow-based affordance methods that focus on dense spatial representations or trajectory modeling, we propose A0, a hierarchical affordance-aware diffusion model that decomposes manipulation tasks into high-level spatial affordance understanding and low-level action execution. A0 leverages the Embodiment-Agnostic Affordance Representation, which captures object-centric spatial affordances by predicting contact points and post-contact trajectories. A0 is pre-trained on 1 million contact points data and fine-tuned on annotated trajectories, enabling generalization across platforms. Key components include Position Offset Attention for motion-aware feature extraction and a Spatial Information Aggregation Layer for precise coordinate mapping. The model's output is executed by the action execution module. Experiments on multiple robotic systems (Franka, Kinova, Realman, and Dobot) demonstrate A0's superior performance in complex tasks, showcasing its efficiency, flexibility, and real-world applicability.
♻ ☆ Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains
The human-robot interaction (HRI) is a growing area of research. In HRI, complex command (action) classification is still an open problem that usually prevents the real applicability of such a technique. The literature presents some works that use neural networks to detect these actions. However, occlusion is still a major issue in HRI, especially when using uncrewed aerial vehicles (UAVs), since, during the robot's movement, the human operator is often out of the robot's field of view. Furthermore, in multi-robot scenarios, distributed training is also an open problem. In this sense, this work proposes an action recognition and control approach based on Long Short-Term Memory (LSTM) Deep Neural Networks with two layers in association with three densely connected layers and Federated Learning (FL) embedded in multiple drones. The FL enabled our approach to be trained in a distributed fashion, i.e., access to data without the need for cloud or other repositories, which facilitates the multi-robot system's learning. Furthermore, our multi-robot approach results also prevented occlusion situations, with experiments with real robots achieving an accuracy greater than 96%.
comment: version 2
♻ ☆ Physics-informed Imitative Reinforcement Learning for Real-world Driving
Recent advances in imitative reinforcement learning (IRL) have considerably enhanced the ability of autonomous agents to assimilate expert demonstrations, leading to rapid skill acquisition in a range of demanding tasks. However, such learning-based agents face significant challenges when transferring knowledge to highly dynamic closed-loop environments. Their performance is significantly impacted by the conflicting optimization objectives of imitation learning (IL) and reinforcement learning (RL), sample inefficiency, and the complexity of uncovering the hidden world model and physics. To address this challenge, we propose a physics-informed IRL that is entirely data-driven. It leverages both expert demonstration data and exploratory data with a joint optimization objective, allowing the underlying physical principles of vehicle dynamics to emerge naturally from the training process. The performance is evaluated through empirical experiments and results exceed popular IL, RL and IRL algorithms in closed-loop settings on Waymax benchmark. Our approach exhibits 37.8% reduction in collision rate and 22.2% reduction in off-road rate compared to the baseline method.
♻ ☆ Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration
Neural field-based SLAM methods typically employ a single, monolithic field as their scene representation. This prevents efficient incorporation of loop closure constraints and limits scalability. To address these shortcomings, we propose a novel RGB-D neural mapping framework in which the scene is represented by a collection of lightweight neural fields which are dynamically anchored to the pose graph of a sparse visual SLAM system. Our approach shows the ability to integrate large-scale loop closures, while requiring only minimal reintegration. Furthermore, we verify the scalability of our approach by demonstrating successful building-scale mapping taking multiple loop closures into account during the optimization, and show that our method outperforms existing state-of-the-art approaches on large scenes in terms of quality and runtime. Our code is available open-source at https://github.com/KTH-RPL/neural_graph_mapping.
comment: WACV 2025, Project page: https://kth-rpl.github.io/neural_graph_mapping/
♻ ☆ Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning ICML 2025
Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
comment: ICML 2025
♻ ☆ Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models IROS 2025
Diffusion models have been widely employed in the field of 3D manipulation due to their efficient capability to learn distributions, allowing for precise prediction of action trajectories. However, diffusion models typically rely on large parameter UNet backbones as policy networks, which can be challenging to deploy on resource-constrained devices. Recently, the Mamba model has emerged as a promising solution for efficient modeling, offering low computational complexity and strong performance in sequence modeling. In this work, we propose the Mamba Policy, a lighter but stronger policy that reduces the parameter count by over 80% compared to the original policy network while achieving superior performance. Specifically, we introduce the XMamba Block, which effectively integrates input information with conditional features and leverages a combination of Mamba and Attention mechanisms for deep feature extraction. Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, Dexart, and MetaWorld datasets, requiring significantly fewer computational resources. Additionally, we highlight the Mamba Policy's enhanced robustness in long-horizon scenarios compared to baseline methods and explore the performance of various Mamba variants within the Mamba Policy framework. Real-world experiments are also conducted to further validate its effectiveness. Our open-source project page can be found at https://andycao1125.github.io/mamba_policy/.
comment: Accepted to IROS 2025
♻ ☆ Teacher Motion Priors: Enhancing Robot Locomotion over Challenging Terrain IROS 2025
Achieving robust locomotion on complex terrains remains a challenge due to high dimensional control and environmental uncertainties. This paper introduces a teacher prior framework based on the teacher student paradigm, integrating imitation and auxiliary task learning to improve learning efficiency and generalization. Unlike traditional paradigms that strongly rely on encoder-based state embeddings, our framework decouples the network design, simplifying the policy network and deployment. A high performance teacher policy is first trained using privileged information to acquire generalizable motion skills. The teacher's motion distribution is transferred to the student policy, which relies only on noisy proprioceptive data, via a generative adversarial mechanism to mitigate performance degradation caused by distributional shifts. Additionally, auxiliary task learning enhances the student policy's feature representation, speeding up convergence and improving adaptability to varying terrains. The framework is validated on a humanoid robot, showing a great improvement in locomotion stability on dynamic terrains and significant reductions in development costs. This work provides a practical solution for deploying robust locomotion strategies in humanoid robots.
comment: 8 pages, 6 figures, 6 tables, IROS 2025
FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion
3D gaussian splatting has advanced simultaneous localization and mapping (SLAM) technology by enabling real-time positioning and the construction of high-fidelity maps. However, the uncertainty in gaussian position and initialization parameters introduces challenges, often requiring extensive iterative convergence and resulting in redundant or insufficient gaussian representations. To address this, we introduce a novel adaptive densification method based on Fourier frequency domain analysis to establish gaussian priors for rapid convergence. Additionally, we propose constructing independent and unified sparse and dense maps, where a sparse map supports efficient tracking via Generalized Iterative Closest Point (GICP) and a dense map creates high-fidelity visual representations. This is the first SLAM system leveraging frequency domain analysis to achieve high-quality gaussian mapping in real-time. Experimental results demonstrate an average frame rate of 36 FPS on Replica and TUM RGB-D datasets, achieving competitive accuracy in both localization and mapping.
♻ ☆ IKDiffuser: A Generative Inverse Kinematics Solver for Multi-arm Robots via Diffusion Model
Solving Inverse Kinematics (IK) problems is fundamental to robotics, but has primarily been successful with single serial manipulators. For multi-arm robotic systems, IK remains challenging due to complex self-collisions, coupled joints, and high-dimensional redundancy. These complexities make traditional IK solvers slow, prone to failure, and lacking in solution diversity. In this paper, we present IKDiffuser, a diffusion-based model designed for fast and diverse IK solution generation for multi-arm robotic systems. IKDiffuser learns the joint distribution over the configuration space, capturing complex dependencies and enabling seamless generalization to multi-arm robotic systems of different structures. In addition, IKDiffuser can incorporate additional objectives during inference without retraining, offering versatility and adaptability for task-specific requirements. In experiments on 6 different multi-arm systems, the proposed IKDiffuser achieves superior solution accuracy, precision, diversity, and computational efficiency compared to existing solvers. The proposed IKDiffuser framework offers a scalable, unified approach to solving multi-arm IK problems, facilitating the potential of multi-arm robotic systems in real-time manipulation tasks.
comment: under review
♻ ☆ EvDetMAV: Generalized MAV Detection from Moving Event Cameras
Existing micro aerial vehicle (MAV) detection methods mainly rely on the target's appearance features in RGB images, whose diversity makes it difficult to achieve generalized MAV detection. We notice that different types of MAVs share the same distinctive features in event streams due to their high-speed rotating propellers, which are hard to see in RGB images. This paper studies how to detect different types of MAVs from an event camera by fully exploiting the features of propellers in the original event stream. The proposed method consists of three modules to extract the salient and spatio-temporal features of the propellers while filtering out noise from background objects and camera motion. Since there are no existing event-based MAV datasets, we introduce a novel MAV dataset for the community. This is the first event-based MAV dataset comprising multiple scenarios and different types of MAVs. Without training, our method significantly outperforms state-of-the-art methods and can deal with challenging scenarios, achieving a precision rate of 83.0\% (+30.3\%) and a recall rate of 81.5\% (+36.4\%) on the proposed testing dataset. The dataset and code are available at: https://github.com/WindyLab/EvDetMAV.
comment: 8 pages, 7 figures. This paper is accepted by IEEE Robotics and Automation Letters
♻ ☆ AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation
We present AnchorDP3, a diffusion policy framework for dual-arm robotic manipulation that achieves state-of-the-art performance in highly randomized environments. AnchorDP3 integrates three key innovations: (1) Simulator-Supervised Semantic Segmentation, using rendered ground truth to explicitly segment task-critical objects within the point cloud, which provides strong affordance priors; (2) Task-Conditioned Feature Encoders, lightweight modules processing augmented point clouds per task, enabling efficient multi-task learning through a shared diffusion-based action expert; (3) Affordance-Anchored Keypose Diffusion with Full State Supervision, replacing dense trajectory prediction with sparse, geometrically meaningful action anchors, i.e., keyposes such as pre-grasp pose, grasp pose directly anchored to affordances, drastically simplifying the prediction space; the action expert is forced to predict both robot joint angles and end-effector poses simultaneously, which exploits geometric consistency to accelerate convergence and boost accuracy. Trained on large-scale, procedurally generated simulation data, AnchorDP3 achieves a 98.7% average success rate in the RoboTwin benchmark across diverse tasks under extreme randomization of objects, clutter, table height, lighting, and backgrounds. This framework, when integrated with the RoboTwin real-to-sim pipeline, has the potential to enable fully autonomous generation of deployable visuomotor policies from only scene and instruction, totally eliminating human demonstrations from learning manipulation skills.
Systems and Control 32
☆ Resilience Through Escalation: A Graph-Based PACE Architecture for Satellite Threat Response
Satellite systems increasingly face operational risks from jamming, cyberattacks, and electromagnetic disruptions. Traditional redundancy strategies often fail against dynamic, multi-vector threats. This paper introduces a resilience-by-design framework grounded in the PACE (Primary, Alternate, Contingency, Emergency) methodology, originally developed for tactical communications in military operations, adapting it to satellite systems through a layered state-transition model informed by threat scoring frameworks such as CVSS, DREAD, and NASA's risk matrix. We define a dynamic resilience index to quantify system adaptability and implement three PACE variants: static, adaptive, and softmax-based decision models, to evaluate resilience under diverse disruption scenarios. The proposed approach highlights the effectiveness of lightweight, decision-aware fallback mechanisms in improving survivability and operational continuity for next-generation space assets.
☆ DPLib: A Standard Benchmark Library for Distributed Power System Analysis and Optimization
\textit{DPLib} is an open-source MATLAB-based benchmark library created to support research and development in distributed and decentralized power system analysis and optimization. Distributed and decentralized methods offer scalability, privacy preservation, and resilience to single points of failure, making them increasingly important for modern power systems. However, unlike centralized tools such as MATPOWER, no general-purpose, reproducible data library package currently exists for distributed power system studies. DPLib fills this gap by providing a standard power system library featuring over 20 multi-region benchmark test cases of varying sizes, along with a graph-based partitioning toolkit that decomposes any MATPOWER test system into multiple electrically coherent regions. The partitioning toolkit, an easy-to-use MATLAB code, generates standardized \texttt{.mat} and \texttt{.m} files, along with region visualizations for intuitive understanding. We also provide modular, easy-to-use distributed optimal power flow (OPF) solvers: an alternating direction method of multipliers(ADMM)-based DC-OPF solver implemented in YALMIP, and an ADMM-based AC-OPF solver leveraging IPOPT. These solvers validate the generated test systems for distributed optimization applications. Numerical results validate the generated test cases, establishing DPLib as a foundation for reproducible distributed power system research.
☆ Structural System Identification via Validation and Adaptation
Estimating the governing equation parameter values is essential for integrating experimental data with scientific theory to understand, validate, and predict the dynamics of complex systems. In this work, we propose a new method for structural system identification (SI), uncertainty quantification, and validation directly from data. Inspired by generative modeling frameworks, a neural network maps random noise to physically meaningful parameters. These parameters are then used in the known equation of motion to obtain fake accelerations, which are compared to real training data via a mean square error loss. To simultaneously validate the learned parameters, we use independent validation datasets. The generated accelerations from these datasets are evaluated by a discriminator network, which determines whether the output is real or fake, and guides the parameter-generator network. Analytical and real experiments show the parameter estimation accuracy and model validation for different nonlinear structural systems.
☆ Noise-Tolerant Hybrid Approach for Data-Driven Predictive Control
This paper focuses on a key challenge in hybrid data-driven predictive control: the effect of measurement noise on Hankel matrices. While noise is handled in direct and indirect methods, hybrid approaches often overlook its impact during trajectory estimation. We propose a Noise-Tolerant Data-Driven Predictive Control (NTDPC) framework that integrates singular value decomposition to separate system dynamics from noise within reduced-order Hankel matrices. This enables accurate prediction with shorter data horizons and lower computational effort. A sensitivity index is introduced to support horizon selection under different noise levels. Simulation results indicate improved robustness and efficiency compared to existing hybrid methods.
☆ Distributed Lyapunov Functions for Nonlinear Networks
Nonlinear networks are often multistable, exhibiting coexisting stable states with competing regions of attraction (ROAs). As a result, ROAs can have complex "tentacle-like" morphologies that are challenging to characterize analytically or computationally. In addition, the high dimensionality of the state space prohibits the automated construction of Lyapunov functions using state-of-the-art optimization methods, such as sum-of-squares (SOS) programming. In this letter, we propose a distributed approach for the construction of Lyapunov functions based solely on local information. To this end, we establish an augmented comparison lemma that characterizes the existence conditions of partial Lyapunov functions, while also accounting for residual effects caused by the associated dimensionality reduction. These theoretical results allow us to formulate an SOS optimization that iteratively constructs such partial functions, whose aggregation forms a composite Lyapunov function. The resulting composite function provides accurate convex approximations of both the volumes and shapes of the ROAs. We validate our method on networks of van der Pol and Ising oscillators, demonstrating its effectiveness in characterizing high-dimensional systems with non-convex ROAs.
comment: Codes are available at our GitHub repository https://github.com/YimingSci/Distributed-Lya-Func
☆ Identifiability and Maximum Likelihood Estimation for System Identification of Networks of Dynamical Systems
In this paper we investigate identifiability and maximum likelihood estimation for direct system identification of networks of dynamical systems. We provide necessary and sufficient conditions for network identifiability in terms of Gr\"obner bases. We show that the maximum likelihood approach is both consistent and efficient, which is in contrast to existing prediction error approaches. Moreover, our approach has wider applicability, i.e., it is applicable whenever network identifiability holds. Finally, we show that we can formulate the maximum likelihood problem without the use of a predictor, which is the key to numerically being able to solve it efficiently.
comment: This work has been submitted to the IEEE for possible publication. Submitted to IEEE Transactions on Automatic Control
☆ Task Allocation of UAVs for Monitoring Missions via Hardware-in-the-Loop Simulation and Experimental Validation
This study addresses the optimisation of task allocation for Unmanned Aerial Vehicles (UAVs) within industrial monitoring missions. The proposed methodology integrates a Genetic Algorithms (GA) with a 2-Opt local search technique to obtain a high-quality solution. Our approach was experimentally validated in an industrial zone to demonstrate its efficacy in real-world scenarios. Also, a Hardware-in-the-loop (HIL) simulator for the UAVs team is introduced. Moreover, insights about the correlation between the theoretical cost function and the actual battery consumption and time of flight are deeply analysed. Results show that the considered costs for the optimisation part of the problem closely correlate with real-world data, confirming the practicality of the proposed approach.
☆ Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges
The global demand for sustainable protein sources has accelerated the need for intelligent tools that can rapidly process and synthesise domain-specific scientific knowledge. In this study, we present a proof-of-concept multi-agent Artificial Intelligence (AI) framework designed to support sustainable protein production research, with an initial focus on microbial protein sources. Our Retrieval-Augmented Generation (RAG)-oriented system consists of two GPT-based LLM agents: (1) a literature search agent that retrieves relevant scientific literature on microbial protein production for a specified microbial strain, and (2) an information extraction agent that processes the retrieved content to extract relevant biological and chemical information. Two parallel methodologies, fine-tuning and prompt engineering, were explored for agent optimisation. Both methods demonstrated effectiveness at improving the performance of the information extraction agent in terms of transformer-based cosine similarity scores between obtained and ideal outputs. Mean cosine similarity scores were increased by up to 25%, while universally reaching mean scores of $\geq 0.89$ against ideal output text. Fine-tuning overall improved the mean scores to a greater extent (consistently of $\geq 0.94$) compared to prompt engineering, although lower statistical uncertainties were observed with the latter approach. A user interface was developed and published for enabling the use of the multi-agent AI system, alongside preliminary exploration of additional chemical safety-based search capabilities
☆ Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control
Traditional wind farm control operates each turbine independently to maximize individual power output. However, coordinated wake steering across the entire farm can substantially increase the combined wind farm energy production. Although dynamic closed-loop control has proven effective in flow control applications, wind farm optimization has relied primarily on static, low-fidelity simulators that ignore critical turbulent flow dynamics. In this work, we present the first reinforcement learning (RL) controller integrated directly with high-fidelity large-eddy simulation (LES), enabling real-time response to atmospheric turbulence through collaborative, dynamic control strategies. Our RL controller achieves a 4.30% increase in wind farm power output compared to baseline operation, nearly doubling the 2.19% gain from static optimal yaw control obtained through Bayesian optimization. These results establish dynamic flow-responsive control as a transformative approach to wind farm optimization, with direct implications for accelerating renewable energy deployment to net-zero targets.
☆ Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation
Industrial Non-Intrusive Load Monitoring (NILM) is limited by the scarcity of high-quality datasets and the complex variability of industrial energy consumption patterns. To address data scarcity and privacy issues, we introduce the Synthetic Industrial Dataset for Energy Disaggregation (SIDED), an open-source dataset generated using Digital Twin simulations. SIDED includes three types of industrial facilities across three different geographic locations, capturing diverse appliance behaviors, weather conditions, and load profiles. We also propose the Appliance-Modulated Data Augmentation (AMDA) method, a computationally efficient technique that enhances NILM model generalization by intelligently scaling appliance power contributions based on their relative impact. We show in experiments that NILM models trained with AMDA-augmented data significantly improve the disaggregation of energy consumption of complex industrial appliances like combined heat and power systems. Specifically, in our out-of-sample scenarios, models trained with AMDA achieved a Normalized Disaggregation Error of 0.093, outperforming models trained without data augmentation (0.451) and those trained with random data augmentation (0.290). Data distribution analyses confirm that AMDA effectively aligns training and test data distributions, enhancing model generalization.
☆ Analyzing the Impact of Strategic Bidding on the Reserve Capacity via a Bi-Level Model
The growing integration of renewable energy sources necessitates adequate reserve capacity to maintain power balance. However, in market clearing, power companies with flexible resources may submit strategic bids to maximize profits, potentially compromising system reserves. This paper examines the effects of such strategic behavior by modeling the market as a bi-level problem. The upper level represents a strategic company aiming to maximize profit, while the lower level simulates the system operator clearing the market based on submitted offers. To enable duality-based solution methods, we approximate unit commitments with a continuous reserve capacity calculation. Case studies indicate that, in an imperfectly competitive market, more units are incentivized to operate,enhancing system reserves. However, some units go online mainly for profit, ultimately raising electricity costs for consumers. These findings highlight the importance of market design in managing the trade-off between reserve adequacy and economic efficiency in the presence of strategic bidding behavior.
☆ A Decomposition Method for Finite-Time Stabilization of Bilinear Systems with Applications to Parabolic and Hyperbolic Equations
In this work, we address the problem of finite-time stabilization for a class of bilinear system. We propose a decomposition-based approach in which the nominal system is split into two subsystems, one of which is inherently finite-time stable without control. This allows the stabilization analysis to focus solely on the remaining subsystem. To ensure the well-posedness of the closed-loop system, we establish sufficient conditions on the system and control operators. The stabilization results are then derived using a suitable Lyapunov function and an observation condition. The effectiveness of the proposed approach is demonstrated through examples involving both parabolic and hyperbolic infinite-dimensional systems.
☆ Learning-based safety lifting monitoring system for cranes on construction sites
Lifting on construction sites, as a frequent operation, works still with safety risks, especially for modular integrated construction (MiC) lifting due to its large weight and size, probably leading to accidents, causing damage to the modules, or more critically, posing safety hazards to on-site workers. Aiming to reduce the safety risks in lifting scenarios, we design an automated safe lifting monitoring algorithm pipeline based on learning-based methods, and deploy it on construction sites. This work is potentially to increase the safety and efficiency of MiC lifting process via automation technologies. A dataset is created consisting of 1007 image-point cloud pairs (37 MiC liftings). Advanced object detection models are trained for automated two-dimensional (2D) detection of MiCs and humans. Fusing the 2D detection results with the point cloud information allows accurate determination of the three-dimensional (3D) positions of MiCs and humans. The system is designed to automatically trigger alarms that notify individuals in the MiC lifting danger zone, while providing the crane operator with real-time lifting information and early warnings. The monitoring process minimizes the human intervention and no or less signal men are required on real sites assisted by our system. A quantitative analysis is conducted to evaluate the effectiveness of the algorithmic pipeline. The pipeline shows promising results in MiC and human perception with the mean distance error of 1.5640 m and 0.7824 m respectively. Furthermore, the developed system successfully executes safety risk monitoring and alarm functionalities during the MiC lifting process with limited manual work on real construction sites.
comment: 20 pages, 10 figures
☆ A Visualization Framework for Exploring Multi-Agent-Based Simulations Case Study of an Electric Vehicle Home Charging Ecosystem
Multi-agent-based simulations (MABS) of electric vehicle (EV) home charging ecosystems generate large, complex, and stochastic time-series datasets that capture interactions between households, grid infrastructure, and energy markets. These interactions can lead to unexpected system-level events, such as transformer overloads or consumer dissatisfaction, that are difficult to detect and explain through static post-processing. This paper presents a modular, Python-based dashboard framework, built using Dash by Plotly, that enables efficient, multi-level exploration and root-cause analysis of emergent behavior in MABS outputs. The system features three coordinated views (System Overview, System Analysis, and Consumer Analysis), each offering high-resolution visualizations such as time-series plots, spatial heatmaps, and agent-specific drill-down tools. A case study simulating full EV adoption with smart charging in a Danish residential network demonstrates how the dashboard supports rapid identification and contextual explanation of anomalies, including clustered transformer overloads and time-dependent charging failures. The framework facilitates actionable insight generation for researchers and distribution system operators, and its architecture is adaptable to other distributed energy resources and complex energy systems.
☆ Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design
This paper investigates the design of output-feedback schemes for systems described by a class of recurrent neural networks. We propose a procedure based on linear matrix inequalities for designing an observer and a static state-feedback controller. The algorithm leverages global and regional incremental input-to-state stability (incremental ISS) and enables the tracking of constant setpoints, ensuring robustness to disturbances and state estimation uncertainty. To address the potential limitations of regional incremental ISS, we introduce an alternative scheme in which the static law is replaced with a tube-based nonlinear model predictive controller (NMPC) that exploits regional incremental ISS properties. We show that these conditions enable the formulation of a robust NMPC law with guarantees of convergence and recursive feasibility, leading to an enlarged region of attraction. Theoretical results are validated through numerical simulations on the pH-neutralisation process benchmark, demonstrating the effectiveness of the proposed schemes.
comment: 16 pages, 7 figures, submitted to IEEE Transactions on Automatic Control (under review)
☆ Real-Time Obstacle Avoidance Algorithms for Unmanned Aerial and Ground Vehicles
The growing use of mobile robots in sectors such as automotive, agriculture, and rescue operations reflects progress in robotics and autonomy. In unmanned aerial vehicles (UAVs), most research emphasizes visual SLAM, sensor fusion, and path planning. However, applying UAVs to search and rescue missions in disaster zones remains underexplored, especially for autonomous navigation. This report develops methods for real-time and secure UAV maneuvering in complex 3D environments, crucial during forest fires. Building upon past research, it focuses on designing navigation algorithms for unfamiliar and hazardous environments, aiming to improve rescue efficiency and safety through UAV-based early warning and rapid response. The work unfolds in phases. First, a 2D fusion navigation strategy is explored, initially for mobile robots, enabling safe movement in dynamic settings. This sets the stage for advanced features such as adaptive obstacle handling and decision-making enhancements. Next, a novel 3D reactive navigation strategy is introduced for collision-free movement in forest fire simulations, addressing the unique challenges of UAV operations in such scenarios. Finally, the report proposes a unified control approach that integrates UAVs and unmanned ground vehicles (UGVs) for coordinated rescue missions in forest environments. Each phase presents challenges, proposes control models, and validates them with mathematical and simulation-based evidence. The study offers practical value and academic insights for improving the role of UAVs in natural disaster rescue operations.
☆ Cooperative Sensing and Communication Beamforming Design for Low-Altitude Economy
To empower the low-altitude economy with high-accuracy sensing and high-rate communication, this paper proposes a cooperative integrated sensing and communication (ISAC) framework for aerial-ground networks. In the proposed system, the ground base stations (BSs) cooperatively serve the unmanned aerial vehicles (UAVs), which are equipped for either joint communication and sensing or sensing-only operations. The BSs employ coordinated beamforming to simultaneously transmit communication and sensing signals, while the UAVs execute their missions. To maximize the weighted sum rate under the sensing signal-to-interference-plus-noise ratio (SINR) constraints, we jointly optimize the transmit beamforming, receive filtering, and UAV trajectory. The resulting non-convex problem is solved using an alternating optimization framework incorporating semidefinite relaxation (SDR) and successive convex approximation (SCA). Simulation results demonstrate that the proposed joint design achieves higher communication throughput while ensuring required sensing robustness. Additionally, the sensing SINR threshold and the UAV altitude have a significant impact on the trajectory design, highlighting the necessity of adaptive deployment strategies in practical applications.
☆ A Data-Driven Approach for Topology Correction in Low Voltage Networks with DERs
This paper introduces a data-driven topology identification and correction approach for low-voltage distribution networks (LVDNs) combined with a time-based smart meter data selection strategy, aiming to correct outdated recordings and identify the missed recordings. The proposed approach solely relies on voltage magnitude measurements, releasing privacy concerns and measurement burdens. It enables the distribution system operators to identify switch states through supervised learning algorithms, as well as determine user-feeder connections and phase labels of customers by a modified Hierarchical Clustering algorithm. To address the similarity among smart meter (SM) data caused by distributed photovoltaic (PV) systems, a time-based SM data selection strategy is combined with the proposed correlation analysis. The feasibility and robustness of the proposed approach are validated using modified real-world LVDNs and multiple incomplete SM datasets collected from customers in the Netherlands. The results demonstrate that the time-based SM data selection strategy effectively mitigates their impact on phase identification, and the corrected topology not only improves network observability but also supports network operators in load balancing and PV consumption.
☆ Causal discovery in deterministic discrete LTI-DAE systems
Discovering pure causes or driver variables in deterministic LTI systems is of vital importance in the data-driven reconstruction of causal networks. A recent work by Kathari and Tangirala, proposed in 2022, formulated the causal discovery method as a constraint identification problem. The constraints are identified using a dynamic iterative PCA (DIPCA)-based approach for dynamical systems corrupted with Gaussian measurement errors. The DIPCA-based method works efficiently for dynamical systems devoid of any algebraic relations. However, several dynamical systems operate under feedback control and/or are coupled with conservation laws, leading to differential-algebraic (DAE) or mixed causal systems. In this work, a method, namely the partition of variables (PoV), for causal discovery in LTI-DAE systems is proposed. This method is superior to the method that was presented by Kathari and Tangirala (2022), as PoV also works for pure dynamical systems, which are devoid of algebraic equations. The proposed method identifies the causal drivers up to a minimal subset. PoV deploys DIPCA to first determine the number of algebraic relations ($n_a$), the number of dynamical relations ($n_d$) and the constraint matrix. Subsequently, the subsets are identified through an admissible partitioning of the constraint matrix by finding the condition number of it. Case studies are presented to demonstrate the effectiveness of the proposed method.
☆ Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox
The convergence of IT and OT has created hyper-connected ICS, exposing critical infrastructure to a new class of adaptive, intelligent adversaries that render static defenses obsolete. Existing security paradigms often fail to address a foundational "Trinity of Trust," comprising the fidelity of the system model, the integrity of synchronizing data, and the resilience of the analytical engine against sophisticated evasion. This paper introduces the ARC framework, a method for achieving analytical resilience through an autonomous, closed-loop hardening process. ARC establishes a perpetual co-evolutionary arms race within the high-fidelity sandbox of a F-SCDT. A DRL agent, the "Red Agent," is formalized and incentivized to autonomously discover stealthy, physically-plausible attack paths that maximize process disruption while evading detection. Concurrently, an ensemble-based "Blue Agent" defender is continuously hardened via adversarial training against the evolving threats discovered by its adversary. This co-evolutionary dynamic forces both agents to become progressively more sophisticated, enabling the system to autonomously probe and patch its own vulnerabilities. Experimental validation on both the TEP and the SWaT testbeds demonstrates the framework's superior performance. A comprehensive ablation study, supported by extensive visualizations including ROC curves and SHAP plots, reveals that the co-evolutionary process itself is responsible for a significant performance increase in detecting novel attacks. By integrating XAI to ensure operator trust and proposing a scalable F-ARC architecture, this work presents ARC not merely as an improvement, but as a necessary paradigm shift toward dynamic, self-improving security for the future of critical infrastructure.
comment: 17 pages, 2 figures, 4 equations, 2 algorithms, 4 tables, to be published in ISPACS Conference 2025, unabridged version
☆ First experimental demonstration of plasma shape control in a tokamak through Model Predictive Control
In this work, a Model Predictive Controller (MPC) is proposed to control the plasma shape in the Tokamak \`a Configuration Variable (TCV). The proposed controller relies on models obtained by coupling linearized plasma response models, derived from the \texttt{fge} code of the Matlab EQuilibrium toolbox (MEQ) suite, with a state-space description of the core TCV magnetic control system. It optimizes the reference signals fed to this inner control loop in order to achieve the desired plasma shape while also enforcing constraints on the plant outputs. To this end, a suitable Quadratic Programming (QP) problem is formulated and solved in real-time. The effectiveness of the proposed controller is illustrated through a combination of simulations and experimental results. To the best of our knowledge, this is the first time that a plasma shape control solution based on MPC has been experimentally tested on a real tokamak.
comment: 6 pages, accepted for CCTA2025
♻ ☆ Observability and Generalized Sensor Placement for Nonlinear Quality Models in Drinking Water Networks
This paper studies the problem of optimal placement of water quality (WQ) sensors in water distribution networks (WDNs), with a focus on chlorine transport, decay, and reaction models. Such models are traditionally used as suitable proxies for WQ. The literature on this topic is inveterate, but has a key limitation: it utilizes simplified single-species decay and reaction models that do not capture WQ transients for nonlinear, multi-species interactions. This results in sensor placements (SP) that do not account for nonlinear WQ dynamics. Furthermore, as WQ simulations are parameterized by hydraulic profiles and demand patterns, the placement of sensors are often hydraulics-dependent. This study produces a greedy algorithm that addresses the two aforementioned limitations. The algorithm is grounded in nonlinear dynamic systems and observability theory, and yields SPs that are submodular and robust to hydraulic changes. Case studies on benchmark water networks are provided. The key findings provide practical recommendations for WDN operators.
♻ ☆ Controller Adaptation via Learning Solutions of Contextual Bayesian Optimization RAL
In this work, we propose a framework for adapting the controller's parameters based on learning optimal solutions from contextual black-box optimization problems. We consider a class of control design problems for dynamical systems operating in different environments or conditions represented by contextual parameters. The overarching goal is to identify the controller parameters that maximize the controlled system's performance, given different realizations of the contextual parameters.We formulate a contextual Bayesian optimization problem in which the solution is actively learned using Gaussian processes to approximate the controller adaptation strategy. We demonstrate the efficacy of the proposed framework with a sim-to-real example. We learn the optimal weighting strategy of a model predictive control for connected and automated vehicles interacting with human-driven vehicles from simulations and then deploy it in a real-time experiment.
comment: final version to RAL
♻ ☆ A Passivity Analysis for Nonlinear Consensus on Balanced Digraphs
This work deals with the output consensus problem for multiagent systems over balanced digraphs by passivity analysis. As the standard diffusive coupling structure only models the undirected interconnection, we propose a general approach capable of processing directed coupling and performing passivity analysis. To mitigate the complexity arising from the nonlinearity and directed interconnections, we reformulate the output consensus problem as a convergence analysis on a submanifold. We provide passivity analysis and establish a sufficient condition based on passivity for achieving output agreement in multi-agent systems over balanced digraphs. The results are supported by a numerical example.
comment: 9 pages, 2 figures, accepted by ECC 2025, and selected to EJC
♻ ☆ Inference and Learning of Nonlinear LFR State-Space Models
Estimating the parameters of nonlinear block-oriented state-space models from input-output data typically involves solving a highly non-convex optimization problem, which is prone to poor local minima and slow convergence. This paper presents a computationally efficient initialization method for nonlinear linear fractional representation (NL-LFR) models using periodic data. By first inferring the latent signals and subsequently estimating the model parameters, the approach generates initial estimates for use in a later nonlinear optimization step. The proposed method shows robustness against poor local minima, and achieves a twofold error reduction compared to the state-of-the-art on a challenging benchmark dataset.
comment: Final, published paper in IEEE Xplore: https://ieeexplore.ieee.org/abstract/document/11037476/
♻ ☆ Nonparametric Steady-State Learning for Nonlinear Output Feedback Regulation
This article addresses the nonadaptive and robust output regulation problem of the general nonlinear output feedback system with error output. The global robust output regulation problem for a class of general output feedback nonlinear systems with an uncertain exosystem and high relative degree can be tackled by constructing a linear generic internal model, provided that a continuous nonlinear mapping exists. Leveraging the proposed nonadaptive framework facilitates the conversion of the nonlinear robust output regulation problem into a robust nonadaptive stabilization formulation for the augmented system endowed with Input-to-State Stable dynamics. This approach removes the need for constructing a specific Lyapunov function with positive semi-definite derivatives and avoids the common assumption of linear parameterization of the nonlinear system. The nonadaptive approach is extended by incorporating the nonparametric learning framework to ensure the feasibility of the nonlinear mapping, which can be tackled using a data-driven method. Moreover, the introduced nonparametric learning framework allows the controlled system to learn the dynamics of the steady-state input behaviour from the signal generated from the internal model with the output error as the feedback. As a result, the nonadaptive/nonparametric approach can be advantageous to guarantee the convergence of the estimation and tracking error even when the underlying controlled system dynamics are complex or poorly understood. The effectiveness of the theoretical results is illustrated for a benchmark example: a controlled duffing system and two practical examples: a continuously stirred tank reactor and a continuous bioreactor.
comment: 16 pages, 18 figures
♻ ☆ Age-of-information minimization under energy harvesting and non-stationary environment
This work focuses on minimizing the age of information for multiple energy harvesting sources that sample data and transmit it to a sink node. At each time, the central scheduler selects one of the sources to probe the quality of its channel to the sink node, and then the assessed channel quality is utilized to determine whether a source will sample and send the packet. For a single source case, we assume that the probed channel quality is known at each time instant, model the problem of AoI minimization as a Markov decision process, and prove the optimal sampling policy threshold structure. We then use this threshold structure and propose an AEC-SW-UCRL2 algorithm to handle unknown and time varying energy harvesting rate and channel statistics, motivated by the popular SWUCRL2 algorithm for non stationary reinforcement learning. This algorithm is applicable when an upper bound is available for the total variation of each of these quantities over a time horizon. Furthermore, in situations where these variation budgets are not accessible, we introduce the AEC-BORL algorithm, motivated by the well known BORL algorithm. For the multiple source case, we demonstrate that the AoI minimization problem can be formulated as a constrained MDP, which can be relaxed using a Lagrange multiplier and decoupled into sub problems across source nodes. We also derive Whittle index based source scheduling policy for probing and an optimal threshold policy for source sampling. We next leverage this Whittle index and threshold structure to develop the WIT-SW-UCRL2 algorithm for unknown time varying energy harvesting rates and channel statistics under their respective variation budgets. Moreover, we also proposed a Whittle index and threshold based bandit over reinforcement learning (WIT-BORL) algorithm for unknown variation budgets. Finally, we numerically demonstrate the efficacy of our algorithms.
comment: 15 pages, 4 figures
♻ ☆ Thermal Modelling of Battery Cells for Optimal Tab and Surface Cooling Control
Optimal cooling that minimises thermal gradients and the average temperature is essential for enhanced battery safety and health. This work presents a new modelling approach for battery cells of different shapes by integrating Chebyshev spectral-Galerkin method and model component decomposition. As a result, a library of reduced-order computationally efficient battery thermal models is obtained, characterised by different numbers of states. These models are validated against a high-fidelity finite element model and are compared with a thermal equivalent circuit (TEC) model under real-world vehicle driving and battery cooling scenarios. Illustrative results demonstrate that the proposed model with four states can faithfully capture the two-dimensional thermal dynamics, while the model with only one state significantly outperforms the widely-used two-state TEC model in both accuracy and computational efficiency, reducing computation time by 28.7%. Furthermore, our developed models allow for independent control of tab and surface cooling channels, enabling effective thermal performance optimisation. Additionally, the proposed model's versatility and effectiveness are demonstrated through various applications, including the evaluation of different cooling scenarios, closed-loop temperature control, and cell design optimisation.
comment: 13 pages, 6 figures, 3 tables
♻ ☆ Nonlinear Online Optimization for Vehicle-Home-Grid Integration including Household Load Prediction and Battery Degradation
This paper investigates the economic impact of vehicle-home-grid integration, by proposing an online energy management algorithm that optimizes energy flows between an electric vehicle (EV), a household, and the electrical grid. The algorithm leverages vehicle-to-home (V2H) for self-consumption and vehicle-to-grid (V2G) for energy trading, adapting to real-time conditions through a hybrid long short-term memory (LSTM) neural network for accurate household load prediction, alongside a comprehensive nonlinear battery degradation model accounting for both cycle and calendar aging. Simulation results reveal significant economic advantages: compared to smart unidirectional charging, the proposed method yields an annual economic benefit of up to EUR 3046.81, despite a modest 1.96% increase in battery degradation. Even under unfavorable market conditions, where V2G energy selling generates no revenue, V2H alone ensures yearly savings of EUR 425.48. A systematic sensitivity analysis investigates how variations in battery capacity, household load, and price ratios affect economic outcomes, confirming the consistent benefits of bidirectional energy exchange. These findings highlight the potential of EVs as active energy nodes, enabling sustainable energy management and cost-effective battery usage in real-world conditions.
comment: Submitted to the 2025 IEEE Conference on Decision and Control (CDC)
♻ ☆ Multi-Class Stackelberg Games for the Co-Design of Networked Systems
We investigate a co-design problem, encompassing simultaneous design of system infrastructure and control, through a game-theoretical framework. To this end, we propose the co-design problem as a two-layer hierarchical strategic interaction. At the upper layer, a leader (or multiple leaders) determines system design parameters, while at the lower layer, a follower (or multiple followers) optimizes the control strategy. To capture this hierarchy, we propose four novel classes of Stackelberg games that integrate diverse strategic behaviors, including combinations of cooperative and non-cooperative interactions across two different layers. Notably, the leaders' interactions are represented using a normal-form game, whereas the followers' interactions are modeled by different games (dynamic games in discrete time). These distinct game structures result in a Stackelberg game that accommodates different game types per layer, and/or supports heterogeneous strategic behaviors involving cooperation and non-cooperation simultaneously. Learning algorithms using the best-response dynamics are used to solve the game problems when considering a discrete strategic space for the leaders. The efficacy of the proposed approach is demonstrated through an application to the co-design of the Barcelona drinking water network.
♻ ☆ Operator Learning for Robust Stabilization of Linear Markov-Jumping Hyperbolic PDEs
In this paper, we address the problem of robust stabilization for linear hyperbolic Partial Differential Equations (PDEs) with Markov-jumping parameter uncertainty. We consider a 2 x 2 heterogeneous hyperbolic PDE and propose a control law using operator learning and the backstepping method. Specifically, the backstepping kernels used to construct the control law are approximated with neural operators (NO) in order to improve computational efficiency. The key challenge lies in deriving the stability conditions with respect to the Markov-jumping parameter uncertainty and NO approximation errors. The mean-square exponential stability of the stochastic system is achieved through Lyapunov analysis, indicating that the system can be stabilized if the random parameters are sufficiently close to the nominal parameters on average, and NO approximation errors are small enough. The theoretical results are applied to freeway traffic control under stochastic upstream demands and then validated through numerical simulations.
♻ ☆ Aggregative games with bilevel structures: Distributed algorithms and convergence analysis
In this paper, the problem of distributively seeking the equilibria of aggregative games with bilevel structures is studied. Different from the traditional aggregative games, here the aggregation is determined by the minimizer of a virtual leader's objective function in the inner level, which depends on the actions of the players in the outer level. Moreover, the global objective function of the virtual leader is formed by the sum of some local functions with two arguments, each of which is strongly convex with respect to the second argument. When making decisions, each player in the outer level only has access to a local part of the virtual leader's objective function. To handle this problem, first, we propose a second order gradient-based distributed algorithm, where the Hessian matrices associated with the objective functions of the leader are involved. By the algorithm, players update their actions while cooperatively minimizing the objective function of the virtual leader to estimate the aggregation by communicating with their neighbors via a connected graph. Under mild assumptions on the graph and cost functions, we prove that the actions of players asymptotically converge to the Nash equilibrium point. Then, for the case where the Hessian matrices associated with the objective functions of the virtual leader are not available, we propose a first order gradient-based distributed algorithm, where a distributed two-point estimate strategy is developed to estimate the gradients of players' cost functions in the outer level. Under the same conditions, we prove that the convergence errors of players' actions to the Nash equilibrium point are linear with respect to the estimate parameters. Finally, simulations are provided to demonstrate the effectiveness of our theoretical results.
Optimization and Control 34
☆ Robust and Flexible Microtransit Design: Chance-Constrained Dial-a-Ride Problem with Soft Time Windows
Microtransit offers a promising blend of rideshare flexibility and public transit efficiency. In practice, it faces unanticipated but spatially aligned requests, passengers seeking to join ongoing schedules, leading to underutilized capacity and degraded service if not properly managed. At the same time, it must accommodate diverse passenger needs, from routine errands to time-sensitive trips such as medical appointments. To meet these expectations, incorporating time flexibility is essential. However, existing models seldom consider both spontaneous and heterogeneous demand, limiting their real-world applicability. We propose a robust and flexible microtransit framework that integrates time flexibility and demand uncertainty via a Chance-Constrained Dial-A-Ride Problem with Soft Time Windows (CCDARP-STW). Demand uncertainty is captured through nonlinear chance constraints with controllable violation probabilities, while time flexibility is modeled with soft time windows and penalized cost. We develop a bounded-support relaxation using limited statistical information to linearize the chance constraints and solve the model using a tailored Branch-and-Cut-and-Price (BCP) algorithm with a probabilistic dominance rule. This rule improves computational efficiency, reducing explored labels by 17.40% and CPU time by 22.27% in robust cases. A case study based on real-world Chicago data shows our framework yields 11.55 minutes and 11.13 miles of savings versus conventional microtransit, and achieves the highest service reliability (96.46%) among robust models.
comment: 27 pages, 10 figures, arXiv:2402.01265 [math.OC] (v1, Feb 2 2024); Plan to submit to Journal
☆ Control and optimization for Neural Partial Differential Equations in Supervised Learning
Although there is a substantial body of literature on control and optimization problems for parabolic and hyperbolic systems, the specific problem of controlling and optimizing the coefficients of the associated operators within such systems has not yet been thoroughly explored. In this work, we aim to initiate a line of research in control theory focused on optimizing and controlling the coefficients of these operators-a problem that naturally arises in the context of neural networks and supervised learning. In supervised learning, the primary objective is to transport initial data toward target data through the layers of a neural network. We propose a novel perspective: neural networks can be interpreted as partial differential equations (PDEs). From this viewpoint, the control problem traditionally studied in the context of ordinary differential equations (ODEs) is reformulated as a control problem for PDEs, specifically targeting the optimization and control of coefficients in parabolic and hyperbolic operators. To the best of our knowledge, this specific problem has not yet been systematically addressed in the control theory of PDEs. To this end, we propose a dual system formulation for the control and optimization problem associated with parabolic PDEs, laying the groundwork for the development of efficient numerical schemes in future research. We also provide a theoretical proof showing that the control and optimization problem for parabolic PDEs admits minimizers. Finally, we investigate the control problem associated with hyperbolic PDEs and prove the existence of solutions for a corresponding approximated control problem.
☆ A High-Dimensional Statistical Theory for Convex and Nonconvex Matrix Sensing
The problem of matrix sensing, or trace regression, is a problem wherein one wishes to estimate a low-rank matrix from linear measurements perturbed with noise. A number of existing works have studied both convex and nonconvex approaches to this problem, establishing minimax error rates when the number of measurements is sufficiently large relative to the rank and dimension of the low-rank matrix, though a precise comparison of these procedures still remains unexplored. In this work we provide a high-dimensional statistical analysis for symmetric low-rank matrix sensing observed under Gaussian measurements and noise. Our main result describes a novel phenomenon: in this statistical model and in an appropriate asymptotic regime, the behavior of any local minimum of the nonconvex factorized approach (with known rank) is approximately equivalent to that of the matrix hard-thresholding of a corresponding matrix denoising problem, and the behavior of the convex nuclear-norm regularized least squares approach is approximately equivalent to that of matrix soft-thresholding of the same matrix denoising problem. Here "approximately equivalent" is understood in the sense of concentration of Lipchitz functions. As a consequence, the nonconvex procedure uniformly dominates the convex approach in mean squared error. Our arguments are based on a matrix operator generalization of the Convex Gaussian Min-Max Theorem (CGMT) together with studying the interplay between local minima of the convex and nonconvex formulations and their "debiased" counterparts, and several of these results may be of independent interest.
☆ First-order methods for stochastic and finite-sum convex optimization with deterministic constraints
In this paper, we study a class of stochastic and finite-sum convex optimization problems with deterministic constraints. Existing methods typically aim to find an $\epsilon$-$expectedly\ feasible\ stochastic\ optimal$ solution, in which the expected constraint violation and expected optimality gap are both within a prescribed tolerance $\epsilon$. However, in many practical applications, constraints must be nearly satisfied with certainty, rendering such solutions potentially unsuitable due to the risk of substantial violations. To address this issue, we propose stochastic first-order methods for finding an $\epsilon$-$surely\ feasible\ stochastic\ optimal$ ($\epsilon$-SFSO) solution, where the constraint violation is deterministically bounded by $\epsilon$ and the expected optimality gap is at most $\epsilon$. Our methods apply an accelerated stochastic gradient (ASG) scheme or a modified variance-reduced ASG scheme $only\ once$ to a sequence of quadratic penalty subproblems with appropriately chosen penalty parameters. We establish first-order oracle complexity bounds for the proposed methods in computing an $\epsilon$-SFSO solution. As a byproduct, we also derive first-order oracle complexity results for sample average approximation method in computing an $\epsilon$-SFSO solution of the stochastic optimization problem using our proposed methods to solve the sample average problem.
comment: 41 pages
☆ On the convergence of critical points on real algebraic sets and applications to optimization
Let $F \in \R[X_1,\ldots,X_n]$ and the zero set $V=\zero(\mathcal{P},\R^n)$, where $\mathcal{P}:=\{P_1,\ldots,P_s\} \subset \R[X_1,\ldots,X_n]$ is a finite set of polynomials. We investigate existence of critical points of $F$ on an infinitesimal perturbation $V_{\xi} = \zero(\{P_1-\xi_1.\ldots,P_s-\xi_s\},\R^n)$. Our motivation is to understand the limiting behavior of central paths in polynomial optimization, which is an important problem in the theory of interior point methods for polynomial optimization. We establish different sets of conditions that ensure existence, finiteness, boundedness, and non-degeneracy of critical points of $F$ on $V_{\xi}$, respectively. These lead to new conditions for the existence, convergence, and smoothness of central paths of polynomial optimization and its extension to non-linear optimization problems involving definable sets and functions in an o-minimal structure. In particular, for nonlinear programs defined by real globally analytic functions, our extension provides a stronger form of the convergence result obtained by Drummond and Peterzil.
comment: 45 Pages, 4 figures
☆ A Zeroth-Order Extra-Gradient Method For Black-Box Constrained Optimization
Non-analytical objectives and constraints often arise in control systems, particularly in problems with complex dynamics, which are challenging yet lack efficient solution methods. In this work, we consider general constrained optimization problems involving black-box objectives and constraints. To solve it, we reformulate it as a min-max problem and propose a zeroth-order extra gradient (ZOEG) algorithm that combines the extra gradient method with a feedback-based stochastic zeroth-order gradient estimator. Then, we apply another coordinate gradient estimator to design the zeroth-order coordinate extra gradient algorithm (ZOCEG) to further improve efficiency. The theoretical analysis shows that ZOEG can achieve the best-known oracle complexity of $\mathcal{O}(d\epsilon^{-2})$ to get an $\epsilon$-optimal solution ($d$ is the dimension of decision space), and ZOCEG can improve it to $\mathcal{O}(d\epsilon^{-1})$. Furthermore, we develop a variant of ZOCEG, which applies block coordinate updates to enhance the efficiency of single-step gradient estimation. Finally, numerical experiments on a load tracking problem validate our theoretical results and the effectiveness of the proposed algorithms.
☆ Demonstration of effective UCB-based routing in skill-based queues on real-world data
This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and service systems. By means of a case study using a real-world data set, we investigate the practical implementation of a recently developed reinforcement learning algorithm for optimal customer routing. Our experiments show that the algorithm efficiently learns and adapts to changing environments and outperforms static benchmark policies, indicating its potential for live implementation. We also augment the real-world applicability of this algorithm by introducing a new heuristic routing rule to reduce delays. Moreover, we show that the algorithm can optimize for multiple objectives: next to payoff maximization, secondary objectives such as server load fairness and customer waiting time reduction can be incorporated. Tuning parameters are used for balancing inherent performance trade--offs. Lastly, we investigate the sensitivity to estimation errors and parameter tuning, providing valuable insights for implementing adaptive routing algorithms in complex real-world queueing systems.
☆ Conformal Rigidity and Spectral Embeddings of Graphs
We investigate the structure of conformally rigid graphs. Graphs are conformally rigid if introducing edge weights cannot increase (decrease) the second (last) eigenvalue of the Graph Laplacian. Edge-transitive graphs and distance-regular graphs are known to be conformally rigid. We establish new results using the connection between conformal rigidity and edge-isometric spectral embeddings of the graph. All $1$-walk regular graphs are conformally rigid, a consequence of a stronger property of their embeddings. Using symmetries of the graph, we establish two related characterizations of when a vertex-transitive graph is conformally rigid. This provides a necessary and sufficient condition for a Cayley graph on an abelian group to be conformally rigid. As an application we exhibit an infinite family of conformally rigid circulants. Our symmetry technique can be interpreted in the language of semidefinite programming which provides another criterion for conformal rigidity in terms of edge orbits. The paper also describes a number of explicit conformally rigid graphs whose conformal rigidity is not yet explained by the existing theory.
☆ Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery
Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.
☆ A Decomposition Method for Finite-Time Stabilization of Bilinear Systems with Applications to Parabolic and Hyperbolic Equations
In this work, we address the problem of finite-time stabilization for a class of bilinear system. We propose a decomposition-based approach in which the nominal system is split into two subsystems, one of which is inherently finite-time stable without control. This allows the stabilization analysis to focus solely on the remaining subsystem. To ensure the well-posedness of the closed-loop system, we establish sufficient conditions on the system and control operators. The stabilization results are then derived using a suitable Lyapunov function and an observation condition. The effectiveness of the proposed approach is demonstrated through examples involving both parabolic and hyperbolic infinite-dimensional systems.
☆ A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization
Despite its wide range of applications across various domains, the optimization foundations of deep matrix factorization (DMF) remain largely open. In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem. Toward this goal, we first provide a closed-form expression of all critical points. Building on this, we establish precise conditions under which a critical point is a local minimizer, a global minimizer, a strict saddle point, or a non-strict saddle point. Leveraging these results, we derive a necessary and sufficient condition under which each critical point is either a local minimizer or a strict saddle point. This provides insights into why gradient-based methods almost always converge to a local minimizer of the regularized DMF problem. Finally, we conduct numerical experiments to visualize its loss landscape under different settings to support our theory.
comment: 35 pages, 3 figures
☆ CLARSTA: A random subspace trust-region algorithm for convex-constrained derivative-free optimization
This paper proposes a random subspace trust-region algorithm for general convex-constrained derivative-free optimization (DFO) problems. Similar to previous random subspace DFO methods, the convergence of our algorithm requires a certain accuracy of models and a certain quality of subspaces. For model accuracy, we define a new class of models that is only required to provide reasonable accuracy on the projection of the constraint set onto the subspace. We provide a new geometry measure to make these models easy to analyze, construct, and manage. For subspace quality, we use the concentration on the Grassmann manifold to provide a method to sample subspaces that preserve the first-order criticality measure by a certain percentage with a certain probability lower bound. Based on all these new theoretical results, we present an almost-sure global convergence and a worst-case complexity analysis of our algorithm. Numerical experiments on problems with dimensions up to 10000 demonstrate the reliable performance of our algorithm in high dimensions.
☆ On gradient descent-ascent flows in metric spaces
Gradient descent-ascent (GDA) flows play a central role in finding saddle points of bivariate functionals, with applications in optimization, game theory, and robust control. While they are well-understood in Hilbert and Banach spaces via maximal monotone operator theory, their extension to general metric spaces, particularly Wasserstein spaces, has remained largely unexplored. In this paper, we develop a mathematical theory of GDA flows on the product of two complete metric spaces, formulating them as solutions to a system of evolution variational inequalities (EVIs) driven by a proper, closed functional $\phi$. Under mild convex-concave and regularity assumptions on $\phi$, we prove the existence, uniqueness, and stability of the flows via a novel minimizing-maximizing movement scheme and a minimax theorem on metric spaces. We establish a $\lambda$-contraction property, derive a quantitative error estimate for the discrete scheme, and demonstrate regularization effects analogous to classical gradient flows. Moreover, we obtain an exponential decay bound for the Nikaid\^o--Isoda duality gap along the flow. Focusing on Wasserstein spaces over Hilbert spaces, we show the global existence in time and the exponential convergence of the Wasserstein GDA flow to the unique saddle point for strongly convex-concave functionals. Our framework unifies and extends existing analyses, offering a metric-geometric perspective on GDA dynamics in nonlinear and non-smooth settings.
comment: 34 pages, comments are wellcome!
☆ Speed-Aware Network Design: A Parametric Optimization Approach
Network design problems have been studied from the 1950s, as they can be used in a wide range of real-world applications, e.g., design of communication and transportation networks. In classical network design problems, the objective is to minimize the cost of routing the demand flow through a graph. In this paper, we introduce a generalized version of such a problem, where the objective is to tradeoff routing costs and delivery speed; we introduce the concept of speed-coverage, which is defined as the number of unique items that can be sent to destinations in less than 1-day. Speed-coverage is a function of both the network design and the inventory stored at origin nodes, e.g., an item can be delivered in 1-day if it is in-stock at an origin that can reach a destination within 24 hours. Modeling inventory is inherently complex, since inventory coverage is described by an integer function with a large number of points (exponential to the number of origin sites), each one to be evaluated using historical data. To bypass this complexity, we first leverage a parametric optimization approach, which converts the non-linear joint routing and speed-coverage optimization problem into an equivalent mixed-integer linear program. Then, we propose a sampling strategy to avoid evaluating all the points of the speed-coverage function. The proposed method is evaluated on a series of numerical tests with representative scenarios and network sizes. We show that when considering the routing costs and monetary gains resulting from speed-coverage, our approach outperforms the baseline by 8.36% on average.
☆ Fast entropy-regularized SDP relaxations for permutation synchronization
We introduce fast randomized algorithms for solving semidefinite programming (SDP) relaxations of the partial permutation synchronization (PPS) problem, a core task in multi-image matching with significant relevance to 3D reconstruction. Our methods build on recent advances in entropy-regularized semidefinite programming and are tailored to the unique structure of PPS, in which the unknowns are partial permutation matrices aligning sparse and noisy pairwise correspondences across images. We prove that entropy regularization resolves optimizer non-uniqueness in standard relaxations, and we develop a randomized solver with nearly optimal scaling in the number of observed correspondences. We also develop several rounding procedures for recovering combinatorial solutions from the implicitly represented primal solution variable, maintaining cycle consistency if desired without harming computational scaling. We demonstrate that our approach achieves state-of-the-art performance on synthetic and real-world datasets in terms of speed and accuracy. Our results highlight PPS as a paradigmatic setting in which entropy-regularized SDP admits both theoretical and practical advantages over traditional low-rank or spectral techniques.
☆ Instance Space Analysis for the Quadratic Assignment Problem
For any optimisation problem where diverse algorithmic approaches are available, the task of predicting algorithm performance and selecting the algorithm most likely to perform well on a given instance holds great practical interest. However, if our test instances do not adequately represent the potential problem space, we may be misled about the strengths and weaknesses of an algorithm. In this paper we consider algorithm prediction and selection for the Quadratic Assignment Problem (QAP). We identify and eliminate superficial differences between QAP instances which do not affect problem difficulty and propose several new features for quantifying the characteristics of a particular instance. We then apply Instance Space Analysis to compare the performance of evolutionary and ant colony-based algorithms. Using this analysis, we identify limitations of the existing instance libraries and generators which obscure the true performance profile of these algorithms. Finally, we propose new instance classes which expand the instance space and support new insights into the properties of the problem and algorithms.
comment: Number of pages: 46 (35 pages in main body, 11 pages in appendices); Number of figures: 15; For associated data, see https://matilda.unimelb.edu.au/matilda/problems/opt/qap#qap
♻ ☆ Next-token prediction capacity: general upper bounds and a lower bound for transformers
Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributions for has not been established. To fill this gap, we prove upper and lower bounds on this number, which are equal up to a multiplicative constant. We prove these bounds in the general setting where next-token distributions can be arbitrary as well as the empirical setting where they are calculated from a finite number of document sequences. Our lower bounds are for one-layer multi-head decoder-only transformers and our proofs highlight an important injectivity property satisfied by self-attention. Furthermore, we provide numerical evidence that the minimal number of parameters for memorization is sufficient for being able to train the model to the entropy lower bound.
comment: V3: added two examples, a remark, and a second experiment where only the FNN layers are trained
♻ ☆ Universal subgradient and proximal bundle methods for convex and strongly convex hybrid composite optimization
This paper develops two parameter-free methods for solving convex and strongly convex hybrid composite optimization problems, namely, a composite subgradient type method and a proximal bundle type method. Functional complexity bounds for the two methods are established in terms of the unknown strong convexity parameter. The two proposed methods are universal with respect to all problem parameters, including the strong convexity one, and require no knowledge of the optimal value. Moreover, in contrast to previous works, they do not restart nor use multiple threads.
comment: 23 pages
♻ ☆ Greedy Learning to Optimize with Convergence Guarantees
Learning to optimize is an approach that leverages training data to accelerate the solution of optimization problems. Many approaches use unrolling to parametrize the update step and learn optimal parameters. Although L2O has shown empirical advantages over classical optimization algorithms, memory restrictions often greatly limit the unroll length and learned algorithms usually do not provide convergence guarantees. In contrast, we introduce a novel method employing a greedy strategy that learns iteration-specific parameters by minimizing the function value at the next iteration. This enables training over significantly more iterations while maintaining constant device memory usage. We parameterize the update such that parameter learning is convex when the objective function is convex. In particular, we explore preconditioned gradient descent and an extension of Polyak's Heavy Ball Method with multiple parametrizations including a novel convolutional preconditioner. With our learned algorithms, convergence in the training set is proved even when the preconditioners are not necessarily symmetric nor positive definite. Convergence on a class of unseen functions is also obtained under certain assumptions, ensuring robust performance and generalization beyond the training data. We test our learned algorithms on two inverse problems, image deblurring and Computed Tomography, on which learned convolutional preconditioners demonstrate improved empirical performance over classical optimization algorithms such as Nesterov's Accelerated Gradient Method and the quasi-Newton method L-BFGS.
comment: 33 pages, 12 figures
♻ ☆ Dual certificates of primal cone membership
We discuss easily verifiable cone membership certificates, that is, certificates proving relations of the form $b\in K$ for convex cones $K$, that consist of vectors in the dual cone $K^*$. Vectors in the dual cone are usually associated with separating hyperplanes, and so they are interpreted as certificates of non-membership in the standard theory of duality. Complementing this, we present constructive certification schemes through which members of the dual cone can be interpreted as primal membership certificates. Every vector in the interior of $K$ is assigned a full-dimensional cone of certificates, making the numerical computation of rigorous certificates easy, provided that the dual cone has an efficiently computable logarithmically homogeneous self-concordant barrier. Of particular interest are cones that are low-dimensional linear images of much higher dimensional cones. In the context of optimization (as opposed to feasibility) problems, these certificates can be used to easily compute, using a closed-form formula, exact primal feasible solutions from suitable dual feasible solutions, with the guarantee that the closer the dual solutions are to optimality, the closer to optimality are the computed primal solutions, too. We demonstrate that the new certification scheme is applicable to virtually every tractable subcone of nonnegative polynomials commonly used in polynomial optimization (such as SOS, SONC, SAGE and SDSOS polynomials, among others), facilitating the computation of rigorous nonnegativity certificates using numerical algorithms.
comment: 24 pages; Clarified a notation and fixed minor typographical errors
♻ ☆ Approximations of Rockafellians, Lagrangians, and Dual Functions
Solutions of an optimization problem are sensitive to changes caused by approximations or parametric perturbations, especially in the nonconvex setting. This paper shows that solutions of substitute problems, constructed from Rockafellian functions, can be less sensitive to such changes. Unlike classical stability analysis focused on local changes around (local) minimizers, we employ epi-convergence to examine whether approximating or perturbed problems suitably approach an actual (unperturbed) problem globally. \redrevvv{We demonstrate that solutions derived from the Rockafellian-based substitute problems converge to solutions of the actual optimization problem under suitable conditions, providing a rigorous alternative to potentially unstable direct approximations.} We quantify the rates of convergence that often lead to Lipschitz-kind stability properties for the substitute problems.
♻ ☆ A Passivity Analysis for Nonlinear Consensus on Balanced Digraphs
This work deals with the output consensus problem for multiagent systems over balanced digraphs by passivity analysis. As the standard diffusive coupling structure only models the undirected interconnection, we propose a general approach capable of processing directed coupling and performing passivity analysis. To mitigate the complexity arising from the nonlinearity and directed interconnections, we reformulate the output consensus problem as a convergence analysis on a submanifold. We provide passivity analysis and establish a sufficient condition based on passivity for achieving output agreement in multi-agent systems over balanced digraphs. The results are supported by a numerical example.
comment: 9 pages, 2 figures, accepted by ECC 2025, and selected to EJC
♻ ☆ Vertex addition to a ball graph with application to reliability and area coverage in autonomous swarms
A unit ball graph consists of a set of vertices, labeled by points in Euclidean space, and edges joining all pairs of points within distance $1$. These geometric graphs are used to model a variety of spatial networks, including communication networks between agents in an autonomous swarm. In such an application, vertices and/or edges of the graph may not be perfectly reliable; an agent may experience failure or a communication link rendered inoperable. With the goal of designing robust swarm formations, or unit ball graphs with high reliability (probability of connectedness), in a preliminary conference paper we provided an algorithm with cubic time complexity to determine all possible changes to a unit ball graph by repositioning a single vertex. Using this algorithm and Monte Carlo simulations, one obtains an efficient method to modify a unit ball graph by moving a single vertex to a location which maximizes the reliability. Another important consideration in many swarm missions is area coverage, yet highly reliable ball graphs often contain clusters of vertices. Here, we generalize our previous algorithm to improve area coverage as well as reliability. Our algorithm determines a location to add or move a vertex within a unit ball graph which maximizes the reliability, under the constraint that no other vertices of the graph be within some fixed distance. We compare this method of obtaining graphs with high reliability and evenly distributed area coverage to another method which uses a modified Fruchterman-Reingold algorithm for ball graphs.
♻ ☆ Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls
In contextual optimization, a decision-maker leverages contextual information, often referred to as covariates, to better resolve uncertainty and make informed decisions. In this paper, we examine the challenges of contextual decision-making under covariate shift, a phenomenon where the distribution of covariates differs between the training and test environments. Such shifts can lead to inaccurate upstream estimations for test covariates that lie far from the training data, ultimately resulting in suboptimal downstream decisions. To tackle these challenges, we propose a novel approach called Intersection Wasserstein-balls DRO (IW-DRO), which integrates multiple estimation methods into the distributionally robust optimization (DRO) framework. At the core of our approach is an innovative ambiguity set defined as the intersection of two Wasserstein balls, with their centers constructed using appropriate nonparametric and parametric estimators. On the computational side, we reformulate the IW-DRO problem as a tractable convex program and develop an approximate algorithm tailored for large-scale problems to enhance computational efficiency. From a theoretical perspective, we demonstrate that IW-DRO achieves superior performance compared to single Wasserstein-ball DRO models. We further establish performance guarantees by analyzing the coverage of the intersection ambiguity set and the measure concentration of both estimators under the Wasserstein distance. Notably, we derive a finite-sample concentration result for the Nadaraya-Watson kernel estimator under covariate shift. The proposed IW-DRO framework offers practical value for decision-makers operating in uncertain environments affected by covariate shifts.
♻ ☆ Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization
We consider the minimization of a Lipschitz continuous and expectation-valued function defined as $f(\mathbf{x}) \triangleq \mathbb{E}[{\tilde f}(\mathbf{x}, \boldsymbol{\xi})]$, over a closed and convex set. Our focus lies on obtaining both asymptotics as well as rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense) via zeroth-order schemes. We adopt a smoothing-based approach reliant on minimizing $f_{\eta}$ where $f_{\eta}(\mathbf{x}) = \mathbb{E}_{\mathbf{u}}[f(\mathbf{x}+\eta \mathbf{u})]$, $\mathbf{u}$ is a random variable defined on a unit sphere, and $\eta > 0$. It has been observed that a stationary point of the $\eta$-smoothed problem is a $2\eta$-stationary point for the original problem in the Clarke sense. In such a setting, we develop two sets of schemes with promising empirical behavior. (I) We develop a smoothing-enabled variance-reduced zeroth-order gradient framework (VRG-ZO) and make two sets of contributions for the sequence generated by the proposed zeroth-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, allowing for making guarantees for $\eta$-Clarke stationary solutions of the original problem; (b) To compute an $\mathbf{x}$ that ensures that the expected norm of the residual of the $\eta$-smoothed problem is within $\epsilon$ requires no greater than $O(\eta^{-1} \epsilon^{-2})$ projection steps and $ O\left(\eta^{-2} \epsilon^{-4}\right)$ function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) reliant on a combination of randomized and Moreau smoothing; the corresponding iteration and sample complexities for this scheme are $ O\left(\eta^{-5}\epsilon^{-2}\right)$ and $ O\left(\eta^{-7}\epsilon^{-4}\right)$, respectively
♻ ☆ Discrete-Time Periodic Monotonicity Preserving Systems
Three nested classes of discrete-time linear time-invariant systems, which differ by the set of periodic signals that they leave invariant, are studied. The first class preserves the property of periodic monotonicity (period-wise unimodality). The second class is invariant to signals with at most two sign changes per period, and the third class results from the second by additionally requiring that periodic signals with zero sign-changes are mapped to the same kind. Tractable characterizations for each system class are derived by the use and extension of total positivity theory via geometric interpretations. Central to our results is the characterization of sequentially convex contours. Our characterizations also extend to the loop gain of Lur'e feedback systems as the considered signals sets are invariant under common static non-linearities, e.g., ideal relay, saturation, sigmoid function, quantizer, etc. In particular, our developments aim to form a base for signal-based fixed-point theorems towards the prediction of self-sustained oscillations. Our examples on relay feedback systems indicate how periodic monotonicity preservation give rise to useful insights towards this goal.
♻ ☆ Nonparametric Steady-State Learning for Nonlinear Output Feedback Regulation
This article addresses the nonadaptive and robust output regulation problem of the general nonlinear output feedback system with error output. The global robust output regulation problem for a class of general output feedback nonlinear systems with an uncertain exosystem and high relative degree can be tackled by constructing a linear generic internal model, provided that a continuous nonlinear mapping exists. Leveraging the proposed nonadaptive framework facilitates the conversion of the nonlinear robust output regulation problem into a robust nonadaptive stabilization formulation for the augmented system endowed with Input-to-State Stable dynamics. This approach removes the need for constructing a specific Lyapunov function with positive semi-definite derivatives and avoids the common assumption of linear parameterization of the nonlinear system. The nonadaptive approach is extended by incorporating the nonparametric learning framework to ensure the feasibility of the nonlinear mapping, which can be tackled using a data-driven method. Moreover, the introduced nonparametric learning framework allows the controlled system to learn the dynamics of the steady-state input behaviour from the signal generated from the internal model with the output error as the feedback. As a result, the nonadaptive/nonparametric approach can be advantageous to guarantee the convergence of the estimation and tracking error even when the underlying controlled system dynamics are complex or poorly understood. The effectiveness of the theoretical results is illustrated for a benchmark example: a controlled duffing system and two practical examples: a continuously stirred tank reactor and a continuous bioreactor.
comment: 16 pages, 18 figures
♻ ☆ Finite-time horizon, stopper vs. singular-controller games on the half-line
We prove existence of a value for two-player zero-sum stopper vs. singular-controller games on finite-time horizon, when the underlying dynamics is one-dimensional, diffusive and bound to evolve in $[0,\infty)$. We show that the value is the maximal solution of a variational inequality with both obstacle and gradient constraint and satisfying a Dirichlet boundary condition at $[0,T)\times\{0\}$. Moreover, we obtain an optimal strategy for the stopper. In order to achieve our goals, we rely on new probabilistic methods, yielding gradient bounds and equi-continuity for the solutions of penalised partial differential equations that approximate the variational inequality.
comment: 45 pages
♻ ☆ Joint Order Selection, Allocation, Batching and Picking for Large Scale Warehouses
Order picking is the single most cost-intensive activity in picker-to-parts warehouses, and as such has garnered large interest from the scientific community which led to multiple problem formulations and a plethora of algorithms published. Unfortunately, most of them are not applicable at the scale of really large warehouses like those operated by Zalando, a leading European online fashion retailer. Based on our experience in operating Zalando's batching system, we propose a novel batching problem formulation for mixed-shelves, large scale warehouses with zoning. It brings the selection of orders to be batched into the scope of the problem, making it more realistic while at the same time increasing the optimization potential. We present two baseline algorithms and compare them on a set of generated instances. Our results show that first, even a basic greedy algorithm requires significant runtime to solve real-world instances and second, including order selection in the studied problem shows large potential for improved solution quality.
♻ ☆ Nonlinear Online Optimization for Vehicle-Home-Grid Integration including Household Load Prediction and Battery Degradation
This paper investigates the economic impact of vehicle-home-grid integration, by proposing an online energy management algorithm that optimizes energy flows between an electric vehicle (EV), a household, and the electrical grid. The algorithm leverages vehicle-to-home (V2H) for self-consumption and vehicle-to-grid (V2G) for energy trading, adapting to real-time conditions through a hybrid long short-term memory (LSTM) neural network for accurate household load prediction, alongside a comprehensive nonlinear battery degradation model accounting for both cycle and calendar aging. Simulation results reveal significant economic advantages: compared to smart unidirectional charging, the proposed method yields an annual economic benefit of up to EUR 3046.81, despite a modest 1.96% increase in battery degradation. Even under unfavorable market conditions, where V2G energy selling generates no revenue, V2H alone ensures yearly savings of EUR 425.48. A systematic sensitivity analysis investigates how variations in battery capacity, household load, and price ratios affect economic outcomes, confirming the consistent benefits of bidirectional energy exchange. These findings highlight the potential of EVs as active energy nodes, enabling sustainable energy management and cost-effective battery usage in real-world conditions.
comment: Submitted to the 2025 IEEE Conference on Decision and Control (CDC)
♻ ☆ Efficient Saddle Point Escape in High Dimensions via Adaptive Perturbation and Subspace Descent
We investigate high-dimensional non-convex optimization, focusing on the algorithmic difficulties posed by saddle points and regions of flat curvature. We develop a unified framework that integrates stochastic perturbations, curvature-adaptive learning rates, and randomized subspace descent to improve escape efficiency and scalability. Our theoretical analysis shows that gradient flow almost surely avoids strict saddles, with escape likelihood increasing exponentially in the ambient dimension. For noise-perturbed gradient descent, we derive explicit escape-time bounds that depend on curvature and noise magnitude. Adaptive step sizes further reduce escape time by adjusting to local gradient variance. To improve scalability, we establish global convergence rates for randomized subspace descent using projections of logarithmic dimension that preserve descent direction with high probability. Numerical experiments on nonlinear and constrained objectives validate these results and demonstrate practical robustness in large-scale settings.
♻ ☆ Multi-Class Stackelberg Games for the Co-Design of Networked Systems
We investigate a co-design problem, encompassing simultaneous design of system infrastructure and control, through a game-theoretical framework. To this end, we propose the co-design problem as a two-layer hierarchical strategic interaction. At the upper layer, a leader (or multiple leaders) determines system design parameters, while at the lower layer, a follower (or multiple followers) optimizes the control strategy. To capture this hierarchy, we propose four novel classes of Stackelberg games that integrate diverse strategic behaviors, including combinations of cooperative and non-cooperative interactions across two different layers. Notably, the leaders' interactions are represented using a normal-form game, whereas the followers' interactions are modeled by different games (dynamic games in discrete time). These distinct game structures result in a Stackelberg game that accommodates different game types per layer, and/or supports heterogeneous strategic behaviors involving cooperation and non-cooperation simultaneously. Learning algorithms using the best-response dynamics are used to solve the game problems when considering a discrete strategic space for the leaders. The efficacy of the proposed approach is demonstrated through an application to the co-design of the Barcelona drinking water network.
♻ ☆ Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes
Active learning methods aim to improve sample complexity in machine learning. In this work, we investigate an active learning scheme via a novel gradient-free cutting-plane training method for ReLU networks of arbitrary depth and develop a convergence theory. We demonstrate, for the first time, that cutting-plane algorithms, traditionally used in linear models, can be extended to deep neural networks despite their nonconvexity and nonlinear decision boundaries. Moreover, this training method induces the first deep active learning scheme known to achieve convergence guarantees, revealing a geometric contraction rate of the feasible set. We exemplify the effectiveness of our proposed active learning method against popular deep active learning baselines via both synthetic data experiments and sentimental classification task on real datasets.
♻ ☆ A Parallelizable Quaternion Higher-Order Singular Value Decomposition with Applications
Higher-order singular value decomposition (HOSVD) is a celebrated tool for tensor data analysis. The sequential HOSVD was recently generalized to the quaternion domain, while a naive quaternion extension of the classical HOSVD% by De Lathauwer et al., which can be excecuted in parallel, incurs issues. To leverage the power of parallel computing, this work introduces a two-sided quaternion HOSVD (TS-QHOSVD) that can be parallelized on two processors. It is proved that TS-QHOSVD (i) preserves the HOSVD ordering property, (ii) inherits the orthogonality property at the first and the last modes, and (iii) satisfies the weak orthogonality at all modes. The truncated TS-QHOSVD is then developed, with its error bound being established. We apply the proposed model on color video denoising as well as scientific data compression arising from 3D Navier-Stokes equation and Lorentz system to demonstrate its efficacy.
comment: replace the previous version
Robotics 48
☆ Robust Robotic Exploration and Mapping Using Generative Occupancy Map Synthesis
We present a novel approach for enhancing robotic exploration by using generative occupancy mapping. We introduce SceneSense, a diffusion model designed and trained for predicting 3D occupancy maps given partial observations. Our proposed approach probabilistically fuses these predictions into a running occupancy map in real-time, resulting in significant improvements in map quality and traversability. We implement SceneSense onboard a quadruped robot and validate its performance with real-world experiments to demonstrate the effectiveness of the model. In these experiments, we show that occupancy maps enhanced with SceneSense predictions better represent our fully observed ground truth data (24.44% FID improvement around the robot and 75.59% improvement at range). We additionally show that integrating SceneSense-enhanced maps into our robotic exploration stack as a "drop-in" map improvement, utilizing an existing off-the-shelf planner, results in improvements in robustness and traversability time. Finally we show results of full exploration evaluations with our proposed system in two dissimilar environments and find that locally enhanced maps provide more consistent exploration results than maps constructed only from direct sensor measurements.
comment: arXiv admin note: text overlap with arXiv:2409.10681
☆ Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion
We propose a novel hierarchical reinforcement learning framework for quadruped locomotion over challenging terrain. Our approach incorporates a two-layer hierarchy in which a high-level policy (HLP) selects optimal goals for a low-level policy (LLP). The LLP is trained using an on-policy actor-critic RL algorithm and is given footstep placements as goals. We propose an HLP that does not require any additional training or environment samples and instead operates via an online optimization process over the learned value function of the LLP. We demonstrate the benefits of this framework by comparing it with an end-to-end reinforcement learning (RL) approach. We observe improvements in its ability to achieve higher rewards with fewer collisions across an array of different terrains, including terrains more difficult than any encountered during training.
☆ Robust Embodied Self-Identification of Morphology in Damaged Multi-Legged Robots
Multi-legged robots (MLRs) are vulnerable to leg damage during complex missions, which can impair their performance. This paper presents a self-modeling and damage identification algorithm that enables autonomous adaptation to partial or complete leg loss using only data from a low-cost IMU. A novel FFT-based filter is introduced to address time-inconsistent signals, improving damage detection by comparing body orientation between the robot and its model. The proposed method identifies damaged legs and updates the robot's model for integration into its control system. Experiments on uneven terrain validate its robustness and computational efficiency.
☆ Evolutionary Gait Reconfiguration in Damaged Legged Robots
Multi-legged robots deployed in complex missions are susceptible to physical damage in their legs, impairing task performance and potentially compromising mission success. This letter presents a rapid, training-free damage recovery algorithm for legged robots subject to partial or complete loss of functional legs. The proposed method first stabilizes locomotion by generating a new gait sequence and subsequently optimally reconfigures leg gaits via a developed differential evolution algorithm to maximize forward progression while minimizing body rotation and lateral drift. The algorithm successfully restores locomotion in a 24-degree-of-freedom hexapod within one hour, demonstrating both high efficiency and robustness to structural damage.
☆ Unified Vision-Language-Action Model
Vision-language-action models (VLAs) have garnered significant attention for their potential in advancing robotic manipulation. However, previous approaches predominantly rely on the general comprehension capabilities of vision-language models (VLMs) to generate action signals, often overlooking the rich temporal and causal structure embedded in visual observations. In this paper, we present UniVLA, a unified and native multimodal VLA model that autoregressively models vision, language, and action signals as discrete token sequences. This formulation enables flexible multimodal tasks learning, particularly from large-scale video data. By incorporating world modeling during post-training, UniVLA captures causal dynamics from videos, facilitating effective transfer to downstream policy learning--especially for long-horizon tasks. Our approach sets new state-of-the-art results across several widely used simulation benchmarks, including CALVIN, LIBERO, and Simplenv-Bridge, significantly surpassing previous methods. For example, UniVLA achieves 95.5% average success rate on LIBERO benchmark, surpassing pi0-FAST's 85.5%. We further demonstrate its broad applicability on real-world ALOHA manipulation and autonomous driving.
comment: technical report
☆ ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model
Multi-task robotic bimanual manipulation is becoming increasingly popular as it enables sophisticated tasks that require diverse dual-arm collaboration patterns. Compared to unimanual manipulation, bimanual tasks pose challenges to understanding the multi-body spatiotemporal dynamics. An existing method ManiGaussian pioneers encoding the spatiotemporal dynamics into the visual representation via Gaussian world model for single-arm settings, which ignores the interaction of multiple embodiments for dual-arm systems with significant performance drop. In this paper, we propose ManiGaussian++, an extension of ManiGaussian framework that improves multi-task bimanual manipulation by digesting multi-body scene dynamics through a hierarchical Gaussian world model. To be specific, we first generate task-oriented Gaussian Splatting from intermediate visual features, which aims to differentiate acting and stabilizing arms for multi-body spatiotemporal dynamics modeling. We then build a hierarchical Gaussian world model with the leader-follower architecture, where the multi-body spatiotemporal dynamics is mined for intermediate visual representation via future scene prediction. The leader predicts Gaussian Splatting deformation caused by motions of the stabilizing arm, through which the follower generates the physical consequences resulted from the movement of the acting arm. As a result, our method significantly outperforms the current state-of-the-art bimanual manipulation techniques by an improvement of 20.2% in 10 simulated tasks, and achieves 60% success rate on average in 9 challenging real-world tasks. Our code is available at https://github.com/April-Yz/ManiGaussian_Bimanual.
☆ Look to Locate: Vision-Based Multisensory Navigation with 3-D Digital Maps for GNSS-Challenged Environments
In Global Navigation Satellite System (GNSS)-denied environments such as indoor parking structures or dense urban canyons, achieving accurate and robust vehicle positioning remains a significant challenge. This paper proposes a cost-effective, vision-based multi-sensor navigation system that integrates monocular depth estimation, semantic filtering, and visual map registration (VMR) with 3-D digital maps. Extensive testing in real-world indoor and outdoor driving scenarios demonstrates the effectiveness of the proposed system, achieving sub-meter accuracy of 92% indoors and more than 80% outdoors, with consistent horizontal positioning and heading average root mean-square errors of approximately 0.98 m and 1.25 {\deg}, respectively. Compared to the baselines examined, the proposed solution significantly reduced drift and improved robustness under various conditions, achieving positioning accuracy improvements of approximately 88% on average. This work highlights the potential of cost-effective monocular vision systems combined with 3D maps for scalable, GNSS-independent navigation in land vehicles.
☆ CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Recent vision-language-action (VLA) models built on pretrained vision-language models (VLMs) have demonstrated strong generalization across manipulation tasks. However, they remain constrained by a single-frame observation paradigm and cannot fully benefit from the motion information offered by aggregated multi-frame historical observations, as the large vision-language backbone introduces substantial computational cost and inference latency. We propose CronusVLA, a unified framework that extends single-frame VLA models to the multi-frame paradigm through an efficient post-training stage. CronusVLA comprises three key components: (1) single-frame pretraining on large-scale embodied datasets with autoregressive action tokens prediction, which establishes an embodied vision-language foundation; (2) multi-frame encoding, adapting the prediction of vision-language backbones from discrete action tokens to motion features during post-training, and aggregating motion features from historical frames into a feature chunking; (3) cross-frame decoding, which maps the feature chunking to accurate actions via a shared decoder with cross-attention. By reducing redundant token computation and caching past motion features, CronusVLA achieves efficient inference. As an application of motion features, we further propose an action adaptation mechanism based on feature-action retrieval to improve model performance during finetuning. CronusVLA achieves state-of-the-art performance on SimplerEnv with 70.9% success rate, and 12.7% improvement over OpenVLA on LIBERO. Real-world Franka experiments also show the strong performance and robustness.
comment: 36 pages, 21 figures
☆ Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images
Fisheye cameras offer robots the ability to capture human movements across a wider field of view (FOV) than standard pinhole cameras, making them particularly useful for applications in human-robot interaction and automotive contexts. However, accurately detecting human poses in fisheye images is challenging due to the curved distortions inherent to fisheye optics. While various methods for undistorting fisheye images have been proposed, their effectiveness and limitations for poses that cover a wide FOV has not been systematically evaluated in the context of absolute human pose estimation from monocular fisheye images. To address this gap, we evaluate the impact of pinhole, equidistant and double sphere camera models, as well as cylindrical projection methods, on 3D human pose estimation accuracy. We find that in close-up scenarios, pinhole projection is inadequate, and the optimal projection method varies with the FOV covered by the human pose. The usage of advanced fisheye models like the double sphere model significantly enhances 3D human pose estimation accuracy. We propose a heuristic for selecting the appropriate projection model based on the detection bounding box to enhance prediction quality. Additionally, we introduce and evaluate on our novel dataset FISHnCHIPS, which features 3D human skeleton annotations in fisheye images, including images from unconventional angles, such as extreme close-ups, ground-mounted cameras, and wide-FOV poses, available at: https://www.vision.rwth-aachen.de/fishnchips
comment: Presented at IEEE International Conference on Robotics and Automation 2025
☆ Estimating Spatially-Dependent GPS Errors Using a Swarm of Robots
External factors, including urban canyons and adversarial interference, can lead to Global Positioning System (GPS) inaccuracies that vary as a function of the position in the environment. This study addresses the challenge of estimating a static, spatially-varying error function using a team of robots. We introduce a State Bias Estimation Algorithm (SBE) whose purpose is to estimate the GPS biases. The central idea is to use sensed estimates of the range and bearing to the other robots in the team to estimate changes in bias across the environment. A set of drones moves in a 2D environment, each sampling data from GPS, range, and bearing sensors. The biases calculated by the SBE at estimated positions are used to train a Gaussian Process Regression (GPR) model. We use a Sparse Gaussian process-based Informative Path Planning (IPP) algorithm that identifies high-value regions of the environment for data collection. The swarm plans paths that maximize information gain in each iteration, further refining their understanding of the environment's positional bias landscape. We evaluated SBE and IPP in simulation and compared the IPP methodology to an open-loop strategy.
comment: 6 pages, 7 figures, 2025 IEEE 21st International Conference on Automation Science and Engineering
☆ UniTac-NV: A Unified Tactile Representation For Non-Vision-Based Tactile Sensors IROS
Generalizable algorithms for tactile sensing remain underexplored, primarily due to the diversity of sensor modalities. Recently, many methods for cross-sensor transfer between optical (vision-based) tactile sensors have been investigated, yet little work focus on non-optical tactile sensors. To address this gap, we propose an encoder-decoder architecture to unify tactile data across non-vision-based sensors. By leveraging sensor-specific encoders, the framework creates a latent space that is sensor-agnostic, enabling cross-sensor data transfer with low errors and direct use in downstream applications. We leverage this network to unify tactile data from two commercial tactile sensors: the Xela uSkin uSPa 46 and the Contactile PapillArray. Both were mounted on a UR5e robotic arm, performing force-controlled pressing sequences against distinct object shapes (circular, square, and hexagonal prisms) and two materials (rigid PLA and flexible TPU). Another more complex unseen object was also included to investigate the model's generalization capabilities. We show that alignment in latent space can be implicitly learned from joint autoencoder training with matching contacts collected via different sensors. We further demonstrate the practical utility of our approach through contact geometry estimation, where downstream models trained on one sensor's latent representation can be directly applied to another without retraining.
comment: 7 pages, 8 figures. Accepted version to appear in: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
☆ A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
☆ Probabilistic modelling and safety assurance of an agriculture robot providing light-treatment
Continued adoption of agricultural robots postulates the farmer's trust in the reliability, robustness and safety of the new technology. This motivates our work on safety assurance of agricultural robots, particularly their ability to detect, track and avoid obstacles and humans. This paper considers a probabilistic modelling and risk analysis framework for use in the early development phases. Starting off with hazard identification and a risk assessment matrix, the behaviour of the mobile robot platform, sensor and perception system, and any humans present are captured using three state machines. An auto-generated probabilistic model is then solved and analysed using the probabilistic model checker PRISM. The result provides unique insight into fundamental development and engineering aspects by quantifying the effect of the risk mitigation actions and risk reduction associated with distinct design concepts. These include implications of adopting a higher performance and more expensive Object Detection System or opting for a more elaborate warning system to increase human awareness. Although this paper mainly focuses on the initial concept-development phase, the proposed safety assurance framework can also be used during implementation, and subsequent deployment and operation phases.
☆ Soft Robotic Delivery of Coiled Anchors for Cardiac Interventions
Trans-catheter cardiac intervention has become an increasingly available option for high-risk patients without the complications of open heart surgery. However, current catheterbased platforms suffer from a lack of dexterity, force application, and compliance required to perform complex intracardiac procedures. An exemplary task that would significantly ease minimally invasive intracardiac procedures is the implantation of anchor coils, which can be used to fix and implant various devices in the beating heart. We introduce a robotic platform capable of delivering anchor coils. We develop a kineto-statics model of the robotic platform and demonstrate low positional error. We leverage the passive compliance and high force output of the actuator in a multi-anchor delivery procedure against a motile in-vitro simulator with millimeter level accuracy.
comment: This work has been submitted to the IEEE for possible publication
☆ Robotics Under Construction: Challenges on Job Sites ICRA
As labor shortages and productivity stagnation increasingly challenge the construction industry, automation has become essential for sustainable infrastructure development. This paper presents an autonomous payload transportation system as an initial step toward fully unmanned construction sites. Our system, based on the CD110R-3 crawler carrier, integrates autonomous navigation, fleet management, and GNSS-based localization to facilitate material transport in construction site environments. While the current system does not yet incorporate dynamic environment adaptation algorithms, we have begun fundamental investigations into external-sensor based perception and mapping system. Preliminary results highlight the potential challenges, including navigation in evolving terrain, environmental perception under construction-specific conditions, and sensor placement optimization for improving autonomy and efficiency. Looking forward, we envision a construction ecosystem where collaborative autonomous agents dynamically adapt to site conditions, optimizing workflow and reducing human intervention. This paper provides foundational insights into the future of robotics-driven construction automation and identifies critical areas for further technological development.
comment: Workshop on Field Robotics, ICRA
☆ Adaptive Domain Modeling with Language Models: A Multi-Agent Approach to Task Planning
We introduce TAPAS (Task-based Adaptation and Planning using AgentS), a multi-agent framework that integrates Large Language Models (LLMs) with symbolic planning to solve complex tasks without the need for manually defined environment models. TAPAS employs specialized LLM-based agents that collaboratively generate and adapt domain models, initial states, and goal specifications as needed using structured tool-calling mechanisms. Through this tool-based interaction, downstream agents can request modifications from upstream agents, enabling adaptation to novel attributes and constraints without manual domain redefinition. A ReAct (Reason+Act)-style execution agent, coupled with natural language plan translation, bridges the gap between dynamically generated plans and real-world robot capabilities. TAPAS demonstrates strong performance in benchmark planning domains and in the VirtualHome simulated real-world environment.
☆ Fake or Real, Can Robots Tell? Evaluating Embodied Vision-Language Models on Real and 3D-Printed Objects
Robotic scene understanding increasingly relies on vision-language models (VLMs) to generate natural language descriptions of the environment. In this work, we present a comparative study of captioning strategies for tabletop scenes captured by a robotic arm equipped with an RGB camera. The robot collects images of objects from multiple viewpoints, and we evaluate several models that generate scene descriptions. We compare the performance of various captioning models, like BLIP and VLMs. Our experiments examine the trade-offs between single-view and multi-view captioning, and difference between recognising real-world and 3D printed objects. We quantitatively evaluate object identification accuracy, completeness, and naturalness of the generated captions. Results show that VLMs can be used in robotic settings where common objects need to be recognised, but fail to generalise to novel representations. Our findings provide practical insights into deploying foundation models for embodied agents in real-world settings.
☆ T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models NeurIPS 2025
Building a general robotic manipulation system capable of performing a wide variety of tasks in real-world settings is a challenging task. Vision-Language Models (VLMs) have demonstrated remarkable potential in robotic manipulation tasks, primarily due to the extensive world knowledge they gain from large-scale datasets. In this process, Spatial Representations (such as points representing object positions or vectors representing object orientations) act as a bridge between VLMs and real-world scene, effectively grounding the reasoning abilities of VLMs and applying them to specific task scenarios. However, existing VLM-based robotic approaches often adopt a fixed spatial representation extraction scheme for various tasks, resulting in insufficient representational capability or excessive extraction time. In this work, we introduce T-Rex, a Task-Adaptive Framework for Spatial Representation Extraction, which dynamically selects the most appropriate spatial representation extraction scheme for each entity based on specific task requirements. Our key insight is that task complexity determines the types and granularity of spatial representations, and Stronger representational capabilities are typically associated with Higher overall system operation costs. Through comprehensive experiments in real-world robotic environments, we show that our approach delivers significant advantages in spatial understanding, efficiency, and stability without additional training.
comment: submitted to NeurIPS 2025
Ground-Effect-Aware Modeling and Control for Multicopters
The ground effect on multicopters introduces several challenges, such as control errors caused by additional lift, oscillations that may occur during near-ground flight due to external torques, and the influence of ground airflow on models such as the rotor drag and the mixing matrix. This article collects and analyzes the dynamics data of near-ground multicopter flight through various methods, including force measurement platforms and real-world flights. For the first time, we summarize the mathematical model of the external torque of multicopters under ground effect. The influence of ground airflow on rotor drag and the mixing matrix is also verified through adequate experimentation and analysis. Through simplification and derivation, the differential flatness of the multicopter's dynamic model under ground effect is confirmed. To mitigate the influence of these disturbance models on control, we propose a control method that combines dynamic inverse and disturbance models, ensuring consistent control effectiveness at both high and low altitudes. In this method, the additional thrust and variations in rotor drag under ground effect are both considered and compensated through feedforward models. The leveling torque of ground effect can be equivalently represented as variations in the center of gravity and the moment of inertia. In this way, the leveling torque does not explicitly appear in the dynamic model. The final experimental results show that the method proposed in this paper reduces the control error (RMSE) by \textbf{45.3\%}. Please check the supplementary material at: https://github.com/ZJU-FAST-Lab/Ground-effect-controller.
☆ Is an object-centric representation beneficial for robotic manipulation ?
Object-centric representation (OCR) has recently become a subject of interest in the computer vision community for learning a structured representation of images and videos. It has been several times presented as a potential way to improve data-efficiency and generalization capabilities to learn an agent on downstream tasks. However, most existing work only evaluates such models on scene decomposition, without any notion of reasoning over the learned representation. Robotic manipulation tasks generally involve multi-object environments with potential inter-object interaction. We thus argue that they are a very interesting playground to really evaluate the potential of existing object-centric work. To do so, we create several robotic manipulation tasks in simulated environments involving multiple objects (several distractors, the robot, etc.) and a high-level of randomization (object positions, colors, shapes, background, initial positions, etc.). We then evaluate one classical object-centric method across several generalization scenarios and compare its results against several state-of-the-art hollistic representations. Our results exhibit that existing methods are prone to failure in difficult scenarios involving complex scene structures, whereas object-centric methods help overcome these challenges.
☆ A Survey on Soft Robot Adaptability: Implementations, Applications, and Prospects
Soft robots, compared to rigid robots, possess inherent advantages, including higher degrees of freedom, compliance, and enhanced safety, which have contributed to their increasing application across various fields. Among these benefits, adaptability is particularly noteworthy. In this paper, adaptability in soft robots is categorized into external and internal adaptability. External adaptability refers to the robot's ability to adjust, either passively or actively, to variations in environments, object properties, geometries, and task dynamics. Internal adaptability refers to the robot's ability to cope with internal variations, such as manufacturing tolerances or material aging, and to generalize control strategies across different robots. As the field of soft robotics continues to evolve, the significance of adaptability has become increasingly pronounced. In this review, we summarize various approaches to enhancing the adaptability of soft robots, including design, sensing, and control strategies. Additionally, we assess the impact of adaptability on applications such as surgery, wearable devices, locomotion, and manipulation. We also discuss the limitations of soft robotics adaptability and prospective directions for future research. By analyzing adaptability through the lenses of implementation, application, and challenges, this paper aims to provide a comprehensive understanding of this essential characteristic in soft robotics and its implications for diverse applications.
comment: 12 pages, 4 figures, accepted by IEEE Robotics & Automation Magazine
☆ Zero-Shot Parameter Learning of Robot Dynamics Using Bayesian Statistics and Prior Knowledge
Inertial parameter identification of industrial robots is an established process, but standard methods using Least Squares or Machine Learning do not consider prior information about the robot and require extensive measurements. Inspired by Bayesian statistics, this paper presents an identification method with improved generalization that incorporates prior knowledge and is able to learn with only a few or without additional measurements (Zero-Shot Learning). Furthermore, our method is able to correctly learn not only the inertial but also the mechanical and base parameters of the MABI Max 100 robot while ensuring physical feasibility and specifying the confidence intervals of the results. We also provide different types of priors for serial robots with 6 degrees of freedom, where datasheets or CAD models are not available.
comment: Carsten Reiners and Minh Trinh contributed equally to this work
☆ Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference
Inferring physical properties can significantly enhance robotic manipulation by enabling robots to handle objects safely and efficiently through adaptive grasping strategies. Previous approaches have typically relied on either tactile or visual data, limiting their ability to fully capture properties. We introduce a novel cross-modal perception framework that integrates visual observations with tactile representations within a multimodal vision-language model. Our physical reasoning framework, which employs a hierarchical feature alignment mechanism and a refined prompting strategy, enables our model to make property-specific predictions that strongly correlate with ground-truth measurements. Evaluated on 35 diverse objects, our approach outperforms existing baselines and demonstrates strong zero-shot generalization. Keywords: tactile perception, visual-tactile fusion, physical property inference, multimodal integration, robot perception
comment: This paper has been accepted by the 2025 International Conference on Climbing and Walking Robots (CLAWAR). These authors contributed equally to this work: Zexiang Guo, Hengxiang Chen, Xinheng Mai
☆ Da Yu: Towards USV-Based Image Captioning for Waterway Surveillance and Scene Understanding
Automated waterway environment perception is crucial for enabling unmanned surface vessels (USVs) to understand their surroundings and make informed decisions. Most existing waterway perception models primarily focus on instance-level object perception paradigms (e.g., detection, segmentation). However, due to the complexity of waterway environments, current perception datasets and models fail to achieve global semantic understanding of waterways, limiting large-scale monitoring and structured log generation. With the advancement of vision-language models (VLMs), we leverage image captioning to introduce WaterCaption, the first captioning dataset specifically designed for waterway environments. WaterCaption focuses on fine-grained, multi-region long-text descriptions, providing a new research direction for visual geo-understanding and spatial scene cognition. Exactly, it includes 20.2k image-text pair data with 1.8 million vocabulary size. Additionally, we propose Da Yu, an edge-deployable multi-modal large language model for USVs, where we propose a novel vision-to-language projector called Nano Transformer Adaptor (NTA). NTA effectively balances computational efficiency with the capacity for both global and fine-grained local modeling of visual features, thereby significantly enhancing the model's ability to generate long-form textual outputs. Da Yu achieves an optimal balance between performance and efficiency, surpassing state-of-the-art models on WaterCaption and several other captioning benchmarks.
comment: 14 pages, 13 figures
☆ AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration
While multi-vehicular collaborative driving demonstrates clear advantages over single-vehicle autonomy, traditional infrastructure-based V2X systems remain constrained by substantial deployment costs and the creation of "uncovered danger zones" in rural and suburban areas. We present AirV2X-Perception, a large-scale dataset that leverages Unmanned Aerial Vehicles (UAVs) as a flexible alternative or complement to fixed Road-Side Units (RSUs). Drones offer unique advantages over ground-based perception: complementary bird's-eye-views that reduce occlusions, dynamic positioning capabilities that enable hovering, patrolling, and escorting navigation rules, and significantly lower deployment costs compared to fixed infrastructure. Our dataset comprises 6.73 hours of drone-assisted driving scenarios across urban, suburban, and rural environments with varied weather and lighting conditions. The AirV2X-Perception dataset facilitates the development and standardized evaluation of Vehicle-to-Drone (V2D) algorithms, addressing a critical gap in the rapidly expanding field of aerial-assisted autonomous driving systems. The dataset and development kits are open-sourced at https://github.com/taco-group/AirV2X-Perception.
☆ Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control
The advancement of autonomous robotic systems has led to impressive capabilities in perception, localization, mapping, and control. Yet, a fundamental gap remains: existing frameworks excel at geometric reasoning and dynamic stability but fall short in representing and preserving relational semantics, contextual reasoning, and cognitive transparency essential for collaboration in dynamic, human-centric environments. This paper introduces a unified architecture comprising the Ontology Neural Network (ONN) and the Ontological Real-Time Semantic Fabric (ORTSF) to address this gap. The ONN formalizes relational semantic reasoning as a dynamic topological process. By embedding Forman-Ricci curvature, persistent homology, and semantic tensor structures within a unified loss formulation, ONN ensures that relational integrity and topological coherence are preserved as scenes evolve over time. The ORTSF transforms reasoning traces into actionable control commands while compensating for system delays. It integrates predictive and delay-aware operators that ensure phase margin preservation and continuity of control signals, even under significant latency conditions. Empirical studies demonstrate the ONN + ORTSF framework's ability to unify semantic cognition and robust control, providing a mathematically principled and practically viable solution for cognitive robotics.
comment: 12 pages, 5 figures, includes theoretical proofs and simulation results
Scaffolding Dexterous Manipulation with Vision-Language Models
Dexterous robotic hands are essential for performing complex manipulation tasks, yet remain difficult to train due to the challenges of demonstration collection and high-dimensional control. While reinforcement learning (RL) can alleviate the data bottleneck by generating experience in simulation, it typically relies on carefully designed, task-specific reward functions, which hinder scalability and generalization. Thus, contemporary works in dexterous manipulation have often bootstrapped from reference trajectories. These trajectories specify target hand poses that guide the exploration of RL policies and object poses that enable dense, task-agnostic rewards. However, sourcing suitable trajectories - particularly for dexterous hands - remains a significant challenge. Yet, the precise details in explicit reference trajectories are often unnecessary, as RL ultimately refines the motion. Our key insight is that modern vision-language models (VLMs) already encode the commonsense spatial and semantic knowledge needed to specify tasks and guide exploration effectively. Given a task description (e.g., "open the cabinet") and a visual scene, our method uses an off-the-shelf VLM to first identify task-relevant keypoints (e.g., handles, buttons) and then synthesize 3D trajectories for hand motion and object motion. Subsequently, we train a low-level residual RL policy in simulation to track these coarse trajectories or "scaffolds" with high fidelity. Across a number of simulated tasks involving articulated objects and semantic understanding, we demonstrate that our method is able to learn robust dexterous manipulation policies. Moreover, we showcase that our method transfers to real-world robotic hands without any human demonstrations or handcrafted rewards.
☆ Preserving Sense of Agency: User Preferences for Robot Autonomy and User Control across Household Tasks
Roboticists often design with the assumption that assistive robots should be fully autonomous. However, it remains unclear whether users prefer highly autonomous robots, as prior work in assistive robotics suggests otherwise. High robot autonomy can reduce the user's sense of agency, which represents feeling in control of one's environment. How much control do users, in fact, want over the actions of robots used for in-home assistance? We investigate how robot autonomy levels affect users' sense of agency and the autonomy level they prefer in contexts with varying risks. Our study asked participants to rate their sense of agency as robot users across four distinct autonomy levels and ranked their robot preferences with respect to various household tasks. Our findings revealed that participants' sense of agency was primarily influenced by two factors: (1) whether the robot acts autonomously, and (2) whether a third party is involved in the robot's programming or operation. Notably, an end-user programmed robot highly preserved users' sense of agency, even though it acts autonomously. However, in high-risk settings, e.g., preparing a snack for a child with allergies, they preferred robots that prioritized their control significantly more. Additional contextual factors, such as trust in a third party operator, also shaped their preferences.
comment: Accepted by the 2025 34th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
☆ The MOTIF Hand: A Robotic Hand for Multimodal Observations with Thermal, Inertial, and Force Sensors
Advancing dexterous manipulation with multi-fingered robotic hands requires rich sensory capabilities, while existing designs lack onboard thermal and torque sensing. In this work, we propose the MOTIF hand, a novel multimodal and versatile robotic hand that extends the LEAP hand by integrating: (i) dense tactile information across the fingers, (ii) a depth sensor, (iii) a thermal camera, (iv), IMU sensors, and (v) a visual sensor. The MOTIF hand is designed to be relatively low-cost (under 4000 USD) and easily reproducible. We validate our hand design through experiments that leverage its multimodal sensing for two representative tasks. First, we integrate thermal sensing into 3D reconstruction to guide temperature-aware, safe grasping. Second, we show how our hand can distinguish objects with identical appearance but different masses - a capability beyond methods that use vision only.
♻ ☆ COBRA-PPM: A Causal Bayesian Reasoning Architecture Using Probabilistic Programming for Robot Manipulation Under Uncertainty
Manipulation tasks require robots to reason about cause and effect when interacting with objects. Yet, many data-driven approaches lack causal semantics and thus only consider correlations. We introduce COBRA-PPM, a novel causal Bayesian reasoning architecture that combines causal Bayesian networks and probabilistic programming to perform interventional inference for robot manipulation under uncertainty. We demonstrate its capabilities through high-fidelity Gazebo-based experiments on an exemplar block stacking task, where it predicts manipulation outcomes with high accuracy (Pred Acc: 88.6%) and performs greedy next-best action selection with a 94.2% task success rate. We further demonstrate sim2real transfer on a domestic robot, showing effectiveness in handling real-world uncertainty from sensor noise and stochastic actions. Our generalised and extensible framework supports a wide range of manipulation scenarios and lays a foundation for future work at the intersection of robotics and causality.
comment: 8 pages, 7 figures, accepted to the 2025 IEEE European Conference on Mobile Robots (ECMR 2025)
♻ ☆ Toward Teach and Repeat Across Seasonal Deep Snow Accumulation
Teach and repeat is a rapid way to achieve autonomy in challenging terrain and off-road environments. A human operator pilots the vehicles to create a network of paths that are mapped and associated with odometry. Immediately after teaching, the system can drive autonomously within its tracks. This precision lets operators remain confident that the robot will follow a traversable route. However, this operational paradigm has rarely been explored in off-road environments that change significantly through seasonal variation. This paper presents preliminary field trials using lidar and radar implementations of teach and repeat. Using a subset of the data from the upcoming FoMo dataset, we attempted to repeat routes that were 4 days, 44 days, and 113 days old. Lidar teach and repeat demonstrated a stronger ability to localize when the ground points were removed. FMCW radar was often able to localize on older maps, but only with small deviations from the taught path. Additionally, we highlight specific cases where radar localization failed with recent maps due to the high pitch or roll of the vehicle. We highlight lessons learned during the field deployment and highlight areas to improve to achieve reliable teach and repeat with seasonal changes in the environment. Please follow the dataset at https://norlab-ulaval.github.io/FoMo-website for updates and information on the data release.
♻ ☆ Energy-Efficient Motion Planner for Legged Robots IROS 2025
We propose an online motion planner for legged robot locomotion with the primary objective of achieving energy efficiency. The conceptual idea is to leverage a placement set of footstep positions based on the robot's body position to determine when and how to execute steps. In particular, the proposed planner uses virtual placement sets beneath the hip joints of the legs and executes a step when the foot is outside of such placement set. Furthermore, we propose a parameter design framework that considers both energy-efficiency and robustness measures to optimize the gait by changing the shape of the placement set along with other parameters, such as step height and swing time, as a function of walking speed. We show that the planner produces trajectories that have a low Cost of Transport (CoT) and high robustness measure, and evaluate our approach against model-free Reinforcement Learning (RL) and motion imitation using biological dog motion priors as the reference. Overall, within low to medium velocity range, we show a 50.4% improvement in CoT and improved robustness over model-free RL, our best performing baseline. Finally, we show ability to handle slippery surfaces, gait transitions, and disturbances in simulation and hardware with the Unitree A1 robot.
comment: This paper has been accepted for publication at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025). 8 pages, 8 figures
♻ ☆ Robustness Assessment of Assemblies in Frictional Contact
This work establishes a solution to the problem of assessing the capacity of multi-object assemblies to withstand external forces without becoming unstable. Our physically-grounded approach handles arbitrary structures made from rigid objects of any shape and mass distribution without relying on heuristics or approximations. The result is a method that provides a foundation for autonomous robot decision-making when interacting with objects in frictional contact. Our strategy relies on a contact interface graph representation to reason about instabilities and makes use of object shape information to decouple sub-problems and improve efficiency. Our algorithm can be used by motion planners to produce safe assembly transportation plans, and by object placement planners to select better poses. Compared to prior work, our approach is more generally applicable than commonly used heuristics and more efficient than dynamics simulations.
comment: Submitted to IEEE Transactions on Automation Science and Engineering. Contains 14 pages, 16 figures, and 3 tables
♻ ☆ FusionForce: End-to-end Differentiable Neural-Symbolic Layer for Trajectory Prediction
We propose end-to-end differentiable model that predicts robot trajectories on rough offroad terrain from camera images and/or lidar point clouds. The model integrates a learnable component that predicts robot-terrain interaction forces with a neural-symbolic layer that enforces the laws of classical mechanics and consequently improves generalization on out-of-distribution data. The neural-symbolic layer includes a differentiable physics engine that computes the robot's trajectory by querying these forces at the points of contact with the terrain. As the proposed architecture comprises substantial geometrical and physics priors, the resulting model can also be seen as a learnable physics engine conditioned on real sensor data that delivers $10^4$ trajectories per second. We argue and empirically demonstrate that this architecture reduces the sim-to-real gap and mitigates out-of-distribution sensitivity. The differentiability, in conjunction with the rapid simulation speed, makes the model well-suited for various applications including model predictive control, trajectory shooting, supervised and reinforcement learning, or SLAM.
comment: Code: https://github.com/ctu-vras/fusionforce
♻ ☆ ros2 fanuc interface: Design and Evaluation of a Fanuc CRX Hardware Interface in ROS2
This paper introduces the ROS2 control and the Hardware Interface (HW) integration for the Fanuc CRX- robot family. It explains basic implementation details and communication protocols, and its integration with the Moveit2 motion planning library. We conducted a series of experiments to evaluate relevant performances in the robotics field. We tested the developed ros2_fanuc_interface for four relevant robotics cases: step response, trajectory tracking, collision avoidance integrated with Moveit2, and dynamic velocity scaling, respectively. Results show that, despite a non-negligible delay between command and feedback, the robot can track the defined path with negligible errors (if it complies with joint velocity limits), ensuring collision avoidance. Full code is open source and available at https://github.com/paolofrance/ros2_fanuc_interface.
♻ ☆ DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models ICRA
An inherent fragility of quadrotor systems stems from model inaccuracies and external disturbances. These factors hinder performance and compromise the stability of the system, making precise control challenging. Existing model-based approaches either make deterministic assumptions, utilize Gaussian-based representations of uncertainty, or rely on nominal models, all of which often fall short in capturing the complex, multimodal nature of real-world dynamics. This work introduces DroneDiffusion, a novel framework that leverages conditional diffusion models to learn quadrotor dynamics, formulated as a sequence generation task. DroneDiffusion achieves superior generalization to unseen, complex scenarios by capturing the temporal nature of uncertainties and mitigating error propagation. We integrate the learned dynamics with an adaptive controller for trajectory tracking with stability guarantees. Extensive experiments in both simulation and real-world flights demonstrate the robustness of the framework across a range of scenarios, including unfamiliar flight paths and varying payloads, velocities, and wind disturbances.
comment: Accepted to the International Conference on Robotics and Automation (ICRA) 2025
SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM IROS 2025
We propose SemGauss-SLAM, a dense semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment. By leveraging multi-frame semantic associations, this strategy enables joint optimization of 3D Gaussian representation and camera poses, resulting in low-drift tracking and accurate semantic mapping. Our SemGauss-SLAM demonstrates superior performance over existing radiance field-based SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in high-precision semantic segmentation and dense semantic mapping.
comment: IROS 2025
♻ ☆ Fully distributed and resilient source seeking for robot swarms
We propose a self-contained, resilient and fully distributed solution for locating the maximum of an unknown scalar field using a swarm of robots that travel at a constant speed. Unlike conventional reactive methods relying on gradient information, our methodology enables the swarm to determine an ascending direction so that it approaches the source with an arbitrary precision. Our source-seeking solution consists of three distributed algorithms running simultaneously in a slow-fast closed-loop system. The fastest algorithm provides the centroid-relative coordinates of the robots and the next slower one provides the ascending direction to be tracked. The tracking of the ascending direction by single integrators is instantaneous; howeverin this paper we will also focus on 2D unicycle-like robots with a constant speed. The third algorithm, the slowest one since the speed of the robots can be chosen arbitrarily slow, is the individual control law for the unicycle to track the estimated ascending direction.We will show that the three distributed algorithms converge exponentially fast to their objectives, allowing for a feasible slow-fast closed-loop system. The robots are not constrained to any particular geometric formation, and we study both discrete and continuous distributions of robots.The swarm shape analysis reveals the resiliency of our approach as expected in robot swarms, i.e., by amassing robots we ensure the source-seeking functionality in the event of missing or misplaced individuals or even if the robot network splits in two or more disconnected subnetworks.We exploit such an analysis so that the swarm can adapt to unknown environments by morphing its shape and maneuvering while still following an ascending direction. We analyze our solution with robots as kinematic points in n-dimensional Euclidean spaces and extend the analysis to 2D unicycle-like robots with constant speeds.
comment: 16 pages, submitted version to T-TAC. Jesus Bautista and Antonio Acuaviva contributed equally to this work. arXiv admin note: text overlap with arXiv:2309.02937
♻ ☆ AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making NeurIPS 2025
Vision-Language Models (VLMs) encode knowledge and reasoning capabilities for robotic manipulation within high-dimensional representation spaces. However, current approaches often project them into compressed intermediate representations, discarding important task-specific information such as fine-grained spatial or semantic details. To address this, we propose AntiGrounding, a new framework that reverses the instruction grounding process. It lifts candidate actions directly into the VLM representation space, renders trajectories from multiple views, and uses structured visual question answering for instruction-based decision making. This enables zero-shot synthesis of optimal closed-loop robot trajectories for new tasks. We also propose an offline policy refinement module that leverages past experience to enhance long-term performance. Experiments in both simulation and real-world environments show that our method outperforms baselines across diverse robotic manipulation tasks.
comment: submitted to NeurIPS 2025
ContactDexNet: Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping
The deep learning models has significantly advanced dexterous manipulation techniques for multi-fingered hand grasping. However, the contact information-guided grasping in cluttered environments remains largely underexplored. To address this gap, we have developed a method for generating multi-fingered hand grasp samples in cluttered settings through contact semantic map. We introduce a contact semantic conditional variational autoencoder network (CoSe-CVAE) for creating comprehensive contact semantic map from object point cloud. We utilize grasp detection method to estimate hand grasp poses from the contact semantic map. Finally, an unified grasp evaluation model PointNetGPD++ is designed to assess grasp quality and collision probability, substantially improving the reliability of identifying optimal grasps in cluttered scenarios. Our grasp generation method has demonstrated remarkable success, outperforming state-of-the-art methods by at least 4.65% with 81.0% average grasping success rate in real-world single-object environment and 75.3% grasping success rate in cluttered scenes. We also proposed the multi-modal multi-fingered grasping dataset generation method. Our multi-fingered hand grasping dataset outperforms previous datasets in scene diversity, modality diversity. The dataset, code and supplementary materials can be found at https://sites.google.com/view/contact-dexnet.
comment: 8 pages
♻ ☆ Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation
Navigating in environments alongside humans requires agents to reason under uncertainty and account for the beliefs and intentions of those around them. Under a sequential decision-making framework, egocentric navigation can naturally be represented as a Markov Decision Process (MDP). However, social navigation additionally requires reasoning about the hidden beliefs of others, inherently leading to a Partially Observable Markov Decision Process (POMDP), where agents lack direct access to others' mental states. Inspired by Theory of Mind and Epistemic Planning, we propose (1) a neuro-symbolic model-based reinforcement learning architecture for social navigation, addressing the challenge of belief tracking in partially observable environments; and (2) a perspective-shift operator for belief estimation, leveraging recent work on Influence-based Abstractions (IBA) in structured multi-agent settings.
comment: Accepted as a regular paper at the 2025 IEEE International Conference on Robot & Human Interactive Communication (RO-MAN). \c{opyright} 2025 IEEE. The final version will appear in IEEE Xplore (DOI TBD)
♻ ☆ Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
♻ ☆ Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams
The human-robot interaction (HRI) field has recognized the importance of enabling robots to interact with teams. Human teams rely on effective communication for successful collaboration in time-sensitive environments. Robots can play a role in enhancing team coordination through real-time assistance. Despite significant progress in human-robot teaming research, there remains an essential gap in how robots can effectively communicate with action teams using multimodal interaction cues in time-sensitive environments. This study addresses this knowledge gap in an experimental in-lab study to investigate how multimodal robot communication in action teams affects workload and human perception of robots. We explore team collaboration in a medical training scenario where a robotic crash cart (RCC) provides verbal and non-verbal cues to help users remember to perform iterative tasks and search for supplies. Our findings show that verbal cues for object search tasks and visual cues for task reminders reduce team workload and increase perceived ease of use and perceived usefulness more effectively than a robot with no feedback. Our work contributes to multimodal interaction research in the HRI field, highlighting the need for more human-robot teaming research to understand best practices for integrating collaborative robots in time-sensitive environments such as in hospitals, search and rescue, and manufacturing applications.
comment: This is the author's original submitted version of the paper accepted to the 2025 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). \c{opyright} 2025 IEEE. Personal use of this material is permitted. For any other use, please contact IEEE
♻ ☆ Human-Robot Teaming Field Deployments: A Comparison Between Verbal and Non-verbal Communication
Healthcare workers (HCWs) encounter challenges in hospitals, such as retrieving medical supplies quickly from crash carts, which could potentially result in medical errors and delays in patient care. Robotic crash carts (RCCs) have shown promise in assisting healthcare teams during medical tasks through guided object searches and task reminders. Limited exploration has been done to determine what communication modalities are most effective and least disruptive to patient care in real-world settings. To address this gap, we conducted a between-subjects experiment comparing the RCC's verbal and non-verbal communication of object search with a standard crash cart in resuscitation scenarios to understand the impact of robot communication on workload and attitudes toward using robots in the workplace. Our findings indicate that verbal communication significantly reduced mental demand and effort compared to visual cues and with a traditional crash cart. Although frustration levels were slightly higher during collaborations with the robot compared to a traditional cart, these research insights provide valuable implications for human-robot teamwork in high-stakes environments.
comment: This is the author's original submitted version of the paper accepted to the 2025 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). \c{opyright} 2025 IEEE. Personal use of this material is permitted. For any other use, please contact IEEE
♻ ☆ TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning
Developing scalable and generalizable reward engineering for reinforcement learning (RL) is crucial for creating general-purpose agents, especially in the challenging domain of robotic manipulation. While recent advances in reward engineering with Vision-Language Models (VLMs) have shown promise, their sparse reward nature significantly limits sample efficiency. This paper introduces TeViR, a novel method that leverages a pre-trained text-to-video diffusion model to generate dense rewards by comparing the predicted image sequence with current observations. Experimental results across 11 complex robotic tasks demonstrate that TeViR outperforms traditional methods leveraging sparse rewards and other state-of-the-art (SOTA) methods, achieving better sample efficiency and performance without ground truth environmental rewards. TeViR's ability to efficiently guide agents in complex environments highlights its potential to advance reinforcement learning applications in robotic manipulation.
♻ ☆ DynNPC: Finding More Violations Induced by ADS in Simulation Testing through Dynamic NPC Behavior Generation
Recently, a number of simulation testing approaches have been proposed to generate diverse driving scenarios for autonomous driving systems (ADSs) testing. However, the behaviors of NPC vehicles in these scenarios generated by previous approaches are predefined and mutated before simulation execution, ignoring traffic signals and the behaviors of the Ego vehicle. Thus, a large number of the violations they found are induced by unrealistic behaviors of NPC vehicles, revealing no bugs of ADSs. Besides, the vast scenario search space of NPC behaviors during the iterative mutations limits the efficiency of previous approaches. To address these limitations, we propose a novel scenario-based testing framework, DynNPC, to generate more violation scenarios induced by the ADS. Specifically, DynNPC allows NPC vehicles to dynamically generate behaviors using different driving strategies during simulation execution based on traffic signals and the real-time behavior of the Ego vehicle. We compare DynNPC with five state-of-the-art scenario-based testing approaches. Our evaluation has demonstrated the effectiveness and efficiency of DynNPC in finding more violation scenarios induced by the ADS.
♻ ☆ Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation IROS 2025
3D point cloud semantic segmentation (PCSS) is a cornerstone for environmental perception in robotic systems and autonomous driving, enabling precise scene understanding through point-wise classification. While unsupervised domain adaptation (UDA) mitigates label scarcity in PCSS, existing methods critically overlook the inherent vulnerability to real-world perturbations (e.g., snow, fog, rain) and adversarial distortions. This work first identifies two intrinsic limitations that undermine current PCSS-UDA robustness: (a) unsupervised features overlap from unaligned boundaries in shared-class regions and (b) feature structure erosion caused by domain-invariant learning that suppresses target-specific patterns. To address the proposed problems, we propose a tripartite framework consisting of: 1) a robustness evaluation model quantifying resilience against adversarial attack/corruption types through robustness metrics; 2) an invertible attention alignment module (IAAM) enabling bidirectional domain mapping while preserving discriminative structure via attention-guided overlap suppression; and 3) a contrastive memory bank with quality-aware contrastive learning that progressively refines pseudo-labels with feature quality for more discriminative representations. Extensive experiments on SynLiDAR-to-SemanticPOSS adaptation demonstrate a maximum mIoU improvement of 14.3\% under adversarial attack.
comment: This paper has been accepted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Learning Accurate Whole-body Throwing with High-frequency Residual Policy and Pullback Tube Acceleration IROS 2025
Throwing is a fundamental skill that enables robots to manipulate objects in ways that extend beyond the reach of their arms. We present a control framework that combines learning and model-based control for prehensile whole-body throwing with legged mobile manipulators. Our framework consists of three components: a nominal tracking policy for the end-effector, a high-frequency residual policy to enhance tracking accuracy, and an optimization-based module to improve end-effector acceleration control. The proposed controller achieved the average of 0.28 m landing error when throwing at targets located 6 m away. Furthermore, in a comparative study with university students, the system achieved a velocity tracking error of 0.398 m/s and a success rate of 56.8%, hitting small targets randomly placed at distances of 3-5 m while throwing at a specified speed of 6 m/s. In contrast, humans have a success rate of only 15.2%. This work provides an early demonstration of prehensile throwing with quantified accuracy on hardware, contributing to progress in dynamic whole-body manipulation.
comment: 8 pages, IROS 2025
Systems and Control 43
☆ Development of an Open-Source Spacecraft Bus for the PULSE-A CubeSat
The undergraduate-led Polarization-modUlated Laser Satellite Experiment (PULSE-A) at the University of Chicago seeks to demonstrate the feasibility of circular polarization shift keyed satellite-to-ground laser communication. PULSE-A's low-cost open-source bus serves as the backbone of the mission and has been designed in tandem with the Payload, with design driven by strict requirements for pointing accuracy, component alignment, power demand, and thermal stability. This work presents the design and testing of the PULSE-A bus. The spacecraft bus was designed to fill two major needs: (1) to meet the requirements of the PULSE-A mission, and (2) to be easily configurable for future missions that desire enhanced capabilities over other low-cost open-source designs. At its core, the bus features dual BeagleBone Black Industrial compute units, selected for their flight heritage, integrated via a PC/104 header standard. PULSE-A implements Goddard Space Flight Center's core Flight System (cFS), which takes a modular software architecture approach and is built in C. The use of C as the primary language aligns with the expertise of the University of Chicago's Computer Science department, allowing for ease of development by PULSE-A's undergraduate flight software team. The CubeSat structure utilizes Gran Systems' 3U frame, modified to accommodate openings for various ports and deployable components. Inside, the avionics stack uses the PC/104 standard quad rails, which terminate in PULSE-A's custom-designed Payload Box that houses all of the Payload components and optical fiber runs. This work also covers the techniques and iterative engineering processes used to develop the thermal control and dissipation mechanisms for the specific requirements, under volume, mass, and temperature-range constraints.
comment: Submitted to Advanced Technologies II at the 2025 SmallSat Conference, reference number SSC25-P1-42
☆ Recursive-ARX for Grid-Edge Fault Detection
Future electrical grids will require new ways to identify faults as inverters are not capable of supplying large fault currents to support existing fault detection methods and because distributed resources may feed faults from the edge of the grid. This paper proposes the use of real-time system identification for online power-system fault detection. Specifically, we implement Recursive ARX (rARX) system identification on a grid-connected inverter. Experiments demonstrate that the proposed rARX method is able to both detect large faults quickly, and distinguish between high-impedance faults and large load increases. These results indicate that rARX grid-edge fault detection is a promising research direction for improving the reliability and safety of modern electric grids.
comment: 6 pages, 9 figures, to be presented at IEEE PowerTech 2025 (Kiel, Germany)
☆ A Hybrid Intrusion Detection System with a New Approach to Protect the Cybersecurity of Cloud Computing
Cybersecurity is one of the foremost challenges facing the world of cloud computing. Recently, the widespread adoption of smart devices in cloud computing environments that provide Internet-based services has become prevalent. Therefore, it is essential to consider the security threats in these environments. The use of intrusion detection systems can mitigate the vulnerabilities of these systems. Furthermore, hybrid intrusion detection systems can provide better protection compared to conventional intrusion detection systems. These systems manage issues related to complexity, dimensionality, and performance. This research aims to propose a Hybrid Intrusion Detection System (HyIDS) that identifies and mitigates initial threats. The main innovation of this research is the introduction of a new method for hybrid intrusion detection systems (HyIDS). For this purpose, an Energy-Valley Optimizer (EVO) is used to select an optimal feature set, which is then classified using supervised machine learning models. The proposed approach is evaluated using the CIC_DDoS2019, CSE_CIC_DDoS2018, and NSL-KDD datasets. For evaluation and testing, the proposed system has been run for a total of 32 times. The results of the proposed approach are compared with the Grey Wolf Optimizer (GWO). With the CIC_DDoS2019 dataset, the D_TreeEVO model achieves an accuracy of 99.13% and a detection rate of 98.941%. Furthermore, this result reaches 99.78% for the CSE_CIC_DDoS2018 dataset. In comparison to NSL-KDD, it has an accuracy of 99.50% and a detection rate (DT) of 99.48%. For feature selection, EVO outperforms GWO. The results of this research indicate that EVO yields better results as an optimizer for HyIDS performance.
comment: 1. Acknowledgment for: Supervisor: Prof. Dr. Alireza Rouhi Advisor: Prof. Dr. Einollah Pira 2. Thesis of MSc. degree for Azarbaijan Shahid Madani University Faculty of Information Technology and Computer Engineering 3. Number of pages: 103 4. Number of Figures: 66
☆ Thermodynamic free energy map for the non-oxidative glycolysis pathways
Designing reaction pathways that maximize the production of a target compound in a given metabolic network is a fundamental problem in systems biology. In this study, we systematically explore the non-oxidative glycolysis metabolic network, guided by the principle that reactions with negative Gibbs free energy differences are thermodynamically favored. We enumerate alternative pathways that implement the net non-oxidative glycolysis reaction, categorized by their length. Our analysis reveals several alternative thermodynamically favorable pathways beyond those reported in experiments. In addition, we identify molecules within the network, such as 3-hydroxypropionic acid, that may have significant potential for further investigation.
comment: 10 figures, 6 tables
☆ Adversarial Observability and Performance Tradeoffs in Optimal Control
We develop a feedback controller that minimizes the observability of a set of adversarial sensors of a linear system, while adhering to strict closed-loop performance constraints. We quantify the effectiveness of adversarial sensors using the trace of their observability Gramian and its inverse, capturing both average observability and the least observable state directions of the system. We derive theoretical lower bounds on these metrics under performance constraints, characterizing the fundamental limits of observability reduction as a function of the performance tradeoff. Finally, we show that the performance-constrained optimization of the Gramian's trace can be formulated as a one-shot semidefinite program, while we address the optimization of its inverse through sequential semidefinite programming. Simulations on an aircraft show how the proposed scheme yields controllers that deteriorate adversarial observability while having near-optimal closed-loop performance.
comment: 8 pages, 3 Figures
☆ MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control
This paper presents a Model-Inspired Distributionally Robust Data-enabled Predictive Control (MDR-DeePC) framework for systems with partially known and uncertain dynamics. The proposed method integrates model-based equality constraints for known dynamics with a Hankel matrix-based representation of unknown dynamics. A distributionally robust optimization problem is formulated to account for parametric uncertainty and stochastic disturbances. Simulation results on a triple-mass-spring-damper system demonstrate improved disturbance rejection, reduced output oscillations, and lower control cost compared to standard DeePC. The results validate the robustness and effectiveness of MDR-DeePC, with potential for real-time implementation pending further benchmarking.
comment: Submitted to MECC 2025
☆ Decision-Focused Learning for Neural Network-Constrained Optimization: Application to HVAC Management System
Heating, Ventilation, and Air Conditioning (HVAC) is a major electricity end-use with a substantial potential for grid services such as demand response. Harnessing this flexibility requires accurate modeling of the thermal dynamics of buildings, which is challenging due to their nonlinear and repetitive behavior (e.g., daily pattern), which reduce the value of historical data. To address this issue, this paper presents an HVAC management system formulated as a Mixed Integer Quadratic Program (MIQP), where Neural Network (NN) models of thermal dynamics are embedded as exact mixed-integer linear constraints. We employ Decision-Focused Learning (DFL) which tunes the NN parameters to improve the HVAC performance rather than prediction metrics. However, the discrete nature of the MIQP poses challenges for this approach, as it leads to gradients that are undefined or discontinuous, thus impeding standard gradient-based training. Here, we employ Stochastic Smoothing (SS), which enables efficient gradient computation without the need to differentiate through the MIQP. Experiments on a realistic five-zone building using a high-fidelity building simulator demonstrate that the proposed SS-DFL approach outperforms conventional two-stage and relaxed DFL methods in both cost savings and grid service performance, highlighting its potential for scalable, grid-interactive building control.
comment: 11 pages; submitted to IEEE Transactions on Smart Grid; Code will be made public soon
☆ Estimating Spatially-Dependent GPS Errors Using a Swarm of Robots
External factors, including urban canyons and adversarial interference, can lead to Global Positioning System (GPS) inaccuracies that vary as a function of the position in the environment. This study addresses the challenge of estimating a static, spatially-varying error function using a team of robots. We introduce a State Bias Estimation Algorithm (SBE) whose purpose is to estimate the GPS biases. The central idea is to use sensed estimates of the range and bearing to the other robots in the team to estimate changes in bias across the environment. A set of drones moves in a 2D environment, each sampling data from GPS, range, and bearing sensors. The biases calculated by the SBE at estimated positions are used to train a Gaussian Process Regression (GPR) model. We use a Sparse Gaussian process-based Informative Path Planning (IPP) algorithm that identifies high-value regions of the environment for data collection. The swarm plans paths that maximize information gain in each iteration, further refining their understanding of the environment's positional bias landscape. We evaluated SBE and IPP in simulation and compared the IPP methodology to an open-loop strategy.
comment: 6 pages, 7 figures, 2025 IEEE 21st International Conference on Automation Science and Engineering
☆ Learning to Solve Parametric Mixed-Integer Optimal Control Problems via Differentiable Predictive Control
We propose a novel approach to solving input- and state-constrained parametric mixed-integer optimal control problems using Differentiable Predictive Control (DPC). Our approach follows the differentiable programming paradigm by learning an explicit neural policy that maps control parameters to integer- and continuous-valued decision variables. This policy is optimized via stochastic gradient descent by differentiating the quadratic model predictive control objective through the closed-loop finite-horizon response of the system dynamics. To handle integrality constraints, we incorporate three differentiable rounding strategies. The approach is evaluated on a conceptual thermal energy system, comparing its performance with the optimal solution for different lengths of the prediction horizon. The simulation results indicate that our self-supervised learning approach can achieve near-optimal control performance while significantly reducing inference time by avoiding online optimization, thus implying its potential for embedded deployment even on edge devices.
comment: 7 pages, 2 figures, 1 algorithm, 1 table
☆ Resilience assessment framework for cyber-physical distribution power system based on coordinated cyber-physical attacks under dynamic game
Owing to the advanced communication networks and intelligent electronic devices, the cyber-physical distribution systems (CPDSs) possess the capability to perform flexible economic dispatch and achieve rapid self-healing from extreme events. Meanwhile, the deep integration of cyber and physical systems makes CPDS vulnerable to coordinated cyber-physical attacks. In this paper, a resilience assessment framework for the CPDS under coordinated cyber-physical attacks is proposed to investigate the impact of the coordinated attacks on load loss and service restoration in CPDS. First, a three-stage defender-attacker-defender dynamic game model considering fake base station (FBS) and physical attacks for CPDS is established, aiming at seeking the optimal defense resource deployment strategy to enhance the resilience of the CPDS. The physical attack is launched to cause faults on the power lines, and the FBS attack is employed to interrupt the service of wireless cellular network to hinder the self-healing process of the CPDS. The lognormal shadowing model and search theory are applied to quantitatively describe the process of the coordinated cyber-physical attacks. Further, the constructed three-stage dynamic game model is equivalently recast as a tri-level max-min-max optimization model, which is solved using column-and-constraint generation combined with enumeration method. Finally, the effectiveness of the proposed resilience assessment framework and solution strategy is demonstrated by conducting simulation analysis on the modified IEEE 33-node CPDS and a real-world 47-node CPDS in China.
☆ A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
☆ Probabilistic modelling and safety assurance of an agriculture robot providing light-treatment
Continued adoption of agricultural robots postulates the farmer's trust in the reliability, robustness and safety of the new technology. This motivates our work on safety assurance of agricultural robots, particularly their ability to detect, track and avoid obstacles and humans. This paper considers a probabilistic modelling and risk analysis framework for use in the early development phases. Starting off with hazard identification and a risk assessment matrix, the behaviour of the mobile robot platform, sensor and perception system, and any humans present are captured using three state machines. An auto-generated probabilistic model is then solved and analysed using the probabilistic model checker PRISM. The result provides unique insight into fundamental development and engineering aspects by quantifying the effect of the risk mitigation actions and risk reduction associated with distinct design concepts. These include implications of adopting a higher performance and more expensive Object Detection System or opting for a more elaborate warning system to increase human awareness. Although this paper mainly focuses on the initial concept-development phase, the proposed safety assurance framework can also be used during implementation, and subsequent deployment and operation phases.
☆ Robotics Under Construction: Challenges on Job Sites ICRA
As labor shortages and productivity stagnation increasingly challenge the construction industry, automation has become essential for sustainable infrastructure development. This paper presents an autonomous payload transportation system as an initial step toward fully unmanned construction sites. Our system, based on the CD110R-3 crawler carrier, integrates autonomous navigation, fleet management, and GNSS-based localization to facilitate material transport in construction site environments. While the current system does not yet incorporate dynamic environment adaptation algorithms, we have begun fundamental investigations into external-sensor based perception and mapping system. Preliminary results highlight the potential challenges, including navigation in evolving terrain, environmental perception under construction-specific conditions, and sensor placement optimization for improving autonomy and efficiency. Looking forward, we envision a construction ecosystem where collaborative autonomous agents dynamically adapt to site conditions, optimizing workflow and reducing human intervention. This paper provides foundational insights into the future of robotics-driven construction automation and identifies critical areas for further technological development.
comment: Workshop on Field Robotics, ICRA
☆ Implementing blind navigation through multi-modal sensing and gait guidance
By the year 2023, the global population of individuals with impaired vision has surpassed 220 million. People with impaired vision will find it difficult while finding path or avoiding obstacles, and must ask for auxiliary tools for help. Although traditional aids such as guide canes and guide dogs exist, they still have some shortcomings. In this paper, we present our wearable blind guiding device, what perform navigation guidance through our proposed Gait-based Guiding System. Our device innovatively integrates gait phase analysis for walking guide, and in terms of environmental perception, we use multimodal sensing to acquire diverse environment information. During the experiment, we conducted both indoor and outdoor experiments, and compared with the standard guide cane. The result shows superior performance of our device in blind guidance.
☆ Implementation and Analysis of Different Geomagnetic Field Models for Attitude Determination and Control System (ADCS) of a Satellite TRO
An Attitude Determination and Control System is essential for orientation stability and performance of slew maneuvers on the satellite. This research focuses on comparing two different geomagnetic field models, Direct Dipole Model and International Geomagnetic Reference Field Model, for modeling of magnetometer and magnetorquers. Both these magnetic field models are compared and analyzed for two satellite attitude cases: orientation stability and unloading of reaction wheels. Magnetometer modeling is utilized to get sensor data for attitude determination and control to attain orientation stability. Whereas, the magnetorquer model aids in reaction wheel unloading, by performing the required actuation on the satellite, upon interaction with the Earth's magnetic field. The study offers a comprehensive lookout on the impact of geomagnetic field models on the overall ADCS performance, incorporating both attitude estimation and control via the sensor and actuator modeling. Apart from this, valuable insights are gained into selecting optimal models based on specific mission requirements and available computational resources. Finally, this comparison and analysis results in unique findings for an actual future satellite mission, that is to be launched soon.
comment: 27 pages, 8 figures, accepted by ASTRODYNAMICS journal
☆ Finite-Horizon Strategy in Infinite-Horizon Linear-Quadratic Discrete-Time Dynamic Games
This paper explores a finite-horizon strategy, ``watching $T$ steps into the future and moving one step now,'' in an $N$-person infinite-horizon discrete-time linear-quadratic dynamic game. The game involves linear input/output/state dynamics and quadratic cost functions with heterogeneous discount factors. For the finite-horizon version, which forms the basis of the infinite-horizon game, we analyze the structure of the coupled generalized discrete Riccati difference equations related to the feedback Nash equilibrium (FNE) and derive a sufficient condition for the uniqueness of the finite-horizon FNE. Under this condition, the FNE can be efficiently computed via the proposed algorithm. In the infinite-horizon game, assume all players adopt this finite-horizon strategy. If the iterations of the coupled equations related to the FNE converge, and the invertibility and stability conditions hold, we prove the convergence of each player's total cost under the finite-horizon strategy, even when players use individual prediction horizons. Furthermore, we provide an explicit upper bound on the cost difference between the finite-horizon strategy and the infinite-horizon FNE associated with the limiting matrices, expressed via the distance between their feedback strategy matrices. This bound vanishes as $T$ tends to infinity, implying convergence to the infinite-horizon FNE cost. A non-scalar numerical example illustrates the convergence behavior.
comment: 10 pages, 2 figures
☆ Time-Constrained Interception of Seeker-Equipped Interceptors with Bounded Input
This paper presents a nonlinear guidance scheme designed to achieve precise interception of stationary targets at a pre-specified impact time. The proposed strategy essentially accounts for the constraints imposed by the interceptor's seeker field-of-view (FOV) and actuator limitations, which, if ignored, can degrade guidance performance. To address these challenges, the guidance law incorporates known actuator bounds directly into its design, thereby improving overall interceptor effectiveness. The proposed method utilizes an input-affine magnitude saturation model to effectively enforce these constraints. By appending this input saturation model to the interceptor's kinematic equations, a guidance law is derived that ensures interception at the desired impact time while accounting for the physical constraints of the sensor and actuator. The efficacy of the proposed strategies is demonstrated through comprehensive numerical simulations across various scenarios and is compared against an existing guidance strategy.
☆ Coherent and Noncoherent Detection in Dense Arrays: Can We Ignore Mutual Coupling?
This paper investigates the impact of mutual coupling on MIMO systems with densely deployed antennas. Leveraging multiport communication theory, we analyze both coherent and noncoherent detection approaches in a single-user uplink scenario where the receiver ignores mutual coupling effects. Simulation results indicate that while coherent detection is generally more accurate, it is highly sensitive to mismatches in the coupling model, leading to severe performance degradation when antennas are closely spaced, to the point of becoming unusable. Noncoherent detection, on the other hand, exhibits a higher error probability but is more robust to coupling model mismatches.
comment: Accepted version of the article submitted to EUSIPCO 2025
☆ Enhanced Fault Ride-Through Grid Forming with Transient Synchronisation Stability and Current Saturation
During grid faults, grid-forming converters are typically suggested to switch from a voltage-source to a current-source mode to limit the current and protect the electronics. This transition has the potential for the converter to transiently lose synchronization due to such current saturation. Therefore, this paper proposes an alternative current saturation algorithm to improve transient synchronization stability during mode switching. The algorithm is designed for grid-forming converters to meet low-voltage ride-through (LVRT) requirements and grid-fault standards in addition to transient synchronization stability. Moreover, it limits the converter output current during grid faults with a new control parameter. The presented method introduces converter output virtual fluxes to calculate the current references in the d- and q-axes for the current saturation algorithm to enhance LVRT performance and grid stability. The method exploits the correlation between the converter's virtual fluxes and currents to modify the current saturation levels through real-time converter virtual flux estimation. The adaptive saturation levels ensure precise control and high dynamics during grid faults and facilitate optimal power injection or absorption to support the grid. The proposed current-saturation algorithm is analytically evaluated. Further, hardware-in-the-loop (HIL) experiments validate the effectiveness of the proposed algorithm.
comment: 12 pages, 18 figures
☆ Beam Squint Mitigation in Wideband Hybrid Beamformers: Full-TTD, Sparse-TTD, or Non-TTD?
Beam squint poses a fundamental challenge in wideband hybrid beamforming, particularly for mmWave and THz systems that demand both ultra-wide bandwidth and high directional beams. While conventional phase shifter-based beamformers may offer partial mitigation, True Time Delay (TTD) units provide a fundamentally more effective solution by enabling frequency-independent beam steering. However, the high cost of TTD units has recently driven much interest in Sparse-TTD architectures, which combine a limited number of TTDs with a higher number of conventional phase shifters to balance performance and cost. This paper provides a critical examination of beam squint mitigation strategies in wideband hybrid beamformers, comparing Full-TTD, Sparse-TTD, and Non-TTD architectures. We analyze recent Non-TTD approaches, specifically the scheme leveraging the wideband beam gain (WBBG) concept, evaluating their performance and cost characteristics against TTD-based solutions. A key focus is placed on the practical limitations of Sparse-TTD architectures, particularly the often-overlooked requirement for wideband phase shifters operating alongside TTDs, which can significantly impact performance and implementation cost in real-world scenarios, especially for ultra-wideband applications. Finally, we conduct a cost-performance analysis to examine the trade-offs inherent in each architecture and provide guidance on selecting the most suitable hybrid beamforming structure for various fractional bandwidth regimes.
☆ Revisiting Power System Stabilizers with Increased Inverter-Based Generation: A Case Study
As power systems evolve with increasing production from Inverter-Based Resources (IBRs), their underlying dynamics are undergoing significant changes that can jeopardize system operation, leading to poorly damped oscillations or small-signal rotor angle instability. In this work, we investigate whether Power System Stabilizer (PSS) setting adjustments can effectively restore system stability and provide adequate damping in systems with increased IBR penetration, using the benchmark Kundur Two-Area System as a case study. Specifically, we evaluate the model-based Residues and P-Vref PSS tuning methods to examine their effectiveness under evolving grid conditions. Our findings indicate that the effectiveness of these tuning methods is not guaranteed, particularly when coordination is limited. Consequently, our case study motivates local and adaptive online PSS tuning methods.
☆ Partially Observable Residual Reinforcement Learning for PV-Inverter-Based Voltage Control in Distribution Grids
This paper introduces an efficient Residual Reinforcement Learning (RRL) framework for voltage control in active distribution grids. Voltage control remains a critical challenge in distribution grids, where conventional Reinforcement Learning (RL) methods often suffer from slow training convergence and inefficient exploration. To overcome these challenges, the proposed RRL approach learns a residual policy on top of a modified Sequential Droop Control (SDC) mechanism, ensuring faster convergence. Additionally, the framework introduces a Local Shared Linear (LSL) architecture for the Q-network and a Transformer-Encoder actor network, which collectively enhance overall performance. Unlike several existing approaches, the proposed method relies solely on inverters' measurements without requiring full state information of the power grid, rendering it more practical for real-world deployment. Simulation results validate the effectiveness of the RRL framework in achieving rapid convergence, minimizing active power curtailment, and ensuring reliable voltage regulation.
☆ Zero-Shot Parameter Learning of Robot Dynamics Using Bayesian Statistics and Prior Knowledge
Inertial parameter identification of industrial robots is an established process, but standard methods using Least Squares or Machine Learning do not consider prior information about the robot and require extensive measurements. Inspired by Bayesian statistics, this paper presents an identification method with improved generalization that incorporates prior knowledge and is able to learn with only a few or without additional measurements (Zero-Shot Learning). Furthermore, our method is able to correctly learn not only the inertial but also the mechanical and base parameters of the MABI Max 100 robot while ensuring physical feasibility and specifying the confidence intervals of the results. We also provide different types of priors for serial robots with 6 degrees of freedom, where datasheets or CAD models are not available.
comment: Carsten Reiners and Minh Trinh contributed equally to this work
☆ Peer-to-Peer Energy Markets With Uniform Pricing: A Dynamic Operating Envelope Approach
The recent widespread adoption of rooftop solar backed by battery storage is enabling energy customers to both produce and consume electricity (i.e., prosumers of electricity). To facilitate prosumer participation in the electric grid, new market mechanisms are required. In this paper, we design peer-to-peer energy markets where prosumers trade their excess energy with peers to gain profit while satisfying the overall balance in electricity supply and demand. We first consider a market structure, considering the case where voltage and/or thermal constraints are binding. When such grid constraints are binding, market clearing prices can vary across locations. However, heterogeneous prices may be considered by regulators to lack fairness. To ensure uniform pricing, we design two peer-to-peer energy markets with dynamic operating envelopes (DOEs). DOEs enable us to decompose global voltage and thermal constraints across the power grid into local constraints for each prosumer, resulting in uniform prices across the grid. By means of numerical simulations on an IEEE 13-node feeder, we benchmark the proposed market-based approaches in the presence of binding voltage constraints.
☆ FlightKooba: A Fast Interpretable FTP Model
The Koopman theory is a powerful and effective modeling tool for converting nonlinear systems into linear representations, and flight trajectory prediction (FTP) is a complex nonlinear system. However, current models applying the Koopman theory to FTP tasks are not very effective, model interpretability is indeed an issue, and the Koopman operators are computationally intensive, resulting in long training times. To address this issue, this paper proposes a new modeling and control framework based on the HIPPO method, the Koopman theory, and state space equations from cybernetics: FlightKooba. Inspired by the idea of structural state space equations, FlightKooba directly constructs the Koopman operators from data. This makes the framework highly interpretable and significantly reduces the number of trainable parameters in the module, thereby greatly reducing training time. Experiments have demonstrated the superiority of the FlightKooba modeling method in terms of time and memory consumption (training time comparable to the Mamba module without using CUDA-level acceleration; memory reduced by more than 50% on most datasets, with a tenfold reduction in the number of parameters), essentially completing the FTP task. It provides a new method for the fast computation of the Koopman operators, opening up new possibilities for the combination of time series forecasting and control.
comment: 7 figures
☆ Online Algorithms for Recovery of Low-Rank Parameter Matrix in Non-stationary Stochastic Systems
This paper presents a two-stage online algorithm for recovery of low-rank parameter matrix in non-stationary stochastic systems. The first stage applies the recursive least squares (RLS) estimator combined with its singular value decomposition to estimate the unknown parameter matrix within the system, leveraging RLS for adaptability and SVD to reveal low-rank structure. The second stage introduces a weighted nuclear norm regularization criterion function, where adaptive weights derived from the first-stage enhance low-rank constraints. The regularization criterion admits an explicit and online computable solution, enabling efficient online updates when new data arrive without reprocessing historical data. Under the non-stationary and the non-persistent excitation conditions on the systems, the algorithm provably achieves: (i) the true rank of the unknown parameter matrix can be identified with a finite number of observations, (ii) the values of the matrix components can be consistently estimated as the number of observations increases, and (iii) the asymptotical normality of the algorithm is established as well. Such properties are termed oracle properties in the literature. Numerical simulations validate performance of the algorithm in estimation accuracy.
☆ Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control
The advancement of autonomous robotic systems has led to impressive capabilities in perception, localization, mapping, and control. Yet, a fundamental gap remains: existing frameworks excel at geometric reasoning and dynamic stability but fall short in representing and preserving relational semantics, contextual reasoning, and cognitive transparency essential for collaboration in dynamic, human-centric environments. This paper introduces a unified architecture comprising the Ontology Neural Network (ONN) and the Ontological Real-Time Semantic Fabric (ORTSF) to address this gap. The ONN formalizes relational semantic reasoning as a dynamic topological process. By embedding Forman-Ricci curvature, persistent homology, and semantic tensor structures within a unified loss formulation, ONN ensures that relational integrity and topological coherence are preserved as scenes evolve over time. The ORTSF transforms reasoning traces into actionable control commands while compensating for system delays. It integrates predictive and delay-aware operators that ensure phase margin preservation and continuity of control signals, even under significant latency conditions. Empirical studies demonstrate the ONN + ORTSF framework's ability to unify semantic cognition and robust control, providing a mathematically principled and practically viable solution for cognitive robotics.
comment: 12 pages, 5 figures, includes theoretical proofs and simulation results
♻ ☆ Followerstopper Revisited: Phase-space Lagrangian Controller for Traffic Decongestion
This paper revisits Followerstopper, a phase-space-based control system that had demonstrated its ability to mitigate emergent traffic jams due to stop-and-go traffic during rush hour in the mixed-autonomy setting. Followerstopper was deployed on an autonomous vehicle. The controller attenuates the emanant traffic waves by regulating its velocity according to the relative distance and velocity of the leader car. While regulating the velocity, the controller also prevents the collision of the ego vehicle with the lead vehicle within the range specified by the controller's design parameter. The controller design is based on a configurable quadratic curve on relative distance-relative velocity phase-space that allows the transition of the regulated velocity from (i) no modification of input, (ii) decelerating to match the leader's velocity (iii) braking to avoid any imminent collision. In this paper, we explore the phase-space properties of Followerstopper and provide a detailed description of a nonlinear control law that regulates the reference input to Followerstopper within the physics-informed boundaries. We also provide a new discussion on the nominal control law that regulates the reference speed to Followerstopper to avoid unrealistic and unsafe acceleration.
comment: 10 figures, 11 pages. This version fixes typo in the title, and removes vector field from figures that otherwise adds confusion to the analysis
♻ ☆ Power-Capping Metric Evaluation for Improving Energy Efficiency in HPC Applications
With high-performance computing systems now running at exascale, optimizing power-scaling management and resource utilization has become more critical than ever. This paper explores runtime power-capping optimizations that leverage integrated CPU-GPU power management on architectures like the NVIDIA GH200 superchip. We evaluate energy-performance metrics that account for simultaneous CPU and GPU power-capping effects by using two complementary approaches: speedup-energy-delay and a Euclidean distance-based multi-objective optimization method. By targeting a mostly compute-bound exascale science application, the Locally Self-Consistent Multiple Scattering (LSMS), we explore challenging scenarios to identify potential opportunities for energy savings in exascale applications, and we recognize that even modest reductions in energy consumption can have significant overall impacts. Our results highlight how GPU task-specific dynamic power-cap adjustments combined with integrated CPU-GPU power steering can improve the energy utilization of certain GPU tasks, thereby laying the groundwork for future adaptive optimization strategies.
comment: 14 pages, 3 figures, 2 tables. Accepted at the Energy Efficiency with Sustainable Performance: Techniques, Tools, and Best Practices, EESP Workshop, in conjunction with ISC High Performance 2025
♻ ☆ Physics-Informed Neural Networks: a Plug and Play Integration into Power System Dynamic Simulations
Time-domain simulations are crucial for ensuring power system stability and avoiding critical scenarios that could lead to blackouts. The next-generation power systems require a significant increase in the computational cost and complexity of these simulations due to additional degrees of uncertainty, non-linearity and states. Physics-Informed Neural Networks (PINN) have been shown to accelerate single-component simulations by several orders of magnitude. However, their application to current time-domain simulation solvers has been particularly challenging since the system's dynamics depend on multiple components. Using a new training formulation, this paper introduces the first natural step to integrate PINNs into multi-component time-domain simulations. We propose PINNs as an alternative to other classical numerical methods for individual components. Once trained, these neural networks approximate component dynamics more accurately for longer time steps. Formulated as an implicit and consistent method with the transient simulation workflow, PINNs speed up simulation time by significantly increasing the time steps used. For explanation clarity, we demonstrate the training, integration, and simulation framework for several combinations of PINNs and numerical solution methods using the IEEE 9-bus system, although the method applies equally well to any power system size.
♻ ☆ Decentralized Parametric Stability Certificates for Grid-Forming Converter Control
We propose a decentralized framework for guaranteeing the small-signal stability of future power systems with grid-forming converters. Our approach leverages dynamic loop-shifting techniques to compensate for the lack of passivity in the network dynamics and establishes decentralized parametric stability certificates, depending on the local device-level controls and incorporating the effects of the network dynamics. By following practical tuning rules, we are able to ensure plug-and-play operation without centralized coordination. Unlike prior works, our approach accommodates coupled frequency and voltage dynamics, incorporates network dynamics, and does not rely on specific network configurations or operating points, offering a general and scalable solution for the integration of power-electronics-based devices into future power systems. We validate our theoretical stability results through numerical case studies in a high-fidelity simulation model.
comment: 12 pages, 14 figures
♻ ☆ On the Popov-Belevitch-Hautus tests for functional observability and output controllability
Functional observability and output controllability are properties that establish the conditions for the partial estimation and partial control of the system state, respectively. In the special case of full-state observability and controllability, the Popov-Belevitch-Hautus (PBH) tests provide conditions for the properties to hold based on the system eigenspace. Generalizations of the PBH test have been recently proposed for functional observability and output controllability, but thus far have only been proven valid for diagonalizable systems. Here, we rigorously establish the generalized PBH test for functional observability, extending its validity to a broader class of systems using Jordan decomposition. Likewise, we determine the class of systems under which the generalized PBH test is sufficient and necessary for output controllability. These results have immediate implications for observer and controller design, pole assignment, and optimal placement of sensors and drivers.
♻ ☆ Exponentially Stable Combined Adaptive Control under Finite Excitation Condition
The parameter convergence relies on a stringent persistent excitation (PE) condition in adaptive control. Several works have proposed a memory term in the last decade to translate the PE condition to a feasible finite excitation (FE) condition. This work proposes a combined model reference adaptive control for a class of uncertain nonlinear systems with an unknown control effectiveness vector. The closed-loop system is exponentially stable under the FE condition. The exponential rate of convergence is independent of the excitation level of the regressor vector and is lower-bounded in terms of the system parameters and user-designed gains. Numerical simulation is illustrated, validating the results obtained with the proposed adaptive control.
♻ ☆ Duality between controllability and observability for target control and estimation in networks
Output controllability and functional observability are properties that enable, respectively, the control and estimation of part of the state vector. These notions are of utmost importance in applications to high-dimensional systems, such as large-scale networks, in which only a target subset of variables (nodes) is sought to be controlled or estimated. Although the duality between full-state controllability and observability is well established, the characterization of the duality between their generalized counterparts remains an outstanding problem. Here, we establish both the weak and the strong duality between output controllability and functional observability. Specifically, we show that functional observability of a system implies output controllability of a dual system (weak duality), and that under a certain geometric condition the converse holds (strong duality). As an application of the strong duality, we derive a necessary and sufficient condition for target control via static feedback. This allow us to establish a separation principle between the design of target controllers and the design of functional observers in closed-loop systems. These results generalize the classical duality and separation principles in modern control theory.
comment: Code available at our GitHub repository: https://github.com/montanariarthur/TargetCtrb
♻ ☆ PID Tuning via Desired Step Response Curve Fitting
This paper presents a PID tuning method based on step response curve fitting (PID-SRCF) that utilizes L2-norm minimization for precise reference tracking and explicit transient response shaping. The algorithm optimizes controller parameters by minimizing the root-mean-square error between desired and actual step responses. The proposed approach determines optimal PID parameters by matching any closed-loop response to a desired system step response. Practically a first-order plus time delay model or a second-order system with defined settling time and overshoot requirements are preferred. The method has open-source implementation using constrained nonlinear optimization in MATLAB. Comparative evaluations demonstrate that PID-SRCF can replace known analytical methods like Ziegler Nichols, Lambda Tuning, Pole Placement, Dominant Pole and MATLAB proprietary PID tuning applications.
comment: 4 tables, 4 figures
♻ ☆ Energy-Efficient Motion Planner for Legged Robots IROS 2025
We propose an online motion planner for legged robot locomotion with the primary objective of achieving energy efficiency. The conceptual idea is to leverage a placement set of footstep positions based on the robot's body position to determine when and how to execute steps. In particular, the proposed planner uses virtual placement sets beneath the hip joints of the legs and executes a step when the foot is outside of such placement set. Furthermore, we propose a parameter design framework that considers both energy-efficiency and robustness measures to optimize the gait by changing the shape of the placement set along with other parameters, such as step height and swing time, as a function of walking speed. We show that the planner produces trajectories that have a low Cost of Transport (CoT) and high robustness measure, and evaluate our approach against model-free Reinforcement Learning (RL) and motion imitation using biological dog motion priors as the reference. Overall, within low to medium velocity range, we show a 50.4% improvement in CoT and improved robustness over model-free RL, our best performing baseline. Finally, we show ability to handle slippery surfaces, gait transitions, and disturbances in simulation and hardware with the Unitree A1 robot.
comment: This paper has been accepted for publication at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025). 8 pages, 8 figures
♻ ☆ Semantic Communication in Multi-team Dynamic Games: A Mean Field Perspective
Coordinating communication and control is a key component in the stability and performance of networked multi-agent systems. While single user networked control systems have gained a lot of attention within this domain, in this work, we address the more challenging problem of large population multi-team dynamic games. In particular, each team constitutes two decision makers (namely, the sensor and the controller) who coordinate over a shared network to control a dynamically evolving state of interest under costs on both actuation and sensing/communication. Due to the shared nature of the wireless channel, the overall cost of each team depends on other teams' policies, thereby leading to a noncooperative game setup. Due to the presence of a large number of teams, we compute approximate decentralized Nash equilibrium policies for each team using the paradigm of (extended) mean-field games, which is governed by (1) the mean traffic flowing over the channel, and (2) the value of information at the sensor, which highlights the semantic nature of the ensuing communication. In the process, we compute optimal controller policies and approximately optimal sensor policies for each representative team of the mean-field system to alleviate the problem of general non-contractivity of the mean-field fixed point operator associated with the finite cardinality of the sensor action space. Consequently, we also prove the $\epsilon$--Nash property of the mean-field equilibrium solution which essentially characterizes how well the solution derived using mean-field analysis performs on the finite-team system. Finally, we provide extensive numerical simulations, which corroborate the theoretical findings and lead to additional insights on the properties of the results presented.
comment: Submitted to IEEE for possible publication
♻ ☆ Smart Sensing Breaks the Accuracy Barrier in Battery State Monitoring
Accurate state-of-charge (SOC) estimation is essential for optimizing battery performance, ensuring safety, and maximizing economic value. Conventional current and voltage measurements, however, have inherent limitations in fully inferring the multiphysics-resolved dynamics inside battery cells. This creates an accuracy barrier that constrains battery usage and reduces cost-competitiveness and sustainability across industries dependent on battery technology. In this work, we introduce an integrated sensor framework that combines novel mechanical, thermal, gas, optical, and electrical sensors with traditional measurements to break through this barrier. We generate three unique datasets with eleven measurement types and propose an explainable machine-learning approach for SOC estimation. This approach renders the measured signals and the predictive result of machine learning physically interpretable with respect to battery SOC, offering fundamental insights into the time-varying importance of different signals. Our experimental results reveal a marked increase in SOC estimation accuracy--enhanced from 46.1% to 74.5%--compared to conventional methods. This approach not only advances SOC monitoring precision but also establishes a foundation for monitoring additional battery states to further improve safety, extend lifespan, and facilitate fast charging.
♻ ☆ Fully distributed and resilient source seeking for robot swarms
We propose a self-contained, resilient and fully distributed solution for locating the maximum of an unknown scalar field using a swarm of robots that travel at a constant speed. Unlike conventional reactive methods relying on gradient information, our methodology enables the swarm to determine an ascending direction so that it approaches the source with an arbitrary precision. Our source-seeking solution consists of three distributed algorithms running simultaneously in a slow-fast closed-loop system. The fastest algorithm provides the centroid-relative coordinates of the robots and the next slower one provides the ascending direction to be tracked. The tracking of the ascending direction by single integrators is instantaneous; howeverin this paper we will also focus on 2D unicycle-like robots with a constant speed. The third algorithm, the slowest one since the speed of the robots can be chosen arbitrarily slow, is the individual control law for the unicycle to track the estimated ascending direction.We will show that the three distributed algorithms converge exponentially fast to their objectives, allowing for a feasible slow-fast closed-loop system. The robots are not constrained to any particular geometric formation, and we study both discrete and continuous distributions of robots.The swarm shape analysis reveals the resiliency of our approach as expected in robot swarms, i.e., by amassing robots we ensure the source-seeking functionality in the event of missing or misplaced individuals or even if the robot network splits in two or more disconnected subnetworks.We exploit such an analysis so that the swarm can adapt to unknown environments by morphing its shape and maneuvering while still following an ascending direction. We analyze our solution with robots as kinematic points in n-dimensional Euclidean spaces and extend the analysis to 2D unicycle-like robots with constant speeds.
comment: 16 pages, submitted version to T-TAC. Jesus Bautista and Antonio Acuaviva contributed equally to this work. arXiv admin note: text overlap with arXiv:2309.02937
♻ ☆ Sampling-based Stochastic Data-driven Predictive Control under Data Uncertainty
We present a stochastic constrained output-feedback data-driven predictive control scheme for linear time-invariant systems subject to bounded additive disturbances. The approach uses data-driven predictors based on an extension of Willems' fundamental lemma and requires only a single persistently exciting input-output data trajectory. Compared to current state-of-the-art approaches, we do not rely on availability of exact disturbance data. Instead, we leverage a novel parameterization of the unknown disturbance data considering consistency with the measured data and the system class. This allows for deterministic approximation of the chance constraints in a sampling-based fashion. A robust constraint on the first predicted step enables recursive feasibility, closed-loop constraint satisfaction, and robust asymptotic stability in expectation under standard assumptions. A numerical example demonstrates the efficiency of the proposed control scheme.
♻ ☆ Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
♻ ☆ Stochastic MPC for Finite Gaussian Mixture Disturbances with Guarantees
This paper presents a stochastic model predictive control (SMPC) algorithm for linear systems subject to additive Gaussian mixture disturbances, with the goal of satisfying chance constraints. We focus on a special case where each Gaussian mixture component has a similar variance. To solve the SMPC problem, we formulate a branch model predictive control (BMPC) problem on simplified dynamics and leverage stochastic simulation relations (SSR). Our contribution is an extension of the SMPC literature to accommodate Gaussian mixture disturbances while retaining recursive feasibility and closed-loop guarantees. We illustrate the retention of guarantees with a case study of vehicle control on an ill-maintained road.
comment: 7 pages, 6 figures, accepted for ECC 2025
♻ ☆ Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses
Efficient and robust bioprocess control is essential for maximizing performance and adaptability in advanced biotechnological systems. In this work, we present a reinforcement-learning framework for multi-setpoint and multi-trajectory tracking. Tracking multiple setpoints and time-varying trajectories in reinforcement learning is challenging due to the complexity of balancing multiple objectives, a difficulty further exacerbated by system uncertainties such as uncertain initial conditions and stochastic dynamics. This challenge is relevant, e.g., in bioprocesses involving microbial consortia, where precise control over population compositions is required. We introduce a novel return function based on multiplicative reciprocal saturation functions, which explicitly couples reward gains to the simultaneous satisfaction of multiple references. Through a case study involving light-mediated cybergenetic growth control in microbial consortia, we demonstrate via computational experiments that our approach achieves faster convergence, improved stability, and superior control compliance compared to conventional quadratic-cost-based return functions. Moreover, our method enables tuning of the saturation function's parameters, shaping the learning process and policy updates. By incorporating system uncertainties, our framework also demonstrates robustness, a key requirement in industrial bioprocessing. Overall, this work advances reinforcement-learning-based control strategies in bioprocess engineering, with implications in the broader field of process and systems engineering.
Optimization and Control 30
☆ Approximating the order 2 quantum Wasserstein distance using the moment-SOS hierarchy
Optimal transport theory has recently been extended to quantum settings, where the density matrices generalize the probability measures. In this paper, we study the computational aspects of the order 2 quantum Wasserstein distance, formulating it as an infinite dimensional linear program in the space of positive Borel measures supported on products of two unit spheres. This formulation is recognized as an instance of the Generalized Moment Problem, which enables us to use the moment-sums of squares hierarchy to provide a sequence of lower bounds converging to the distance. We illustrate our approach with numerical experiments.
comment: 13 pages
☆ Logarithmic convexity of evolution equations and application to inverse problems
We review some results on the logarithmic convexity for evolution equations, a well-known method in inverse and ill-posed problems. We start with the classical case of self-adjoint operators. Then, we analyze the case of analytic semigroups. In this general case, we give an explicit estimate, which will be used to study inverse problems for initial data recovery. We illustrate our abstract result by an application to the Ornstein-Uhlenbeck equations. We discuss both analytic and non-analytic semigroups. We conclude with recent results for the time-fractional evolution equations with the Caputo derivative of order $0<\alpha <1$. We start with symmetric evolution equations. Then, we show that the results extend to the non-symmetric case for diffusion equations, provided that a gradient vector field generates the drift coefficient. Finally, some open problems will be mentioned.
comment: Short review
☆ Adversarial Observability and Performance Tradeoffs in Optimal Control
We develop a feedback controller that minimizes the observability of a set of adversarial sensors of a linear system, while adhering to strict closed-loop performance constraints. We quantify the effectiveness of adversarial sensors using the trace of their observability Gramian and its inverse, capturing both average observability and the least observable state directions of the system. We derive theoretical lower bounds on these metrics under performance constraints, characterizing the fundamental limits of observability reduction as a function of the performance tradeoff. Finally, we show that the performance-constrained optimization of the Gramian's trace can be formulated as a one-shot semidefinite program, while we address the optimization of its inverse through sequential semidefinite programming. Simulations on an aircraft show how the proposed scheme yields controllers that deteriorate adversarial observability while having near-optimal closed-loop performance.
comment: 8 pages, 3 Figures
☆ Exact Matrix Seriation through Mathematical Optimization: Stress and Effectiveness-Based Models
Matrix seriation, the problem of permuting the rows and columns of a matrix to uncover latent structure, is a fundamental technique in data science, particularly in the visualization and analysis of relational data. Applications span clustering, anomaly detection, and beyond. In this work, we present a unified framework grounded in mathematical optimization to address matrix seriation from a rigorous, model-based perspective. Our approach leverages combinatorial and mixed-integer optimization to represent seriation objectives and constraints with high fidelity, bridging the gap between traditional heuristic methods and exact solution techniques. We introduce new mathematical programming models for neighborhood-based stress criteria, including nonlinear formulations and their linearized counterparts. For structured settings such as Moore and von Neumann neighborhoods, we develop a novel Hamiltonian path-based reformulation that enables effective control over spatial arrangement and interpretability in the reordered matrix. To assess the practical impact of our models, we carry out an extensive set of experiments on synthetic and real-world datasets, as well as on a newly curated benchmark based on a coauthorship network from the matrix seriation literature. Our results show that these optimization-based formulations not only enhance solution quality and interpretability but also provide a versatile foundation for extending matrix seriation to new domains in data science.
comment: 30 pages, 20 figures, 8 tables. Codes and datasets available at: https://github.com/vblancoOR/seriation_mathopt
☆ Controllability of Boussinesq flows driven by finite-dimensional and physically localized forces
We show approximate controllability of Boussinesq flows in $\mathbb{T}^2 = \mathbb{R}^2 / 2\pi\mathbb{Z}^2$ driven by finite-dimensional controls that are supported in any fixed region $\omega \subset \mathbb{T}^2$. This addresses a Boussinesq version of a question by Agrachev and provides a first example of fluid PDEs controllable in that sense. In particular, we add in this context to results obtained for the Navier-Stokes system by Agrachev-Sarychev (Comm. Math. Phys. 265, 2006), where the controls are finite-dimensional but not localized in physical space, and Nersesyan-Rissel (Comm. Pure Appl. Math. 78, 2025), where physically localized controls admit for special $\omega$ a degenerate but not finite-dimensional structure. For our proof, we study controllability properties of tailored convection equations governed by time-periodic degenerately forced Euler flows that provide a twofold geometric mechanism: transport of information through $\omega$ versus non-stationary mixing effects transferring energy from low-dimensional sources to higher frequencies. The temperature is then controlled by using Coron's return method, while the velocity is mainly driven by the buoyant force. When $\omega$ contains two cuts of the torus and a closed square of side-length $L$, our approach yields explicit control spaces for the velocity and temperature of dimensions $2 + 18\lceil2\pi/L\rceil^2 + 8\lceil2\pi/L\rceil^4$ and $2 + 8\lceil2\pi/L\rceil^2$, respectively.
comment: 49 pages, 2 figures
☆ An approach to control design for two-level quantum ensemble systems
Quantum ensemble systems arise in a variety of applications, including NMR spectroscopy and robust quantum control. While their theoretical properties have been extensively studied, relatively little attention has been given to the explicit construction of control inputs. In this paper, we address this gap by presenting a fully implementable control strategy for a one-parameter family of driftless two-level quantum systems. The proposed method is supported by rigorous analysis that guarantees accurate approximation of target distributions on SU(2). Convergence properties are established analytically, and numerical simulations are provided to demonstrate the effectiveness of the approach.
comment: Submitted to 64th IEEE Conference on Decision and Control
☆ On the computation of the cosine measure in high dimensions
In derivative-free optimization, the cosine measure is a value that often arises in the convergence analysis of direct search methods. Given the increasing interest in high-dimensional derivative-free optimization problems, it is valuable to compute the cosine measure in this setting; however, it has recently been shown to be NP-hard. We propose a new formulation of the problem and heuristic to tackle this problem in higher dimensions and compare it with existing algorithms in the literature. In addition, new results are presented to facilitate the construction of sets with specific cosine measures, allowing for the creation of a test-set to benchmark the algorithms with.
☆ Integrated Balanced and Staggered Routing in Autonomous Mobility-on-Demand Systems
Autonomous mobility-on-demand (AMoD) systems, centrally coordinated fleets of self-driving vehicles, offer a promising alternative to traditional ride-hailing by improving traffic flow and reducing operating costs. Centralized control in AMoD systems enables two complementary routing strategies: balanced routing, which distributes traffic across alternative routes to ease congestion, and staggered routing, which delays departures to smooth peak demand over time. In this work, we introduce a unified framework that jointly optimizes both route choices and departure times to minimize system travel times. We formulate the problem as an optimization model and show that our congestion model yields an unbiased estimate of travel times derived from a discretized version of Vickrey's bottleneck model. To solve large-scale instances, we develop a custom metaheuristic based on a large neighborhood search framework. We assess our method through a case study on the Manhattan street network using real-world taxi data. In a setting with exclusively centrally controlled AMoD vehicles, our approach reduces total traffic delay by up to 25 percent and mitigates network congestion by up to 35 percent compared to selfish routing. We also consider mixed-traffic settings with both AMoD and conventional vehicles, comparing a welfare-oriented operator that minimizes total system travel time with a profit-oriented one that optimizes only the fleet's travel time. Independent of the operator's objective, the analysis reveals a win-win outcome: across all control levels, both autonomous and non-autonomous traffic benefit from the implementation of balancing and staggering strategies.
☆ Toward Decision-Oriented Prognostics: An Integrated Estimate-Optimize Framework for Predictive Maintenance
Recent research increasingly integrates machine learning (ML) into predictive maintenance (PdM) to reduce operational and maintenance costs in data-rich operational settings. However, uncertainty due to model misspecification continues to limit widespread industrial adoption. This paper proposes a PdM framework in which sensor-driven prognostics inform decision-making under economic trade-offs within a finite decision space. We investigate two key questions: (1) Does higher predictive accuracy necessarily lead to better maintenance decisions? (2) If not, how can the impact of prediction errors on downstream maintenance decisions be mitigated? We first demonstrate that in the traditional estimate-then-optimize (ETO) framework, errors in probabilistic prediction can result in inconsistent and suboptimal maintenance decisions. To address this, we propose an integrated estimate-optimize (IEO) framework that jointly tunes predictive models while directly optimizing for maintenance outcomes. We establish theoretical finite-sample guarantees on decision consistency under standard assumptions. Specifically, we develop a stochastic perturbation gradient descent algorithm suitable for small run-to-failure datasets. Empirical evaluations on a turbofan maintenance case study show that the IEO framework reduces average maintenance regret up to 22% compared to ETO. This study provides a principled approach to managing prediction errors in data-driven PdM. By aligning prognostic model training with maintenance objectives, the IEO framework improves robustness under model misspecification and improves decision quality. The improvement is particularly pronounced when the decision-making policy is misaligned with the decision-maker's target. These findings support more reliable maintenance planning in uncertain operational environments.
comment: 22 pages, 5 figures, 4 tables
☆ Finite-Horizon Strategy in Infinite-Horizon Linear-Quadratic Discrete-Time Dynamic Games
This paper explores a finite-horizon strategy, ``watching $T$ steps into the future and moving one step now,'' in an $N$-person infinite-horizon discrete-time linear-quadratic dynamic game. The game involves linear input/output/state dynamics and quadratic cost functions with heterogeneous discount factors. For the finite-horizon version, which forms the basis of the infinite-horizon game, we analyze the structure of the coupled generalized discrete Riccati difference equations related to the feedback Nash equilibrium (FNE) and derive a sufficient condition for the uniqueness of the finite-horizon FNE. Under this condition, the FNE can be efficiently computed via the proposed algorithm. In the infinite-horizon game, assume all players adopt this finite-horizon strategy. If the iterations of the coupled equations related to the FNE converge, and the invertibility and stability conditions hold, we prove the convergence of each player's total cost under the finite-horizon strategy, even when players use individual prediction horizons. Furthermore, we provide an explicit upper bound on the cost difference between the finite-horizon strategy and the infinite-horizon FNE associated with the limiting matrices, expressed via the distance between their feedback strategy matrices. This bound vanishes as $T$ tends to infinity, implying convergence to the infinite-horizon FNE cost. A non-scalar numerical example illustrates the convergence behavior.
comment: 10 pages, 2 figures
☆ ADDQ: Adaptive Distributional Double Q-Learning
Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down convergence of $Q$-learning and actor-critic methods. One of the reasons of the success of modern RL algorithms is partially a direct or indirect overestimation reduction mechanism. We propose an easy to implement method built on top of distributional reinforcement learning (DRL) algorithms to deal with the overestimation in a locally adaptive way. Our framework is simple to implement, existing distributional algorithms can be improved with a few lines of code. We provide theoretical evidence and use double $Q$-learning to show how to include locally adaptive overestimation control in existing algorithms. Experiments are provided for tabular, Atari, and MuJoCo environments.
☆ A Stochastic Electric Vehicle Routing Problem under Uncertain Energy Consumption
The increasing adoption of Electric Vehicles (EVs) for service and goods distribution operations has led to the emergence of Electric Vehicle Routing Problems (EVRPs), a class of vehicle routing problems addressing the unique challenges posed by the limited driving range and recharging needs of EVs. While the majority of EVRP variants have considered deterministic energy consumption, this paper focuses on the Stochastic Electric Vehicle Routing Problem with a Threshold recourse policy (SEVRP-T), where the uncertainty in energy consumption is considered, and a recourse policy is employed to ensure that EVs recharge at Charging Stations (CSs) whenever their State of Charge (SoC) falls below a specified threshold. We formulate the SEVRP-T as a two-stage stochastic mixed-integer second-order cone model, where the first stage determines the sequences of customers to be visited, and the second stage incorporates charging activities. The objective is to minimize the expected total duration of the routes, composed by travel times and recharging operations. To cope with the computational complexity of the model, we propose a heuristic based on an Iterated Local Search (ILS) procedure coupled with a Set Partitioning problem. To further speed up the heuristic, we develop two lower bounds on the corresponding first-stage customer sequences. Furthermore, to handle a large number of energy consumption scenarios, we employ a scenario reduction technique. Extensive computational experiments are conducted to validate the effectiveness of the proposed solution strategy and to assess the importance of considering the stochastic nature of the energy consumption. The research presented in this paper contributes to the growing body of literature on EVRP and provides insights into managing the operational deployment of EVs in logistics activities under uncertainty.
☆ PDE methods for extracting normal vector fields and distance functions of shapes
Partial differential equations can be used for extracting geometric features of shapes. This article summarizes recent methods to extract the normal vector field from an elliptic equation proposed by Yamada and from the heat equation, and also a method to extract the (signed) distance function from an elliptic equation that generalizes Varadhan's in 1967.
☆ Poset-Markov Channels: Capacity via Group Symmetry
Computing channel capacity is in general intractable because it is given by the limit of a sequence of optimization problems whose dimensionality grows to infinity. As a result, constant-sized characterizations of feedback or non-feedback capacity are known for only a few classes of channels with memory. This paper introduces poset-causal channels$\unicode{x2014}$a new formalism of a communication channel in which channel inputs and outputs are indexed by the elements of a partially ordered set (poset). We develop a novel methodology that allows us to establish a single-letter upper bound on the feedback capacity of a subclass of poset-causal channels whose memory structure exhibits a Markov property and symmetry. The methodology is based on symmetry reduction in optimization. We instantiate our method on two channel models: the Noisy Output is The STate (NOST) channel$\unicode{x2014}$for which the bound is tight$\unicode{x2014}$and a new two-dimensional extension of it.
comment: 44 pages, 11 figures
☆ Duality and Policy Evaluation in Distributionally Robust Bayesian Diffusion Control
We consider a Bayesian diffusion control problem of expected terminal utility maximization. The controller imposes a prior distribution on the unknown drift of an underlying diffusion. The Bayesian optimal control, tracking the posterior distribution of the unknown drift, can be characterized explicitly. However, in practice, the prior will generally be incorrectly specified, and the degree of model misspecification can have a significant impact on policy performance. To mitigate this and reduce overpessimism, we introduce a distributionally robust Bayesian control (DRBC) formulation in which the controller plays a game against an adversary who selects a prior in divergence neighborhood of a baseline prior. The adversarial approach has been studied in economics and efficient algorithms have been proposed in static optimization settings. We develop a strong duality result for our DRBC formulation. Combining these results together with tools from stochastic analysis, we are able to derive a loss that can be efficiently trained (as we demonstrate in our numerical experiments) using a suitable neural network architecture. As a result, we obtain an effective algorithm for computing the DRBC optimal strategy. The methodology for computing the DRBC optimal strategy is greatly simplified, as we show, in the important case in which the adversary chooses a prior from a Kullback-Leibler distributional uncertainty set.
♻ ☆ Two-echelon Electric Vehicle Routing Problem in Parcel Delivery: A Literature Review
Multi-echelon parcel delivery systems using electric vehicles (EVs) are crucial for managing urban logistics complexity and promoting sustainability. In multi-echelon systems, particularly within two-stage systems, larger vehicles transport parcels from a central depot to satellite hubs, where smaller EVs pick up the parcels and carry out last-mile deliveries. This system could increase efficiency, reduce emissions, and improve service reliability. The two-echelon electric vehicle routing problem (2E-EVRP), an extension of the traditional two-echelon vehicle routing problem (2E-VRP), addresses EV-specific challenges such as battery constraints and recharging stations to tackle environmental impacts, urban congestion, and e-commerce demands. While effectively reducing costs, energy use, and emissions, the 2E-EVRP faces modeling challenges due to multi-echelon structures, EV limitations, and recharging station selection. This paper systematically reviews 2E-EVRP literature, analyzing key studies. It proposes a classification scheme to categorize the papers based on the problem variants, objectives, constraints, and solution methods. It identifies gaps such as delivery tardiness, environmental trade-offs, multi-objective optimization, multiple depots, split deliveries, and time-dependent travel conditions. Future research directions include aligning models with urban policies, integrating parcel lockers, enabling same-day delivery, and incorporating advanced technologies like autonomous vehicles. Methodological advancements suggest using machine learning, reinforcement learning, and simulation-based approaches to enhance dynamic routing and real-time decision-making. These directions aim to expand the 2E-EVRP applicability, addressing theoretical and practical challenges in sustainable urban logistics for future works.
♻ ☆ On the Popov-Belevitch-Hautus tests for functional observability and output controllability
Functional observability and output controllability are properties that establish the conditions for the partial estimation and partial control of the system state, respectively. In the special case of full-state observability and controllability, the Popov-Belevitch-Hautus (PBH) tests provide conditions for the properties to hold based on the system eigenspace. Generalizations of the PBH test have been recently proposed for functional observability and output controllability, but thus far have only been proven valid for diagonalizable systems. Here, we rigorously establish the generalized PBH test for functional observability, extending its validity to a broader class of systems using Jordan decomposition. Likewise, we determine the class of systems under which the generalized PBH test is sufficient and necessary for output controllability. These results have immediate implications for observer and controller design, pole assignment, and optimal placement of sensors and drivers.
♻ ☆ Duality between controllability and observability for target control and estimation in networks
Output controllability and functional observability are properties that enable, respectively, the control and estimation of part of the state vector. These notions are of utmost importance in applications to high-dimensional systems, such as large-scale networks, in which only a target subset of variables (nodes) is sought to be controlled or estimated. Although the duality between full-state controllability and observability is well established, the characterization of the duality between their generalized counterparts remains an outstanding problem. Here, we establish both the weak and the strong duality between output controllability and functional observability. Specifically, we show that functional observability of a system implies output controllability of a dual system (weak duality), and that under a certain geometric condition the converse holds (strong duality). As an application of the strong duality, we derive a necessary and sufficient condition for target control via static feedback. This allow us to establish a separation principle between the design of target controllers and the design of functional observers in closed-loop systems. These results generalize the classical duality and separation principles in modern control theory.
comment: Code available at our GitHub repository: https://github.com/montanariarthur/TargetCtrb
♻ ☆ A Robust Twin Parametric Margin Support Vector Machine for Multiclass Classification
In this paper, we introduce novel Twin Parametric Margin Support Vector Machine (TPMSVM) models designed to address multiclass classification tasks under feature uncertainty. To handle data perturbations, we construct bounded-by-norm uncertainty set around each training observation and derive the robust counterparts of the deterministic models using robust optimization techniques. To capture complex data structure, we explore both linear and kernel-induced classifiers, providing computationally tractable reformulations of the resulting robust models. Additionally, we propose two alternatives for the final decision function, enhancing models' flexibility. Finally, we validate the effectiveness of the proposed robust multiclass TPMSVM methodology on real-world datasets, showing the good performance of the approach in the presence of uncertainty.
♻ ☆ Semantic Communication in Multi-team Dynamic Games: A Mean Field Perspective
Coordinating communication and control is a key component in the stability and performance of networked multi-agent systems. While single user networked control systems have gained a lot of attention within this domain, in this work, we address the more challenging problem of large population multi-team dynamic games. In particular, each team constitutes two decision makers (namely, the sensor and the controller) who coordinate over a shared network to control a dynamically evolving state of interest under costs on both actuation and sensing/communication. Due to the shared nature of the wireless channel, the overall cost of each team depends on other teams' policies, thereby leading to a noncooperative game setup. Due to the presence of a large number of teams, we compute approximate decentralized Nash equilibrium policies for each team using the paradigm of (extended) mean-field games, which is governed by (1) the mean traffic flowing over the channel, and (2) the value of information at the sensor, which highlights the semantic nature of the ensuing communication. In the process, we compute optimal controller policies and approximately optimal sensor policies for each representative team of the mean-field system to alleviate the problem of general non-contractivity of the mean-field fixed point operator associated with the finite cardinality of the sensor action space. Consequently, we also prove the $\epsilon$--Nash property of the mean-field equilibrium solution which essentially characterizes how well the solution derived using mean-field analysis performs on the finite-team system. Finally, we provide extensive numerical simulations, which corroborate the theoretical findings and lead to additional insights on the properties of the results presented.
comment: Submitted to IEEE for possible publication
♻ ☆ Algebraic Riccati Tensor Equations with Applications in Multilinear Control Systems
In a recent paper by Chen et al. [8], the authors initiated the control-theoretic study of a class of discrete-time multilinear time-invariant (MLTI) control systems, where system states, inputs, and outputs are all tensors endowed with the Einstein product. They established criteria for fundamental system-theoretic notions such as stability, reachability, and observability through tensor decomposition. Building on this new research direction, the purpose of our paper is to extend the study to continuous-time MLTI control systems. Specifically, we define Hamiltonian tensors and symplectic tensors, and we establish the Schur-Hamiltonian tensor decomposition and the symplectic tensor singular value decomposition (SVD). Based on these concepts, we propose the algebraic Riccati tensor equation (ARTE) and demonstrate that it has a unique positive semidefinite solution if the system is stabilizable and detectable. To find numerical solutions to the ARTE, we introduce a tensor-based Newton method. Additionally, we establish the tensor versions of the bounded real lemma and the small gain theorem. A first-order robustness analysis of the ARTE is also conducted. Finally, we provide a numerical example to illustrate the proposed theory and algorithms.
comment: 28 pages, 7 figures
♻ ☆ Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks
We present a fully constructive analysis of deep ReLU neural networks for classification and function approximation tasks. First, we prove that any dataset with $N$ distinct points in $\mathbb{R}^d$ and $M$ output classes can be exactly classified using a multilayer perceptron (MLP) of width $2$ and depth at most $2N + 4M - 1$, with all network parameters constructed explicitly. This result is sharp with respect to width and is interpreted through the lens of simultaneous or ensemble controllability in discrete nonlinear dynamics. Second, we show that these explicit constructions yield uniform bounds on the parameter norms and, in particular, provide upper estimates for minimizers of standard regularized training loss functionals in supervised learning. As the regularization parameter vanishes, the trained networks converge to exact classifiers with bounded norm, explaining the effectiveness of overparameterized training in the small-regularization regime. We also prove a universal approximation theorem in $L^p(\Omega; \mathbb{R}_+)$ for any bounded domain $\Omega \subset \mathbb{R}^d$ and $p \in [1, \infty)$, using MLPs of fixed width $d + 1$. The proof is constructive, geometrically motivated, and provides explicit estimates on the network depth when the target function belongs to the Sobolev space $W^{1,p}$. We also extend the approximation and depth estimation results to $L^p(\Omega; \mathbb{R}^m)$ for any $m \geq 1$. Our results offer a unified and interpretable framework connecting controllability, expressivity, and training dynamics in deep neural networks.
♻ ☆ Learning Decision-Focused Uncertainty Sets in Robust Optimization
We propose a data-driven technique to automatically learn contextual uncertainty sets in robust optimization, resulting in excellent worst-case and average-case performance while also guaranteeing constraint satisfaction. Our method reshapes the uncertainty sets by minimizing the expected performance across a contextual family of problems, subject to conditional-value-at-risk constraints. Our approach is very flexible, and can learn a wide variety of uncertainty sets while preserving tractability. We solve the constrained learning problem using a stochastic augmented Lagrangian method that relies on differentiating the solutions of the robust optimization problems with respect to the parameters of the uncertainty set. Due to the nonsmooth and nonconvex nature of the augmented Lagrangian function, we apply the nonsmooth conservative implicit function theorem to establish convergence to a critical point, which is a feasible solution of the constrained problem under mild assumptions. Using empirical process theory, we show finite-sample probabilistic guarantees of constraint satisfaction for the resulting solutions. Numerical experiments show that our method outperforms traditional approaches in robust and distributionally robust optimization in terms of out-of-sample performance and constraint satisfaction guarantees.
♻ ☆ On strict proto-differentiability of set-valued mappings
We will show that a multifunction is strictly proto-differentiable at a point of its graph if and only if it is graphically strictly differentiable, i.e., the graph of the multifunction locally coincides, up to a change of coordinates, with the graph of a single-valued mapping, which is strictly differentiable at the transformed reference point. This result allows point-based characterizations of strict proto-differentiability in terms of various generalized derivatives. Further we will prove that under strict proto-differentiability the properties of strong metric regularity, metric regularity and strong metric subregularity are equivalent. Finally, under strict proto-differentiability of the subgradient mapping, we provide a novel second-order relation between function values and subgradients for prox-regular functions which constitutes a nonsmooth extension of the trapezoidal rule of numerical integration.
♻ ☆ Interpolating between Optimal Transport and KL regularized Optimal Transport using Rényi Divergences
Regularized optimal transport (OT) has received much attention in recent years starting from Cuturi's introduction of Kullback-Leibler (KL) divergence regularized OT. In this paper, we propose regularizing the OT problem using the family of $\alpha$-R\'enyi divergences for $\alpha \in (0, 1)$. R\'enyi divergences are neither $f$-divergences nor Bregman distances, but they recover the KL divergence in the limit $\alpha \nearrow 1$. The advantage of introducing the additional parameter $\alpha$ is that for $\alpha \searrow 0$ we obtain convergence to the unregularized OT problem. For the KL regularized OT problem, this was achieved by letting the regularization parameter $\varepsilon$ tend to zero, which causes numerical instabilities. We present two different ways to obtain premetrics on probability measures, namely by R\'enyi divergence constraints and by penalization. The latter premetric interpolates between the unregularized and the KL regularized OT problem with weak convergence of the unique minimizer, generalizing the interpolation property of KL regularized OT. We use a nested mirror descent algorithm to solve the primal formulation. Both on real and synthetic data sets R\'enyi regularized OT plans outperform their KL and Tsallis counterparts in terms of being closer to the unregularized transport plans and recovering the ground truth in inference tasks better.
comment: 40 pages, 9 figures, 3 tables, comments welcome
♻ ☆ PSD and SOS Biquadratic Polynomials
Hilbert indicated that a positive semi-definite (psd) homogeneous quartic polynomial of four variables may not be sum of squares (sos). {Since a} $2 \times 2$ biquadratic polynomial is a homogeneous quartic polynomial of four variables{, this raises a fundamental question:} Is a psd $2 \times 2$ biquadratic polynomial sos? In this paper, we present a necessary and sufficient condition for a $2 \times 2$ psd biquadratic polynomial to be sos. We show that if a $2 \times 2$ {psd} biquadratic polynomial is sos, then it is sos of tetranomials, its sos rank is at most $4$, and the coefficients of the tetranomials can be orthogonal to each other. We prove that every psd $2\times 2$ biquadratic polynomial admits an equivalent representation with two neighbor half-cross terms and one full-cross term. Furthermore, any $2 \times 2$ psd biquadratic polynomial is a sum of squares of binomials (sosb) when it contains at most one half-cross term, and a sum of squares of trinomials (sostri) when it contains exactly two neighbor half-cross terms and no full-cross term.
♻ ☆ Simultaneous recovery of a corroded boundary and admittance using the Kohn-Vogelius method
We address the problem of identifying an unknown portion $\Gamma$ of the boundary of a $d$-dimensional ($d \in \{1, 2\}$) domain $\Omega$ and its associated Robin admittance coefficient, using two sets of boundary Cauchy data $(f, g)$--representing boundary temperature and heat flux--measured on the accessible portion $\Sigma$ of the boundary. Identifiability results \cite{Bacchelli2009,PaganiPierotti2009} indicate that a single measurement on $\Sigma$ is insufficient to uniquely determine both $\Gamma$ and $\alpha$, but two independent inputs yielding distinct solutions ensure the uniqueness of the pair $\Gamma$ and $\alpha$. In this paper, we propose a cost function based on the energy-gap of two auxiliary problems. We derive the variational derivatives of this objective functional with respect to both the Robin boundary $\Gamma$ and the admittance coefficient $\alpha$. These derivatives are utilized to develop a nonlinear gradient-based iterative scheme for the simultaneous numerical reconstruction of $\Gamma$ and $\alpha$. Numerical experiments are presented to demonstrate the effectiveness and practicality of the proposed method.
comment: in \textit{Computational Methods for Inverse Problems and Applications}, ICMDS 2024, Khouribga, Morocco, October 21--22, 2024, Springer Proc. Math. Stat., vol. 498, Springer, Cham, 2025
♻ ☆ Nonlinear optimals and their role in sustaining turbulence in channel flow
We investigate the energy transfer from the mean profile to velocity fluctuations in channel flow by calculating nonlinear optimal disturbances,i.e. the initial condition of a given finite energy that achieves the highest possible energy growth during a given fixed time horizon. It is found that for a large range of time horizons and initial disturbance energies, the nonlinear optimal exhibits streak spacing and amplitude consistent with DNS at least at Re_tau = 180, which suggests that they isolate the relevant physical mechanisms that sustain turbulence. Moreover, the time horizon necessary for a nonlinear disturbance to outperform a linear optimal is consistent with previous DNS-based estimates using eddy turnover time, which offers a new perspective on how some turbulent time scales are determined.
♻ ☆ Online Rack Placement in Large-Scale Data Centers: Online Sampling Optimization and Deployment
This paper optimizes the configuration of large-scale data centers toward cost-effective, reliable and sustainable cloud supply chains. The problem involves placing incoming racks of servers within a data center to maximize demand coverage given space, power and cooling restrictions. We formulate an online integer optimization model to support rack placement decisions. We propose a tractable online sampling optimization (OSO) approach to multi-stage stochastic optimization, which approximates unknown parameters with a sample path and re-optimizes decisions dynamically. We prove that OSO achieves a strong competitive ratio in canonical online resource allocation problems and sublinear regret in the online batched bin packing problem. Theoretical and computational results show it can outperform mean-based certainty-equivalent resolving heuristics. Our algorithm has been packaged into a software solution deployed across Microsoft's data centers, contributing an interactive decision-making process at the human-machine interface. Using deployment data, econometric tests suggest that adoption of the solution has a negative and statistically significant impact on power stranding, estimated at 1-3 percentage point. At the scale of cloud computing, these improvements in data center performance result in significant cost savings and environmental benefits.
comment: 72 pages
♻ ☆ Iterative Minimax Games with Coupled Linear Constraints
The study of nonconvex minimax games has gained significant momentum in machine learning and decision science communities due to their fundamental connections to adversarial training scenarios. This work develops a primal-dual alternating proximal gradient (PDAPG) algorithm framework for resolving iterative minimax games featuring nonsmooth nonconvex objectives subject to coupled linear constraints. We establish rigorous convergence guarantees for both nonconvex-strongly concave and nonconvex-concave game configurations, demonstrating that PDAPG achieves an $\varepsilon$-stationary solution within $\mathcal{O}\left( \varepsilon ^{-2} \right)$ iterations for strongly concave settings and $\mathcal{O}\left( \varepsilon ^{-4} \right)$ iterations for concave scenarios. Our analysis provides the first known iteration complexity bounds for this class of constrained minimax games, particularly addressing the critical challenge of coupled linear constraints that induce inherent interdependencies among strategy variables. The proposed game-theoretic framework advances existing solution methodologies by simultaneously handling nonsmooth components and coordinated constraint structures through alternating primal-dual updates.
Robotics 15
☆ Low-Cost Infrastructure-Free 3D Relative Localization with Sub-Meter Accuracy in Near Field
Relative localization in the near-field scenario is critically important for unmanned vehicle (UxV) applications. Although related works addressing 2D relative localization problem have been widely studied for unmanned ground vehicles (UGVs), the problem in 3D scenarios for unmanned aerial vehicles (UAVs) involves more uncertainties and remains to be investigated. Inspired by the phenomenon that animals can achieve swarm behaviors solely based on individual perception of relative information, this study proposes an infrastructure-free 3D relative localization framework that relies exclusively on onboard ultra-wideband (UWB) sensors. Leveraging 2D relative positioning research, we conducted feasibility analysis, system modeling, simulations, performance evaluation, and field tests using UWB sensors. The key contributions of this work include: derivation of the Cram\'er-Rao lower bound (CRLB) and geometric dilution of precision (GDOP) for near-field scenarios; development of two localization algorithms -- one based on Euclidean distance matrix (EDM) and another employing maximum likelihood estimation (MLE); comprehensive performance comparison and computational complexity analysis against state-of-the-art methods; simulation studies and field experiments; a novel sensor deployment strategy inspired by animal behavior, enabling single-sensor implementation within the proposed framework for UxV applications. The theoretical, simulation, and experimental results demonstrate strong generalizability to other 3D near-field localization tasks, with significant potential for a cost-effective cross-platform UxV collaborative system.
☆ Situated Haptic Interaction: Exploring the Role of Context in Affective Perception of Robotic Touch
Affective interaction is not merely about recognizing emotions; it is an embodied, situated process shaped by context and co-created through interaction. In affective computing, the role of haptic feedback within dynamic emotional exchanges remains underexplored. This study investigates how situational emotional cues influence the perception and interpretation of haptic signals given by a robot. In a controlled experiment, 32 participants watched video scenarios in which a robot experienced either positive actions (such as being kissed), negative actions (such as being slapped) or neutral actions. After each video, the robot conveyed its emotional response through haptic communication, delivered via a wearable vibration sleeve worn by the participant. Participants rated the robot's emotional state-its valence (positive or negative) and arousal (intensity)-based on the video, the haptic feedback, and the combination of the two. The study reveals a dynamic interplay between visual context and touch. Participants' interpretation of haptic feedback was strongly shaped by the emotional context of the video, with visual context often overriding the perceived valence of the haptic signal. Negative haptic cues amplified the perceived valence of the interaction, while positive cues softened it. Furthermore, haptics override the participants' perception of arousal of the video. Together, these results offer insights into how situated haptic feedback can enrich affective human-robot interaction, pointing toward more nuanced and embodied approaches to emotional communication with machines.
☆ CUPID: Curating Data your Robot Loves with Influence Functions
In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - such as closed-loop task success or failure - remains a persistent challenge. We propose CUPID, a robot data curation method based on a novel influence function-theoretic formulation for imitation learning policies. Given a set of evaluation rollouts, CUPID estimates the influence of each training demonstration on the policy's expected return. This enables ranking and selection of demonstrations according to their impact on the policy's closed-loop performance. We use CUPID to curate data by 1) filtering out training demonstrations that harm policy performance and 2) subselecting newly collected trajectories that will most improve the policy. Extensive simulated and hardware experiments show that our approach consistently identifies which data drives test-time performance. For example, training with less than 33% of curated data can yield state-of-the-art diffusion policies on the simulated RoboMimic benchmark, with similar gains observed in hardware. Furthermore, hardware experiments show that our method can identify robust strategies under distribution shift, isolate spurious correlations, and even enhance the post-training of generalist robot policies. Additional materials are made available at: https://cupid-curation.github.io.
comment: Project page: https://cupid-curation.github.io. 28 pages, 15 figures
☆ Analysis and experiments of the dissipative Twistcar: direction reversal and asymptotic approximations
Underactuated wheeled vehicles are commonly studied as nonholonomic systems with periodic actuation. Twistcar is a classical example inspired by a riding toy, which has been analyzed using a planar model of a dynamical system with nonholonomic constraints. Most of the previous analyses did not account for energy dissipation due to friction. In this work, we study a theoretical two-link model of the Twistcar while incorporating dissipation due to rolling resistance. We obtain asymptotic expressions for the system's small-amplitude steady-state periodic dynamics, which reveals the possibility of reversing the direction of motion upon varying the geometric and mass properties of the vehicle. Next, we design and construct a robotic prototype of the Twistcar whose center-of-mass position can be shifted by adding and removing a massive block, enabling demonstration of the Twistcar's direction reversal phenomenon. We also conduct parameter fitting for the frictional resistance in order to improve agreement with experiments.
☆ Multimodal Anomaly Detection with a Mixture-of-Experts IROS 2025
With a growing number of robots being deployed across diverse applications, robust multimodal anomaly detection becomes increasingly important. In robotic manipulation, failures typically arise from (1) robot-driven anomalies due to an insufficient task model or hardware limitations, and (2) environment-driven anomalies caused by dynamic environmental changes or external interferences. Conventional anomaly detection methods focus either on the first by low-level statistical modeling of proprioceptive signals or the second by deep learning-based visual environment observation, each with different computational and training data requirements. To effectively capture anomalies from both sources, we propose a mixture-of-experts framework that integrates the complementary detection mechanisms with a visual-language model for environment monitoring and a Gaussian-mixture regression-based detector for tracking deviations in interaction forces and robot motions. We introduce a confidence-based fusion mechanism that dynamically selects the most reliable detector for each situation. We evaluate our approach on both household and industrial tasks using two robotic systems, demonstrating a 60% reduction in detection delay while improving frame-wise anomaly detection performance compared to individual detectors.
comment: 8 pages, 5 figures, 1 table, the paper has been accepted for publication in the Proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
☆ Faster Motion Planning via Restarts
Randomized methods such as PRM and RRT are widely used in motion planning. However, in some cases, their running-time suffers from inherent instability, leading to ``catastrophic'' performance even for relatively simple instances. We apply stochastic restart techniques, some of them new, for speeding up Las Vegas algorithms, that provide dramatic speedups in practice (a factor of $3$ [or larger] in many cases). Our experiments demonstrate that the new algorithms have faster runtimes, shorter paths, and greater gains from multi-threading (when compared with straightforward parallel implementation). We prove the optimality of the new variants. Our implementation is open source, available on github, and is easy to deploy and use.
comment: arXiv admin note: text overlap with arXiv:2503.04633
MinD: Unified Visual Imagination and Control via Hierarchical World Models
Video generation models (VGMs) offer a promising pathway for unified world modeling in robotics by integrating simulation, prediction, and manipulation. However, their practical application remains limited due to (1) slowgeneration speed, which limits real-time interaction, and (2) poor consistency between imagined videos and executable actions. To address these challenges, we propose Manipulate in Dream (MinD), a hierarchical diffusion-based world model framework that employs a dual-system design for vision-language manipulation. MinD executes VGM at low frequencies to extract video prediction features, while leveraging a high-frequency diffusion policy for real-time interaction. This architecture enables low-latency, closed-loop control in manipulation with coherent visual guidance. To better coordinate the two systems, we introduce a video-action diffusion matching module (DiffMatcher), with a novel co-training strategy that uses separate schedulers for each diffusion model. Specifically, we introduce a diffusion-forcing mechanism to DiffMatcher that aligns their intermediate representations during training, helping the fast action model better understand video-based predictions. Beyond manipulation, MinD also functions as a world simulator, reliably predicting task success or failure in latent space before execution. Trustworthy analysis further shows that VGMs can preemptively evaluate task feasibility and mitigate risks. Extensive experiments across multiple benchmarks demonstrate that MinD achieves state-of-the-art manipulation (63%+) in RL-Bench, advancing the frontier of unified world modeling in robotics.
♻ ☆ Stochastic Motion Planning as Gaussian Variational Inference: Theory and Algorithms
We present a novel formulation for motion planning under uncertainties based on variational inference where the optimal motion plan is modeled as a posterior distribution. We propose a Gaussian variational inference-based framework, termed Gaussian Variational Inference Motion Planning (GVI-MP), to approximate this posterior by a Gaussian distribution over the trajectories. We show that the GVI-MP framework is dual to a special class of stochastic control problems and brings robustness into the decision-making in motion planning. We develop two algorithms to numerically solve this variational inference and the equivalent control formulations for motion planning. The first algorithm uses a natural gradient paradigm to iteratively update a Gaussian proposal distribution on the sparse motion planning factor graph. We propose a second algorithm, the Proximal Covariance Steering Motion Planner (PCS-MP), to solve the same inference problem in its stochastic control form with an additional terminal constraint. We leverage a proximal gradient paradigm where, at each iteration, we quadratically approximate nonlinear state costs and solve a linear covariance steering problem in closed form. The efficacy of the proposed algorithms is demonstrated through extensive experiments on various robot models. An implementation is provided in https://github.com/hzyu17/VIMP.
comment: 20 pages
♻ ☆ Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box-constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia and different disability levels as stroke patients. [https://sites.google.com/seas.upenn.edu/learning-rom]
♻ ☆ Experimental Setup and Software Pipeline to Evaluate Optimization based Autonomous Multi-Robot Search Algorithms
Signal source localization has been a problem of interest in the multi-robot systems domain given its applications in search & rescue and hazard localization in various industrial and outdoor settings. A variety of multi-robot search algorithms exist that usually formulate and solve the associated autonomous motion planning problem as a heuristic model-free or belief model-based optimization process. Most of these algorithms however remains tested only in simulation, thereby losing the opportunity to generate knowledge about how such algorithms would compare/contrast in a real physical setting in terms of search performance and real-time computing performance. To address this gap, this paper presents a new lab-scale physical setup and associated open-source software pipeline to evaluate and benchmark multi-robot search algorithms. The presented physical setup innovatively uses an acoustic source (that is safe and inexpensive) and small ground robots (e-pucks) operating in a standard motion-capture environment. This setup can be easily recreated and used by most robotics researchers. The acoustic source also presents interesting uncertainty in terms of its noise-to-signal ratio, which is useful to assess sim-to-real gaps. The overall software pipeline is designed to readily interface with any multi-robot search algorithm with minimal effort and is executable in parallel asynchronous form. This pipeline includes a framework for distributed implementation of multi-robot or swarm search algorithms, integrated with a ROS (Robotics Operating System)-based software stack for motion capture supported localization. The utility of this novel setup is demonstrated by using it to evaluate two state-of-the-art multi-robot search algorithms, based on swarm optimization and batch-Bayesian Optimization (called Bayes-Swarm), as well as a random walk baseline.
comment: IDETC 2025
♻ ☆ cuVSLAM: CUDA accelerated visual odometry and mapping
Accurate and robust pose estimation is a key requirement for any autonomous robot. We present cuVSLAM, a state-of-the-art solution for visual simultaneous localization and mapping, which can operate with a variety of visual-inertial sensor suites, including multiple RGB and depth cameras, and inertial measurement units. cuVSLAM supports operation with as few as one RGB camera to as many as 32 cameras, in arbitrary geometric configurations, thus supporting a wide range of robotic setups. cuVSLAM is specifically optimized using CUDA to deploy in real-time applications with minimal computational overhead on edge-computing devices such as the NVIDIA Jetson. We present the design and implementation of cuVSLAM, example use cases, and empirical results on several state-of-the-art benchmarks demonstrating the best-in-class performance of cuVSLAM.
♻ ☆ Terrain-aware Low Altitude Path Planning
In this paper, we study the problem of generating low-altitude path plans for nap-of-the-earth (NOE) flight in real time with only RGB images from onboard cameras and the vehicle pose. We propose a novel training method that combines behavior cloning and self-supervised learning, where the self-supervision component allows the learned policy to refine the paths generated by the expert planner. Simulation studies show 24.7% reduction in average path elevation compared to the standard behavior cloning approach.
♻ ☆ Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition
Recent advances in robotics are driving real-world autonomy for long-term and large-scale missions, where loop closures via place recognition are vital for mitigating pose estimation drift. However, achieving real-time performance remains challenging for resource-constrained mobile robots and multi-robot systems due to the computational burden of high-density sampling, which increases the complexity of comparing and verifying query samples against a growing map database. Conventional methods often retain redundant information or miss critical data by relying on fixed sampling intervals or operating in 3-D space instead of the descriptor feature space. To address these challenges, we introduce the concept of sample space and propose a novel keyframe sampling approach for LiDAR-based place recognition. Our method minimizes redundancy while preserving essential information in the hyper-dimensional descriptor space, supporting both learning-based and handcrafted descriptors. The proposed approach incorporates a sliding window optimization strategy to ensure efficient keyframe selection and real-time performance, enabling seamless integration into robotic pipelines. In sum, our approach demonstrates robust performance across diverse datasets, with the ability to adapt seamlessly from indoor to outdoor scenarios without parameter tuning, reducing loop closure detection times and memory requirements.
comment: The work is no longer intended for consideration in its current form. Readers are instead encouraged to refer to our related and more complete study, arXiv:2501.01791, which should be considered as a stand-alone contribution
♻ ☆ Employing Laban Shape for Generating Emotionally and Functionally Expressive Trajectories in Robotic Manipulators
Successful human-robot collaboration depends on cohesive communication and a precise understanding of the robot's abilities, goals, and constraints. While robotic manipulators offer high precision, versatility, and productivity, they exhibit expressionless and monotonous motions that conceal the robot's intention, resulting in a lack of efficiency and transparency with humans. In this work, we use Laban notation, a dance annotation language, to enable robotic manipulators to generate trajectories with functional expressivity, where the robot uses nonverbal cues to communicate its abilities and the likelihood of succeeding at its task. We achieve this by introducing two novel variants of Hesitant expressive motion (Spoke-Like and Arc-Like). We also enhance the emotional expressivity of four existing emotive trajectories (Happy, Sad, Shy, and Angry) by augmenting Laban Effort usage with Laban Shape. The functionally expressive motions are validated via a human-subjects study, where participants equate both variants of Hesitant motion with reduced robot competency. The enhanced emotive trajectories are shown to be viewed as distinct emotions using the Valence-Arousal-Dominance (VAD) spectrum, corroborating the usage of Laban Shape.
comment: Accepted for presentation at the 2025 IEEE RO-MAN Conference
♻ ☆ Agile, Autonomous Spacecraft Constellations with Disruption Tolerant Networking to Monitor Precipitation and Urban Floods
Fully re-orientable small spacecraft are now supported by commercial technologies, allowing them to point their instruments in any direction and capture images, with short notice. When combined with improved onboard processing, and implemented on a constellation of inter-communicable satellites, this intelligent agility can significantly increase responsiveness to transient or evolving phenomena. We demonstrate a ground-based and onboard algorithmic framework that combines orbital mechanics, attitude control, inter-satellite communication, intelligent prediction and planning to schedule the time-varying, re-orientation of agile, small satellites in a constellation. Planner intelligence is improved by updating the predictive value of future space-time observations based on shared observations of evolving episodic precipitation and urban flood forecasts. Reliable inter-satellite communication within a fast, dynamic constellation topology is modeled in the physical, access control and network layer. We apply the framework on a representative 24-satellite constellation observing 5 global regions. Results show appropriately low latency in information exchange (average within 1/3rd available time for implicit consensus), enabling the onboard scheduler to observe ~7% more flood magnitude than a ground-based implementation. Both onboard and offline versions performed ~98% better than constellations without agility.
Systems and Control 36
☆ Low-Cost Infrastructure-Free 3D Relative Localization with Sub-Meter Accuracy in Near Field
Relative localization in the near-field scenario is critically important for unmanned vehicle (UxV) applications. Although related works addressing 2D relative localization problem have been widely studied for unmanned ground vehicles (UGVs), the problem in 3D scenarios for unmanned aerial vehicles (UAVs) involves more uncertainties and remains to be investigated. Inspired by the phenomenon that animals can achieve swarm behaviors solely based on individual perception of relative information, this study proposes an infrastructure-free 3D relative localization framework that relies exclusively on onboard ultra-wideband (UWB) sensors. Leveraging 2D relative positioning research, we conducted feasibility analysis, system modeling, simulations, performance evaluation, and field tests using UWB sensors. The key contributions of this work include: derivation of the Cram\'er-Rao lower bound (CRLB) and geometric dilution of precision (GDOP) for near-field scenarios; development of two localization algorithms -- one based on Euclidean distance matrix (EDM) and another employing maximum likelihood estimation (MLE); comprehensive performance comparison and computational complexity analysis against state-of-the-art methods; simulation studies and field experiments; a novel sensor deployment strategy inspired by animal behavior, enabling single-sensor implementation within the proposed framework for UxV applications. The theoretical, simulation, and experimental results demonstrate strong generalizability to other 3D near-field localization tasks, with significant potential for a cost-effective cross-platform UxV collaborative system.
☆ Simulation of a closed-loop dc-dc converter using a physics-informed neural network-based model
The growing reliance on power electronics introduces new challenges requiring detailed time-domain analyses with fast and accurate circuit simulation tools. Currently, commercial time-domain simulation software are mainly relying on physics-based methods to simulate power electronics. Recent work showed that data-driven and physics-informed learning methods can increase simulation speed with limited compromise on accuracy, but many challenges remain before deployment in commercial tools can be possible. In this paper, we propose a physics-informed bidirectional long-short term memory neural network (BiLSTM-PINN) model to simulate the time-domain response of a closed-loop dc-dc boost converter for various operating points, parameters, and perturbations. A physics-informed fully-connected neural network (FCNN) and a BiLSTM are also trained to establish a comparison. The three methods are then compared using step-response tests to assess their performance and limitations in terms of accuracy. The results show that the BiLSTM-PINN and BiLSTM models outperform the FCNN model by more than 9 and 4.5 times, respectively, in terms of median RMSE. Their standard deviation values are more than 2.6 and 1.7 smaller than the FCNN's, making them also more consistent. Those results illustrate that the proposed BiLSTM-PINN is a potential alternative to other physics-based or data-driven methods for power electronics simulations.
comment: 8 pages, 6 figures, Paper submitted to the International Conference on Power Systems Transients (IPST2025) in Guadalajara, Mexico, June 8-12, 2025
☆ Model Reduction of Homogeneous Polynomial Dynamical Systems via Tensor Decomposition
Model reduction plays a critical role in system control, with established methods such as balanced truncation widely used for linear systems. However, extending these methods to nonlinear settings, particularly polynomial dynamical systems that are often used to model higher-order interactions in physics, biology, and ecology, remains a significant challenge. In this article, we develop a novel model reduction method for homogeneous polynomial dynamical systems (HPDSs) with linear input and output grounded in tensor decomposition. Leveraging the inherent tensor structure of HPDSs, we construct reduced models by extracting dominant mode subspaces via higher-order singular value decomposition. Notably, we establish that key system-theoretic properties, including stability, controllability, and observability, are preserved in the reduced model. We demonstrate the effectiveness of our method using numerical examples.
☆ AgenticControl: An Automated Control Design Framework Using Large Language Models
Traditional control system design, reliant on expert knowledge and precise models, struggles with complex, nonlinear, or uncertain dynamics. This paper introduces AgenticControl, a novel multi-agent framework that automates controller design using coordinated Large Language Model (LLM) agents. Through structured JSON communication, these agents handle tasks including controller selection, scenario design, parameter optimization, performance evaluation, and decision-making. Through an actor-critic optimization approach, the system iteratively improves performance while progressing through scenarios of increasing complexity to ensure robustness under nominal conditions, measurement noise, actuator disturbances, and parametric uncertainties. Key innovations include structured multi-agent collaboration, robust optimization mechanisms, and real-time adaptability via in-context learning. Validated across four diverse control systems, namely, DC Motor Position control, Ball and Beam, Inverted Pendulum, and Double Inverted Pendulum, the framework achieves competitive performance against classical methods. Its Full State Feedback solution closely matches Linear Quadratic Regulator (LQR) results, while the designed PID controller significantly outperforming MATLAB's PIDTuner, reducing PID tracking error by 55% through adaptive parameter exploration. A comparative study of five LLM models reveals distinct optimization profiles, with DeepSeek achieving the fastest convergence. This work demonstrates the potential of LLM-driven control design, paving the way for advanced techniques like model predictive control and reinforcement learning.
☆ Looking for Signs: Reasoning About FOBNNs Using SAT
First-Order Boolean Networks with Non-deterministic updates (FOBNN) compute a boolean transition graph representing the absence and presence of species over time. The utility of FOBNNs has been justified by their theoretical soundness with respect to the Euler simulation of the differential equations. However, we lack practical means to work with FOBNNs and an empirical evaluation of their properties. We present a sound and efficient reduction of the first-order FOBNN transition relation to a propositional logic formula. This makes it possible to use modern SAT solvers to reason on the full transition graph, even for large models. We use this encoding to assess the feasibility and efficiency of practical reasoning with FOBNNs. To do so, we focus on the computation of fixed points. We also compare the transition graphs obtained via FOBNNs to those computed by the classic boolean semantics of reaction networks. Overall, our encoding opens new directions for the analysis of FOBNNs and deepens the understanding of their relationship with reaction networks.
comment: Author version, accepted at CMSB25
☆ Optimal Design of Experiment for Electrochemical Parameter Identification of Li-ion Battery via Deep Reinforcement Learning
Accurate parameter estimation in electrochemical battery models is essential for monitoring and assessing the performance of lithium-ion batteries (LiBs). This paper presents a novel approach that combines deep reinforcement learning (DRL) with an optimal experimental design (OED) framework to identify key electrochemical parameters of LiB cell models. The proposed method utilizes the twin delayed deep deterministic policy gradient (TD3) algorithm to optimize input excitation, thereby increasing the sensitivity of the system response to electrochemical parameters. The performance of this DRL-based approach is evaluated against a nonlinear model predictive control (NMPC) method and conventional tests. Results indicate that the DRL-based method provides superior information content, reflected in higher Fisher information (FI) values and lower parameter estimation errors compared to the NMPC design and conventional test practices. Additionally, the DRL approach offers a substantial reduction in experimental time and computational resources.
☆ Model Reference Adaptive Control of Networked Systems with State and Input Delays
Adaptive control strategies have progressively advanced to accommodate increasingly uncertain, delayed, and interconnected systems. This paper addresses the model reference adaptive control (MRAC) of networked, heterogeneous, and unknown dynamical agents subject to both state and input delays. The objective is to ensure that all follower agents asymptotically track the trajectory of a stable leader system, despite system uncertainties and communication constraints. Two communication topologies are considered, full connectivity between each agent and the leader, and partial connectivity wherein agents rely on both neighboring peers and the leader. The agent-to-agent and agent-to-leader interactions are encoded using a Laplacian-like matrix and a diagonal model-weighting matrix, respectively. To compensate for the delays, a predictor-based control structure and an auxiliary dynamic system are proposed. The control framework includes distributed adaptive parameter laws derived via Lyapunov-based analysis, ensuring convergence of the augmented tracking error. Stability conditions are established through a carefully constructed Lyapunov Krasovskii functional, under minimal assumptions on connectivity and excitation. Numerical simulations of both network structures validate the proposed method, demonstrating that exact leader tracking is achieved under appropriately designed learning rates and initializations. This work lays a foundation for future studies on fault-resilient distributed adaptive control incorporating data-driven or reinforcement learning techniques.
comment: This is the extended version of http://doi.org/10.11591/ijece.v14i5.pp5055-5063
☆ Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments
We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders' values vary over time as interactions between the seller and the bidders progress. We model the sequential auctions as an infinite-horizon average-reward Markov decision process (MDP), where the transition kernel and reward functions are unknown to the seller. In each round, the seller determines an allocation and a payment for each bidder. Each bidder receives a private reward and submits a sealed bid to the seller. The state, which represents the underlying market, evolves according to an unknown transition kernel and the seller's allocation policy. Unlike existing works that formulate the problem as a multi-armed bandit model or as an episodic MDP, where the environment resets to an initial state after each round or episode, our paper considers a more realistic and sophisticated setting in which the market continues to evolve without restarting. We first extend the Vickrey-Clarke-Groves (VCG) mechanism, which is known to be efficient, truthful, and individually rational for one-shot static auctions, to sequential auctions, thereby obtaining a dynamic VCG mechanism counterpart that preserves these desired properties. We then focus on the online setting and develop an online reinforcement learning algorithm for the seller to learn the underlying MDP model and implement a mechanism that closely resembles the dynamic VCG mechanism. We show that the learned online mechanism asymptotically converges to a dynamic mechanism that approximately satisfies efficiency, truthfulness, and individual rationality with arbitrarily high probability and achieves guaranteed performance in terms of various notions of regret.
comment: 16 pages
☆ Steering Conceptual Bias via Transformer Latent-Subspace Activation
This work examines whether activating latent subspaces in language models (LLMs) can steer scientific code generation toward a specific programming language. Five causal LLMs were first evaluated on scientific coding prompts to quantify their baseline bias among four programming languages. A static neuron-attribution method, perturbing the highest activated MLP weight for a C++ or CPP token, proved brittle and exhibited limited generalization across prompt styles and model scales. To address these limitations, a gradient-refined adaptive activation steering framework (G-ACT) was developed: per-prompt activation differences are clustered into a small set of steering directions, and lightweight per-layer probes are trained and refined online to select the appropriate steering vector. In LLaMA-3.2 3B, this approach reliably biases generation towards the CPP language by increasing the average probe classification accuracy by 15% and the early layers (0-6) improving the probe classification accuracy by 61.5% compared to the standard ACT framework. For LLaMA-3.3 70B, where attention-head signals become more diffuse, targeted injections at key layers still improve language selection. Although per-layer probing introduces a modest inference overhead, it remains practical by steering only a subset of layers and enables reproducible model behavior. These results demonstrate a scalable, interpretable and efficient mechanism for concept-level control for practical agentic systems.
☆ FICA: Faster Inner Convex Approximation of Chance Constrained Grid Dispatch with Decision-Coupled Uncertainty
This paper proposes a Faster Inner Convex Approximation (FICA) method for solving power system dispatch problems with Wasserstein distributionally robust joint chance constraints (WJCC) and incorporating the modelling of the automatic generation control factors. The problem studied belongs to the computationally challenging class of WJCC with left-hand-side uncertainty (LHS-WJCC). By exploiting the special one-dimensional structure (even if only partially present) of the problem, the proposed FICA incorporates a set of strong valid inequalities to accelerate the solution process. We prove that FICA achieves the same optimality as the well-known conditional value-at-risk (CVaR) inner convex approximation method. Our numerical experiments demonstrate that the proposed FICA can yield 40x computational speedup compared to CVaR, and can even reach up to 500x speedup when the optimisation horizon exceeds 16 time steps. This speedup is achieved when only 50% of constraints in a WJCC have the one-dimensional structure. The approximation quality is numerically verified to be the same as CVaR, and the quality gap is below 1% when compared to the computationally demanding exact reformulation of the LHS-WJCC in most cases. We also discuss the applications of FICA in optimisation problems from other domains that (partially) exhibit the one-dimensional structure.
comment: 10 pages, in review for IEEE Transactions on Power Systems
☆ Spectrum Opportunities for the Wireless Future: From Direct-to-Device Satellite Applications to 6G Cellular
For the next-generation wireless networks and beyond, both the upper mid-band (7 GHz-24 GHz) and terahertz (100 GHz-1 THz) spectra are gaining global attention from service providers, academic research groups, policy makers, and standards organizations. This article provides an in-depth analysis of recent regulatory rulings and spectrum preferences issued by international standard bodies such as the International Telecommunications Union and Federal Communications Commission as they seek to identify feasible bands for future wireless networks. In this paper, we present the promising spectrum allocations earmarked for 6G and beyond. We also provide exemplars that illuminate the passive service protections and spectrum feasibility for coexistence between terrestrial wireless networks and satellites and other non-terrestrial networks (NTN), and discuss key technical constraints that will challenge future spectrum use for the wireless industry. The findings highlight promising frequency bands while addressing regulatory and technological challenges for future wireless service deployment.
☆ Hybrid Single-Pulse and Sawyer-Tower Method for Accurate Transistor Loss Separation in High-Frequency High-Efficiency Power Converters
Accurate measurement of transistor parasitic capacitance and its associated energy losses is critical for evaluating device performance, particularly in high-frequency and high-efficiency power conversion systems. This paper proposes a hybrid single-pulse and Sawyer-Tower test method to analyse switching characteristics of field-effect transistors (FET), which not only eliminates overlap losses but also mitigates the effects of current backflow observed in traditional double-pulse testing. Through a precise loss separation model, it enables an accurate quantification of switching losses and provides a refined understanding of device energy dissipation mechanisms. We validate the hysteresis data and loss separation results through experimental measurements on a 350-W LLC converter, which further offers deeper insights into transistor dynamic behaviour and its dependence on operating conditions. This method is applicable to a wide range of transistors, including emerging SiC and GaN devices, and serves as a valuable tool for device characterization and optimization in power electronics.
comment: 5 pages, 8 figures
☆ How flexible do we need to be? Using electricity systems models to identify optimal designs for flexible carbon capture storage system for gas-fired power plants
As the share of variable renewable energy in power systems grows, enhancing the operational flexibility of combined cycle gas turbines with carbon capture and storage (CCGT-CCS) becomes increasingly valuable. This study integrates techno-economic analysis with capacity expansion modeling to quantify the value of improved CCGT-CCS flexibility-such as lower start-up costs, reduced minimum generation, faster ramping, and shorter up/down times-at both plant and system levels. Using the Texas power system as a case study, we find that increased flexibility raises CCGT-CCS generation profits and installed capacity. Under various policy scenarios, CCGT-CCS benefits most from a CO2 tax (or equivalent emissions cap), more so than from clean energy standards or capture subsidies like the federal 45Q tax credit. However, electricity system cost savings remain modest, reducing total costs by only 0.3-0.5%. Thus, flexibility improvements should be pursued only if they entail limited increases in capital and maintenance costs.
☆ New Power Decoupling Method for Grid Forming Inverter Based on Adaptive Virtual-Synchronous Machine in Weak Grids
Many countries' policies have shifted rapidly towards using renewable energy for climate reasons. As a result, inverter-based resources are beginning to dominate power systems. Key elements for managing the loss of conventional generators are virtual-synchronous-generator-based grid-forming inverters. Despite the unique advantages of this technology, there are still various challenges, most notably the problem of active-reactive power coupling due to a nonzero power angle and high grid impedance ratio R/X. The effect of power coupling means that any change in the inverter's active power will affect the reactive power and vice versa. This challenge results in grid instability, reduces control performance, and restricts the active power delivery capability of the inverter to the grid. This paper presents a new vision to solve this impact in weak grids by a new power-decoupling method based on adaptive virtual synchronous generator parameters. The power coupling will be studied considering the parameters causing this effect. Fuzzy logic will serve to adjust the parameters of power control loops. Hardware-in-the-loop testing on a real-time simulator (OP4610) and a physical microcontroller verified and validated the proposed method. The results showed the proposed method's effectiveness in eliminating static and dynamic power coupling and improving the grid-forming inverter performance under different operating conditions.
comment: 10 pages, 18 figures
☆ Frequency Control in Microgrids: An Adaptive Fuzzy-Neural-Network Virtual Synchronous Generator
The reliance on distributed renewable energy has increased recently. As a result, power electronic-based distributed generators replaced synchronous generators which led to a change in the dynamic characteristics of the microgrid. Most critically, they reduced system inertia and damping. Virtual synchronous generators emulated in power electronics, which mimic the dynamic behaviour of synchronous generators, are meant to fix this problem. However, fixed virtual synchronous generator parameters cannot guarantee a frequency regulation within the acceptable tolerance range. Conversely, a dynamic adjustment of these virtual parameters promises robust solution with stable frequency. This paper proposes a method to adapt the inertia, damping, and droop parameters dynamically through a fuzzy neural network controller. This controller trains itself online to choose appropriate values for these virtual parameters. The proposed method can be applied to a typical AC microgrid by considering the penetration and impact of renewable energy sources. We study the system in a MATLAB/Simulink model and validate it experimentally in real time using hardware-in-the-loop based on an embedded ARM system (SAM3X8E, Cortex-M3). Compared to traditional and fuzzy logic controller methods, the results demonstrate that the proposed method significantly reduces the frequency deviation to less than 0.03 Hz and shortens the stabilizing/recovery time.
comment: 11 pages, 17 figures
☆ A detailed simulation model for fifth generation district heating and cooling networks with seasonal latent storage evaluated on field data
Fifth generation district heating and cooling (5GDHC) networks accelerate the use of renewable energies in the heating sector and enable flexible, efficient and future-proof heating and cooling supply via a single network. Due to their low temperature level and high integration of renewables, 5GDHC systems pose new challenges for the modeling of these networks in order to simulate and test operational strategies. A particular feature is the use of uninsulated pipes, which allow energy exchange with the surrounding ground. Accurate modeling of this interaction is essential for reliable simulation and optimization. This paper presents a thermp-physical model of the pip connections, the surrounding soil, a latent heat storage in the form of an ice storage as a seasonal heat storage and the house transfer stations. The model is derived from mass and energy balances leading to ordinary differential equations (ODEs). Validation is performed using field date from the 5GDHC network in Gutach-Bleibach, Germany, which supplies heating and cooling to 30 modern buildings. With an average model deviation of 4.5 % in the normalized mean bias error (NMBE) and 15.9 % in the coefficient of the variation of the root mean square error (CVRMSE), the model's accuracy is validated against the available temperature measurements. The realistic representation of the thermal-hydraulic interactions between soil and pipes, as well as the heat flow within the network, confirms the accuracy of the model and its applicability for the simulation of 5GDHC systems. The model is made openly accessible under an open-source license.
☆ Discrete-Time Linear Dynamical System Control Using Sparse Inputs With Time-Varying Support
In networked control systems, communication resource constraints often necessitate the use of \emph{sparse} control input vectors. A prototypical problem is how to ensure controllability of a linear dynamical system when only a limited number of actuators (inputs) can be active at each time step. In this work, we first present an algorithm for determining the \emph{sparse actuator schedule}, i.e., the sequence of supports of the input vectors that ensures controllability. Next, we extend the algorithm to minimize the average control energy by simultaneously minimizing the trace of the controllability Gramian, under the sparsity constraints. We derive theoretical guarantees for both algorithms: the first algorithm ensures controllability with a minimal number of control inputs at a given sparsity level; for the second algorithm, we derive an upper bound on the average control energy under the resulting actuator schedule. Finally, we develop a novel sparse controller based on Kalman filtering and sparse signal recovery that drives the system to a desired state in the presence of process and measurement noise. We also derive an upper bound on the steady-state MSE attained by the algorithm. We corroborate our theoretical results using numerical simulations and illustrate that sparse control achieves a control performance comparable to the fully actuated systems.
comment: 12 pages, 8 figures, 1 table, journal
☆ Networked pointing system: Bearing-only target localization and pointing control
In the paper, we formulate the target-pointing consensus problem where the headings of agents are required to point at a common target. Only a few agents in the network can measure the bearing information of the target. A two-step solution consisting of a bearing-only estimator for target localization and a control law for target pointing is constructed to address this problem. Compared to the strong assumptions of existing works, we only require two agents not collinear with the target to ensure localizability. By introducing the concept of virtual fusion node, we prove that both the estimation error and the tracking error converge asymptotically to the origin. The video demonstration of the verification can be found at https://youtu.be/S9- eyofk1DY.
comment: IFAC Conference on Networked Systems, 2025
☆ Receding Horizon Recursive Location Estimation
This paper presents a recursive solution to the receding or moving horizon estimation (MHE) problem for nonlinear time-variant systems. We provide the conditions under which the recursive MHE is equivalent to the extended Kalman filter (EKF), regardless of the horizon size. Theoretical and empirical evidence is also provided. Moreover, we clarify the connection between MHE and factor graph optimization (FGO). We apply the recursive MHE to GNSS localization and evaluate its performance using publicly available datasets. The paper is based on the deterministic least squares framework.
☆ Aperiodic-sampled neural network controllers with closed-loop stability verifications (extended version)
In this paper, we synthesize two aperiodic-sampled deep neural network (DNN) control schemes, based on the closed-loop tracking stability guarantees. By means of the integral quadratic constraint coping with the input-output behaviour of system uncertainties/nonlinearities and the convex relaxations of nonlinear DNN activations leveraging their local sector-bounded attributes, we establish conditions to design the event- and self-triggered logics and to compute the ellipsoidal inner approximations of region of attraction, respectively. Finally, we perform a numerical example of an inverted pendulum to illustrate the effectiveness of the proposed aperiodic-sampled DNN control schemes.
comment: 17 pages, 10 figures
☆ Physics-Informed Neural Networks for Nonlocal Flow Modeling of Connected Automated Vehicles
Connected automated vehicles (CAVs) cruising control strategies have been extensively studied at the microscopic level. CAV controllers sense and react to traffic both upstream and downstream, yet most macroscopic models still assume locality, where the desired speed only depends on local density. The nonlocal macroscopic traffic flow models that explicitly capture the ``look ahead'' and ``look behind'' nonlocal CAV dynamics remain underexplored. In this paper, we propose a Physics-informed Neural Network framework to directly learn a macroscopic non-local flow model from a generic looking-ahead looking-behind vehicle motion model, which bridges the micro-macro modeling gap. We reconstruct macroscopic traffic states from synthetic CAV trajectories generated by the proposed microscopic control designs, and then learn a non-local traffic flow model that embeds a non-local conservation law to capture the resulting look-ahead look-behind dynamics. To analyze how CAV control parameters affect nonlocal traffic flow, we conduct high-fidelity driving simulator experiments to collect human drivers' trajectory data with varying downstream and upstream visibility, which serves as a baseline for tuning CAV control gains. Our analysis validates that the learned non-local flow model predicts CAV traffic dynamics more accurately than local models, and the fundamental diagram exhibits far less scatter in the speed - density relation. We further show that the looking-ahead/looking-behind control gains mainly reshape the non-local kernels, while the macroscopic speed and non-local density relation mainly depends on the desired speed function choice of the CAV controller. Our results provide a systematic approach for learning non-local macroscopic traffic-flow models directly from generic CAV control designs.
☆ Dynamic Hybrid Modeling: Incremental Identification and Model Predictive Control
Mathematical models are crucial for optimizing and controlling chemical processes, yet they often face significant limitations in terms of computational time, algorithm complexity, and development costs. Hybrid models, which combine mechanistic models with data-driven models (i.e. models derived via the application of machine learning to experimental data), have emerged as a promising solution to these challenges. However, the identification of dynamic hybrid models remains difficult due to the need to integrate data-driven models within mechanistic model structures. We present an incremental identification approach for dynamic hybrid models that decouples the mechanistic and data-driven components to overcome computational and conceptual difficulties. Our methodology comprises four key steps: (1) regularized dynamic parameter estimation to determine optimal time profiles for flux variables, (2) correlation analysis to evaluate relationships between variables, (3) data-driven model identification using advanced machine learning techniques, and (4) hybrid model integration to combine the mechanistic and data-driven components. This approach facilitates early evaluation of model structure suitability, accelerates the development of hybrid models, and allows for independent identification of data-driven components. Three case studies are presented to illustrate the robustness, reliability, and efficiency of our incremental approach in handling complex systems and scenarios with limited data.
comment: 18 pages, 10 Figures
☆ A Computationally Efficient Method for Solving Mixed-Integer AC Optimal Power Flow Problems
Stepwise controllable devices, such as switched capacitors or stepwise controllable loads and generators, transform the nonconvex AC optimal power flow (AC-OPF) problem into a nonconvex mixed-integer (MI) programming problem which is generally hard to solve optimally. Existing methods for solving MI-AC-OPF problems usually suffer from either limited accuracy or computational intractability, making them impractical for real-world applications. To address these challenges, we propose an efficient iterative deflation approach providing high-quality approximate solutions. In each iteration, a continuously relaxed version of the MI-AC-OPF problem is solved and one candidate integer value is systematically eliminated based on the evaluation of a simple power flow result. The computational complexity of the proposed algorithm grows linearly with the number of integer optimization variables, ensuring scalability. Simulations demonstrate that the proposed approach achieves significant improvements in solution accuracy compared to a state-of-the-art approach. Thus, the proposed method is promising for solving practical MI-AC-OPF problems.
♻ ☆ Adaptive Control of Dubins Vehicle in the Presence of Loss of Effectiveness (Extended Version)
The control of a Dubins Vehicle when subjected to a loss of control effectiveness in the turning rate is considered. A complex state-space representation is used to model the vehicle dynamics. An adaptive control design is proposed, with the underlying stability analysis guaranteeing closed-loop boundedness and tracking of a desired path. It is shown that a path constructed by waypoints and a minimum turn radius can be specified using a reference model which can be followed by the closed loop system. The control design utilizes the complex state-space representation as well as a PID controller for the nominal closed-loop. How the design can be modified to ensure path following even in the presence input constraints is also discussed. Simulation studies are carried out to complement the theoretical derivations.
comment: Submitted and accepted to the 64th IEEE Conference on Decision and Control, 2025 and L-CSS 2025
♻ ☆ Linear and Nonlinear Ultra-Short Pulse Looped Antennas: Radiation and Parametric Oscillations
Modern optical systems send and receive ultra-short temporal pulses (USP). While ultra-broad band antennas do exist in the microwave region (e.g., log-periodic antennas), their short temporal response is typically limited by the antenna's large dispersion, hence, resulting in a substantial pulse broadening. The issue becomes more severe when one considers both the transmitted and received pulses. Through simulations and experiments one can show that properly designed loop antennas, either thick loops or 3-loop antennas, exhibit USP attributes, 280 ps upon transmission and 380 ps upon reception (or, an overall equivalent coherent channel exceeding 2.5 GHz). Finally, most parametric amplifiers are narrow band and one may ask if a broadband amplification is possible. A loop inside a loop system, coupled by a nonlinear impedance element exhibits a line narrowing and signal amplification with large bandwidth, which is inversely scalable with the loops' diameter. In all, these elements could be advantageous for applications such as ultra-wide bandwidth communication and non-linear quantum information systems.
comment: 8 pages 9 figures
♻ ☆ PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions
A sustainable electricity infrastructure requires the explicit integration of carbon emissions into power system modeling and optimization paradigms. However, existing open-source datasets for power system R&D lack generator-level carbon emission profiling, limiting the ability to benchmark and compare various carbon-aware grid operational strategies. To address this gap, this work introduces PGLib-CO2, an open-source extension to the widely adopted PGLib-OPF test case library. PGLib-CO2 enriches standard network cases with CO2 and CO2-equivalent emission intensity factors by expanding the fuel-type categorization used by PGLib-OPF, attaining a realistic generator-level carbon profiling. It is also packaged for both Python's pandapower and Julia's PowerModels.jl, for a seamless, user-friendly integration of emission modeling into grid computation and optimization tasks. The dataset produced by PGLib-CO2 can support grid-based carbon accounting, emission metric evaluation, and integration into AC optimal power flow (OPF) and optimal load shifting (OLS) formulations. We demonstrate PGLib-CO2's utility through case studies that quantify cost-emission trade-offs and optimize a carbon-aware objective function. By standardizing carbon-enhanced test cases, PGLib-CO2 provides an open-source, reproducible foundation for benchmarking carbon-aware computation, facilitating future research in sustainable power system operation.
♻ ☆ Agile, Autonomous Spacecraft Constellations with Disruption Tolerant Networking to Monitor Precipitation and Urban Floods
Fully re-orientable small spacecraft are now supported by commercial technologies, allowing them to point their instruments in any direction and capture images, with short notice. When combined with improved onboard processing, and implemented on a constellation of inter-communicable satellites, this intelligent agility can significantly increase responsiveness to transient or evolving phenomena. We demonstrate a ground-based and onboard algorithmic framework that combines orbital mechanics, attitude control, inter-satellite communication, intelligent prediction and planning to schedule the time-varying, re-orientation of agile, small satellites in a constellation. Planner intelligence is improved by updating the predictive value of future space-time observations based on shared observations of evolving episodic precipitation and urban flood forecasts. Reliable inter-satellite communication within a fast, dynamic constellation topology is modeled in the physical, access control and network layer. We apply the framework on a representative 24-satellite constellation observing 5 global regions. Results show appropriately low latency in information exchange (average within 1/3rd available time for implicit consensus), enabling the onboard scheduler to observe ~7% more flood magnitude than a ground-based implementation. Both onboard and offline versions performed ~98% better than constellations without agility.
♻ ☆ A Spatial-Domain Coordinated Control Method for CAVs at Unsignalized Intersections Considering Motion Uncertainty
Coordinated control of connected and automated vehicles (CAVs) emerges as a promising technology to improve traffic safety, efficiency, and sustainability. Meanwhile, mixed traffic, where CAVs coexist with conventional human-driven vehicles (HDVs), represents an upcoming and necessary stage in the development of intelligent transportation systems. Considering the motion uncertainty of HDVs, this paper proposes a coordinated control method for trajectory planning of CAVs at an unsignalized intersection in mixed traffic. By sampling in distance and using an exact change of variables, the coordinated control problem is formulated in the spatial domain as a nonlinear program, thereby allowing for unified linear collision avoidance constraints to handle vehicle crossing, following, merging, and diverging conflicts. The motion uncertainty of HDVs is decoupled and modeled as path uncertainty and speed uncertainty, whereby the robustness of collision avoidance is ensured in both spatial and temporal dimensions. The prediction deviation for HDVs is compensated by receding horizon optimization, and a real-time iteration (RTI) scheme is developed to improve computational efficiency. Simulation case studies are conducted to validate the efficacy, robustness, and potential for real-time application of the proposed methods. The results show that the proposed control scheme provides collision-free and smooth trajectories with state and control constraints satisfied. Compared with the converged baseline, the RTI scheme reduces the computation time by orders of magnitude, and the solution deviation is less than 2.3%, demonstrating a favorable trade-off between computational effort and optimality.
comment: 15 pages, 15 figures
♻ ☆ Reliable Vertical Federated Learning in 5G Core Network Architecture
This work proposes a new algorithm to mitigate model generalization loss in Vertical Federated Learning (VFL) operating under client reliability constraints within 5G Core Networks (CNs). Recently studied and endorsed by 3GPP, VFL enables collaborative and load-balanced model training and inference across the CN. However, the performance of VFL significantly degrades when the Network Data Analytics Functions (NWDAFs) - which serve as primary clients for VFL model training and inference - experience reliability issues stemming from resource constraints and operational overhead. Unlike edge environments, CN environments adopt fundamentally different data management strategies, characterized by more centralized data orchestration capabilities. This presents opportunities to implement better distributed solutions that take full advantage of the CN data handling flexibility. Leveraging this flexibility, we propose a method that optimizes the vertical feature split among clients while centrally defining their local models based on reliability metrics. Our empirical evaluation demonstrates the effectiveness of our proposed algorithm, showing improved performance over traditional baseline methods.
comment: Globecom Submission
♻ ☆ Joint Power and Spectrum Orchestration for D2D Semantic Communication Underlying Energy-Efficient Cellular Networks
Semantic communication (SemCom) has been recently deemed a promising next-generation wireless technique to enable efficient spectrum savings and information exchanges, thus naturally introducing a novel and practical network paradigm where cellular and device-to-device (D2D) SemCom approaches coexist. Nevertheless, the involved wireless resource management becomes complicated and challenging due to the unique semantic performance measurements and energy-consuming semantic coding mechanism. To this end, this paper jointly investigates power control and spectrum reuse problems for energy-efficient D2D SemCom cellular networks. Concretely, we first model the user preference-aware semantic triplet transmission and leverage a novel metric of semantic value to identify the semantic information importance conveyed in SemCom. Then, we define the additional power consumption from semantic encoding in conjunction with basic power amplifier dissipation to derive the overall system energy efficiency (semantics/Joule). Next, we formulate an energy efficiency maximization problem for joint power and spectrum allocation subject to several SemCom-related and practical constraints. Afterward, we propose an optimal resource management solution by employing the fractional-to-subtractive problem transformation and decomposition while developing a three-stage method with theoretical analysis of its optimality guarantee and computational complexity. Numerical results demonstrate the adequate performance superiority of our proposed solution compared with different benchmarks.
comment: This paper has been submitted to IEEE Trans. on Wireless Communications for the second round of peer review after major revisions
♻ ☆ Multi-Task Lifelong Reinforcement Learning for Wireless Sensor Networks
Enhancing the sustainability and efficiency of wireless sensor networks (WSN) in dynamic and unpredictable environments requires adaptive communication and energy harvesting strategies. We propose a novel adaptive control strategy for WSNs that optimizes data transmission and EH to minimize overall energy consumption while ensuring queue stability and energy storing constraints under dynamic environmental conditions. The notion of adaptability therein is achieved by transferring the known environment-specific knowledge to new conditions resorting to the lifelong reinforcement learning concepts. We evaluate our proposed method against two baseline frameworks: Lyapunov-based optimization, and policy-gradient reinforcement learning (RL). Simulation results demonstrate that our approach rapidly adapts to changing environmental conditions by leveraging transferable knowledge, achieving near-optimal performance approximately $30\%$ faster than the RL method and $60\%$ faster than the Lyapunov-based approach. The implementation is available at our GitHub repository for reproducibility purposes [1].
♻ ☆ Autocratic strategies in Cournot oligopoly game
An oligopoly is a market in which the price of a goods is controlled by a few firms. Cournot introduced the simplest game-theoretic model of oligopoly, where profit-maximizing behavior of each firm results in market failure. Furthermore, when the Cournot oligopoly game is infinitely repeated, firms can tacitly collude to monopolize the market. Such tacit collusion is realized by the same mechanism as direct reciprocity in the repeated prisoner's dilemma game, where mutual cooperation can be realized whereas defection is favorable for both prisoners in one-shot game. Recently, in the repeated prisoner's dilemma game, a class of strategies called zero-determinant strategies attracts much attention in the context of direct reciprocity. Zero-determinant strategies are autocratic strategies which unilaterally control payoffs of players. There were many attempts to find zero-determinant strategies in other games and to extend them so as to apply them to broader situations. In this paper, first, we show that zero-determinant strategies exist even in the repeated Cournot oligopoly game. Especially, we prove that an averagely unbeatable zero-determinant strategy exists, which is guaranteed to obtain the average payoff of the opponents. Second, we numerically show that the averagely unbeatable zero-determinant strategy can be used to promote collusion when it is used against an adaptively learning player, whereas it cannot promote collusion when it is used against two adaptively learning players. Our findings elucidate some negative impact of zero-determinant strategies in oligopoly market.
comment: 25 pages, 8 figures
♻ ☆ Enhanced Rapid Detection of High-impedance Arc Faults in Medium Voltage Electrical Distribution Networks
High-impedance arc faults in AC power systems have the potential to lead to catastrophic accidents. However, significant challenges exist in identifying these faults because of the much weaker characteristics and variety when grounded with different surfaces. Previous research has concentrated predominantly on arc fault detection in low-voltage systems, leaving a significant gap in medium-voltage applications. In this work, a novel approach has been developed that enables rapid arc fault detection for medium-voltage distribution lines. In contrast to existing black-box feature-based approaches, the Hankel alternative view of the Koopman (HAVOK) analysis developed from nonlinear dynamics has been applied, which not only offers interpretable features but also opens up new application options in the area of arc fault detection. The method achieves a much faster detection speed in 0.45 ms, 99.36\% enhanced compared to harmonic randomness and waveform distortion method, thus making it suitable for real-time applications. It demonstrates the ability to detect arc faults across various scenarios, including different grounding surfaces and levels of system noise, boosting its practical importance for stakeholders in safety-critical industries.
♻ ☆ IDCAIS: Inter-Defender Collision-Aware Interception Strategy against Multiple Attackers
In the prior literature on multi-agent area defense games, the assignments of the defenders to the attackers are done based on a cost metric associated only with the interception of the attackers. In contrast to that, this paper presents an Inter-Defender Collision-Aware Interception Strategy (IDCAIS) for defenders to intercept attackers in order to defend a protected area, such that the defender-to-attacker assignment protocol not only takes into account an interception-related cost but also takes into account any possible future collisions among the defenders on their optimal interception trajectories. In particular, in this paper, the defenders are assigned to intercept attackers using a mixed-integer quadratic program (MIQP) that: 1) minimizes the sum of times taken by defenders to capture the attackers under time-optimal control, as well as 2) helps eliminate or delay possible future collisions among the defenders on the optimal trajectories. To prevent inevitable collisions on optimal trajectories or collisions arising due to time-sub-optimal behavior by the attackers, a minimally augmented control using exponential control barrier function (ECBF) is also provided. Simulations show the efficacy of the approach.
comment: 14 pages, 12 figures
♻ ☆ Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications
Recent studies highlight the importance of persistently exciting condition in single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for various types of multi-signal data, each augmented by weight factors. We explore the interrelations between these conditions and their rank properties in linear time-invariant systems. Furthermore, we introduce open-loop experimental design methods tailored to each of the three conditions, which can synthesize the required excitation conditions either offline or online, even in the presence of limited information within each signal segment. We demonstrate the effectiveness of these informativity conditions in least-squares identification. Additionally, all three conditions can extend Willems' fundamental lemma and are utilized to assess the properties of the system. Illustrative examples confirm that these conditions yield satisfactory outcomes in both least-squares identification and the construction of data-driven controllers.
♻ ☆ Online energy management system for a fuel cell/battery hybrid system with multiple fuel cell stacks
Fuel cell (FC)/battery hybrid systems have attracted substantial attention for achieving zero-emissions buses, trucks, ships, and planes. An online energy management system (EMS) is essential for these hybrid systems, it controls energy flow and ensures optimal system performance. Key aspects include fuel efficiency and mitigating FC and battery degradation. This paper proposes a health-aware EMS for FC and battery hybrid systems with multiple FC stacks. The proposed EMS employs mixed integer quadratic programming (MIQP) to control each FC stack in the hybrid system independently, i.e., MIQP-based individual stack control (ISC), with significant fuel cost reductions, FC and battery degradations. The proposed method is compared with classical dynamic programming (DP), with a 2243 times faster computational speed than the DP method while maintaining nearoptimal performance. The case study results show that ISC achieves a 64.68 % total cost reduction compared to CSC in the examined scenario, with substantial reductions across key metrics including battery degradation (4 %), hydrogen fuel consumption (22 %), fuel cell idling loss (99 %), and fuel cell load-change loss (41 %)
Optimization and Control 35
☆ Relative Explanations for Contextual Problems with Endogenous Uncertainty: An Application to Competitive Facility Location
In this paper, we consider contextual stochastic optimization problems subject to endogenous uncertainty, where the decisions affect the underlying distributions. To implement such decisions in practice, it is crucial to ensure that their outcomes are interpretable and trustworthy. To this end, we compute relative counterfactual explanations, providing practitioners with concrete changes in the contextual covariates required for a solution to satisfy specific constraints. Whereas relative explanations have been introduced in prior literature, to the best of our knowledge, this is the first work focused on problems with binary decision variables and subject to endogenous uncertainty. We propose a methodology that uses Wasserstein distance as regularization and to compute a lower bound. It leads to a drastic reduction in computation times, compared to the unregularized counterpart. We illustrate the method using a choice-based competitive facility location problem, and present numerical experiments that demonstrate its ability to efficiently compute sparse and interpretable explanations.
☆ Global regularity of the value function in a stopper vs. singular-controller game
We study a class of zero-sum stochastic games between a stopper and a singular-controller, previously considered in [Bovo and De Angelis (2025)]. The underlying singularly-controlled dynamics takes values in $\mathcal{O}\subseteq\mathbb{R}$. The problem is set on a finite time-horizon and is connected to a parabolic variational inequality of min-max type with spatial-derivative and obstacle constraints. We show that the value function of the problem is of class $C^1$ in the whole domain $[0,T)\times\mathcal{O}$ and that the second-order spatial derivative and the second-order mixed derivative are continuous everywhere except for a (potential) jump across a non-decreasing curve (the stopping boundary of the game). The latter discontinuity is a natural consequence of the partial differential equation associated to the problem. Beyond its intrinsic analytical value, such a regularity for the value function is a stepping stone for further exploring the structure and properties of the free-boundaries of the stochastic game, which in turn determine the optimal strategies of the players.
comment: 25 pages
☆ First-Order Sparse Convex Optimization: Better Rates with Sparse Updates
In was recently established that for convex optimization problems with a sparse optimal solution (may it be entry-wise sparsity or matrix rank-wise sparsity) it is possible to have linear convergence rates which depend on an improved mixed-norm condition number of the form $\frac{\beta_1{}s}{\alpha_2}$, where $\beta_1$ is the $\ell_1$-Lipchitz continuity constant of the gradient, $\alpha_2$ is the $\ell_2$-quadratic growth constant, and $s$ is the sparsity of the optimal solution. However, beyond the improved convergence rate, these methods are unable to leverage the sparsity of optimal solutions towards improving also the runtime of each iteration, which may still be prohibitively high for high-dimensional problems. In this work, we establish that linear convergence rates which depend on this improved condition number can be obtained using only sparse updates, which may result in overall significantly improved running times. Moreover, our methods are considerably easier to implement.
☆ Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments
We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders' values vary over time as interactions between the seller and the bidders progress. We model the sequential auctions as an infinite-horizon average-reward Markov decision process (MDP), where the transition kernel and reward functions are unknown to the seller. In each round, the seller determines an allocation and a payment for each bidder. Each bidder receives a private reward and submits a sealed bid to the seller. The state, which represents the underlying market, evolves according to an unknown transition kernel and the seller's allocation policy. Unlike existing works that formulate the problem as a multi-armed bandit model or as an episodic MDP, where the environment resets to an initial state after each round or episode, our paper considers a more realistic and sophisticated setting in which the market continues to evolve without restarting. We first extend the Vickrey-Clarke-Groves (VCG) mechanism, which is known to be efficient, truthful, and individually rational for one-shot static auctions, to sequential auctions, thereby obtaining a dynamic VCG mechanism counterpart that preserves these desired properties. We then focus on the online setting and develop an online reinforcement learning algorithm for the seller to learn the underlying MDP model and implement a mechanism that closely resembles the dynamic VCG mechanism. We show that the learned online mechanism asymptotically converges to a dynamic mechanism that approximately satisfies efficiency, truthfulness, and individual rationality with arbitrarily high probability and achieves guaranteed performance in terms of various notions of regret.
comment: 16 pages
☆ A comparison principle for variational problems : with an application to optimal transport
We study variational problems on Banach spaces which involve submodular energies. We extend the notion of exchangeability to this infinite dimensional setting and show that it is in duality with submodularity. These two notions allow us to derive comparison principle in an abstract fashion. We apply our results to the optimal transport and entropic optimal transport problems. We then derive comparison principles on the Kantorovich and Schr\"odinger potentials. We also prove comparison principles for the associated JKO schemes.
☆ Multi-Agent Online Control with Adversarial Disturbances
Multi-agent control problems involving a large number of agents with competing and time-varying objectives are increasingly prevalent in applications across robotics, economics, and energy systems. In this paper, we study online control in multi-agent linear dynamical systems with disturbances. In contrast to most prior work in multi-agent control, we consider an online setting where disturbances are adversarial and where each agent seeks to minimize its own, adversarial sequence of convex losses. In this setting, we investigate the robustness of gradient-based controllers from single-agent online control, with a particular focus on understanding how individual regret guarantees are influenced by the number of agents in the system. Under minimal communication assumptions, we prove near-optimal sublinear regret bounds that hold uniformly for all agents. Finally, when the objectives of the agents are aligned, we show that the multi-agent control problem induces a time-varying potential game for which we derive equilibrium gap guarantees.
☆ FICA: Faster Inner Convex Approximation of Chance Constrained Grid Dispatch with Decision-Coupled Uncertainty
This paper proposes a Faster Inner Convex Approximation (FICA) method for solving power system dispatch problems with Wasserstein distributionally robust joint chance constraints (WJCC) and incorporating the modelling of the automatic generation control factors. The problem studied belongs to the computationally challenging class of WJCC with left-hand-side uncertainty (LHS-WJCC). By exploiting the special one-dimensional structure (even if only partially present) of the problem, the proposed FICA incorporates a set of strong valid inequalities to accelerate the solution process. We prove that FICA achieves the same optimality as the well-known conditional value-at-risk (CVaR) inner convex approximation method. Our numerical experiments demonstrate that the proposed FICA can yield 40x computational speedup compared to CVaR, and can even reach up to 500x speedup when the optimisation horizon exceeds 16 time steps. This speedup is achieved when only 50% of constraints in a WJCC have the one-dimensional structure. The approximation quality is numerically verified to be the same as CVaR, and the quality gap is below 1% when compared to the computationally demanding exact reformulation of the LHS-WJCC in most cases. We also discuss the applications of FICA in optimisation problems from other domains that (partially) exhibit the one-dimensional structure.
comment: 10 pages, in review for IEEE Transactions on Power Systems
☆ Leveraging Transfer Learning to Overcome Data Limitations in Czochralski Crystal Growth
The Czochralski (Cz) method is a widely used process for growing high-quality single crystals, critical for applications in semiconductors, optics, and advanced materials. Achieving optimal growth conditions requires precise control of process and furnace design parameters. Still, data scarcity -- especially for new materials -- limits the application of machine learning (ML) in predictive modeling and optimization. This study proposes a transfer learning approach to overcome this limitation by adapting ML models trained on a higher data volume of one source material (Si) to a lower data volume of another target material (Ge and GaAs). The materials were deliberately selected to assess the robustness of the transfer learning approach in handling varying data similarity, with Cz-Ge being similar to Cz-Si, and GaAs grown via the liquid encapsulated Czochralski method (LEC), which differs from Cz-Si. We explore various transfer learning strategies, including Warm Start, Merged Training, and Hyperparameters Transfer, and evaluate multiple ML architectures across two different materials. Our results demonstrate that transfer learning significantly enhances predictive accuracy with minimal data, providing a practical framework for optimizing Cz growth parameters across diverse materials.
comment: Submitted to Advanced Theory and Simulations. 21 pages, 8 figures
☆ Near-Optimal Dynamic Policies for Joint Replenishment in Continuous/Discrete Time
While dynamic policies have historically formed the foundation of most influential papers dedicated to the joint replenishment problem, we are still facing profound gaps in our structural understanding of optimal such policies as well as in their surrounding computational questions. To date, the seminal work of Roundy (1985, 1986) and Jackson et al. (1985) remains unsurpassed in efficiently developing provably-good dynamic policies in this context. The principal contribution of this paper consists in developing a wide range of algorithmic ideas and analytical insights around the continuous-time joint replenishment problem, culminating in a deterministic framework for efficiently approximating optimal dynamic policies to any desired level of accuracy. These advances enable us to derive a compactly-encoded replenishment policy whose long-run average cost is within factor $1 + \epsilon$ of the dynamic optimum, arriving at an efficient polynomial-time approximation scheme (EPTAS). Technically speaking, our approach hinges on affirmative resolutions to two fundamental open questions: -- We devise the first efficient discretization-based framework for approximating the joint replenishment problem. Specifically, we prove that every continuous-time infinite-horizon instance can be reduced to a corresponding discrete-time $O( \frac{ n^3 }{ \epsilon^6 } )$-period instance, while incurring a multiplicative optimality loss of at most $1 + \epsilon$. -- Motivated by this relation, we substantially improve on the $O( 2^{2^{O(1/\epsilon)}} \cdot (nT)^{ O(1) } )$-time approximation scheme of Nonner and Sviridenko (2013) for the discrete-time joint replenishment problem. Beyond an exponential improvement in running time, we demonstrate that randomization and hierarchical decompositions can be entirely avoided, while concurrently offering a relatively simple analysis.
☆ An improved input-constrained funnel controller for nonlinear systems
We present an improvement of a recent funnel controller design for uncertain nonlinear multi-input, multi-output systems modeled by higher order functional differential equations in the presence of input constraints. The objective is to guarantee the evolution of the tracking error within a performance funnel with prescribed desired shape for the case of inactive saturation. Compared to its precursor, controller complexity is significantly reduced, much fewer design parameters are involved and simulations exhibit a superior performance.
☆ On the Maximization of Real Sequences
In this paper, we study a maximization problem on real sequences. More precisely, for a given sequence, we are interesting in computing the supremum of the sequence and an index for which the associated term is maximal. We propose a general methodology to solve this maximization problem. The method is based on upper approximations constructed from pairs of eventually decreasing sequences of strictly increasing continuous functions on $[0,1]$ and of scalars in $(0,1)$. Then, we can associate integers with those pairs using inverses on $[0,1]$ of the functions. We prove that such pairs always exist and one provides the index maximizer. In general, such pairs provide an upper bound of the greatest maximizer of the sequence. Finally, we apply the methodology on concrete examples including famous sequences such as logistic, Fibonacci and Syracuse sequences. We also apply our techniques to norm based peaks computation problems on discrete-time linear systems.
comment: 31 pages
☆ Dynamic Hybrid Modeling: Incremental Identification and Model Predictive Control
Mathematical models are crucial for optimizing and controlling chemical processes, yet they often face significant limitations in terms of computational time, algorithm complexity, and development costs. Hybrid models, which combine mechanistic models with data-driven models (i.e. models derived via the application of machine learning to experimental data), have emerged as a promising solution to these challenges. However, the identification of dynamic hybrid models remains difficult due to the need to integrate data-driven models within mechanistic model structures. We present an incremental identification approach for dynamic hybrid models that decouples the mechanistic and data-driven components to overcome computational and conceptual difficulties. Our methodology comprises four key steps: (1) regularized dynamic parameter estimation to determine optimal time profiles for flux variables, (2) correlation analysis to evaluate relationships between variables, (3) data-driven model identification using advanced machine learning techniques, and (4) hybrid model integration to combine the mechanistic and data-driven components. This approach facilitates early evaluation of model structure suitability, accelerates the development of hybrid models, and allows for independent identification of data-driven components. Three case studies are presented to illustrate the robustness, reliability, and efficiency of our incremental approach in handling complex systems and scenarios with limited data.
comment: 18 pages, 10 Figures
☆ A Computationally Efficient Method for Solving Mixed-Integer AC Optimal Power Flow Problems
Stepwise controllable devices, such as switched capacitors or stepwise controllable loads and generators, transform the nonconvex AC optimal power flow (AC-OPF) problem into a nonconvex mixed-integer (MI) programming problem which is generally hard to solve optimally. Existing methods for solving MI-AC-OPF problems usually suffer from either limited accuracy or computational intractability, making them impractical for real-world applications. To address these challenges, we propose an efficient iterative deflation approach providing high-quality approximate solutions. In each iteration, a continuously relaxed version of the MI-AC-OPF problem is solved and one candidate integer value is systematically eliminated based on the evaluation of a simple power flow result. The computational complexity of the proposed algorithm grows linearly with the number of integer optimization variables, ensuring scalability. Simulations demonstrate that the proposed approach achieves significant improvements in solution accuracy compared to a state-of-the-art approach. Thus, the proposed method is promising for solving practical MI-AC-OPF problems.
☆ Finite-Time Information-Theoretic Bounds in Queueing Control
We establish the first finite-time information-theoretic lower bounds-and derive new policies that achieve them-for the total queue length in scheduling problems over stochastic processing networks with both adversarial and stochastic arrivals. Prior analyses of MaxWeight guarantee only stability and asymptotic optimality in heavy traffic; we prove that, at finite horizons, MaxWeight can incur strictly larger backlog by problem-dependent factors which we identify. Our main innovations are 1) a minimax framework that pinpoints the precise problem parameters governing any policy's finite-time performance; 2) an information-theoretic lower bound on total queue length; 3) fundamental limitation of MaxWeight that it is suboptimal in finite time; and 4) a new scheduling rule that minimizes the full Lyapunov drift-including its second-order term-thereby matching the lower bound under certain conditions, up to universal constants. These findings reveal a fundamental limitation on "drift-only" methods and points the way toward principled, non-asymptotic optimality in queueing control.
☆ Spectral Outer-Approximation Algorithms for Binary Semidefinite Problems
Integer semidefinite programming (ISDP) has recently gained attention due to its connection to binary quadratically constrained quadratic programs (BQCQPs), which can be exactly reformulated as binary semidefinite programs (BSDPs). However, it remains unclear whether this reformulation effectively uses existing ISDP solvers to address BQCQPs. To the best of our knowledge, no specialized ISDP algorithms exploit the unique structure of BSDPs derived from BQCQPs. This paper proposes a novel spectral outer approximation algorithm tailored for BSDPs derived from BQCQP reformulations. Our approach is inspired by polyhedral and second-order representable regions that outer approximate the feasible set of a semidefinite program relying on a spectral decomposition of a matrix that simultaneously diagonalizes the objective matrix and an aggregation of the constraint matrices. Computational experiments show that our algorithm is competitive with, and in some cases outperforms, state-of-the-art ISDP solvers such as SCIP-SDP and PAJARITO, highlighting ISDP's potential for solving BQCQPs.
♻ ☆ Geometry of efficient weight vectors
Pairwise comparison matrices and the weight vectors obtained from them are important concepts in multi-criteria decision making. A weight vector calculated from a pairwise comparison matrix is called Pareto efficient if the approximation of the matrix elements by the weight ratios cannot be improved for any element of the matrix without worsening it for another element. The aim of this paper is to show the geometrical properties of the set of Pareto efficient weight vectors for $4\times 4$ pairwise comparison matrices. We prove that the set of efficient weight vectors is a union of three tetrahedra, each determined by four weight vectors calculated from an incomplete submatrix of the pairwise comparison matrix that can be represented by a path-type spanning tree graph. It is shown that with suitable rearrangements the orientations of the $4$-cycles in the Blanquero-Carrizosa-Conde graphs for the efficient weight vectors are determined as well. The special cases (double perturbed, simple perturbed, consistent matrices) are discussed in the appendices.
♻ ☆ The Gittins Index: A Design Principle for Decision-Making Under Uncertainty
The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora's box model. However, despite the above examples and later extensions thereof, the space of problems that the Gittins index can solve perfectly optimally is limited, and its definition is rather subtle compared to those of other multi-armed bandit algorithms. As a result, the Gittins index is often regarded as being primarily a concept of theoretical importance, rather than a practical tool for solving decision-making problems. The aim of this tutorial is to demonstrate that the Gittins index can be fruitfully applied to practical problems. We start by giving an example-driven introduction to the Gittins index, then walk through several examples of problems it solves - some optimally, some suboptimally but still with excellent performance. Two practical highlights in the latter category are applying the Gittins index to Bayesian optimization, and applying the Gittins index to minimizing tail latency in queues.
♻ ☆ Ambulance Allocation for Patient-Centered Care
Emergency Medical Services (EMS) in the United States and similar systems typically utilize a single treatment pathway, transporting all patients to emergency departments (EDs), regardless of their actual care needs or preferences. Recent policy reforms have sought to introduce alternative treatment pathways to divert lower acuity patients from the ED, but operationalizing these options has proven difficult. This paper proposes a patient-centered EMS (PC-EMS) ambulance allocation model that supports multiple care pathways by aligning EMS responses with individual patient needs. We develop a two-stage mixed-integer optimization framework that incorporates multiple dispatch and secondary assignment strategies which enable dynamic resource deployment. The model maximizes appropriate ED diversions while maintaining ambulance availability using a queueing-based availability constraint. We leverage national EMS data and machine learning to estimate dispatcher accuracy and diversion potential. Simulations across diverse geographic regions suggest that agencies can achieve up to 80% of possible ED diversions by equipping only 15 to 25% of their fleet with diversion capable units. Adaptive dispatch strategies improve diversion rates by 3.4 to 8.6 times compared to conventional single unit dispatch. These results provide actionable guidance for PC-EMS implementation by quantifying the trade off between equipment investment and operational coordination. Using the allocation model, agencies can strategically choose between upgrading fewer units with advanced dispatching protocols versus larger fleet investments with simpler operations. This approach offers flexible pathways suited to different organizational capabilities and implementation readiness.
♻ ☆ CatMADS: Mesh Adaptive Direct Search for constrained blackbox optimization with categorical variables
Solving optimization problems in which functions are blackboxes and variables involve different types poses significant theoretical and algorithmic challenges. Nevertheless, such settings frequently occur in simulation-based engineering design and machine learning. This paper extends the Mesh Adaptive Direct Search (MADS) algorithm to address mixed-variable problems with categorical, integer and continuous variables. MADS is a robust derivative-free optimization framework with a well-established convergence analysis for constrained quantitative problems. CatMADS generalizes MADS by incorporating categorical variables through distance-induced neighborhoods. A detailed convergence analysis of CatMADS is provided, with flexible choices balancing computational cost and local optimality strength. Four types of mixed-variable local minima are introduced, corresponding to progressively stronger notions of local optimality. CatMADS integrates the progressive barrier strategy for handling constraints, and ensures Clarke stationarity. An instance of \catmads employs cross-validation to construct problem-specific categorical distances. This instance is compared to state-of-the-art solvers on 32 mixed-variable problems, half of which are constrained. Data profiles show that CatMADS achieves the best results, demonstrating that the framework is empirically efficient in addition to having strong theoretical foundations.
comment: 25 pages without references, 7 figures and code available at https://github.com/bbopt/CatMADS_prototype
♻ ☆ Adaptive Acceleration Without Strong Convexity Priors Or Restarts
Accelerated optimization methods are typically parametrized by the Lipschitz smoothness and strong convexity parameters of a given problem, which are rarely known in practice. While the Lipschitz smoothness parameter L can be easily estimated by backtracking, directly estimating (i.e., without restarts) the strong convexity parameter m remains an open problem. This paper addresses this challenge by proposing NAG-free, an optimization method that achieves acceleration while estimating m online without priors or restarts. Our method couples Nesterov's accelerated gradient (NAG) method with an inexpensive estimator that does not require additional computation, only storage of one more iterate and gradient. We prove that in the canonical setting of the open problem, NAG-free converges globally at least as fast as gradient descent and locally at an accelerated rate, without resorting to m priors or restarts. We also introduce a complementary estimator for L, symmetric to the m estimator. We provide extensive empirical evidence that this NAG-free heuristic with L and m estimators often achieves acceleration in practice, outperforming restart schemes and traditional accelerated methods parametrized by offline estimates of L and m. Our experiments also suggest that NAG-free might be effective even for nonconvex problems when combined with restarts.
♻ ☆ A strongly polynomial-time algorithm for the general linear programming problem
This article presents a strongly polynomial-time algorithm for the general linear programming problem. This algorithm is an implicit reduction procedure that works as follows. Primal and dual problems are combined into a special system of linear equations constrained by complementarity relations and non-negative variables. Each iteration of the algorithm consists of applying a pair of complementary Gauss-Jordan pivoting operations, guided by a necessary-condition lemma. The algorithm requires no more than k+n iterations, as there are only k+n complementary pairs of columns to compare one-pair-at-a-time, where k is the number of constraints and n is the number of variables of given general linear programming problem. Numerical illustration is given that includes an instance of a classical problem of Klee and Minty and a problem of Beale.
comment: 27 pages; New annotations and remarks have been added to aid efficient reading and volunteered reviews. The results are the same as before
Learning to Insert for Constructive Neural Vehicle Routing Solver
Neural Combinatorial Optimisation (NCO) is a promising learning-based approach for solving Vehicle Routing Problems (VRPs) without extensive manual design. While existing constructive NCO methods typically follow an appending-based paradigm that sequentially adds unvisited nodes to partial solutions, this rigid approach often leads to suboptimal results. To overcome this limitation, we explore the idea of insertion-based paradigm and propose Learning to Construct with Insertion-based Paradigm (L2C-Insert), a novel learning-based method for constructive NCO. Unlike traditional approaches, L2C-Insert builds solutions by strategically inserting unvisited nodes at any valid position in the current partial solution, which can significantly enhance the flexibility and solution quality. The proposed framework introduces three key components: a novel model architecture for precise insertion position prediction, an efficient training scheme for model optimization, and an advanced inference technique that fully exploits the insertion paradigm's flexibility. Extensive experiments on both synthetic and real-world instances of the Travelling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that L2C-Insert consistently achieves superior performance across various problem sizes.
♻ ☆ Finite Dimensional Projections of HJB Equations in the Wasserstein Space
This paper continues the study of controlled interacting particle systems with common noise started in [W. Gangbo, S. Mayorga and A. \'{S}wi\k{e}ch, SIAM J. Math. Anal. 53 (2021), no. 2, 1320--1356] and [S. Mayorga and A. \'{S}wi\k{e}ch, SIAM J. Control Optim. 61 (2023), no. 2, 820--851]. First, we extend the following results of the previously mentioned works to the case of multiplicative noise: (i) We generalize the convergence of the value functions $u_n$ corresponding to control problems of $n$ particles to the value function $V$ corresponding to an appropriately defined infinite dimensional control problem; (ii) we prove, under certain additional assumptions, $C^{1,1}$ regularity of $V$ in the spatial variable. The second main contribution of the present work is the proof that if $DV$ is continuous (which, in particular, includes the previously proven case of $C^{1,1}$ regularity in the spatial variable), the value function $V$ projects precisely onto the value functions $u_n$. Using this projection property, we show that optimal controls of the finite dimensional problem correspond to optimal controls of the infinite dimensional problem and vice versa. In the case of a linear state equation, we are able to prove that $V$ projects precisely onto the value functions $u_n$ under relaxed assumptions on the coefficients of the cost functional by using approximation techniques in the Wasserstein space, thus covering cases where $V$ may not be differentiable.
comment: 37 pages; accepted for publication in Ann. Appl. Probab
♻ ☆ Accelerating Proximal Gradient Descent via Silver Stepsizes
Surprisingly, recent work has shown that gradient descent can be accelerated without using momentum -- just by judiciously choosing stepsizes. An open question raised by several papers is whether this phenomenon of stepsize-based acceleration holds more generally for constrained and/or composite convex optimization via projected and/or proximal versions of gradient descent. We answer this in the affirmative by proving that the silver stepsize schedule yields analogously accelerated rates in these settings. These rates are conjectured to be asymptotically optimal among all stepsize schedules, and match the silver convergence rate of vanilla gradient descent (Altschuler and Parrilo, 2024, 2025), namely $O(\varepsilon^{- \log_{\rho} 2})$ for smooth convex optimization and $O(\kappa^{\log_\rho 2} \log \frac{1}{\varepsilon})$ under strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key technical insight is the combination of recursive gluing -- the technique underlying all analyses of gradient descent accelerated with time-varying stepsizes -- with a certain Laplacian-structured sum-of-squares certificate for the analysis of proximal point updates.
comment: 33 pages, COLT 2025. v2: minor expository changes, matches camera-ready version
♻ ☆ When to Forget? Complexity Trade-offs in Machine Unlearning
Machine Unlearning (MU) aims at removing the influence of specific data points from a trained model, striving to achieve this at a fraction of the cost of full model retraining. In this paper, we analyze the efficiency of unlearning methods and establish the first upper and lower bounds on minimax computation times for this problem, characterizing the performance of the most efficient algorithm against the most difficult objective function. Specifically, for strongly convex objective functions and under the assumption that the forget data is inaccessible to the unlearning method, we provide a phase diagram for the unlearning complexity ratio -- a novel metric that compares the computational cost of the best unlearning method to full model retraining. The phase diagram reveals three distinct regimes: one where unlearning at a reduced cost is infeasible, another where unlearning is trivial because adding noise suffices, and a third where unlearning achieves significant computational advantages over retraining. These findings highlight the critical role of factors such as data dimensionality, the number of samples to forget, and privacy constraints in determining the practical feasibility of unlearning.
♻ ☆ Dynamic Pricing for Reusable Resources: The Power of Two Prices
Motivated by real-world applications such as rental and cloud computing services, we investigate pricing for reusable resources. We consider a system where a single resource with a fixed number of identical copies serves customers with heterogeneous willingness-to-pay (WTP), and the usage duration distribution is general. Optimal dynamic policies are computationally intractable when usage durations are not memoryless, so existing literature has focused on static pricing, which incurs a steady-state performance loss of ${O}(\sqrt{c})$ compared to optimality when supply and demand scale with $c$. We propose a class of dynamic "stock-dependent" policies that 1) are computationally tractable and 2) can attain a steady-state performance loss of $o(\sqrt{c})$. We give parametric bounds based on the local shape of the reward function at the optimal fluid admission probability and show that the performance loss of stock-dependent policies can be as low as ${O}((\log{c})^2)$. We characterize the tight performance loss for stock-dependent policies and show that they can in fact be achieved by a simple two-price policy that sets a higher price when the stock is below some threshold and a lower price otherwise. We extend our results to settings with multiple resources and multiple customer classes. Finally, we demonstrate this "minimally dynamic" class of two-price policies performs well numerically, even in non-asymptotic settings, suggesting that a little dynamicity can go a long way.
♻ ☆ Krasnoselskii-Mann Iterations: Inertia, Perturbations and Approximation
This paper is concerned with the study of a family of fixed point iterations combining relaxation with different inertial (acceleration) principles. We provide a systematic, unified and insightful analysis of the hypotheses that ensure their weak, strong and linear convergence, either matching or improving previous results obtained by analysing particular cases separately. We also show that these methods are robust with respect to different kinds of perturbations--which may come from computational errors, intentional deviations, as well as regularisation or approximation schemes--under surprisingly weak assumptions. Although we mostly focus on theoretical aspects, numerical illustrations in image inpainting and electricity production markets reveal possible trends in the behaviour of these types of methods.
comment: Accepted for publication in Journal of Optimization Theory and Applications
♻ ☆ Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors
We study the convergence of specific inexact alternating projections for two non-convex sets in a Euclidean space. The $\sigma$-quasioptimal metric projection ($\sigma \geq 1$) of a point $x$ onto a set $A$ consists of points in $A$ the distance to which is at most $\sigma$ times larger than the minimal distance $\mathrm{dist}(x,A)$. We prove that quasioptimal alternating projections, when one or both projections are quasioptimal, converge locally and linearly for super-regular sets with transversal intersection. The theory is motivated by the successful application of alternating projections to low-rank matrix and tensor approximation. We focus on two problems -- nonnegative low-rank approximation and low-rank approximation in the maximum norm -- and develop fast alternating-projection algorithms for matrices and tensor trains based on cross approximation and acceleration techniques. The numerical experiments confirm that the proposed methods are efficient and suggest that they can be used to regularise various low-rank computational routines.
comment: Accepted for publication in Numerische Mathematik
♻ ☆ A Globally Convergent Gradient Method with Momentum
In this work, we consider smooth unconstrained optimization problems and we deal with the class of gradient methods with momentum, i.e., descent algorithms where the search direction is defined as a linear combination of the current gradient and the preceding search direction. This family of algorithms includes nonlinear conjugate gradient methods and Polyak's heavy-ball approach, and is thus of high practical and theoretical interest in large-scale nonlinear optimization. We propose a general framework where the scalars of the linear combination defining the search direction are computed simultaneously by minimizing the approximate quadratic model in the 2 dimensional subspace. This strategy allows us to define a class of gradient methods with momentum enjoying global convergence guarantees and an optimal worst-case complexity bound in the nonconvex setting. Differently than all related works in the literature, the convergence conditions are stated in terms of the Hessian matrix of the bi-dimensional quadratic model. To the best of our knowledge, these results are novel to the literature. Moreover, extensive computational experiments show that the gradient method with momentum here presented is a solid choice to tackle some classes of nonconvex unconstrained problems.
♻ ☆ Soft decision trees for survival analysis
Decision trees are popular in survival analysis for their interpretability and ability to model complex relationships. Survival trees, which predict the timing of singular events using censored historical data, are typically built through heuristic approaches. Recently, there has been growing interest in globally optimized trees, where the overall tree is trained by minimizing the error function over all its parameters. We propose a new soft survival tree model (SST), with a soft splitting rule at each branch node, trained via a nonlinear optimization formulation amenable to decomposition. Since SSTs provide for every input vector a specific survival function associated to a single leaf node, they satisfy the conditional computation property and inherit the related benefits. SST and the training formulation combine flexibility with interpretability: any smooth survival function (parametric, semiparametric, or nonparametric) estimated through maximum likelihood can be used, and each leaf node of an SST yields a cluster of distinct survival functions which are associated to the data points routed to it. Numerical experiments on 15 well-known datasets show that SSTs, with parametric and spline-based semiparametric survival functions, trained using an adaptation of the node-based decomposition algorithm proposed by Consolo et al. (2024) for soft regression trees, outperform three benchmark survival trees in terms of four widely-used discrimination and calibration measures. SSTs can also be extended to consider group fairness.
♻ ☆ Examples of small-time controllable Schrödinger equations
A variety of physically relevant bilinear Schr\"odinger equations are known to be approximately controllable in large times. There are however examples which are approximately controllable in large times, but not in small times. This obstruction happens e.g. in the presence of (sub)quadratic potentials, because Gaussian states are preserved, at least for small times. In this work, we provide the first examples of small-time approximately controllable bilinear Schr\"odinger equations. In particular, we show that a control on the frequency of a quadratic potential permits to construct approximate solutions that evolve arbitrarily fast along space-dilations. Once we have access to space-dilations, we can exploit them to generate time-contractions. In this way, we build on previous results of large-time control, to obtain control in small times.
comment: 27 pages
♻ ☆ Alternating Gradient-Type Algorithm for Bilevel Optimization with Inexact Lower-Level Solutions via Moreau Envelope-based Reformulation
In this paper, we study a class of bilevel optimization problems where the lower-level problem is a convex composite optimization model, which arises in various applications, including bilevel hyperparameter selection for regularized regression models. To solve these problems, we propose an Alternating Gradient-type algorithm with Inexact Lower-level Solutions (AGILS) based on a Moreau envelope-based reformulation of the bilevel optimization problem. The proposed algorithm does not require exact solutions of the lower-level problem at each iteration, improving computational efficiency. We prove the convergence of AGILS to stationary points and, under the Kurdyka-{\L}ojasiewicz (KL) property, establish its sequential convergence. Numerical experiments, including a toy example and a bilevel hyperparameter selection problem for the sparse group Lasso model, demonstrate the effectiveness of the proposed AGILS.
♻ ☆ A new characterization of symmetric $H^+$-tensors and $M$-tensors
In this work, we present a new characterization of symmetric $H^+$-tensors. It is known that a symmetric tensor is an $H^+$-tensor if and only if it is a generalized diagonally dominant tensor with nonnegative diagonal elements. By exploring the diagonal dominance property, we derive new necessary and sufficient conditions for a symmetric tensor to be an $H^+$-tensor. Based on these conditions, we propose a novel method that allows to check if a tensor is a symmetric $H^+$-tensor in polynomial time. Moreover, these results can be applied to the closely related and important class of $M$-tensors. In particular, this allows to efficiently compute the minimum $H$-eigenvalue of symmetric $M$-tensors. Furthermore, we show how this latter result can be used to provide tighter lower bounds for the minimum $H$-eigenvalue of the Fan product of two symmetric $M$-tensors.
comment: 3 tables
♻ ☆ ASGO: Adaptive Structured Gradient Optimization
Training deep neural networks is a structured optimization problem, because the parameters are naturally represented by matrices and tensors rather than by vectors. Under this structural representation, it has been widely observed that gradients are low-rank and Hessians are approximately block-wise diagonal. These structured properties are crucial for designing efficient optimization algorithms, but are not utilized by many current popular optimizers like Adam. In this paper, we present a novel optimization algorithm ASGO that capitalizes on these properties by employing a preconditioner that is adaptively updated using structured gradients. By fine-grained theoretical analysis, ASGO is proven to achieve superior convergence rates compared to existing structured gradient methods. Based on the convergence theory, we further demonstrate that ASGO can benefit from the low-rank and block-wise diagonal properties. We also discuss practical modifications of ASGO and empirically verify ASGO's effectiveness on language model tasks.
comment: 30 pages
♻ ☆ Drift Control with Discretionary Stopping for a Diffusion
We consider stochastic control with discretionary stopping for the drift of a diffusion process over an infinite time horizon. The objective is to choose a control process and a stopping time to minimize the expectation of a convex terminal cost in the presence of a fixed operating cost and a control-dependent running cost per unit of elapsed time. Under appropriate conditions on the coefficients of the controlled diffusion, an optimal pair of control and stopping rules is shown to exist. Moreover, under the same assumptions, it is shown that the optimal control is a constant which can be computed fairly explicitly; and that it is optimal to stop the first time an appropriate interval is visited. We consider also a constrained version of the above problem, in which an upper bound on the expectation of available stopping times is imposed; we show that this constrained problem can be reduced to an unconstrained problem with some appropriate change of parameters and, as a result, solved by similar arguments.
comment: 23 pages
Systems and Control 16
☆ Wisdom of Crowds Through Myopic Self-Confidence Adaptation
The wisdom of crowds is an umbrella term for phenomena suggesting that the collective judgment or decision of a large group can be more accurate than the individual judgments or decisions of the group members. A well-known example illustrating this concept is the competition at a country fair described by Galton, where the median value of the individual guesses about the weight of an ox resulted in an astonishingly accurate estimate of the actual weight. This phenomenon resembles classical results in probability theory and relies on independent decision-making. The accuracy of the group's final decision can be significantly reduced if the final agents' opinions are driven by a few influential agents. In this paper, we consider a group of agents who initially possess uncorrelated and unbiased noisy measurements of a common state of the world. Assume these agents iteratively update their estimates according to a simple non-Bayesian learning rule, commonly known in mathematical sociology as the French-DeGroot dynamics or iterative opinion pooling. As a result of this iterative distributed averaging process, each agent arrives at an asymptotic estimate of the state of the world, with the variance of this estimate determined by the matrix of weights the agents assign to each other. Every agent aims at minimizing the variance of her asymptotic estimate of the state of the world; however, such variance is also influenced by the weights allocated by other agents. To achieve the best possible estimate, the agents must then solve a game-theoretic, multi-objective optimization problem defined by the available sets of influence weights. We characterize both the Pareto frontier and the set of Nash equilibria in the resulting game. Additionally, we examine asynchronous best-response dynamics for the group of agents and prove their convergence to the set of strict Nash equilibria.
☆ Symbolic Reduction for Formal Synthesis of Global Lyapunov Functions
We investigate the formal synthesis of global polynomial Lyapunov functions for polynomial vector fields. We establish that a sign-definite polynomial must satisfy specific algebraic constraints, which we leverage to develop a set of straightforward symbolic reduction rules. These rules can be recursively applied to symbolically simplify the Lyapunov candidate, enabling more efficient and robust discovery of Lyapunov functions via optimization or satisfiability modulo theories (SMT) solving. In many cases, without such simplification, finding a valid Lyapunov function is often infeasible. When strict Lyapunov functions are unavailable, we design synthesis procedures for finding weak Lyapunov functions to verify global asymptotic stability using LaSalle's invariance principle. Finally, we encode instability conditions for Lyapunov functions and develop SMT procedures to disprove global asymptotic stability. Through a series of examples, we demonstrate that the proposed symbolic reduction, LaSalle-type conditions, and instability tests allow us to efficiently solve many cases that would otherwise be challenging.
comment: An extended version of a paper to be presented at QEST + FORMATS 2025
☆ G-SEED: A Spatio-temporal Encoding Framework for Forest and Grassland Data Based on GeoSOT
In recent years, the rapid development of remote sensing, Unmanned Aerial Vehicles, and IoT technologies has led to an explosive growth in spatio-temporal forest and grassland data, which are increasingly multimodal, heterogeneous, and subject to continuous updates. However, existing Geographic Information Systems (GIS)-based systems struggle to integrate and manage of such large-scale and diverse data sources. To address these challenges, this paper proposes G-SEED (GeoSOT-based Scalable Encoding and Extraction for Forest and Grassland Spatio-temporal Data), a unified encoding and management framework based on the hierarchical GeoSOT (Geographical coordinate global Subdivision grid with One dimension integer on 2n tree) grid system. G-SEED integrates spatial, temporal, and type information into a composite code, enabling consistent encoding of both structured and unstructured data, including remote sensing imagery, vector maps, sensor records, documents, and multimedia content. The framework incorporates adaptive grid-level selection, center-cell-based indexing, and full-coverage grid arrays to optimize spatial querying and compression. Through extensive experiments on a real-world dataset from Shennongjia National Park (China), G-SEED demonstrates superior performance in spatial precision control, cross-source consistency, query efficiency, and compression compared to mainstream methods such as Geohash and H3. This study provides a scalable and reusable paradigm for the unified organization of forest and grassland big data, supporting dynamic monitoring and intelligent decision-making in these domains.
comment: 11 pages, 2 figures. Previously submitted to a non-academic conference (ICGARSA 2025) and formally withdrawn
☆ Distributionally robust minimization in meta-learning for system identification
Meta learning aims at learning how to solve tasks, and thus it allows to estimate models that can be quickly adapted to new scenarios. This work explores distributionally robust minimization in meta learning for system identification. Standard meta learning approaches optimize the expected loss, overlooking task variability. We use an alternative approach, adopting a distributionally robust optimization paradigm that prioritizes high-loss tasks, enhancing performance in worst-case scenarios. Evaluated on a meta model trained on a class of synthetic dynamical systems and tested in both in-distribution and out-of-distribution settings, the proposed approach allows to reduce failures in safety-critical applications.
☆ Bird's-eye view safety monitoring for the construction top under the tower crane
The tower crane is involving more automated and intelligent operation procedure, and importantly, the application of automation technologies to the safety issues is imperative ahead of the utilization of any other advances. Among diverse risk management tasks on site, it is essential to protect the human workers on the workspace between the tower crane and constructed building top area (construction top) from the bird's-eye view, especially with Modular Integrated Construction (MiC) lifted. Also, the camera and Light Detection And Ranging (LiDAR) can capture abundant 3D information on site, which is however yet made the best use. Considering the safety protection for humans and tower cranes, we present an AI-based fully automated safety monitoring system for tower crane lifting from the bird's-eye view, surveilling to shield the human workers on the construction top and avoid cranes' collision by alarming the crane operator. The system achieved a 3D data fusion for localization of humans and MiCs by integrating the captured information from camera and LiDAR. The state-of-the-art methods were explored and implemented into our proposed software pipeline coupled with the hardware and display systems. Furthermore, we conducted an analysis of the components in the pipeline to verify the accuracy and effectiveness of the involved methods. The display and visualization on the real site proved that our system can serve as a valuable safety monitoring toolkit on site.
☆ ROBBO: An Efficient Method for Pareto Front Estimation with Guaranteed Accuracy
A new method to estimate the Pareto Front (PF) in bi-objective optimization problems is presented. Assuming a continuous PF, the approach, named ROBBO (RObust and Balanced Bi-objective Optimization), needs to sample at most a finite, pre-computed number of PF points. Upon termination, it guarantees that the worst-case approximation error lies within a desired tolerance range, predefined by the decision maker, for each of the two objective functions. Theoretical results are derived, about the worst-case number of PF samples required to guarantee the wanted accuracy, both in general and for specific sampling methods from the literature. A comparative analysis, both theoretical and numerical, demonstrates the superiority of the proposed method with respect to popular ones. The approach is finally showcased in a constrained path-following problem for a 2-axis positioning system and in a steady-state optimization problem for a Continuous-flow Stirred Tank Reactor. An open demo implementation of ROBBO is made available online.
comment: 35 pages, 10 figures, under review
☆ Non-Euclidean Enriched Contraction Theory for Monotone Operators and Monotone Dynamical Systems
We adopt an operator-theoretic perspective to analyze a class of nonlinear fixed-point iterations and discrete-time dynamical systems. Specifically, we study the Krasnoselskij iteration - at the heart of countless algorithmic schemes and underpinning the stability analysis of numerous dynamical models - by focusing on a non-Euclidean vector space equipped with the diagonally weighted supremum norm. By extending the state of the art, we introduce the notion of enriched weak contractivity, which (i) is characterized by a simple, verifiable condition for Lipschitz operators, and (ii) yields explicit bounds on the admissible step size for the Krasnoselskij iteration. Our results relate the notion of weak contractivity with that of monotonicity of operators and dynamical systems and show its generality to design larger step sizes and improved convergence speed for broader classes of dynamical systems. The newly developed theory is illustrated on two applications: the design of zero-finding algorithms for monotone operators and the design of nonlinear consensus dynamics in monotone multi-agent dynamical systems.
comment: 13 pages, 2 figure
☆ A Scenario-based Model Predictive Control Scheme for Pandemic Response through Non-pharmaceutical Interventions
This paper presents a scenario-based model predictive control (MPC) scheme designed to control an evolving pandemic via non-pharmaceutical intervention (NPIs). The proposed approach combines predictions of possible pandemic evolution to decide on a level of severity of NPIs to be implemented over multiple weeks to maintain hospital pressure below a prescribed threshold, while minimizing their impact on society. Specifically, we first introduce a compartmental model which divides the population into Susceptible, Infected, Detected, Threatened, Healed, and Expired (SIDTHE) subpopulations and describe its positive invariant set. This model is expressive enough to explicitly capture the fraction of hospitalized individuals while preserving parameter identifiability w.r.t. publicly available datasets. Second, we devise a scenario-based MPC scheme with recourse actions that captures potential uncertainty of the model parameters. e.g., due to population behavior or seasonality. Our results show that the scenario-based nature of the proposed controller manages to adequately respond to all scenarios, keeping the hospital pressure at bay also in very challenging situations when conventional MPC methods fail.
☆ Safety Certificate against Latent Variables with Partially Unidentifiable Dynamics ICML 2025
Many systems contain latent variables that make their dynamics partially unidentifiable or cause distribution shifts in the observed statistics between offline and online data. However, existing control techniques often assume access to complete dynamics or perfect simulators with fully observable states, which are necessary to verify whether the system remains within a safe set (forward invariance) or safe actions are consistently feasible at all times. To address this limitation, we propose a technique for designing probabilistic safety certificates for systems with latent variables. A key technical enabler is the formulation of invariance conditions in probability space, which can be constructed using observed statistics in the presence of distribution shifts due to latent variables. We use this invariance condition to construct a safety certificate that can be implemented efficiently in real-time control. The proposed safety certificate can continuously find feasible actions that control long-term risk to stay within tolerance. Stochastic safe control and (causal) reinforcement learning have been studied in isolation until now. To the best of our knowledge, the proposed work is the first to use causal reinforcement learning to quantify long-term risk for the design of safety certificates. This integration enables safety certificates to efficiently ensure long-term safety in the presence of latent variables. The effectiveness of the proposed safety certificate is demonstrated in numerical simulations.
comment: Accepted to ICML 2025
☆ Inverse Chance Constrained Optimal Power Flow
The chance constrained optimal power flow (CC-OPF) essentially finds the low-cost generation dispatch scheme ensuring operational constraints are met with a specified probability, termed the security level. While the security level is a crucial input parameter, how it shapes the CC-OPF feasibility boundary has not been revealed. Changing the security level from a parameter to a decision variable, this letter proposes the inverse CC-OPF that seeks the highest feasible security level supported by the system. To efficiently solve this problem, we design a Newton-Raphson-like iteration algorithm leveraging the duality-based sensitivity analysis of an associated surrogate problem. Numerical experiments validate the proposed approach, revealing complex feasibility boundaries for security levels that underscore the importance of coordinating security levels across multiple chance constraints.
comment: 3 pages, 1 figure
♻ ☆ A Linear Parameter-Varying Framework for the Analysis of Time-Varying Optimization Algorithms
In this paper we propose a framework to analyze iterative first-order optimization algorithms for time-varying convex optimization. We assume that the temporal variability is caused by a time-varying parameter entering the objective, which can be measured at the time of decision but whose future values are unknown. We consider the case of strongly convex objective functions with Lipschitz continuous gradients under a convex constraint set. We model the algorithms as discrete-time linear parameter varying (LPV) systems in feedback with monotone operators such as the time-varying gradient. We leverage the approach of analyzing algorithms as uncertain control interconnections with integral quadratic constraints (IQCs) and generalize that framework to the time-varying case. We propose novel IQCs that are capable of capturing the behavior of time-varying nonlinearities and leverage techniques from the LPV literature to establish novel bounds on the tracking error. Quantitative bounds can be computed by solving a semi-definite program and can be interpreted as an input-to-state stability result with respect to a disturbance signal which increases with the temporal variability of the problem. As a departure from results in this research area, our bounds introduce a dependence on different additional measures of temporal variations, such as the function value and gradient rate of change. We exemplify our main results with numerical experiments that showcase how our analysis framework is able to capture convergence rates of different first-order algorithms for time-varying optimization through the choice of IQC and rate bounds.
♻ ☆ Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control
We study a sequential decision-making problem for a profit-maximizing operator of an autonomous mobility-on-demand system. Optimizing a central operator's vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic's loss function to appropriately consider coordinated actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.
♻ ☆ Physics-Informed Multi-Agent Reinforcement Learning for Distributed Multi-Robot Problems
The networked nature of multi-robot systems presents challenges in the context of multi-agent reinforcement learning. Centralized control policies do not scale with increasing numbers of robots, whereas independent control policies do not exploit the information provided by other robots, exhibiting poor performance in cooperative-competitive tasks. In this work we propose a physics-informed reinforcement learning approach able to learn distributed multi-robot control policies that are both scalable and make use of all the available information to each robot. Our approach has three key characteristics. First, it imposes a port-Hamiltonian structure on the policy representation, respecting energy conservation properties of physical robot systems and the networked nature of robot team interactions. Second, it uses self-attention to ensure a sparse policy representation able to handle time-varying information at each robot from the interaction graph. Third, we present a soft actor-critic reinforcement learning algorithm parameterized by our self-attention port-Hamiltonian control policy, which accounts for the correlation among robots during training while overcoming the need of value function factorization. Extensive simulations in different multi-robot scenarios demonstrate the success of the proposed approach, surpassing previous multi-robot reinforcement learning solutions in scalability, while achieving similar or superior performance (with averaged cumulative reward up to x2 greater than the state-of-the-art with robot teams x6 larger than the number of robots at training time). We also validate our approach on multiple real robots in the Georgia Tech Robotarium under imperfect communication, demonstrating zero-shot sim-to-real transfer and scalability across number of robots.
comment: Paper accepted and published at IEEE T-RO
♻ ☆ The value of hedging against energy storage uncertainties when designing energy parks
Energy storage is needed to match renewable generation to industrial loads in energy parks. However, the future performance of bulk storage technologies is currently highly uncertain. Due to the urgency of decarbonization targets, energy park projects must be designed and begun now. But, as uncertainty in storage performance reduces, a different technology than identified during initial design may turn out cheaper. Enabling flexibility so that design adaptations can be made as better information becomes available would lower the cost of decarbonizing industry. But having this flexibility is itself costly. This raises the question, "Is it worth it?" This study quantifies the benefit of retaining flexibility to adapt energy park designs and optionality over storage technology choice as uncertainty reduces, to determine whether it is economically worthwhile. It applies the Value of Information analysis framework to the sizing of wind, solar, and storage in an illustrative energy park model based on a real-world proposal near Rotterdam, considering uncertainty in storage efficiency, lifetime, and capital cost. Updating asset sizings after storage uncertainty reduced is found to reduce total costs by 18% on average. Having the option to switch storage technology choice as well reduces costs by a further 13%, which is substantially greater than the cost of providing storage optionality. Using two storage technologies in the energy park reduces costs by 14%, and in this case storage optionality is not worthwhile. These results are robust to the level of uncertainty reduction in storage performance, and the risk aversion of the system designer.
comment: 35 pages, 10 figures, 10 tables
GenControl: Generative AI-Driven Autonomous Design of Control Algorithms
Designing controllers for complex industrial electronic systems is challenging due to nonlinearities and parameter uncertainties, and traditional methods are often slow and costly. To address this, we propose a novel autonomous design framework driven by Large Language Models (LLMs). Our approach employs a bi-level optimization strategy: an LLM intelligently explores and iteratively improves the control algorithm's structure, while a Particle Swarm Optimization (PSO) algorithm efficiently refines the parameters for any given structure. This method achieves end-to-end automated design. Validated through a simulation of a DC-DC Boost converter, our framework successfully evolved a basic controller into a high-performance adaptive version that met all stringent design specifications for fast response, low error, and robustness. This work presents a new paradigm for control design that significantly enhances automation and efficiency.
♻ ☆ G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation CVPR 2025
Recent advances in imitation learning for 3D robotic manipulation have shown promising results with diffusion-based policies. However, achieving human-level dexterity requires seamless integration of geometric precision and semantic understanding. We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D semantic representation by leveraging foundation models. Our approach uniquely combines 3D generative models for digital twin creation, vision foundation models for semantic feature extraction, and robust pose tracking for continuous semantic flow updates. This integration enables complete semantic understanding even under occlusions while eliminating manual annotation requirements. By incorporating semantic flow into diffusion policies, we demonstrate significant improvements in both terminal-constrained manipulation and cross-object generalization. Extensive experiments across five simulation tasks show that G3Flow consistently outperforms existing approaches, achieving up to 68.3% and 50.1% average success rates on terminal-constrained manipulation and cross-object generalization tasks respectively. Our results demonstrate the effectiveness of G3Flow in enhancing real-time dynamic semantic feature understanding for robotic manipulation policies.
comment: Webpage: https://tianxingchen.github.io/G3Flow/, accepted to CVPR 2025
Optimization and Control 19
☆ Wisdom of Crowds Through Myopic Self-Confidence Adaptation
The wisdom of crowds is an umbrella term for phenomena suggesting that the collective judgment or decision of a large group can be more accurate than the individual judgments or decisions of the group members. A well-known example illustrating this concept is the competition at a country fair described by Galton, where the median value of the individual guesses about the weight of an ox resulted in an astonishingly accurate estimate of the actual weight. This phenomenon resembles classical results in probability theory and relies on independent decision-making. The accuracy of the group's final decision can be significantly reduced if the final agents' opinions are driven by a few influential agents. In this paper, we consider a group of agents who initially possess uncorrelated and unbiased noisy measurements of a common state of the world. Assume these agents iteratively update their estimates according to a simple non-Bayesian learning rule, commonly known in mathematical sociology as the French-DeGroot dynamics or iterative opinion pooling. As a result of this iterative distributed averaging process, each agent arrives at an asymptotic estimate of the state of the world, with the variance of this estimate determined by the matrix of weights the agents assign to each other. Every agent aims at minimizing the variance of her asymptotic estimate of the state of the world; however, such variance is also influenced by the weights allocated by other agents. To achieve the best possible estimate, the agents must then solve a game-theoretic, multi-objective optimization problem defined by the available sets of influence weights. We characterize both the Pareto frontier and the set of Nash equilibria in the resulting game. Additionally, we examine asynchronous best-response dynamics for the group of agents and prove their convergence to the set of strict Nash equilibria.
☆ Symbolic Reduction for Formal Synthesis of Global Lyapunov Functions
We investigate the formal synthesis of global polynomial Lyapunov functions for polynomial vector fields. We establish that a sign-definite polynomial must satisfy specific algebraic constraints, which we leverage to develop a set of straightforward symbolic reduction rules. These rules can be recursively applied to symbolically simplify the Lyapunov candidate, enabling more efficient and robust discovery of Lyapunov functions via optimization or satisfiability modulo theories (SMT) solving. In many cases, without such simplification, finding a valid Lyapunov function is often infeasible. When strict Lyapunov functions are unavailable, we design synthesis procedures for finding weak Lyapunov functions to verify global asymptotic stability using LaSalle's invariance principle. Finally, we encode instability conditions for Lyapunov functions and develop SMT procedures to disprove global asymptotic stability. Through a series of examples, we demonstrate that the proposed symbolic reduction, LaSalle-type conditions, and instability tests allow us to efficiently solve many cases that would otherwise be challenging.
comment: An extended version of a paper to be presented at QEST + FORMATS 2025
☆ On the Linear Speedup of the Push-Pull Method for Decentralized Optimization over Digraphs
The linear speedup property is essential for demonstrating the advantage of distributed algorithms over their single-node counterparts. In this paper, we study the stochastic Push-Pull method, a widely adopted decentralized optimization algorithm over directed graphs (digraphs). Unlike methods that rely solely on row-stochastic or column-stochastic mixing matrices, Push-Pull avoids nonlinear correction and has shown superior empirical performance across a variety of settings. However, its theoretical analysis remains challenging, and the linear speedup property has not been generally establishe--revealing a significant gap between empirical success and limited theoretical understanding. To bridge this gap, we propose a novel analysis framework and prove that Push-Pull achieves linear speedup over arbitrary strongly connected digraphs. Our results provide the comprehensive theoretical understanding for stochastic Push-Pull, aligning its theory with empirical performance. Code: \href{https://github.com/pkumelon/PushPull}{https://github.com/pkumelon/PushPull}.
☆ ROBBO: An Efficient Method for Pareto Front Estimation with Guaranteed Accuracy
A new method to estimate the Pareto Front (PF) in bi-objective optimization problems is presented. Assuming a continuous PF, the approach, named ROBBO (RObust and Balanced Bi-objective Optimization), needs to sample at most a finite, pre-computed number of PF points. Upon termination, it guarantees that the worst-case approximation error lies within a desired tolerance range, predefined by the decision maker, for each of the two objective functions. Theoretical results are derived, about the worst-case number of PF samples required to guarantee the wanted accuracy, both in general and for specific sampling methods from the literature. A comparative analysis, both theoretical and numerical, demonstrates the superiority of the proposed method with respect to popular ones. The approach is finally showcased in a constrained path-following problem for a 2-axis positioning system and in a steady-state optimization problem for a Continuous-flow Stirred Tank Reactor. An open demo implementation of ROBBO is made available online.
comment: 35 pages, 10 figures, under review
☆ Non-Euclidean Enriched Contraction Theory for Monotone Operators and Monotone Dynamical Systems
We adopt an operator-theoretic perspective to analyze a class of nonlinear fixed-point iterations and discrete-time dynamical systems. Specifically, we study the Krasnoselskij iteration - at the heart of countless algorithmic schemes and underpinning the stability analysis of numerous dynamical models - by focusing on a non-Euclidean vector space equipped with the diagonally weighted supremum norm. By extending the state of the art, we introduce the notion of enriched weak contractivity, which (i) is characterized by a simple, verifiable condition for Lipschitz operators, and (ii) yields explicit bounds on the admissible step size for the Krasnoselskij iteration. Our results relate the notion of weak contractivity with that of monotonicity of operators and dynamical systems and show its generality to design larger step sizes and improved convergence speed for broader classes of dynamical systems. The newly developed theory is illustrated on two applications: the design of zero-finding algorithms for monotone operators and the design of nonlinear consensus dynamics in monotone multi-agent dynamical systems.
comment: 13 pages, 2 figure
☆ Inverse Chance Constrained Optimal Power Flow
The chance constrained optimal power flow (CC-OPF) essentially finds the low-cost generation dispatch scheme ensuring operational constraints are met with a specified probability, termed the security level. While the security level is a crucial input parameter, how it shapes the CC-OPF feasibility boundary has not been revealed. Changing the security level from a parameter to a decision variable, this letter proposes the inverse CC-OPF that seeks the highest feasible security level supported by the system. To efficiently solve this problem, we design a Newton-Raphson-like iteration algorithm leveraging the duality-based sensitivity analysis of an associated surrogate problem. Numerical experiments validate the proposed approach, revealing complex feasibility boundaries for security levels that underscore the importance of coordinating security levels across multiple chance constraints.
comment: 3 pages, 1 figure
☆ Nonconvex Nonsmooth Multicomposite Optimization and Its Applications to Recurrent Neural Networks
We consider a class of nonconvex nonsmooth multicomposite optimization problems where the objective function consists of a Tikhonov regularizer and a composition of multiple nonconvex nonsmooth component functions. Such optimization problems arise from tangible applications in machine learning and beyond. To define and compute its first-order and second-order d(irectional)-stationary points effectively, we first derive the closed-form expression of the tangent cone for the feasible region of its constrained reformulation. Building on this, we establish its equivalence with the corresponding constrained and $\ell_1$-penalty reformulations in terms of global optimality and d-stationarity. The equivalence offers indirect methods to attain the first-order and second-order d-stationary points of the original problem in certain cases. We apply our results to the training process of recurrent neural networks (RNNs).
☆ Classical optimization algorithms for diagonalizing quantum Hamiltonians
Diagonalizing a Hamiltonian, which is essential for simulating its long-time dynamics, is a key primitive in quantum computing and has been proven to yield a quantum advantage for several specific families of Hamiltonians. Yet, despite its importance, only a handful of diagonalization algorithms exist, and correspondingly few families of fast-forwardable Hamiltonians have been identified. This paper introduces classical optimization algorithms for Hamiltonian diagonalization by formulating a cost function that penalizes off-diagonal terms and enforces unitarity via an orthogonality constraint, both expressed in the Pauli operator basis. We pinpoint a class of Hamiltonians that highlights severe drawbacks of existing methods, including exponential per-iteration cost, exponential circuit depth, or convergence to spurious optima. Our approach overcomes these shortcomings, achieving polynomial-time efficiency while provably avoiding suboptimal points. As a result, we broaden the known realm of fast-forwardable systems, showing that quantum-diagonalizable Hamiltonians extend to cases generated by exponentially large Lie algebras. On the practical side, we also present a randomized-coordinate variant that achieves a more efficient per-iteration cost than the deterministic counterpart. We demonstrate the effectiveness of these algorithms through explicit examples and numerical experiments.
♻ ☆ Evolution of Measures in Nonsmooth Dynamical Systems: Formalisms and Computation
This article develops mathematical formalisms and provides numerical methods for studying the evolution of measures in nonsmooth dynamical systems using the continuity equation. The nonsmooth dynamical system is described by an evolution variational inequality and we derive the continuity equation associated with this system class using three different formalisms. The first formalism consists of using the {superposition principle} to describe the continuity equation for a measure that disintegrates into a probability measure supported on the set of vector fields and another measure representing the distribution of system trajectories at each time instant. The second formalism is based on the regularization of the nonsmooth vector field and describing the measure as the limit of a sequence of measures associated with the regularization parameter. In doing so, we obtain quantitative bounds on the Wasserstein metric between measure solutions of the regularized vector field and the limiting measure associated with the nonsmooth vector field. The third formalism uses a time-stepping algorithm to model a time-discretized evolution of the measures and show that the absolutely continuous trajectories associated with the continuity equation are recovered in the limit as the sampling time goes to zero. We also validate each formalism with numerical examples. For the first formalism, we use polynomial optimization techniques and the moment-SOS hierarchy to obtain approximate moments of the measures. For the second formalism, we illustrate the bounds on the Wasserstein metric for an academic example for which the closed-form expression of the Wasserstein metric can be calculated. For the third formalism, we illustrate the time-stepping based algorithm for measure evolution on an example that shows the effect of the concentration of measures.
comment: This is the version closest to the published article
♻ ☆ A Linear Parameter-Varying Framework for the Analysis of Time-Varying Optimization Algorithms
In this paper we propose a framework to analyze iterative first-order optimization algorithms for time-varying convex optimization. We assume that the temporal variability is caused by a time-varying parameter entering the objective, which can be measured at the time of decision but whose future values are unknown. We consider the case of strongly convex objective functions with Lipschitz continuous gradients under a convex constraint set. We model the algorithms as discrete-time linear parameter varying (LPV) systems in feedback with monotone operators such as the time-varying gradient. We leverage the approach of analyzing algorithms as uncertain control interconnections with integral quadratic constraints (IQCs) and generalize that framework to the time-varying case. We propose novel IQCs that are capable of capturing the behavior of time-varying nonlinearities and leverage techniques from the LPV literature to establish novel bounds on the tracking error. Quantitative bounds can be computed by solving a semi-definite program and can be interpreted as an input-to-state stability result with respect to a disturbance signal which increases with the temporal variability of the problem. As a departure from results in this research area, our bounds introduce a dependence on different additional measures of temporal variations, such as the function value and gradient rate of change. We exemplify our main results with numerical experiments that showcase how our analysis framework is able to capture convergence rates of different first-order algorithms for time-varying optimization through the choice of IQC and rate bounds.
♻ ☆ On the fast convergence of minibatch heavy ball momentum
Simple stochastic momentum methods are widely used in machine learning optimization, but their good practical performance is at odds with an absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size. The algorithm we study can be interpreted as an accelerated randomized Kaczmarz algorithm with minibatching and heavy ball momentum. The analysis relies on carefully decomposing the momentum transition matrix, and using new spectral norm concentration bounds for products of independent random matrices. We provide numerical illustrations demonstrating that our bounds are reasonably sharp.
comment: update to match journal version
♻ ☆ Locational Energy Storage Bid Bounds for Facilitating Social Welfare Convergence
This paper proposes a novel method to generate bid bounds that can serve as offer caps for energy storage in electricity markets to help reduce system costs and regulate potential market power exercises. We derive the bid bounds based on a tractable multi-period economic dispatch chance-constrained formulation that systematically incorporates the uncertainty and risk preference of the system operator. The key analytical results verify that the bounds effectively cap storage bids across all uncertainty scenarios with a guaranteed confidence level. We show that bid bounds decrease as the state of charge increases but rise with greater netload uncertainty and risk preference. We test the effectiveness of the proposed pricing mechanism based on the 8-bus ISO-NE test system, including agent-based storage bidding models. Simulation results demonstrate that the proposed bid bounds effectively align storage bids with the social welfare objective and outperform existing deterministic bid bounds. Under 30% renewable capacity and 20% storage capacity, the bid bounds contribute to an average reduction of 0.17% in system cost, while increasing storage profit by an average of 10.16% across various system uncertainty scenarios and bidding strategies. These benefits scale up with increased storage economic withholding and storage capacity.
♻ ☆ Bounding-Focused Discretization Methods for the Global Optimization of Nonconvex Semi-Infinite Programs
We use sensitivity analysis to design bounding-focused discretization (cutting-surface) methods for the global optimization of nonconvex semi-infinite programs (SIPs). We begin by formulating the optimal bounding-focused discretization of SIPs as a max-min problem and propose variants that are more computationally tractable. We then use parametric sensitivity theory to design an effective heuristic approach for solving these max-min problems. We also show how our new iterative discretization methods may be modified to ensure that the solutions of their discretizations converge to an optimal solution of the SIP. We then formulate optimal bounding-focused generalized discretization of SIPs as max-min problems and design heuristic algorithms for their solution. Numerical experiments on standard nonconvex SIP test instances from the literature demonstrate that our new bounding-focused discretization methods can significantly reduce the number of iterations for convergence relative to a state-of-the-art feasibility-focused discretization method.
♻ ☆ Sustainable Water Treatment through Fractional-Order Chemostat Modeling with Sliding Memory and Periodic Boundary Conditions: A Mathematical Framework for Clean Water and Sanitation
This work develops and analyzes a novel fractional-order chemostat system (FOCS) with a Caputo fractional derivative (CFD) featuring a sliding memory window and periodic boundary conditions (PBCs), designed to model microbial pollutant degradation in sustainable water treatment. By incorporating the Caputo fractional derivative with sliding memory (CFDS), the model captures time-dependent behaviors and memory effects in biological systems more realistically than classical integer-order formulations. We reduce the two-dimensional fractional differential equations (FDEs) governing substrate and biomass concentrations to a one-dimensional FDE by utilizing the PBCs. The existence and uniqueness of non-trivial, periodic solutions are established using the Caratheodory framework and fixed-point theorems, ensuring the system's well-posedness. We prove the positivity and boundedness of solutions, demonstrating that substrate concentrations remain within physically meaningful bounds and biomass concentrations stay strictly positive, with solution trajectories confined to a biologically feasible invariant set. Additionally, we analyze non-trivial equilibria under constant dilution rates and derive their stability properties. The rigorous mathematical results confirm the viability of FOCS models for representing memory-driven, periodic bioprocesses, offering a foundation for advanced water treatment strategies that align with Sustainable Development Goal 6 (Clean Water and Sanitation).
♻ ☆ Copositive geometry of Feynman integrals
Copositive matrices and copositive polynomials are objects from optimization. We connect these to the geometry of Feynman integrals in physics. The integral is guaranteed to converge if its kinematic parameters lie in the copositive cone. P\'olya's method makes this manifest. We study the copositive cone for the second Symanzik polynomial of any Feynman graph. Its algebraic boundary is described by Landau discriminants.
comment: Final version to appear in Letters in Mathematical Physics
♻ ☆ Bishop-Phelps Type Scalarization for Vector Optimization in Real Topological-Linear Spaces
It is well-known that scalarization techniques (e.g., in the sense of Gerstewitz; Kasimbeyli; Pascoletti and Serafini; Zaffaroni) are useful for generating (weakly, properly) efficient solutions of vector optimization problems. One recognized approach is the conic scalarization method in vector optimization in real normed spaces proposed by Kasimbeyli (2010, SIAM J Optim 20), which is based on augmented dual cones and Bishop-Phelps type (norm-linear) scalarizing functions. In this paper, we present new results on cone separation in real topological-linear spaces by using Bishop-Phelps type separating cones / separating seminorm-linear functions. Moreover, we study some extensions of known scalarization results in vector optimization (in the sense of Eichfelder; Gerstewitz; Jahn; Kasimbeyli; Pascoletti and Serafini). On this basis, we propose a Bishop-Phelps type scalarization method for vector optimization problems in real topological-linear spaces, which is based on Bishop-Phelps type cone representing and cone monotone scalarizing functions (e.g., Gerstewitz scalarizing functions or seminorm-linear scalarizing functions). Thus, our method also extends Kasimbeyli's conic scalarization method from real normed spaces to real topological-linear spaces. Within this framework, we derive new Bishop-Phelps type scalarization results for the concepts of weak efficiency and different types of proper efficiency.
♻ ☆ An Efficient Method for Extracting the Shortest Path from the Dubins Set for Short Distances Between Initial and Final Positions
Path planning is crucial for the efficient operation of Autonomous Mobile Robots (AMRs) in factory environments. Many existing algorithms rely on Dubins paths, which have been adapted for various applications. However, an efficient method for directly determining the shortest Dubins path remains underdeveloped. This paper presents a comprehensive approach to efficiently identify the shortest path within the Dubins set. We classify the initial and final configurations into six equivalency groups based on the quadrants formed by their orientation angle pairs. Paths within each group exhibit shared topological properties, enabling a reduction in the number of candidate cases to analyze. This pre-classification step simplifies the problem and eliminates the need to explicitly compute and compare the lengths of all possible paths. As a result, the proposed method significantly lowers computational complexity. Extensive experiments confirm that our approach consistently outperforms existing methods in terms of computational efficiency.
comment: 9 pages, 10 figures
♻ ☆ Interpretable global minima of deep ReLU neural networks on sequentially separable data
We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.
comment: AMS Latex, 31 pages, 3 figures
♻ ☆ Complexity of trust-region methods in the presence of unbounded Hessian approximations
We extend traditional complexity analyses of trust-region methods for unconstrained, possibly nonconvex, optimization. Whereas most complexity analyses assume uniform boundedness of the model Hessians, we work with potentially unbounded model Hessians. Boundedness is not guaranteed in practical implementations, in particular ones based on quasi-Newton updates such as PSB, BFGS and SR1. Our analysis is conducted for a family of trust-region methods that includes most known methods as special cases. We examine two regimes of Hessian growth: one bounded by a power of the number of successful iterations, and one bounded by a power of the number of iterations. This allows us to formalize and address the intuition of Powell [IMA J. Numer. Ana. 30(1):289-301,2010], who studied convergence under a special case of our assumptions, but whose proof contained complexity arguments. Specifically, for \(0 \leq p < 1\), we establish sharp \(O([(1-p)\epsilon^{-2}]^{1/(1-p)})\) evaluation complexity to find an \(\epsilon\)-stationary point when model Hessians are \(O(|\mathcal{S}_{k-1}|^p)\), where \(|\mathcal{S}_{k-1}|\) is the number of iterations where the step was accepted, up to iteration \(k-1\). For \(p = 1\), which is the case studied by Powell, we establish a sharp \(O(\exp(c_1\epsilon^{-2}))\) evaluation complexity for a certain constant \(c_1 > 0\). We establish similar sharp bounds when model Hessians are \(O(k^p)\), where \(k\) is the iteration counter, for \(0 \leq p < 1\). When \(p = 1\), the complexity bound depends on the parameters of the family, but reduces to \(O((1 - \log(\epsilon))\exp(c_2\epsilon^{-2}))\) for a certain constant \(c_2 > 0\) for the special case of the standard trust-region method. As special cases, we derive novel complexity bounds for (strongly) convex objectives under the same growth assumptions.
Systems and Control 20
☆ Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking RSS 2025
Learning-based control approaches like reinforcement learning (RL) have recently produced a slew of impressive results for tasks like quadrotor trajectory tracking and drone racing. Naturally, it is common to demonstrate the advantages of these new controllers against established methods like analytical controllers. We observe, however, that reliably comparing the performance of such very different classes of controllers is more complicated than might appear at first sight. As a case study, we take up the problem of agile tracking of an end-effector for a quadrotor with a fixed arm. We develop a set of best practices for synthesizing the best-in-class RL and geometric controllers (GC) for benchmarking. In the process, we resolve widespread RL-favoring biases in prior studies that provide asymmetric access to: (1) the task definition, in the form of an objective function, (2) representative datasets, for parameter optimization, and (3) feedforward information, describing the desired future trajectory. The resulting findings are the following: our improvements to the experimental protocol for comparing learned and classical controllers are critical, and each of the above asymmetries can yield misleading conclusions. Prior works have claimed that RL outperforms GC, but we find the gaps between the two controller classes are much smaller than previously published when accounting for symmetric comparisons. Geometric control achieves lower steady-state error than RL, while RL has better transient performance, resulting in GC performing better in relatively slow or less agile tasks, but RL performing better when greater agility is required. Finally, we open-source implementations of geometric and RL controllers for these aerial vehicles, implementing best practices for future development. Website and code is available at https://pratikkunapuli.github.io/rl-vs-gc/
comment: Accepted for publication to RSS 2025. 10 pages, 5 figures. Project website: https://pratikkunapuli.github.io/rl-vs-gc/
☆ Maximum-likelihood reprojections for reliable Koopman-based predictions and bifurcation analysis of parametric dynamical systems
Koopman-based methods leverage a nonlinear lifting to enable linear regression techniques. Consequently, data generation, learning and prediction is performed through the lens of this lifting, giving rise to a nonlinear manifold that is invariant under the Koopman operator. In data-driven approximation such as Extended Dynamic Mode Decomposition, this invariance is typically lost due to the presence of (finite-data) approximation errors. In this work, we show that reprojections are crucial for reliable predictions. We provide an approach via closest-point projections that ensure consistency with this nonlinear manifold, which is strongly related to a Riemannian metric and maximum likelihood estimates. While these results are already novel for autonomous systems, we present our approach for parametric systems, providing the basis for data-driven bifurcation analysis and control applications.
comment: 22 pages, 9 figures
☆ Quasiparticle Dynamics in NbN Superconducting Microwave Resonators at Single Photon Regime
Exchanging energy below the superconducting gap introduces quasiparticle energy distributions in superconducting quantum circuits, which will be responsible for their decoherence. This study examines the impact of quasiparticle energy on the performance of NbN superconducting microwave coplanar waveguide resonators on silicon chips. We measured the resonance frequency and internal quality factor in response to temperature sweeps to evaluate the effect of quasiparticle dynamics. Moreover, by calculating the complex conductivity of the NbN film, we identified the contribution of quasiparticle density to the experimental results.
☆ RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in visuomotor control, yet ensuring their robustness in unstructured real-world environments remains a persistent challenge. In this paper, we investigate test-time scaling through the lens of sampling and verification as means to enhance the robustness and generalization of VLAs. We first demonstrate that the relationship between action error and the number of generated samples follows an exponentiated power law across a range of VLAs, indicating the existence of inference-time scaling laws. Building on these insights, we introduce RoboMonkey, a test-time scaling framework for VLAs. At deployment, RoboMonkey samples a small set of actions from a VLA, applies Gaussian perturbation and majority voting to construct an action proposal distribution, and then uses a Vision Language Model (VLM)-based verifier to select the optimal action. We propose a synthetic data generation pipeline for training such VLM-based action verifiers, and demonstrate that scaling the synthetic dataset consistently improves verification and downstream accuracy. Through extensive simulated and hardware experiments, we show that pairing existing VLAs with RoboMonkey yields significant performance gains, achieving a 25% absolute improvement on out-of-distribution tasks and 8% on in-distribution tasks. Additionally, when adapting to new robot setups, we show that fine-tuning both VLAs and action verifiers yields a 7% performance increase compared to fine-tuning VLAs alone.
☆ In silico evaluation of pramlintide dosing algorithms in artificial pancreas systems
Pramlintide's capability to delay gastric emptying has motivated its use in artificial pancreas systems, accompanying insulin as a control action. Due to the scarcity of pramlintide simulation models in the literature, in silico testing of insulin-plus-pramlintide strategies is not widely used. This work incorporates a recent pramlintide pharmacokinetics/pharmacodynamics model into the T1DM UVA/Padova simulator to adjust and validate four insulin-plus-pramlintide control algorithms. The proposals are based on an existing insulin controller and administer pramlintide either as independent boluses or as a ratio of the insulin infusion. The results of the insulin-pramlintide algorithms are compared against their insulin-alone counterparts, showing an improvement in the time in range between 3.00\% and 10.53\%, consistent with results reported in clinical trials in the literature. Future work will focus on individualizing the pramlintide model to the patients' characteristics and evaluating the implemented strategies under more challenging scenarios.
☆ Optimizing Exploration with a New Uncertainty Framework for Active SLAM Systems
Accurate reconstruction of the environment is a central goal of Simultaneous Localization and Mapping (SLAM) systems. However, the agent's trajectory can significantly affect estimation accuracy. This paper presents a new method to model map uncertainty in Active SLAM systems using an Uncertainty Map (UM). The UM uses probability distributions to capture where the map is uncertain, allowing Uncertainty Frontiers (UF) to be defined as key exploration-exploitation objectives and potential stopping criteria. In addition, the method introduces the Signed Relative Entropy (SiREn), based on the Kullback-Leibler divergence, to measure both coverage and uncertainty together. This helps balance exploration and exploitation through an easy-to-understand parameter. Unlike methods that depend on particular SLAM setups, the proposed approach is compatible with different types of sensors, such as cameras, LiDARs, and multi-sensor fusion. It also addresses common problems in exploration planning and stopping conditions. Furthermore, integrating this map modeling approach with a UF-based planning system enables the agent to autonomously explore open spaces, a behavior not previously observed in the Active SLAM literature. Code and implementation details are available as a ROS node, and all generated data are openly available for public use, facilitating broader adoption and validation of the proposed approach.
☆ Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes
Band-limited functions are fundamental objects that are widely used in systems theory and signal processing. In this paper we refine a recent nonparametric, nonasymptotic method for constructing simultaneous confidence regions for band-limited functions from noisy input-output measurements, by working in a Paley-Wiener reproducing kernel Hilbert space. Kernel norm bounds are tightened using a uniformly-randomized Hoeffding's inequality for small samples and an empirical Bernstein bound for larger ones. We derive an approximate threshold, based on the sample size and how informative the inputs are, that governs which bound to deploy. Finally, we apply majority voting to aggregate confidence sets from random subsamples, boosting both stability and region size. We prove that even per-input aggregated intervals retain their simultaneous coverage guarantee. These refinements are also validated through numerical experiments.
☆ Location Information Sharing Using Software Defined Radio in Multi-UAV Systems
SDR (Software Defined Radio) provides flexible, reproducible, and longer-lasting radio tools for military and civilian wireless communications infrastructure. SDR is a radio communication system whose components are implemented as software. This study aims to establish multi-channel wireless communication with FANET between two SDRs to share location information and examine it in a realistic test environment. We used multi-channel token circulation as a channel access protocol and GNU Radio platform for SDR software development. The structures of the communication layer, including the protocols, communication systems, and network structures suggested in the studies in the literature, are generally tested in the simulation environment. The simulation environment provides researchers with fast and easy development and testing, but disadvantages exist. These cause a product to be isolated from hardware, software, and cost effects encountered while developing and environmental factors affecting the communication channel while testing. Another contribution of the study is to present the developed block diagrams and codes as clear and reproducible. The developed software and block diagrams are available at github.com/knrl/uav-in-802.11-gnuradio.
comment: 10 pages, 6 figure
☆ Quantification of Sim2Real Gap via Neural Simulation Gap Function
In this paper, we introduce the notion of neural simulation gap functions, which formally quantifies the gap between the mathematical model and the model in the high-fidelity simulator, which closely resembles reality. Many times, a controller designed for a mathematical model does not work in reality because of the unmodelled gap between the two systems. With the help of this simulation gap function, one can use existing model-based tools to design controllers for the mathematical system and formally guarantee a decent transition from the simulation to the real world. Although in this work, we have quantified this gap using a neural network, which is trained using a finite number of data points, we give formal guarantees on the simulation gap function for the entire state space including the unseen data points. We collect data from high-fidelity simulators leveraging recent advancements in Real-to-Sim transfer to ensure close alignment with reality. We demonstrate our results through two case studies - a Mecanum bot and a Pendulum.
☆ ARCH-COMP25 Category Report: Stochastic Models
This report is concerned with a friendly competition for formal verification and policy synthesis of stochastic models. The main goal of the report is to introduce new benchmarks and their properties within this category and recommend next steps toward next year's edition of the competition. In particular, this report introduces three recently developed software tools, a new water distribution network benchmark, and a collection of simplified benchmarks intended to facilitate further comparisons among tools that were previously not directly comparable. This friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in Summer 2025.
♻ ☆ Age of Computing: A Metric of Computation Freshness in Communication and Computation Cooperative Networks
In communication and computation cooperative networks (3CNs), timely computation is crucial but not always guaranteed. There is a strong demand for a computational task to be completed within a given deadline. The time taken involves both processing time, communication time, and the impact of the deadline. However, a measure of such timeliness in 3CNs is lacking. In this paper, we introduce the novel concept, Age of Computing (AoC), to capture computation freshness in 3CNs. We analyze AoC in a line topology consisting of a source, a transmitter, a receiver, and a computational node. Tasks generated by the source are immediately available at the transmitter, where they enter a communication queue. These tasks then pass to the receiver and subsequently to a computation queue at the computational node for processing. Each task has a deadline, requiring completion within this timeframe. AoC is evaluated under two types of deadlines: (i) soft deadline, tasks can be fed back to the source if delayed beyond the deadline, but with additional latency; (ii) hard deadline, tasks delayed beyond the deadline are discarded. Under both deadlines, we derive the AoC formula and a general expression for the time-average AoC. For the first-come, first-serve discipline, we obtain a closed-form expression for the average AoC under the soft deadline and an approximation for the hard deadline. In addition to freshness, we define computation throughput, providing a general expression and an approximation. To explore the relationship between freshness and throughput, we construct an optimization problem and prove that the objective pair is a weakly Pareto-optimal point. Numerical results validate all the theoretical findings. Additionally, they reveal that under the hard deadline, the computation throughput serves as a reliable proxy for the average AoC.
♻ ☆ A model predictive control framework with robust stability guarantees under large disturbances
To address feasibility issues in model predictive control (MPC), most implementations relax state constraints by using slack variables and adding a penalty to the cost. We propose an alternative strategy: relaxing the initial state constraint with a penalty. Compared to state-of-the-art soft constrained MPC formulations, the proposed formulation has two key features: (i) input-to-state stability and bounds on the cumulative constraint violation for large disturbances; (ii) close-to-optimal performance under nominal operating conditions. The idea is initially presented for open-loop asymptotically stable nonlinear systems by designing the penalty as a Lyapunov function, but we also show how to relax this condition to: i) Lyapunov stable systems; ii) stabilizable systems; and iii) utilizing an implicit characterization of the Lyapunov function. In the special case of linear systems, the proposed MPC formulation reduces to a quadratic program, and the offline design and online computational complexity are only marginally increased compared to a nominal design. Numerical examples demonstrate benefits compared to state-of-the-art soft-constrained MPC formulations.
♻ ☆ Recursive Gaussian Process State Space Model
Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.
♻ ☆ Closed-loop Performance Optimization of Model Predictive Control with Robustness Guarantees
Model mismatch and process noise are two frequently occurring phenomena that can drastically affect the performance of model predictive control (MPC) in practical applications. We propose a principled way to tune the cost function and the constraints of linear MPC schemes to improve the closed-loop performance and robust constraint satisfaction on uncertain nonlinear dynamics with additive noise. The tuning is performed using a novel MPC tuning algorithm based on backpropagation developed in our earlier work. Using the scenario approach, we provide probabilistic bounds on the likelihood of closed-loop constraint violation over a finite horizon. We showcase the effectiveness of the proposed method on linear and nonlinear simulation examples.
♻ ☆ Deep reinforcement learning-based joint real-time energy scheduling for green buildings with heterogeneous battery energy storage devices
Green buildings (GBs) with renewable energy and building energy management systems (BEMS) enable efficient energy use and support sustainable development. Electric vehicles (EVs), as flexible storage resources, enhance system flexibility when integrated with stationary energy storage systems (ESS) for real-time scheduling. However, differing degradation and operational characteristics of ESS and EVs complicate scheduling strategies. This paper proposes a model-free deep reinforcement learning (DRL) method for joint real-time scheduling based on a combined battery system (CBS) integrating ESS and EVs. We develop accurate degradation models and cost estimates, prioritize EV travel demands, and enable collaborative ESS-EV operation under varying conditions. A prediction model optimizes energy interaction between CBS and BEMS. To address heterogeneous states, action coupling, and learning efficiency, the DRL algorithm incorporates double networks, a dueling mechanism, and prioritized experience replay. Experiments show a 37.94 percent to 40.01 percent reduction in operating costs compared to a mixed-integer linear programming (MILP) approach.
♻ ☆ Object State Estimation Through Robotic Active Interaction for Biological Autonomous Drilling
Estimating the state of biological specimens is challenging due to limited observation through microscopic vision. For instance, during mouse skull drilling, the appearance alters little when thinning bone tissue because of its semi-transparent property and the high-magnification microscopic vision. To obtain the object's state, we introduce an object state estimation method for biological specimens through active interaction based on the deflection. The method is integrated to enhance the autonomous drilling system developed in our previous work. The method and integrated system were evaluated through 12 autonomous eggshell drilling experiment trials. The results show that the system achieved a 91.7% successful ratio and 75% detachable ratio, showcasing its potential applicability in more complex surgical procedures such as mouse skull craniotomy. This research paves the way for further development of autonomous robotic systems capable of estimating the object's state through active interaction.
comment: 6 pages, 5 figures, accepted by RA-L. Please refer to the DOI to access the accepted version
♻ ☆ Reset Controller Analysis and Design for Unstable Linear Plants using Scaled Relative Graphs
In this technical communique, we develop a graphical design procedure for reset controllers for unstable LTI plants based on recent developments on Scaled Relative Graph analysis, yielding an $L_2$-gain performance bound. The stabilizing controller consists of a second order reset element in parallel with a proportional gain. The proposed method goes beyond existing approaches that are limited to stable systems only, providing a well-applicable approach to design problems in practice where the plant is unstable.
comment: 5 pages, submitted on 16-06-2025 as a technical communique to Automatica. Second version with minor corrections 21-06-2025
♻ ☆ Privacy-Preserving Resilient Vector Consensus
This paper studies privacy-preserving resilient vector consensus in multi-agent systems against faulty agents, where normal agents can achieve consensus within the convex hull of their initial states while protecting state vectors from being disclosed. Specifically, we consider a modification of an existing algorithm known as Approximate Distributed Robust Convergence Using Centerpoints (ADRC), i.e., Privacy-Preserving ADRC (PP-ADRC). Under PP-ADRC, each normal agent introduces multivariate Gaussian noise to its state during each iteration. We first provide sufficient conditions to ensure that all normal agents' states can achieve mean square convergence under PP-ADRC. Then, we analyze convergence accuracy from two perspectives, i.e., the Mahalanobis distance of the final value from its expectation and the Hausdorff distance-based alteration of the convex hull caused by noise when only partial dimensions are added with noise. Then, we employ concentrated geo-privacy to characterize privacy preservation and conduct a thorough comparison with differential privacy. Finally, numerical simulations demonstrate the theoretical results.
♻ ☆ Distributed Optimization of Clique-Wise Coupled Problems via Three-Operator Splitting
This study explores distributed optimization problems with clique-wise coupling via operator splitting and how we can utilize this framework for performance analysis and enhancement. This framework extends beyond conventional pairwise coupled problems (e.g., consensus optimization) and is applicable to broader examples. To this end, we first introduce a new distributed optimization algorithm by leveraging a clique-based matrix and the Davis-Yin splitting (DYS), a versatile three-operator splitting method. We then demonstrate that this approach sheds new light on conventional algorithms in the following way: (i) Existing algorithms (NIDS, Exact diffusion, diffusion, and our previous work) can be derived from our proposed method; (ii) We present a new mixing matrix based on clique-wise coupling, which surfaces when deriving the NIDS. We prove its preferable distribution of eigenvalues, enabling fast consensus; (iii) These observations yield a new linear convergence rate for the NIDS with non-smooth objective functions. Remarkably our linear rate is first established for the general DYS with a projection for a subspace. This case is not covered by any prior results, to our knowledge. Finally, numerical examples showcase the efficacy of our proposed approach.
comment: 32 pages, 5 figures. Several modifications have been made mainly in numerical results
♻ ☆ Quantum-Enhanced Reinforcement Learning for Power Grid Security Assessment
The increasingly challenging task of maintaining power grid security requires innovative solutions. Novel approaches using reinforcement learning (RL) agents have been proposed to help grid operators navigate the massive decision space and nonlinear behavior of these complex networks. However, applying RL to power grid security assessment, specifically for combinatorially troublesome contingency analysis problems, has proven difficult to scale. The integration of quantum computing into these RL frameworks helps scale by improving computational efficiency and boosting agent proficiency by leveraging quantum advantages in action exploration and model-based interdependence. To demonstrate a proof-of-concept use of quantum computing for RL agent training and simulation, we propose a hybrid agent that runs on quantum hardware using IBM's Qiskit Runtime. We also provide detailed insight into the construction of parameterized quantum circuits (PQCs) for generating relevant quantum output. This agent's proficiency at maintaining grid stability is demonstrated relative to a benchmark model without quantum enhancement using N-k contingency analysis. Additionally, we offer a comparative assessment of the training procedures for RL models integrated with a quantum backend.
comment: 6 pages, 6 figures, 3 tables. Submitted to the 57th North American Power Symposium (NAPS) 2025
Optimization and Control 19
☆ On circumcentered direct methods for monotone variational inequality problems
The variational inequality problem (VIP) plays a central role in the theory and applications in continuous optimization. In particular, minimization problems and KKT systems can be regard as VIPs. In this work, we present the first methods using circumcenter iterations for solving VIPs. The circumcentered-reflection method (CRM) is a tool based on projections developed with the aim of finding a point in the intersection of finitely many closed convex sets. CRM has gone through enhancements and adaptations over the last few years and was proven to be faster in many settings than competitors such as alternating projections and the Douglas-Rachford method. One of the nice features of CRM is that it is able to deal with approximate projections, which is exactly what we enforced in this article theoretically and numerically. We present both a circumcenter method for the VIP with paramonotone operator and monotone operator, the first employing exact projections and the second approximate ones. Convergence results showing their convergence are established. Numerically, our experiments indicate good performance, including advantages over the well-know extragradient method.
☆ Pushing the Complexity Boundaries of Fixed-Point Equations: Adaptation to Contraction and Controlled Expansion
Fixed-point equations with Lipschitz operators have been studied for more than a century, and are central to problems in mathematical optimization, game theory, economics, and dynamical systems, among others. When the Lipschitz constant of the operator is larger than one (i.e., when the operator is expansive), it is well known that approximating fixed-point equations becomes computationally intractable even in basic finite-dimensional settings. In this work, we aim to push these complexity boundaries by introducing algorithms that can address problems with mildly expansive (i.e., with Lipschitz constant slightly larger than one) operators not excluded by existing lower bounds, attaining the best possible fixed-point error up to universal constants. We further introduce a class of \emph{gradually expansive operators} that allow for constant (up to $\approx 1.4$) expansion between points, for which we prove convergence to $\epsilon$-approximate fixed points in order-$(1/\epsilon)$ iterations for $\epsilon > 0.$ Our algorithms automatically adapt to the Lipschitz constant of the operator and attain optimal oracle complexity bounds when the input operator is nonexpansive or contractive. Our results apply to general, possibly infinite-dimensional normed vector spaces and can be extended to positively curved geodesic metric spaces.
☆ Regular Tree Search for Simulation Optimization
Tackling simulation optimization problems with non-convex objective functions remains a fundamental challenge in operations research. In this paper, we propose a class of random search algorithms, called Regular Tree Search, which integrates adaptive sampling with recursive partitioning of the search space. The algorithm concentrates simulations on increasingly promising regions by iteratively refining a tree structure. A tree search strategy guides sampling decisions, while partitioning is triggered when the number of samples in a leaf node exceeds a threshold that depends on its depth. Furthermore, a specific tree search strategy, Upper Confidence Bounds applied to Trees (UCT), is employed in the Regular Tree Search. We prove global convergence under sub-Gaussian noise, based on assumptions involving the optimality gap, without requiring continuity of the objective function. Numerical experiments confirm that the algorithm reliably identifies the global optimum and provides accurate estimates of its objective value.
☆ An Analytical Framework for the Linear Best-Worst Method and its Application to Achieve Sustainable Development Goals--Oriented Agri-Food Supply Chains
The Best-Worst Method (BWM) has emerged as a prominent multi-criteria decision-making method for determining the weights of the decision criteria. Among various BWM models, this research focuses on the linear model of the BWM. This model calculates weights by solving an optimization problem, necessitating optimization software. In this article, we present a novel framework that solves this optimization model mathematically, yielding an analytical expression for the resultant weights, thus eliminating the requirement for an optimization software. The proposed approach enhances both the conceptual clarity of the underlying optimization process and the computational efficiency of the model. Based of this framework, we demonstrate the model's limited response to data variations, i.e., its lower data sensitivity. We also compute the values of consistency index for the linear BWM, which are required to calculate the consistency ratio - a consistency indicator used for assessing inconsistency in input data. Finally, we illustrate the validity and applicability of the proposed approach through five numerical examples and a real-world case study that ranks eighteen drivers across three categories - Industry 4.0, sustainability, and circular economy - in relation to sustainable development goals-driven agri-food supply chains.
☆ Enhanced PDHG for Linear Programming with Online Preconditioning
We present an online preconditioning technique for the primal-dual hybrid gradient (PDHG) algorithm for linear programming (LP). The method adaptively updates primal and dual preconditioners using an online optimization framework. To improve its practical performance, we introduce several algorithmic enhancements, including using normalized online loss functions and updating preconditioners infrequently. We implement the technique on top of vanilla PDHG and the GPU-based LP solver cuPDLP.jl, and benchmark its performance on standard LP datasets. Our numerical experiments demonstrate that online preconditioning effectively reduces both iteration counts and overall solving time.
☆ A $C^0$ weak Galerkin method with preconditioning for constrained optimal control problems with general tracking
This paper presents a $C^0$ weak Galerkin ($C^0$-WG) method combined with an additive Schwarz preconditioner for solving optimal control problems (OCPs) governed by partial differential equations with general tracking cost functionals and pointwise state constraints. These problems pose significant analytical and numerical challenges due to the presence of fourth-order variational inequalities and the reduced regularity of solutions. Our first contribution is the design of a $C^0$-WG method based on globally continuous quadratic Lagrange elements, enabling efficient elementwise stiffness matrix assembly and parameter-free implementation while maintaining accuracy, as supported by a rigorous error analysis. As a second contribution, we develop an additive Schwarz preconditioner tailored to the $C^0$-WG method to improve solver performance for the resulting ill-conditioned linear systems. Numerical experiments confirm the effectiveness and robustness of the proposed method and preconditioner for both biharmonic and optimal control problems.
♻ ☆ Convex Constrained Controller Synthesis for Evolution Equations
We propose a convex controller synthesis framework for a large class of constrained linear systems, including those described by (deterministic and stochastic) partial differential equations and integral equations, commonly used in fluid dynamics, thermo-mechanical systems, quantum control, or transportation networks. Most existing control techniques rely on a (finite-dimensional) discrete description of the system, via ordinary differential equations. Here, we work instead with more general (infinite-dimensional) Hilbert spaces. This enables the discretization to be applied after the optimization (optimize-then-discretize). Using output-feedback SLS, we formulate the controller synthesis as a convex optimization problem. Structural constraints like sensor and communication delays, and locality constraints, are incorporated while preserving convexity, allowing parallel implementation and extending key SLS properties to infinite dimensions. The proposed approach and its benefits are demonstrated on a linear Boltzmann equation.
♻ ☆ Optimal Storage Design: An $L^{\infty}$ infused Inventory Control
Inventory and queueing systems are often designed by controlling weighted combination of some time-averaged performance metrics (like cumulative holding, shortage, server-utilization or congestion costs); but real-world constraints, like fixed storage or limited waiting space, require attention to peak levels reached during the operating period. This work formulates such control problems, which are any arbitrary weighted combination of some integral cost terms and an L-infinity(peak-level) term. The resultant control problem does not fall into standard control framework, nor does it have standard solution in terms of some partial differential equations. We introduce an auxiliary state variable to track the instantaneous peak-levels, enabling reformulation into the classical framework. We then propose a smooth approximation to handle the resultant discontinuities, and show the existence of unique value function that uniquely solves the corresponding Hamilton-Jacobi-Bellman equation. We apply this framework to two key applications to obtain an optimal design that includes controlling the peak-levels. Surprisingly, the numerical results show peak inventory can be minimized with negligible revenue loss (under 6%); without considering peak-control, the peak levels were significantly higher. The peak-optimal policies for queueing-system can reduce peak-congestion by up to 27%, however, at the expense of higher cumulative-congestion costs. Thus, for inventory-control, the performance of the average-terms did not degrade much, while the same is not true for queueing-system. Hence, one would require a judiciously chosen weighted design of all the costs involved including the peak-levels for any application and such a design can now be derived numerically using the proposed framework.
♻ ☆ A model predictive control framework with robust stability guarantees under large disturbances
To address feasibility issues in model predictive control (MPC), most implementations relax state constraints by using slack variables and adding a penalty to the cost. We propose an alternative strategy: relaxing the initial state constraint with a penalty. Compared to state-of-the-art soft constrained MPC formulations, the proposed formulation has two key features: (i) input-to-state stability and bounds on the cumulative constraint violation for large disturbances; (ii) close-to-optimal performance under nominal operating conditions. The idea is initially presented for open-loop asymptotically stable nonlinear systems by designing the penalty as a Lyapunov function, but we also show how to relax this condition to: i) Lyapunov stable systems; ii) stabilizable systems; and iii) utilizing an implicit characterization of the Lyapunov function. In the special case of linear systems, the proposed MPC formulation reduces to a quadratic program, and the offline design and online computational complexity are only marginally increased compared to a nominal design. Numerical examples demonstrate benefits compared to state-of-the-art soft-constrained MPC formulations.
♻ ☆ Automatic Generation of Explicit Quadratic Programming Solvers
We consider a family of convex quadratic programs in which the coefficients of the linear objective term and the righthand side of the constraints are affine functions of a parameter. It is well known that the solution of such a parametrized quadratic program is a piecewise affine function of the parameter. The number of (polyhedral) regions in the solution map can grow exponentially in problem size, but when the number of regions is moderate, a so-called explicit solver is practical. Such a solver computes the coefficients of the affine functions and the linear inequalities defining the polyhedral regions offline; to solve a problem instance online it simply evaluates this explicit solution map. Potential advantages of an explicit solver over a more general purpose iterative solver can include transparency, interpretability, reliability, and speed. In this paper we describe how code generation can be used to automatically generate an explicit solver from a high level description of a parametrized quadratic program. Our method has been implemented in the open-source software CVXPYgen, which is part of CVXPY, a domain specific language for general convex optimization.
♻ ☆ Closed-loop Performance Optimization of Model Predictive Control with Robustness Guarantees
Model mismatch and process noise are two frequently occurring phenomena that can drastically affect the performance of model predictive control (MPC) in practical applications. We propose a principled way to tune the cost function and the constraints of linear MPC schemes to improve the closed-loop performance and robust constraint satisfaction on uncertain nonlinear dynamics with additive noise. The tuning is performed using a novel MPC tuning algorithm based on backpropagation developed in our earlier work. Using the scenario approach, we provide probabilistic bounds on the likelihood of closed-loop constraint violation over a finite horizon. We showcase the effectiveness of the proposed method on linear and nonlinear simulation examples.
♻ ☆ Curse of Dimensionality in Neural Network Optimization
This paper demonstrates that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ is an analog of the total number of optimization iterations. This result highlights the presence of the curse of dimensionality in the optimization computation required to achieve a desired accuracy. Instead of analyzing parameter evolution directly, the training dynamics are examined through the evolution of the parameter distribution under the 2-Wasserstein gradient flow. Furthermore, it is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed, where the Lipschitz constant in $[-x,x]$ is bounded by $O(x^\delta)$ for any $x \in \mathbb{R}$. In this scenario, the population risk is shown to decay at a rate no faster than $t^{-\frac{(4+2\delta)r}{d-2r}}$. Understanding how function smoothness influences the curse of dimensionality in neural network optimization theory is an important and underexplored direction that this work aims to address.
comment: 32 pages, 1 figure
♻ ☆ Auto-Lesion Segmentation with a Novel Intensity Dark Channel Prior for COVID-19 Detection
During the COVID-19 pandemic, medical imaging techniques like computed tomography (CT) scans have demonstrated effectiveness in combating the rapid spread of the virus. Therefore, it is crucial to conduct research on computerized models for the detection of COVID-19 using CT imaging. A novel processing method has been developed, utilizing radiomic features, to assist in the CT-based diagnosis of COVID-19. Given the lower specificity of traditional features in distinguishing between different causes of pulmonary diseases, the objective of this study is to develop a CT-based radiomics framework for the differentiation of COVID-19 from other lung diseases. The model is designed to focus on outlining COVID-19 lesions, as traditional features often lack specificity in this aspect. The model categorizes images into three classes: COVID-19, non-COVID-19, or normal. It employs enhancement auto-segmentation principles using intensity dark channel prior (IDCP) and deep neural networks (ALS-IDCP-DNN) within a defined range of analysis thresholds. A publicly available dataset comprising COVID-19, normal, and non-COVID-19 classes was utilized to validate the proposed model's effectiveness. The best performing classification model, Residual Neural Network with 50 layers (Resnet-50), attained an average accuracy, precision, recall, and F1-score of 98.8%, 99%, 98%, and 98% respectively. These results demonstrate the capability of our model to accurately classify COVID-19 images, which could aid radiologists in diagnosing suspected COVID-19 patients. Furthermore, our model's performance surpasses that of more than 10 current state-of-the-art studies conducted on the same dataset.
comment: The study requires withdrawal due to technical inconsistencies in the reported data that affect the conclusions. We apologize for any inconvenience
♻ ☆ Discrete-Time LQ Stochastic Two-Person Nonzero-Sum Difference Games with Random Coefficients:~Open-Loop Nash Equilibrium
This paper presents a pioneering investigation into discrete-time two-person non-zero-sum linear quadratic (LQ) stochastic games with random coefficients. We derive necessary and sufficient conditions for the existence of open-loop Nash equilibria using convex variational calculus. To obtain explicit expressions for the Nash equilibria, we introduce fully coupled forward-backward stochastic difference equations (FBS$\Delta$E, for short), which provide a dual characterization of these Nash equilibria. Additionally, we develop non-symmetric stochastic Riccati equations that decouple the stochastic Hamiltonian system for each player, enabling the derivation of closed-loop feedback forms for open-loop Nash equilibrium strategies. A notable aspect of this research is the complete randomness of the coefficients, which results in the corresponding Riccati equations becoming fully nonlinear higher-order backward stochastic difference equations. It distinguishes our non-zero-sum difference game from the deterministic case, where the Riccati equations reduce to algebraic forms.
♻ ☆ Completely Parameter-Free Single-Loop Algorithms for Nonconvex-Concave Minimax Problems
Due to their importance in various emerging applications, efficient algorithms for solving minimax problems have recently received increasing attention. However, many existing algorithms require prior knowledge of the problem parameters in order to achieve optimal iteration complexity. In this paper, three completely parameter-free single-loop algorithms, namely PF-AGP-NSC algorithm, PF-AGP-NC algorithm and PF-AGP-NL algorithm, are proposed to solve the smooth nonconvex-strongly concave, nonconvex-concave minimax problems and nonconvex-linear minimax problems respectively using line search without requiring any prior knowledge about parameters such as the Lipschtiz constant $L$ or the strongly concave modulus $\mu$. Furthermore, we prove that the total number of gradient calls required to obtain an $\varepsilon$-stationary point for the PF-AGP-NSC algorithm, the PF-AGP-NC algorithm, and the PF-AGP-NL algorithm are upper bounded by $\mathcal{O}\left( L^2\kappa^3\varepsilon^{-2} \right)$, $\mathcal{O}\left( \log^2(L)L^4\varepsilon^{-4} \right)$, and $\mathcal{O}\left( L^3\varepsilon^{-3} \right)$, respectively, where $\kappa$ is the condition number. To the best of our knowledge, PF-AGP-NC and PF-AGP-NL are the first completely parameter-free algorithms for solving nonconvex-concave and nonconvex-linear minimax problems, respectively. PF-AGP-NSC is a completely parameter-free algorithm for solving nonconvex-strongly concave minimax problems, achieving the best known complexity with respect to $\varepsilon$. Numerical results demonstrate the efficiency of the three proposed algorithms.
♻ ☆ Reset Controller Analysis and Design for Unstable Linear Plants using Scaled Relative Graphs
In this technical communique, we develop a graphical design procedure for reset controllers for unstable LTI plants based on recent developments on Scaled Relative Graph analysis, yielding an $L_2$-gain performance bound. The stabilizing controller consists of a second order reset element in parallel with a proportional gain. The proposed method goes beyond existing approaches that are limited to stable systems only, providing a well-applicable approach to design problems in practice where the plant is unstable.
comment: 5 pages, submitted on 16-06-2025 as a technical communique to Automatica. Second version with minor corrections 21-06-2025
♻ ☆ Infinite Horizon Fully Coupled Nonlinear Forward-Backward Stochastic Difference Equations and Their Application to LQ Optimal Control Problems
This paper focuses on the study of infinite horizon fully coupled nonlinear forward-backward stochastic difference equations (FBS$\bigtriangleup$Es). Firstly, we establish a pair of priori estimates for the solutions to forward stochastic difference equations (S$\bigtriangleup$Es) and backward stochastic difference equations (BS$\bigtriangleup$Es), respectively. Then, to achieve broader applicability, we utilize a set of domination-monotonicity conditions that are more lenient than standard assumptions. Using these conditions, we apply continuation methods to prove the unique solvability of infinite horizon fully coupled FBS$\bigtriangleup$Es and derive a set of solution estimates. Furthermore, our results have considerable implications for a variety of related linear quadratic (LQ) problems, especially when the stochastic Hamiltonian system is consistent with FBS$\bigtriangleup$Es satisfying the introduced domination-monotonicity conditions. Thus, by solving the associated stochastic Hamiltonian system, we explicitly characterize the unique optimal control. This is the first work establishing solvability of fully coupled nonlinear FBS$\bigtriangleup$Es under domination-monotonicity conditions in infinite horizon discrete-time setting.
comment: arXiv admin note: text overlap with arXiv:2410.01749
♻ ☆ Anticipated backward stochastic evolution equations and maximum principle for path-dependent systems in infinite dimensions
For a class of path-dependent stochastic evolution equations driven by cylindrical $Q$-Wiener process, we study the Pontryagin's maximum principle for the stochastic recursive optimal control problem. In this infinite-dimensional control system, the state process depends on its past trajectory, the control is delayed via an integral with respect to a general finite measure, and the final cost relies on the delayed state.To obtain the maximum principle, we introduce a functional adjoint operator for the non-anticipative path derivative and establish the well-posedness of an anticipated backward stochastic evolution equation in the path-dependent form, which serves as the adjoint equation.
♻ ☆ Distributed Optimization of Clique-Wise Coupled Problems via Three-Operator Splitting
This study explores distributed optimization problems with clique-wise coupling via operator splitting and how we can utilize this framework for performance analysis and enhancement. This framework extends beyond conventional pairwise coupled problems (e.g., consensus optimization) and is applicable to broader examples. To this end, we first introduce a new distributed optimization algorithm by leveraging a clique-based matrix and the Davis-Yin splitting (DYS), a versatile three-operator splitting method. We then demonstrate that this approach sheds new light on conventional algorithms in the following way: (i) Existing algorithms (NIDS, Exact diffusion, diffusion, and our previous work) can be derived from our proposed method; (ii) We present a new mixing matrix based on clique-wise coupling, which surfaces when deriving the NIDS. We prove its preferable distribution of eigenvalues, enabling fast consensus; (iii) These observations yield a new linear convergence rate for the NIDS with non-smooth objective functions. Remarkably our linear rate is first established for the general DYS with a projection for a subspace. This case is not covered by any prior results, to our knowledge. Finally, numerical examples showcase the efficacy of our proposed approach.
comment: 32 pages, 5 figures. Several modifications have been made mainly in numerical results
Systems and Control 24
☆ Online Adaptation for Flying Quadrotors in Tight Formations
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
comment: 10 pages, 4 figures
☆ Open Sky, Open Threats: Replay Attacks in Space Launch and Re-entry Phases
This paper examines the effects of replay attacks on the integrity of both uplink and downlink communications during critical phases of spacecraft communication. By combining software-defined radios (SDRs) with a real-time channel emulator, we replicate realistic attack conditions on the Orion spacecraft's communication systems in both launch and reentry. Our evaluation shows that, under replay attacks, the attacker's signal can overpower legitimate transmissions, leading to a Signal to Noise Ratio (SNR) difference of up to -7.8 dB during reentry and -6.5 dB during launch. To mitigate these threats, we propose a more secure receiver design incorporating a phase-coherency-dependent decision-directed (DD) equalizer with a narrowed phase-locked loop (PLL) bandwidth. This configuration enhances resilience by making synchronization more sensitive to phase distortions caused by replay interference.
☆ Judo: A User-Friendly Open-Source Package for Sampling-Based Model Predictive Control RSS
Recent advancements in parallel simulation and successful robotic applications are spurring a resurgence in sampling-based model predictive control. To build on this progress, however, the robotics community needs common tooling for prototyping, evaluating, and deploying sampling-based controllers. We introduce Judo, a software package designed to address this need. To facilitate rapid prototyping and evaluation, Judo provides robust implementations of common sampling-based MPC algorithms and standardized benchmark tasks. It further emphasizes usability with simple but extensible interfaces for controller and task definitions, asynchronous execution for straightforward simulation-to-hardware transfer, and a highly customizable interactive GUI for tuning controllers interactively. While written in Python, the software leverages MuJoCo as its physics backend to achieve real-time performance, which we validate across both consumer and server-grade hardware. Code at https://github.com/bdaiinstitute/judo.
comment: Accepted at the 2025 RSS Workshop on Fast Motion Planning and Control in the Era of Parallelism. 5 Pages
☆ A Set-valued Impact Law Approach for Modeling and Analysis of Rigid Contact Universal Joint with Clearance
This study presents a dynamic model of a universal joint (U-Joint) with radial clearance, focusing on the rigid unilateral frictional contacts at the crosspiece and yoke interfaces. Unlike previous models that neglect crosspiece inertia and interface friction, this work incorporates these effects using a set-valued impact law based on Signorini's condition with Coulomb friction, capturing the complex non-smooth dynamics introduced by radial clearance. Numerical simulations of a 2 degrees-of-freedom (DOF) shaft system reveal the critical influence of clearance on U-Joint dynamic behavior, including impact-induced oscillations, quasi-periodic motion, and chaotic dynamics, which are essential for accurate driveline modeling and real-time control in automotive, aerospace, and precision medical applications.
☆ A tutorial overview of model predictive control for continuous crystallization: current possibilities and future perspectives
This paper presents a systematic approach to the advanced control of continuous crystallization processes using model predictive control. We provide a tutorial introduction to controlling complex particle size distributions by integrating population balance equations with detailed models of various continuous crystallizers. Since these high-fidelity models are often too complex for online optimization, we propose the use of data-driven surrogate models that enable efficient optimization-based control. Through two case studies, one with a low-complexity system allowing direct comparison with traditional methods and another involving a spatially distributed crystallizer, we demonstrate how our approach enables real-time model predictive control while maintaining accuracy. The presented methodology facilitates the use of complex models in a model-based control framework, allowing precise control of key particle size distribution characteristics, such as the median particle size $d_{50}$ and the width $d_{90} - d_{10}$. This addresses a critical challenge in pharmaceutical and fine chemical manufacturing, where product quality depends on tight control of particle characteristics.
☆ A Vision for Trustworthy, Fair, and Efficient Socio-Technical Control using Karma Economies
Control systems will play a pivotal role in addressing societal-scale challenges as they drive the development of sustainable future smart cities. At the heart of these challenges is the trustworthy, fair, and efficient allocation of scarce public resources, including renewable energy, transportation, data, computation, etc.. Historical evidence suggests that monetary control -- the prototypical mechanism for managing resource scarcity -- is not always well-accepted in socio-technical resource contexts. In this vision article, we advocate for karma economies as an emerging non-monetary mechanism for socio-technical control. Karma leverages the repetitive nature of many socio-technical resources to jointly attain trustworthy, fair, and efficient allocations; by budgeting resource consumption over time and letting resource users ``play against their future selves.'' To motivate karma, we review related concepts in economics through a control systems lens, and make a case for a) shifting the viewpoint of resource allocations from single-shot and static to repeated and dynamic games; and b) adopting long-run Nash welfare as the formalization of ``fairness and efficiency'' in socio-technical contexts. We show that in many dynamic resource settings, karma Nash equilibria maximize long-run Nash welfare. Moreover, we discuss implications for a future smart city built on multi-karma economies: by choosing whether to combine different socio-technical resources, e.g., electricity and transportation, in a single karma economy, or separate into resource-specific economies, karma provides new flexibility to design the scope of fairness and efficiency.
☆ Closed-Loop Molecular Communication with Local and Global Degradation: Modeling and ISI Analysis
This paper presents a novel physics-based model for signal propagation in closed-loop molecular communication (MC) systems, which are particularly relevant for many envisioned biomedical applications, such as health monitoring or drug delivery within the closed-loop human cardiovascular system (CVS). Compared to open-loop systems, which are mostly considered in MC, closed-loop systems exhibit different characteristic effects influencing signaling molecule (SM) propagation. One key phenomenon are the periodic SM arrivals at the receiver (RX), leading to various types of inter-symbol interference (ISI) inherent to closed-loop system. To capture these characteristic effects, we propose an analytical model for the SM propagation inside closed-loop systems. The model accounts for arbitrary spatio-temporal SM release patterns at the transmitter (TX), and incorporates several environmental effects such as fluid flow, SM diffusion, and SM degradation. Moreover, to capture a wide range of practically relevant degradation and clearance mechanisms, the model includes both local removal (e.g., due to SM absorption into organs) and global removal (e.g., due to chemical degradation) of SMs. The accuracy of the proposed model is validated with three-dimensional (3-D) particle-based simulations (PBSs). Moreover, we utilize the proposed model to develop a rigorous characterization of the various types of ISI encountered in closed-loop MC systems.
☆ Robust black start of an offshore wind farm with DRU based HVDC link using power synchronization control
This paper introduces a universal power synchronization controller for grid-side control of the wind turbine conversion systems in an offshore wind farm with a diode rectifier in the offshore substation of the HVDC link. The controller incorporates voltage-power droop controllers in the outer loop to enable the operation of this setup. To effectively handle the impact of large delays during black start and power ramp phases, virtual active and reactive power quantities are defined. These quantities are computed based on the current references prior to any modifications that might be needed to meet converter current and voltage limits or source constraints. Utilizing them in the outer loop ensures a balanced power sharing and a stable operation whenever the original (unmodified) current references are not realized. Case studies confirm the robustness of the proposed controller.
☆ Evaluating the Impact of Model Accuracy for Optimizing Battery Energy Storage Systems
This study investigates two models of varying complexity for optimizing intraday arbitrage energy trading of a battery energy storage system using a model predictive control approach. Scenarios reflecting different stages of the system's lifetime are analyzed. The findings demonstrate that the equivalent-circuit-model-based non-linear optimization model outperforms the simpler linear model by delivering more accurate predictions of energy losses and system capabilities. This enhanced accuracy enables improved operational strategies, resulting in increased roundtrip efficiency and revenue, particularly in systems with batteries exhibiting high internal resistance, such as second-life batteries. However, to fully leverage the model's benefits, it is essential to identify the correct parameters.
comment: 5 pages, 4 figures. Submitted to IEEE ISGT Europe 2025
☆ Opportunities for real-time process control of electrode properties in lithium-ion battery manufacturing
Lithium-ion batteries (LIBs) have an important role in the shift required to achieve a global net-zero carbon target of 2050. Electrode manufacture is amongst the most expensive steps of the LIB manufacturing process and, despite its apparent maturity, optimised manufacturing conditions are arrived at by largely trial and error. Currently, LIB manufacturing plants are controlled to follow the fixed "recipe" obtained by trial and error, which may nonetheless be suboptimal. Moreover, regulating the process as a whole to conform to the set conditions is not widespread. Inspired by control approaches used in other film and sheet processes, we discuss opportunities for implementing real-time process control of electrode-related products, which has the potential to reduce the electrode manufacturing cost, CO2 emissions, usage of resources by increases in process yield, and throughput. We highlight the challenges and significant opportunities of implementing real-time process control in LIB electrode production lines.
☆ On the input admittance of a universal power synchronization controller with droop controllers
Recent work has proposed a universal framework that integrates the well-established power synchronization control into vector current control. Using this controller for the parallel operation of grid-forming converters, and/or with loads that have strong voltage magnitude sensitivity, requires additional loops manipulating the voltage magnitude, e.g., $QV$ and $PV$ voltage-power droop controllers. This paper derives the input admittance of the resulting overall scheme. Sensitivity analyses based on the passivity index demonstrate the benefits of the proportional components of $QV$ and $PV$ control. A case study is presented where a grid-forming converter is operated in parallel with a generator.
☆ Trajectory tracking control of USV with actuator constraints in the presence of disturbances
All practical systems often pose a problem of finite control capability, which can notably degrade the performance if not properly addressed. Since actuator input bounds are typically known, integrating actuator saturation considerations into the control law design process can lead to enhanced performance and more precise trajectory tracking. Also, the actuators cannot provide the demanded forces or torques instantaneously; hence, there is a limitation on the rate of magnitude. This work proposes nonlinear feedback controller designs developed using the Lyapunov stability and backstepping method while actively considering the actuator magnitude and rate constraints. The system dynamics are augmented with a smooth control input saturation model. Additionally, an observer is incorporated to estimate the disturbance vector. Through Lyapunov stability analysis, we demonstrate the system's stability under the proposed controller for the Uncrewed Surface Vessel (USV), ensuring adherence to actuator constraints provided their initial values fall within the prescribed bounds. Extensive numerical simulations performed by considering various trajectories and multiple initial conditions demonstrate the effectiveness of the controller in maintaining tracking performance without violating actuator constraints. This work also relaxes the assumption of equally capable actuators to be used to control the motion of USVs, affirming the viability of the controller in practical applications.
☆ Formal Control for Uncertain Systems via Contract-Based Probabilistic Surrogates (Extended Version)
The requirement for identifying accurate system representations has not only been a challenge to fulfill, but it has compromised the scalability of formal methods, as the resulting models are often too complex for effective decision making with formal correctness and performance guarantees. Focusing on probabilistic simulation relations and surrogate models of stochastic systems, we propose an approach that significantly enhances the scalability and practical applicability of such simulation relations by eliminating the need to compute error bounds directly. As a result, we provide an abstraction-based technique that scales effectively to higher dimensions while addressing complex nonlinear agent-environment interactions with infinite-horizon temporal logic guarantees amidst uncertainty. Our approach trades scalability for conservatism favorably, as demonstrated on a complex high-dimensional vehicle intersection case study.
comment: 26 pages, 5 figures, extended version of paper accepted for publication at QEST 2025
☆ Orbital Collision: An Indigenously Developed Web-based Space Situational Awareness Platform
This work presents an indigenous web based platform Orbital Collision (OrCo), created by the Space Systems Laboratory at IIIT Delhi, to enhance Space Situational Awareness (SSA) by predicting collision probabilities of space objects using Two Line Elements (TLE) data. The work highlights the growing challenges of congestion in the Earth's orbital environment, mainly due to space debris and defunct satellites, which increase collision risks. It employs several methods for propagating orbital uncertainty and calculating the collision probability. The performance of the platform is evaluated through accuracy assessments and efficiency metrics, in order to improve the tracking of space objects and ensure the safety of the satellite in congested space.
comment: This work has been already submitted for STEP-IPSC 2025 Conference Proceedings
☆ Vision-Based Multirotor Control for Spherical Target Tracking: A Bearing-Angle Approach
This work addresses the problem of designing a visual servo controller for a multirotor vehicle, with the end goal of tracking a moving spherical target with unknown radius. To address this problem, we first transform two bearing measurements provided by a camera sensor into a bearing-angle pair. We then use this information to derive the system's dynamics in a new set of coordinates, where the angle measurement is used to quantify a relative distance to the target. Building on this system representation, we design an adaptive nonlinear control algorithm that takes advantage of the properties of the new system geometry and assumes that the target follows a constant acceleration model. Simulation results illustrate the performance of the proposed control algorithm.
comment: This paper has been accepted for presentation at the 2025 IEEE European Control Conference (ECC)
☆ Distributed Affine Formation Control of Linear Multi-agent Systems with Adaptive Event-triggering
Concerning general multi-agent systems with limited communication, this paper proposes distributed formation control protocols under adaptive event-triggered schemes to operate affine transformations of nominal formations. To accommodate more practical system mechanics, we develop an event-triggered controller that drives the leader to a desired state by bringing in the compensation term. Based on triggering instants' state information, an affine formation control method with adaptive event-triggering is designed for each follower, making the whole protocol effective in refraining from successive communication while not relying on predefined global information. In particular, mitigating the effect of partial state availability, an output-based control solution is presented to expand the protocol's serviceable range. Finally, we perform numerical simulations on the formation and its affine transformations to verify the effectiveness of the control protocol and the feasibility of the event-triggered mechanism.
♻ ☆ A Synthetic Texas Power System with Time-Series Weather-Dependent Spatiotemporal Profiles
We developed a synthetic Texas 123-bus backbone transmission system (TX-123BT) with spatio-temporally correlated grid profiles of solar power, wind power, dynamic line ratings and loads at one-hour resolution for five continuous years, which demonstrates unique advantages compared to conventional test cases that offer single static system profile snapshots. Three weather-dependent models are used to create the hourly wind power productions, solar power productions, and dynamic line ratings respectively. The actual historical weather information is also provided along with this dataset, which is suitable for machine learning models. Security-constrained unit commitment is conducted on TX-123BT daily grid profiles and numerical results are compared with the actual Texas system for validation. The created hourly DLR profiles can cut operating cost from USD 8.09 M to USD 7.95 M (-1.7 %), raises renewable dispatch by 1.3 %, and lowers average LMPs from USD 18.66 to USD 17.98 /MWh (-3.6 %). Two hydrogen options -- a 200 MW dual hub and a 500 MW hydrogen-energy transmission and conversion system -- reduce high-load Q3 daily costs by 13.9 % and 14.1 %, respectively. Sensitivity tests show that suppressing the high-resolution weather-driven profiles can push system cost up by as much as 15 %, demonstrating the economic weight of temporal detail.
comment: 12 pages, 14 figures, 10 tables
♻ ☆ Safe Guaranteed Exploration for Non-linear Systems
Safely exploring environments with a-priori unknown constraints is a fundamental challenge that restricts the autonomy of robots. While safety is paramount, guarantees on sufficient exploration are also crucial for ensuring autonomous task completion. To address these challenges, we propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results: guaranteed exploration for non-linear systems with finite time sample complexity bounds, while being provably safe with arbitrarily high probability. The framework is general and applicable to many real-world scenarios with complex non-linear dynamics and unknown domains. We improve the efficiency of this general framework by proposing an algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control. SageMPC leverages three key techniques: i) exploiting a Lipschitz bound, ii) goal-directed exploration, and iii) receding horizon style re-planning, all while maintaining the desired sample complexity, safety and exploration guarantees of the framework. Lastly, we demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.
comment: Accepted paper in IEEE Transactions on Automatic Control, 2025
♻ ☆ A Fully Digital Relaxation-Aware Analog Programming Technique for HfOx RRAM Arrays
For neuromorphic engineering to emulate the human brain, improving memory density with low power consumption is an indispensable but challenging goal. In this regard, emerging RRAMs have attracted considerable interest for their unique qualities like low power consumption, high integration potential, durability, and CMOS compatibility. Using RRAMs to imitate the more analog storage behavior of brain synapses is also a promising strategy for further improving memory density and power efficiency. However, RRAM devices display strong stochastic behavior, together with relaxation effects, making it more challenging to precisely control their multi-level storage capability. To address this, researchers have reported different multi-level programming strategies, mostly involving the precise control of analog parameters like compliance current during write operations and/or programming voltage amplitudes. Here, we present a new fully digital relaxation-aware method for tuning the conductance of analog RRAMs. The method is based on modulating digital pulse widths during erase operations while keeping other parameters fixed, and therefore requires no precise alterations to analog parameters like compliance currents or programming voltage amplitudes. Experimental results, with and without relaxation effect awareness, on a 64 RRAM 1T1R HfOx memory array of cells, fabricated in 130nm CMOS technology, indicate that it is possible to obtain 2-bit memory per cell multi-value storage at the array level, verified 1000 seconds after programming.
comment: 5 pages, 10 figures, 2 tables
♻ ☆ Kernel-based error bounds of bilinear Koopman surrogate models for nonlinear data-driven control
We derive novel deterministic bounds on the approximation error of data-based bilinear surrogate models for unknown nonlinear systems. The surrogate models are constructed using kernel-based extended dynamic mode decomposition to approximate the Koopman operator in a reproducing kernel Hilbert space. Unlike previous methods that require restrictive assumptions on the invariance of the dictionary, our approach leverages kernel-based dictionaries that allow us to control the projection error via pointwise error bounds, overcoming a significant limitation of existing theoretical guarantees. The derived state- and input-dependent error bounds allow for direct integration into Koopman-based robust controller designs with closed-loop guarantees for the unknown nonlinear system. Numerical examples illustrate the effectiveness of the proposed framework.
comment: Accepted for publication in IEEE Control Systems Letters (L-CSS)
♻ ☆ Towards Wireless Native Big AI Model: The Mission and Approach Differ From Large Language Model
Research on leveraging big artificial intelligence model (BAIM) technology to drive the intelligent evolution of wireless networks is emerging. However, breakthroughs in generalization brought about by BAIM techniques mainly occur in natural language processing. There is a lack of a clear technical direction on how to efficiently apply BAIM techniques to wireless systems, which typically have many additional peculiarities. To this end, this paper reviews recent research on BAIM for wireless systems and assesses the current state of the field. It then analyzes and compares the differences between language intelligence and wireless intelligence on multiple levels, including scientific foundations, core usages, and technical details. It highlights the necessity and scientific significance of developing wireless native BAIM technologies, as well as specific issues that need to be considered for technical implementation. Finally, by synthesizing the evolutionary laws of language models with the particularities of wireless systems, this paper provides several instructive methodologies for developing wireless native BAIM.
comment: Accepted by SCIENCE CHINA Information Sciences. Paper Link: https://www.sciengine.com/SCIS/doi/10.1007/s11432-024-4464-8
♻ ☆ DualGuard MPPI: Safe and Performant Optimal Control by Combining Sampling-Based MPC and Hamilton-Jacobi Reachability
Designing controllers that are both safe and performant is inherently challenging. This co-optimization can be formulated as a constrained optimal control problem, where the cost function represents the performance criterion and safety is specified as a constraint. While sampling-based methods, such as Model Predictive Path Integral (MPPI) control, have shown great promise in tackling complex optimal control problems, they often struggle to enforce safety constraints. To address this limitation, we propose DualGuard-MPPI, a novel framework for solving safety-constrained optimal control problems. Our approach integrates Hamilton-Jacobi reachability analysis within the MPPI sampling process to ensure that all generated samples are provably safe for the system. On the one hand, this integration allows DualGuard-MPPI to enforce strict safety constraints; at the same time, it facilitates a more effective exploration of the environment with the same number of samples, reducing the effective sampling variance and leading to better performance optimization. Through several simulations and hardware experiments, we demonstrate that the proposed approach achieves much higher performance compared to existing MPPI methods, without compromising safety.
comment: 8 pages, 7 figures
♻ ☆ Physics-Informed Recurrent Network for State-Space Modeling of Gas Pipeline Networks
As a part of the integrated energy system (IES), gas pipeline networks can provide additional flexibility to power systems through coordinated optimal dispatch. An accurate pipeline network model is critical for the optimal operation and control of IESs. However, inaccuracies or unavailability of accurate pipeline parameters often introduce errors in the state-space models of such networks. This paper proposes a physics-informed recurrent network (PIRN) to identify the state-space model of gas pipelines. It fuses sparse measurement data with fluid-dynamic behavior expressed by partial differential equations. By embedding the physical state-space model within the recurrent network, parameter identification becomes an end-to-end PIRN training task. The model can be realized in PyTorch through modifications to a standard RNN backbone. Case studies demonstrate that our proposed PIRN can accurately estimate gas pipeline models from sparse terminal node measurements, providing robust performance and significantly higher parameter efficiency. Furthermore, the identified state-space model of the pipeline network can be seamlessly integrated into optimization frameworks.
comment: 9 Pages
♻ ☆ Worst-Case Services and State-Based Scheduling
In this paper, we shed new light on a classical scheduling problem: given a slot-timed, constant-capacity server, what short-run scheduling decisions must be made to provide long-run service guarantees to competing flows of unit-sized tasks? We model each flow's long-run guarantee as a worst-case service that maps each queued arrival vector recording the flow's cumulative task arrivals, including those initially queued, to a worst-case acceptable departure vector lower-bounding its cumulative served tasks. We show that these maps are states that can be updated as tasks arrive and are served, introduce state-based scheduling, find the schedulability condition necessary and sufficient to maintain all flows' long-run guarantees, and use this condition to identify all short-run scheduling decisions that preserve schedulability. Our framework is general but computationally complex. To reduce complexity, we consider three specializations. First, we show that when satisfactory short-run scheduling decisions exist, at least one can be efficiently identified by maximizing the server's capacity slack, a generalization of the earliest-deadline-first rule. Second, we show that a special class of worst-case services, min-plus services, can be efficiently specified and updated using properties of the min-plus algebra. Finally, we show that efficiency can be further improved by restricting attention to a min-plus service subclass, dual-curve services. This last specialization turns out to be a dynamic extension of service curves that maintains all essential features of our framework while approaching near practical viability.
Optimization and Control 29
☆ Regularization of Nonlinear Inverse Problems -- From Functional Analysis to Data-Driven Approaches
The focus of this book is on the analysis of regularization methods for solving \emph{nonlinear inverse problems}. Specifically, we place a strong emphasis on techniques that incorporate supervised or unsupervised data derived from prior experiments. This approach enables the integration of data-driven insights into the solution of inverse problems governed by physical models. \emph{Inverse problems}, in general, aim to uncover the \emph{inner mechanisms} of an observed system based on indirect or incomplete measurements. This field has far-reaching applications across various disciplines, such as medical or geophysical imaging, as well as, more broadly speaking, industrial processes where identifying hidden parameters is essential.
☆ Convergent Proximal Multiblock ADMM for Nonconvex Dynamics-Constrained Optimization
This paper proposes a provably convergent multiblock ADMM for nonconvex optimization with nonlinear dynamics constraints, overcoming the divergence issue in classical extensions. We consider a class of optimization problems that arise from discretization of dynamics-constrained variational problems that are optimization problems for a functional constrained by time-dependent ODEs or PDEs. This is a family of $n$-sum nonconvex optimization problems with nonlinear constraints. We study the convergence properties of the proximal alternating direction method of multipliers (proximal ADMM) applied to those problems. Taking the advantage of the special problem structure, we show that under local Lipschitz and local $L$-smooth conditions, the sequence generated by the proximal ADMM is bounded and all accumulation points are KKT points. Based on our analysis, we also design a procedure to determine the penalty parameters $\rho_i$ and the proximal parameters $\eta_i$. We further prove that among all the subsequences that converge, the fast one converges at the rate of $o(1/k)$. The numerical experiments are performed on 4D variational data assimilation problems and as the solver of implicit schemes for stiff problems. The proposed proximal ADMM has more stable performance than gradient-based methods. We discuss the implementation to solve the subproblems, a new way to solve the implicit schemes, and the advantages of the proposed algorithm.
comment: 32 pages, 6 figures
☆ Worst-case convergence analysis of relatively inexact gradient descent on smooth convex functions
We consider the classical gradient descent algorithm with constant stepsizes, where some error is introduced in the computation of each gradient. More specifically, we assume some relative bound on the inexactness, in the sense that the norm of the difference between the true gradient and its approximate value is bounded by a certain fraction of the gradient norm. This paper presents a worst-case convergence analysis of this so-called relatively inexact gradient descent on smooth convex functions, using the Performance Estimation Problem (PEP) framework. We first derive the exact worst-case behavior of the method after one step. Then we study the case of several steps and provide computable upper and lower bounds using the PEP framework. Finally, we discuss the optimal choice of constant stepsize according to the obtained worst-case convergence rates.
☆ Non-linear estimators for the observation and stabilisation of falling liquid films
Falling liquid films are complex physical processes modelled by infinite-dimensional dynamical systems. The interfacial behaviour is characterised by the Reynolds number, a nondimensional parameter, and above a critical value the uniform film becomes unstable and travelling waves occur. By injecting and removing fluid from the base at discrete locations, we aim to stabilise otherwise unstable flat interfaces, with the additional limitation that observations of the state are restricted to finitely many measurements of the film height. Successful feedback control has recently been achieved in the case of full observations using linear-quadratic regulator controls coupled to asymptotic approximations of the Navier-Stokes equations, but restricted observations severely curtailed their performance. In this study we couple the well understood full-information feedback control strategy to a non-linear estimator. The dynamics of the estimator are designed to approximate those of the film, and we apply a forcing term chosen to ensure that measurements of the estimator match the available measurements of the film. Using this method, we restore the performance of the controls to a level approaching their full information counterparts, even at moderately large Reynolds numbers. We also briefly investigate the effects of the relative positioning of actuators and observers on the resulting dynamics.
comment: 22 pages, 15 figures, submitted to RS Proc A
☆ Analytic Full Potential Adjoint Solution for Two-dimensional Subcritical Flows
We investigate the analytic adjoint solution of the two-dimensional (2D) full potential equation for subcritical flows for a cost function measuring aerodynamic force by the Green's function approach and by the connection of the adjoint full potential and compressible Euler equations. The solution for aerodynamic lift is written in terms of two unknown functions encoding the effect of perturbations to the Kutta condition. We analyze the properties of these functions from an analytic viewpoint and also by examining numerical adjoint solutions. We also discuss a possible formulation of the Kutta condition within the adjoint framework
comment: 25 pages. arXiv admin note: text overlap with arXiv:2503.15121
☆ Optimal Depth of Neural Networks
Determining the optimal depth of a neural network is a fundamental yet challenging problem, typically resolved through resource-intensive experimentation. This paper introduces a formal theoretical framework to address this question by recasting the forward pass of a deep network, specifically a Residual Network (ResNet), as an optimal stopping problem. We model the layer-by-layer evolution of hidden representations as a sequential decision process where, at each layer, a choice is made between halting computation to make a prediction or continuing to a deeper layer for a potentially more refined representation. This formulation captures the intrinsic trade-off between accuracy and computational cost. Our primary theoretical contribution is a proof that, under a plausible condition of diminishing returns on the residual functions, the expected optimal stopping depth is provably finite, even in an infinite-horizon setting. We leverage this insight to propose a novel and practical regularization term, $\mathcal{L}_{\rm depth}$, that encourages the network to learn representations amenable to efficient, early exiting. We demonstrate the generality of our framework by extending it to the Transformer architecture and exploring its connection to continuous-depth models via free-boundary problems. Empirical validation on ImageNet confirms that our regularizer successfully induces the theoretically predicted behavior, leading to significant gains in computational efficiency without compromising, and in some cases improving, final model accuracy.
☆ A Hybrid Zernike-Lyapunov Framework for Aberration-Based Statistical Wavefront Reconstruction of Chaotic Optical Surfaces
We present a comprehensive theoretical framework that unifies chaotic wavefront dynamics with classical aberration theory through a Statistical Wavefront Reconstruction Framework (SWRF) formalism. By establishing rigorous connections between ray trajectory deflections and wave-optical phase perturbations through the eikonal equation, we decompose chaotic wavefront perturbations into modified Zernike-Lyapunov hybrid expansions, establishing mathematical equivalences between Lyapunov exponents, fractal dimensions, and traditional aberration coefficients. This chaotic aberration theory enables systematic incorporation of non-integrable wavefront dynamics into deterministic design frameworks, providing a rigorous foundation for controlled chaos in optical systems. We derive analytical relationships connecting surface chaos parameters to optical performance metrics, demonstrate the framework's validity through phase space analysis, and establish convergence criteria for the chaotic expansion. The theory reveals how chaotic surface geometries can be intentionally designed to achieve specific optical functionalities, including beam homogenization, speckle reduction, and novel wavefront shaping capabilities, while maintaining mathematical rigor comparable to classical aberration analysis.
☆ General Linear-Quadratic Mean Field Stochastic Differential Game with Common Noise: a Direct Method
This paper investigates a class of general linear-quadratic mean field games with common noise, where the diffusion terms of the system contain the state variables, control variables, and the average state terms. We solve the problem using both the direct method and the fixed-point method in the paper. First, by using the variational method to solve a finite $N$-players game problem, we obtain the necessary and sufficient conditions for the centralized open-loop Nash equilibrium strategy. Subsequently, by employing the decoupling technique, we derive the state feedback representation of the centralized open-loop Nash equilibrium strategy in terms of Riccati equations. Next, by studying the asymptotic solvability of the Riccati equations, we construct the decentralized open-loop asymptotic Nash equilibrium strategies. Finally, through some estimates, we prove the asymptotic optimality of the decentralized open-loop Nash equilibrium strategies. Moreover, we find that the decentralized open-loop asymptotic Nash equilibrium strategies obtained via the fixed-point method are identical to those derived using the direct method.
comment: 28 pages
☆ A class of nonconvex semidefinite programming in which every KKT point is globally optimal
We consider a special class of nonconvex semidefinite programming problems and show that every point satisfying the Karush--Kuhn--Tucker (KKT) conditions is globally optimal despite nonconvexity. This property is related to pseudoconvex optimization. This class of problems is motivated by an eigenfrequency topology optimization problem in structural engineering, but is presented in a more general form.
comment: 10 pages, 1 figure
☆ A Minimalist Optimizer Design for LLM Pretraining
Training large language models (LLMs) typically relies on adaptive optimizers such as Adam, which require significant memory to maintain first- and second-moment matrices, known as optimizer states. While recent works such as GaLore, Fira, and APOLLO have proposed state-compressed variants to reduce memory consumption, a fundamental question remains: What is the minimal amount of optimizer state that is truly necessary to retain state-of-the-art performance in LLM pretraining? In this work, we systematically investigate this question using a bottom-up approach. We find that two memory- and compute-efficient optimization techniques are particularly effective: (1) column-wise gradient normalization significantly boosts the performance of plain SGD without requiring momentum; and (2) adding first-order momentum only to the output layer - where gradient variance is highest - yields performance competitive with fully adaptive methods such as Muon. Based on these insights, we propose SCALE (Stochastic Column-normalized Last-layer Momentum), a new optimizer that combines column-normalized SGD with last-layer momentum, where column normalization refers to normalizing the gradient along the output dimension. Across multiple LLaMA models (60M-1B), SCALE matches or exceeds the performance of Adam while using only 35-45% of the total memory. It also consistently outperforms memory-efficient optimizers such as GaLore, Fira, and APOLLO, making it a strong candidate for large-scale pretraining under memory constraints. For the LLaMA 7B model, SCALE outperforms the state-of-the-art method APOLLO in terms of both perplexity and memory consumption. In addition, our method serves as a minimalist baseline for more sophisticated optimizer design.
♻ ☆ LieDetect: Detection of representation orbits of compact Lie groups from point clouds
We suggest a new algorithm to estimate representations of compact Lie groups from finite samples of their orbits. Different from other reported techniques, our method allows the retrieval of the precise representation type as a direct sum of irreducible representations. Moreover, the knowledge of the representation type permits the reconstruction of its orbit, which is useful for identifying the Lie group that generates the action, from a finite list of candidates. Our algorithm is general for any compact Lie group, but only instantiations for SO(2), T^d, SU(2), and SO(3) are considered. Theoretical guarantees of robustness in terms of Hausdorff and Wasserstein distances are derived. Our tools are drawn from geometric measure theory, computational geometry, and optimization on matrix manifolds. The algorithm is tested for synthetic data up to dimension 32, as well as real-life applications in image analysis, harmonic analysis, density estimation, equivariant neural networks, chemical conformational spaces, and classical mechanics systems, achieving very accurate results.
comment: To appear in Foundations of Computational Mathematics. 110 pages, 26 figures, 16 tables
♻ ☆ On the true detection probability of the uniformly optimal search plan
The gold standard for designing a search plan is to select a target distribution and then find the uniformly optimal search plan based on it. This approach has been successfully applied in several high-profile civil and military search missions. Since the target distribution is subjective and chosen at an analyst's discretion, it is natural to ask whether this approach can generate a search plan that maximizes the true detection probability at each moment. This article gives a negative answer by establishing that, under mild conditions, for a given target distribution and the uniformly optimal search plan based on it, there is another target distribution whose induced uniformly optimal search plan leads to an increased true detection probability at every moment. In particular, it implies that the problem of finding a search plan that maximizes the true detection probability at each moment remains unsolved.
comment: Please disregard earlier versions. In earlier versions, there was an error in the proof of Theorem 4.1. In this version, Theorem 4.1 has been generalized to regular detection functions and a new proof is given
♻ ☆ Particle Filter Optimization: A Bayesian Approach for Global Stochastic Optimization
This paper proposes a novel global optimization algorithm, Particle Filter-Based Optimization (PFO), designed for a class of stochastic optimization problems in which the objective function lacks an analytical form and is subject to noisy evaluations. PFO utilizes the Bayesian inference framework of Particle Filters (PF) by reformulating the optimization task as a state estimation problem. In this context, evaluations of the objective function are interpreted as measurements, and a customized transition model based on covariance ellipsoids is introduced to guide particle propagation. This model serves as a surrogate for classical acquisition functions, equipping the PF framework with local search capabilities and supporting efficient exploration of the global optimum. To mitigate the adverse effects of measurement noise, the Unscented Transform (UT) is employed to approximate the underlying mean of the objective function, enhancing the accuracy of particle updates. The algorithm offers notable improvements over existing stochastic optimization algorithms for black-box multi-modal objective functions. First, PFO provides a fully probabilistic definition of particle weights, enhancing adaptability and robustness. Second, PFO integrates exploration and exploitation within a unified Bayesian framework, ensuring a non-zero probability of sampling from unexplored regions throughout the optimization process. This approach contrasts with traditional particle filter methods that are primarily used for state estimation, and heuristic optimization algorithms that lack theoretical guarantees. The novelty of PFO lies in its unique integration of particle filtering with a dynamic search space prediction, offering a theoretically grounded alternative to acquisition functions in Bayesian Optimization (BO).
♻ ☆ EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients), and provides pessimistic convergence rates (e.g., while the best known rate for EF in the smooth nonconvex regime, and when full gradients are compressed, is $O(1/T^{2/3})$, the rate of gradient descent in the same regime is $O(1/T)$). Recently, Richt\'arik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum, and bidirectional compression. To the best of our knowledge, several of these techniques have not been previously analyzed in combination with EF, and in cases where prior analysis exists -- such as for bidirectional compression -- our theoretical convergence guarantees significantly improve upon existing results.
comment: Accepted for publication in JMLR. This extended version includes detailed proofs of several extensions and additional experimental results
♻ ☆ Safe Guaranteed Exploration for Non-linear Systems
Safely exploring environments with a-priori unknown constraints is a fundamental challenge that restricts the autonomy of robots. While safety is paramount, guarantees on sufficient exploration are also crucial for ensuring autonomous task completion. To address these challenges, we propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results: guaranteed exploration for non-linear systems with finite time sample complexity bounds, while being provably safe with arbitrarily high probability. The framework is general and applicable to many real-world scenarios with complex non-linear dynamics and unknown domains. We improve the efficiency of this general framework by proposing an algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control. SageMPC leverages three key techniques: i) exploiting a Lipschitz bound, ii) goal-directed exploration, and iii) receding horizon style re-planning, all while maintaining the desired sample complexity, safety and exploration guarantees of the framework. Lastly, we demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.
comment: Accepted paper in IEEE Transactions on Automatic Control, 2025
♻ ☆ Transport maps as flows of control-affine systems
We consider the problem of transporting \nota{one probability measure into another through} the flow of a given driftless control-affine system. Under suitable regularity conditions, the controllability of the system by means of open-loop controls is a sufficient condition for the existence of time-varying feedback controls such that the $1$-time flow of the system is the optimal transport map for the quadratic cost.
♻ ☆ COS-DPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework
In LLM alignment and many other ML applications, one often faces the Multi-Objective Fine-Tuning (MOFT) problem, i.e., fine-tuning an existing model with datasets labeled w.r.t. different objectives simultaneously. To address the challenge, we propose a Conditioned One-Shot fine-tuning framework (COS-DPO) that extends the Direct Preference Optimization technique, originally developed for efficient LLM alignment with preference data, to accommodate the MOFT settings. By direct conditioning on the weight across auxiliary objectives, our Weight-COS-DPO method enjoys an efficient one-shot training process for profiling the Pareto front and is capable of achieving comprehensive trade-off solutions even in the post-training stage. Based on our theoretical findings on the linear transformation properties of the loss function, we further propose the Temperature-COS-DPO method that augments the temperature parameter to the model input, enhancing the flexibility of post-training control over the trade-offs between the main and auxiliary objectives. We demonstrate the effectiveness and efficiency of the COS-DPO framework through its applications to various tasks, including the Learning-to-Rank (LTR) and LLM alignment tasks, highlighting its viability for large-scale ML deployments.
comment: Published at UAI 2025
♻ ☆ New global Carleman estimates and null controllability for forward/backward semi-linear parabolic SPDEs
In this paper, we study the null controllability for parabolic SPDEs involving both the state and the gradient of the state. To start with, an improved global Carleman estimate for linear forward (resp. backward) parabolic SPDEs with general random coefficients and square-integrable source terms is derived. Based on this, we further develop a new global Carleman estimate for linear forward (resp. backward) parabolic SPDEs with source terms in the Sobolev space of negative order, which enables us to deal with the global null controllability for linear backward (resp. forward) parabolic SPDEs with gradient terms. As a byproduct, a special weighted energy-type estimate for the controlled system that explicitly depends on the parameters $\lambda,\mu$ and the weighted function $\theta$ is obtained, which makes it possible to extend the linear null controllability to semi-linear backward (resp. forward) parabolic SPDEs by applying the fixed-point argument in an appropriate Banach space.
♻ ☆ Kernel-based error bounds of bilinear Koopman surrogate models for nonlinear data-driven control
We derive novel deterministic bounds on the approximation error of data-based bilinear surrogate models for unknown nonlinear systems. The surrogate models are constructed using kernel-based extended dynamic mode decomposition to approximate the Koopman operator in a reproducing kernel Hilbert space. Unlike previous methods that require restrictive assumptions on the invariance of the dictionary, our approach leverages kernel-based dictionaries that allow us to control the projection error via pointwise error bounds, overcoming a significant limitation of existing theoretical guarantees. The derived state- and input-dependent error bounds allow for direct integration into Koopman-based robust controller designs with closed-loop guarantees for the unknown nonlinear system. Numerical examples illustrate the effectiveness of the proposed framework.
comment: Accepted for publication in IEEE Control Systems Letters (L-CSS)
♻ ☆ On the Computation of the Efficient Frontier in Advanced Sparse Portfolio Optimization
In this work, we deal with the problem of computing a comprehensive front of efficient solutions in multi-objective portfolio optimization problems in presence of sparsity constraints. We start the discussion pointing out some weaknesses of the classical linear scalarization approach when applied to the considered class of problems. We are then motivated to propose a suitable algorithmic framework that is designed to overcome these limitations: the novel algorithm combines a gradient-based exploration-refinement strategy with a tailored initialization scheme based on memetic or multi-start descent procedures. Thorough computational experiments highlight how the proposed method is far superior to both linear scalarization and popular genetic algorithms.
♻ ☆ Moment Constrained Optimal Transport for Control Applications
This paper concerns the application of techniques from optimal transport (OT) to mean field control, in which the probability measures of interest in OT correspond to empirical distributions associated with a large collection of controlled agents. The control objective of interest motivates a one-sided relaxation of OT, in which the first marginal is fixed and the second marginal is constrained to a moment class: a set of probability measures defined by generalized moment constraints. This relaxation is particularly interesting for control problems as it enables the coordination of agents without the need to know the desired distribution beforehand. The inclusion of an entropic regularizer is motivated by both computational considerations, and also to impose hard constraints on agent behavior. A computational approach inspired by the Sinkhorn algorithm is proposed to solve this problem. This new approach to distributed control is illustrated with an application of charging a fleet of electric vehicles while satisfying grid constraints. An online version is proposed and applied in a case study on the ElaadNL dataset containing 10,000 EV charging transactions in the Netherlands. This empirical validation demonstrates the effectiveness of the proposed approach to optimizing flexibility while respecting grid constraints.
♻ ☆ IID Prophet Inequality with Random Horizon: Going Beyond Increasing Hazard Rates
Prophet inequalities are a central object of study in optimal stopping theory. In the iid model, a gambler sees values in an online fashion, sampled independently from a given distribution. Upon observing each value, the gambler either accepts it as a reward or irrevocably rejects it and proceeds to observe the next value. The goal of the gambler, who cannot see the future, is maximising the expected value of the reward while competing against the expectation of a prophet (the offline maximum). In other words, one seeks to maximise the gambler-to-prophet ratio of the expectations. This model has been studied with infinite, finite and unknown number of values. When the gambler faces a random number of values, the model is said to have random horizon. We consider the model in which the gambler is given a priori knowledge of the horizon's distribution. Alijani et al. (2020) designed a single-threshold algorithm achieving a ratio of $1/2$ when the random horizon has an increasing hazard rate and is independent of the values. We prove that with a single threshold, a ratio of $1/2$ is actually achievable for several larger classes of horizon distributions, with the largest being known as the $\mathcal{G}$ class in reliability theory. Moreover, we show that this does not extend to its dual, the $\overline{\mathcal{G}}$ class (which includes the decreasing hazard rate class), while it can be extended to low-variance horizons. Finally, we construct the first example of a family of horizons, for which multiple thresholds are necessary to achieve a nonzero ratio. We establish that the Secretary Problem optimal stopping rule provides one such algorithm, paving the way towards the study of the model beyond single-threshold algorithms.
comment: 47 pages, 1 figure
♻ ☆ Solving a class of stochastic optimal control problems by physics-informed neural networks
The aim of this work is to develop a deep learning method for solving high-dimensional stochastic control problems based on the Hamilton--Jacobi--Bellman (HJB) equation and physics-informed learning. Our approach is to parameterize the feedback control and the value function using a decoupled neural network with multiple outputs. We train this network by using a loss function with penalty terms that enforce the HJB equation along the sampled trajectories generated by the controlled system. More significantly, numerical results on various applications are carried out to demonstrate that the proposed approach is efficient and applicable.
comment: 8 pages
♻ ☆ Computing and Learning Stationary Mean Field Equilibria with Scalar Interactions: Algorithms and Applications
Mean field equilibrium (MFE) has emerged as a computationally tractable solution concept for large dynamic games. However, computing MFE remains challenging due to nonlinearities and the absence of contraction properties, limiting its reliability for counterfactual analysis and comparative statics. This paper focuses on MFE in dynamic models where agents interact through a scalar function of the population distribution, referred to as the scalar interaction function. Such models naturally arise in a wide range of applications involving market dynamics and strategic competition. The main contribution of this paper is to introduce iterative algorithms that leverage the scalar interaction structure and are guaranteed to converge to the MFE under mild assumptions. Leveraging this structure, we also establish an MFE existence result for non-compact state spaces and analytical comparative statics. To the best of our knowledge, these are the first algorithms with global convergence guarantees in such settings. Unlike existing approaches, our algorithms do not rely on monotonicity or contraction properties, significantly broadening their applicability. Furthermore, we provide a model-free algorithm that learns the MFE via simulation and reinforcement learning techniques such as Q-learning and policy gradient methods without requiring prior knowledge of payoff or transition functions. We apply our algorithms to classic models of dynamic competition, such as capacity competition, and to competitive models motivated by online marketplaces, including ridesharing and inventory competition, as well as to social learning models. We show how key market parameters influence equilibrium outcomes through reliable comparative statics in these representative models, providing insights into the design of competitive systems.
♻ ☆ A multiscale Consensus-Based algorithm for multi-level optimization
A novel multiscale consensus-based optimization (CBO) algorithm for solving bi- and tri-level optimization problems is introduced. Existing CBO techniques are generalized by the proposed method through the employment of multiple interacting populations of particles, each of which is used to optimize one level of the problem. These particle populations are evolved through multiscale-in-time dynamics, which are formulated as a singularly perturbed system of stochastic differential equations. Theoretical convergence analysis for the multiscale CBO model to an averaged effective dynamics as the time-scale separation parameter approaches zero is provided. The resulting algorithm is presented for both bi-level and tri-level optimization problems. The effectiveness of the approach in tackling complex multi-level optimization tasks is demonstrated through numerical experiments on various benchmark functions. Additionally, it is shown that the proposed method performs well on min-max optimization problems, comparing favorably with existing CBO algorithms for saddle point problems.
comment: Published in: Mathematical Models and Methods in Applied Sciences
♻ ☆ Worst-Case Services and State-Based Scheduling
In this paper, we shed new light on a classical scheduling problem: given a slot-timed, constant-capacity server, what short-run scheduling decisions must be made to provide long-run service guarantees to competing flows of unit-sized tasks? We model each flow's long-run guarantee as a worst-case service that maps each queued arrival vector recording the flow's cumulative task arrivals, including those initially queued, to a worst-case acceptable departure vector lower-bounding its cumulative served tasks. We show that these maps are states that can be updated as tasks arrive and are served, introduce state-based scheduling, find the schedulability condition necessary and sufficient to maintain all flows' long-run guarantees, and use this condition to identify all short-run scheduling decisions that preserve schedulability. Our framework is general but computationally complex. To reduce complexity, we consider three specializations. First, we show that when satisfactory short-run scheduling decisions exist, at least one can be efficiently identified by maximizing the server's capacity slack, a generalization of the earliest-deadline-first rule. Second, we show that a special class of worst-case services, min-plus services, can be efficiently specified and updated using properties of the min-plus algebra. Finally, we show that efficiency can be further improved by restricting attention to a min-plus service subclass, dual-curve services. This last specialization turns out to be a dynamic extension of service curves that maintains all essential features of our framework while approaching near practical viability.
♻ ☆ A BSDE approach to the asymmetric risk-sensitive optimization and its applications
This paper is devoted to proposing a new asymmetric risk-sensitive criterion involving different risk attitudes toward varying risk sources. The criterion can only be defined through the initial value of the minimal solutions of quadratic backward stochastic differential equations (BSDEs). Before uncovering the mean-variance representation for the introduced criterion by the variational approach, some axioms are given for the first time to characterize a variance decomposition of square integrable random variables. The stochastic control problems under this criterion are described as a kind of stochastic recursive control problems that includes controlled quadratic BSDEs. An asymmetric risk-sensitive global stochastic maximum principle is derived when the quadratic BSDEs are equipped with bounded data. A closed-form solution of a stochastic linear-quadratic risk-sensitive control problem is obtained by introducing a novel completion-of-squares technique for controlled quadratic BSDEs. In addition, a dynamic portfolio optimization problem featuring a stochastic return rate is provided as an application of the asymmetric risk-sensitive control.
comment: 28 pages; All the authors contributed equally to this work
♻ ☆ Integral Form of Legendre-Gauss-Lobatto Collocation for Optimal Control
A new method is described for solving optimal control problems using direct collocation at Legendre-Gauss-Lobatto points. The approach of this paper employs a polynomial approximation of the right-hand side vector field of the differential equations and leads to the following important outcomes. First, the first-order optimality conditions of the LGL integral form are derived, which lead to a full-rank transformed adjoint system and novel costate estimate. Next, a derivative-like form of the LGL collocation method is obtained by multiplying the system by the inverse of an appropriate full-rank block of the integration matrix. The first-order optimality conditions of the LGL derivative-like form are then derived, leading to an equivalent full-rank transformed adjoint system and secondary novel costate estimate which is related to the costate estimate of the integral form via a linear transformation. Then, it is shown that a second integral form can be constructed by including an additional noncollocated support point, but such a point is superfluous and has no impact on the solution to the nonlinear programming problem. Finally, the method is demonstrated on two benchmark problems: a one-dimensional initial value optimal control problem with an analytic solution and a time-variant orbit raising optimal control problem.
comment: 28 pages, 6 figures, 1 table
♻ ☆ Cooperation and the Design of Public Goods
We consider the cooperative elements that arise in the design of public goods, such as transportation policies and infrastructure. These involve a variety of stakeholders: governments, businesses, advocates, and users. Their eventual deployment depends on the decision maker's ability to garner sufficient support from each of these groups; we formalize these strategic requirements from the perspective of cooperative game theory. Specifically, we introduce non-transferable utility, linear production (NTU LP) games, which combine the game-theoretic tensions inherent in public decision-making with the modeling flexibility of linear programming. We derive structural properties regarding the non-emptiness, representability and complexity of the core, a solution concept that models the viability of cooperation. In particular, we provide fairly general sufficient conditions under which the core of an NTU LP game is guaranteed to be non-empty, prove that determining membership in the core is co-NP-complete, and develop a cutting plane algorithm to optimize various social welfare objectives subject to core membership. Lastly, we apply these results in a data-driven case study on service plan optimization for the Chicago bus system. As our study illustrates, cooperation is necessary for the successful deployment of transportation service plans and similar public goods, but it may also have adverse or counterintuitive distributive implications.
comment: 26th ACM Conference on Economics and Computation (EC '25)
Systems and Control 27
☆ Power Handling Improvement in Cross-Sectional Lame Mode Resonators Operating in the Ku-band
This study presents power handling improvements in cross-sectional Lame-Mode Resonators (CLMRs) designed for operation in the Ku-band. Previously fabricated CLMR devices failed at approximately 8 dBm of input power, primarily due to electromigration in the aluminum interdigitated electrodes (IDTs). To better understand this mechanism in CLMRs, a data driven thermal model is developed to analyze localized heating effects within the resonator body, which are known to accelerate electromigration. Based on insights from this model, Aluminum Silicon Copper (AlSiCu) was selected for the IDTs due to its superior thermal stability and resistance to electromigration. Devices fabricated with AlSiCu exhibited no signs of performance degradation, with the best-performing resonator achieving a mechanical quality factor (Qm) of 360, a maximum Bode quality factor (QBode) of 500, and an electromechanical coupling coefficient (kt2) of 6.3%. Moreover, the use of AlSiCu significantly increased the maximum input power the device can withstand, showing an improvement of up to 6 dBm over previous devices. These improvements in power handling make the devices strong candidates for high-power Ku-band filtering applications.
comment: 5 pages, 3 figures, IFCS2025 Queretaro
☆ DRIVE Through the Unpredictability:From a Protocol Investigating Slip to a Metric Estimating Command Uncertainty
Off-road autonomous navigation is a challenging task as it is mainly dependent on the accuracy of the motion model. Motion model performances are limited by their ability to predict the interaction between the terrain and the UGV, which an onboard sensor can not directly measure. In this work, we propose using the DRIVE protocol to standardize the collection of data for system identification and characterization of the slip state space. We validated this protocol by acquiring a dataset with two platforms (from 75 kg to 470 kg) on six terrains (i.e., asphalt, grass, gravel, ice, mud, sand) for a total of 4.9 hours and 14.7 km. Using this data, we evaluate the DRIVE protocol's ability to explore the velocity command space and identify the reachable velocities for terrain-robot interactions. We investigated the transfer function between the command velocity space and the resulting steady-state slip for an SSMR. An unpredictability metric is proposed to estimate command uncertainty and help assess risk likelihood and severity in deployment. Finally, we share our lessons learned on running system identification on large UGV to help the community.
comment: This version is the preprint of a journal article with the same title, accepted in the IEEE Transactions on Field Robotics. To have a look at the early access version, use the following link https://ieeexplore.ieee.org/document/11037776
☆ Online Feedback Optimization for Monotone Systems without Timescale Separation
Online Feedback Optimization (OFO) steers a dynamical plant to a cost-efficient steady-state, only relying on input-output sensitivity information, rather than on a full plant model. Unlike traditional feedforward approaches, OFO leverages real-time measurements from the plant, thereby inheriting the robustness and adaptability of feedback control. Unfortunately, existing theoretical guarantees for OFO assumes that the controller operates on a slower timescale than the plant, which can affect responsiveness and transient performance. In this paper, we focus on relaxing this ``timescale separation'' assumption. Specifically, we consider the class of monotone systems, and we prove that OFO can achieve an optimal operating point, regardless of the time constants of controller and plant. By leveraging a small gain theorem for monotone systems, we derive several sufficient conditions for global convergence. Notably, these conditions depend only on the steady-state behavior of the plant, and they are entirely independent of the transient dynamics.
☆ An Optimization-Augmented Control Framework for Single and Coordinated Multi-Arm Robotic Manipulation IROS 2025
Robotic manipulation demands precise control over both contact forces and motion trajectories. While force control is essential for achieving compliant interaction and high-frequency adaptation, it is limited to operations in close proximity to the manipulated object and often fails to maintain stable orientation during extended motion sequences. Conversely, optimization-based motion planning excels in generating collision-free trajectories over the robot's configuration space but struggles with dynamic interactions where contact forces play a crucial role. To address these limitations, we propose a multi-modal control framework that combines force control and optimization-augmented motion planning to tackle complex robotic manipulation tasks in a sequential manner, enabling seamless switching between control modes based on task requirements. Our approach decomposes complex tasks into subtasks, each dynamically assigned to one of three control modes: Pure optimization for global motion planning, pure force control for precise interaction, or hybrid control for tasks requiring simultaneous trajectory tracking and force regulation. This framework is particularly advantageous for bimanual and multi-arm manipulation, where synchronous motion and coordination among arms are essential while considering both the manipulated object and environmental constraints. We demonstrate the versatility of our method through a range of long-horizon manipulation tasks, including single-arm, bimanual, and multi-arm applications, highlighting its ability to handle both free-space motion and contact-rich manipulation with robustness and precision.
comment: 8 pages, 8 figures, accepted for oral presentation at IROS 2025. Supplementary site: https://sites.google.com/view/komo-force/home
☆ BIDA: A Bi-level Interaction Decision-making Algorithm for Autonomous Vehicles in Dynamic Traffic Scenarios
In complex real-world traffic environments, autonomous vehicles (AVs) need to interact with other traffic participants while making real-time and safety-critical decisions accordingly. The unpredictability of human behaviors poses significant challenges, particularly in dynamic scenarios, such as multi-lane highways and unsignalized T-intersections. To address this gap, we design a bi-level interaction decision-making algorithm (BIDA) that integrates interactive Monte Carlo tree search (MCTS) with deep reinforcement learning (DRL), aiming to enhance interaction rationality, efficiency and safety of AVs in dynamic key traffic scenarios. Specifically, we adopt three types of DRL algorithms to construct a reliable value network and policy network, which guide the online deduction process of interactive MCTS by assisting in value update and node selection. Then, a dynamic trajectory planner and a trajectory tracking controller are designed and implemented in CARLA to ensure smooth execution of planned maneuvers. Experimental evaluations demonstrate that our BIDA not only enhances interactive deduction and reduces computational costs, but also outperforms other latest benchmarks, which exhibits superior safety, efficiency and interaction rationality under varying traffic conditions.
comment: 6 pages, 3 figures, 4 tables, accepted for IEEE Intelligent Vehicles (IV) Symposium 2025
☆ eCAV: An Edge-Assisted Evaluation Platform for Connected Autonomous Vehicles
As autonomous vehicles edge closer to widespread adoption, enhancing road safety through collision avoidance and minimization of collateral damage becomes imperative. Vehicle-to-everything (V2X) technologies, which include vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and vehicle-to-cloud (V2C), are being proposed as mechanisms to achieve this safety improvement. Simulation-based testing is crucial for early-stage evaluation of Connected Autonomous Vehicle (CAV) control systems, offering a safer and more cost-effective alternative to real-world tests. However, simulating large 3D environments with many complex single- and multi-vehicle sensors and controllers is computationally intensive. There is currently no evaluation framework that can effectively evaluate realistic scenarios involving large numbers of autonomous vehicles. We propose eCAV -- an efficient, modular, and scalable evaluation platform to facilitate both functional validation of algorithmic approaches to increasing road safety, as well as performance prediction of algorithms of various V2X technologies, including a futuristic Vehicle-to-Edge control plane and correspondingly designed control algorithms. eCAV can model up to 256 vehicles running individual control algorithms without perception enabled, which is $8\times$ more vehicles than what is possible with state-of-the-art alternatives. %faster than state-of-the-art alternatives that can simulate $8\times$ fewer vehicles. With perception enabled, eCAV simulates up to 64 vehicles with a step time under 800ms, which is $4\times$ more and $1.5\times$ faster than the state-of-the-art OpenCDA framework.
☆ Closed-Loop Control of Electrical Stimulation through Spared Motor Unit Ensembles Restores Foot Movements after Spinal Cord Injury
Restoring movement of a paralyzed foot is a key challenge in helping individuals with neurological conditions such as spinal cord injury (SCI) to improve their quality of life. Neuroprostheses based on functional electrical stimulation (FES) can restore the physiological range of motion by stimulating the affected muscles using surface electrodes. We have previously shown that, despite chronic motor-complete SCI, it is possible to capture paralyzed hand movements in individuals with tetraplegia using spared and modulated motor unit (MU) activity decoded with non-invasive electromyography (EMG) sensors. This study investigated whether a wearable high-density surface EMG system could capture and control paralyzed foot kinematics in closed-loop control with an FES system. We found that all our participants with SCI (2 with chronic SCI and 3 with acute SCI) retained distinct spared EMG activity for at least three ankle movements, which allowed them to reliably control a digital cursor using their spared tibialis anterior and triceps surae MU activity. Movement separability was further reconfirmed by extracting task-modulated MU activity during foot flexion/extension (3-7 modulated MUs/participant). Three participants were further able to modulate and maintain their foot flexion/extension EMG levels with an accuracy of >70%. Lastly, we show that real-time control of a FES system using EMG from the affected limb can restore foot movements in a highly intuitive way, significantly improving the lost or pathological foot range of motion. Our system provides an intuitive approach for closed-loop control of FES that has the potential to assist individuals with SCI in regaining lost motor functions.
comment: 26 pages, 7 figures
☆ Emission-Aware Operation of Electrical Energy Storage Systems
Since the beginning of this century, there has been a growing body of research and developments supporting the participation of energy storage systems (ESS) in the emission reduction mandates. However, regardless of these efforts and despite the need for an accelerated energy transition, we have yet to see a practical framework for operational carbon accounting and credit trading for energy storage systems. In this context, this paper proposes an emission performance credits (EPCs) framework that allows ESS, down to the prosumer level, to participate in the carbon market. Thus, a mechanism is proposed, for the first time, to calculate the grid's real-time marginal emission intensity (MEI). The MEI is then used to optimize the cumulative operational emission of ESS through carbon-aware dispatch. Consequently, the framework tracks the operational emissions and converts them into EPCs, which are then sold to regulated entities under compliance programs. Simulation results support the potential of ESS, regardless of their size, to participate in the broader carbon mitigation objectives.
☆ Full-Pose Tracking via Robust Control for Over-Actuated Multirotors
This paper presents a robust cascaded control architecture for over-actuated multirotors. It extends the Incremental Nonlinear Dynamic Inversion (INDI) control combined with structured H_inf control, initially proposed for under-actuated multirotors, to a broader range of multirotor configurations. To achieve precise and robust attitude and position tracking, we employ a weighted least-squares geometric guidance control allocation method, formulated as a quadratic optimization problem, enabling full-pose tracking. The proposed approach effectively addresses key challenges, such as preventing infeasible pose references and enhancing robustness against disturbances, as well as considering multirotor's actual physical limitations. Numerical simulations with an over-actuated hexacopter validate the method's effectiveness, demonstrating its adaptability to diverse mission scenarios and its potential for real-world aerial applications.
☆ State-Space Kolmogorov Arnold Networks for Interpretable Nonlinear System Identification
While accurate, black-box system identification models lack interpretability of the underlying system dynamics. This paper proposes State-Space Kolmogorov-Arnold Networks (SS-KAN) to address this challenge by integrating Kolmogorov-Arnold Networks within a state-space framework. The proposed model is validated on two benchmark systems: the Silverbox and the Wiener-Hammerstein benchmarks. Results show that SS-KAN provides enhanced interpretability due to sparsity-promoting regularization and the direct visualization of its learned univariate functions, which reveal system nonlinearities at the cost of accuracy when compared to state-of-the-art black-box models, highlighting SS-KAN as a promising approach for interpretable nonlinear system identification, balancing accuracy and interpretability of nonlinear system dynamics.
comment: Accepted for IEEE Control Systems Letters
☆ Detailed Small-Signal Stability Analysis of the Cigré High-Voltage Network Penetrated by Grid-Following Inverter-Based Resources
This paper presents a detailed small-signal stability analysis of a modified version of the Cigr\'e European high-voltage network, where one of the synchronous generators is replaced by a grid-following inverter-based resource (IBR). The analysis focuses on the influence of the parameters defining the grid-following IBR control scheme on the stability of the system. Given a set of potential grid configurations and the value of the IBR control parameters, stability is verified by the direct eigenvalue analysis of a high-detailed linearized model of the overall Cigr\'e network. Starting from this procedure, we propose an adaptive sampling method for training a support vector machine classifier able to estimate the probability of stability of the power system over a domain defined by candidate intervals of the considered parameters. The training of the classifier is refined to identify with more accuracy the boundaries of the parameters' stability regions. The obtained results are then compared with those obtained by representing the grid with the classical Th\'evenin equivalent. Results suggest that, when the Th\'evenin equivalent is accurate, the predicted stability region is conservative yet contained within that of the full network.
☆ Intelligent Operation and Maintenance and Prediction Model Optimization for Improving Wind Power Generation Efficiency
This study explores the effectiveness of predictive maintenance models and the optimization of intelligent Operation and Maintenance (O&M) systems in improving wind power generation efficiency. Through qualitative research, structured interviews were conducted with five wind farm engineers and maintenance managers, each with extensive experience in turbine operations. Using thematic analysis, the study revealed that while predictive maintenance models effectively reduce downtime by identifying major faults, they often struggle with detecting smaller, gradual failures. Key challenges identified include false positives, sensor malfunctions, and difficulties in integrating new models with older turbine systems. Advanced technologies such as digital twins, SCADA systems, and condition monitoring have significantly enhanced turbine maintenance practices. However, these technologies still require improvements, particularly in AI refinement and real-time data integration. The findings emphasize the need for continuous development to fully optimize wind turbine performance and support the broader adoption of renewable energy.
comment: 7 pages, 3 figures
☆ Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Hard constraints in reinforcement learning (RL), whether imposed via the reward function or the model architecture, often degrade policy performance. Lagrangian methods offer a way to blend objectives with constraints, but often require intricate reward engineering and parameter tuning. In this work, we extend recent advances that connect Hamilton-Jacobi (HJ) equations with RL to propose two novel value functions for dual-objective satisfaction. Namely, we address: (1) the Reach-Always-Avoid problem - of achieving distinct reward and penalty thresholds - and (2) the Reach-Reach problem - of achieving thresholds of two distinct rewards. In contrast with temporal logic approaches, which typically involve representing an automaton, we derive explicit, tractable Bellman forms in this context by decomposing our problem into reach, avoid, and reach-avoid problems, as to leverage these aforementioned recent advances. From a mathematical perspective, the Reach-Always-Avoid and Reach-Reach problems are complementary and fundamentally different from standard sum-of-rewards problems and temporal logic problems, providing a new perspective on constrained decision-making. We leverage our analysis to propose a variation of Proximal Policy Optimization (DO-HJ-PPO), which solves these problems. Across a range of tasks for safe-arrival and multi-target achievement, we demonstrate that DO-HJ-PPO produces qualitatively distinct behaviors from previous approaches and out-competes a number of baselines in various metrics.
♻ ☆ Dynamics and Optimal Control of State-Triggered Affine Hybrid Systems
A study of the dynamics and control for linear and affine hybrid systems subjected to either temporally- or spatially-triggered resets is presented. Hybrid trajectories are capable of degeneracies not found in continuous-time systems namely beating, blocking, and Zeno. These pathologies are commonly avoided by enforcing a lower bound on the time between events. While this constraint is straightforward to implement for temporally-triggered resets, it is impossible to do so for spatially-triggered systems. In particular, linear spatially-triggered hybrid systems always posses trajectories that are beating and blocking while affine systems may also include Zeno trajectories. The hybrid Pontryagin maximum principle is studied in the context of affine hybrid systems. The existence/uniqueness of the induced co-state jump conditions is studied which introduces the notion of strongly and weakly actuated resets. Finally, optimal control in the context of beating and Zeno is discussed. This work concludes with numerical examples.
comment: 30 pages, 8 figures. Comments welcome!
♻ ☆ Local Differential Privacy for Distributed Stochastic Aggregative Optimization with Guaranteed Optimality
Distributed aggregative optimization underpins many cooperative optimization and multi-agent control systems, where each agent's objective function depends both on its local optimization variable and an aggregate of all agents' optimization variables. Existing distributed aggregative optimization approaches typically require access to accurate gradients of the objective functions, which, however, are often hard to obtain in real-world applications. For example, in machine learning, gradients are commonly contaminated by two main sources of noise: the randomness inherent in sampled data, and the additional variability introduced by mini-batch computations. In addition to the issue of relying on accurate gradients, existing distributed aggregative optimization approaches require agents to share explicit information, which could breach the privacy of participating agents. We propose an algorithm that can solve both problems with existing distributed aggregative optimization approaches: not only can the proposed algorithm guarantee mean-square convergence to an exact optimal solution when the gradients are subject to noise, it also simultaneously ensures rigorous differential privacy, with the cumulative privacy budget guaranteed to be finite even when the number of iterations tends to infinity. To the best of our knowledge, this is the first algorithm able to guarantee both accurate convergence and rigorous differential privacy in distributed aggregative optimization. Besides characterizing the convergence rates under nonconvex/convex/strongly convex conditions, we also rigorously quantify the cost of differential privacy in terms of convergence rates. Experimental results on personalized machine learning using benchmark datasets confirm the efficacy of the proposed algorithm.
comment: 21 pages, 6 figures
♻ ☆ Model Reduction of a Flexible Nonsmooth Oscillator Recovers its Entire Bifurcation Structure
We study the reduced order modeling of a piecewise-linear, globally nonlinear flexible oscillator in which a Bernoulli-Euler beam is subjected to a position-triggered kick force and a piecewise restoring force at its tip. The nonsmooth boundary conditions, which determine different regions of a hybrid phase space, can generally be expected to excite many degrees of freedom. With kick strength as parameter, the system's bifurcation diagram is found to exhibit a range of periodic and chaotic behaviors. Proper orthogonal decomposition (POD) is used to obtain a single set of global basis functions spanning all of the hybrid regions. The reduced order model (ROM) dimension is chosen using previously developed energy closure analysis, ensuring approximate energy balance on the reduced subspace. This yields accurate ROMs with 8 degrees of freedom. Remarkably, we find that ROMs formulated using using data from individual periodic steady states can nevertheless be used to reconstruct the entire bifurcation structure of the original system without updating. This demonstrates that, despite being constructed with steady state data, the ROMs model sufficiently small transients with enough accuracy to permit using simple continuation for the bifurcation diagram. We also find ROM subspaces obtained for different values of the bifurcation parameter are essentially identical. Thus, POD augmented with energy closure analysis is found to reliably yield effective dimension estimates and ROMs for this nonlinear, nonsmooth system that are robust across stability transitions, including even period doubling cascades to chaos, thereby greatly reducing data requirements and computational costs.
comment: 32 pages, 8 figures
♻ ☆ Driving Towards Stability and Efficiency: A Variable Time Gap Strategy for Adaptive Cruise Control
Automated vehicle technologies offer a promising avenue for enhancing traffic efficiency, safety, and energy consumption. Among these, Adaptive Cruise Control (ACC) systems stand out as a prevalent form of automation on today's roads, with their time gap settings holding paramount importance. While decreasing the average time headway tends to enhance traffic capacity, it simultaneously raises concerns regarding safety and string stability. This study introduces a novel variable time gap feedback control policy aimed at striking a balance between maintaining a minimum time gap setting under equilibrium car-following conditions, thereby improving traffic capacity, while ensuring string stability to mitigate disturbances away from the equilibrium flow. Leveraging nonlinear $H_\infty$ control technique, the strategy employs a variable time gap component as the manipulated control signal, complemented by a constant time gap component that predominates during car-following equilibrium. The effectiveness of the proposed scheme is evaluated against its constant time-gap counterpart calibrated using field platoon data from the OpenACC dataset. Through numerical and traffic simulations, our findings illustrate that the proposed algorithm effectively dampens perturbations within vehicle platoons, leading to a more efficient and safer mixed traffic flow.
♻ ☆ Observations on robust diffusive stability and common Lyapunov functions
We consider the problem of robust diffusive stability (RDS) for a pair of coupled stable discrete-time positive linear-time invariant (LTI) systems. We first show that the existence of a common diagonal Lyapunov function is sufficient for RDS and highlight how this condition differs from recent results using linear copositive Lyapunov functions. We also present an extension of these results, showing that the weaker condition of \emph{joint} linear copositive function existence is also sufficient for RDS. Finally, we present two results on RDS for extended Leslie matrices arising in population dynamics.
comment: Introduction reworded and preliminaries section added for clarity. New result on joint Lyapunov functions added in Section 4 as well as some new references
♻ ☆ Discrete nonlinear elastodynamics in a port-Hamiltonian framework
We provide a fully nonlinear port-Hamiltonian formulation for discrete elastodynamical systems as well as a structure-preserving time discretization. The governing equations are obtained in a variational manner and represent index-1 differential algebraic equations. Performing an index reduction one obtains the port-Hamiltonian state space model, which features the nonlinear strains as an independent state next to position and velocity. Moreover, hyperelastic material behavior is captured in terms of a nonlinear stored energy function. The model exhibits passivity and losslessness and has an underlying symmetry yielding the conservation of angular momentum. We perform temporal discretization using the midpoint discrete gradient, such that the beneficial properties are inherited by the developed time stepping scheme in a discrete sense. The numerical results obtained in a representative example are demonstrated to validate the findings.
comment: 9 pages, 7 figures, Submitted to Proceedings in Applied Mathematics and Mechanics 2023
♻ ☆ JammingSnake: A follow-the-leader continuum robot with variable stiffness based on fiber jamming
Follow-the-leader (FTL) motion is essential for continuum robots operating in fragile and confined environments. It allows the robot to exert minimal force on its surroundings, reducing the risk of damage. This paper presents a novel design of a snake-like robot capable of achieving FTL motion by integrating fiber jamming modules (FJMs). The proposed robot can dynamically adjust its stiffness during propagation and interaction with the environment. An algorithm is developed to independently control the tendon and FJM insertion movements, allowing the robot to maintain its shape while minimizing the forces exerted on surrounding structures. To validate the proposed design, comparative tests were conducted between a traditional tendon-driven robot and the novel design under different configurations. The results demonstrate that our design relies significantly less on contact with the surroundings to maintain its shape. This highlights its potential for safer and more effective operations in delicate environments, such as minimally invasive surgery (MIS) or industrial in-situ inspection.
comment: 8 pages, 4 figures, published in T-MECH
♻ ☆ Enhanced Trust Region Sequential Convex Optimization for Multi-Drone Thermal Screening Trajectory Planning in Urban Environments
The rapid detection of abnormal body temperatures in urban populations is essential for managing public health risks, especially during outbreaks of infectious diseases. Multi-drone thermal screening systems offer promising solutions for fast, large-scale, and non-intrusive human temperature monitoring. However, trajectory planning for multiple drones in complex urban environments poses significant challenges, including collision avoidance, coverage efficiency, and constrained flight environments. In this study, we propose an enhanced trust region sequential convex optimization (TR-SCO) algorithm for optimal trajectory planning of multiple drones performing thermal screening tasks. Our improved algorithm integrates a refined convex optimization formulation within a trust region framework, effectively balancing trajectory smoothness, obstacle avoidance, altitude constraints, and maximum screening coverage. Simulation results demonstrate that our approach significantly improves trajectory optimality and computational efficiency compared to conventional convex optimization methods. This research provides critical insights and practical contributions toward deploying efficient multi-drone systems for real-time thermal screening in urban areas. For reader who are interested in our research, we release our source code at https://github.com/Cherry0302/Enhanced-TR-SCO.
♻ ☆ A Robust Optimization Framework for Flexible Industrial Energy Scheduling: Application to a Cement Plant with Market Participation
This paper presents a scenario based robust optimization framework for short term energy scheduling in electricity intensive industrial plants, explicitly addressing uncertainty in planning decisions. The model is formulated as a two-stage Mixed Integer Linear Program (MILP) and integrates a hybrid scenario generation method capable of representing uncertain inputs such as electricity prices, renewable generation, and internal demand. A convex objective function combining expected and worst case operational costs allows for tunable risk aversion, enabling planners to balance economic performance and robustness. The resulting schedule ensures feasibility across all scenarios and supports coordinated use of industrial flexibility assets, including battery energy storage and shiftable production. To isolate the effects of market volatility, the framework is applied to a real world cement manufacturing case study considering only day-ahead electricity price uncertainty, with all other inputs treated deterministically. Results show improved resilience to forecast deviations, reduced cost variability, and more consistent operations. The proposed method offers a scalable and risk-aware approach for industrial flexibility planning under uncertainty.
♻ ☆ Conservative Linear Envelopes for Nonlinear, High-Dimensional, Hamilton-Jacobi Reachability
Hamilton-Jacobi reachability (HJR) provides a value function that encodes the set of states from which a system with bounded control inputs can reach or avoid a target despite any bounded disturbance, and the corresponding robust, optimal control policy. Though powerful, traditional methods for HJR rely on dynamic programming (DP) and suffer from exponential computation growth with respect to state dimension. The recently favored Hopf formula mitigates this ``curse of dimensionality'' by providing an efficient and space-parallelizable approach for solving the reachability problem. However, the Hopf formula can only be applied to linear time-varying systems. To overcome this limitation, we show that the error between a nonlinear system and a linear model can be transformed into an adversarial bounded artificial disturbance. One may then solve the dimension-robust generalized Hopf formula for a linear game with this ``antagonistic error" to perform guaranteed conservative reachability analysis and control synthesis of nonlinear systems; this can be done for problem formulations in which no other HJR method is both computationally feasible and guaranteed. In addition, we offer several technical methods for reducing conservativeness in the analysis. We demonstrate the effectiveness of our results through one illustrative example (the controlled Van der Pol system) that can be compared to standard DP, and one higher-dimensional 15D example (a 5-agent pursuit-evasion game with Dubins cars).
♻ ☆ Multi-agent Multi-armed Bandits with Minimum Reward Guarantee Fairness
We investigate the problem of maximizing social welfare while ensuring fairness in a multi-agent multi-armed bandit (MA-MAB) setting. In this problem, a centralized decision-maker takes actions over time, generating random rewards for various agents. Our goal is to maximize the sum of expected cumulative rewards, a.k.a. social welfare, while ensuring that each agent receives an expected reward that is at least a constant fraction of the maximum possible expected reward. Our proposed algorithm, RewardFairUCB, leverages the Upper Confidence Bound (UCB) technique to achieve sublinear regret bounds for both fairness and social welfare. The fairness regret measures the positive difference between the minimum reward guarantee and the expected reward of a given policy, whereas the social welfare regret measures the difference between the social welfare of the optimal fair policy and that of the given policy. We show that RewardFairUCB algorithm achieves instance-independent social welfare regret guarantees of $\tilde{O}(T^{1/2})$ and a fairness regret upper bound of $\tilde{O}(T^{3/4})$. We also give the lower bound of $\Omega(\sqrt{T})$ for both social welfare and fairness regret. We evaluate RewardFairUCB's performance against various baseline and heuristic algorithms using simulated data and real world data, highlighting trade-offs between fairness and social welfare regrets.
♻ ☆ Koopman-Hopf Hamilton-Jacobi Reachability and Control
The Hopf formula for Hamilton-Jacobi reachability (HJR) analysis has been proposed to solve high-dimensional differential games, producing the set of initial states and corresponding controller required to reach (or avoid) a target despite bounded disturbances. As a space-parallelizable method, the Hopf formula avoids the curse of dimensionality that afflicts standard dynamic-programming HJR, but is restricted to linear time-varying systems. To compute reachable sets for high-dimensional nonlinear systems, we pair the Hopf solution with Koopman theory for global linearization. By first lifting a nonlinear system to a linear space and then solving the Hopf formula, approximate reachable sets can be efficiently computed that are much more accurate than local linearizations. Furthermore, we construct a Koopman-Hopf disturbance-rejecting controller, and test its ability to drive a 10-dimensional nonlinear glycolysis model. We find that it significantly out-competes expectation-minimizing and game-theoretic model predictive controllers with the same Koopman linearization in the presence of bounded stochastic disturbance. In summary, we demonstrate a dimension-robust method to approximately solve HJR, allowing novel application to analyze and control high-dimensional, nonlinear systems with disturbance. An open-source toolbox in Julia is introduced for both Hopf and Koopman-Hopf reachability and control.
♻ ☆ State-Augmented Linear Games with Antagonistic Error for High-Dimensional, Nonlinear Hamilton-Jacobi Reachability
Hamilton-Jacobi Reachability (HJR) is a popular method for analyzing the liveness and safety of a dynamical system with bounded control and disturbance. The corresponding HJ value function offers a robust controller and characterizes the reachable sets, but is traditionally solved with Dynamic Programming (DP) and limited to systems of dimension less than six. Recently, the space-parallelizeable, generalized Hopf formula has been shown to also solve the HJ value with a nearly three-log increase in dimension limit, but is limited to linear systems. To extend this potential, we demonstrate how state-augmented (SA) spaces, which are well-known for their improved linearization accuracy, may be used to solve tighter, conservative approximations of the value function with any linear model in this SA space. Namely, we show that with a representation of the true dynamics in the SA space, a series of inequalities confirms that the value of a SA linear game with antagonistic error is a conservative envelope of the true value function. It follows that if the optimal controller for the HJ SA linear game with error may succeed, it will also succeed in the true system. Unlike previous methods, this result offers the ability to safely approximate reachable sets and their corresponding controllers with the Hopf formula in a non-convex manner. Finally, we demonstrate this in the slow manifold system for clarity, and in the controlled Van der Pol system with different lifting functions.
♻ ☆ Prediction of the Conditional Probability Densities of Time Interval Extrema with Application to Risk-Sensitive Scheduling
Planning and scheduling activities in the electrical power system, such as the commitment of reserve generation, often involve the statistical characterization of peak demand. Due to the stationarity assumption of classical extreme value analysis (EVA), existing approaches in the industry apply EVA on simulated annual peaks created by weather-dependent surrogate models using Monte-Carlo simulations on a per-scenario basis. In day-ahead scheduling, the daily peak demand changes upon various factors besides temperature, Monte-Carlo experiments become intractable, and state-of-the-art generalized additive model for location, scale and shape (GAMLSS)-based nonstationary EVA is often impractical due to convergence issues on high-dimensional covariates. This article explores uncharted territories and proposes a novel nonstationary EVA estimator that predicts the probable peaks of high-resolution time intervals and their corresponding conditional probability densities based on calendar information and weather conditions where historical peaks are observed. Compared to GAMLSS, our method automatically discovers and robustly models complex relationships between the covariate and the peak demand density. We present a case study on the determination of day-ahead scheduling capacity and demonstrate that compared to the industry approach, our approach results in a 38% reduction in the yearly total committed capacity while maintaining the given risk requirement.
comment: Resolved incorrect upload for figure 13b
Optimization and Control 28
☆ Online Feedback Optimization for Monotone Systems without Timescale Separation
Online Feedback Optimization (OFO) steers a dynamical plant to a cost-efficient steady-state, only relying on input-output sensitivity information, rather than on a full plant model. Unlike traditional feedforward approaches, OFO leverages real-time measurements from the plant, thereby inheriting the robustness and adaptability of feedback control. Unfortunately, existing theoretical guarantees for OFO assumes that the controller operates on a slower timescale than the plant, which can affect responsiveness and transient performance. In this paper, we focus on relaxing this ``timescale separation'' assumption. Specifically, we consider the class of monotone systems, and we prove that OFO can achieve an optimal operating point, regardless of the time constants of controller and plant. By leveraging a small gain theorem for monotone systems, we derive several sufficient conditions for global convergence. Notably, these conditions depend only on the steady-state behavior of the plant, and they are entirely independent of the transient dynamics.
☆ A Dynamic Strategic Plan for Transition to Campus-Scale Clean Electricity Using Multi-Stage Stochastic Programming
The transition to clean energy systems at large-scale campuses is a critical step toward achieving global decarbonization goals. However, this transition poses significant challenges, including substantial capital requirements, technological uncertainties, and the operational complexities of integrating renewable energy technologies. This study presents a dynamic strategic planning framework for campus-scale clean electricity transitions, utilizing a multi-stage stochastic programming model that jointly optimizes technology deployment, storage operations, and grid interactions. The model enables adaptive decision-making by incorporating uncertainties in technological cost trajectories and efficiency improvements, providing a tool for large-scale electricity consumers aiming to achieve clean and self-sufficient electricity systems. The framework is applied to a case study of Middle East Technical University, which has committed to achieving carbon-neutral electricity by 2040. By integrating solar photovoltaic, wind, and lithium-ion battery technologies, the model generates annual investment and operational strategies aligned with high-resolution temporal demand and generation profiles. The results demonstrate that uncertainty-aware, temporally detailed planning can significantly enhance the economic viability and operational feasibility of campus-scale clean energy transitions.
comment: 40 pages, 21 figures
☆ On the Convergence Rates of Iterative Regularization Algorithms for Composite Bi-Level Optimization
This paper investigates iterative methods for solving bi-level optimization problems where both inner and outer functions have a composite structure. We provide novel theoretical results, including the first convergence rate analysis for the Iteratively REgularized Proximal Gradient (IRE-PG) method, a variant of Solodov's algorithm. Our results establish simultaneous convergence rates for the inner and outer functions, highlighting the inherent trade-offs between their respective convergence rates. We further extend this analysis to an accelerated version of IRE-PG, proving faster convergence rates under specific settings. Additionally, we propose a new scheme for handling cases where these methods cannot be directly applied to the bi-level problem due to the difficulty of computing the associated proximal operator. This scheme offers surrogate functions to approximate the original problem and a framework to translate convergence rates between the surrogate and original functions. Our results show that the accelerated method's advantage diminishes under this translation.
☆ Output Regulation for Impedance Passive Systems with Input Saturation
We consider output tracking and disturbance rejection for abstract infinite-dimensional systems with input saturation. We solve the control problem using a control law which consists of a stabilising error feedback term and a feedforward term based on the reference and disturbance signals. We use our results to design control laws for output tracking and disturbance rejection for a two-dimensional heat equation and a one-dimensional wave equation with boundary control and observation.
comment: 25 pages, 3 figures. Submitted
☆ Train Unit Scheduling Optimization Considering Unit Ordering
The train unit scheduling problem (TUSP) is an important part of the scheduling process for passenger railway operators. Currently, scholars in various countries have proposed a variety of optimization models based on specific local railway situations and scheduling needs. This research investigates the train unit scheduling problem in the UK. We propose an Enhanced Train Unit Scheduling Problem with Unit Ordering based on existing integer multicommodity flow models. We innovatively introduce unit ordering variables representing the order in which train units are coupled for serving the same trip as well as train direction parameters so that our model can provide unit order information and avoid unit blockage in stations. We present experimental results based on three different sizes of artificial data, as well as real-world data based on the Trans Pennine Express' Anglo-Scottish route. The experimental results showed that our model is able to provide the ordering information corresponding to each train unit and prevents the blockage in the station.
☆ Making Non-Negative Polynomials into Sums of Squares
We investigate linear operators $A:\mathbb{R}[x_1,\dots,x_n]\to\mathbb{R}[x_1,\dots,x_n]$. We give explicit operators $A$ such that, for fixed $d\in\mathbb{N}_0$ and closed $K\subseteq\mathbb{R}^n$, $e^A\mathrm{Pos}(K)_{\leq 2d}\subseteq\sum\mathbb{R}[x_1,\dots,x_n]_{\leq d}^2$. We give an explicit operator $A$ such that $e^A\mathrm{Pos}(\mathbb{R}^n)\subseteq\sum\mathbb{R}[x_1,\dots,x_n]^2$. For $K\subseteq\mathbb{R}^n$, we give a condition such that $A$ exists with $e^A\mathrm{Pos}(K)\subseteq\sum\mathbb{R}[x_1,\dots,x_n]^2$. In the framework of regular Fr\'echet Lie groups and Lie algebras we investigate the linear operators $A$ such that $e^{tA}:\mathbb{R}[x_1,\dots,x_n]\to\mathbb{R}[x_1,\dots,x_n]$ is well-defined for all $t\in\mathbb{R}$. We give a three-line-proof of Stochel's Theorem.
☆ Optimal Online Bookmaking for Any Number of Outcomes
We study the Online Bookmaking problem, where a bookmaker dynamically updates betting odds on the possible outcomes of an event. In each betting round, the bookmaker can adjust the odds based on the cumulative betting behavior of gamblers, aiming to maximize profit while mitigating potential loss. We show that for any event and any number of betting rounds, in a worst-case setting over all possible gamblers and outcome realizations, the bookmaker's optimal loss is the largest root of a simple polynomial. Our solution shows that bookmakers can be as fair as desired while avoiding financial risk, and the explicit characterization reveals an intriguing relation between the bookmaker's regret and Hermite polynomials. We develop an efficient algorithm that computes the optimal bookmaking strategy: when facing an optimal gambler, the algorithm achieves the optimal loss, and in rounds where the gambler is suboptimal, it reduces the achieved loss to the optimal opportunistic loss, a notion that is related to subgame perfect Nash equilibrium. The key technical contribution to achieve these results is an explicit characterization of the Bellman-Pareto frontier, which unifies the dynamic programming updates for Bellman's value function with the multi-criteria optimization framework of the Pareto frontier in the context of vector repeated games.
comment: Accepted for presentation at the Conference on Learning Theory (COLT) 2025
☆ Solving Zero-Sum Convex Markov Games ICML 2025
We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon. We follow a two-step approach. First, leveraging properties of hidden-convex--hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak-Lojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address the general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.
comment: To appear in the Proceedings of the 2025 International Conference on Machine Learning (ICML 2025)
☆ Infinite horizon discounted LQ optimal control problems for mean-field switching diffusions
This paper investigates an infinite horizon discounted linear-quadratic (LQ) optimal control problem for stochastic differential equations (SDEs) incorporating regime switching and mean-field interactions. The regime switching is modeled by a finite-state Markov chain acting as common noise, while the mean-field interactions are characterized by the conditional expectation of the state process given the history of the Markov chain. To address system stability in the infinite horizon setting, a discounted factor is introduced. Within this framework, the well-posedness of the state equation and adjoint equation -- formulated as infinite horizon mean-field forward and backward SDEs with Markov chains, respectively -- is established, along with the asymptotic behavior of their solutions as time approaches infinity. A candidate optimal feedback control law is formally derived based on two algebraic Riccati equations (AREs), which are introduced for the first time in this context. The solvability of these AREs is proven through an approximation scheme involving a sequence of Lyapunov equations, and the optimality of the proposed feedback control law is rigorously verified using the completion of squares method. Finally, numerical experiments are conducted to validate the theoretical findings, including solutions to the AREs, the optimal control process, and the corresponding optimal (conditional) state trajectory. This work provides a comprehensive framework for solving infinite horizon discounted LQ optimal control problems in the presence of regime switching and mean-field interactions, offering both theoretical insights and practical computational tools.
☆ A Scalable Factorization Approach for High-Order Structured Tensor Recovery
Tensor decompositions, which represent an $N$-order tensor using approximately $N$ factors of much smaller dimensions, can significantly reduce the number of parameters. This is particularly beneficial for high-order tensors, as the number of entries in a tensor grows exponentially with the order. Consequently, they are widely used in signal recovery and data analysis across domains such as signal processing, machine learning, and quantum physics. A computationally and memory-efficient approach to these problems is to optimize directly over the factors using local search algorithms such as gradient descent, a strategy known as the factorization approach in matrix and tensor optimization. However, the resulting optimization problems are highly nonconvex due to the multiplicative interactions between factors, posing significant challenges for convergence analysis and recovery guarantees. In this paper, we present a unified framework for the factorization approach to solving various tensor decomposition problems. Specifically, by leveraging the canonical form of tensor decompositions--where most factors are constrained to be orthonormal to mitigate scaling ambiguity--we apply Riemannian gradient descent (RGD) to optimize these orthonormal factors on the Stiefel manifold. Under a mild condition on the loss function, we establish a Riemannian regularity condition for the factorized objective and prove that RGD converges to the ground-truth tensor at a linear rate when properly initialized. Notably, both the initialization requirement and the convergence rate scale polynomially rather than exponentially with $N$, improving upon existing results for Tucker and tensor-train format tensors.
☆ Tikhonov regularized second-order dynamical systems with Hessian-driven damping for solving convex optimization problems
This paper deals with a Tikhonov regularized second-order dynamical system that incorporates time scaling, asymptotically vanishing damping and Hessian-driven damping for solving convex optimization problems. Under appropriate setting of the parameters, we first obtain fast convergence results of the function value along the trajectory generated by the dynamical system. Then, we show that the trajectory generated by the dynamical system converges weakly to a minimizer of the convex optimization problem. We also demonstrate that, by properly tuning these parameters, both the fast convergence rate of the function value and the strong convergence of the trajectory towards the minimum norm solution of the convex optimization problem can be achieved simultaneously. Finally, we present numerical experiments to illustrate the obtained results.
♻ ☆ Dynamics and Optimal Control of State-Triggered Affine Hybrid Systems
A study of the dynamics and control for linear and affine hybrid systems subjected to either temporally- or spatially-triggered resets is presented. Hybrid trajectories are capable of degeneracies not found in continuous-time systems namely beating, blocking, and Zeno. These pathologies are commonly avoided by enforcing a lower bound on the time between events. While this constraint is straightforward to implement for temporally-triggered resets, it is impossible to do so for spatially-triggered systems. In particular, linear spatially-triggered hybrid systems always posses trajectories that are beating and blocking while affine systems may also include Zeno trajectories. The hybrid Pontryagin maximum principle is studied in the context of affine hybrid systems. The existence/uniqueness of the induced co-state jump conditions is studied which introduces the notion of strongly and weakly actuated resets. Finally, optimal control in the context of beating and Zeno is discussed. This work concludes with numerical examples.
comment: 30 pages, 8 figures. Comments welcome!
♻ ☆ Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts ICLR 2024
Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
comment: ICLR 2024 Spotlight
♻ ☆ Harmonizing Safety and Speed: A Human-Algorithm Approach to Enhance the FDA's Medical Device Clearance Policy
The United States Food and Drug Administration's (FDA's) Premarket Notification 510(k) pathway allows manufacturers to gain approval for a medical device by demonstrating its substantial equivalence to another legally marketed device. However, the inherent ambiguity of this regulatory procedure has led to high recall rates for many devices cleared through this pathway. This trend has raised significant concerns regarding the efficacy of the FDA's current approach, prompting a reassessment of the 510(k) regulatory framework. In this paper, we develop a combined human-algorithm approach to assist the FDA in improving its 510(k) medical device clearance process by reducing the risk of recalls and the workload imposed on the FDA. We first develop machine learning methods to estimate the risk of recall of 510(k) medical devices based on the information available at submission time. We then propose a data-driven clearance policy that recommends acceptance, rejection, or deferral to FDA's committees for in-depth evaluation. We conduct an empirical study using a unique large-scale dataset of over 31,000 medical devices that we assembled based on data sources from the FDA and Centers for Medicare and Medicaid Service (CMS). A conservative evaluation of our proposed policy based on this data shows a 32.9% improvement in the recall rate and a 40.5% reduction in the FDA's workload. Our analyses also indicate that implementing our policy could result in significant annual cost savings of $1.7 billion, which highlights the value of using a holistic and data-driven approach to improve the FDA's current 510(k) medical device evaluation pathway.
♻ ☆ Kantorovich duality for optimal transport on completely regular Hausdorff spaces
We introduce a new intermediate optimization problem situated between Kantorovich's primal and dual formulations. This new problem extends Kantorovich's duality to separable Baire measures, which are strictly more general than tight (or Radon) measures in completely regular Hausdorff spaces. In the special case where the measures are Radon, our intermediate problem aligns with the classical Kantorovich's primal problem. Existence of solutions for all three formulations are also provided within this comprehensive framework.
♻ ☆ On the Robustness of Decision-Focused Learning
Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict missing parameters of an incomplete optimization problem, where the missing parameters are predicted. DFL trains an ML model in an end-to-end system, by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives. DFL has shown a lot of promise and holds the capacity to revolutionize decision-making in many real-world applications. However, very little is known about the performance of these models under adversarial attacks. We adopt ten unique DFL methods and benchmark their performance under two distinctly focused attacks adapted towards the Predict-then-Optimize problem setting. Our study proposes the hypothesis that the robustness of a model is highly correlated with its ability to find predictions that lead to optimal decisions without deviating from the ground-truth label. Furthermore, we provide insight into how to target the models that violate this condition and show how these models respond differently depending on the achieved optimality at the end of their training cycles.
comment: 17 pages, 45 figures
♻ ☆ Optimal alignment of Lorentz orientation and generalization to matrix Lie groups
There exist elegant methods of aligning point clouds in $\mathbb R^3$. Unfortunately, these methods rely on the positive definite property of the Euclidean metric, and do not easily extend to the indefinite Minkowski metric. In this paper, we propose two solutions to the following problem: given inertial reference frames $A$ and $B$, and given (possibly noisy) measurements of a set of 4-vectors $\{v_i\}$ made in those reference frames with components $\{v_{A,i}\}$ and $\{v_{B,i}\}$, find the optimal Lorentz transformation $\Lambda$ such that $\Lambda v_{A,i}=v_{B,i}$. The method we outline is conceptually simple and easily extends to alignment problems in other matrix Lie groups.
comment: 8 pages, 2 figures
♻ ☆ Optimality conditions and subdifferential calculus for infinite sums of functions
The paper extends the widely used in optimisation theory decoupling techniques to infinite collections of functions. Extended concepts of uniform lower semicontinuity and firm uniform lower semicontinuity are discussed. The main theorems give fuzzy subdifferential necessary conditions (multiplier rules) for a local minimum of the sum of an infinite collection of functions and fuzzy subdifferential sum rules without the traditional Lipschitz continuity assumptions. More subtle "quasi" versions of the uniform infimum and uniform lower semicontinuity properties are also discussed.
comment: 27 pages The last section has been removed. It will be submitted as a separate paper
♻ ☆ Online Learning of Interaction Dynamics with Dual Model Predictive Control for Multi-Agent Systems Using Gaussian Processes
The control of a single agent in complex and uncertain multi-agent environments requires careful consideration of the interactions between the agents. In this context, this paper proposes a dual model predictive control (MPC) method using Gaussian process (GP) models for multi-agent systems. While Gaussian process MPC (GP-MPC) has been shown to be effective in predicting the dynamics of other agents, current methods do not consider the influence of the control input on the covariance of the predictions, and hence lack the dual control effect. Therefore, we propose a dual MPC that directly optimizes the actions of the ego agent, and the belief of the other agents by jointly optimizing their state trajectories as well as the associated covariance while considering their interactions through a GP. We demonstrate our GP-MPC method in a simulation study on autonomous driving, showing improved prediction quality compared to a baseline stochastic MPC. The results show that GP-MPC can learn the interactions between the agents online, demonstrating the potential of GPs for dual MPC in uncertain and unseen scenarios.
comment: 6 pages, 2 figures. Accepted for American Control Conference 2025
♻ ☆ A Deterministic and Linear Model of Dynamic Optimization
We introduce a model of infinite horizon linear dynamic optimization and obtain results concerning existence of solution and satisfaction of the competitive condition and transversality condition being unconditionally sufficient for optimality of a trajectory. We also show that under some mild restrictions the optimal trajectory satisfies the Euler condition and a related transversality condition. The optimal trajectory satisfies the functional equation of dynamic programming. Under an additional convexity assumption for the two-period constraint sets, we show that the optimal value function is concave and continuous. Linearity bites when it comes to the definition of optimal decision rules which can no longer be guaranteed to be single-valued. We show that if all the two-period constraint sets are convex, then the optimal decision rule is an upper semi-continuous correspondence. For linear cake-eating problems, we obtain monotonicity results for the optimal value function and a conditional monotonicity result for optimal decision rules. We also introduce the concept of a two-phase linear cake eating problem and obtain a necessary condition that must be satisfied by all solutions of such problems. We show that for a class of linear dynamic optimization problems, known as interlinked linear dynamic optimization problems, a slightly modified version of the functional equation of dynamic programming is satisfied.
comment: 24 pages; JEL Classification Codes: C44, C61; the latest version has undergone substantial editing (not to be confused with "peer review")
♻ ☆ Symmetry breaking for local minimizers of a free discontinuity problem
We study a functional defined on the class of piecewise constant functions, combining a jump penalization, which discourages discontinuities, with a fidelity term that penalizes deviations from a given linear function, called the forcing term. In one dimension, it is not difficult to see that local minimizers form staircases that approximate the forcing term. Here we show that in two dimensions symmetry breaking occurs, leading to the emergence of exotic minimizers whose level sets are not simple stripes with boundaries orthogonal to the gradient of the forcing term. The proof relies on a suitable adaptation of the calibration method for free discontinuity problems; as a side benefit, our version requires less regularity than the classical one.
comment: 30 pages, 2 figures. v2 contains a new version of the calibration method, which requires less regularity
♻ ☆ Continuous-time persuasion by filtering
We frame dynamic persuasion in a partial observation stochastic control Leader-Follower game with an ergodic criterion. The Receiver controls the dynamics of a multidimensional unobserved state process. Information is provided to the Receiver through a device designed by the Sender that generates the observation process. The commitment of the Sender is enforced. We develop this approach in the case where all dynamics are linear and the preferences of the Receiver are linear-quadratic. We prove a verification theorem for the existence and uniqueness of the solution of the HJB equation satisfied by the Receiver's value function. An extension to the case of persuasion of a mean field of interacting Receivers is also provided. We illustrate this approach in two applications: the provision of information to electricity consumers with a smart meter designed by an electricity producer; the information provided by carbon footprint accounting rules to companies engaged in a best-in-class emissions reduction effort. In the first application, we link the benefits of information provision to the mispricing of electricity production. In the latter, we show that information provision is a substitute for best-in-class objective.
comment: 2 figures
♻ ☆ Enhanced Trust Region Sequential Convex Optimization for Multi-Drone Thermal Screening Trajectory Planning in Urban Environments
The rapid detection of abnormal body temperatures in urban populations is essential for managing public health risks, especially during outbreaks of infectious diseases. Multi-drone thermal screening systems offer promising solutions for fast, large-scale, and non-intrusive human temperature monitoring. However, trajectory planning for multiple drones in complex urban environments poses significant challenges, including collision avoidance, coverage efficiency, and constrained flight environments. In this study, we propose an enhanced trust region sequential convex optimization (TR-SCO) algorithm for optimal trajectory planning of multiple drones performing thermal screening tasks. Our improved algorithm integrates a refined convex optimization formulation within a trust region framework, effectively balancing trajectory smoothness, obstacle avoidance, altitude constraints, and maximum screening coverage. Simulation results demonstrate that our approach significantly improves trajectory optimality and computational efficiency compared to conventional convex optimization methods. This research provides critical insights and practical contributions toward deploying efficient multi-drone systems for real-time thermal screening in urban areas. For reader who are interested in our research, we release our source code at https://github.com/Cherry0302/Enhanced-TR-SCO.
♻ ☆ Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching
We consider the problem of asynchronous stochastic optimization, where an optimization algorithm makes updates based on stale stochastic gradients of the objective that are subject to an arbitrary (possibly adversarial) sequence of delays. We present a procedure which, for any given $q \in (0,1]$, transforms any standard stochastic first-order method to an asynchronous method with convergence guarantee depending on the $q$-quantile delay of the sequence. This approach leads to convergence rates of the form $O(\tau_q/qT+\sigma/\sqrt{qT})$ for non-convex and $O(\tau_q^2/(q T)^2+\sigma/\sqrt{qT})$ for convex smooth problems, where $\tau_q$ is the $q$-quantile delay, generalizing and improving on existing results that depend on the average delay. We further show a method that automatically adapts to all quantiles simultaneously, without any prior knowledge of the delays, achieving convergence rates of the form $O(\inf_{q} \tau_q/qT+\sigma/\sqrt{qT})$ for non-convex and $O(\inf_{q} \tau_q^2/(q T)^2+\sigma/\sqrt{qT})$ for convex smooth problems. Our technique is based on asynchronous mini-batching with a careful batch-size selection and filtering of stale gradients.
comment: 22 pages
♻ ☆ Conservative Linear Envelopes for Nonlinear, High-Dimensional, Hamilton-Jacobi Reachability
Hamilton-Jacobi reachability (HJR) provides a value function that encodes the set of states from which a system with bounded control inputs can reach or avoid a target despite any bounded disturbance, and the corresponding robust, optimal control policy. Though powerful, traditional methods for HJR rely on dynamic programming (DP) and suffer from exponential computation growth with respect to state dimension. The recently favored Hopf formula mitigates this ``curse of dimensionality'' by providing an efficient and space-parallelizable approach for solving the reachability problem. However, the Hopf formula can only be applied to linear time-varying systems. To overcome this limitation, we show that the error between a nonlinear system and a linear model can be transformed into an adversarial bounded artificial disturbance. One may then solve the dimension-robust generalized Hopf formula for a linear game with this ``antagonistic error" to perform guaranteed conservative reachability analysis and control synthesis of nonlinear systems; this can be done for problem formulations in which no other HJR method is both computationally feasible and guaranteed. In addition, we offer several technical methods for reducing conservativeness in the analysis. We demonstrate the effectiveness of our results through one illustrative example (the controlled Van der Pol system) that can be compared to standard DP, and one higher-dimensional 15D example (a 5-agent pursuit-evasion game with Dubins cars).
♻ ☆ Koopman-Hopf Hamilton-Jacobi Reachability and Control
The Hopf formula for Hamilton-Jacobi reachability (HJR) analysis has been proposed to solve high-dimensional differential games, producing the set of initial states and corresponding controller required to reach (or avoid) a target despite bounded disturbances. As a space-parallelizable method, the Hopf formula avoids the curse of dimensionality that afflicts standard dynamic-programming HJR, but is restricted to linear time-varying systems. To compute reachable sets for high-dimensional nonlinear systems, we pair the Hopf solution with Koopman theory for global linearization. By first lifting a nonlinear system to a linear space and then solving the Hopf formula, approximate reachable sets can be efficiently computed that are much more accurate than local linearizations. Furthermore, we construct a Koopman-Hopf disturbance-rejecting controller, and test its ability to drive a 10-dimensional nonlinear glycolysis model. We find that it significantly out-competes expectation-minimizing and game-theoretic model predictive controllers with the same Koopman linearization in the presence of bounded stochastic disturbance. In summary, we demonstrate a dimension-robust method to approximately solve HJR, allowing novel application to analyze and control high-dimensional, nonlinear systems with disturbance. An open-source toolbox in Julia is introduced for both Hopf and Koopman-Hopf reachability and control.
♻ ☆ State-Augmented Linear Games with Antagonistic Error for High-Dimensional, Nonlinear Hamilton-Jacobi Reachability
Hamilton-Jacobi Reachability (HJR) is a popular method for analyzing the liveness and safety of a dynamical system with bounded control and disturbance. The corresponding HJ value function offers a robust controller and characterizes the reachable sets, but is traditionally solved with Dynamic Programming (DP) and limited to systems of dimension less than six. Recently, the space-parallelizeable, generalized Hopf formula has been shown to also solve the HJ value with a nearly three-log increase in dimension limit, but is limited to linear systems. To extend this potential, we demonstrate how state-augmented (SA) spaces, which are well-known for their improved linearization accuracy, may be used to solve tighter, conservative approximations of the value function with any linear model in this SA space. Namely, we show that with a representation of the true dynamics in the SA space, a series of inequalities confirms that the value of a SA linear game with antagonistic error is a conservative envelope of the true value function. It follows that if the optimal controller for the HJ SA linear game with error may succeed, it will also succeed in the true system. Unlike previous methods, this result offers the ability to safely approximate reachable sets and their corresponding controllers with the Hopf formula in a non-convex manner. Finally, we demonstrate this in the slow manifold system for clarity, and in the controlled Van der Pol system with different lifting functions.
♻ ☆ Data Envelopment Analysis with Robust and Closest Targets:Integrating Full-Dimensional Efficient Facets for Risk-Resilient Benchmarking
As the external environment become increasingly volatile and unpredictable, the selection of benchmarking targets in data envelopment analysis should account for their ability to consider risks; however, this aspect has not received sufficient attention. We propose a robust benchmarking target defined by the intersection of the maximum number of full-dimensional efficient facets, each representing a unique marginal substitution relationship. These targets can serve as robust projections for decision making units that are lacking prior risk information because they incorporate the maximum number of marginal substitution relationships. This enables decision makers to adjust their production through these relationships, thereby maximizing the likelihood of achieving globally optimal outcomes. Furthermore, we propose a novel, well-defined efficiency measure based on robust and closest targets. Finally, we demonstrate the application of the proposed measure using a dataset comprising 38 universities from China's 985 Project.
Systems and Control 28
☆ Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme
The unmanned aerial vehicles (UAVs) in a disaster-prone environment plays important role in assisting the rescue services and providing the internet connectivity with the outside world. However, in such a complex environment the selection of optimum trajectory of UAVs is of utmost importance. UAV trajectory optimization deals with finding the shortest path in the minimal possible time. In this paper, a cluster optimization scheme (COS) is proposed using the Henry gas optimization (HGO) metaheuristic algorithm to identify the shortest path having minimal transportation cost and algorithm complexity. The mathematical model is designed for COS using the HGO algorithm and compared with the state-of-the-art metaheuristic algorithms such as particle swarm optimization (PSO), grey wolf optimization (GWO), cuckoo search algorithm (CSA) and barnacles mating optimizer (BMO). In order to prove the robustness of the proposed model, four different scenarios are evaluated that includes ambient environment, constrict environment, tangled environment, and complex environment. In all the aforementioned scenarios, the HGO algorithm outperforms the existing algorithms. Particularly, in the ambient environment, the HGO algorithm achieves a 39.3% reduction in transportation cost and a 16.8% reduction in computational time as compared to the PSO algorithm. Hence, the HGO algorithm can be used for autonomous trajectory optimization of UAVs in smart cities.
comment: 12 pages, 9 figuers
☆ Pieceformer: Similarity-Driven Knowledge Transfer via Scalable Graph Transformer in VLSI
Accurate graph similarity is critical for knowledge transfer in VLSI design, enabling the reuse of prior solutions to reduce engineering effort and turnaround time. We propose Pieceformer, a scalable, self-supervised similarity assessment framework, equipped with a hybrid message-passing and graph transformer encoder. To address transformer scalability, we incorporate a linear transformer backbone and introduce a partitioned training pipeline for efficient memory and parallelism management. Evaluations on synthetic and real-world CircuitNet datasets show that Pieceformer reduces mean absolute error (MAE) by 24.9% over the baseline and is the only method to correctly cluster all real-world design groups. We further demonstrate the practical usage of our model through a case study on a partitioning task, achieving up to 89% runtime reduction. These results validate the framework's effectiveness for scalable, unbiased design reuse in modern VLSI systems.
comment: 7 pages, 4 figures, 1 table, submitted
☆ Advancing Autonomous Racing: A Comprehensive Survey of the RoboRacer (F1TENTH) Platform
The RoboRacer (F1TENTH) platform has emerged as a leading testbed for advancing autonomous driving research, offering a scalable, cost-effective, and community-driven environment for experimentation. This paper presents a comprehensive survey of the platform, analyzing its modular hardware and software architecture, diverse research applications, and role in autonomous systems education. We examine critical aspects such as bridging the simulation-to-reality (Sim2Real) gap, integration with simulation environments, and the availability of standardized datasets and benchmarks. Furthermore, the survey highlights advancements in perception, planning, and control algorithms, as well as insights from global competitions and collaborative research efforts. By consolidating these contributions, this study positions RoboRacer as a versatile framework for accelerating innovation and bridging the gap between theoretical research and real-world deployment. The findings underscore the platform's significance in driving forward developments in autonomous racing and robotics.
☆ A Small-Scale Robot for Autonomous Driving: Design, Challenges, and Best Practices
Small-scale autonomous vehicle platforms provide a cost-effective environment for developing and testing advanced driving systems. However, specific configurations within this scale are underrepresented, limiting full awareness of their potential. This paper focuses on a one-sixth-scale setup, offering a high-level overview of its design, hardware and software integration, and typical challenges encountered during development. We discuss methods for addressing mechanical and electronic issues common to this scale and propose guidelines for improving reliability and performance. By sharing these insights, we aim to expand the utility of small-scale vehicles for testing autonomous driving algorithms and to encourage further research in this domain.
☆ Robust control for multi-legged elongate robots in noisy environments
Modern two and four legged robots exhibit impressive mobility on complex terrain, largely attributed to advancement in learning algorithms. However, these systems often rely on high-bandwidth sensing and onboard computation to perceive/respond to terrain uncertainties. Further, current locomotion strategies typically require extensive robot-specific training, limiting their generalizability across platforms. Building on our prior research connecting robot-environment interaction and communication theory, we develop a new paradigm to construct robust and simply controlled multi-legged elongate robots (MERs) capable of operating effectively in cluttered, unstructured environments. In this framework, each leg-ground contact is thought of as a basic active contact (bac), akin to bits in signal transmission. Reliable locomotion can be achieved in open-loop on "noisy" landscapes via sufficient redundancy in bacs. In such situations, robustness is achieved through passive mechanical responses. We term such processes as those displaying mechanical intelligence (MI) and analogize these processes to forward error correction (FEC) in signal transmission. To augment MI, we develop feedback control schemes, which we refer to as computational intelligence (CI) and such processes analogize automatic repeat request (ARQ) in signal transmission. Integration of these analogies between locomotion and communication theory allow analysis, design, and prediction of embodied intelligence control schemes (integrating MI and CI) in MERs, showing effective and reliable performance (approximately half body lengths per cycle) on complex landscapes with terrain "noise" over twice the robot's height. Our work provides a foundation for systematic development of MER control, paving the way for terrain-agnostic, agile, and resilient robotic systems capable of operating in extreme environments.
☆ A Data-Integrated Framework for Learning Fractional-Order Nonlinear Dynamical Systems
This paper presents a data-integrated framework for learning the dynamics of fractional-order nonlinear systems in both discrete-time and continuous-time settings. The proposed framework consists of two main steps. In the first step, input-output experiments are designed to generate the necessary datasets for learning the system dynamics, including the fractional order, the drift vector field, and the control vector field. In the second step, these datasets, along with the memory-dependent property of fractional-order systems, are used to estimate the system's fractional order. The drift and control vector fields are then reconstructed using orthonormal basis functions. To validate the proposed approach, the algorithm is applied to four benchmark fractional-order systems. The results confirm the effectiveness of the proposed framework in learning the system dynamics accurately. Finally, the same datasets are used to learn equivalent integer-order models. The numerical comparisons demonstrate that fractional-order models better capture long-range dependencies, highlighting the limitations of integer-order representations.
☆ On Exact Solutions to the Linear Bellman Equation
This paper presents sufficient conditions for optimal control of systems with dynamics given by a linear operator, in order to obtain an explicit solution to the Bellman equation that can be calculated in a distributed fashion. Further, the class of Linearly Solvable MDP is reformulated as a continuous-state optimal control problem. It is shown that this class naturally satisfies the conditions for explicit solution of the Bellman equation, motivating the extension of previous results to semilinear dynamics to account for input nonlinearities. The applicability of the given conditions is illustrated in scenarios with linear and quadratic cost, corresponding to the Stochastic Shortest Path and Linear-Quadratic Regulator problems.
comment: Preprint to be published in the Control Systems Letters
☆ DATA-DRIVEN PRONTO: a Model-free Solution for Numerical Optimal Control
This article addresses the problem of data-driven numerical optimal control for unknown nonlinear systems. In our scenario, we suppose to have the possibility of performing multiple experiments (or simulations) on the system. Experiments are performed by relying on a data-driven tracking controller able to steer the system towards a desired reference. Our proposed DATA-DRIVEN PRONTO algorithm iteratively refines a tentative solution of the optimal control problem by computing an approximate descent direction via a local trajectory perturbation. At each iteration, multiple trajectories are gathered by perturbing the current trajectory with a suitable dither signal, and then used to obtain a data-driven, time-varying linearization. The exploration is guided by the tracking controller, so that perturbed trajectories are obtained in closed loop. We show local convergence of DATA-DRIVEN PRONTO to a ball about an isolated optimal solution, whose radius depends on the amplitude of the dither signal. We corroborate the theoretical results by applying it to an underactuated robot.
Model Predictive Path-Following Control for a Quadrotor
Automating drone-assisted processes is a complex task. Many solutions rely on trajectory generation and tracking, whereas in contrast, path-following control is a particularly promising approach, offering an intuitive and natural approach to automate tasks for drones and other vehicles. While different solutions to the path-following problem have been proposed, most of them lack the capability to explicitly handle state and input constraints, are formulated in a conservative two-stage approach, or are only applicable to linear systems. To address these challenges, the paper is built upon a Model Predictive Control-based path-following framework and extends its application to the Crazyflie quadrotor, which is investigated in hardware experiments. A cascaded control structure including an underlying attitude controller is included in the Model Predictive Path-Following Control formulation to meet the challenging real-time demands of quadrotor control. The effectiveness of the proposed method is demonstrated through real-world experiments, representing, to the best of the authors' knowledge, a novel application of this MPC-based path-following approach to the quadrotor. Additionally, as an extension to the original method, to allow for deviations of the path in cases where the precise following of the path might be overly restrictive, a corridor path-following approach is presented.
comment: 15 pages, 11 figures, submitted to PAMM 2025
☆ Multi-dimensional evaluation on a rural integrated energy system including solar, wind, biomass and geothermal energy
This study focuses on the novel municipal-scale rural integrated energy system (RIES), which encompasses energy supply and application. By constructing a seven-dimensional evaluation system including energy efficiency, energy supply, low-carbon sustainability, environmental impact, energy economy, social benefits, and integrated energy system development, this research combines the improved analytic hierarchy process (IAHP) and entropy weight method (EWM) by sum of squares of deviations to balance expert experience and data objectivity. Furthermore, the cloud model is introduced to handle the fuzziness and randomness in the evaluation. This method can quantify the differences in system performance before and after the planning implementation. The results indicate that after planning, the comprehensive score has increased from 83.12 to 87.55, the entropy value has decreased from 6.931 to 5.336, indicating enhanced system stability. The hyper-entropy has dropped from 3.08 to 2.278, reflecting a reduction in uncertainty. The research findings provide a scientific basis for the planning optimization, policy-making, and sustainable development of rural integrated energy systems, possessing both theoretical innovation and practical guiding value.
☆ Disruption of parkinsonian brain oscillations
Deep brain stimulation (DBS) is an advanced surgical treatment for the symptoms of Parkinson's disease (PD), involving electrical stimulation of neurons within the basal ganglia region of the brain. DBS is traditionally delivered in an open-loop manner using fixed stimulation parameters, which may lead to suboptimal results. In an effort to overcome these limitations, closed loop DBS, using pathological subthalamic beta (13--30 Hz) activity as a feedback signal, offers the potential to adapt DBS automatically in response to changes in patient symptoms and side effects. However, clinically implemented closed-loop techniques have been limited to date to simple control algorithms, due to the inherent uncertainties in the dynamics involved. Model-free control, which has already seen successful applications in the field of bioengineering, offers a way to avoid this limitation and provides an alternative method to apply modern control approach to selective suppression of pathological oscillations.
comment: 23rd Internat. Conf. Computational Methods in Systems Biology (CMSB 2025), 10-12 september 2025, Lyon, France
☆ Comparison of Innovative Strategies for the Coverage Problem: Path Planning, Search Optimization, and Applications in Underwater Robotics
In many applications, including underwater robotics, the coverage problem requires an autonomous vehicle to systematically explore a defined area while minimizing redundancy and avoiding obstacles. This paper investigates coverage path planning strategies to enhance the efficiency of underwater gliders, particularly in maximizing the probability of detecting a radioactive source while ensuring safe navigation. We evaluate three path-planning approaches: the Traveling Salesman Problem (TSP), Minimum Spanning Tree (MST), and Optimal Control Problem (OCP). Simulations were conducted in MATLAB, comparing processing time, uncovered areas, path length, and traversal time. Results indicate that OCP is preferable when traversal time is constrained, although it incurs significantly higher computational costs. Conversely, MST-based approaches provide faster but less optimal solutions. These findings offer insights into selecting appropriate algorithms based on mission priorities, balancing efficiency and computational feasibility.
☆ Quantum Fisher-Preconditioned Reinforcement Learning: From Single-Qubit Control to Rayleigh-Fading Link Adaptation
In this letter, we propose Quantum-Preconditioned Policy Gradient (QPPG), a natural gradient-based algorithm for link adaptation that whitens policy updates using the full inverse quantum Fisher information with Tikhonov regularization. QPPG bridges classical and quantum geometry, achieving stable learning even under noise. Evaluated on classical and quantum environments, including noisy single-qubit Gym tasks and Rayleigh-fading channels, QPPG converges 4 times faster than REINFORCE and sustains a 1 dB gain under uncertainty. It reaches a 90 percent return in one hundred episodes with high noise robustness, showcasing the advantages of full QFI-based preconditioning for scalable quantum reinforcement learning.
comment: 5 pages, 3 figures, submitted to IEEE Communications Letters
☆ Microgrid Operation Control with Adaptable Droop Gains
Modern low-carbon power systems come with many challenges, such as increased inverter penetration and increased uncertainty from renewable sources and loads. In this context, the microgrid concept is a promising approach, which is based on a segmentation of the grid into independent smaller cells that can run either in grid-connected or standalone mode.In microgrids, droop control is widely used for primary control. It enables proportional power sharing, depending on the droop gains. Operation control schemes considering droop control often assume fixed droop gains. However, using adaptive droop gains for grid-forming units allow to shape power sharing in presence of fluctuations, enhancing flexibility while maintaining a safe microgrid operation, particularly under uncertainty. This work introduces a bilinear formulation for microgrid operation control that finds optimal power setpoints and droop gains on a timescale of minutes by solving a finite horizon optimization problem. In detail, a robust minmax model predictive control scheme is designed for a standalone microgrid, comprising a fuel cell, a photovoltaic system and an energy storage. Closed-loop simulations are performed with and without variable droop gains. The results show an increase in renewable utilization of up to 7.5 % while reducing the power output of the fuel cell by 6 %, when allowing variable droop gains.
☆ Islanding Strategy for Smart Grids Oriented to Resilience Enhancement and Its Power Supply Range Optimization
With the increasing prevalence of distributed generators, islanded operation based on distributed generation is considered a vital means to enhance the reliability and resilience of smart grids. This paper investigates the main factors in islanding partition of smart grids and establishes a mathematical model for islanding division. A method to determine the maximum power supply range of distributed energy resources (DERs) based on the reachability matrix and power circle algorithm is proposed to improve computational efficiency. A dynamic programming method based on breadth-first search (BFS) is used to solve the islanding partition scheme, and a region correction method is applied to modify the maximum power supply area by considering controllable loads and prioritizing critical load restoration, thereby enhancing system resilience. Finally, simulation results verify the effectiveness of the proposed algorithm in improving smart grid resilience.
☆ Human Locomotion Implicit Modeling Based Real-Time Gait Phase Estimation
Gait phase estimation based on inertial measurement unit (IMU) signals facilitates precise adaptation of exoskeletons to individual gait variations. However, challenges remain in achieving high accuracy and robustness, particularly during periods of terrain changes. To address this, we develop a gait phase estimation neural network based on implicit modeling of human locomotion, which combines temporal convolution for feature extraction with transformer layers for multi-channel information fusion. A channel-wise masked reconstruction pre-training strategy is proposed, which first treats gait phase state vectors and IMU signals as joint observations of human locomotion, thus enhancing model generalization. Experimental results demonstrate that the proposed method outperforms existing baseline approaches, achieving a gait phase RMSE of $2.729 \pm 1.071%$ and phase rate MAE of $0.037 \pm 0.016%$ under stable terrain conditions with a look-back window of 2 seconds, and a phase RMSE of $3.215 \pm 1.303%$ and rate MAE of $0.050 \pm 0.023%$ under terrain transitions. Hardware validation on a hip exoskeleton further confirms that the algorithm can reliably identify gait cycles and key events, adapting to various continuous motion scenarios. This research paves the way for more intelligent and adaptive exoskeleton systems, enabling safer and more efficient human-robot interaction across diverse real-world environments.
☆ A Force Feedback Exoskeleton for Teleoperation Using Magnetorheological Clutches
This paper proposes an upper-limb exoskeleton teleoperation system based on magnetorheological (MR) clutches, aiming to improve operational accuracy and enhance the immersive experience during lunar sampling tasks. Conventional exoskeleton teleoperation systems commonly employ active force feedback solutions, such as servo motors, which typically suffer from high system complexity and increased energy consumption. Furthermore, force feedback devices utilizing motors and gear reducers generally compromise backdrivability and pose safety risks to operators due to active force output. To address these limitations, we propose a semi-active force feedback strategy based on MR clutches. Dynamic magnetic field control enables precise adjustment of joint stiffness and damping, thereby providing smooth and high-resolution force feedback. The designed MR clutch exhibits outstanding performance across key metrics, achieving a torque-to-mass ratio (TMR) of 93.6 Nm/kg, a torque-to-volume ratio (TVR) of 4.05 x 10^5 Nm/m^3, and a torque-to-power ratio (TPR) of 4.15 Nm/W. Notably, the TMR represents an improvement of approximately 246% over a representative design in prior work. Experimental results validate the system's capability to deliver high-fidelity force feedback. Overall, the proposed system presents a promising solution for deep-space teleoperation with strong potential for real-world deployment in future missions.
☆ Skew-Induced Insertion Loss Deviation (SILD) and FOM_SILD: Metrics for Quantifying P/N Skew Effects in High-Speed Channels
The rise of AI workloads and growing data center demands have driven the need for ultra-high-speed interconnects exceeding 200 Gb/s. As unit intervals (UI) shrink, even a few picoseconds of P/N skew can degrade serializer-deserializer (SerDes) performance. Traditional methods for quantifying skew fall short in capturing its impact. We introduce two new metrics: 1) Skew-Induced Insertion Loss Deviation (SILD) and 2) its complementary Figure of Merit (FOM_SILD), analytically developed to assess P/N skew effects. Measured S-parameters confirm FOM_SILD reciprocity, while simulations of 224G PAM4 SerDes show strong correlation with bit error rate (BER) trends. This approach offers a robust framework for analyzing skew in next-generation ultra-high-speed interconnects.
☆ Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks IROS 2025
This study presents a novel environment-aware reinforcement learning (RL) framework designed to augment the operational capabilities of autonomous underwater vehicles (AUVs) in underwater environments. Departing from traditional RL architectures, the proposed framework integrates an environment-aware network module that dynamically captures flow field data, effectively embedding this critical environmental information into the state space. This integration facilitates real-time environmental adaptation, significantly enhancing the AUV's situational awareness and decision-making capabilities. Furthermore, the framework incorporates AUV structure characteristics into the optimization process, employing a large language model (LLM)-based iterative refinement mechanism that leverages both environmental conditions and training outcomes to optimize task performance. Comprehensive experimental evaluations demonstrate the framework's superior performance, robustness and adaptability.
comment: This paper has been accepted by IROS 2025
♻ ☆ AUTOBargeSim: MATLAB(R) toolbox for the design and analysis of the guidance and control system for autonomous inland vessels
This paper introduces AUTOBargeSim, a simulation toolbox for autonomous inland vessel guidance and control system design. AUTOBargeSim is developed using MATLAB and provides an easy-to-use introduction to various aspects of autonomous inland navigation, including mapping, modelling, control design, and collision avoidance, through examples and extensively documented code. Applying modular design principles in the simulator structure allows it to be easily modified according to the user's requirements. Furthermore, a GUI interface facilitates a simple and quick execution. Key performance indices for evaluating the performance of the controller and collision avoidance method in confined space are also provided. The current version of AUTOBargeSim attempts to improve reproducibility in the design and simulation of marine systems while serving as a foundation for simulating and evaluating vessel behaviour considering operational, system, and environmental constraints.
comment: This work has been accepted for publication in IFAC CAMS 2025
♻ ☆ Wasserstein-Barycenter Consensus for Cooperative Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning (MARL) demands principled mechanisms to align heterogeneous policies while preserving the capacity for specialized behavior. We introduce a novel consensus framework that defines the team strategy as the entropic-regularized $p$-Wasserstein barycenter of agents' joint state--action visitation measures. By augmenting each agent's policy objective with a soft penalty proportional to its Sinkhorn divergence from this barycenter, the proposed approach encourages coherent group behavior without enforcing rigid parameter sharing. We derive an algorithm that alternates between Sinkhorn-barycenter computation and policy-gradient updates, and we prove that, under standard Lipschitz and compactness assumptions, the maximal pairwise policy discrepancy contracts at a geometric rate. Empirical evaluation on a cooperative navigation case study demonstrates that our OT-barycenter consensus outperforms an independent learners baseline in convergence speed and final coordination success.
♻ ☆ Enforcing contraction via data
We present data-based conditions for enforcing contractivity via feedback control and obtain desired asymptotic properties of the closed-loop system. We focus on unknown nonlinear control systems whose vector fields are expressible via a dictionary of functions and derive data-dependent semidefinite programs whose solution returns the controller that guarantees contractivity. When data are perturbed by disturbances that are linear combinations of sinusoids of known frequencies (but unknown amplitude and phase) and constants, we remarkably obtain conditions for contractivity that do not depend on the magnitude of the disturbances, with imaginable positive consequences for the synthesis of the controller. Finally, we show how to design from data an integral controller for nonlinear systems that achieves constant reference tracking and constant disturbance rejection.
♻ ☆ Modeling, Analysis, and Optimization of Cascaded Power Amplifiers
This paper deals with modeling, analysis, and optimization of power amplifiers (PAs) placed in a cascaded structure, particularly the effect of cascaded nonlinearities is studied by showing potential ways to minimize the total nonlinearities. The nonlinear least-squares algorithm is proposed to optimize the PA parameters along with the input power level, and thereby minimize the total nonlinearities in the cascaded structure. The simulation results demonstrate that the performance of the optimized configurations for up to five PAs using the proposed framework can improve the linearity properties of the overall cascade.
♻ ☆ EnergyDiff: Universal Time-Series Energy Data Generation using Diffusion Models
High-resolution time series data are crucial for the operation and planning of energy systems such as electrical power systems and heating systems. Such data often cannot be shared due to privacy concerns, necessitating the use of synthetic data. However, high-resolution time series data is difficult to model due to its inherent high dimensionality and complex temporal dependencies. Leveraging the recent development of generative AI, especially diffusion models, we propose EnergyDiff, a universal data generation framework for energy time series data. EnergyDiff builds on state-of-the-art denoising diffusion probabilistic models, utilizing a proposed denoising network dedicated to high-resolution time series data and introducing a novel Marginal Calibration technique. Our extensive experimental results demonstrate that EnergyDiff achieves significant improvement in capturing the temporal dependencies and marginal distributions compared to baselines, particularly at the 1-minute resolution. EnergyDiff's universality is validated across diverse energy domains (e.g., electricity demand, heat pump, PV, multiple time resolutions (1 minute, 15 minutes, 30 minutes and 1 hour), and at both customer and transformer levels.
comment: 15 pages
♻ ☆ A Neural Network Approach to a Modified Quadratic Boost Multiport Resonant Converter for Electric Vehicle Chargers
This topology can achieve a high step-up gain by utilizing a switched capacitor and switched inductor-based VMC network arrangement.Furthermore, the proposed topology can achieve an output gain of approximately three times at a nominal duty ratio with reduced voltage and current stress across the switch, and enhance the maximum efficiency to 96.7
comment: Errors in experimental results. Need to improve
♻ ☆ From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care
In this work, we reflect on the data-driven modeling paradigm that is gaining ground in AI-driven automation of patient care. We argue that the repurposing of existing real-world patient datasets for machine learning may not always represent an optimal approach to model development as it could lead to undesirable outcomes in patient care. We reflect on the history of data analysis to explain how the data-driven paradigm rose to popularity, and we envision ways in which systems thinking and clinical domain theory could complement the existing model development approaches in reaching human-centric outcomes. We call for a purpose-driven machine learning paradigm that is grounded in clinical theory and the sociotechnical realities of real-world operational contexts. We argue that understanding the utility of existing patient datasets requires looking in two directions: upstream towards the data generation, and downstream towards the automation objectives. This purpose-driven perspective to AI system development opens up new methodological opportunities and holds promise for AI automation of patient care.
comment: The work is under review at ACM Health
♻ ☆ Integrating Cascade Pumped Micro-Hydro Storage: A Sustainable Approach to Energy and Water Management
As traditional large hydropower has been extensively exploited, micro-hydro systems have caught research increasing interest. New engineering challenges arise in developing micro-hydro systems in areas with significant elevation but prohibitive horizontal distances between primary reservoirs. This study addresses these challenges by proposing a cascade-pumped micro-hydro storage (CPMHS) system that leverages intermediate reservoirs to bridge long horizontal distances, enabling efficient energy transfer and storage. The methodology utilizes naturally occurring lakes with substantial head heights but limited feasibility for direct pumped storage due to horizontal separations. Integrating smaller, strategically placed intermediate reservoirs maximizes energy capture along the cascading path, making pumped storage viable in geographically constrained locations. The proposed system will enhance energy generation potential and provide additional benefits for water management. Using geographical data and a detailed case study focused on Mountain Lake and surrounding lakes, this paper demonstrates the energy efficiency and viability of cascade-based micro-hydro storage. A practical methodology for implementing CPMHS systems is proposed and validated by case studies. An optimization framework is developed for efficient energy capture in regions with challenging topography.
♻ ☆ Joint Observer Gain and Input Design for Asymptotic Active Fault Diagnosis
This paper proposes a joint gain and input design method for observer-based asymptotic active fault diagnosis, which is based on a newly-defined notion named the excluding degree of the origin from a zonotope. Using the excluding degree, a quantitative specification is obtained to characterize the performance of set-based robust fault diagnosis. Furthermore, a single gain design method and a joint gain and input design method are proposed, respectively. This is the first work to achieve a joint observer gain and input design for set-based active fault diagnosis. Compared with the existing methods that design gains and input separately, the proposed joint gain and input design method has advantages to exploit the fault diagnosis potential of observer-based schemes. Finally, several examples are used to illustrate the effectiveness of the proposed methods.
comment: Accepted by Automatica as Regular Paper
Optimization and Control 42
☆ Modeling society with a responsible elite
Within the framework of the ViSE (Voting in a Stochastic Environment) model, we examine the dynamics in a society, part of which can be considered an elite. The model allows us to analyze the influence of social attitudes, such as collectivism, individualism, altruism on the well-being of agents. The dynamics is determined by collective decisions and changes in the structure of society, in particular, by the formation of groups of cooperating agents. It is found that the presence of a "responsible elite", combining the support of other agents with limited concern for their own benefit, stabilizes society and eliminates the "pit of losses" paradox. The benefit to society from having a responsible elite is comparable to that from having a prosocial group of the same size. If the elite radically increases the weight of the group component in its combined voting strategy, then its incomes rise sharply, while society's incomes decline. If, in response to the selfish transformation of the elite, a new responsible elite emerges, proportionally larger than the previous one, then society will stabilize again, and the old elite will lose its dominant position. This process can be repeated as long as the size of society allows the formation of new responsible elites of the required size.
comment: 24 pages, 19 figures, in Russian
☆ Ahlfors-regularity for minimizers of a multiphase optimal design problem
We establish an Alhfors-regularity result for minimizers of a multiphase optimal design problem. It is a variant of the classical variational problem which involves a finite number of chambers $\mathcal{E}(i)$ of prescribed volume that partition a given domain $\Omega\subset\mathbb{R}^n$. The cost functional associated with a configuration $\left(\{\mathcal{E}(i)\}_i,u\right)$ is made up of the perimeter of the partition interfaces and a Dirichlet energy term, which is discontinuous across the interfaces. We prove that the union of the optimal interfaces is $(n-1)$-Alhfors-regular via a penalization method and decay estimates of the energy.
☆ Heavy Ball and Nesterov Accelerations with Hessian-driven Damping for Nonconvex Optimization
In this work, we investigate a second-order dynamical system with Hessian-driven damping tailored for a class of nonconvex functions called strongly quasiconvex. Buil\-ding upon this continuous-time model, we derive two discrete-time gra\-dient-based algorithms through time discretizations. The first is a Heavy Ball method with Hessian correction, incorporating cur\-va\-tu\-re-dependent terms that arise from discretizing the Hessian damping component. The second is a Nesterov-type accelerated method with adaptive momentum, fea\-tu\-ring correction terms that account for local curvature. Both algorithms aim to enhance stability and convergence performance, particularly by mi\-ti\-ga\-ting oscillations commonly observed in cla\-ssi\-cal momentum me\-thods. Furthermore, in both cases we establish li\-near convergence to the optimal solution for the iterates and functions values. Our approach highlights the rich interplay between continuous-time dynamics and discrete optimization algorithms in the se\-tting of strongly quasiconvex objectives. Numerical experiments are presented to support obtained results.
☆ Primal-Dual Coordinate Descent for Nonconvex-Nonconcave Saddle Point Problems Under the Weak MVI Assumption
We introduce two novel primal-dual algorithms for addressing nonconvex, nonconcave, and nonsmooth saddle point problems characterized by the weak Minty Variational Inequality (MVI). The first algorithm, Nonconvex-Nonconcave Primal-Dual Hybrid Gradient (NC-PDHG), extends the well-known Primal-Dual Hybrid Gradient (PDHG) method to this challenging problem class. The second algorithm, Nonconvex-Nonconcave Stochastic Primal-Dual Hybrid Gradient (NC-SPDHG), incorporates a randomly extrapolated primal-dual coordinate descent approach, extending the Stochastic Primal-Dual Hybrid Gradient (SPDHG) algorithm. To our knowledge, designing a coordinate-based algorithm to solve nonconvex-nonconcave saddle point problems is unprecedented, and proving its convergence posed significant difficulties. This challenge motivated us to utilize PEPit, a Python-based tool for computer-assisted worst-case analysis of first-order optimization methods. By integrating PEPit with automated Lyapunov function techniques, we successfully derived the NC-SPDHG algorithm. Both methods are effective under a mild condition on the weak MVI parameter, achieving convergence with constant step sizes that adapt to the structure of the problem. Numerical experiments on logistic regression with squared loss and perceptron-regression problems validate our theoretical findings and show their efficiency compared to existing state-of-the-art algorithms, where linear convergence is observed. Additionally, we conduct a convex-concave least-squares experiment to show that NC-SPDHG performs competitively with SAGA, a leading algorithm in the smooth convex setting.
☆ Multi-Agent, Multi-Scale Systems with the Koopman Operator
The Koopman Operator (KO) takes nonlinear state dynamics and ``lifts'' those dynamics to an infinite-dimensional functional space of observables in which those dynamics are linear. Computational applications typically use a finite-dimensional approximation to the KO. The KO can also be applied to controlled dynamical systems, and the linearity of the KO then facilitates analysis and control calculations. In principle, the potential benefits provided by the KO, and the way that it facilitates the use of game theory via its linearity, would suggest it as a powerful approach for dealing with multi-agent control problems. In practice, though, there has not been much work in this space: most multi-agent KO work has treated those agents as different components of a single system rather than as distinct decision-making entities. This paper develops a KO formulation for multi-agent systems that structures the interactions between decision-making agents and extends this formulation to systems in which the agents have hierarchical control structures and time scale separated dynamics. We solve the multi-agent control problem in both cases as both a centralized optimization and as a general-sum game theory problem. The comparison of the two sets of optimality conditions defining the control solutions illustrates how coupling between agents can create differences between the social optimum and the Nash equilibrium.
☆ Time Scale Separation and Hierarchical Control with the Koopman Operator
The Koopman Operator (KO) is a mathematical construct that maps nonlinear (state space) dynamics to corresponding linear dynamics in an infinite-dimensional functional space. For practical applications, finite-dimensional approximations can be constructed with machine learning. The linearity of the KO facilitates the use of linear tools and theories on nonlinear dynamical systems without relying on local approximations. Additionally, the KO has the ability to incorporate forms of domain knowledge such as stability and known variable interactions. It therefore constitutes part of the broader interest in domain-aware or physics-informed machine learning. Hierarchical control and time scale separation are two common properties in engineered systems -- often found together -- that can pose both computational and practical challenges. These properties provide opportunities for further KO-based domain knowledge incorporation, but this has seldom been taken advantage of in the KO literature. This paper focuses on developing and using domain-aware Koopman formulations for systems with hierarchical control, systems with time scale separation, and systems with both. We show how those formulations can be leveraged to quantify the impact of cross-scale interactions on overall stability and to calculate optimal control policies at both fast and slow time scales.
☆ Long run control of nonhomogeneous Markov processes
In the paper average reward per unit time and average risk sensitive reward functionals are considered for controlled nonhomogeneous Markov processes. Existence of solutions to suitable Bellman equations is shown. Continuity of the value functions with respect to risk parameter is also proved. Finally stability of functionals with respect to pointwise convergence of Markov controls is studied.
☆ A Simplified Analysis of SGD for Linear Regression with Weight Averaging
Theoretically understanding stochastic gradient descent (SGD) in overparameterized models has led to the development of several optimization algorithms that are widely used in practice today. Recent work by~\citet{zou2021benign} provides sharp rates for SGD optimization in linear regression using constant learning rate, both with and without tail iterate averaging, based on a bias-variance decomposition of the risk. In our work, we provide a simplified analysis recovering the same bias and variance bounds provided in~\citep{zou2021benign} based on simple linear algebra tools, bypassing the requirement to manipulate operators on positive semi-definite (PSD) matrices. We believe our work makes the analysis of SGD on linear regression very accessible and will be helpful in further analyzing mini-batching and learning rate scheduling, leading to improvements in the training of realistic models.
☆ On Exact Solutions to the Linear Bellman Equation
This paper presents sufficient conditions for optimal control of systems with dynamics given by a linear operator, in order to obtain an explicit solution to the Bellman equation that can be calculated in a distributed fashion. Further, the class of Linearly Solvable MDP is reformulated as a continuous-state optimal control problem. It is shown that this class naturally satisfies the conditions for explicit solution of the Bellman equation, motivating the extension of previous results to semilinear dynamics to account for input nonlinearities. The applicability of the given conditions is illustrated in scenarios with linear and quadratic cost, corresponding to the Stochastic Shortest Path and Linear-Quadratic Regulator problems.
comment: Preprint to be published in the Control Systems Letters
☆ A polynomial projective algorithm for convex feasibility problems with positive-definite constraints
We study a class of projective transformations of spectraplexes associated with self-dual cones and, on this basis, propose a polynomial-time algorithm for convex feasibility problems with positive definite constraints. At each iteration of the algorithm, either a feasible solution is found or a suitable valid inequality inducing a projective transformation allowing to bring the solution set closer to the center of an associated spectraplex. The closeness to the center is measured in terms of a potential function. The running time of our algorithm makes the existing complexity bounds more precise for the case when the number of equations linking the positive definite variable matrices is not less than the sum of the ranks of the respective positive-semidefinite cones.
☆ On the Effectiveness of Classical Regression Methods for Optimal Switching Problems
Simple regression methods provide robust, near-optimal solutions for optimal switching problems in dimensions ranging from 1 to 50. While the theory requires solving intractable PDE systems, the Longstaff-Schwartz algorithm with classical approaches like $k$-NN achieves excellent switching decisions without extensive hyperparameter tuning. Testing eight regression approaches on four benchmark problems, we find that simple methods maintain stable performance across diverse problem characteristics, even after extensive neural network optimization. The contaminated training targets inherent to backward induction-where each target contains both approximation bias and Monte Carlo noise-actually favor these robust approaches over more complex alternatives such as neural networks. Further, we establish concentration bounds for $k$-NN regression under jump-diffusion dynamics and show that PCA enables $k$-NN to scale to high dimensions. For practitioners: simple, minimally-tuned regression methods offer reliable performance for computationally demanding switching problems.
☆ Efficient Online Mirror Descent Stochastic Approximation for Multi-Stage Stochastic Programming
We study the unconstrained and the minimax saddle point variants of the convex multi-stage stochastic programming problem, where consecutive decisions are coupled through the objective functions, rather than through the constraints. Based on the analysis of deterministic mirror descent algorithms with inexact gradients, we introduce the idea of \textit{stochastic conditional gradient oracles}, a multi-stage analog of the stochastic gradient oracles used in (classical) stochastic programming. We show one approach to construct such oracles and prove the convergence of the (accelerated) mirror descent stochastic approximation, both in expectation and with high probability. To further reduce the oracle complexity, we view the problem from a \textit{semi-online} perspective, where the stage $t$ decision variables are constructed $s$ stages in advance, instead of before stage $1$. We show that the delay in decision making allows an asynchronous implementation of the mirror descent stochastic approximation algorithms. By avoiding computing solutions for scenarios that are inconsistent with information available during stage $t$, the complexity is reduced from exponential to linear in the number of stages.
☆ Multi-Timescale Gradient Sliding for Distributed Optimization
We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). Our MT-GS and AMT-GS can take advantage of similarities between (local) objectives to reduce the communication rounds, are flexible so that different subsets (of agents) can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation, and a multi-timescale variant of the sliding method introduced in Lan et al. (2020), Lan (2016), where different dual blocks are updated at potentially different rates. To find an $\epsilon$-suboptimal solution, the complexities of our algorithms achieve optimal dependency on $\epsilon$: MT-GS needs $O(\overline{r}A/\epsilon)$ communication rounds and $O(\overline{r}/\epsilon^2)$ subgradient steps for Lipchitz objectives, and AMT-GS needs $O(\overline{r}A/\sqrt{\epsilon\mu})$ communication rounds and $O(\overline{r}/(\epsilon\mu))$ subgradient steps if the objectives are also $\mu$-strongly convex. Here, $\overline{r}$ measures the ``average rate of updates'' for dual blocks, and $A$ measures similarities between (subgradients of) local functions. In addition, the linear dependency of communication rounds on $A$ is optimal (Arjevani and Shamir 2015), thereby providing a positive answer to the open question whether such dependency is achievable for non-smooth objectives (Arjevani and Shamir 2015).
☆ Disruption of parkinsonian brain oscillations
Deep brain stimulation (DBS) is an advanced surgical treatment for the symptoms of Parkinson's disease (PD), involving electrical stimulation of neurons within the basal ganglia region of the brain. DBS is traditionally delivered in an open-loop manner using fixed stimulation parameters, which may lead to suboptimal results. In an effort to overcome these limitations, closed loop DBS, using pathological subthalamic beta (13--30 Hz) activity as a feedback signal, offers the potential to adapt DBS automatically in response to changes in patient symptoms and side effects. However, clinically implemented closed-loop techniques have been limited to date to simple control algorithms, due to the inherent uncertainties in the dynamics involved. Model-free control, which has already seen successful applications in the field of bioengineering, offers a way to avoid this limitation and provides an alternative method to apply modern control approach to selective suppression of pathological oscillations.
comment: 23rd Internat. Conf. Computational Methods in Systems Biology (CMSB 2025), 10-12 september 2025, Lyon, France
☆ Superpositions for General Conditional Mckean-Vlasov Stochastic Differential Equations
In this paper, we study the connection between a general class of Conditional Mckean-Vlasov Stochastic Differential Equations (CMVSDEs) and its corresponding (infinite dimensional) Conditional Fokker-Planck Equation. The CMVSDE under consideration is similar to the one studied in [4], which is a non-trivial generalization of the McKean-Vlasov SDE with common noise and is closely related to a new type of non-linear Zakai equation that has not been studied in the literature. The main purpose of this paper is to establish the superposition principles among the three subjects so that their well-posedness can imply each other. More precisely, we shall first prove the superposition principle between the non-linear Zakai equation, a non-linear measure-valued stochastic PDE, and a CMVSDE; and then prove the superposition principle between an infinite dimensional conditional Fokker-Planck equation and the nonlinear Zakai equations. It is worth noting that none of the (weak) well-posedness of these SDEs/SPDEs in such generality have been investigated in the literature.
comment: 30 pages
☆ Optimal Control of Thin-Film Flow on a Flexible Topography
This work presents a mathematical model for the optimal control of thin-film flows over a flexible topography influenced by an external force. Our thin-film model allows for the rupture of films as well as the coalescence of bubbles. The objective is to find the optimal distributed force acting on the topography that minimises the differences between actual and desired thin-film profiles. A nonlinear lubrication equation governing the fluid dynamics and appropriate functional settings for this model are presented. It is also shown that this system satisfies a global energy-dissipation law for a suitable energy functional. Optimality conditions are derived for the solution of the minimisation problem of a specified cost function across a time horizon. These conditions are formulated at a continuous level as a system of coupled, forward-backward PDEs, which are subsequently discretised for numerical investigation. To ensure computational efficiency and stability, first-order Implicit-Explicit (IMEX) time-stepping schemes are employed to handle the nonlinearities in the model, and a reduced gradient descent algorithm is applied to obtain a numerical approximation of the optimal control signal. Numerical results illustrate that controlling the thin film, even during rupture, achieves a precise film profile. This control strategy accelerates convergence towards a steady state, reduces instabilities, stabilises dewetting processes, and meets the desired profile specifications.
☆ When and How Unlabeled Data Provably Improve In-Context Learning
Recent research shows that in-context learning (ICL) can be effective even when demonstrations have missing or incorrect labels. To shed light on this capability, we examine a canonical setting where the demonstrations are drawn according to a binary Gaussian mixture model (GMM) and a certain fraction of the demonstrations have missing labels. We provide a comprehensive theoretical study to show that: (1) The loss landscape of one-layer linear attention models recover the optimal fully-supervised estimator but completely fail to exploit unlabeled data; (2) In contrast, multilayer or looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i\ge 0} a_i (X^\top X)^iX^\top y$ with $X$ and $y$ denoting features and partially-observed labels (with missing entries set to zero). We characterize the class of polynomials that can be expressed as a function of depth and draw connections to Expectation Maximization, an iterative pseudo-labeling algorithm commonly used in semi-supervised learning. Importantly, the leading polynomial power is exponential in depth, so mild amount of depth/looping suffices. As an application of theory, we propose looping off-the-shelf tabular foundation models to enhance their semi-supervision capabilities. Extensive evaluations on real-world datasets show that our method significantly improves the semisupervised tabular learning performance over the standard single pass inference.
☆ Proximal Operators of Sorted Nonconvex Penalties
This work studies the problem of sparse signal recovery with automatic grouping of variables. To this end, we investigate sorted nonsmooth penalties as a regularization approach for generalized linear models. We focus on a family of sorted nonconvex penalties which generalizes the Sorted L1 Norm (SLOPE). These penalties are designed to promote clustering of variables due to their sorted nature, while the nonconvexity reduces the shrinkage of coefficients. Our goal is to provide efficient ways to compute their proximal operator, enabling the use of popular proximal algorithms to solve composite optimization problems with this choice of sorted penalties. We distinguish between two classes of problems: the weakly convex case where computing the proximal operator remains a convex problem, and the nonconvex case where computing the proximal operator becomes a challenging nonconvex combinatorial problem. For the weakly convex case (e.g. sorted MCP and SCAD), we explain how the Pool Adjacent Violators (PAV) algorithm can exactly compute the proximal operator. For the nonconvex case (e.g. sorted Lq with q in ]0,1[), we show that a slight modification of this algorithm turns out to be remarkably efficient to tackle the computation of the proximal operator. We also present new theoretical insights on the minimizers of the nonconvex proximal problem. We demonstrate the practical interest of using such penalties on several experiments.
☆ DOVA-PATBM: An Intelligent, Adaptive, and Scalable Framework for Optimizing Large-Scale EV Charging Infrastructure
The accelerating uptake of battery-electric vehicles demands infrastructure planning tools that are both data-rich and geographically scalable. Whereas most prior studies optimise charging locations for single cities, state-wide and national networks must reconcile the conflicting requirements of dense metropolitan cores, car-dependent exurbs, and power-constrained rural corridors. We present DOVA-PATBM (Deployment Optimisation with Voronoi-oriented, Adaptive, POI-Aware Temporal Behaviour Model), a geo-computational framework that unifies these contexts in a single pipeline. The method rasterises heterogeneous data (roads, population, night lights, POIs, and feeder lines) onto a hierarchical H3 grid, infers intersection importance with a zone-normalised graph neural network centrality model, and overlays a Voronoi tessellation that guarantees at least one five-port DC fast charger within every 30 km radius. Hourly arrival profiles, learned from loop-detector and floating-car traces, feed a finite M/M/c queue to size ports under feeder-capacity and outage-risk constraints. A greedy maximal-coverage heuristic with income-weighted penalties then selects the minimum number of sites that satisfy coverage and equity targets. Applied to the State of Georgia, USA, DOVA-PATBM (i) increases 30 km tile coverage by 12 percentage points, (ii) halves the mean distance that low-income residents travel to the nearest charger, and (iii) meets sub-transmission headroom everywhere -- all while remaining computationally tractable for national-scale roll-outs. These results demonstrate that a tightly integrated, GNN-driven, multi-resolution approach can bridge the gap between academic optimisation and deployable infrastructure policy.
☆ Polynomial Eigenfunctions and Matrix Lyapunov Equations from Energy Balance Integrals
We establish a unified theoretical framework that connects classical orthogonal polynomial systems to matrix Lyapunov equations through the fundamental physics of energy dissipation in stochastic dynamical systems. Starting from the energy balance principle in infinite-dimensional Hilbert spaces, we derive a master integral representation that naturally encompasses both spectral geometry and covariance dynamics. The theory reveals that established orthogonal polynomials (Zernike, Hermite, spherical harmonics) and matrix Lyapunov equations are dual manifestations of the same underlying energy dissipation structure. We provide rigorous mathematical foundations showing how finite-dimensional projections of infinite-dimensional energy integrals reproduce classical matrix equations, with specific structure determined by the symmetries of noise processes. The framework demonstrates that adding uniform dissipation to classical differential operators preserves their polynomial eigenfunction structure while ensuring the energy balance conditions required for physical consistency.
comment: 18 pages
☆ Minimizing Structural Vibrations via Guided Flow Matching Design Optimization
Structural vibrations are a source of unwanted noise in engineering systems like cars, trains or airplanes. Minimizing these vibrations is crucial for improving passenger comfort. This work presents a novel design optimization approach based on guided flow matching for reducing vibrations by placing beadings (indentations) in plate-like structures. Our method integrates a generative flow matching model and a surrogate model trained to predict structural vibrations. During the generation process, the flow matching model pushes towards manufacturability while the surrogate model pushes to low-vibration solutions. The flow matching model and its training data implicitly define the design space, enabling a broader exploration of potential solutions as no optimization of manually-defined design parameters is required. We apply our method to a range of differentiable optimization objectives, including direct optimization of specific eigenfrequencies through careful construction of the objective function. Results demonstrate that our method generates diverse and manufacturable plate designs with reduced structural vibrations compared to designs from random search, a criterion-based design heuristic and genetic optimization. The code and data are available from https://github.com/ecker-lab/Optimizing_Vibrating_Plates.
☆ Contribution of expert aggregation to temperature prediction part I
Many Numerical Weather Prediction (NWP) models and their associated Model Output Statistics (MOS) are available. Combining all of these predictions in an optimal way is however not straightforward. This can be achieved thanks to Expert Aggregation (EA) [Cesa-Bianchi and Lugosi, 2006, Gaillard et al., 2014, Wintenberger, 2024] which has many advantages, such as being online, being adaptive to model changes and having theoretical guarantees. Hence, in this paper, we propose a method for making deterministic temperature predictions with EA strategies and show how this can improve temperature predictions, even those of post processed NWP models. We also compare different EA strategies in various settings and discuss certain limitations.
☆ Contribution of expert aggregation to temperature prediction part II: Second order bounds with sleeping experts
In this paper we improve on the temperature predictions made with (online) Expert Aggregation (EA) [Cesa-Bianchi and Lugosi, 2006] in Part I. In particular, we make the aggregation more reactive, whilst maintaining at least the same root mean squared error and reducing the number of large errors. We have achieved this by using the Sleeping Expert Framework (SEF) [Freund et al., 1997, Devaine et al., 2013], which allows the more efficient use of biased experts (bad on average but which may be good at some point). To deal with the fact that, unlike in Devaine et al. [2013], we do not know in advance when to use these biased experts, we resorted to gradient boosted regression trees [Chen and Guestrin, 2016] and provide regret bounds against sequences of experts [Mourtada and Maillard, 2017] which take into account this uncertainty. We applied this in a fully online way on BOA [Wintenberger, 2024], an adaptive aggregation with second order regret bounds, which had the best results in Part I. Finally, we made a meta-aggregation with the EA follow the leader. This chooses whether or not to use the SEF in order to limit the possible noise added by the SEF.
☆ Operational Control of a Multi-energy District Heating System: Comparison of Model-Predictive Control and Rule-Based Control
This study focuses on operational control strategies for a multi-energy District Heating Network (DHN). Two control strategies are investigated and compared: (i) a reactive rule-based control (RBC) and (ii) a model predictive control (MPC). For the purpose of the study a small scale district heating network is modelled using Modelica. The production plant combines a heat pump, a gas boiler and a thermal solar field on the production side with a storage tank for flexibility purposes. On the consumption side, the virtual buildings are aggregated into a single consumer. We use our co-simulation and control platform, called Pegase, to implement the studied strategies. For both strategies the goal is to meet the consumers' demand while satisfying technical constraints. In addition MPC has the objective to minimize the operational costs, taking into account variable electricity prices and availability of solar thermal resource. Different scenarios are also defined and compared to study the effect of the heat plant sizing and forecasting error. The operational cost is reduced when switching from RBC to a MPC. As can be expected, MPC is more efficient when dealing with variable energy costs, intermittent solar energy and storage capabilities. This study also demonstrates how our tools enable an easy coupling of Modelica-based simulation with various control strategies. It especially supports the implementation and validation of complex MPC strategies in an efficient way, and yearly simulations are performed within 20 minutes.
☆ An efficient co-simulation and control approach to tackle complex multi-domain energetic systems: concepts and applications of the PEGASE platform
In this paper, we present a novel research software, called PEGASE, suitable for the design, validation and deployment of advanced control strategies for complex multi-domain energy systems. PEGASE especially features a highly efficient cosimulation engine, together with integrated solutions for defining both rule-based control strategies and Model-Predictive Control (MPC). The main principle behind the PEGASE platform is divide-and-conquer. Indeed, rather than trying to solve a problem as a monolithic entity, which can be highly complex for multi-domain large-scale systems, it is often more efficient to decompose it into several domains or sub-problems, and to simulate them in a decoupled way. To provide its cosimulation capabilities, we based PEGASE on two main components. The first one is a framework for integrating simulation models, which can be either compatible with the FMI standard or interfaced through an Application Programming Interface (API). The second one is a multi-threaded sequencer enabling several simulation sequences with different time steps. To provide advanced control capabilities, we also equipped PEGASE with a framework for MPC combining a comprehensive management of predictions data and a modeler dedicated to the formulation of Mixed Integer Linear Programs. We implemented this framework in C++ providing low formulation and resolution times for typical applications. Connection to hardware is also available via standard industry protocols thereby allowing PEGASE to control real energy systems. In this paper, we show how these basic functionalities, combined with dedicated modeling tools, enable setting up simulation and control applications suitable for tackling the complexity of various kinds of energy systems. To illustrate this, we present four application examples from our recent research work. These examples cover several domains, from concentrated solar thermal plants to optimal control of district heating networks. The variety of examples demonstrates the robustness and genericity of the approach.
☆ Muon Optimizes Under Spectral Norm Constraints
The pursuit of faster optimization algorithms remains an active and important research direction in deep learning. Recently, the Muon optimizer [JJB+24] has demonstrated promising empirical performance, but its theoretical foundation remains less understood. In this paper, we bridge this gap and provide a theoretical analysis of Muon by placing it within the Lion-$\mathcal{K}$ family of optimizers [CLLL24]. Specifically, we show that Muon corresponds to Lion-$\mathcal{K}$ when equipped with the nuclear norm, and we leverage the theoretical results of Lion-$\mathcal{K}$ to establish that Muon (with decoupled weight decay) implicitly solves an optimization problem that enforces a constraint on the spectral norm of weight matrices. This perspective not only demystifies the implicit regularization effects of Muon but also leads to natural generalizations through varying the choice of convex map $\mathcal{K}$, allowing for the exploration of a broader class of implicitly regularized and constrained optimization algorithms.
☆ KKT-based optimality conditions for neural network approximation
In this paper, we obtain necessary optimality conditions for neural network approximation. We consider neural networks in Manhattan ($l_1$ norm) and Chebyshev ($\max$ norm). The optimality conditions are based on neural networks with at most one hidden layer. We reformulate nonsmooth unconstrained optimisation problems as larger dimension constrained problems with smooth objective functions and constraints. Then we use KKT conditions to develop the necessary conditions and present the optimality conditions in terms of convex analysis and convex sets.
comment: Submitted to a journal
☆ 2BSDE with uncertain horizon and application to stochastic control in erratic environments
We investigate the existence and uniqueness of non-Markovian second-order backward stochastic differential equations with an uncertain terminal horizon and establish comparison principles under the assumption that the driver is Lipschitz continuous. The terminal time is both random and exogenous, and it may not be adapted to the Brownian filtration, leading to a singular jump in the 2BSDE decomposition. We also provide a connection between this new class of 2BSDE and a fully nonlinear PDE in a Markovian setting. Our theoretical results are applied to non-Markovian stochastic control problems in two settings: (1) when an agent seeks to maximize utility from a payoff received at an uncertain terminal time by controlling both the drift and volatility of a diffusion process; and (2) when the agent contends with volatility uncertainty stemming from an external source, referred to as Nature, and optimizes the drift in a worst-case scenario for the ambiguous volatility. We term this class of problems erratic stochastic control, reflecting the dual uncertainty in both model parameters and the timing of the terminal horizon.
♻ ☆ On the (linear) convergence of Generalized Newton Inexact ADMM
This paper presents GeNI-ADMM, a framework for large-scale composite convex optimization that facilitates theoretical analysis of both existing and new approximate ADMM schemes. GeNI-ADMM encompasses any ADMM algorithm that solves a first- or second-order approximation to the ADMM subproblem inexactly. GeNI-ADMM exhibits the usual $\mathcal{O} (1/t)$-convergence rate under standard hypotheses and converges linearly under additional hypotheses such as strong convexity. Further, the GeNI-ADMM framework provides explicit convergence rates for ADMM variants accelerated with randomized linear algebra, such as NysADMM and sketch-and-solve ADMM, resolving an important open question on the convergence of these methods. This analysis quantifies the benefit of improved approximations and can aid in the design of new ADMM variants with faster convergence.
comment: 31 pages, 4 figures, 2 tables
♻ ☆ Solving equilibrium problems in economies with financial markets, home production, and retention
We propose a new methodology to compute equilibria for general equilibrium problems on exchange economies with real financial markets, home-production, and retention. We demonstrate that equilibrium prices can be determined by solving a related maxinf-optimization problem. We incorporate the non-arbitrage condition for financial markets into the equilibrium formulation and establish the equivalence between solutions to both problems. This reduces the complexity of the original by eliminating the need to directly compute financial contract prices, allowing us to calculate equilibria even in cases of incomplete financial markets. We also introduce a Walrasian bifunction that captures the imbalances and show that maxinf-points of this function correspond to equilibrium points. Moreover, we demonstrate that every equilibrium point can be approximated by a limit of maxinf points for a family of perturbed problems, by relying on the notion of lopsided convergence. Finally, we propose an augmented Walrasian algorithm and present numerical examples to illustrate the effectiveness of this approach. Our methodology allows for efficient calculation of equilibria in a variety of exchange economies and has potential applications in finance and economics.
♻ ☆ Coherent Local Explanations for Mathematical Optimization
The surge of explainable artificial intelligence methods seeks to enhance transparency and explainability in machine learning models. At the same time, there is a growing demand for explaining decisions taken through complex algorithms used in mathematical optimization. However, current explanation methods do not take into account the structure of the underlying optimization problem, leading to unreliable outcomes. In response to this need, we introduce Coherent Local Explanations for Mathematical Optimization (CLEMO). CLEMO provides explanations for multiple components of optimization models, the objective value and decision variables, which are coherent with the underlying model structure. Our sampling-based procedure can provide explanations for the behavior of exact and heuristic solution algorithms. The effectiveness of CLEMO is illustrated by experiments for the shortest path problem, the knapsack problem, and the vehicle routing problem.
♻ ☆ Boundary Control for Stability and Invariance of Traffic Flow Dynamics: A Convex Optimization Approach
In this letter we propose an optimization-based boundary controller for traffic flow dynamics capable of achieving both stability and invariance conditions. The approach is based on the definition of Boundary Control Barrier Functionals, from which sets of invariance-preserving boundary controllers are derived. In combination with sets of stabilizing controllers, we reformulate the problem as a convex optimization program solved at each point in time to synthesize the boundary control inputs. We derive sufficient conditions for the existence of optimal controllers that ensure both stability and invariance.
♻ ☆ $k$-Submodular Interdiction Problems under Distributional Risk-Receptiveness and Robustness: Application to Machine Learning
We study submodular optimization in adversarial context, applicable to machine learning problems such as feature selection using data susceptible to uncertainties and attacks. We focus on Stackelberg games between an attacker (or interdictor) and a defender where the attacker aims to minimize the defender's objective of maximizing a $k$-submodular function. We allow uncertainties arising from the success of attacks and inherent data noise, and address challenges due to incomplete knowledge of the probability distribution of random parameters. Specifically, we introduce Distributionally Robust $k$-Submodular Interdiction Problem (DRO $k$-SIP) and Distributionally Risk-Receptive $k$-Submodular Interdiction Problem (DRR $k$-SIP) along with finitely convergent exact algorithms for solving them. When solving the DRO $k$-SIP, the attacker optimizes their expected payoff with respect to the worst-case probability distribution within the ambiguity set, and thereby have robust attack strategies despite distributional ambiguity. In contrast, the DRR $k$-SIP identifies attacker strategies with the best-case probability distribution, and identifies critical vulnerabilities for the defender. The optimal values derived from both DRO $k$-SIP and DRR $k$-SIP offer a confidence interval-like range for the expected value of the defender's objective function, capturing distributional ambiguity. We conduct computational experiments on instances of feature selection and sensor placement problems, using Wisconsin breast cancer data and synthetic data, respectively.
♻ ☆ Inexact JKO and proximal-gradient algorithms in the Wasserstein space
This paper studies the convergence properties of the inexact Jordan-Kinderlehrer-Otto (JKO) scheme and proximal-gradient algorithm in the context of Wasserstein spaces. The JKO scheme, a widely-used method for approximating solutions to gradient flows in Wasserstein spaces, typically assumes exact solutions to iterative minimization problems. However, practical applications often require approximate solutions due to computational limitations. This work focuses on the convergence of the scheme to minimizers for the underlying functional and addresses these challenges by analyzing two types of inexactness: errors in Wasserstein distance and errors in energy functional evaluations. The paper provides rigorous convergence guarantees under controlled error conditions, demonstrating that weak convergence can still be achieved with inexact steps. The analysis is further extended to proximal-gradient algorithms, showing that convergence is preserved under inexact evaluations.
♻ ☆ High Probability Convergence of Distributed Clipped Stochastic Gradient Descent with Heavy-tailed Noise
In this paper, the problem of distributed optimization is studied via a network of agents. Each agent only has access to a noisy gradient of its own objective function, and can communicate with its neighbors via a network. To handle this problem, a distributed clipped stochastic gradient descent algorithm is proposed, and the high probability convergence of the algorithm is studied. Existing works on distributed algorithms involving stochastic gradients only consider the light-tailed noises. Different from them, we study the case with heavy-tailed settings. Under mild assumptions on the graph connectivity, we prove that the algorithm converges in high probability under a certain clipping operator. Finally, a simulation is provided to demonstrate the effectiveness of our theoretical results
♻ ☆ CISSIR: Beam Codebooks with Self-Interference Reduction Guarantees for Integrated Sensing and Communication Beyond 5G
We propose a beam codebook design to reduce self-interference (SI) in integrated sensing and communication (ISAC) systems. Our optimization methods, which can be applied to both tapered beamforming and phased arrays, adapt the codebooks to the SI channel such that a certain SI level is achieved. Furthermore, we derive an upper bound on the quantization noise in terms of the achieved SI level, which provides guidelines to pose the optimization problem in order to obtain performance guarantees for sensing. By selecting standard reference codebooks in our simulations, we show substantially improved sensing quality with little impact on 5G-NR communication. Our proposed method is not only less dependent on hyperparameters than other approaches in the literature, but it can also reduce SI further, and thus deliver better sensing and communication performance.
comment: 13 Pages, 6 figures. This work has been submitted to the IEEE for possible publication
♻ ☆ Well-Posedness and Stability of Infinite-Dimensional Systems Under Monotone Feedback
We study the well-posedness and stability of an impedance passive infinite-dimensional linear system under nonlinear feedback of the form $u(t)=\phi(v(t)-y(t))$, where $\phi$ is a monotone function. Our first main result introduces conditions guaranteeing the existence of classical and generalised solutions in a situation where the original linear system is well-posed. In the absence of the external input $v$ we establish the existence of strong and generalised solutions under strictly weaker conditions. Finally, we introduce conditions guaranteeing that the origin is a globally asymptotically stable equilibrium point of the closed-loop system. Motivated by the analysis of partial differential equations with nonlinear boundary conditions, we use our results to investigate the well-posedness and stablility of abstract boundary control systems, port-Hamiltonian systems, a Timoshenko beam model, and a two-dimensional boundary controlled heat equation.
comment: 40 pages, 1 figure. Submitted. Version 3: Minor changes in Section 4. Version 2: The stability results in Section 4 were generalised to system nodes which are impedance passive but not necessarily well-posed
♻ ☆ Towards net-zero manufacturing: carbon-aware scheduling for GHG emissions reduction
Detailed scheduling has traditionally been optimized for the reduction of makespan and manufacturing costs. However, growing awareness of environmental concerns and increasingly stringent regulations are pushing manufacturing towards reducing the carbon footprint of its operations. Scope 2 emissions, which are the indirect emissions related to the production and consumption of grid electricity, are in fact estimated to be responsible for more than one-third of the global GHG emissions. In this context, carbon-aware scheduling can serve as a powerful way to reduce manufacturing's carbon footprint by considering the time-dependent carbon intensity of the grid and the availability of on-site renewable electricity. This study introduces a carbon-aware permutation flow-shop scheduling model designed to reduce scope 2 emissions. The model is formulated as a mixed-integer linear problem, taking into account the forecasted grid generation mix and available on-site renewable electricity, along with the set of jobs to be scheduled and their corresponding power requirements. The objective is to find an optimal day-ahead schedule that minimizes scope 2 emissions. The problem is addressed using a dedicated memetic algorithm, combining evolutionary strategy and local search. Results from computational experiments confirm that by considering the dynamic carbon intensity of the grid and on-site renewable electricity availability, substantial reductions in carbon emissions can be achieved.
♻ ☆ Solving decision problems with endogenous uncertainty and conditional information revelation using influence diagrams
Mathematical programming formulations of influence diagrams can bridge the gap between representing and solving decision problems. However, they suffer from both modeling and computational limitations. Aiming to address modeling limitations, we show how to incorporate conditionally observed information within the mathematical programming representation of the influence diagram. Multi-stage stochastic programming models use conditional non-anticipativity constraints to represent such uncertainties, and we show how such constraints can be incorporated into the influence diagram formulations. This allows us to consider the two main types of endogenous uncertainty simultaneously, namely decision-dependent information structure and decision-dependent probability distribution. Additionally, we apply a subdiagram decomposition to improve both computational efficiency and modeling capabilities. Under suitable conditions, this decomposition allows for considering continuous decision variables arising from, e.g., investment sizing decisions, leading to better solutions than a discretization of the continuous decisions. Finally, our proposed framework is illustrated with a large-scale cost-benefit problem regarding climate change mitigation, simultaneously considering technological research and development, and optimal emission trajectories.
comment: 28 pages, 10 figures, 1 table
♻ ☆ Convergence analysis of controlled particle systems arising in deep learning: from finite to infinite sample size
This paper deals with a class of neural SDEs and studies the limiting behavior of the associated sampled optimal control problems as the sample size grows to infinity. The neural SDEs with $N$ samples can be linked to the $N$-particle systems with centralized control. We analyze the Hamilton-Jacobi-Bellman equation corresponding to the $N$-particle system and establish regularity results which are uniform in $N$. The uniform regularity estimates are obtained by the stochastic maximum principle and the analysis of a backward stochastic Riccati equation. Using these uniform regularity results, we show the convergence of the minima of the objective functionals and optimal parameters of the neural SDEs as the sample size $N$ tends to infinity. The limiting objects can be identified with suitable functions defined on the Wasserstein space of Borel probability measures. Furthermore, quantitative convergence rates are also obtained.
comment: 46 pages
♻ ☆ A Second-Order Majorant Algorithm for Nonnegative Matrix Factorization
Nonnegative Matrix Factorization (NMF) is a fundamental tool in unsupervised learning, widely used for tasks such as dimensionality reduction, feature extraction, representation learning, and topic modeling. Many algorithms have been developed for NMF, including the well-known Multiplicative Updates (MU) algorithm, which belongs to a broader class of majorization-minimization techniques. In this work, we introduce a general second-order optimization framework for NMF under both quadratic and $\beta$-divergence loss functions. This approach, called Second-Order Majorant (SOM), constructs a local quadratic majorization of the loss function by majorizing its Hessian matrix. It includes MU as a special case, while enabling faster variants. In particular, we propose mSOM, a new algorithm within this class that leverages a tighter local approximation to accelerate convergence. We provide a convergence analysis, showing linear convergence for individual factor updates and global convergence to a stationary point for the alternating version, AmSOM algorithm. Numerical experiments on both synthetic and real data sets demonstrate that mSOM consistently outperforms state-of-the-art algorithms across multiple loss functions.
comment: Updated version in JMLR style. This version matches the manuscript currently under review at JMLR and includes substantial improvements over the original arXiv version
♻ ☆ On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis
Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where the lower-level functions lack strong convexity. In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-{\L}ojasiewicz (PL) condition. We show a simple first-order algorithm can achieve better complexity bounds of $\tilde{\mathcal{O}}(\epsilon^{-2})$, $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic setting respectively. The complexities in the first two cases are optimal up to logarithmic factors.
comment: Published in COLT 2024. This arXiv version refines Assumption 4.1 (d); adds discussions on related works in Appendix A; and corrects the kappa dependency in the upper bounds
Systems and Control 28
☆ Algorithmic Approaches to Enhance Safety in Autonomous Vehicles: Minimizing Lane Changes and Merging
The rapid advancements in autonomous vehicle (AV) technology promise enhanced safety and operational efficiency. However, frequent lane changes and merging maneuvers continue to pose significant safety risks and disrupt traffic flow. This paper introduces the Minimizing Lane Change Algorithm (MLCA), a state-machine-based approach designed to reduce unnecessary lane changes, thereby enhancing both traffic safety and efficiency. The MLCA algorithm prioritizes maintaining lane stability unless safety-critical conditions necessitate a lane change. The algorithm's effectiveness was evaluated through simulations conducted on the SUMO platform, comparing its performance against established models, including LC2017 and MOBIL. Results demonstrate substantial reductions in lane changes and collisions, leading to smoother traffic flow and improved safety metrics. Additionally, the study highlights the MLCA's adaptability to various traffic densities and roadway configurations, showcasing its potential for wide-scale deployment in real-world AV systems. Future work aims to validate these findings in more complex scenarios using the CARLA simulator, which will enable the testing of the algorithm under more dynamic and high-fidelity conditions, such as urban traffic environments with diverse road users. Moreover, the integration of cybersecurity measures for vehicle-to-vehicle (V2V) communication will be explored to ensure robust and secure data exchange, further enhancing the reliability and safety of AV operations. This research contributes to the broader goal of developing intelligent traffic systems that optimize both individual vehicle performance and overall traffic network efficiency.
☆ Mixed Traffic: A Perspective from Long Duration Autonomy
The rapid adoption of autonomous vehicle has established mixed traffic environments, comprising both autonomous and human-driven vehicles (HDVs), as essential components of next-generation mobility systems. Along these lines, connectivity between autonomous vehicles and infrastructure (V2I) is also a significant factor that can effectively support higher-level decision-making. At the same time, the integration of V2I within mixed traffic environments remains a timely and challenging problem. In this paper, we present a long-duration autonomy controller for connected and automated vehicles (CAVs) operating in such environments, with a focus on intersections where right turns on red are permitted. We begin by deriving the optimal control policy for CAVs under free-flow traffic. Next, we analyze crossing time constraints imposed by smart traffic lights and map these constraints to controller bounds using Control Barrier Functions (CBFs), with the aim to drive a CAV to cross the intersection on time. We also introduce criteria for identifying, in real-time, feasible crossing intervals for each CAV. To ensure safety for the CAVs, we present model-agnostic safety guarantees, and demonstrate their compatibility with both CAVs and HDVs. Ultimately, the final control actions are enforced through a combination of CBF constraints, constraining CAVs to traverse the intersection within the designated time intervals while respecting other vehicles. Finally, we guarantee that our control policy yields always a feasible solution and validate the proposed approach through extensive simulations in MATLAB.
comment: 14 pages, 12 figures
☆ Swarm-STL: A Framework for Motion Planning in Large-Scale, Multi-Swarm Systems
In multi-agent systems, signal temporal logic (STL) is widely used for path planning to accomplish complex objectives with formal safety guarantees. However, as the number of agents increases, existing approaches encounter significant computational challenges. Recognizing that many complex tasks require cooperation among multiple agents, we propose swarm STL specifications to describe the collective tasks that need to be achieved by a team of agents. Next, we address the motion planning problem for all the agents in two stages. First, we abstract a group of cooperating agents as a swarm and construct a reduced-dimension state space whose dimension does not increase with the number of agents. The path planning is performed at the swarm level, ensuring the safety and swarm STL specifications are satisfied. Then, we design low-level control strategies for agents within each swarm based on the path synthesized in the first step. The trajectories of agents generated by the two-step policy ensure satisfaction of the STL specifications. We evaluate our two-stage approach in both single-swarm and multi-swarm scenarios. The results demonstrate that all tasks are completed with safety guarantees. Compared to the baseline multi-agent planning approach, our method maintains computational efficiency as the number of agents increases, since the computational time scales with the number of swarms rather than the number of agents.
☆ Modeling Uncertainty: From Simulink to Stochastic Hybrid Automata
Simulink is widely used in industrial design processes to model increasingly complex embedded control systems. Thus, their formal analysis is highly desirable. However, this comes with two major challenges: First, Simulink models often provide an idealized view of real-life systems and omit uncertainties such as, aging, sensor noise or failures. Second, the semantics of Simulink is only informally defined. In this paper, we present an approach to formally analyze safety and performance of embedded control systems modeled in Simulink in the presence of uncertainty. To achieve this, we 1) model different types of uncertainties as stochastic Simulink subsystems and 2) extend an existing formalization of the Simulink semantics based on stochastic hybrid automata (SHA) by providing transformation rules for the stochastic subsystems. Our approach gives us access to established quantitative analysis techniques, like statistical model checking and reachability analysis. We demonstrate the applicability of our approach by analyzing safety and performance in the presence of uncertainty for two smaller case studies.
☆ Bayesian Knowledge Transfer for a Kalman Fixed-Lag Interval Smoother
A Bayesian knowledge transfer mechanism that leverages external information to improve the performance of the Kalman fixed-lag interval smoother (FLIS) is proposed. Exact knowledge of the external observation model is assumed to be missing, which hinders the direct application of Bayes' rule in traditional transfer learning approaches. This limitation is overcome by the fully probabilistic design, conditioning the targeted task of state estimation on external information. To mitigate the negative impact of inaccurate external data while leveraging precise information, a latent variable is introduced. Favorably, in contrast to a filter, FLIS retrospectively refines past decisions up to a fixed time horizon, reducing the accumulation of estimation error and consequently improving the performance of state inference. Simulations indicate that the proposed algorithm better exploits precise external knowledge compared to a similar technique and achieves comparable results when the information is imprecise.
comment: 6 pages, 1 figure. Accepted for publication in IEEE Control Systems Letters. To be presented at the IEEE 64th Conference on Decision and Control
☆ Network-Independent Incremental Passivity Conditions for Grid-Forming Inverter Control
Grid-forming inverters control the power transfer between the AC and DC sides of an electrical grid while maintaining the frequency and voltage of the AC side. This paper focuses on ensuring large-signal stability of an electrical grid with inverter-interfaced renewable sources. We prove that the Hybrid-Angle Control (HAC) scheme for grid-forming inverters can exhibit incremental passivity properties between current and voltage at both the AC and DC ports. This incremental passivity can be certified through decentralized conditions. Inverters operating under HAC can, therefore, be connected to other passive elements (e.g. transmission lines) with an immediate guarantee of global transient stability regardless of the network topology or parameters. Passivity of Hybrid Angle Control is also preserved under small-signal (linearized) analyses, in contrast to conventional proportional droop laws that are passivity-short at low frequencies. Passivity and interconnected-stability properties are demonstrated through an example case study.
comment: 7 pages, 4 figures
☆ Barrier Method for Inequality Constrained Factor Graph Optimization with Application to Model Predictive Control
Factor graphs have demonstrated remarkable efficiency for robotic perception tasks, particularly in localization and mapping applications. However, their application to optimal control problems -- especially Model Predictive Control (MPC) -- has remained limited due to fundamental challenges in constraint handling. This paper presents a novel integration of the Barrier Interior Point Method (BIPM) with factor graphs, implemented as an open-source extension to the widely adopted g2o framework. Our approach introduces specialized inequality factor nodes that encode logarithmic barrier functions, thereby overcoming the quadratic-form limitations of conventional factor graph formulations. To the best of our knowledge, this is the first g2o-based implementation capable of efficiently handling both equality and inequality constraints within a unified optimization backend. We validate the method through a multi-objective adaptive cruise control application for autonomous vehicles. Benchmark comparisons with state-of-the-art constraint-handling techniques demonstrate faster convergence and improved computational efficiency. (Code repository: https://github.com/snt-arg/bipm_g2o)
☆ Techno-economic optimization of hybrid steam-electric energy systems with excess heat utilization and reserve market participation
This study investigates the economic viability and optimal configuration of a hybrid industrial energy system combining an electrode boiler, steam accumulator, and battery energy storage system (BESS). This study optimizes system operation for a specific configuration to minimize net energy costs, defined as energy costs minus profits from price arbitrage and reserve markets. The optimization uses load shifting, peak shaving, and frequency containment reserve market participation with hourly 2024 data from Norway and Germany. Net present value (NPV) analysis was performed to determine the most cost-efficient energy storage configurations. The results show that current investment costs favor steam accumulators over BESS in both countries. However, a reduction in BESS cost will make batteries economically competitive, particularly in Germany, where high price volatility and power-based grid tariffs provide stronger incentives for load shifting and peak shaving. Participation in the FCR market accounted for a 17% and 7% reduction of the net energy costs in Norway and Germany, respectively. Utilization of excess heat, through inlet water preheating, further reduced the net energy costs. Sensitivity analyses confirm that investment costs, especially for BESS, strongly influence optimal system design. These findings offer guidance for industrial flexibility investments across diverse electricity markets.
☆ A Novel Dynamic Bandwidth Allocation Design for 100G Coherent Passive Optical Network
With the rapid advancements in coherent Passive Optical Network (PON) technologies featuring 100G and higher data rates, this paper addresses the urgent requirement for sophisticated simulation and MAC layer development within the domain of coherent Time Division Multiplexing (TDM) PON and coherent Time and Frequency Division Multiplexing (TFDM) PON networks. The ever-growing demand for latency-sensitive services and expanding user populations in next-generation 100G and beyond coherent PONs, underscores the crucial need for low-latency bandwidth management and efficient Dynamic Bandwidth Allocation (DBA) mechanisms. In this paper, we present a pioneering analysis of two established DBAs from the perspective of temporal misalignments. Subsequently, a novel DBA algorithm tailored for coherent PONs featuring 100 Gbps data rate and up to 512 end-users is introduced, named the Hybrid-Switch DBA. This innovative approach allows for adaptive switching of the DBA scheme in response to real-time traffic conditions. To the best of our knowledge, this paper represents the first attempt to address the misalignment problem of DBA and proposes a novel DBA solution for both TDM- and TFDM-based coherent PON networks. This research significantly contributes to the development of coherent TDM PON and coherent TFDM PON networks by enhancing the efficiency of bandwidth allocation and addressing the challenges associated with misalignments in DBA mechanisms. As optical access networks continue to evolve to meet the ever-increasing demands of modern communication services, the Hybrid-Switch DBA algorithm presented in this paper offers a promising solution for optimizing network performance and accommodating latency-sensitive applications.
☆ Pose State Perception of Interventional Robot for Cardio-cerebrovascular Procedures
In response to the increasing demand for cardiocerebrovascular interventional surgeries, precise control of interventional robots has become increasingly important. Within these complex vascular scenarios, the accurate and reliable perception of the pose state for interventional robots is particularly crucial. This paper presents a novel vision-based approach without the need of additional sensors or markers. The core of this paper's method consists of a three-part framework: firstly, a dual-head multitask U-Net model for simultaneous vessel segment and interventional robot detection; secondly, an advanced algorithm for skeleton extraction and optimization; and finally, a comprehensive pose state perception system based on geometric features is implemented to accurately identify the robot's pose state and provide strategies for subsequent control. The experimental results demonstrate the proposed method's high reliability and accuracy in trajectory tracking and pose state perception.
☆ Nonlinear Control of a Quadrotor UAV Using Backstepping-Based Sliding Mode Technique
This paper presents the development of a sliding mode controller using the backstepping approach. The controller is employed to synthesize tracking errors and Lyapunov functions. A novel state-space representation is formulated by incorporating the dynamics of the quadrotor and accounting for non-holonomic constraints. The proposed sliding mode controller effectively addresses system nonlinearities and improves tracking of predefined trajectories. Simulation results are presented graphically to demonstrate the controller's performance.
☆ Hard Contacts with Soft Gradients: Refining Differentiable Simulators for Learning and Control
Contact forces pose a major challenge for gradient-based optimization of robot dynamics as they introduce jumps in the system's velocities. Penalty-based simulators, such as MuJoCo, simplify gradient computation by softening the contact forces. However, realistically simulating hard contacts requires very stiff contact settings, which leads to incorrect gradients when using automatic differentiation. On the other hand, using non-stiff settings strongly increases the sim-to-real gap. We analyze the contact computation of penalty-based simulators to identify the causes of gradient errors. Then, we propose DiffMJX, which combines adaptive integration with MuJoCo XLA, to notably improve gradient quality in the presence of hard contacts. Finally, we address a key limitation of contact gradients: they vanish when objects do not touch. To overcome this, we introduce Contacts From Distance (CFD), a mechanism that enables the simulator to generate informative contact gradients even before objects are in contact. To preserve physical realism, we apply CFD only in the backward pass using a straight-through trick, allowing us to compute useful gradients without modifying the forward simulation.
☆ Enhancing Forecasting Accuracy in Dynamic Environments via PELT-Driven Drift Detection and Model Adaptation
Accurate time series forecasting models are often compromised by data drift, where underlying data distributions change over time, leading to significant declines in prediction performance. To address this challenge, this study proposes an adaptive forecasting framework that integrates drift detection with targeted model retraining to compensate for drift effects. The framework utilizes the Pruned Exact Linear Time (PELT) algorithm to identify drift points within the feature space of time series data. Once drift intervals are detected, selective retraining is applied to prediction models using Multilayer Perceptron (MLP) and Lasso Regressor architectures, allowing the models to adjust to changing data patterns. The effectiveness of the proposed approach is demonstrated on two datasets: a real-world dataset containing electricity consumption and HVAC system data, and a synthetic financial dataset designed to test cross-domain applicability. Initial baseline models were developed without drift detection using extensive feature engineering. After integrating drift-aware retraining, the MLP model achieved a 44% reduction in mean absolute error (MAE) and a 39% increase in R^2 on the real-world dataset, while even greater improvements were observed on the synthetic financial dataset. Similar enhancements were achieved with the Lasso Regressor. These results highlight the robustness and generalizability of incorporating drift detection and adaptive retraining to sustain forecasting accuracy across diverse domains.
comment: 18 pages
☆ Considering the multi-time scale rolling optimization scheduling method of micro-energy network connected to electric vehicles
The large-scale access of electric vehicles to the power grid not only provides flexible adjustment resources for the power system, but the temporal uncertainty and distribution complexity of their energy interaction pose significant challenges to the economy and robustness of the micro-energy network. In this paper, we propose a multi-time scale rolling optimization scheduling method for micro-energy networks considering the access of electric vehicles. In order to solve the problem of evaluating the dispatchable potential of electric vehicle clusters, a charging station aggregation model was constructed based on Minkowski summation theory, and the scattered electric vehicle resources were aggregated into virtual energy storage units to participate in system scheduling. Integrate price-based and incentive-based demand response mechanisms to synergistically tap the potential of source-load two-side regulation; On this basis, a two-stage optimal scheduling model of day-ahead and intra-day is constructed. The simulation results show that the proposed method reduces the scale of "preventive curtailment" due to more accurate scheduling, avoids the threat of power shortage to the safety of the power grid, and has more advantages in the efficiency of new energy consumption. At the same time, intra-day scheduling significantly reduces economic penalties and operating costs by avoiding output shortages, and improves the economy of the system in an uncertain forecasting environment.
comment: 7 pages,9 figures,1 table,conference
☆ A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving
Vision-Language Models (VLMs) have demonstrated notable promise in autonomous driving by offering the potential for multimodal reasoning through pretraining on extensive image-text pairs. However, adapting these models from broad web-scale data to the safety-critical context of driving presents a significant challenge, commonly referred to as domain shift. Existing simulation-based and dataset-driven evaluation methods, although valuable, often fail to capture the full complexity of real-world scenarios and cannot easily accommodate repeatable closed-loop testing with flexible scenario manipulation. In this paper, we introduce a hierarchical real-world test platform specifically designed to evaluate VLM-integrated autonomous driving systems. Our approach includes a modular, low-latency on-vehicle middleware that allows seamless incorporation of various VLMs, a clearly separated perception-planning-control architecture that can accommodate both VLM-based and conventional modules, and a configurable suite of real-world testing scenarios on a closed track that facilitates controlled yet authentic evaluations. We demonstrate the effectiveness of the proposed platform`s testing and evaluation ability with a case study involving a VLM-enabled autonomous vehicle, highlighting how our test framework supports robust experimentation under diverse conditions.
☆ Extracting transient Koopman modes from short-term weather simulations with sparsity-promoting dynamic mode decomposition
Convective features-here represented as warm bubble-like patterns-reveal essential, high-level information about how short-term weather dynamics evolve within a high-dimensional state space. We introduce a data-driven framework that uncovers transient dynamics captured by Koopman modes responsible for these structures and traces their emergence, growth, and decay. Our approach incorporates the sparsity-promoting dynamic mode decomposition into the framework of Koopman mode decomposition, yielding a few number of selected modes whose sparse amplitudes highlight dominant transient structures. By tuning the sparsity weight, we balance reconstruction accuracy and model complexity. We illustrate the methodology on weather simulations, using the magnitude of velocity and vorticity fields as distinct observable datasets. The resulting sparse dominant Koopman modes capture the transient evolution of bubble-like pattern and can reduce the dimensionality of the weather system model, offering an efficient surrogate for diagnostic and forecasting tasks.
comment: 38 pages, 21 figures,
♻ ☆ Optimizing Battery and Line Undergrounding Investments for Transmission Systems under Wildfire Risk Scenarios: A Benders Decomposition Approach
With electric power infrastructure posing an increasing risk of igniting wildfires under continuing climate change, utilities are frequently de-energizing power lines to mitigate wildfire ignition risk, which can cause load shedding. Recent research advocates for installing battery energy storage systems as well as undergrounding risky overhead lines to reduce the load shedding during such de-energizations. Since wildfire ignition risk can exhibit substantial geographic and temporal variations, it is important to plan battery installation and line undergrounding investments while considering multiple possible scenarios. This paper presents a scenario-based framework for optimizing battery installation and line undergrounding investments while considering many scenarios, each consisting of a day-long time series of uncertain parameters for the load demand, renewable generation, and wildfire ignition risks. This problem is difficult to solve due to a large number of scenarios and binary variables associated with the battery placements as well as the lines to be undergrounded. To address the computational challenges, we decompose the problem in a two-stage scheme via a Benders decomposition approach. The first stage is a master problem formulated as a mixed integer linear programming (MILP) model that makes decisions on the locations and sizes of batteries as well as the lines to be undergrounded. The second stage consists of a linear programming model that assesses these battery and line undergrounding decisions as modeled by a DC OPF formulation. We demonstrate the effectiveness of the proposed scheme on a large-scale transmission network with real world data on wildfire ignition risks, load, and renewable generation.
♻ ☆ Mass-Adaptive Admittance Control for Robotic Manipulators
Handling objects with unknown or changing masses is a common challenge in robotics, often leading to errors or instability if the control system cannot adapt in real-time. In this paper, we present a novel approach that enables a six-degrees-of-freedom robotic manipulator to reliably follow waypoints while automatically estimating and compensating for unknown payload weight. Our method integrates an admittance control framework with a mass estimator, allowing the robot to dynamically update an excitation force to compensate for the payload mass. This strategy mitigates end-effector sagging and preserves stability when handling objects of unknown weights. We experimentally validated our approach in a challenging pick-and-place task on a shelf with a crossbar, improved accuracy in reaching waypoints and compliant motion compared to a baseline admittance-control scheme. By safely accommodating unknown payloads, our work enhances flexibility in robotic automation and represents a significant step forward in adaptive control for uncertain environments.
comment: 6 pages, 7 figures
♻ ☆ Toward Near-Globally Optimal Nonlinear Model Predictive Control via Diffusion Models
Achieving global optimality in nonlinear model predictive control (NMPC) is challenging due to the non-convex nature of the underlying optimization problem. Since commonly employed local optimization techniques depend on carefully chosen initial guesses, this non-convexity often leads to suboptimal performance resulting from local optima. To overcome this limitation, we propose a novel diffusion model-based approach for near-globally optimal NMPC consisting of an offline and an online phase. The offline phase employs a local optimizer to sample from the distribution of optimal NMPC control sequences along generated system trajectories through random initial guesses. Subsequently, the generated diverse dataset is used to train a diffusion model to reflect the multi-modal distribution of optima. In the online phase, the trained model is leveraged to efficiently perform a variant of random shooting optimization to obtain near-globally optimal control sequences without relying on any initial guesses or online NMPC solving. The effectiveness of our approach is illustrated in a numerical simulation indicating high performance benefits compared to direct neural network approximations of NMPC and significantly lower computation times than online solving NMPC using global optimizers.
comment: This paper has been accepted by the 2025 7th Annual Learning for Dynamics & Control Conference (L4DC) as an oral presentation and has been nominated for the best paper award
♻ ☆ Sequential Change Detection for Learning in Piecewise Stationary Bandit Environments
A finite-horizon variant of the quickest change detection problem is investigated, which is motivated by a change detection problem that arises in piecewise stationary bandits. The goal is to minimize the \emph{latency}, which is smallest threshold such that the probability that the detection delay exceeds the threshold is below a desired low level, while controlling the false alarm probability to a desired low level. When the pre- and post-change distributions are unknown, two tests are proposed as candidate solutions. These tests are shown to attain order optimality in terms of the horizon. Furthermore, the growth in their latencies with respect to the false alarm probability and late detection probability satisfies a property that is desirable in regret analysis for piecewise stationary bandits. Numerical results are provided to validate the theoretical performance results.
comment: 15 pages, 2 figures. arXiv admin note: text overlap with arXiv:2501.01291
♻ ☆ Fast Switching in Mixed-Integer Model Predictive Control
We derive stability results for finite control set and mixed-integer model predictive control with a downstream oversampling phase. The presentation rests upon the inherent robustness of model predictive control with stabilizing terminal conditions and techniques for solving mixed-integer optimal control problems by continuous optimization. Partial outer convexification and binary relaxation transform mixed-integer problems into common optimal control problems. We deduce nominal asymptotic stability for the resulting relaxed system formulation and implement sum-up rounding to restore efficiently integer feasibility on an oversampling time grid. If fast control switching is technically possible and inexpensive, we can approximate the relaxed system behavior in the state space arbitrarily close. We integrate input perturbed model predictive control with practical asymptotic stability. Numerical experiments illustrate practical relevance of fast control switching.
comment: This preprint was revised based on the feedback from the reviewers and resubmitted to the IEEE
♻ ☆ Exact Characterization of Aggregate Flexibility via Generalized Polymatroids
It is well established that the aggregate flexibility inherent in populations of distributed energy resources (DERs) can be leveraged to mitigate the intermittency and uncertainty associated with renewable generation, while also providing ancillary grid services. To enable this, aggregators must effectively represent the flexibility in the populations they control to the market or system operator. A key challenge is accurately computing the aggregate flexibility of a population, which can be formally expressed as the Minkowski sum of a collection of polytopes, a problem that is generally computationally intractable. However, the flexibility polytopes of many DERs exhibit structural symmetries that can be exploited for computational efficiency. To this end, we introduce generalized polymatroids, a family of polytopes, into the flexibility aggregation literature. We demonstrate that individual flexibility sets belong to this family, enabling efficient computation of their exact Minkowski sum. For homogeneous populations of DERs we further derive simplifications that yield more succinct representations of aggregate flexibility. Additionally, we develop an efficient optimization framework over these sets and propose a vertex-based disaggregation method, to allocate aggregate flexibility among individual DERs. Finally, we validate the optimality and computational efficiency of our approach through comparisons with existing methods.
♻ ☆ Fast Contact Detection via Fusion of Joint and Inertial Sensors for Parallel Robots in Human-Robot Collaboration
Fast contact detection is crucial for safe human-robot collaboration. Observers based on proprioceptive information can be used for contact detection but have first-order error dynamics, which results in delays. Sensor fusion based on inertial measurement units (IMUs) consisting of accelerometers and gyroscopes is advantageous for reducing delays. The acceleration estimation enables the direct calculation of external forces. For serial robots, the installation of multiple accelerometers and gyroscopes is required for dynamics modeling since the joint coordinates are the minimal coordinates. Alternatively, parallel robots (PRs) offer the potential to use only one IMU on the end-effector platform, which already presents the minimal coordinates of the PR. This work introduces a sensor-fusion method for contact detection using encoders and only one low-cost, consumer-grade IMU for a PR. The end-effector accelerations are estimated by an extended Kalman filter and incorporated into the dynamics to calculate external forces. In real-world experiments with a planar PR, we demonstrate that this approach reduces the detection duration by up to 50% compared to a momentum observer and enables the collision and clamping detection within 3-39ms.
comment: Preprint of a publication accepted for IEEE Robotics and Automation Letters
♻ ☆ Stability of the Theta Method for Systems with Multiple Time-Delayed Variables
The paper focuses on the numerical stability and accuracy of implicit time-domain integration (TDI) methods when applied for the solution of a power system model impacted by time delays. Such a model is generally formulated as a set of delay differential algebraic equations (DDAEs) in non index-1 Hessenberg form. In particular, the paper shows that numerically stable ordinary differential equation (ODE) methods, such as the trapezoidal and the Theta method, can become unstable when applied to a power system that includes a significant number of delayed variables. Numerical stability is discussed through a scalar test delay differential equation, as well as through a matrix pencil approach that accounts for the DDAEs of any given dynamic power system model. Simulation results are presented in a case study based on the IEEE 39-bus system.
comment: 10 pages
♻ ☆ System Identification Beyond the Nyquist Frequency: A Kernel-Regularized Approach
Models that contain intersample behavior are important for control design of systems with slow-rate outputs. The aim of this paper is to develop a system identification technique for fast-rate models of systems where only slow-rate output measurements are available, e.g., vision-in-the-loop systems. In this paper, the intersample response is estimated by identifying fast-rate models through least-squares criteria, and the limitations of these models are determined. In addition, a method is developed that surpasses these limitations and is capable of estimating unique fast-rate models of arbitrary order by regularizing the least-squares estimate. The developed method utilizes fast-rate inputs and slow-rate outputs and identifies fast-rate models accurately in a single identification experiment. Finally, both simulation and experimental validation on a prototype wafer stage demonstrate the effectiveness of the framework.
♻ ☆ Matrix Pencil-Based Analysis of Multirate Simulation Schemes
This paper focuses on multirate time-domain simulations of power system models. It proposes a matrix pencil-based approach to evaluate the spurious numerical deformation introduced into power system dynamics by a given multirate integration scheme. Moreover, it considers the problem of multirate partitioning and discusses a strategy for allocating state and algebraic variables to fast and slow subsystems based on modal participation factors (PFs). The suitability and features of the proposed approach are illustrated through numerical simulations that assess the accuracy effects of interfacing, as well as of various prediction and solution methods.
♻ ☆ From Data to Control: A Formal Compositional Framework for Large-Scale Interconnected Networks
We introduce a compositional data-driven methodology with noisy data for designing fully-decentralized safety controllers applicable to large-scale interconnected networks, encompassing a vast number of subsystems with unknown mathematical models. Our compositional scheme leverages the interconnection topology and breaks down the network analysis into the examination of distinct subsystems. This is accompanied by utilizing a concept of control storage certificates (CSCs) to capture joint dissipativity-type properties among subsystems. These CSCs are instrumental in a compositional derivation of a control barrier certificate (CBC) specialized for the interconnected network, thereby ensuring its safety. In our data-driven scheme, we gather only a single noise-corrupted input-state trajectory from each unknown subsystem within a specified time frame. By fulfilling a specific rank condition, this process facilitates the construction of a CSC for each subsystem. Following this, by adhering to compositional dissipativity reasoning, we compose CSCs derived from noisy data and build a CBC for the unknown network, ensuring its safety over an infinite time horizon, while providing correctness guarantees. We demonstrate that our compositional data-driven approach significantly enhances the design of a CBC and its robust safety controller under noisy data across the interconnected network. This advancement is achieved by reducing the computational complexity from a polynomial growth in relation to network dimension, when using sum-of-squares (SOS) optimization, to a linear scale based on the number of subsystems. We apply our data-driven findings to a variety of benchmarks, involving physical networks with unknown models and diverse interconnection topologies.
♻ ☆ From Ground to Sky: Architectures, Applications, and Challenges Shaping Low-Altitude Wireless Networks
In this article, we introduce a novel low-altitude wireless network (LAWN), which is a reconfigurable, three-dimensional (3D) layered architecture. In particular, the LAWN integrates connectivity, sensing, control, and computing across aerial and terrestrial nodes that enable seamless operation in complex, dynamic, and mission-critical environments. Different from the conventional aerial communication systems, LAWN's distinctive feature is its tight integration of functional planes in which multiple functionalities continually reshape themselves to operate safely and efficiently in the low-altitude sky. With the LAWN, we discuss several enabling technologies, such as integrated sensing and communication (ISAC), semantic communication, and fully-actuated control systems. Finally, we identify potential applications and key cross-layer challenges. This article offers a comprehensive roadmap for future research and development in the low-altitude airspace.
comment: 10 pages, 5 figures
Optimization and Control 23
☆ Discrete time shadow price revisited
In the paper discrete time shadow price is constructed for the market with several assets with given bid and ask prices. Shadow price is the price such that the problem of optimal utility from terminal wealth on the market without transaction costs gives the same value function as in the case of bid and ask prices. In the paper we solve first static problem for two assets and then we construct the shadow price for dynamic model. Finally we present construction of the shadow price for several assets.
☆ Robust Hedging of American Options via Aggregated Snell Envelopes
We construct an aggregator for a family of Snell envelopes in a nondominated framework. We apply this construction to establish a robust hedging duality, along with the existence of a minimal hedging strategy, in a general semi-martingale setting for American-style options. Our results encompass continuous processes, or processes with jumps and non-vanishing diffusion. A key application is to financial market models, where uncertainty is quantified through the semi-martingale characteristics.
comment: 22 pages
☆ Network-Independent Incremental Passivity Conditions for Grid-Forming Inverter Control
Grid-forming inverters control the power transfer between the AC and DC sides of an electrical grid while maintaining the frequency and voltage of the AC side. This paper focuses on ensuring large-signal stability of an electrical grid with inverter-interfaced renewable sources. We prove that the Hybrid-Angle Control (HAC) scheme for grid-forming inverters can exhibit incremental passivity properties between current and voltage at both the AC and DC ports. This incremental passivity can be certified through decentralized conditions. Inverters operating under HAC can, therefore, be connected to other passive elements (e.g. transmission lines) with an immediate guarantee of global transient stability regardless of the network topology or parameters. Passivity of Hybrid Angle Control is also preserved under small-signal (linearized) analyses, in contrast to conventional proportional droop laws that are passivity-short at low frequencies. Passivity and interconnected-stability properties are demonstrated through an example case study.
comment: 7 pages, 4 figures
☆ Collaborative Charging Scheduling via Balanced Bounding Box Methods
Electric mobility faces several challenges, most notably the high cost of infrastructure development and the underutilization of charging stations. The concept of shared charging offers a promising solution. The paper explores sustainable urban logistics through horizontal collaboration between two fleet operators and addresses a scheduling problem for the shared use of charging stations. To tackle this, the study formulates a collaborative scheduling problem as a bi-objective nonlinear integer programming model, in which each company aims to minimize its own costs, creating inherent conflicts that require trade-offs. The Balanced Bounding Box Methods (B3Ms) are introduced in order to efficiently derive the efficient frontier, identifying a reduced set of representative solutions. These methods enhance computational efficiency by selectively disregarding closely positioned and competing solutions, preserving the diversity and representativeness of the solutions over the efficient frontier. To determine the final solution and ensure balanced collaboration, cooperative bargaining methods are applied. Numerical case studies demonstrate the viability and scalability of the developed methods, showing that the B3Ms can significantly reduce computational time while maintaining the integrity of the frontier. These methods, along with cooperative bargaining, provide an effective framework for solving various bi-objective optimization problems, extending beyond the collaborative scheduling problem presented here.
☆ Projected integral control of impedance passive nonlinear systems
We propose an abstract framework for solving the constrained set-point tracking problem for impedance passive infinite-dimensional nonlinear systems. The class of systems considered is governed by monotone differential inclusions and allows us to exploit the theory of contraction semigroups. To account for possible operational constraints, e.g., bounds on the input, we replace a classical integral controller with a projected integral controller. This guarantees that the integrator state remains in a given closed convex set, where said constraints are satisfied. We showcase our results through three case studies.
☆ Towards Robust Learning to Optimize with Theoretical Guarantees CVPR 2024
Learning to optimize (L2O) is an emerging technique to solve mathematical optimization problems with learning-based methods. Although with great success in many real-world scenarios such as wireless communications, computer networks, and electronic design, existing L2O works lack theoretical demonstration of their performance and robustness in out-of-distribution (OOD) scenarios. We address this gap by providing comprehensive proofs. First, we prove a sufficient condition for a robust L2O model with homogeneous convergence rates over all In-Distribution (InD) instances. We assume an L2O model achieves robustness for an InD scenario. Based on our proposed methodology of aligning OOD problems to InD problems, we also demonstrate that the L2O model's convergence rate in OOD scenarios will deteriorate by an equation of the L2O model's input features. Moreover, we propose an L2O model with a concise gradient-only feature construction and a novel gradient-based history modeling method. Numerical simulation demonstrates that our proposed model outperforms the state-of-the-art baseline in both InD and OOD scenarios and achieves up to 10 $\times$ convergence speedup. The code of our method can be found from https://github.com/NetX-lab/GoMathL2O-Official.
comment: Published in CVPR 2024, 55 pages, 17 figures, this version fixed some typo
☆ Global-in-time optimal control of stochastic third-grade fluids
In this article, we address the velocity tracking control problem for a class of stochastic non-Newtonian fluids. More precisely, we consider the stochastic third-grade fluid equation perturbed by infinite-dimensional additive white noise and defined on the two-dimensional torus $\mathbb{T}^2$. The control acts as a distributed random external force. Taking an \emph{infinite-dimensional Ornstein-Uhlenbeck process}, the stochastic system is converted into an equivalent pathwise deterministic one, which allows to show the well-posedness of the original stochastic system globally in time. The state being a stochastic process with sample paths in $\mathrm{L}^\infty(0,T;\mathbb{H}^3(\mathbb{T}^2))$ and finite moments can be controlled in an optimal way. Namely, we establish the existence and uniqueness of solutions to the corresponding linearized state and adjoint equations. Furthermore, we derive an appropriate stability result for the state equation and verify that the G\^ateaux derivative of the control-to-state mapping coincides with the solution of the linearized state equation. Finally, we establish the first-order optimality conditions and prove the existence of an optimal solution.
♻ ☆ Optimal Transport for Probabilistic Circuits
We introduce a novel optimal transport framework for probabilistic circuits (PCs). While it has been shown recently that divergences between distributions represented as certain classes of PCs can be computed tractably, to the best of our knowledge, there is no existing approach to compute the Wasserstein distance between probability distributions given by PCs. We propose a Wasserstein-type distance that restricts the coupling measure of the associated optimal transport problem to be a probabilistic circuit. We then develop an algorithm for computing this distance by solving a series of small linear programs and derive the circuit conditions under which this is tractable. Furthermore, we show that we can easily retrieve the optimal transport plan between the PCs from the solutions to these linear programs. Lastly, we study the empirical Wasserstein distance between a PC and a dataset, and show that we can estimate the PC parameters to minimize this distance through an efficient iterative algorithm.
♻ ☆ A Convex-Nonconvex Framework for Enhancing Minimization Induced Penalties
This paper presents a novel framework for nonconvex enhancement of minimization induced (MI) penalties while preserving the overall convexity of associated regularization models. MI penalties enable the adaptation to certain signal structures via minimization, but often underestimate significant components owing to convexity. To overcome this shortcoming, we design a generalized Moreau enhanced minimization induced (GME-MI) penalty by subtracting from the MI penalty its generalized Moreau envelope. While the proposed GME-MI penalty is nonconvex in general, we derive an overall convexity condition for the GME-MI regularized least-squares model. Moreover, we present a proximal splitting algorithm with guaranteed convergence to a globally optimal solution of the GME-MI model under the overall convexity condition. Numerical examples illustrate the effectiveness of the proposed framework.
♻ ☆ Stochastic Graphon Games with Jumps and Approximate Nash Equilibria
We study continuous stochastic games with heterogeneous mean field interactions and jumps on large networks and explore their limit counterparts. We introduce the graphon game model based on a controlled graphon mean field stochastic differential equation system with jumps, which can be regarded as the limiting case of a finite game dynamic system as the number of players goes to infinity. We examine the case of controlled dynamics, with control terms present in the drift, diffusion, and jump components. We focus on the study of Markovian controls and concentrate on the limit theory. We provide convergence results on the state trajectories and their laws, transitioning from finite game systems to graphon systems. We also study approximate equilibria for finite games on large networks, using the graphon equilibrium as a benchmark. The rates of convergence are analyzed under various underlying graphon models and regularity assumptions.
comment: 32 pages
♻ ☆ Fast Switching in Mixed-Integer Model Predictive Control
We derive stability results for finite control set and mixed-integer model predictive control with a downstream oversampling phase. The presentation rests upon the inherent robustness of model predictive control with stabilizing terminal conditions and techniques for solving mixed-integer optimal control problems by continuous optimization. Partial outer convexification and binary relaxation transform mixed-integer problems into common optimal control problems. We deduce nominal asymptotic stability for the resulting relaxed system formulation and implement sum-up rounding to restore efficiently integer feasibility on an oversampling time grid. If fast control switching is technically possible and inexpensive, we can approximate the relaxed system behavior in the state space arbitrarily close. We integrate input perturbed model predictive control with practical asymptotic stability. Numerical experiments illustrate practical relevance of fast control switching.
comment: This preprint was revised based on the feedback from the reviewers and resubmitted to the IEEE
♻ ☆ Five Starter Problems: Solving Quadratic Unconstrained Binary Optimization Models on Quantum Computers
This tutorial offers a quick, hands-on introduction to solving Quadratic Unconstrained Binary Optimization (QUBO) models on currently available quantum computers and their simulators. We cover both IBM and D-Wave machines: IBM utilizes a gate-circuit architecture, and D-Wave is a quantum annealer. We provide examples of three canonical problems and two models from practical applications. The tutorial is structured to bridge the gap between theory and practice: we begin with an overview of QUBOs, explain their relevance and connection to quantum algorithms, introduce key quantum computing concepts, provide the foundations for two quantum heuristics, and provide detailed implementation guides. An associated GitHub repository provides the codes in five companion notebooks. In addition to reaching undergraduate and graduate students in computationally intensive disciplines, this article aims to reach working industry professionals seeking to explore the potential of near-term quantum applications. As our title indicates, this tutorial is intended to be a starting point in a journey towards solving more complex QUBOs on quantum computers.
comment: 47 pages, 13 figures, 2 tables
♻ ☆ Low-Rank Thinning
The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
♻ ☆ Distributionally robust Kalman filtering with volatility uncertainty
This work presents a distributionally robust Kalman filter to address uncertainties in noise covariance matrices and predicted covariance estimates. We adopt a distributionally robust formulation using bicausal optimal transport to characterize a set of plausible alternative models. The optimization problem is transformed into a convex nonlinear semi-definite programming problem and solved using the trust-region interior point method with the aid of $LDL^\top$ decomposition. The empirical outperformance is demonstrated through target tracking and pairs trading.
comment: Final version
♻ ☆ Perturbation analysis of a class of composite optimization problems
In this paper, we study the perturbation analysis of a class of composite optimization problems, which is a very convenient and unified framework for developing both theoretical and algorithmic issues of constrained optimization problems. The underlying theme of this paper is very important in both theoretical and computational study of optimization problems. Under some mild assumptions on the objective function, we provide a definition of a strong second order sufficient condition (SSOSC) for the composite optimization problem and also prove that the following conditions are equivalent to each other: the SSOSC and the nondegeneracy condition, the nonsingularity of Clarke's generalized Jacobian of the nonsmooth system at a Karush-Kuhn-Tucker (KKT) point, and the strong regularity of the KKT point. These results provide an important way to characterize the stability of the KKT point. As for the convex composite optimization problem, which is a special case of the general problem, we establish the equivalence between the primal/dual second order sufficient condition and the dual/primal strict Robinson constraint qualification, the equivalence between the primal/dual SSOSC and the dual/primal nondegeneracy condition. Moreover, we prove that the dual nondegeneracy condition and the nonsingularity of Clarke's generalized Jacobian of the subproblem corresponding to the augmented Lagrangian method are also equivalent to each other. These theoretical results lay solid foundation for designing an efficient algorithm.
comment: 41 pages
♻ ☆ Learning Spatially Adaptive $\ell_1$-Norms Weights for Convolutional Synthesis Regularization
We propose an unrolled algorithm approach for learning spatially adaptive parameter maps in the framework of convolutional synthesis-based $\ell_1$ regularization. More precisely, we consider a family of pre-trained convolutional filters and estimate deeply parametrized spatially varying parameters applied to the sparse feature maps by means of unrolling a FISTA algorithm to solve the underlying sparse estimation problem. The proposed approach is evaluated for image reconstruction of low-field MRI and compared to spatially adaptive and non-adaptive analysis-type procedures relying on Total Variation regularization and to a well-established model-based deep learning approach. We show that the proposed approach produces visually and quantitatively comparable results with the latter approaches and at the same time remains highly interpretable. In particular, the inferred parameter maps quantify the local contribution of each filter in the reconstruction, which provides valuable insight into the algorithm mechanism and could potentially be used to discard unsuited filters.
comment: Accepted for publication in the proceedings of the EUSIPCO 2025 conference
♻ ☆ Accelerating RLHF Training with Reward Variance Increase
Reinforcement learning from human feedback (RLHF) is an essential technique for ensuring that large language models (LLMs) are aligned with human values and preferences during the post-training phase. As an effective RLHF approach, group relative policy optimization (GRPO) has demonstrated success in many LLM-based applications. However, efficient GRPO-based RLHF training remains a challenge. Recent studies reveal that a higher reward variance of the initial policy model leads to faster RLHF training. Inspired by this finding, we propose a practical reward adjustment model to accelerate RLHF training by provably increasing the reward variance and preserving the relative preferences and reward expectation. Our reward adjustment method inherently poses a nonconvex optimization problem, which is NP-hard to solve in general. To overcome the computational challenges, we design a novel $O(n \log n)$ algorithm to find a global solution of the nonconvex reward adjustment model by explicitly characterizing the extreme points of the feasible set. As an important application, we naturally integrate this reward adjustment model into the GRPO algorithm, leading to a more efficient GRPO with reward variance increase (GRPOVI) algorithm for RLHF training. As an interesting byproduct, we provide an indirect explanation for the empirical effectiveness of GRPO with rule-based reward for RLHF training, as demonstrated in DeepSeek-R1. Experiment results demonstrate that the GRPOVI algorithm can significantly improve the RLHF training efficiency compared to the original GRPO algorithm.
♻ ☆ Does DQN Learn?
A primary requirement for any reinforcement learning method is that it should produce policies that improve upon the initial guess. In this work, we show that the widely used Deep Q-Network (DQN) fails to satisfy this minimal criterion -- even when it gets to see all possible states and actions infinitely often (a condition under which tabular Q-learning is guaranteed to converge to the optimal Q-value function). Our specific contributions are twofold. First, we numerically show that DQN often returns a policy that performs worse than the initial one. Second, we offer a theoretical explanation for this phenomenon in linear DQN, a simplified version of DQN that uses linear function approximation in place of neural networks while retaining the other key components such as $\epsilon$-greedy exploration, experience replay, and target network. Using tools from differential inclusion theory, we prove that the limit points of linear DQN correspond to fixed points of projected Bellman operators. Crucially, we show that these fixed points need not relate to optimal -- or even near-optimal -- policies, thus explaining linear DQN's sub-optimal behaviors. We also give a scenario where linear DQN always identifies the worst policy. Our work fills a longstanding gap in understanding the convergence behaviors of Q-learning with function approximation and $\epsilon$-greedy exploration.
comment: 24 pages, 3 figures
♻ ☆ Revealed Information
An analyst observes the frequency with which a decision maker (DM) takes actions, but not the frequency conditional on payoff-relevant states. We ask when the analyst can rationalize the DM's choices as if the DM first learns something about the state before acting. We provide a support-function characterization of the triples of utility functions, prior beliefs, and (marginal) distributions over actions such that the DM's action distribution is consistent with information given the DM's prior and utility function. Assumptions on the cardinality of the state space and the utility function allow us to refine this characterization, obtaining a sharp system of finitely many inequalities the utility function, prior, and action distribution must satisfy. We apply our characterization to study comparative statics and to identify conditions under which a single information structure rationalizes choices across multiple decision problems. We characterize the set of distributions over posterior beliefs that are consistent with the DM's choices. We extend our results to settings with a continuum of actions and states assuming the first-order approach applies, and to simple multi-agent settings.
♻ ☆ Stochastic Bregman Proximal Gradient Method Revisited: Kernel Conditioning and Painless Variance Reduction
We investigate Bregman proximal gradient (BPG) methods for solving nonconvex composite stochastic optimization problems. Instead of the standard gradient Lipschitz continuity (GLC) assumption, the objective function only satisfies a smooth-adaptability assumption w.r.t. some kernel function. An in-depth analysis of the stationarity measure is made in this paper, where we reveal an interesting fact that the widely adopted Bregman proximal gradient mapping in the existing works may not correctly depict the near stationarity of the solutions. To resolve this issue, a new Bregman proximal gradient mapping is proposed and analyzed in this paper. Second, a thorough analysis is made on the sample complexities of the stochastic Bregman proximal gradient methods under both the old and the newly proposed gradient mappings are analyzed. Note that the absence of GLC disables the standard analysis of the stochastic variance reduction techniques, existing stochastic BPG methods only obtain an $O(\epsilon^{-2})$ sample complexity under the old gradient mapping, we show that such a limitation in the existing analyses mainly comes from the insufficient exploitation of the kernel's properties. By proposing a new kernel-conditioning regularity assumption on the kernel, we show that a simple epoch bound mechanism is enough to enable all the existing variance reduction techniques for stochastic BPG methods. Combined with a novel probabilistic argument, we show that there is a high probability event conditioning on which $O(\sqrt{n}\epsilon^{-1})$ sample complexity for the finite-sum stochastic setting can be derived for the old gradient mapping. Moreover, with a novel adaptive step size control mechanism, we also show an $\tilde{O}(\sqrt{n}L_h(\mathcal{X}_\epsilon)\epsilon^{-1})$ complexity under the new gradient mapping.
♻ ☆ Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems
Despite being a key bottleneck in many machine learning tasks, the cost of solving large linear systems has proven challenging to quantify due to problem-dependent quantities such as condition numbers. To tackle this, we consider a fine-grained notion of complexity for solving linear systems, which is motivated by applications where the data exhibits low-dimensional structure, including spiked covariance models and kernel machines, and when the linear system is explicitly regularized, such as ridge regression. Concretely, let $\kappa_\ell$ be the ratio between the $\ell$th largest and the smallest singular value of $n\times n$ matrix $A$. We give a stochastic algorithm based on the Sketch-and-Project paradigm, that solves the linear system $Ax = b$, that is, finds $\bar{x}$ such that $\|A\bar{x} - b\| \le \epsilon \|b\|$, in time $\bar O(\kappa_\ell\cdot n^2\log 1/\epsilon)$, for any $\ell = O(n^{0.729})$. This is a direct improvement over preconditioned conjugate gradient, and it provides a stronger separation between stochastic linear solvers and algorithms accessing $A$ only through matrix-vector products. Our main technical contribution is the new analysis of the first and second moments of the random projection matrix that arises in Sketch-and-Project.
♻ ☆ Parallelized Conflict Graph Cut Generation
A conflict graph represents logical relations between binary variables, and effective use of the graph can significantly accelerate branch-and-cut solvers for mixed-integer programming (MIP). In this paper we develop efficient parallel conflict graph management: conflict detection; maximal clique generation; clique extension; and clique merging. We leverage parallel computing in order to intensify computational effort on the conflict graph, thereby generating a much larger pool of cutting planes than what can be practically achieved in serial. Computational experiments demonstrate that the expanded pool of cuts enabled by parallel computing lead to substantial reductions in total MIP solve time, especially for more challenging cases.
comment: 25 pages, 3 figures
♻ ☆ Faster Acceleration for Steepest Descent
Recent advances (Sherman, 2017; Sidford and Tian, 2018; Cohen et al., 2021) have overcome the fundamental barrier of dimension dependence in the iteration complexity of solving $\ell_\infty$ regression with first-order methods. Yet it remains unclear to what extent such acceleration can be achieved for general $\ell_p$ smooth functions. In this paper, we propose a new accelerated first-order method for convex optimization under non-Euclidean smoothness assumptions. In contrast to standard acceleration techniques, our approach uses primal-dual iterate sequences taken with respect to $\textit{differing}$ norms, which are then coupled using an $\textit{implicitly}$ determined interpolation parameter. For $\ell_p$ norm smooth problems in $d$ dimensions, our method provides an iteration complexity improvement of up to $O(d^{1-\frac{2}{p}})$ in terms of calls to a first-order oracle, thereby allowing us to circumvent long-standing barriers in accelerated non-Euclidean steepest descent.
comment: Published in The 38th Annual Conference on Learning Theory (COLT 2025)
Systems and Control 30
☆ A Stochastic Differential Equation Framework for Modeling Queue Length Dynamics Inspired by Self-Similarity
This article develops a stochastic differential equation (SDE) for modeling the temporal evolution of queue length dynamics at signalized intersections. Inspired by the observed quasiperiodic and self-similar characteristics of the queue length dynamics, the proposed model incorporates three properties into the SDE: (i) mean reversion with periodic mean, (ii) multiplicative noise, and (iii) fractional Brownian motion. It replicates key statistical features observed in real data, including the probability distribution function (PDF) and PSD of queue lengths. To our knowledge, this is the first equation-based model for queue dynamics. The proposed approach offers a transparent, data-consistent framework that may help inform and enhance the design of black-box learning algorithms with underlying traffic physics.
☆ Implicit Constraint-Aware Off-Policy Correction for Offline Reinforcement Learning
Offline reinforcement learning promises policy improvement from logged interaction data alone, yet state-of-the-art algorithms remain vulnerable to value over-estimation and to violations of domain knowledge such as monotonicity or smoothness. We introduce implicit constraint-aware off-policy correction, a framework that embeds structural priors directly inside every Bellman update. The key idea is to compose the optimal Bellman operator with a proximal projection on a convex constraint set, which produces a new operator that (i) remains a $\gamma$-contraction, (ii) possesses a unique fixed point, and (iii) enforces the prescribed structure exactly. A differentiable optimization layer solves the projection; implicit differentiation supplies gradients for deep function approximators at a cost comparable to implicit Q-learning. On a synthetic Bid-Click auction -- where the true value is provably monotone in the bid -- our method eliminates all monotonicity violations and outperforms conservative Q-learning and implicit Q-learning in return, regret, and sample efficiency.
☆ Quadrotor Morpho-Transition: Learning vs Model-Based Control Strategies
Quadrotor Morpho-Transition, or the act of transitioning from air to ground through mid-air transformation, involves complex aerodynamic interactions and a need to operate near actuator saturation, complicating controller design. In recent work, morpho-transition has been studied from a model-based control perspective, but these approaches remain limited due to unmodeled dynamics and the requirement for planning through contacts. Here, we train an end-to-end Reinforcement Learning (RL) controller to learn a morpho-transition policy and demonstrate successful transfer to hardware. We find that the RL control policy achieves agile landing, but only transfers to hardware if motor dynamics and observation delays are taken into account. On the other hand, a baseline MPC controller transfers out-of-the-box without knowledge of the actuator dynamics and delays, at the cost of reduced recovery from disturbances in the event of unknown actuator failures. Our work opens the way for more robust control of agile in-flight quadrotor maneuvers that require mid-air transformation.
☆ Safe Domains of Attraction for Discrete-Time Nonlinear Systems: Characterization and Verifiable Neural Network Estimation
Analysis of nonlinear autonomous systems typically involves estimating domains of attraction, which have been a topic of extensive research interest for decades. Despite that, accurately estimating domains of attraction for nonlinear systems remains a challenging task, where existing methods are conservative or limited to low-dimensional systems. The estimation becomes even more challenging when accounting for state constraints. In this work, we propose a framework to accurately estimate safe (state-constrained) domains of attraction for discrete-time autonomous nonlinear systems. In establishing this framework, we first derive a new Zubov equation, whose solution corresponds to the exact safe domain of attraction. The solution to the aforementioned Zubov equation is shown to be unique and continuous over the whole state space. We then present a physics-informed approach to approximating the solution of the Zubov equation using neural networks. To obtain certifiable estimates of the domain of attraction from the neural network approximate solutions, we propose a verification framework that can be implemented using standard verification tools (e.g., $\alpha,\!\beta$-CROWN and dReal). To illustrate its effectiveness, we demonstrate our approach through numerical examples concerning nonlinear systems with state constraints.
☆ Observer Switching Strategy for Enhanced State Estimation in CSTR Networks
Accurate state estimation is essential for monitoring and controlling nonlinear chemical reactors, such as continuous stirred-tank reactors (CSTRs), where limited sensor coverage and process uncertainties hinder real-time observability. This paper introduces a novel multi-observer switching framework that combines several advanced estimators: Extended Luenberger Observer (ELO), Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF), Quadrature Kalman Filter (QKF), and Particle Filter (PF), operating in parallel. At each sampling instant, a cost function based on the $L_1$ norm and Kullback-Leibler divergence selects the observer, yielding the best agreement with available measurements. The proposed architecture is validated through simulations of both linearized and fully nonlinear CSTR models with up to three reactors in series. Results show that the switching strategy significantly reduces estimation errors compared to single-observer approaches, especially under partial observability and parametric uncertainty. Despite its modular design, the framework remains computationally tractable, making it suitable for real-time industrial applications such as fault detection and model-based predictive control.
comment: This work has been submitted to the ISA Transactions for possible publication. 18 pages, 10 Figures
☆ Deceptive Path Planning: A Bayesian Game Approach
This paper investigates how an autonomous agent can transmit information through its motion in an adversarial setting. We consider scenarios where an agent must reach its goal while deceiving an intelligent observer about its destination. We model this interaction as a dynamic Bayesian game between a mobile Attacker with a privately known goal and a Defender who infers the Attacker's intent to allocate defensive resources effectively. We use Perfect Bayesian Nash Equilibrium (PBNE) as our solution concept and propose a computationally efficient approach to find it. In the resulting equilibrium, the Defender employs a simple Markovian strategy, while the Attacker strategically balances deception and goal efficiency by stochastically mixing shortest and non-shortest paths to manipulate the Defender's beliefs. Numerical experiments demonstrate the advantages of our PBNE-based strategies over existing methods based on one-sided optimization.
comment: 8 pages, 9 figures. This work has been submitted to the IEEE for possible publication
☆ Parallel Branch Model Predictive Control on GPUs
We present a parallel GPU-accelerated solver for branch Model Predictive Control problems. Based on iterative LQR methods, our solver exploits the tree-sparse structure and implements temporal parallelism using the parallel scan algorithm. Consequently, the proposed solver enables parallelism across both the prediction horizon and the scenarios. In addition, we utilize an augmented Lagrangian method to handle general inequality constraints. We compare our solver with state-of-the-art numerical solvers in two automated driving applications. The numerical results demonstrate that, compared to CPU-based solvers, our solver achieves competitive performance for problems with short horizons and small-scale trees, while outperforming other solvers on large-scale problems.
comment: 12 pages, 9 figures
☆ A Hybrid Artificial Intelligence Method for Estimating Flicker in Power Systems
This paper introduces a novel hybrid AI method combining H filtering and an adaptive linear neuron network for flicker component estimation in power distribution systems.The proposed method leverages the robustness of the H filter to extract the voltage envelope under uncertain and noisy conditions followed by the use of ADALINE to accurately identify flicker frequencies embedded in the envelope.This synergy enables efficient time domain estimation with rapid convergence and noise resilience addressing key limitations of existing frequency domain approaches.Unlike conventional techniques this hybrid AI model handles complex power disturbances without prior knowledge of noise characteristics or extensive training.To validate the method performance we conduct simulation studies based on IEC Standard 61000 4 15 supported by statistical analysis Monte Carlo simulations and real world data.Results demonstrate superior accuracy robustness and reduced computational load compared to Fast Fourier Transform and Discrete Wavelet Transform based estimators.
comment: 31 pages, 12 figures, and 6 tables
☆ BattBee: Equivalent Circuit Modeling and Early Detection of Thermal Runaway Triggered by Internal Short Circuits for Lithium-Ion Batteries
Lithium-ion batteries are the enabling power source for transportation electrification. However, in real-world applications, they remain vulnerable to internal short circuits (ISCs) and the consequential risk of thermal runaway (TR). Toward addressing the challenge of ISCs and TR, we undertake a systematic study that extends from dynamic modeling to fault detection in this paper. First, we develop {\em BattBee}, the first equivalent circuit model to specifically describe the onset of ISCs and the evolution of subsequently induced TR. Drawing upon electrochemical modeling, the model can simulate ISCs at different severity levels and predict their impact on the initiation and progression of TR events. With the physics-inspired design, this model offers strong physical interpretability and predictive accuracy, while maintaining structural simplicity to allow fast computation. Then, building upon the BattBee model, we develop fault detection observers and derive detection criteria together with decision-making logics to identify the occurrence and emergence of ISC and TR events. This detection approach is principled in design and fast in computation, lending itself to practical applications. Validation based on simulations and experimental data demonstrates the effectiveness of both the BattBee model and the ISC/TR detection approach. The research outcomes underscore this study's potential for real-world battery safety risk management.
comment: 19 pages, 15 figures, 2 tables
☆ Hybrid Polynomial Zonotopes: A Set Representation for Reachability Analysis in Hybrid Nonaffine Systems
Reachability analysis for hybrid nonaffine systems remains computationally challenging, as existing set representations--including constrained, polynomial, and hybrid zonotopes--either lose tightness under high-order nonaffine maps or suffer exponential blow-up after discrete jumps. This paper introduces Hybrid Polynomial Zonotope (HPZ), a novel set representation that combines the mode-dependent generator structure of hybrid zonotopes with the algebraic expressiveness of polynomial zonotopes. HPZs compactly encode non-convex reachable states across modes by attaching polynomial exponents to each hybrid generator, enabling precise capture of high-order state-input couplings without vertex enumeration. We develop a comprehensive library of HPZ operations, including Minkowski sum, linear transformation, and intersection. Theoretical analysis and computational experiments demonstrate that HPZs achieve superior tightness preservation and computational efficiency compared to existing approaches for hybrid system reachability analysis.
comment: 9 pages
☆ A Survey on Imitation Learning for Contact-Rich Tasks in Robotics
This paper comprehensively surveys research trends in imitation learning for contact-rich robotic tasks. Contact-rich tasks, which require complex physical interactions with the environment, represent a central challenge in robotics due to their nonlinear dynamics and sensitivity to small positional deviations. The paper examines demonstration collection methodologies, including teaching methods and sensory modalities crucial for capturing subtle interaction dynamics. We then analyze imitation learning approaches, highlighting their applications to contact-rich manipulation. Recent advances in multimodal learning and foundation models have significantly enhanced performance in complex contact tasks across industrial, household, and healthcare domains. Through systematic organization of current research and identification of challenges, this survey provides a foundation for future advancements in contact-rich robotic manipulation.
comment: 47pages, 1 figures
☆ High-gain model-following control for trajectory tracking
We consider trajectory tracking for minimum-phase nonlinear systems in Byrnes-Isidori form using the model-following control (MFC) architecture. The tracking problem is motivated by a hierarchical control concept where a higher-level instance provides the reference trajectory at run-time. We present a computational efficient implementation of the feedback linearisation MFC design, and apply high-gain feedback in the process control loop (PCL) to achieve practical tracking in presence of Lipschitz perturbations. Our main results establish ultimate boundedness of the tracking error and give a constructive bound for the high-gain scaling parameter to achieve arbitrary tracking precision. Further we establish that the peaking phenomenon can be attenuated using MFC. We demonstrate the results via an automotive case study considering advanced engine-based cruise control.
☆ Delayed Expansion AGT: Kinodynamic Planning with Application to Tractor-Trailer Parking
Kinodynamic planning of articulated vehicles in cluttered environments faces additional challenges arising from high-dimensional state space and complex system dynamics. Built upon [1],[2], this work proposes the DE-AGT algorithm that grows a tree using pre-computed motion primitives (MPs) and A* heuristics. The first feature of DE-AGT is a delayed expansion of MPs. In particular, the MPs are divided into different modes, which are ranked online. With the MP classification and prioritization, DE-AGT expands the most promising mode of MPs first, which eliminates unnecessary computation and finds solutions faster. To obtain the cost-to-go heuristic for nonholonomic articulated vehicles, we rely on supervised learning and train neural networks for fast and accurate cost-to-go prediction. The learned heuristic is used for online mode ranking and node selection. Another feature of DE-AGT is the improved goal-reaching. Exactly reaching a goal state usually requires a constant connection checking with the goal by solving steering problems -- non-trivial and time-consuming for articulated vehicles. The proposed termination scheme overcomes this challenge by tightly integrating a light-weight trajectory tracking controller with the search process. DE-AGT is implemented for autonomous parking of a general car-like tractor with 3-trailer. Simulation results show an average of 10x acceleration compared to a previous method.
☆ A Model-Free Detection Method for Internal Short Circuits in Single Lithium-ion Cells Using Pseudo Open-Circuit Voltage Difference
This letter proposes a lightweight, model-free online diagnostic framework for detecting internal short circuits (ISC) in single lithium-ion cells under dynamic operating conditions. The core of the method lies in computing the first-order difference of pseudo open-circuit voltage ($\boldsymbol{\mathrm{OCV}_{\text{pseudo}}}$) to extract high-frequency deviations caused by ISC events from low-frequency polarization variations. The method relies solely on terminal voltage, current measurements, and an offline $R_0$--SOC look-up table, thereby eliminating the need for electrochemical or equivalent-circuit observers. Validated on ten real and one false fault scenarios, the proposed approach achieves a 100\% detection success rate with no missed or false alarms. In addition, the proposed method exhibits extremely low computational and memory requirements, making it highly suitable for real-time deployment in battery management systems (BMS).
☆ Voltage Stability of Inverter-Based Systems: Impact of Parameters and Irrelevance of Line Dynamics
This paper investigates voltage stability in inverter-based power systems concerning fold and saddle-node bifurcations. An analytical expression is derived for the sensitivity of the stability margin using the normal vector to the bifurcation hypersurface. Such information enables efficient identification of effective control parameters in mitigating voltage instability. Comprehensive analysis reveals that reactive loading setpoint and current controller's feedforward gain are the most influential parameters for enhancing voltage stability in a grid-following (GFL) inverter system, while the voltage controller's feedforward gain plays a dominant role in a grid-forming (GFM) inverter. Notably, both theoretical and numerical results demonstrate that transmission line dynamics have no impact on fold/saddle-node bifurcations in these systems. Results in this paper provide insights for efficient analysis and control in future inverter-dominated power systems through reductions in parameter space and model complexity.
comment: 8 pages, 6 figues
☆ Aggregating Inverter-Based Resources for Fast Frequency Response: A Nash Bargaining Game-Based Approach
This paper proposes a multi-objective optimization (MOO) approach for grid-level frequency regulation by aggregating inverter-based resources (IBRs). Virtual power plants (VPPs), acting as aggregators, can efficiently respond to dynamic response requirements from the grid. Through parametric modeling, grid-level frequency regulation requirements are accurately quantified and translated into a feasible parameter region defined by device-level parameters. Based on this feasible region, an MOO model is developed to address the conflicting demands of IBRs during frequency response. A Nash bargaining game-based approach is then employed to optimally allocate regulation requirements within the VPP, balancing the various demands of the IBRs. Numerical experiments demonstrate the effectiveness of the proposed method in enhancing frequency stability and improving coordination among IBRs.
comment: Accepted by the 2025 IEEE IAS Annual Meeting
☆ EPC Framework for BESS Projects
Battery Energy Storage Systems (BESS) are critical for modern power networks, supporting grid services such as frequency regulation, peak shaving, and black start. Delivering a BESS under an Engineering, Procurement, and Construction (EPC) model requires a concise methodology that balances regulatory compliance, technical details, and schedule efficiency. This paper presents a streamlined, five step EPC framework covering feasibility assessment, permitting, procurement, construction, and commissioning. A Danish demonstration (the BOSS project on Bornholm) serves as a case study.
comment: Submitted to a conference
☆ Stability and Performance of Online Feedback Optimization for Distribution Grid Flexibility
The integration of distributed energy resources (DERs) into sub-transmission systems has enabled new opportunities for flexibility provision in ancillary services such as frequency and voltage support, as well as congestion management. This paper investigates the stability and performance of Online Feedback Optimization (OFO) controllers in ensuring reliable flexibility provision. A hierarchical control architecture is proposed, emphasizing safe transitions between system states within the Feasible Operating Region (FOR). We evaluate the controller's stability and performance through simulations of transitions to the vertices of the FOR, analyzing the impact of tuning parameters. The study demonstrates that controller stability is sensitive to parameter tuning, particularly gain and sensitivity approximations. Results demonstrate that improper tuning can lead to oscillatory or unstable behavior, highlighting the need for systematic parameter selection to ensure reliable operation across the full flexibility range.
☆ RL-Guided MPC for Autonomous Greenhouse Control
The efficient operation of greenhouses is essential for enhancing crop yield while minimizing energy costs. This paper investigates a control strategy that integrates Reinforcement Learning (RL) and Model Predictive Control (MPC) to optimize economic benefits in autonomous greenhouses. Previous research has explored the use of RL and MPC for greenhouse control individually, or by using MPC as the function approximator for the RL agent. This study introduces the RL-Guided MPC framework, where a RL policy is trained and then used to construct a terminal cost and terminal region constraint for the MPC optimization problem. This approach leverages the ability to handle uncertainties of RL with MPC's online optimization to improve overall control performance. The RL-Guided MPC framework is compared with both MPC and RL via numerical simulations. Two scenarios are considered: a deterministic environment and an uncertain environment. Simulation results demonstrate that, in both environments, RL-Guided MPC outperforms both RL and MPC with shorter prediction horizons.
☆ Evaluation methodology of Model Predictive Controllers for building's energy systems
Climate change poses a serious threat to the Earth's ecosystems, fueled primarily by escalating greenhouse gas emissions. Among the main contributors, the building sector stands out due to its significant energy demand. Addressing this challenge requires innovative techniques in the control of energy systems in buildings. This paper deals with the formulation of a methodology designed to evaluate the performance of these controllers. The evaluation process involves the establishment of a comprehensive test protocol and a diverse set of scenarios to evaluate the controllers. Key performance indicators are used to quantify their effectiveness based on the test results. A practical case study is presented as an application to introduce this methodology, focusing on the integration of Model Predictive Controllers (MPCs) with the Dimosim thermal simulation platform. The digital twin of the Greener building in Grenoble is used as a model for emulation. The paper demonstrates the ability of the proposed methodology to test and rank MPCs in different test scenarios, providing valuable feedback on their performance capabilities. The paper highlights the importance of the developed approach in systematically evaluating and ranking MPCs for optimized building energy management.
comment: IBPSA France 2024, May 2024, La rochelle/ Ol{\'e}ron, France
☆ Online-Optimized Gated Radial Basis Function Neural Network-Based Adaptive Control
Real-time adaptive control of nonlinear systems with unknown dynamics and time-varying disturbances demands precise modeling and robust parameter adaptation. While existing neural network-based strategies struggle with computational inefficiency or inadequate temporal dependencies, this study proposes a hybrid control framework integrating a Temporal-Gated Radial Basis Function (TGRBF) network with a nonlinear robust controller. The TGRBF synergizes radial basis function neural networks (RBFNNs) and gated recurrent units (GRUs) through dynamic gating, enabling efficient offline system identification and online temporal modeling with minimal parameter overhead (14.5% increase vs. RBFNNs). During control execution, an event-triggered optimization mechanism activates momentum-explicit gradient descent to refine network parameters, leveraging historical data to suppress overfitting while maintaining real-time feasibility. Concurrently, the nonlinear controller adaptively tunes its gains via Jacobian-driven rules derived from the TGRBF model, ensuring rapid error convergence and disturbance rejection. Lyapunov-based analysis rigorously guarantees uniform ultimate boundedness of both tracking errors and adaptive parameters. Simulations on a nonlinear benchmark system demonstrate the framework's superiority: compared to PID and fixed-gain robust controllers, the proposed method reduces settling time by 14.2%, limits overshoot to 10%, and achieves 48.4% lower integral time-weighted absolute error under dynamic disturbances. By unifying data-driven adaptability with stability-guaranteed control, this work advances real-time performance in partially observable, time-varying industrial systems.
comment: 10 pages, 8 figures
☆ Cognitive Synergy Architecture: SEGO for Human-Centric Collaborative Robots
This paper presents SEGO (Semantic Graph Ontology), a cognitive mapping architecture designed to integrate geometric perception, semantic reasoning, and explanation generation into a unified framework for human-centric collaborative robotics. SEGO constructs dynamic cognitive scene graphs that represent not only the spatial configuration of the environment but also the semantic relations and ontological consistency among detected objects. The architecture seamlessly combines SLAM-based localization, deep-learning-based object detection and tracking, and ontology-driven reasoning to enable real-time, semantically coherent mapping.
♻ ☆ Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens
We provide the first known algorithm that provably achieves $\varepsilon$-optimality within $\widetilde{\mathcal{O}}(1/\varepsilon)$ function evaluations for the discounted discrete-time LQR problem with unknown parameters, without relying on two-point gradient estimates. These estimates are known to be unrealistic in many settings, as they depend on using the exact same initialization, which is to be selected randomly, for two different policies. Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates, which either leads to $\widetilde{\mathcal{O}}(1/\varepsilon^2)$ rates or heavily relies on stability assumptions.
♻ ☆ Linear Aggregate Model for Realizable Dispatch of Homogeneous Energy Storage
To optimize the dispatch of batteries, a model is required that can predict the state of energy (SOE) trajectory for a chosen open-loop power schedule to ensure admissibility (i.e., that schedule can be realized). However, battery dispatch optimization is inherently challenging when batteries cannot simultaneously charge and discharge, which begets a non-convex complementarity constraint. In this paper, we develop a novel composition of energy storage elements that can charge or discharge independently and provide a sufficient linear energy storage model of the composite battery. This permits convex optimization of the composite battery dispatch while ensuring the admissibility of the resulting (aggregated) power schedule and its disaggregation to the individual elements.
♻ ☆ TIP-Search: Time-Predictable Inference Scheduling for Market Prediction under Uncertain Load
This paper proposes TIP-Search, a time-predictable inference scheduling framework for real-time market prediction under uncertain workloads. Motivated by the strict latency demands in high-frequency financial systems, TIP-Search dynamically selects a deep learning model from a heterogeneous pool, aiming to maximize predictive accuracy while satisfying per-task deadline constraints. Our approach profiles latency and generalization performance offline, then performs online task-aware selection without relying on explicit input domain labels. We evaluate TIP-Search on three real-world limit order book datasets (FI-2010, Binance BTC/USDT, LOBSTER AAPL) and demonstrate that it outperforms static baselines with up to 8.5% improvement in accuracy and 100% deadline satisfaction. Our results highlight the effectiveness of TIP-Search in robust low-latency financial inference under uncertainty.
♻ ☆ Uncertainty-Aware Critic Augmentation for Hierarchical Multi-Agent EV Charging Control
The advanced bidirectional EV charging and discharging technology, aimed at supporting grid stability and emergency operations, has driven a growing interest in workplace applications. It not only reduces electricity expenses but also enhances the resilience in handling practical matters, such as peak power limitation, fluctuating energy prices, and unpredictable EV departures. Considering these factors systematically can benefit energy efficiency in office buildings and for EV users simultaneously. To employ AI to address these issues, we propose HUCA, a novel real-time charging control for regulating energy demands for both the building and EVs. HUCA employs hierarchical actor-critic networks to dynamically reduce electricity costs in buildings, accounting for the needs of EV charging in the dynamic pricing scenario. To tackle the uncertain EV departures, we introduce a new critic augmentation to account for departure uncertainties in evaluating the charging decisions, while maintaining the robustness of the charging control. Experiments on real-world electricity datasets under both simulated certain and uncertain departure scenarios demonstrate that HUCA outperforms baselines in terms of total electricity costs while maintaining competitive performance in fulfilling EV charging requirements. A case study also manifests that HUCA effectively balances energy supply between the building and EVs based on real-time information, showcasing its potential as a key AI-driven solution for vehicle charging control.
♻ ☆ Mixed Bernstein-Fourier Approximants for Optimal Trajectory Generation with Periodic Behavior
Efficient trajectory generation is critical for autonomous systems, yet current numerical methods often struggle to handle periodic behaviors effectively, especially when equidistant time nodes are required. This paper introduces a novel mixed Bernstein-Fourier approximation framework tailored explicitly for optimal motion planning. Our proposed methodology leverages the uniform convergence properties of Bernstein polynomials for nonperiodic behaviors while effectively capturing periodic dynamics through Fourier series. Theoretical results are established, including uniform convergence proofs for approximations of functions, derivatives, and integrals, as well as detailed error bound analyses. We further introduce a regulated least squares approach for determining approximation coefficients, enhancing numerical stability and practical applicability. Within an optimal control context, we establish feasibility and consistency of approximated solutions to their continuous counterparts. We also extend the covector mapping theorem, providing theoretical guarantees for approximating dual variables crucial in verifying the necessary optimality conditions from Pontryagin's Maximum Principle. Comprehensive numerical examples illustrate the method's superior performance, demonstrating substantial improvements in computational efficiency and precision in scenarios with complex periodic constraints and dynamics. Our mixed Bernstein-Fourier methodology thus presents a robust, theoretically grounded, and computationally efficient approach for advanced optimal trajectory planning in autonomous systems.
comment: 49 pages, 9 figures
♻ ☆ CONQURE: A Co-Execution Environment for Quantum and Classical Resources
Cutting edge classical computing today relies on a combination of CPU-based computing with a strong reliance on accelerators. In particular, high-performance computing (HPC) and machine learning (ML) rely heavily on acceleration via GPUs for numerical kernels. In the future, acceleration via quantum devices may complement GPUs for kernels where algorithms provide quantum advantage, i.e., significant speedups over classical algorithms. Computing with quantum kernels mapped onto quantum processing units (QPUs) requires seamless integration into HPC and ML. However, quantum offloading onto HPC/cloud lacks open-source software infrastructure. For classical algorithms, parallelization standards, such as OpenMP, MPI, or CUDA exist. In contrast, a lack of quantum abstractions currently limits the adoption of quantum acceleration in practical applications creating a gap between quantum algorithm development and practical HPC integration. Such integration needs to extend to efficient quantum offloading of kernels, which further requires scheduling of quantum resources, control of QPU kernel execution, tracking of QPU results, providing results to classical calling contexts and coordination with HPC scheduling. This work proposes CONQURE, a co-execution environment for quantum and classical resources. CONQURE is a fully open-source cloud queue framework that presents a novel modular scheduling framework allowing users to offload OpenMP quantum kernels to QPUs as quantum circuits, to relay results back to calling contexts in classical computing, and to schedule quantum resources via our CONQURE API. We show our API has a low overhead averaging 12.7ms in our tests, and we demonstrate functionality on an ion-trap device. Our OpenMP extension enables the parallelization of VQE runs with a 3.1X reduction in runtime.
♻ ☆ Distributed Attack-Resilient Platooning Against False Data Injection
This paper presents a novel distributed vehicle platooning control and coordination strategy. We propose a distributed predecessor-follower CACC scheme that allows to choose an arbitrarily small inter-vehicle distance while guaranteeing no rear-end collisions occur, even in the presence of undetected cyber-attacks on the communication channels such as false data injection. The safety guarantees of the CACC policy are derived by combining a sensor-based ACC policy that explicitly accounts for actuator saturation, and a communication-based predictive term that has state-dependent limits on its control authority, thus containing the effects of an unreliable communication channel. An undetected attack may still however be able to degrade platooning performance. To mitigate it, we propose a tailored Kalman observer-based attack detection algorithm that initially triggers a switch from the CACC policy to the ACC policy. Subsequently, by relying on a high-level coordinator, our strategy allows to isolate a compromised vehicle from the platoon formation by reconfiguring the platoon topology itself. The coordinator can also handle merging and splitting requests. We compare our algorithm in an extensive simulation study against a state of the art distributed MPC scheme and a robust control scheme. We additionally extensively test our full method in practice on a real system, a team of scaled-down car-like robots. Furthermore, we share the code to run both the simulations and robotic experiments.
♻ ☆ Finding Thermodynamically Favorable Pathways in Chemical Reaction Networks Using Flows in Hypergraphs and Mixed-Integer Linear Programming
The search for pathways that optimize the formation of a particular target molecule in a reaction network is a key problem in many settings, including reactor systems. Chemical reaction networks are mathematically well represented as hypergraphs, modeling that facilitates the search for pathways by computational means. We propose to enrich an existing search method for pathways by including thermodynamic principles. In more detail, we give a mixed-integer linear programming (mixed ILP) formulation of the search problem into which we integrate chemical potentials and concentrations for individual molecules, enabling us to constrain the search to return pathways containing only thermodynamically favorable reactions. Moreover, if multiple possible pathways are found, we can rank these by objective functions based on thermodynamics. As an example of use, we apply the framework to a reaction network representing the HCN-formamide chemistry. Alternative pathways to the one currently hypothesized in the literature are queried and enumerated, including some that score better according to our chosen objective function.
comment: 17 pages, 6 figures, 6 tables
Optimization and Control 15
☆ A Study on Effective Initial Guess Finding Method Based on Bézier Curves: Orbit Determination Applications
In celestial mechanics, proper orbits related to missions are obtained by solving two-point boundary value problems. Since a selection method of initial value affects the convergence of the solution, developing an effective method to find an initial guess is required. In this work, B\'ezier curves, which can describe complicated curves and surfaces, are utilized to find the initial guess. First, the given problems are transformed into B\'ezier curves forms, and B\'ezier curves' control points, which can handle the shape of curves, are selected by solving the system of nonlinear equations. Finally, the initial guess is obtained by substituting the calculated control points to B\'ezier curves. To validate the performance of the proposed method, numerical simulations are conducted with respect to three kinds of orbits, which are from circular to highly elliptical orbit (HEO). The proposed method is compared to the general shooting method. The comparison results show that the initial guess calculated by B\'ezier curves makes finding the solution more efficient in terms of computational time and iterations. Also, it shows that the proposed method finds the solution for the HEO while the general shooting method fails to find the solution.
comment: 10 pages, 4 figures, 4 tables, 2019 AAS/AIAA Astrodynamics Specialist Conference
☆ Understanding Lookahead Dynamics Through Laplace Transform
We introduce a frequency-domain framework for convergence analysis of hyperparameters in game optimization, leveraging High-Resolution Differential Equations (HRDEs) and Laplace transforms. Focusing on the Lookahead algorithm--characterized by gradient steps $k$ and averaging coefficient $\alpha$--we transform the discrete-time oscillatory dynamics of bilinear games into the frequency domain to derive precise convergence criteria. Our higher-precision $O(\gamma^2)$-HRDE models yield tighter criteria, while our first-order $O(\gamma)$-HRDE models offer practical guidance by prioritizing actionable hyperparameter tuning over complex closed-form solutions. Empirical validation in discrete-time settings demonstrates the effectiveness of our approach, which may further extend to locally linear operators, offering a scalable framework for selecting hyperparameters for learning in games.
☆ Gradient-Normalized Smoothness for Optimization with Approximate Hessians
In this work, we develop new optimization algorithms that use approximate second-order information combined with the gradient regularization technique to achieve fast global convergence rates for both convex and non-convex objectives. The key innovation of our analysis is a novel notion called Gradient-Normalized Smoothness, which characterizes the maximum radius of a ball around the current point that yields a good relative approximation of the gradient field. Our theory establishes a natural intrinsic connection between Hessian approximation and the linearization of the gradient. Importantly, Gradient-Normalized Smoothness does not depend on the specific problem class of the objective functions, while effectively translating local information about the gradient field and Hessian approximation into the global behavior of the method. This new concept equips approximate second-order algorithms with universal global convergence guarantees, recovering state-of-the-art rates for functions with H\"older-continuous Hessians and third derivatives, quasi-self-concordant functions, as well as smooth classes in first-order optimization. These rates are achieved automatically and extend to broader classes, such as generalized self-concordant functions. We demonstrate direct applications of our results for global linear rates in logistic regression and softmax problems with approximate Hessians, as well as in non-convex optimization using Fisher and Gauss-Newton approximations.
☆ Global Convergence of Adjoint-Optimized Neural PDEs
Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks. The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to available data by optimizing over the PDE using gradient descent, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. These neural-network PDE models have emerged as an important research area in scientific machine learning. In this paper, we study the convergence of the adjoint gradient descent optimization method for training neural-network PDE models in the limit where both the number of hidden units and the training time tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer). The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses due to (1) the neural network training dynamics involving a non-local neural network kernel operator in the infinite-width hidden layer limit where the kernel lacks a spectral gap for its eigenvalues and (2) the nonlinearity of the limit PDE system, which leads to a non-convex optimization problem, even in the infinite-width hidden layer limit (unlike in typical neual network training cases where the optimization problem becomes convex in the large neuron limit). The theoretical results are illustrated and empirically validated by numerical studies.
comment: 63 pages, 2 figures
☆ Dynamic Layered Decoding Scheduling for LDPC Codes Aided by Check Node Error Probabilities
In this study, a new scheduling strategies for low-density parity-check (LDPC) codes under layered belief propagation (LBP) is designed. Based on the criteria of prioritizing the update of check nodes with lower error probabilities, we propose two dynamic scheduling methods: dynamic error belief propagation (Dyn-EBP) and dynamic penalty error belief propagation (Dyn-PEBP). In Dyn-EBP, each check node is restricted from being updated the same number of times, whereas Dyn-PEBP removes this restriction and instead introduces a penalty term to balance the number of updates. Simulation results show that, for 5G new radio (NR) LDPC codes, our proposed scheduling methods can outperform existing dynamic and offline scheduling strategies under various blocklengths and code rates. This demonstrates that prioritizing the update of check nodes with lower error probabilities can lead to higher decoding efficiency and validates the effectiveness of our algorithms.
comment: 5 pages, 4 figures
☆ Balancing Intensity and Focality in Directional DBS Under Uncertainty: A Simulation Study of Electrode Optimization via a Metaheuristic L1L1 Approach
As DBS technology advances toward directional leads and optimization-based current steering, this study aims to improve the selection of electrode contact configurations using the recently developed L1-norm regularized L1-norm fitting (L1L1) method. The focus is in particular on L1L1's capability to incorporate a priori lead field uncertainty, offering a potential advantage over conventional approaches that do not account for such variability. Our optimization framework incorporates uncertainty by constraining the solution space based on lead field attenuation. This reflects physiological expectations about the VTA and serves to avoid overfitting. By applying this method to 8- and 40-contact electrode configurations, we optimize current distributions within a discretized finite element (FE) model, focusing on the lead field's characteristics. The model accounts for uncertainty through these explicit constraints, enhancing the feasibility, focality, and robustness of the resulting solutions. The L1L1 method was validated through a series of numerical experiments using both noiseless and noisy lead fields, where the noise level was selected to reflect attenuation within VTA. It successfully fits and regularizes the current distribution across target structures, with hyperparameter optimization extracting either bipolar or multipolar electrode configurations. These configurations aim to maximize focused current density or prioritize a high gain field ratio in a discretized FE model. Compared to traditional methods, the L1L1 approach showed competitive performance in concentrating stimulation within the target region while minimizing unintended current spread, particularly under noisy conditions. By incorporating uncertainty directly into the optimization process, we obtain a noise-robust framework for current steering, allowing for variations in lead field models and simulation parameters.
☆ A Dynamic Relaxation Framework for Global Solution of ACOPF
Solving the Alternating Current Optimal Power Flow (AC OPF) problem to global optimality remains challenging due to its nonconvex quadratic constraints. In this paper, we present a unified framework that combines static piecewise relaxations with dynamic cut-generation mechanism to systematically tighten the classic Second-Order Cone Programming (SOCP) relaxation to arbitrarily small conic violation, thus enabling the recovery of globally optimal solutions. Two static formulations, Pyramidal Relaxation (PR) and Quasi-Pyramidal Relaxation (QPR), are introduced to tighten each branch-flow second-order cone via a finite union of wedges, providing controllable accuracy. Their dynamic counterparts, Dynamic PR (DPR) and Dynamic QPR (DQPR), embed on-the-fly cut generation within a branch-and-cut solver to improve scalability. Convergence is further accelerated through warm starts and a lightweight local-search post-processing. Extensive experiments on benchmarks demonstrate effective elimination of conic violations and flexible trade-offs between solution accuracy and runtime. Practical guidelines are derived for selecting appropriate variants based on network size and accuracy requirements.
comment: Full version of a submission to IEEE Transactions on Power Systems. Includes all proofs and algorithm pseudocode
☆ Using Model Predictive Control To Reduce Traffic Emissions on Urban Freeways
Urban traffic congestion significantly impacts regional air quality and contributes substantially to pollutant emissions. Suburban freeway corridors are a major source of traffic-related emissions, particularly nitrogen oxides (NOx) and carbon dioxide (CO2). This paper proposes a Model Predictive Control (MPC) framework aimed at emission reduction on peripheral freeway corridors. Emission rates on freeways exhibit high sensitivity to speed fluctuations and congestion recovery processes. To address this relationship, we develop and analyze a bounded-acceleration continuum traffic flow model. By introducing an upper limit on vehicle acceleration capabilities, we enhance behavioral realism through the incorporation of driver responses to congestion, which is widely recognized as a main cause of the important capacity drop phenomenon. Our approach implements dynamically optimized variable speed limits (VSLs) at strategic corridor locations, balancing the dual objectives of minimizing both travel time and emissions as quantified by the COPERT V [1] model. Numerical simulations demonstrate that this framework effectively manages congestion and reduces emissions across various traffic demand scenarios.
♻ ☆ Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens
We provide the first known algorithm that provably achieves $\varepsilon$-optimality within $\widetilde{\mathcal{O}}(1/\varepsilon)$ function evaluations for the discounted discrete-time LQR problem with unknown parameters, without relying on two-point gradient estimates. These estimates are known to be unrealistic in many settings, as they depend on using the exact same initialization, which is to be selected randomly, for two different policies. Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates, which either leads to $\widetilde{\mathcal{O}}(1/\varepsilon^2)$ rates or heavily relies on stability assumptions.
♻ ☆ Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness ICML 2025
We analyze nonlinearly preconditioned gradient methods for solving smooth minimization problems. We introduce a generalized smoothness property, based on the notion of abstract convexity, that is broader than Lipschitz smoothness and provide sufficient first- and second-order conditions. Notably, our framework encapsulates algorithms associated with the gradient clipping method and brings out novel insights for the class of $(L_0,L_1)$-smooth functions that has received widespread interest recently, thus allowing us to extend beyond already established methods. We investigate the convergence of the proposed method in both the convex and nonconvex setting.
comment: ICML 2025 oral
♻ ☆ Convergence Analysis of the Wasserstein Proximal Algorithm beyond Geodesic Convexity
The proximal algorithm is a powerful tool to minimize nonlinear and nonsmooth functionals in a general metric space. Motivated by the recent progress in studying the training dynamics of the noisy gradient descent algorithm on two-layer neural networks in the mean-field regime, we provide in this paper a simple and self-contained analysis for the convergence of the general-purpose Wasserstein proximal algorithm without assuming geodesic convexity of the objective functional. Under a natural Wasserstein analog of the Euclidean Polyak-{\L}ojasiewicz inequality, we establish that the proximal algorithm achieves an unbiased and linear convergence rate. Our convergence rate improves upon existing rates of the proximal algorithm for solving Wasserstein gradient flows under strong geodesic convexity. We also extend our analysis to the inexact proximal algorithm for geodesically semiconvex objectives. In our numerical experiments, proximal training demonstrates a faster convergence rate than the noisy gradient descent algorithm on mean-field neural networks.
♻ ☆ SensLI: Sensitivity-Based Layer Insertion for Neural Networks
The training of neural networks requires tedious and often manual tuning of the network architecture. We propose a systematic approach to inserting new layers during the training process. Our method eliminates the need to choose a fixed network size before training, is numerically inexpensive to execute and applicable to various architectures including fully connected feedforward networks, ResNets and CNNs. Our technique borrows ideas from constrained optimization and is based on first-order sensitivity information of the loss function with respect to the virtual parameters that additional layers, if inserted, would offer. In numerical experiments, our proposed sensitivity-based layer insertion technique (SensLI) exhibits improved performance on training loss and test error, compared to training on a fixed architecture, and reduced computational effort in comparison to training the extended architecture from the beginning. Our code is available on https://github.com/mathemml/SensLI.
♻ ☆ Batch Sample-wise Stochastic Optimal Control via Stochastic Maximum Principle
In this work, we study the stochastic optimal control problem (SOC) mainly from the probabilistic view point, i.e. via the Stochastic Maximum principle (SMP) \cite{Peng4}. We adopt the sample-wise backpropagation scheme proposed in \cite{Hui1} to solve the SOC problem under the strong convexity assumption. Importantly, in the Stochastic Gradient Descent (SGD) procedure, we use batch samples with higher order scheme in the forward SDE to improve the convergence rate in \cite{Hui1} from $\sim \mathcal{O}(\sqrt{\frac{N}{K} + \frac{1}{N}})$ to $\sim \mathcal{O}(\sqrt{\frac{1}{K} + \frac{1}{N^2}})$ and note that the main source of uncertainty originates from the scheme for the simulation of $Z$ term in the BSDE. In the meantime, we note the SGD procedure uses only the necessary condition of the SMP, while the batch simulation of the approximating solution of BSDEs allows one to obtain a more accurate estimate of the control $u$ that minimizes the Hamiltonian. We then propose a damped contraction algorithm to solve the SOC problem whose proof of convergence for a special case is attained under some appropriate assumption. We then show numerical results to check the first order convergence rate of the projection algorithm and analyze the convergence behavior of the damped contraction algorithm. Lastly, we briefly discuss how to incorporate the proposed scheme in solving practical problems especially when the Randomized Neural Networks are used. We note that in this special case, the error backward propagation can be avoided and parameter update can be achieved via purely algebraic computation (vector algebra) which will potentially improve the efficiency of the whole training procedure. Such idea will require further exploration and we will leave it as our future work.
comment: Follow up work for a previous SINUM paper
♻ ☆ A model-free approach to control barrier functions using funnel control
Control barrier functions (CBFs) are a popular approach to design feedback laws that achieve safety guarantees for nonlinear systems. The CBF-based controller design relies on the availability of a model to select feasible inputs from the set of CBF-based controls. In this paper, we develop a model-free approach to design CBF-based control laws, eliminating the need for knowledge of system dynamics or parameters. Specifically, we address safety requirements characterized by a time-varying distance to a reference trajectory in the output space and construct a CBF that depends only on the measured output. Utilizing this particular CBF, we determine a subset of CBF-based controls without relying on a model of the dynamics by using techniques from funnel control. The latter is a model-free high-gain adaptive control methodology, which achieves tracking guarantees via reactive feedback. In this paper, we discover and establish a connection between the modular controller synthesis via zeroing CBFs and model-free reactive feedback. The theoretical results are illustrated by a numerical simulation.
♻ ☆ Generalized Inverse Optimal Control and its Application in Biology
Living organisms exhibit remarkable adaptations across all scales, from molecules to ecosystems. We believe that many of these adaptations correspond to optimal solutions driven by evolution, training, and underlying physical and chemical laws and constraints. While some argue against such optimality principles due to their potential ambiguity, we propose generalized inverse optimal control to infer them directly from data. This novel approach incorporates multi-criteria optimality, nestedness of objective functions on different scales, the presence of active constraints, the possibility of switches of optimality principles during the observed time horizon, maximization of robustness, and minimization of time as important special cases, as well as uncertainties involved with the mathematical modeling of biological systems. This data-driven approach ensures that optimality principles are not merely theoretical constructs but are firmly rooted in experimental observations. Furthermore, the inferred principles can be used in forward optimal control to predict and manipulate biological systems, with possible applications in bio-medicine, biotechnology, and agriculture. As discussed and illustrated, the well-posed problem formulation and the inference are challenging and require a substantial interdisciplinary effort in the development of theory and robust numerical methods.