Search | arXiv e-print repository

IGNN-Solver: A Graph Neural Solver for Implicit Graph Neural Networks

Authors: Junchao Lin, Zenan Ling, Zhanbo Feng, Feng Zhou, Jingwen Xu, Robert C Qiu

Abstract: Implicit graph neural networks (IGNNs), which exhibit strong expressive power with a single layer, have recently demonstrated remarkable performance in capturing long-range dependencies (LRD) in underlying graphs while effectively mitigating the over-smoothing problem. However, IGNNs rely on computationally expensive fixed-point iterations, which lead to significant speed and scalability limitatio… ▽ More Implicit graph neural networks (IGNNs), which exhibit strong expressive power with a single layer, have recently demonstrated remarkable performance in capturing long-range dependencies (LRD) in underlying graphs while effectively mitigating the over-smoothing problem. However, IGNNs rely on computationally expensive fixed-point iterations, which lead to significant speed and scalability limitations, hindering their application to large-scale graphs. To achieve fast fixed-point solving for IGNNs, we propose a novel graph neural solver, IGNN-Solver, which leverages the generalized Anderson Acceleration method, parameterized by a small GNN, and learns iterative updates as a graph-dependent temporal process. Extensive experiments demonstrate that the IGNN-Solver significantly accelerates inference, achieving a $1.5\times$ to $8\times$ speedup without sacrificing accuracy. Moreover, this advantage becomes increasingly pronounced as the graph scale grows, facilitating its large-scale deployment in real-world applications. △ Less

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.08459 [pdf, other]

Beamforming Design for Intelligent Reffecting Surface Aided Near-Field THz Communications

Authors: Chi Qiu, Qingqing Wu, Wen Chen, Meng Hua, Wanming Hao, Mengnan Jian, Fen Hou

Abstract: Intelligent reflecting surface (IRS) operating in the terahertz (THz) band has recently gained considerable interest due to its high spectrum bandwidth. Due to the exploitation of large scale of IRS, there is a high probability that the transceivers will be situated within the near-field region of the IRS. Thus, the near-field beam split effect poses a major challenge for the design of wideband IR… ▽ More Intelligent reflecting surface (IRS) operating in the terahertz (THz) band has recently gained considerable interest due to its high spectrum bandwidth. Due to the exploitation of large scale of IRS, there is a high probability that the transceivers will be situated within the near-field region of the IRS. Thus, the near-field beam split effect poses a major challenge for the design of wideband IRS beamforming, which causes the radiation beam to deviate from its intended location, leading to significant gain losses and limiting the efficient use of available bandwidths. While delay-based IRS has emerged as a potential solution, current beamforming schemes generally assume unbounded range time delays (TDs). In this letter, we first investigate the near-field beam split issue at the IRS. Then, we extend the piece-wise far-field model to the IRS, based on which, a double-layer delta-delay (DLDD) IRS beamforming scheme is proposed. Specifically, we employ an element-grouping strategy and the TD imposed on each sub-surface of IRS is achieved by a series of TD modules. This method significantly reduces the required range of TDs. Numerical results show that the proposed DLDD IRS beamforming scheme can effectively mitigate the near-field beam split and achieve near-optimal performance. △ Less

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2409.09495 [pdf, other]

Protecting Vehicle Location Privacy with Contextually-Driven Synthetic Location Generation

Authors: Sourabh Yadav, Chenyang Yu, Xinpeng Xie, Yan Huang, Chenxi Qiu

Abstract: Geo-obfuscation is a Location Privacy Protection Mechanism used in location-based services that allows users to report obfuscated locations instead of exact ones. A formal privacy criterion, geoindistinguishability (Geo-Ind), requires real locations to be hard to distinguish from nearby locations (by attackers) based on their obfuscated representations. However, Geo-Ind often fails to consider con… ▽ More Geo-obfuscation is a Location Privacy Protection Mechanism used in location-based services that allows users to report obfuscated locations instead of exact ones. A formal privacy criterion, geoindistinguishability (Geo-Ind), requires real locations to be hard to distinguish from nearby locations (by attackers) based on their obfuscated representations. However, Geo-Ind often fails to consider context, such as road networks and vehicle traffic conditions, making it less effective in protecting the location privacy of vehicles, of which the mobility are heavily influenced by these factors. In this paper, we introduce VehiTrack, a new threat model to demonstrate the vulnerability of Geo-Ind in protecting vehicle location privacy from context-aware inference attacks. Our experiments demonstrate that VehiTrack can accurately determine exact vehicle locations from obfuscated data, reducing average inference errors by 61.20% with Laplacian noise and 47.35% with linear programming (LP) compared to traditional Bayesian attacks. By using contextual data like road networks and traffic flow, VehiTrack effectively eliminates a significant number of seemingly "impossible" locations during its search for the actual location of the vehicles. Based on these insights, we propose TransProtect, a new geo-obfuscation approach that limits obfuscation to realistic vehicle movement patterns, complicating attackers' ability to differentiate obfuscated from actual locations. Our results show that TransProtect increases VehiTrack's inference error by 57.75% with Laplacian noise and 27.21% with LP, significantly enhancing protection against these attacks. △ Less

Submitted 14 September, 2024; originally announced September 2024.

Comments: SIGSPATIAL 2024

arXiv:2409.03937 [pdf, other]

Harnessing LLMs for Cross-City OD Flow Prediction

Authors: Chenyang Yu, Xinpeng Xie, Yan Huang, Chenxi Qiu

Abstract: Understanding and predicting Origin-Destination (OD) flows is crucial for urban planning and transportation management. Traditional OD prediction models, while effective within single cities, often face limitations when applied across different cities due to varied traffic conditions, urban layouts, and socio-economic factors. In this paper, by employing Large Language Models (LLMs), we introduce… ▽ More Understanding and predicting Origin-Destination (OD) flows is crucial for urban planning and transportation management. Traditional OD prediction models, while effective within single cities, often face limitations when applied across different cities due to varied traffic conditions, urban layouts, and socio-economic factors. In this paper, by employing Large Language Models (LLMs), we introduce a new method for cross-city OD flow prediction. Our approach leverages the advanced semantic understanding and contextual learning capabilities of LLMs to bridge the gap between cities with different characteristics, providing a robust and adaptable solution for accurate OD flow prediction that can be transferred from one city to another. Our novel framework involves four major components: collecting OD training datasets from a source city, instruction-tuning the LLMs, predicting destination POIs in a target city, and identifying the locations that best match the predicted destination POIs. We introduce a new loss function that integrates POI semantics and trip distance during training. By extracting high-quality semantic features from human mobility and POI data, the model understands spatial and functional relationships within urban spaces and captures interactions between individuals and various POIs. Extensive experimental results demonstrate the superiority of our approach over the state-of-the-art learning-based methods in cross-city OD flow prediction. △ Less

Submitted 5 September, 2024; originally announced September 2024.

Comments: 12 pages, 18 figures

arXiv:2408.11077 [pdf, other]

Characteristic Performance Study on Solving Oscillator ODEs via Soft-constrained Physics-informed Neural Network with Small Data

Authors: Kai-liang Lu, Yu-meng Su, Zhuo Bi, Cheng Qiu, Wen-jun Zhang

Abstract: This paper compared physics-informed neural network (PINN), conventional neural network (NN) and traditional numerical discretization methods on solving differential equations (DEs) through literature investigation and experimental validation. We focused on the soft-constrained PINN approach and formalized its mathematical framework and computational flow for solving Ordinary DEs and Partial DEs (… ▽ More This paper compared physics-informed neural network (PINN), conventional neural network (NN) and traditional numerical discretization methods on solving differential equations (DEs) through literature investigation and experimental validation. We focused on the soft-constrained PINN approach and formalized its mathematical framework and computational flow for solving Ordinary DEs and Partial DEs (ODEs/PDEs). The working mechanism and its accuracy and efficiency were experimentally verified by solving typical linear and non-linear oscillator ODEs. We demonstrate that the DeepXDE-based implementation of PINN is not only light code and efficient in training, but also flexible across CPU/GPU platforms. PINN greatly reduces the need for labeled data: when the nonlinearity of the ODE is weak, a very small amount of supervised training data plus a few unsupervised collocation points are sufficient to predict the solution; in the minimalist case, only one or two training points (with initial values) are needed for first- or second-order ODEs, respectively. We also find that, with the aid of collocation points and the use of physical information, PINN has the ability to extrapolate data outside the time domain of the training set, and especially is robust to noisy data, thus with enhanced generalization capabilities. Training is accelerated when the gains obtained along with the reduction in the amount of data outweigh the delay caused by the increase in the loss function terms. The soft-constrained PINN can easily impose a physical law (e.g., conservation of energy) constraint by adding a regularization term to the total loss function, thus improving the solution performance to ODEs that obey this physical law. Furthermore, PINN can also be used for stiff ODEs, PDEs, and other types of DEs, and is becoming a favorable catalyst for the era of Digital Twins. △ Less

Submitted 7 October, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: 24 pages, 7 figures, 2 tables, etc. Ready for submission

MSC Class: 68T07 ACM Class: I.5

arXiv:2408.03540 [pdf, other]

PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model

Authors: Yunlong Huang, Junshuo Liu, Ke Xian, Robert Caiming Qiu

Abstract: Transformers have significantly advanced the field of 3D human pose estimation (HPE). However, existing transformer-based methods primarily use self-attention mechanisms for spatio-temporal modeling, leading to a quadratic complexity, unidirectional modeling of spatio-temporal relationships, and insufficient learning of spatial-temporal correlations. Recently, the Mamba architecture, utilizing the… ▽ More Transformers have significantly advanced the field of 3D human pose estimation (HPE). However, existing transformer-based methods primarily use self-attention mechanisms for spatio-temporal modeling, leading to a quadratic complexity, unidirectional modeling of spatio-temporal relationships, and insufficient learning of spatial-temporal correlations. Recently, the Mamba architecture, utilizing the state space model (SSM), has exhibited superior long-range modeling capabilities in a variety of vision tasks with linear complexity. In this paper, we propose PoseMamba, a novel purely SSM-based approach with linear complexity for 3D human pose estimation in monocular video. Specifically, we propose a bidirectional global-local spatio-temporal SSM block that comprehensively models human joint relations within individual frames as well as temporal correlations across frames. Within this bidirectional global-local spatio-temporal SSM block, we introduce a reordering strategy to enhance the local modeling capability of the SSM. This strategy provides a more logical geometric scanning order and integrates it with the global SSM, resulting in a combined global-local spatial scan. We have quantitatively and qualitatively evaluated our approach using two benchmark datasets: Human3.6M and MPI-INF-3DHP. Extensive experiments demonstrate that PoseMamba achieves state-of-the-art performance on both datasets while maintaining a smaller model size and reducing computational costs. The code and models will be released. △ Less

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.03478 [pdf, other]

Effect of Kernel Size on CNN-Vision-Transformer-Based Gaze Prediction Using Electroencephalography Data

Authors: Chuhui Qiu, Bugao Liang, Matthew L Key

Abstract: In this paper, we present an algorithm of gaze prediction from Electroencephalography (EEG) data. EEG-based gaze prediction is a new research topic that can serve as an alternative to traditional video-based eye-tracking. Compared to the existing state-of-the-art (SOTA) method, we improved the root mean-squared-error of EEG-based gaze prediction to 53.06 millimeters, while reducing the training ti… ▽ More In this paper, we present an algorithm of gaze prediction from Electroencephalography (EEG) data. EEG-based gaze prediction is a new research topic that can serve as an alternative to traditional video-based eye-tracking. Compared to the existing state-of-the-art (SOTA) method, we improved the root mean-squared-error of EEG-based gaze prediction to 53.06 millimeters, while reducing the training time to less than 33% of its original duration. Our source code can be found at https://github.com/AmCh-Q/CSCI6907Project △ Less

Submitted 6 August, 2024; originally announced August 2024.

Comments: International Conference on Human-Computer Interaction (HCII 2024)

arXiv:2408.03472 [pdf, other]

Integrating HCI Datasets in Project-Based Machine Learning Courses: A College-Level Review and Case Study

Authors: Xiaodong Qu, Matthew Key, Eric Luo, Chuhui Qiu

Abstract: This study explores the integration of real-world machine learning (ML) projects using human-computer interfaces (HCI) datasets in college-level courses to enhance both teaching and learning experiences. Employing a comprehensive literature review, course websites analysis, and a detailed case study, the research identifies best practices for incorporating HCI datasets into project-based ML educat… ▽ More This study explores the integration of real-world machine learning (ML) projects using human-computer interfaces (HCI) datasets in college-level courses to enhance both teaching and learning experiences. Employing a comprehensive literature review, course websites analysis, and a detailed case study, the research identifies best practices for incorporating HCI datasets into project-based ML education. Key f indings demonstrate increased student engagement, motivation, and skill development through hands-on projects, while instructors benefit from effective tools for teaching complex concepts. The study also addresses challenges such as data complexity and resource allocation, offering recommendations for future improvements. These insights provide a valuable framework for educators aiming to bridge the gap between △ Less

Submitted 6 August, 2024; originally announced August 2024.

Journal ref: International Conference on Human-Computer Interaction (HCII 2024)

arXiv:2408.02283 [pdf, other]

Enhanced Equilibria-Solving via Private Information Pre-Branch Structure in Adversarial Team Games

Authors: Chen Qiu, Haobo Fu, Kai Li, Weixin Huang, Jiajia Zhang, Xuan Wang

Abstract: In ex ante coordinated adversarial team games (ATGs), a team competes against an adversary, and the team members are only allowed to coordinate their strategies before the game starts. The team-maxmin equilibrium with correlation (TMECor) is a suitable solution concept for ATGs. One class of TMECor-solving methods transforms the problem into solving NE in two-player zero-sum games, leveraging well… ▽ More In ex ante coordinated adversarial team games (ATGs), a team competes against an adversary, and the team members are only allowed to coordinate their strategies before the game starts. The team-maxmin equilibrium with correlation (TMECor) is a suitable solution concept for ATGs. One class of TMECor-solving methods transforms the problem into solving NE in two-player zero-sum games, leveraging well-established tools for the latter. However, existing methods are fundamentally action-based, resulting in poor generalizability and low solving efficiency due to the exponential growth in the size of the transformed game. To address the above issues, we propose an efficient game transformation method based on private information, where all team members are represented by a single coordinator. We designed a structure called private information pre-branch, which makes decisions considering all possible private information from teammates. We prove that the size of the game transformed by our method is exponentially reduced compared to the current state-of-the-art. Moreover, we demonstrate equilibria equivalence. Experimentally, our method achieves a significant speedup of 182.89$\times$ to 694.44$\times$ in scenarios where the current state-of-the-art method can work, such as small-scale Kuhn poker and Leduc poker. Furthermore, our method is applicable to larger games and those with dynamically changing private information, such as Goofspiel. △ Less

Submitted 5 August, 2024; originally announced August 2024.

Comments: 13 pages, 4 figures

arXiv:2407.21566 [pdf, other]

TRGR: Transmissive RIS-aided Gait Recognition Through Walls

Authors: Yunlong Huang, Junshuo Liu, Jianan Zhang, Tiebin Mi, Xin Shi, Robert Caiming Qiu

Abstract: Gait recognition with radio frequency (RF) signals enables many potential applications requiring accurate identification. However, current systems require individuals to be within a line-of-sight (LOS) environment and struggle with low signal-to-noise ratio (SNR) when signals traverse concrete and thick walls. To address these challenges, we present TRGR, a novel transmissive reconfigurable intell… ▽ More Gait recognition with radio frequency (RF) signals enables many potential applications requiring accurate identification. However, current systems require individuals to be within a line-of-sight (LOS) environment and struggle with low signal-to-noise ratio (SNR) when signals traverse concrete and thick walls. To address these challenges, we present TRGR, a novel transmissive reconfigurable intelligent surface (RIS)-aided gait recognition system. TRGR can recognize human identities through walls using only the magnitude measurements of channel state information (CSI) from a pair of transceivers. Specifically, by leveraging transmissive RIS alongside a configuration alternating optimization algorithm, TRGR enhances wall penetration and signal quality, enabling accurate gait recognition. Furthermore, a residual convolution network (RCNN) is proposed as the backbone network to learn robust human information. Experimental results confirm the efficacy of transmissive RIS, highlighting the significant potential of transmissive RIS in enhancing RF-based gait recognition systems. Extensive experiment results show that TRGR achieves an average accuracy of 97.88\% in identifying persons when signals traverse concrete walls, demonstrating the effectiveness and robustness of TRGR. △ Less

Submitted 31 July, 2024; originally announced July 2024.

Comments: Globecom 2024 IoTSN accepted

arXiv:2407.13725 [pdf, other]

Scalable Optimization for Locally Relevant Geo-Location Privacy

Authors: Chenxi Qiu, Ruiyao Liu, Primal Pappachan, Anna Squicciarini, Xinpeng Xie

Abstract: Geo-obfuscation functions as a location privacy protection mechanism (LPPM), enabling mobile users to share obfuscated locations with servers instead of their exact locations. This technique protects users' location privacy during server-side data breaches since the obfuscation process is irreversible. To minimize the utility loss caused by data obfuscation, linear programming (LP) is widely used.… ▽ More Geo-obfuscation functions as a location privacy protection mechanism (LPPM), enabling mobile users to share obfuscated locations with servers instead of their exact locations. This technique protects users' location privacy during server-side data breaches since the obfuscation process is irreversible. To minimize the utility loss caused by data obfuscation, linear programming (LP) is widely used. However, LP can face a polynomial explosion in decision variables, making it impractical for large-scale geo-obfuscation applications. In this paper, we propose a new LPPM called Locally Relevant Geo-obfuscation (LR-Geo) to optimize geo-obfuscation using LP more efficiently. This is accomplished by restricting the geo-obfuscation calculations for each user to locally relevant (LR) locations near the user's actual location. To prevent LR locations from inadvertently revealing a user's true whereabouts, users compute the LP coefficients locally and upload only these coefficients to the server, rather than the LR locations themselves. The server then solves the LP problem using the provided coefficients. Additionally, we enhance the LP framework with an exponential obfuscation mechanism to ensure that the obfuscation distribution is indistinguishable across multiple users. By leveraging the constraint structure of the LP formulation, we apply Benders' decomposition to further boost computational efficiency. Our theoretical analysis confirms that, even though geo-obfuscation is calculated independently for each user, it still adheres to geo-indistinguishability constraints across multiple users with high probability. Finally, experimental results using a real-world dataset demonstrate that LR-Geo outperforms existing geo-obfuscation methods in terms of computational time, data utility, and privacy protection. △ Less

Submitted 29 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.10080 [pdf, other]

Design and Optimization on Successive RIS-assisted Multi-hop Wireless Communications

Authors: Rujing Xiong, Jialong Lu, Jianan Zhang, Minggang Liu, Xuehui Dong, Tiebin Mi, Robert Caiming Qiu

Abstract: As an emerging wireless communication technology, reconfigurable intelligent surface (RIS) has become a basic choice for providing signal coverage services in scenarios with dense obstacles or long tunnels through multi-hop configurations. Conventional works of literature mainly focus on alternating optimization or single-beam calculation in RIS phase configuration, which is limited in considering… ▽ More As an emerging wireless communication technology, reconfigurable intelligent surface (RIS) has become a basic choice for providing signal coverage services in scenarios with dense obstacles or long tunnels through multi-hop configurations. Conventional works of literature mainly focus on alternating optimization or single-beam calculation in RIS phase configuration, which is limited in considering energy efficiency, and often suffers from inaccurate channel state information (CSI), poor convergence, and high computational complexity. This paper addresses the design and optimization challenges for successive RIS-assisted multi-hop systems. Specifically, we establish a general model for multi-hop communication based on the relationship between the input and output electric fields within each RIS. Meanwhile, we derive the half-power beamwidth of the RIS-reflected beams, considering the beam direction. Leveraging these models and derivations, we propose deployment optimization and beam optimization strategies for multi-hop systems, which feature high aperture efficiency and significant gains in signal power. Simulation and prototype experiment results validate the effectiveness and superiority of the proposed systems and methods. △ Less

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2406.16308 [pdf, other]

Anomaly Detection of Tabular Data Using LLMs

Authors: Aodong Li, Yunhan Zhao, Chen Qiu, Marius Kloft, Padhraic Smyth, Maja Rudolph, Stephan Mandt

Abstract: Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating thei… ▽ More Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidden outliers in a batch of data, demonstrating their ability to identify low-density data regions. For LLMs that are not well aligned with anomaly detection and frequently output factual errors, we apply simple yet effective data-generating processes to simulate synthetic batch-level anomaly detection datasets and propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies. Experiments on a large anomaly detection benchmark (ODDS) showcase i) GPT-4 has on-par performance with the state-of-the-art transductive learning-based anomaly detection methods and ii) the efficacy of our synthetic dataset and fine-tuning strategy in aligning LLMs to this task. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: accepted at the Anomaly Detection with Foundation Models workshop

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests. △ Less

Submitted 16 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.16181 [pdf, other]

Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling

Authors: Chunlin Qiu, Yiheng Duan, Lingchen Zhao, Qian Wang

Abstract: Transfer-based attacks craft adversarial examples utilizing a white-box surrogate model to compromise various black-box target models, posing significant threats to many real-world applications. However, existing transfer attacks suffer from either weak transferability or expensive computation. To bridge the gap, we propose a novel sample-based attack, named neighborhood conditional sampling (NCS)… ▽ More Transfer-based attacks craft adversarial examples utilizing a white-box surrogate model to compromise various black-box target models, posing significant threats to many real-world applications. However, existing transfer attacks suffer from either weak transferability or expensive computation. To bridge the gap, we propose a novel sample-based attack, named neighborhood conditional sampling (NCS), which enjoys high transferability with lightweight computation. Inspired by the observation that flat maxima result in better transferability, NCS is formulated as a max-min bi-level optimization problem to seek adversarial regions with high expected adversarial loss and small standard deviations. Specifically, due to the inner minimization problem being computationally intensive to resolve, and affecting the overall transferability, we propose a momentum-based previous gradient inversion approximation (PGIA) method to effectively solve the inner problem without any computation cost. In addition, we prove that two newly proposed attacks, which achieve flat maxima for better transferability, are actually specific cases of NCS under particular conditions. Extensive experiments demonstrate that NCS efficiently generates highly transferable adversarial examples, surpassing the current best method in transferability while requiring only 50% of the computational cost. Additionally, NCS can be seamlessly integrated with other methods to further enhance transferability. △ Less

Submitted 25 May, 2024; originally announced May 2024.

Comments: Under review

arXiv:2405.13699 [pdf, other]

Uncertainty-aware Evaluation of Auxiliary Anomalies with the Expected Anomaly Posterior

Authors: Lorenzo Perini, Maja Rudolph, Sabrina Schmedding, Chen Qiu

Abstract: Anomaly detection is the task of identifying examples that do not behave as expected. Because anomalies are rare and unexpected events, collecting real anomalous examples is often challenging in several applications. In addition, learning an anomaly detector with limited (or no) anomalies often yields poor prediction performance. One option is to employ auxiliary synthetic anomalies to improve the… ▽ More Anomaly detection is the task of identifying examples that do not behave as expected. Because anomalies are rare and unexpected events, collecting real anomalous examples is often challenging in several applications. In addition, learning an anomaly detector with limited (or no) anomalies often yields poor prediction performance. One option is to employ auxiliary synthetic anomalies to improve the model training. However, synthetic anomalies may be of poor quality: anomalies that are unrealistic or indistinguishable from normal samples may deteriorate the detector's performance. Unfortunately, no existing methods quantify the quality of auxiliary anomalies. We fill in this gap and propose the expected anomaly posterior (EAP), an uncertainty-based score function that measures the quality of auxiliary anomalies by quantifying the total uncertainty of an anomaly detector. Experimentally on 40 benchmark datasets of images and tabular data, we show that EAP outperforms 12 adapted data quality estimators in the majority of cases. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11541 [pdf, other]

R-NeRF: Neural Radiance Fields for Modeling RIS-enabled Wireless Environments

Authors: Huiying Yang, Zihan Jin, Chenhao Wu, Rujing Xiong, Robert Caiming Qiu, Zenan Ling

Abstract: Recently, ray tracing has gained renewed interest with the advent of Reflective Intelligent Surfaces (RIS) technology, a key enabler of 6G wireless communications due to its capability of intelligent manipulation of electromagnetic waves. However, accurately modeling RIS-enabled wireless environments poses significant challenges due to the complex variations caused by various environmental factors… ▽ More Recently, ray tracing has gained renewed interest with the advent of Reflective Intelligent Surfaces (RIS) technology, a key enabler of 6G wireless communications due to its capability of intelligent manipulation of electromagnetic waves. However, accurately modeling RIS-enabled wireless environments poses significant challenges due to the complex variations caused by various environmental factors and the mobility of RISs. In this paper, we propose a novel modeling approach using Neural Radiance Fields (NeRF) to characterize the dynamics of electromagnetic fields in such environments. Our method utilizes NeRF-based ray tracing to intuitively capture and visualize the complex dynamics of signal propagation, effectively modeling the complete signal pathways from the transmitter to the RIS, and from the RIS to the receiver. This two-stage process accurately characterizes multiple complex transmission paths, enhancing our understanding of signal behavior in real-world scenarios. Our approach predicts the signal field for any specified RIS placement and receiver location, facilitating efficient RIS deployment. Experimental evaluations using both simulated and real-world data validate the significant benefits of our methodology. △ Less

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.06967 [pdf, other]

Optimal Configuration of Reconfigurable Intelligent Surfaces With Non-uniform Phase Quantization

Authors: Jialong Lu, Rujing Xiong, Tiebin Mi, Ke Yin, Robert Caiming Qiu

Abstract: The existing methods for Reconfigurable Intelligent Surface (RIS) beamforming in wireless communication are typically limited to uniform phase quantization. However, in real world applications, the phase and bit resolution of RIS units are often non-uniform due to practical requirements and engineering challenges. To fill this research gap, we formulate an optimization problem for discrete non-uni… ▽ More The existing methods for Reconfigurable Intelligent Surface (RIS) beamforming in wireless communication are typically limited to uniform phase quantization. However, in real world applications, the phase and bit resolution of RIS units are often non-uniform due to practical requirements and engineering challenges. To fill this research gap, we formulate an optimization problem for discrete non-uniform phase configuration in RIS assisted multiple-input single-output (MISO) communications. Subsequently, a partition-and-traversal (PAT) algorithm is proposed to solve that, achieving the global optimal solution. The efficacy and superiority of the PAT algorithm are validated through numerical simulations, and the impact of non-uniform phase quantization on system performance is analyzed. △ Less

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06442 [pdf, other]

Optimal Beamforming of RIS-Aided Wireless Communications: An Alternating Inner Product Maximization Approach

Authors: Rujing Xiong, Tiebin Mi, Jialong Lu, Ke Yin, Kai Wan, Fuhai Wang, Robert Caiming Qiu

Abstract: This paper investigates a general discrete $\ell_p$-norm maximization problem, with the power enhancement at steering directions through reconfigurable intelligent surfaces (RISs) as an instance. We propose a mathematically concise iterative framework composed of alternating inner product maximizations, well-suited for addressing $\ell_1$- and $\ell_2$-norm maximizations with either discrete or co… ▽ More This paper investigates a general discrete $\ell_p$-norm maximization problem, with the power enhancement at steering directions through reconfigurable intelligent surfaces (RISs) as an instance. We propose a mathematically concise iterative framework composed of alternating inner product maximizations, well-suited for addressing $\ell_1$- and $\ell_2$-norm maximizations with either discrete or continuous uni-modular variable constraints. The iteration is proven to be monotonically non-decreasing. Moreover, this framework exhibits a distinctive capability to mitigate performance degradation due to discrete quantization, establishing it as the first post-rounding lifting approach applicable to any algorithm intended for the continuous solution. Additionally, as an integral component of the alternating iterations framework, we present a divide-and-sort (DaS) method to tackle the discrete inner product maximization problem. In the realm of $\ell_\infty$-norm maximization with discrete uni-modular constraints, the DaS ensures the identification of the global optimum with polynomial search complexity. We validate the effectiveness of the alternating inner product maximization framework in beamforming through RISs using both numerical experiments and field trials on prototypes. The results demonstrate that the proposed approach achieves higher power enhancement and outperforms other competitors. Finally, we show that discrete phase configurations with moderate quantization bits (e.g., 4-bit) exhibit comparable performance to continuous configurations in terms of power gains. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.04344 [pdf, other]

Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition

Authors: Chenxi Qiu

Abstract: Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear pr… ▽ More Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear programming (LP), which, however, might suffer from a polynomial explosion of decision variables, rendering it impractical in large-scale mDP. In this paper, our objective is to develop a new computation framework to enhance the scalability of the LP-based mDP. Considering the connections established by the mDP constraints among the secret records, we partition the original secret dataset into various subsets. Building upon the partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program to manage the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data in the road network/grid maps, text data, and synthetic data, underscore our proposed mechanism's superior scalability and efficiency. △ Less

Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: To be published in IJCAI 2024

arXiv:2405.01104 [pdf, other]

Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments

Authors: Ziqing Wang, Hongzheng Liu, Jianan Zhang, Rujing Xiong, Kai Wan, Xuewen Qian, Marco Di Renzo, Robert Caiming Qiu

Abstract: This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the BS aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we j… ▽ More This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the BS aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we jointly optimize the transmit beamforming at the base station (BS) and the end-to-end transmission matrix of the SIM, to minimize the Cramér-Rao Bound (CRB) for target estimation. Effective algorithms such as the alternating optimization (AO) and semidefinite relaxation (SDR) are employed to solve the non-convex SINR-constrained CRB minimization problem. Finally, we design and build an experimental platform for SIM, and evaluate the performance of the proposed algorithms for communication and sensing tasks. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.18109 [pdf, other]

Finding Beautiful and Happy Images for Mental Health and Well-being Applications

Authors: Ruitao Xie, Connor Qiu, Guoping Qiu

Abstract: This paper explores how artificial intelligence (AI) technology can contribute to achieve progress on good health and well-being, one of the United Nations' 17 Sustainable Development Goals. It is estimated that one in ten of the global population lived with a mental disorder. Inspired by studies showing that engaging and viewing beautiful natural images can make people feel happier and less stres… ▽ More This paper explores how artificial intelligence (AI) technology can contribute to achieve progress on good health and well-being, one of the United Nations' 17 Sustainable Development Goals. It is estimated that one in ten of the global population lived with a mental disorder. Inspired by studies showing that engaging and viewing beautiful natural images can make people feel happier and less stressful, lead to higher emotional well-being, and can even have therapeutic values, we explore how AI can help to promote mental health by developing automatic algorithms for finding beautiful and happy images. We first construct a large image database consisting of nearly 20K very high resolution colour photographs of natural scenes where each image is labelled with beautifulness and happiness scores by about 10 observers. Statistics of the database shows that there is a good correlation between the beautifulness and happiness scores which provides anecdotal evidence to corroborate that engaging beautiful natural images can potentially benefit mental well-being. Building on this unique database, the very first of its kind, we have developed a deep learning based model for automatically predicting the beautifulness and happiness scores of natural images. Experimental results are presented to show that it is possible to develop AI algorithms to automatically assess an image's beautifulness and happiness values which can in turn be used to develop applications for promoting mental health and well-being. △ Less

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.13348 [pdf, other]

Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems

Authors: Xiaofei Wang, Yunfeng Zhao, Chao Qiu, Qinghua Hu, Victor C. M. Leung

Abstract: Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI) has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) to become an exemplary solution for unleashing the full potential of AI services. Nonetheless, challenges in communication costs, resource allocation, privacy, and security continue to constrain its proficiency in sup… ▽ More Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI) has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) to become an exemplary solution for unleashing the full potential of AI services. Nonetheless, challenges in communication costs, resource allocation, privacy, and security continue to constrain its proficiency in supporting services with diverse requirements. In response to these issues, this paper introduces socialized learning (SL) as a promising solution, further propelling the advancement of EI. SL is a learning paradigm predicated on social principles and behaviors, aimed at amplifying the collaborative capacity and collective intelligence of agents within the EI system. SL not only enhances the system's adaptability but also optimizes communication, and networking processes, essential for distributed intelligence across diverse devices and platforms. Therefore, a combination of SL and EI may greatly facilitate the development of collaborative intelligence in the future network. This paper presents the findings of a literature review on the integration of EI and SL, summarizing the latest achievements in existing research on EI and SL. Subsequently, we delve comprehensively into the limitations of EI and how it could benefit from SL. Special emphasis is placed on the communication challenges and networking strategies and other aspects within these systems, underlining the role of optimized network solutions in improving system efficiency. Based on these discussions, we elaborate in detail on three integrated components: socialized architecture, socialized training, and socialized inference, analyzing their strengths and weaknesses. Finally, we identify some possible future applications of combining SL and EI, discuss open problems and suggest some future research. △ Less

Submitted 15 October, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by IEEE Communications Surveys and Tutorials

arXiv:2404.08916 [pdf, other]

Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Jun Li, Peiquan Jin

Abstract: Accurate segmentation of metastatic lymph nodes in rectal cancer is crucial for the staging and treatment of rectal cancer. However, existing segmentation approaches face challenges due to the absence of pixel-level annotated datasets tailored for lymph nodes around the rectum. Additionally, metastatic lymph nodes are characterized by their relatively small size, irregular shapes, and lower contra… ▽ More Accurate segmentation of metastatic lymph nodes in rectal cancer is crucial for the staging and treatment of rectal cancer. However, existing segmentation approaches face challenges due to the absence of pixel-level annotated datasets tailored for lymph nodes around the rectum. Additionally, metastatic lymph nodes are characterized by their relatively small size, irregular shapes, and lower contrast compared to the background, further complicating the segmentation task. To address these challenges, we present the first large-scale perirectal metastatic lymph node CT image dataset called Meply, which encompasses pixel-level annotations of 269 patients diagnosed with rectal cancer. Furthermore, we introduce a novel lymph-node segmentation model named CoSAM. The CoSAM utilizes sequence-based detection to guide the segmentation of metastatic lymph nodes in rectal cancer, contributing to improved localization performance for the segmentation model. It comprises three key components: sequence-based detection module, segmentation module, and collaborative convergence unit. To evaluate the effectiveness of CoSAM, we systematically compare its performance with several popular segmentation methods using the Meply dataset. Our code and dataset will be publicly available at: https://github.com/kanydao/CoSAM. △ Less

Submitted 13 April, 2024; originally announced April 2024.

Comments: 13 pages

arXiv:2404.07504 [pdf, other]

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

Authors: Yanhao Wu, Tong Zhang, Wei Ke, Congpei Qiu, Sabine Susstrunk, Mathieu Salzmann

Abstract: In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can create a tendency for neural networks to exploit these strong dependencies, bypassing the individual object patterns. To address this challenge, we i… ▽ More In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can create a tendency for neural networks to exploit these strong dependencies, bypassing the individual object patterns. To address this challenge, we introduce a novel self-supervised learning (SSL) strategy. Our approach leverages both object patterns and contextual cues to produce robust features. It begins with the formulation of an object-exchanging strategy, where pairs of objects with comparable sizes are exchanged across different scenes, effectively disentangling the strong contextual dependencies. Subsequently, we introduce a context-aware feature learning strategy, which encodes object patterns without relying on their specific context by aggregating object features across various scenes. Our extensive experiments demonstrate the superiority of our method over existing SSL techniques, further showing its better robustness to environmental changes. Moreover, we showcase the applicability of our approach by transferring pre-trained models to diverse point cloud datasets. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.00585 [pdf, other]

Decentralized Uncoded Storage Elastic Computing with Heterogeneous Computation Speeds

Authors: Wenbo Huang, Xudong You, Kai Wan, Robert Caiming Qiu, Mingyue Ji

Abstract: Elasticity plays an important role in modern cloud computing systems. Elastic computing allows virtual machines (i.e., computing nodes) to be preempted when high-priority jobs arise, and also allows new virtual machines to participate in the computation. In 2018, Yang et al. introduced Coded Storage Elastic Computing (CSEC) to address the elasticity using coding technology, with lower storage and… ▽ More Elasticity plays an important role in modern cloud computing systems. Elastic computing allows virtual machines (i.e., computing nodes) to be preempted when high-priority jobs arise, and also allows new virtual machines to participate in the computation. In 2018, Yang et al. introduced Coded Storage Elastic Computing (CSEC) to address the elasticity using coding technology, with lower storage and computation load requirements. However, CSEC is limited to certain types of computations (e.g., linear) due to the coded data storage based on linear coding. Then Centralized Uncoded Storage Elastic Computing (CUSEC) with heterogeneous computation speeds was proposed, which directly copies parts of data into the virtual machines. In all existing works in elastic computing, the storage assignment is centralized, meaning that the number and identity of all virtual machines possible used in the whole computation process are known during the storage assignment. In this paper, we consider Decentralized Uncoded Storage Elastic Computing (DUSEC) with heterogeneous computation speeds, where any available virtual machine can join the computation which is not predicted and thus coordination among different virtual machines' storage assignments is not allowed. Under a decentralized storage assignment originally proposed in coded caching by Maddah-Ali and Niesen, we propose a computing scheme with closed-form optimal computation time. We also run experiments over MNIST dataset with Softmax regression model through the Tencent cloud platform, and the experiment results demonstrate that the proposed DUSEC system approaches the state-of-art best storage assignment in the CUSEC system in computation time. △ Less

Submitted 1 March, 2024; originally announced March 2024.

Comments: 10 pages, 8 figures, submitted to ISIT2024

arXiv:2403.00258 [pdf, ps, other]

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

Authors: Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, Shiliang Pu, Robert C. Qiu, Zhenyu Liao

Abstract: Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, s… ▽ More Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, so that they can be deployed on low-power IoT devices. In this paper, building upon recent advances in neural tangent kernel (NTK) and random matrix theory (RMT), we provide a novel compression approach to wide and fully-connected \emph{deep} neural nets. Specifically, we demonstrate that in the high-dimensional regime where the number of data points $n$ and their dimension $p$ are both large, and under a Gaussian mixture model for the data, there exists \emph{asymptotic spectral equivalence} between the NTK matrices for a large family of DNN models. This theoretical result enables "lossless" compression of a given DNN to be performed, in the sense that the compressed network yields asymptotically the same NTK as the original (dense and unquantized) network, with its weights and activations taking values \emph{only} in $\{ 0, \pm 1 \}$ up to a scaling. Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme, with code available at \url{https://github.com/Model-Compression/Lossless_Compression}. △ Less

Submitted 29 February, 2024; originally announced March 2024.

Comments: 32 pages, 4 figures, and 2 tables. Fixing typos in Theorems 1 and 2 from NeurIPS 2022 proceeding (https://proceedings.neurips.cc/paper_files/paper/2022/hash/185087ea328b4f03ea8fd0c8aa96f747-Abstract-Conference.html)

arXiv:2402.18844 [pdf, other]

doi 10.1016/j.neucom.2024.128049

Deep learning for 3D human pose estimation and mesh recovery: A survey

Authors: Yang Liu, Changzhen Qiu, Zhiyong Zhang

Abstract: 3D human pose estimation and mesh recovery have attracted widespread research interest in many areas, such as computer vision, autonomous driving, and robotics. Deep learning on 3D human pose estimation and mesh recovery has recently thrived, with numerous methods proposed to address different problems in this area. In this paper, to stimulate future research, we present a comprehensive review of… ▽ More 3D human pose estimation and mesh recovery have attracted widespread research interest in many areas, such as computer vision, autonomous driving, and robotics. Deep learning on 3D human pose estimation and mesh recovery has recently thrived, with numerous methods proposed to address different problems in this area. In this paper, to stimulate future research, we present a comprehensive review of recent progress over the past five years in deep learning methods for this area by delving into over 200 references. To the best of our knowledge, this survey is arguably the first to comprehensively cover deep learning methods for 3D human pose estimation, including both single-person and multi-person approaches, as well as human mesh recovery, encompassing methods based on explicit models and implicit representations. We also present comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions. A regularly updated project page can be found at https://github.com/liuyangme/SOTA-3DHPE-HMR. △ Less

Submitted 2 July, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.02697 [pdf, ps, other]

Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

Authors: Zenan Ling, Longbo Li, Zhanbo Feng, Yixuan Zhang, Feng Zhou, Robert C. Qiu, Zhenyu Liao

Abstract: Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of… ▽ More Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs, when the input data are drawn from a high-dimensional Gaussian mixture. We prove, in this setting, that the spectral behavior of these Implicit-CKs and NTKs depend on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Despite derived here for Gaussian mixture data, empirical results show the proposed theory and design principle also apply to popular real-world datasets. △ Less

Submitted 19 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by ICML 2024

arXiv:2401.12133 [pdf, other]

VRMN-bD: A Multi-modal Natural Behavior Dataset of Immersive Human Fear Responses in VR Stand-up Interactive Games

Authors: He Zhang, Xinyang Li, Yuanxi Sun, Xinyi Fu, Christine Qiu, John M. Carroll

Abstract: Understanding and recognizing emotions are important and challenging issues in the metaverse era. Understanding, identifying, and predicting fear, which is one of the fundamental human emotions, in virtual reality (VR) environments plays an essential role in immersive game development, scene development, and next-generation virtual human-computer interaction applications. In this article, we used… ▽ More Understanding and recognizing emotions are important and challenging issues in the metaverse era. Understanding, identifying, and predicting fear, which is one of the fundamental human emotions, in virtual reality (VR) environments plays an essential role in immersive game development, scene development, and next-generation virtual human-computer interaction applications. In this article, we used VR horror games as a medium to analyze fear emotions by collecting multi-modal data (posture, audio, and physiological signals) from 23 players. We used an LSTM-based model to predict fear with accuracies of 65.31% and 90.47% under 6-level classification (no fear and five different levels of fear) and 2-level classification (no fear and fear), respectively. We constructed a multi-modal natural behavior dataset of immersive human fear responses (VRMN-bD) and compared it with existing relevant advanced datasets. The results show that our dataset has fewer limitations in terms of collection method, data scale and audience scope. We are unique and advanced in targeting multi-modal datasets of fear and behavior in VR stand-up interactive environments. Moreover, we discussed the implications of this work for communities and applications. The dataset and pre-trained model are available at https://github.com/KindOPSTAR/VRMN-bD. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted to IEEE VR 2024

arXiv:2312.16931 [pdf, other]

DeLR: Active Learning for Detection with Decoupled Localization and Recognition Query

Authors: Yuhang Zhang, Yuang Deng, Xiaopeng Zhang, Jie Li, Robert C. Qiu, Qi Tian

Abstract: Active learning has been demonstrated effective to reduce labeling cost, while most progress has been designed for image recognition, there still lacks instance-level active learning for object detection. In this paper, we rethink two key components, i.e., localization and recognition, for object detection, and find that the correctness of them are highly related, therefore, it is not necessary to… ▽ More Active learning has been demonstrated effective to reduce labeling cost, while most progress has been designed for image recognition, there still lacks instance-level active learning for object detection. In this paper, we rethink two key components, i.e., localization and recognition, for object detection, and find that the correctness of them are highly related, therefore, it is not necessary to annotate both boxes and classes if we are given pseudo annotations provided with the trained model. Motivated by this, we propose an efficient query strategy, termed as DeLR, that Decoupling the Localization and Recognition for active query. In this way, we are probably free of class annotations when the localization is correct, and able to assign the labeling budget for more informative samples. There are two main differences in DeLR: 1) Unlike previous methods mostly focus on image-level annotations, where the queried samples are selected and exhausted annotated. In DeLR, the query is based on region-level, and we only annotate the object region that is queried; 2) Instead of directly providing both localization and recognition annotations, we separately query the two components, and thus reduce the recognition budget with the pseudo class labels provided by the model. Experiments on several benchmarks demonstrate its superiority. We hope our proposed query strategy would shed light on researches in active learning in object detection. △ Less

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16418 [pdf, other]

Refining Latent Homophilic Structures over Heterophilic Graphs for Robust Graph Convolution Networks

Authors: Chenyang Qiu, Guoshun Nan, Tianyu Xiong, Wendi Deng, Di Wang, Zhiyang Teng, Lijuan Sun, Qimei Cui, Xiaofeng Tao

Abstract: Graph convolution networks (GCNs) are extensively utilized in various graph tasks to mine knowledge from spatial data. Our study marks the pioneering attempt to quantitatively investigate the GCN robustness over omnipresent heterophilic graphs for node classification. We uncover that the predominant vulnerability is caused by the structural out-of-distribution (OOD) issue. This finding motivates u… ▽ More Graph convolution networks (GCNs) are extensively utilized in various graph tasks to mine knowledge from spatial data. Our study marks the pioneering attempt to quantitatively investigate the GCN robustness over omnipresent heterophilic graphs for node classification. We uncover that the predominant vulnerability is caused by the structural out-of-distribution (OOD) issue. This finding motivates us to present a novel method that aims to harden GCNs by automatically learning Latent Homophilic Structures over heterophilic graphs. We term such a methodology as LHS. To elaborate, our initial step involves learning a latent structure by employing a novel self-expressive technique based on multi-node interactions. Subsequently, the structure is refined using a pairwisely constrained dual-view contrastive learning approach. We iteratively perform the above procedure, enabling a GCN model to aggregate information in a homophilic way on heterophilic graphs. Armed with such an adaptable structure, we can properly mitigate the structural OOD threats over heterophilic graphs. Experiments on various benchmarks show the effectiveness of the proposed LHS approach for robust GCNs. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: To be appeared in the proceedings of AAAI-2024

arXiv:2312.15582 [pdf, other]

Decoding Fear: Exploring User Experiences in Virtual Reality Horror Games

Authors: He Zhang, Xinyang Li, Christine Qiu, Xinyi Fu

Abstract: This preliminary study investigated user experiences in VR horror games, highlighting fear-triggering and gender-based differences in perception. By utilizing a scientifically validated and specially designed questionnaire, we successfully collected questionnaire data from 23 subjects for an early empirical study of fear induction in a virtual reality gaming environment. The early findings suggest… ▽ More This preliminary study investigated user experiences in VR horror games, highlighting fear-triggering and gender-based differences in perception. By utilizing a scientifically validated and specially designed questionnaire, we successfully collected questionnaire data from 23 subjects for an early empirical study of fear induction in a virtual reality gaming environment. The early findings suggest that visual restrictions and ambient sound-enhanced realism may be more effective in intensifying the fear experience. Participants exhibited a tendency to avoid playing alone or during nighttime, underscoring the significant psychological impact of VR horror games. The study also revealed a distinct gender difference in fear perception, with female participants exhibiting a higher sensitivity to fear stimuli. However, the preference for different types of horror games was not solely dominated by males; it varied depending on factors such as the game's pace, its objectives, and the nature of the fear stimulant. △ Less

Submitted 24 December, 2023; originally announced December 2023.

Comments: 9 pages, 4 figures, 7 tables, accepted by CHCHI2023

arXiv:2311.18433 [pdf, other]

E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning

Authors: Xiuhong Lin, Changjie Qiu, Zhipeng Cai, Siqi Shen, Yu Zang, Weiquan Liu, Xuesheng Bian, Matthias Müller, Cheng Wang

Abstract: Event cameras have emerged as a promising vision sensor in recent years due to their unparalleled temporal resolution and dynamic range. While registration of 2D RGB images to 3D point clouds is a long-standing problem in computer vision, no prior work studies 2D-3D registration for event cameras. To this end, we propose E2PNet, the first learning-based method for event-to-point cloud registration… ▽ More Event cameras have emerged as a promising vision sensor in recent years due to their unparalleled temporal resolution and dynamic range. While registration of 2D RGB images to 3D point clouds is a long-standing problem in computer vision, no prior work studies 2D-3D registration for event cameras. To this end, we propose E2PNet, the first learning-based method for event-to-point cloud registration. The core of E2PNet is a novel feature representation network called Event-Points-to-Tensor (EP2T), which encodes event data into a 2D grid-shaped feature tensor. This grid-shaped feature enables matured RGB-based frameworks to be easily used for event-to-point cloud registration, without changing hyper-parameters and the training procedure. EP2T treats the event input as spatio-temporal point clouds. Unlike standard 3D learning architectures that treat all dimensions of point clouds equally, the novel sampling and information aggregation modules in EP2T are designed to handle the inhomogeneity of the spatial and temporal dimensions. Experiments on the MVSEC and VECtor datasets demonstrate the superiority of E2PNet over hand-crafted and other learning-based methods. Compared to RGB-based registration, E2PNet is more robust to extreme illumination or fast motion due to the use of event data. Beyond 2D-3D registration, we also show the potential of EP2T for other vision tasks such as flow estimation, event-to-image reconstruction and object recognition. The source code can be found at: https://github.com/Xmu-qcj/E2PNet. △ Less

Submitted 27 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 10 pages, 4 figures, accepted by Thirty-seventh Conference on Neural Information Processing Systems(NeurIPS 2023)

arXiv:2311.11582 [pdf, other]

Asymptotic CRB Analysis of Random RIS-Assisted Large-Scale Localization Systems

Authors: Zhengyu Wang, Hongzheng Liu, Rujing Xiong, Fuhai Wang, Robert Caiming Qiu

Abstract: This paper studies the performance of a randomly RIS-assisted multi-target localization system, in which the configurations of the RIS are randomly set to avoid high-complexity optimization. We first focus on the scenario where the number of RIS elements is significantly large, and then obtain the scaling law of Cramér-Rao bound (CRB) under certain conditions, which shows that CRB decreases in the… ▽ More This paper studies the performance of a randomly RIS-assisted multi-target localization system, in which the configurations of the RIS are randomly set to avoid high-complexity optimization. We first focus on the scenario where the number of RIS elements is significantly large, and then obtain the scaling law of Cramér-Rao bound (CRB) under certain conditions, which shows that CRB decreases in the third or fourth order as the RIS dimension increases. Second, we extend our analysis to large systems where both the number of targets and sensors is substantial. Under this setting, we explore two common RIS models: the constant module model and the discrete amplitude model, and illustrate how the random RIS configuration impacts the value of CRB. Numerical results demonstrate that asymptotic formulas provide a good approximation to the exact CRB in the proposed randomly configured RIS systems. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.04686 [pdf, other]

Robust and Communication-Efficient Federated Domain Adaptation via Random Features

Authors: Zhanbo Feng, Yuanjie Wang, Jie Li, Fan Yang, Jiong Lou, Tiebin Mi, Robert. C. Qiu, Zhenyu Liao

Abstract: Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, fe… ▽ More Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge. Most existing FDA approaches typically focus on aligning the distributions between source and target domains by minimizing their (e.g., MMD) distance. Such strategies, however, inevitably introduce high communication overheads and can be highly sensitive to network reliability. In this paper, we introduce RF-TCA, an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol boasts communication complexity that is \emph{independent} of the sample size, while maintaining performance that is either comparable to or even surpasses state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance and robustness (to network condition) of FedRF-TCA. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: 21 pages

arXiv:2310.10461 [pdf, other]

Model Selection of Anomaly Detectors in the Absence of Labeled Validation Data

Authors: Clement Fung, Chen Qiu, Aodong Li, Maja Rudolph

Abstract: Anomaly detection is the task of identifying abnormal samples in large unlabeled datasets. While the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the absence of labeled validation data -- without it, their detection performance cannot be evaluated reliably. In this work, we propose SWSA (Selection With Synt… ▽ More Anomaly detection is the task of identifying abnormal samples in large unlabeled datasets. While the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the absence of labeled validation data -- without it, their detection performance cannot be evaluated reliably. In this work, we propose SWSA (Selection With Synthetic Anomalies): a general-purpose framework to select image-based anomaly detectors without labeled validation data. Instead of collecting labeled validation data, we generate synthetic anomalies without any training or fine-tuning, using only a small support set of normal images. Our synthetic anomalies are used to create detection tasks that compose a validation framework for model selection. In an empirical study, we evaluate SWSA with three types of synthetic anomalies and on two selection tasks: model selection of image-based anomaly detectors and prompt selection for CLIP-based anomaly detection. SWSA often selects models and prompts that match selections made with a ground-truth validation set, outperforming baseline selection strategies. △ Less

Submitted 16 September, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 14 pages

arXiv:2310.09114 [pdf, other]

Timestamp-supervised Wearable-based Activity Segmentation and Recognition with Contrastive Learning and Order-Preserving Optimal Transport

Authors: Songpengcheng Xia, Lei Chu, Ling Pei, Jiarui Yang, Wenxian Yu, Robert C. Qiu

Abstract: Human activity recognition (HAR) with wearables is one of the serviceable technologies in ubiquitous and mobile computing applications. The sliding-window scheme is widely adopted while suffering from the multi-class windows problem. As a result, there is a growing focus on joint segmentation and recognition with deep-learning methods, aiming at simultaneously dealing with HAR and time-series segm… ▽ More Human activity recognition (HAR) with wearables is one of the serviceable technologies in ubiquitous and mobile computing applications. The sliding-window scheme is widely adopted while suffering from the multi-class windows problem. As a result, there is a growing focus on joint segmentation and recognition with deep-learning methods, aiming at simultaneously dealing with HAR and time-series segmentation issues. However, obtaining the full activity annotations of wearable data sequences is resource-intensive or time-consuming, while unsupervised methods yield poor performance. To address these challenges, we propose a novel method for joint activity segmentation and recognition with timestamp supervision, in which only a single annotated sample is needed in each activity segment. However, the limited information of sparse annotations exacerbates the gap between recognition and segmentation tasks, leading to sub-optimal model performance. Therefore, the prototypes are estimated by class-activation maps to form a sample-to-prototype contrast module for well-structured embeddings. Moreover, with the optimal transport theory, our approach generates the sample-level pseudo-labels that take advantage of unlabeled data between timestamp annotations for further performance improvement. Comprehensive experiments on four public HAR datasets demonstrate that our model trained with timestamp supervision is superior to the state-of-the-art weakly-supervised methods and achieves comparable performance to the fully-supervised approaches. △ Less

Submitted 13 October, 2023; originally announced October 2023.

Comments: Under Review (submitted to IEEE TMC)

arXiv:2310.07990 [pdf]

Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information f… ▽ More Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed methods achieved R^2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research. △ Less

Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 19 pages, 3 figures

arXiv:2310.06123 [pdf, other]

Text-driven Prompt Generation for Vision-Language Models in Federated Learning

Authors: Chen Qiu, Xingyu Li, Chaithanya Kumar Mummadi, Madan Ravi Ganesh, Zhenzhen Li, Lu Peng, Wan-Yi Lin

Abstract: Prompt learning for vision-language models, e.g., CoOp, has shown great success in adapting CLIP to different downstream tasks, making it a promising solution for federated learning due to computational reasons. Existing prompt learning techniques replace hand-crafted text prompts with learned vectors that offer improvements on seen classes, but struggle to generalize to unseen classes. Our work a… ▽ More Prompt learning for vision-language models, e.g., CoOp, has shown great success in adapting CLIP to different downstream tasks, making it a promising solution for federated learning due to computational reasons. Existing prompt learning techniques replace hand-crafted text prompts with learned vectors that offer improvements on seen classes, but struggle to generalize to unseen classes. Our work addresses this challenge by proposing Federated Text-driven Prompt Generation (FedTPG), which learns a unified prompt generation network across multiple remote clients in a scalable manner. The prompt generation network is conditioned on task-related text input, thus is context-aware, making it suitable to generalize for both seen and unseen classes. Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method is superior to existing federated prompt learning methods, that achieve overall better generalization on both seen and unseen classes and is also generalizable to unseen datasets. △ Less

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.09276 [pdf, other]

MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification

Authors: Junjie Zhu, Yiying Li, Chunping Qiu, Ke Yang, Naiyang Guan, Xiaodong Yi

Abstract: Vision Transformer (ViT) models have recently emerged as powerful and versatile models for various visual tasks. Recently, a work called PMF has achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models. However, PMF employs full fine-tuning for learning the downstream tasks, leading to significant overfitting and storage issues, especially in t… ▽ More Vision Transformer (ViT) models have recently emerged as powerful and versatile models for various visual tasks. Recently, a work called PMF has achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models. However, PMF employs full fine-tuning for learning the downstream tasks, leading to significant overfitting and storage issues, especially in the remote sensing domain. In order to tackle these issues, we turn to the recently proposed parameter-efficient tuning methods, such as VPT, which updates only the newly added prompt parameters while keeping the pre-trained backbone frozen. Inspired by VPT, we propose the Meta Visual Prompt Tuning (MVP) method. Specifically, we integrate the VPT method into the meta-learning framework and tailor it to the remote sensing domain, resulting in an efficient framework for Few-Shot Remote Sensing Scene Classification (FS-RSSC). Furthermore, we introduce a novel data augmentation strategy based on patch embedding recombination to enhance the representation and diversity of scenes for classification purposes. Experiment results on the FS-RSSC benchmark demonstrate the superior performance of the proposed MVP over existing methods in various settings, such as various-way-various-shot, various-way-one-shot, and cross-domain adaptation. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: SUBMIT TO IEEE TRANSACTIONS

arXiv:2308.16425 [pdf, other]

On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

Authors: Zenan Ling, Zhenyu Liao, Robert C. Qiu

Abstract: Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this,… ▽ More Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions. △ Less

Submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted by Workshop on High-dimensional Learning Dynamics, ICML 2023, Honolulu, Hawaii

arXiv:2308.15854 [pdf, other]

Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models

Authors: Zhanbo Feng, Zenan Ling, Ci Gong, Feng Zhou, Jie Li, Robert C. Qiu

Abstract: Denoising diffusion models have shown outstanding performance in image editing. Existing works tend to use either image-guided methods, which provide a visual reference but lack control over semantic coherence, or text-guided methods, which ensure faithfulness to text guidance but lack visual quality. To address the problem, we propose the Zero-shot Inversion Process (ZIP), a framework that inject… ▽ More Denoising diffusion models have shown outstanding performance in image editing. Existing works tend to use either image-guided methods, which provide a visual reference but lack control over semantic coherence, or text-guided methods, which ensure faithfulness to text guidance but lack visual quality. To address the problem, we propose the Zero-shot Inversion Process (ZIP), a framework that injects a fusion of generated visual reference and text guidance into the semantic latent space of a \textit{frozen} pre-trained diffusion model. Only using a tiny neural network, the proposed ZIP produces diverse content and attributes under the intuitive control of the text prompt. Moreover, ZIP shows remarkable robustness for both in-domain and out-of-domain attribute manipulation on real images. We perform detailed experiments on various benchmark datasets. Compared to state-of-the-art methods, ZIP produces images of equivalent quality while providing a realistic editing effect. △ Less

Submitted 10 October, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.08283 [pdf, other]

CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Authors: Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

Abstract: Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in perfo… ▽ More Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer. Additionally, a major obstacle is the lack of a large-scale, finely annotated CT image dataset for rectal cancer segmentation. To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum, which serves as a valuable resource for algorithm research and clinical application development. Moreover, we propose a novel medical cancer lesion segmentation benchmark model named U-SAM. The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information. U-SAM contains three key components: promptable information (e.g., points) to aid in target area localization, a convolution module for capturing low-level lesion details, and skip-connections to preserve and recover spatial information during the encoding-decoding process. To evaluate the effectiveness of U-SAM, we systematically compare its performance with several popular segmentation methods on the CARE dataset. The generalization of the model is further verified on the WORD dataset. Extensive experiments demonstrate that the proposed U-SAM outperforms state-of-the-art methods on these two datasets. These experiments can serve as the baseline for future research and clinical application development. △ Less

Submitted 16 August, 2023; originally announced August 2023.

Comments: 8 pages

arXiv:2307.11079 [pdf, other]

doi 10.1145/3580305.3599238

3D-IDS: Doubly Disentangled Dynamic Intrusion Detection

Authors: Chenyang Qiu, Yingsheng Geng, Junrui Lu, Kaida Chen, Shitong Zhu, Ya Su, Guoshun Nan, Can Zhang, Junsong Fu, Qimei Cui, Xiaofeng Tao

Abstract: Network-based intrusion detection system (NIDS) monitors network traffic for malicious activities, forming the frontline defense against increasing attacks over information infrastructures. Although promising, our quantitative analysis shows that existing methods perform inconsistently in declaring various unknown attacks (e.g., 9% and 35% F1 respectively for two distinct unknown threats for an SV… ▽ More Network-based intrusion detection system (NIDS) monitors network traffic for malicious activities, forming the frontline defense against increasing attacks over information infrastructures. Although promising, our quantitative analysis shows that existing methods perform inconsistently in declaring various unknown attacks (e.g., 9% and 35% F1 respectively for two distinct unknown threats for an SVM-based method) or detecting diverse known attacks (e.g., 31% F1 for the Backdoor and 93% F1 for DDoS by a GCN-based state-of-the-art method), and reveals that the underlying cause is entangled distributions of flow features. This motivates us to propose 3D-IDS, a novel method that aims to tackle the above issues through two-step feature disentanglements and a dynamic graph diffusion scheme. Specifically, we first disentangle traffic features by a non-parameterized optimization based on mutual information, automatically differentiating tens and hundreds of complex features of various attacks. Such differentiated features will be fed into a memory model to generate representations, which are further disentangled to highlight the attack-specific features. Finally, we use a novel graph diffusion method that dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams. By doing so, we can effectively identify various attacks in encrypted traffics, including unknown threats and known ones that are not easily detected. Experiments show the superiority of our 3D-IDS. We also demonstrate that our two-step feature disentanglements benefit the explainability of NIDS. △ Less

Submitted 1 July, 2023; originally announced July 2023.

Comments: Accepted and appeared in the proceedings of the KDD 2023 Research Track

arXiv:2307.01441 [pdf, other]

A Generic Multi-Player Transformation Algorithm for Solving Large-Scale Zero-Sum Extensive-Form Adversarial Team Games

Authors: Chen Qiu, Yulin Wu, Weixin Huang, Botao Liu, Shaohuai Shi, Xuan Wang

Abstract: Many recent practical and theoretical breakthroughs focus on adversarial team multi-player games (ATMGs) in ex ante correlation scenarios. In this setting, team members are allowed to coordinate their strategies only before the game starts. Although there existing algorithms for solving extensive-form ATMGs, the size of the game tree generated by the previous algorithms grows exponentially with th… ▽ More Many recent practical and theoretical breakthroughs focus on adversarial team multi-player games (ATMGs) in ex ante correlation scenarios. In this setting, team members are allowed to coordinate their strategies only before the game starts. Although there existing algorithms for solving extensive-form ATMGs, the size of the game tree generated by the previous algorithms grows exponentially with the number of players. Therefore, how to deal with large-scale zero-sum extensive-form ATMGs problems close to the real world is still a significant challenge. In this paper, we propose a generic multi-player transformation algorithm, which can transform any multi-player game tree satisfying the definition of AMTGs into a 2-player game tree, such that finding a team-maxmin equilibrium with correlation (TMECor) in large-scale ATMGs can be transformed into solving NE in 2-player games. To achieve this goal, we first introduce a new structure named private information pre-branch, which consists of a temporary chance node and coordinator nodes and aims to make decisions for all potential private information on behalf of the team members. We also show theoretically that NE in the transformed 2-player game is equivalent TMECor in the original multi-player game. This work significantly reduces the growth of action space and nodes from exponential to constant level. This enables our work to outperform all the previous state-of-the-art algorithms in finding a TMECor, with 182.89, 168.47, 694.44, and 233.98 significant improvements in the different Kuhn Poker and Leduc Poker cases (21K3, 21K4, 21K6 and 21L33). In addition, this work first practically solves the ATMGs in a 5-player case which cannot be conducted by existing algorithms. △ Less

Submitted 3 July, 2023; originally announced July 2023.

Comments: 9 pages, 5 figures, NIPS 2023

arXiv:2306.14102 [pdf, ps, other]

Intelligent Reflecting Surface Empowered Self-Interference Cancellation in Full-Duplex Systems

Authors: Chi Qiu, Meng Hua, Qingqing Wu, Wen Chen, Shaodan Ma, Fen Hou, Derrick Wing Kwan Ng, A. Lee Swindlehurst

Abstract: Compared with traditional half-duplex wireless systems, the application of emerging full-duplex (FD) technology can potentially double the system capacity theoretically. However, conventional techniques for suppressing self-interference (SI) adopted in FD systems require exceedingly high power consumption and expensive hardware. In this paper, we consider employing an intelligent reflecting surfac… ▽ More Compared with traditional half-duplex wireless systems, the application of emerging full-duplex (FD) technology can potentially double the system capacity theoretically. However, conventional techniques for suppressing self-interference (SI) adopted in FD systems require exceedingly high power consumption and expensive hardware. In this paper, we consider employing an intelligent reflecting surface (IRS) in the proximity of an FD base station (BS) to mitigate SI for simultaneously receiving data from uplink users and transmitting information to downlink users. The objective considered is to maximize the weighted sum-rate of the system by jointly optimizing the IRS phase shifts, the BS transmit beamformers, and the transmit power of the uplink users. To visualize the role of the IRS in SI cancellation by isolating other interference, we first study a simple scenario with one downlink user and one uplink user. To address the formulated non-convex problem, a low-complexity algorithm based on successive convex approximation is proposed. For the more general case considering multiple downlink and uplink users, an efficient alternating optimization algorithm based on element-wise optimization is proposed. Numerical results demonstrate that the FD system with the proposed schemes can achieve a larger gain over the half-duplex system, and the IRS is able to achieve a balance between suppressing SI and providing beamforming gain. △ Less

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2306.00544 [pdf, other]

Codebook Configuration for RIS-aided Systems via Implicit Neural Representations

Authors: Huiying Yang, Rujing Xiong, Yao Xiao, Zhijie Fan, Tiebin Mi, Robert Caiming Qiu, Zenan Ling

Abstract: Reconfigurable Intelligent Surface (RIS) is envisioned to be an enabling technique in 6G wireless communications. By configuring the reflection beamforming codebook, RIS focuses signals on target receivers to enhance signal strength. In this paper, we investigate the codebook configuration for RIS-aided communication systems. We formulate an implicit relationship between user's coordinates informa… ▽ More Reconfigurable Intelligent Surface (RIS) is envisioned to be an enabling technique in 6G wireless communications. By configuring the reflection beamforming codebook, RIS focuses signals on target receivers to enhance signal strength. In this paper, we investigate the codebook configuration for RIS-aided communication systems. We formulate an implicit relationship between user's coordinates information and the codebook from the perspective of signal radiation mechanisms, and introduce a novel learning-based method, implicit neural representations (INRs), to solve this implicit coordinates-to-codebook mapping problem. Our approach requires only user's coordinates, avoiding reliance on channel models. Additionally, given the significant practical applications of the 1-bit RIS, we formulate the 1-bit codebook configuration as a multi-label classification problem, and propose an encoding strategy for 1-bit RIS to reduce the codebook dimension, thereby improving learning efficiency. Experimental results from simulations and measured data demonstrate significant advantages of our method. △ Less

Submitted 28 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2306.00303 [pdf, other]

Sea Ice Extraction via Remote Sensed Imagery: Algorithms, Datasets, Applications and Challenges

Authors: Anzhu Yu, Wenjun Huang, Qing Xu, Qun Sun, Wenyue Guo, Song Ji, Bowei Wen, Chunping Qiu

Abstract: The deep learning, which is a dominating technique in artificial intelligence, has completely changed the image understanding over the past decade. As a consequence, the sea ice extraction (SIE) problem has reached a new era. We present a comprehensive review of four important aspects of SIE, including algorithms, datasets, applications, and the future trends. Our review focuses on researches publ… ▽ More The deep learning, which is a dominating technique in artificial intelligence, has completely changed the image understanding over the past decade. As a consequence, the sea ice extraction (SIE) problem has reached a new era. We present a comprehensive review of four important aspects of SIE, including algorithms, datasets, applications, and the future trends. Our review focuses on researches published from 2016 to the present, with a specific focus on deep learning-based approaches in the last five years. We divided all relegated algorithms into 3 categories, including classical image segmentation approach, machine learning-based approach and deep learning-based methods. We reviewed the accessible ice datasets including SAR-based datasets, the optical-based datasets and others. The applications are presented in 4 aspects including climate research, navigation, geographic information systems (GIS) production and others. It also provides insightful observations and inspiring future research directions. △ Less

Submitted 31 May, 2023; originally announced June 2023.

Comments: 24 pages, 6 figures

arXiv:2305.06603 [pdf, other]

Realistic Safety-critical Scenarios Search for Autonomous Driving System via Behavior Tree

Authors: Ping Zhang, Lingfeng Ming, Tingyi Yuan, Cong Qiu, Yang Li, Xinhua Hui, Zhiquan Zhang, Chao Huang

Abstract: The simulation-based testing of Autonomous Driving Systems (ADSs) has gained significant attention. However, current approaches often fall short of accurately assessing ADSs for two reasons: over-reliance on expert knowledge and the utilization of simplistic evaluation metrics. That leads to discrepancies between simulated scenarios and naturalistic driving environments. To address this, we propos… ▽ More The simulation-based testing of Autonomous Driving Systems (ADSs) has gained significant attention. However, current approaches often fall short of accurately assessing ADSs for two reasons: over-reliance on expert knowledge and the utilization of simplistic evaluation metrics. That leads to discrepancies between simulated scenarios and naturalistic driving environments. To address this, we propose the Matrix-Fuzzer, a behavior tree-based testing framework, to automatically generate realistic safety-critical test scenarios. Our approach involves the $log2BT$ method, which abstracts logged road-users' trajectories to behavior sequences. Furthermore, we vary the properties of behaviors from real-world driving distributions and then use an adaptive algorithm to explore the input space. Meanwhile, we design a general evaluation engine that guides the algorithm toward critical areas, thus reducing the generation of invalid scenarios. Our approach is demonstrated in our Matrix Simulator. The experimental results show that: (1) Our $log2BT$ achieves satisfactory trajectory reconstructions. (2) Our approach is able to find the most types of safety-critical scenarios, but only generating around 30% of the total scenarios compared with the baseline algorithm. Specifically, it improves the ratio of the critical violations to total scenarios and the ratio of the types to total scenarios by at least 10x and 5x, respectively, while reducing the ratio of the invalid scenarios to total scenarios by at least 58% in two case studies. △ Less

Submitted 11 May, 2023; originally announced May 2023.

Showing 1–50 of 151 results for author: Qiu, C