Search | arXiv e-print repository

Probing band topology in ABAB and ABBA stacked twisted double bilayer graphene

Authors: Jundong Zhu, Le Liu, Yalong Yuan, Jinwei Dong, Yanbang Chu, Luojun Du, Kenji Watanabe, Takashi Taniguchi, Jianpeng Liu, Quansheng Wu, Dongxia Shi, Wei Yang, Guangyu Zhang

Abstract: Twisted graphene moire superlattices have been demonstrated to be an exotic platform for investigating correlated states and nontrivial topology. Among the moire family, twisted double bilayer graphene (TDBG) is a tunable flat-band system expected to show stacking-dependent topological properties. However, electron correlations and band topology are usually intertwined in the flat-band limit, leaving the topological properties unique to each stacking elusive. Focusing on large-angle TDBG with weak electron correlations, here we probe the Landau level (LL) spectra in two differently stacked TDBG, i.e. ABBA- and ABAB-TDBG, to unveil their distinct topological properties. For ABBA-TDBG, we observe nontrivial topology at zero electric displacement field, evident from both the emergence of Chern bands from half fillings and the closure of the gap at the charge neutrality point (CNP) above a critical magnetic field. For ABAB-TDBG, by contrast, we find that the moire band is topologically trivial, supported by the absence of LLs from half fillings and the persistence of the gap at the CNP above the critical magnetic field. In addition, we observe the evolution of a trivial-to-nontrivial topological transition at finite displacement field D, confirmed by the emergence of Landau fans originating from quarter filling ν = 1. Our results demonstrate, for the first time, the unique stacking-dependent topology in TDBG, offering a promising avenue for future investigations of topological states in correlated systems.
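Fan origins and slopes of this kind are commonly read off with the Diophantine (Středa-type) relation for gapped states in a moire superlattice, shown here as generic background rather than the authors' specific fitting procedure. In it, $n$ is the carrier density, $n_0$ the density corresponding to one carrier per moire unit cell, $\phi$ the magnetic flux per moire cell, $\phi_0 = h/e$ the flux quantum, $C$ the fan slope (Chern number), and $s$ the filling from which the fan emanates:

$$\frac{n}{n_0} = C\,\frac{\phi}{\phi_0} + s$$

With four-fold (spin and valley) degenerate moire bands, half filling corresponds to $s = \pm 2$ and quarter filling to $s = \pm 1$, so fans with $C \neq 0$ starting at $s = \pm 2$ or $s = 1$ are the kind of features the abstract describes.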

Submitted 17 September, 2024; originally announced September 2024.

Comments: 12 pages, 5 figures. Comments are welcome

arXiv:2409.10197 [pdf, other]

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

Authors: Weihao Ye, Qiong Wu, Wenhao Lin, Yiyi Zhou

Abstract: Recent progress in Multimodal Large Language Models (MLLMs) often relies on a large number of image tokens to compensate for the visual shortcomings of MLLMs, which not only introduces obvious redundancy but also greatly exacerbates the already high computational cost. Token pruning is an effective solution for speeding up MLLMs, but when and how to drop tokens remains a challenge. In this paper, we propose a novel, training-free approach for effective visual token pruning of MLLMs, termed FitPrune, which can quickly produce a complete pruning recipe for an MLLM according to a predefined budget. Specifically, FitPrune treats token pruning as a statistical problem of the MLLM, and its objective is to find an optimal pruning scheme that minimizes the divergence of the attention distributions before and after pruning. In practice, FitPrune can be computed quickly from the attention statistics of a small batch of inference data, avoiding expensive trials of MLLMs. Following the pruning recipe, an MLLM can directly remove the redundant visual tokens of different examples during inference. To validate FitPrune, we apply it to a set of recent MLLMs, including LLaVA-1.5, LLaVA-HR and LLaVA-NEXT, and conduct extensive experiments on a set of benchmarks. The experimental results show that FitPrune can greatly reduce computational complexity while retaining high performance, e.g., -54.9% FLOPs for LLaVA-NEXT with only a 0.5% accuracy drop. Notably, the pruning recipe can be obtained in about 5 minutes.
Our code is available at https://github.com/ywh187/FitPrune.
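To make the pruning objective concrete, here is a deliberately simplified, single-distribution sketch in Python (NumPy). It is not FitPrune's procedure: the input `attn` (rows of attention from query tokens over visual tokens), the function name, and the averaging step are assumptions for illustration. If the kept set $S$ receives attention mass $P(S)$ and the pruned distribution is the original one renormalized over $S$, its KL divergence from the original is $-\log P(S)$, so under this toy objective the best budget-$k$ set is simply the $k$ tokens with the most attention.

```python
import numpy as np


def prune_by_attention_mass(attn: np.ndarray, budget: int):
    """Toy pruning rule: keep the `budget` visual tokens that receive the most attention.

    attn   -- hypothetical array of shape (num_queries, num_visual_tokens);
              each row is a softmax distribution over the visual tokens.
    budget -- number of visual tokens to keep.

    Returns the sorted indices of the kept tokens and the divergence
    KL(q || p) = -log P(S), where p is the averaged attention distribution
    and q is p restricted to the kept set S and renormalized.
    """
    p = attn.mean(axis=0)
    p = p / p.sum()                                 # renormalize to a probability vector
    order = np.argsort(p)[::-1]                     # token indices by descending attention
    keep = np.sort(order[:budget])
    divergence = -np.log(p[keep].sum())
    return keep, divergence


attn = np.array([[0.5, 0.1, 0.3, 0.1],
                 [0.4, 0.2, 0.3, 0.1]])
keep, div = prune_by_attention_mass(attn, budget=2)
print(keep.tolist(), round(float(div), 4))          # [0, 2] 0.2877
```

Tokens 0 and 2 carry 75% of the averaged attention here, so the divergence is $-\log 0.75$; a real method would also have to decide how the budget is split across layers, which this sketch ignores.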

Submitted 16 September, 2024; originally announced September 2024.

arXiv:2409.09468 [pdf, other]

High-Energy and Ultra-High-Energy Neutrinos from Primordial Black Holes

Authors: Quan-feng Wu, Xun-Jie Xu

Abstract: Primordial Black Holes (PBHs) are capable of emitting extremely energetic particles independent of their interactions with the Standard Model. In this work, we investigate a particularly interesting scenario in which PBHs evaporating in the early universe may be responsible for some of the observed high-energy neutrinos above the TeV or PeV scale in the present universe. We compute the energy spectrum of neutrinos directly emitted by PBHs with a monochromatic mass function and estimate the wash-out point, which determines the maximum energy of the spectrum. We find that the spectrum generally extends to high energies following a power law of $E_\nu^{-3}$ until it reaches the wash-out point, which crucially depends on the PBH mass. For PBHs of $10^{13}$ grams, the spectrum can extend up to the PeV scale, though the flux is too low for detection. We also consider an indirect production mechanism involving dark particles that are emitted by PBHs and decay into neutrinos at a much later epoch. This mechanism allows lighter PBHs (such as those in the gram to kilogram range) to produce more energetic neutrino fluxes without being washed out by the thermal plasma in the early universe. In this scenario, we find that ultra-high-energy neutrinos around or above the EeV scale can be generated, with fluxes high enough to be detected by current and future high-energy neutrino observatories such as IceCube and GRAND.

Submitted 14 September, 2024; originally announced September 2024.

Comments: 23 pages, 4 figures

arXiv:2409.09111 [pdf, other]

Neural Message Passing Induced by Energy-Constrained Diffusion

Authors: Qitian Wu, David Wipf, Junchi Yan

Abstract: Learning representations for structured data with certain geometries (observed or unobserved) is a fundamental challenge, wherein message passing neural networks (MPNNs) have become a de facto class of model solutions. In this paper, we propose an energy-constrained diffusion model as a principled, interpretable framework for understanding the mechanism of MPNNs and navigating novel architectural designs. The model, inspired by physical systems, combines the inductive bias of diffusion on manifolds with layer-wise constraints of energy minimization. Our analysis shows that the diffusion operators have a one-to-one correspondence with the energy functions implicitly descended by the diffusion process, and that the finite-difference iteration for solving the energy-constrained diffusion system induces the propagation layers of various types of MPNNs operating on observed or latent structures. Building on these findings, we devise a new class of neural message passing models, dubbed diffusion-inspired Transformers, whose global attention layers are induced by the principled energy-constrained diffusion. Across diverse datasets ranging from real-world networks to images and physical particles, we show that the new model yields promising performance when the data structures are observed (as a graph), partially observed, or completely unobserved.
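The claim that a finite-difference iteration of a diffusion system yields message-passing layers can be illustrated with a single explicit-Euler step. The sketch below is a generic example, not the paper's model: it assumes a row-stochastic diffusivity $S$ built from a softmax over pairwise dot products (an attention-like choice made here for illustration), and the step size `tau` is a hypothetical parameter. Because each row of $S$ sums to one, $\sum_j S_{ij}(x_j - x_i) = (Sx)_i - x_i$, so one step is $x^{(k+1)} = (1-\tau)x^{(k)} + \tau S x^{(k)}$; with $\tau = 1$ it reduces to a single attention-style aggregation.

```python
import numpy as np


def diffusivity(x: np.ndarray) -> np.ndarray:
    """Row-stochastic coupling S_ij: softmax over pairwise dot products (illustrative choice)."""
    scores = x @ x.T
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def diffusion_step(x: np.ndarray, tau: float) -> np.ndarray:
    """One explicit-Euler step of dx_i/dt = sum_j S_ij (x_j - x_i)."""
    s = diffusivity(x)
    return x + tau * (s @ x - x)      # rows of S sum to 1, so s @ x - x is the diffusion term


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))           # 5 nodes (or tokens), 3 features each
x_next = diffusion_step(x, tau=0.5)
```

With $0 < \tau \le 1$ each updated feature is a convex combination of the current ones, which is the smoothing behavior expected from a diffusion process.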

Submitted 13 September, 2024; originally announced September 2024.

Comments: Extended version of the DIFFormer paper (ICLR 2023). arXiv admin note: text overlap with arXiv:2301.09474

arXiv:2409.09007 [pdf, other]

SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity

Authors: Qitian Wu, Kai Yang, Hengrui Zhang, David Wipf, Junchi Yan

Abstract: Learning representations on large graphs is a long-standing challenge due to the interdependent nature of the data. Transformers have recently shown promising performance on small graphs thanks to their global attention, which captures all-pair interactions beyond observed structures. Existing approaches tend to inherit the spirit of Transformers in language and vision tasks and embrace complicated architectures by stacking deep attention-based propagation layers. In this paper, we attempt to evaluate the necessity of adopting multi-layer attention in Transformers on graphs, which considerably restricts efficiency. Specifically, we analyze a generic hybrid propagation layer, composed of all-pair attention and graph-based propagation, and show that multi-layer propagation can be reduced to one-layer propagation with the same capability for representation learning. This suggests a new technical path for building powerful and efficient Transformers on graphs, particularly by simplifying model architectures without sacrificing expressiveness. As exemplified by this work, we propose a Simplified Single-layer Graph Transformer (SGFormer), whose main component is a single-layer global attention that scales linearly w.r.t. graph size and requires no approximation to accommodate all-pair interactions. Empirically, SGFormer successfully scales to the web-scale graph ogbn-papers100M, yields orders-of-magnitude inference acceleration over peer Transformers on medium-sized graphs, and remains competitive with limited labeled data.
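Linear scaling in graph size hinges on never materializing the $N \times N$ attention matrix. A standard way to do this, shown here as a generic kernelized (linear) attention sketch and not SGFormer's exact normalization, is to reassociate $(QK^\top)V = Q(K^\top V)$, which costs $O(Nd^2)$ instead of $O(N^2 d)$. The sketch assumes non-negative query and key features so that the row normalizer stays positive; the function names are illustrative.

```python
import numpy as np


def quadratic_attention(q, k, v):
    """Reference version: builds the N x N score matrix, O(N^2 d)."""
    scores = q @ k.T                                   # (N, N); non-negative for non-negative q, k
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def linear_attention(q, k, v):
    """Same output in O(N d^2): compute K^T V and the key sum once, reuse them for every query."""
    kv = k.T @ v                                       # (d, d_v)
    k_sum = k.sum(axis=0)                              # (d,); q_i . k_sum is row i's sum of Q K^T
    return (q @ kv) / (q @ k_sum)[:, None]


rng = np.random.default_rng(0)
n, d = 1000, 16
q, k, v = rng.random((n, d)), rng.random((n, d)), rng.normal(size=(n, d))
assert np.allclose(quadratic_attention(q, k, v), linear_attention(q, k, v))
```

The quadratic version is kept only as a correctness check; at web scale only the linear form is feasible.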

Submitted 13 September, 2024; originally announced September 2024.

Comments: Extended version of NeurIPS2023 contribution arXiv:2306.10759

arXiv:2409.07964 [pdf, other]

WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

Authors: Jingwen Tong, Jiawei Shao, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang

Abstract: Wireless networks are increasingly facing challenges due to their expanding scale and complexity. These challenges underscore the need for advanced AI-driven strategies, particularly in the upcoming 6G networks. In this article, we introduce WirelessAgent, a novel approach leveraging large language models (LLMs) to develop AI agents capable of managing complex tasks in wireless networks. It can effectively improve network performance through advanced reasoning, multimodal data processing, and autonomous decision making. Thereafter, we demonstrate the practical applicability and benefits of WirelessAgent for network slicing management. The experimental results show that WirelessAgent is capable of accurately understanding user intent, effectively allocating slice resources, and consistently maintaining optimal performance.

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.07946 [pdf, ps, other]

Collaborative Automatic Modulation Classification via Deep Edge Inference for Hierarchical Cognitive Radio Networks

Authors: Chaowei He, Peihao Dong, Fuhui Zhou, Qihui Wu

Abstract: In hierarchical cognitive radio networks, edge or cloud servers use the data collected by edge devices for modulation classification, which, however, faces problems of transmission overhead, data privacy, and computation load. In this article, an edge learning (EL) based framework that jointly mobilizes the edge device and the edge server for intelligent co-inference is proposed to realize collaborative automatic modulation classification (C-AMC) between them. A spectrum semantic compression neural network (SSCNet) with a lightweight structure is designed for the edge device to compress the collected raw data into a compact semantic message, which is then sent to the edge server via the wireless channel. On the edge server side, a modulation classification neural network (MCNet) combining bidirectional long short-term memory (Bi-LSTM) and multi-head attention layers is designed to determine the modulation type from the noisy semantic message. By leveraging the computation resources of both the edge device and the edge server, high transmission overhead and risks of data privacy leakage are avoided. The simulation results verify the effectiveness of the proposed C-AMC framework, which significantly reduces model size and computational complexity.

Submitted 14 September, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

Comments: arXiv admin note: text overlap with arXiv:2407.20772

arXiv:2409.07816 [pdf]

Unveiling the 5$f$ electron hybridization process in UPd$_2$Al$_3$ via ARPES and Time-resolved PES

Authors: Jiao-Jiao Song, Qi-Yi Wu, Chen Zhang, Steve M. Gilbertson, Peter S. Riseborough, Jan Rusz, John J. Joyce, Kevin S. Graham, Clifford G. Olson, Paul H. Tobash, Eric D. Bauer, Bo Chen, Hao Liu, Yu-Xia Duan, Peter M. Oppeneer, George Rodriguez, Tomasz Durakiewicz, Jian-Qiao Meng

Abstract: This study investigates the hybridization process between 5$f$ electrons and conduction electrons in the heavy fermion superconductor UPd$_2$Al$_3$ using a combination of angle-resolved photoemission spectroscopy (ARPES) and time-resolved photoemission spectroscopy (tr-PES). ARPES measurements reveal the formation of a hybridization gap at approximately 75 K, which becomes more pronounced as the temperature decreases. Notably, the persistence of a flat U 5$f$ band at temperatures well above the hybridization onset challenges conventional understanding. We also find a non-monotonic temperature dependence of the quasiparticle relaxation time, with an anomalous decrease at 20 K, suggesting complex electronic and magnetic interactions. These findings provide detailed insights into the 5$f$-electron hybridization process in UPd$_2$Al$_3$, with significant implications for understanding heavy fermion superconductivity and the role of 5$f$-electron hybridization in uranium-based materials.

Submitted 12 September, 2024; originally announced September 2024.

Comments: 5 pages, 4 figures

arXiv:2409.07152 [pdf, ps, other]

Weighted bounds for a class of singular integral operators in variable exponent Herz-Morrey spaces

Authors: Yanqi Yang, Qi Wu

Abstract: Let $T$ be the singular integral operator with variable kernel defined by $Tf(x)= \mathrm{p.v.} \int_{\mathbb{R}^{n}}K(x,x-y)f(y)\,\mathrm{d}y$, and let $D^\gamma$ $(0\leq\gamma\leq1)$ be the fractional differentiation operator, where $K(x,z)=\frac{\Omega(x,z')}{|z|^{n}}$, $z'=\frac{z}{|z|}$, $z\neq0$. Let $T^{\ast}$ and $T^\sharp$ be the adjoint of $T$ and the pseudo-adjoint of $T$, respectively. In this paper, via the expansion in spherical harmonics and estimates of the convolution operators $T_{m,j}$, we prove some boundedness results for $TD^\gamma-D^\gamma T$ and $(T^{\ast}-T^{\sharp})D^\gamma$ on a class of generalized Herz-Morrey spaces with weight and variable exponent, under natural regularity assumptions on the exponent function; these results extend some known results. Moreover, various norm characterizations for the product $T_{1}T_{2}$ and the pseudo-product $T_{1}\circ T_{2}$ are also established.
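For readers unfamiliar with the spherical-harmonic method, the classical Calderón–Zygmund reduction behind it can be summarized as follows. It is stated informally, under the usual assumptions that $\Omega(x,\cdot)$ has mean zero on $S^{n-1}$ (which removes the $m = 0$ term) and is smooth enough for the series to converge; $\{Y_{m,j}\}_{j=1}^{D_m}$ is a real orthonormal basis of spherical harmonics of degree $m$ and $D_m$ is the dimension of that space. The precise hypotheses used in the paper may differ.

$$\Omega(x,z') = \sum_{m=1}^{\infty}\sum_{j=1}^{D_m} a_{m,j}(x)\,Y_{m,j}(z'), \qquad a_{m,j}(x) = \int_{S^{n-1}} \Omega(x,z')\,Y_{m,j}(z')\,\mathrm{d}\sigma(z'),$$

$$Tf(x) = \sum_{m=1}^{\infty}\sum_{j=1}^{D_m} a_{m,j}(x)\,T_{m,j}f(x), \qquad T_{m,j}f(x) = \mathrm{p.v.}\int_{\mathbb{R}^{n}} \frac{Y_{m,j}\big((x-y)'\big)}{|x-y|^{n}}\,f(y)\,\mathrm{d}y.$$

Each $T_{m,j}$ is a convolution operator, so the $x$-dependence of the kernel is isolated in the coefficients $a_{m,j}(x)$; bounds for $T$ then follow from uniform bounds on the $T_{m,j}$ combined with decay of the $a_{m,j}$ in $m$.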

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.06733 [pdf, ps, other]

Is the type-D NUT C-metric really "missing" from the most general Plebański-Demiański solution?

Authors: Shuang-Qing Wu, Di Wu

Abstract: It has remained a long-standing problem in the General Relativity (GR) community, unsettled for almost two decades, ever since Griffiths and Podolsky demonstrated in Ref. [J.B. Griffiths and J. Podolsky, Class. Quant. Grav. 22, 3467 (2005)] that the type-D NUT C-metric seems to be absent from the most general family of the type-D Plebański-Demiański (P-D) solution. However, Astorino [Phys. Rev. D 109, 084038 (2024)] presented a different form of rotating and accelerating black holes and showed that all known four-dimensional type-D accelerating black holes (without the NUT charge) can be recovered via various limits in a definitive fashion. In particular, he provided, for the first time, the correct expressions for the type-D static accelerating black holes with a nonzero NUT charge, which had previously been impossible using the traditional parametrization of the familiar P-D solution. Nevertheless, how these two different forms of the four-dimensional rotating and accelerating solutions are related still remains elusive. In this paper, we aim to fill this gap by finding explicit coordinate transformations and parameter identifications between the vacuum metrics obtained from two different parameterizations of the solution generated via the inverse scattering method from the seed metric, the Rindler vacuum background. We then resolve this "missing" puzzle by providing another Möbius transformation and linear combinations of the Killing coordinates, which clearly cast the type-D NUT C-metric into the familiar form of the P-D solution.
Additionally, we propose an alternative new routine for normalizing the metric derived via the inverse scattering method from the vacuum seed solution, which could be useful for constructing higher-dimensional solutions using the trivial vacuum background as the seed metric.

Submitted 8 September, 2024; originally announced September 2024.

Comments: 16 pages, no figures, revtex4-1. While our submission of August 27 (submit/5815878) was on hold at arXiv, arXiv:2409.02308 appeared, partially overlapping with the subject of our paper

arXiv:2409.06539 [pdf, other]

Exploring the nature of $Y(4230)$ and $Y(4360)$ in B decays

Authors: Ming-Zhu Liu, Qi Wu

Abstract: Vector charmonium states can be directly produced in $e^+e^{-}$ annihilation. Among them, $Y(4230)$ and $Y(4360)$, which split from the previously discovered $Y(4260)$, are not easily accommodated in the conventional charmonium spectrum, while recent studies indicate that they couple strongly to $D\bar{D}_1$ and $D^*\bar{D}_1$. In this work, we investigate the production of $Y(4230)$ and $Y(4360)$ as heavy quark spin symmetry doublet hadronic molecules of $D\bar{D}_1$ and $D^*\bar{D}_1$ in $B$ decays via the triangle diagram mechanism. In particular, we propose that the decay constants of $Y(4230)$ and $Y(4360)$ extracted in $B$ decays can help clarify their nature.

Submitted 11 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: Typos corrected

arXiv:2409.05508 [pdf, other]

A general reduced-order neural operator for spatio-temporal predictive learning on complex spatial domains

Authors: Qinglu Meng, Yingguang Li, Zhiliang Deng, Xu Liu, Gengxiang Chen, Qiutong Wu, Changqing Liu, Xiaozhong Hao

Abstract: Predictive learning for spatio-temporal processes (PL-STP) on complex spatial domains plays a critical role in various scientific and engineering fields, its essence being the construction of operators between infinite-dimensional function spaces. This paper focuses on unequal-domain mappings in PL-STP and categorizes them into increase-domain and decrease-domain mappings. Recent advances in deep learning have revealed the great potential of neural operators (NOs) to learn operators directly from observational data. However, existing NOs require the input and output spaces to share the same domain, which poses challenges for ensuring predictive accuracy and stability for unequal-domain mappings. To this end, this study presents a general reduced-order neural operator named Reduced-Order Neural Operator on Riemannian Manifolds (RO-NORM), which consists of two parts: the unequal-domain encoder/decoder and the same-domain approximator. Motivated by the separation of variables in classical modal decomposition, the unequal-domain encoder/decoder uses pre-computed bases to reformulate the spatio-temporal function as a sum of products between spatial (or temporal) bases and the corresponding temporally (or spatially) distributed weight functions, so that the original unequal-domain mapping is converted into a same-domain mapping. The same-domain approximator NORM is then applied to model the transformed mapping.
The performance of the proposed method is evaluated on six benchmark cases, including parametric PDEs and engineering and biomedical applications, and compared with four baseline algorithms: DeepONet, POD-DeepONet, PCA-Net, and vanilla NORM. The experimental results demonstrate the superiority of RO-NORM in prediction accuracy and training efficiency for PL-STP.
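The variable-separation idea can be illustrated with classical proper orthogonal decomposition (POD), one common way to obtain pre-computed bases. Whether RO-NORM uses POD specifically, and how its encoder/decoder is parameterized, is not stated in the abstract, so treat this as a generic sketch with hypothetical function names. Spatial bases $\phi_k(x)$ are taken from the SVD of a snapshot matrix, and the field is written as $u(x,t) \approx \sum_k \phi_k(x)\,a_k(t)$, turning a function on the spatial domain into a small set of time-dependent weights.

```python
import numpy as np


def pod_bases(snapshots: np.ndarray, rank: int) -> np.ndarray:
    """Spatial POD bases from a snapshot matrix of shape (num_points, num_times)."""
    u, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return u[:, :rank]                          # orthonormal columns phi_k(x)


def encode(field: np.ndarray, bases: np.ndarray) -> np.ndarray:
    """Temporal weights a_k(t) = <phi_k, u(., t)>, shape (rank, num_times)."""
    return bases.T @ field


def decode(weights: np.ndarray, bases: np.ndarray) -> np.ndarray:
    """Reconstruction u(x, t) ~= sum_k phi_k(x) a_k(t), shape (num_points, num_times)."""
    return bases @ weights


x = np.linspace(0.0, 1.0, 50)                   # spatial grid
t = np.linspace(0.0, 1.0, 20)                   # time grid
field = np.outer(np.sin(np.pi * x), np.cos(t)) + np.outer(np.sin(2 * np.pi * x), t**2)

bases = pod_bases(field, rank=2)
weights = encode(field, bases)                  # 50 spatial values per step -> 2 weights per step
```

Because this toy field is exactly rank 2, two bases reconstruct it to floating-point accuracy; in an unequal-domain setting the encoder and decoder would use bases defined on different domains.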

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05462 [pdf, ps, other]

Federated Transfer Learning Based Cooperative Wideband Spectrum Sensing with Model Pruning

Authors: Jibin Jia, Peihao Dong, Fuhui Zhou, Qihui Wu

Abstract: For ultra-wideband and high-rate wireless communication systems, wideband spectrum sensing (WSS) is critical, since it enables secondary users (SUs) to capture spectrum holes for opportunistic transmission. However, WSS faces challenges such as excessive hardware and computation costs due to the high sampling rate, as well as robustness issues arising from scenario mismatch. In this paper, a WSS neural network (WSSNet) is proposed that exploits multicoset preprocessing to enable sub-Nyquist sampling, with a two-dimensional convolution design specifically tailored to the preprocessed samples. A federated transfer learning (FTL) based framework mobilizing multiple SUs is further developed to achieve a robust model adaptable to various scenarios, supported by selective weight pruning for fast model adaptation and inference. Simulation results demonstrate that the proposed FTL-WSSNet achieves fairly good performance in different target scenarios even without local adaptation samples.
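Multicoset sampling, the preprocessing named above, can be sketched in a few lines: of every block of $L$ Nyquist-rate sample slots, only the slots at a fixed set of offsets (cosets) $c_1, \dots, c_p$ are kept, giving $p$ low-rate streams $y_i[m] = x[mL + c_i]$ and an average rate of $p/L$ times the Nyquist rate. The function name and coset choice below are illustrative assumptions; the abstract does not specify how WSSNet selects its cosets or arranges the samples for its 2D convolutions.

```python
import numpy as np


def multicoset_sample(x: np.ndarray, period: int, cosets) -> np.ndarray:
    """Return one low-rate stream per coset: y[i, m] = x[m * period + cosets[i]]."""
    num_blocks = len(x) // period
    blocks = x[: num_blocks * period].reshape(num_blocks, period)   # blocks[m, c] = x[m*period + c]
    return blocks[:, list(cosets)].T                                # shape (len(cosets), num_blocks)


nyquist_samples = np.arange(20)            # stand-in for a Nyquist-rate signal
streams = multicoset_sample(nyquist_samples, period=5, cosets=[0, 3])
print(streams.tolist())                    # [[0, 5, 10, 15], [3, 8, 13, 18]]
```

Here $p = 2$ of $L = 5$ slots are kept, so the average sampling rate is 40% of the Nyquist rate; in hardware each stream would come from its own slow ADC with a fixed time offset.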

Submitted 13 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05243 [pdf, other]

Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations

Authors: Xinran Li, Xiaomao Fan, Qingyang Wu, Xiaojiang Peng, Ye Li

Abstract: Emotion Recognition in Conversations (ERC) is a vital area within multimodal interaction research, dedicated to accurately identifying and classifying the emotions expressed by speakers throughout a conversation. Traditional ERC approaches predominantly rely on unimodal cues (such as text, audio, or visual data), which limits their effectiveness. These methods face two significant challenges: 1) Consistency of multimodal information. Before integrating various modalities, it is crucial to ensure that the data from different sources are aligned and coherent. 2) Capture of contextual information. Successfully fusing multimodal features requires a keen understanding of the evolving emotional tone, especially in lengthy dialogues where emotions may shift and develop over time. To address these limitations, we propose a novel Mamba-enhanced Text-Audio-Video alignment network (MaTAV) for the ERC task. MaTAV aligns unimodal features to ensure consistency across modalities and handles long input sequences to better capture contextual multimodal information. Extensive experiments on the MELD and IEMOCAP datasets demonstrate that MaTAV outperforms existing state-of-the-art methods on the ERC task by a large margin.

Submitted 8 September, 2024; originally announced September 2024.

arXiv:2409.04763 [pdf]

Chalcogenide Metasurfaces Enabling Ultra-Wideband Detectors from Visible to Mid-infrared

Authors: Shutao Zhang, Shu An, Mingjin Dai, Qing Yang Steve Wu, Nur Qalishah Adanan, Jun Zhang, Yan Liu, Henry Yit Loong Lee, Nancy Lai Mun Wong, Ady Suwardi, Jun Ding, Robert Edward Simpson, Qi Jie Wang, Joel K. W. Yang, Zhaogang Dong

Abstract: Thermoelectric materials can be designed to support optical resonances across multiple spectral ranges to enable ultra-wideband photodetection. For instance, the chalcogenide antimony telluride (Sb2Te3) exhibits interband plasmonic resonances in the visible range and Mie resonances in the mid-infrared (mid-IR) range, while simultaneously possessing large thermoelectric Seebeck coefficients. In this paper, we designed and fabricated Sb2Te3 metasurface devices that achieve resonant absorption, enabling photodetectors that operate across an ultra-wideband spectrum from the visible to the mid-IR. Furthermore, using asymmetric Sb2Te3 metasurfaces, we demonstrated thermoelectric photodetectors with polarization selectivity. This work provides a potential platform toward portable, room-temperature ultra-wideband spectrometers for environmental sensing applications.

Submitted 7 September, 2024; originally announced September 2024.

arXiv:2409.03447 [pdf, ps, other]

Some negative answers to the Bergelson-Hindman's question

Authors: Qinqi Wu

Abstract: Let $p_1,\dots,p_d$ be integral polynomials vanishing at $0$. Bergelson and Hindman asked whether, whenever $A$ is large, the set $\{(m,n)\in \mathbb{N}^2:m+p_1(n),m+p_2(n),\dots,m+p_d(n)\in A\}$ is large in the same sense. In this paper, we give negative answers to this question when ``large'' refers to the notions ``central*'', ``IP*'', ``IP$_n$*'', ``IP$_{<ω}$*'' and ``$Δ$*''.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.03354 [pdf, other]

Few-Shot Continual Learning for Activity Recognition in Classroom Surveillance Images

Authors: Yilei Qian, Kanglei Geng, Kailong Chen, Shaoxu Cheng, Linfeng Xu, Hongliang Li, Fanman Meng, Qingbo Wu

Abstract: The application of activity recognition in the "AI + Education" field is gaining increasing attention. However, current work mainly focuses on recognizing activities in manually captured videos and a limited number of activity types, with little attention given to recognizing activities in surveillance images from real classrooms. In real classroom settings, normal teaching activities, such as reading, account for a large proportion of samples, while rare non-teaching activities, such as eating, continue to appear. This requires a model that can learn non-teaching activities from few samples without forgetting the normal teaching activities, i.e., few-shot continual learning (FSCL) capability. To address this gap, we constructed a continual learning dataset focused on activity recognition in classroom surveillance images, called ARIC (Activity Recognition in Classroom). The dataset has advantages such as multiple perspectives, a wide variety of activities, and real-world scenarios, but it also presents challenges such as similar activities and imbalanced sample distributions. To overcome these challenges, we designed a few-shot continual learning method that combines supervised contrastive learning (SCL) and an adaptive covariance classifier (ACC). During the base phase, we proposed an SCL approach based on feature augmentation to enhance the model's generalization ability. In the incremental phase, we employed an ACC to more accurately describe the distribution of new classes.
Experimental results demonstrate that our method outperforms other existing methods on the ARIC dataset.

Submitted 5 September, 2024; originally announced September 2024.
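The abstract names an adaptive covariance classifier (ACC) for new classes but does not define it. The sketch below is an illustrative stand-in, not the authors' ACC: each class is modeled by its mean and a covariance shrunk toward a base-session covariance with a hypothetical weight n / (n + k), and each query goes to the class at the smallest Mahalanobis distance. The names `fit_class` and `mahalanobis_predict`, the shrinkage rule, and the toy data are all assumptions.

```python
import numpy as np


def fit_class(features, base_cov, k=10.0):
    """Class mean plus a covariance shrunk toward the base-session covariance.

    Hypothetical rule: weight n / (n + k) on the class's own sample covariance,
    so rare classes with few shots lean more on the base covariance.
    """
    n = features.shape[0]
    mean = features.mean(axis=0)
    sample_cov = np.cov(features, rowvar=False) if n > 1 else np.zeros_like(base_cov)
    alpha = n / (n + k)
    return mean, alpha * sample_cov + (1.0 - alpha) * base_cov


def mahalanobis_predict(x, classes, ridge=1e-3):
    """Label each row of x with the class at the smallest Mahalanobis distance."""
    dists = []
    for mean, cov in classes:
        precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
        diff = x - mean
        dists.append(np.einsum("ij,jk,ik->i", diff, precision, diff))
    return np.argmin(np.stack(dists, axis=1), axis=1)


rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))                     # plentiful base-class features
base_cov = np.cov(base, rowvar=False)
old_class = fit_class(base, base_cov)
shots = rng.normal(loc=[6.0, 6.0], size=(5, 2))      # a rare class seen only 5 times
new_class = fit_class(shots, base_cov)
queries = np.array([[0.2, -0.1], [6.1, 5.8]])
labels = mahalanobis_predict(queries, [old_class, new_class])
print(labels)  # [0 1]
```

Shrinking toward the base covariance keeps a five-shot class from producing a singular or unstable covariance estimate, which is the few-shot difficulty the abstract points to.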

arXiv:2409.03294 [pdf, other]

Federated Prototype-based Contrastive Learning for Privacy-Preserving Cross-domain Recommendation

Authors: Li Wang, Quangui Zhang, Lei Sang, Qiang Wu, Min Xu

Abstract: Cross-domain recommendation (CDR) aims to improve recommendation accuracy in sparse domains by transferring knowledge from data-rich domains. However, existing CDR methods often assume the availability of user-item interaction data across domains, overlooking user privacy concerns. Furthermore, these methods suffer from performance degradation in scenarios with sparse overlapping users, as they typically depend on a large number of fully shared users for effective knowledge transfer. To address these challenges, we propose a Federated Prototype-based Contrastive Learning (CL) method for Privacy-Preserving CDR, named FedPCL-CDR. This approach utilizes non-overlapping user information and prototypes to improve multi-domain performance while protecting user privacy. FedPCL-CDR comprises two modules: local domain (client) learning and global server aggregation. In the local domain, FedPCL-CDR clusters all user data to learn representative prototypes, effectively utilizing non-overlapping user information and addressing the sparse overlapping user issue. It then facilitates knowledge transfer by employing both local and global prototypes returned from the server in a CL manner. Simultaneously, the global server aggregates representative prototypes from local domains to learn both local and global prototypes. The combination of prototypes and federated learning (FL) ensures that sensitive user data remains decentralized, with only prototypes being shared across domains, thereby protecting user privacy. Extensive experiments on four CDR tasks using two real-world datasets demonstrate that FedPCL-CDR outperforms the state-of-the-art baselines.

Submitted 5 September, 2024; originally announced September 2024.
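A minimal NumPy sketch of the prototype idea described above, under stated assumptions: each client clusters its own user embeddings with k-means and uploads only the cluster centres; the server re-clusters the uploaded centres into global prototypes (a hypothetical aggregation rule, since the abstract does not specify one); and an InfoNCE-style loss pulls each user embedding toward its nearest global prototype. `kmeans`, `proto_nce`, and the synthetic embeddings are illustrative, not FedPCL-CDR's code.

```python
import numpy as np


def kmeans(x, k, iters=20):
    """k-means with deterministic farthest-first initialisation; returns centres."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers


def proto_nce(z, protos, tau=0.1, target=None):
    """InfoNCE-style loss over cosine similarities to the prototypes.

    The positive for each embedding is its nearest prototype unless `target`
    supplies explicit prototype indices.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = z @ p.T / tau
    if target is None:
        target = logits.argmax(axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(z)), target].mean())


rng = np.random.default_rng(0)
dirs = 5.0 * np.eye(3)                               # three latent user "tastes"
clients = [dirs[rng.integers(0, 3, 200)] + 0.3 * rng.normal(size=(200, 3))
           for _ in range(2)]                        # two domains (clients)
local = [kmeans(u, k=3) for u in clients]            # runs on each client
global_protos = kmeans(np.vstack(local), k=3)        # server sees prototypes only
loss = proto_nce(clients[0], global_protos)
```

Only the `local` centre arrays cross the client boundary in this sketch; raw embeddings such as `clients[0]` never reach the server.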

arXiv:2409.02469 [pdf, other]

UAV-Mounted Movable Antenna: Joint Optimization of UAV Placement and Antenna Configuration

Authors: Xiao-Wei Tang, Yunmei Shi, Yi Huang, Qingqing Wu

Abstract: Recently, movable antennas (MAs) have garnered immense attention due to their capability to favorably alter channel conditions through agile movement. In this letter, we delve into a spectrum sharing system enabled by unmanned aerial vehicle (UAV) mounted MAs, thereby introducing a vertical degree of freedom in addition to the horizontal local mobility of MAs. Our objective is to maximize the minimum beamforming gain for secondary users (SUs) while ensuring that interference to the primary users (PUs) remains below a predefined threshold, which necessitates a joint optimization involving the UAV's height, the antenna weight vector (AWV), and the antenna position vector (APV). However, the formulated optimization problem is non-convex and challenging to solve optimally. To tackle this issue, we propose an alternating optimization algorithm that optimizes the UAV's height, APV and AWV in an iterative manner, thus yielding a near-optimal solution. Numerical results demonstrate the superiority of the proposed scheme as well as its ability to deliver full beamforming gain to SUs with reduced computational complexity.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02167 [pdf, other]

Machine Learning-based Search of High-redshift Quasars

Authors: Guangping Ye, Huanian Zhang, Qingwen Wu

Abstract: We present a machine learning search for high-redshift ($5.0 < z < 6.5$) quasars using the combined photometric data from the DESI Imaging Legacy Surveys and the WISE survey. We explore the imputation of missing values for high-redshift quasars, discuss feature selection, compare different machine learning algorithms, and investigate the choice of class ensemble for the training sample, finding that the random forest model is very effective in separating high-redshift quasars from various contaminants. The 11-class random forest model achieves a precision of $96.43\%$ and a recall of $91.53\%$ for high-redshift quasars on the test set. We demonstrate that the completeness of the high-redshift quasars can reach as high as $82.20\%$. The final catalog consists of 216,949 high-redshift quasar candidates, including 476 high-probability ones, in the entire Legacy Surveys DR9 footprint, and we make the catalog publicly available. Using MUSE and DESI-EDR public spectra, we find that 14 true high-redshift quasars (11 in the training sample) out of 21 candidates are correctly identified for MUSE, and 20 true high-redshift quasars (11 in the training sample) out of 21 candidates are correctly identified for DESI-EDR. Additionally, we estimate photometric redshifts for the high-redshift quasar candidates using a random forest regression model with high precision.

Submitted 3 September, 2024; originally announced September 2024.

Comments: To appear in ApJS, 26 pages, 10 figures and 11 tables
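For reference, the metrics quoted above follow the usual definitions: precision = TP/(TP+FP) and recall = TP/(TP+FN). The short sketch below restates them and combines the reported test-set precision and recall into an F1 score; that F1 value is derived arithmetic, not a figure reported by the authors. It also expresses the spectroscopic checks as confirmed fractions of the 21 candidates. The helper names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


# Test-set values quoted in the abstract for the 11-class random forest.
f1 = f1_score(0.9643, 0.9153)
print(round(f1, 4))                           # 0.9392 (derived, not reported)

# Share of the 21 spectroscopically covered candidates confirmed as true quasars.
print(round(14 / 21, 3), round(20 / 21, 3))   # 0.667 0.952
```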

arXiv:2409.00892 [pdf, other]

Multistage Robust Average Randomized Spectral Risk Optimization

Authors: Qiong Wu, Huifu Xu, Harry Zheng

Abstract: In this paper, we revisit the multistage spectral risk minimization models proposed by Philpott et al.~\cite{PdF13} and Guigues and Römisch \cite{GuR12}, but with a new focus. We consider a situation where the decision maker's (DM's) risk preferences may be state-dependent or even inconsistent at some states, so that no single deterministic spectral risk measure (SRM) can represent the DM's preferences at each stage. We adopt the recently introduced average randomized SRM (ARSRM) (in \cite{li2022randomization}) to describe the DM's overall risk preference at each stage. To solve the resulting multistage ARSRM (MARSRM) problem, we apply the well-known stochastic dual dynamic programming (SDDP) method, which generates a sequence of lower and upper bounds in an iterative manner. Under some moderate conditions, we prove that the optimal solution can be found in a finite number of iterations. The MARSRM model generalizes the one-stage ARSRM and simplifies the existing multistage state-dependent preference robust model \cite{liu2021multistage}, while also encompassing the mainstream multistage risk-neutral and risk-averse optimization models \cite{GuR12,PdF13}. In the absence of complete information on the probability distribution of the DM's random preferences, we propose to use distributionally robust ARSRM (DR-ARSRM) to describe the DM's preferences at each stage. We detail computational schemes for solving both MARSRM and DR-MARSRM. Finally, we examine the performance of MARSRM and DR-MARSRM by applying them to an asset allocation problem with transaction costs and compare them with standard risk-neutral and risk-averse multistage linear stochastic programming (MLSP) models.

Submitted 1 September, 2024; originally announced September 2024.

Comments: 33 pages, 4 figures and 3 tables
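To make the ARSRM idea above concrete, here is a single-stage sketch for an empirical loss distribution with n equally likely outcomes. An SRM with risk spectrum $\phi$ is $\rho_\phi(X) = \int_0^1 F_X^{-1}(t)\,\phi(t)\,dt$; for sorted losses this becomes a weighted sum. As an assumption for illustration, the DM's random spectrum is a CVaR spectrum whose level $\beta$ is drawn from a finite set, and the ARSRM is taken as the probability-weighted average of the resulting SRMs. The multistage structure and SDDP are not modeled, and the function names are hypothetical.

```python
import numpy as np


def cvar_spectrum_weights(n, beta):
    """Mass of the CVaR_beta spectrum on each quantile interval [(i-1)/n, i/n].

    phi(t) = 1/(1-beta) for t >= beta, else 0: nonnegative, nondecreasing and
    integrating to 1, i.e. an admissible risk spectrum (requires beta < 1).
    """
    edges = np.linspace(0.0, 1.0, n + 1)
    return np.diff(np.maximum(edges, beta)) / (1.0 - beta)


def spectral_risk(losses, weights):
    """SRM of n equally likely losses: sum_i x_(i) * (spectrum mass on interval i)."""
    return float(np.sort(losses) @ weights)


def average_randomized_srm(losses, betas, probs):
    """Assumed ARSRM form: expectation of the SRM over a random CVaR level."""
    n = len(losses)
    return sum(p * spectral_risk(losses, cvar_spectrum_weights(n, b))
               for b, p in zip(betas, probs))


losses = np.array([3.0, 1.0, 4.0, 2.0])
print(spectral_risk(losses, cvar_spectrum_weights(4, 0.0)))      # 2.5 (mean)
print(spectral_risk(losses, cvar_spectrum_weights(4, 0.5)))      # 3.5 (CVaR at 0.5)
print(average_randomized_srm(losses, [0.0, 0.5], [0.5, 0.5]))    # 3.0

# Linearity in the spectrum: the same value from the averaged spectrum.
mixed = 0.5 * cvar_spectrum_weights(4, 0.0) + 0.5 * cvar_spectrum_weights(4, 0.5)
print(spectral_risk(losses, mixed))                               # 3.0
```

Because an SRM is linear in its spectrum, averaging SRMs over random spectra equals evaluating one SRM with the averaged spectrum, which the last line checks numerically.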

arXiv:2409.00726 [pdf, other]

LPUWF-LDM: Enhanced Latent Diffusion Model for Precise Late-phase UWF-FA Generation on Limited Dataset

Authors: Zhaojie Fang, Xiao Yu, Guanyu Zhou, Ke Zhuang, Yifei Chen, Ruiquan Ge, Changmiao Wang, Gangyong Jia, Qing Wu, Juan Ye, Maimaiti Nuliqiman, Peifang Xu, Ahmed Elazab

Abstract: Ultra-Wide-Field Fluorescein Angiography (UWF-FA) enables precise identification of ocular diseases using sodium fluorescein, which can be harmful. Existing research has developed methods to generate UWF-FA from Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) to reduce the adverse reactions associated with injections. However, these methods have been less effective in producing high-quality late-phase UWF-FA, particularly in lesion areas and fine details. Two primary challenges hinder the generation of high-quality late-phase UWF-FA: the scarcity of paired UWF-SLO and early/late-phase UWF-FA datasets, and the need for realistic generation at lesion sites and potential blood leakage regions. This study introduces an improved latent diffusion model framework to generate high-quality late-phase UWF-FA from limited paired UWF images. To address these challenges, our approach employs a module utilizing Cross-temporal Regional Difference Loss, which encourages the model to focus on the differences between early and late phases. Additionally, we introduce a low-frequency enhanced noise strategy in the diffusion forward process to improve the realism of medical images. To further enhance the mapping capability of the variational autoencoder module, especially with limited datasets, we implement a Gated Convolutional Encoder to extract additional information from conditional images. Our Latent Diffusion Model for Ultra-Wide-Field Late-Phase Fluorescein Angiography (LPUWF-LDM) effectively reconstructs fine details in late-phase UWF-FA and achieves state-of-the-art results compared to other existing methods when working with limited datasets. Our source code is available at: https://github.com/Tinysqua/****.

Submitted 1 September, 2024; originally announced September 2024.

Comments: 13 pages, 7 figures

arXiv:2409.00636 [pdf, other]

A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine

Authors: Yunxiao Shi, Min Xu, Haimin Zhang, Xing Zi, Qiang Wu

Abstract: Large language models (LLMs) and retrieval-augmented generation (RAG) techniques have revolutionized traditional information access, enabling AI agents to search and summarize information on behalf of users during dynamic dialogues. Despite their potential, current AI search engines exhibit considerable room for improvement in several critical areas: support for multimodal information, delivery of personalized responses, the capability to logically answer complex questions, and more flexible interactions. This paper proposes a novel AI search engine framework called the Agent Collaboration Network (ACN). The ACN framework consists of multiple specialized agents working collaboratively, each with a distinct role such as Account Manager, Solution Strategist, Information Manager, or Content Creator. This framework integrates mechanisms for picture content understanding, user profile tracking, and online evolution, enhancing the AI search engine's response quality, personalization, and interactivity. A highlight of the ACN is the introduction of a Reflective Forward Optimization (RFO) method, which supports online synergistic adjustment among agents. This feature endows the ACN with online learning capabilities, ensuring that the system has strong interactive flexibility and can promptly adapt to user feedback. This learning method may also serve as an optimization approach for agent-based systems, potentially influencing other domains of agent applications.

Submitted 1 September, 2024; originally announced September 2024.

Comments: ACMMM 2024 MMGR WORKSHOP

arXiv:2409.00364 [pdf, other]

Resource Management for IRS-Assisted Full-Duplex Integrated Sensing, Communication and Computing Systems

Authors: Wanming Hao, Xue Wu, Xingwang Li, Gangcan Sun, Qingqing Wu, Liang Yang

Abstract: In this paper, we investigate an intelligent reflecting surface (IRS) assisted full-duplex (FD) integrated sensing, communication and computing system. Specifically, an FD base station (BS) provides service for uplink and downlink transmission, and a local cache is connected to the BS through a backhaul link to store data. Meanwhile, active sensing elements are deployed on the IRS to receive target echo signals. On this basis, to evaluate the overall performance of the system under consideration, we formulate a system utility maximization problem subject to a sensing-quality guarantee, where the utility is the sum of the communication throughput and the total computation bits (offloading bits and local computation bits) minus the total backhaul cost for content delivery. The problem is difficult to solve due to the highly non-convex coupling of the optimization variables. To solve it effectively, we first design the most effective caching strategy. Then, we develop an algorithm based on weighted minimum mean square error, the alternating direction method of multipliers, the majorization-minimization framework, semi-definite relaxation techniques, and several complex transformations to jointly optimize the variables. Finally, simulation results are provided to verify the utility performance of the proposed algorithm and demonstrate the advantages of the proposed scheme compared with the baseline scheme.

Submitted 31 August, 2024; originally announced September 2024.

arXiv:2409.00353 [pdf, other]

RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Authors: Kunming Su, Qiuxia Wu, Panpan Cai, Xiaogang Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

Abstract: Masked point modeling methods have recently achieved great success in self-supervised learning for point cloud data. However, these methods are sensitive to rotations and often exhibit sharp performance drops when encountering rotational variations. In this paper, we propose a novel Rotation-Invariant Masked AutoEncoder (RI-MAE) to address two major challenges: 1) achieving rotation-invariant latent representations, and 2) facilitating self-supervised reconstruction in a rotation-invariant manner. For the first challenge, we introduce RI-Transformer, which features disentangled geometry content, rotation-invariant relative orientation and position embedding mechanisms for constructing a rotation-invariant point cloud latent space. For the second challenge, a novel dual-branch student-teacher architecture is devised. It enables self-supervised learning via the reconstruction of masked patches within the learned rotation-invariant latent space. Each branch is based on an RI-Transformer, and they are connected with an additional RI-Transformer predictor. The teacher encodes all point patches, while the student encodes only the unmasked ones. Finally, the predictor predicts the latent features of the masked patches using the output latent embeddings from the student, supervised by the outputs from the teacher. Extensive experiments demonstrate that our method is robust to rotations, achieving state-of-the-art performance on various downstream tasks.

Submitted 31 August, 2024; originally announced September 2024.
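The sketch below does not reproduce RI-Transformer; it only illustrates two ingredients named above under simple assumptions. Per-patch features built purely from distances (sorted nearest-neighbour distances and the distance to the cloud centroid) are unchanged by any rotation, and a random mask splits patches into a masked set (the predictor's targets, encoded by the teacher) and a visible set (the only input the student encodes), while the teacher sees all patches. The function names, the 60% mask ratio, and the random point cloud are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Random 3-D rotation from a QR decomposition, forced to det = +1."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))          # normalise column signs of Q
    if np.linalg.det(q) < 0:             # flip one axis: reflection -> rotation
        q[:, 0] = -q[:, 0]
    return q


def ri_patch_features(points, center_idx, k=8):
    """Per-patch features built only from distances, hence rotation-invariant."""
    centroid = points.mean(axis=0)
    feats = []
    for c in center_idx:
        d = np.sort(np.linalg.norm(points - points[c], axis=1))[1:k + 1]
        feats.append(np.append(d, np.linalg.norm(points[c] - centroid)))
    return np.array(feats)


def random_mask(num_patches, ratio, rng):
    """Split patch indices into (masked, visible) sets."""
    perm = rng.permutation(num_patches)
    n_mask = int(ratio * num_patches)
    return np.sort(perm[:n_mask]), np.sort(perm[n_mask:])


rng = np.random.default_rng(0)
cloud = rng.normal(size=(256, 3))
centers = rng.choice(256, size=16, replace=False)
R = random_rotation(rng)
feats = ri_patch_features(cloud, centers)
feats_rot = ri_patch_features(cloud @ R.T, centers)   # rotate every point by R
print(np.allclose(feats, feats_rot))                  # True

masked, visible = random_mask(len(centers), ratio=0.6, rng=rng)
teacher_in = feats_rot               # teacher encodes all patches
student_in = feats_rot[visible]      # student encodes visible patches only
targets = teacher_in[masked]         # predictor is supervised on masked patches
```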

arXiv:2408.17182 [pdf, other]

Hybrid Classification-Regression Adaptive Loss for Dense Object Detection

Authors: Yanquan Huang, Liu Wei Zhen, Yun Hao, Mengyuan Zhang, Qingyao Wu, Zikun Deng, Xueming Liu, Hong Deng

Abstract: For object detectors, enhancing model performance hinges on the ability to simultaneously account for inconsistencies across tasks and focus on difficult-to-train samples. Achieving this requires incorporating information from both the classification and regression tasks. However, prior work tends to either emphasize difficult-to-train samples within their respective tasks or simply compute classification scores with IoU, often leading to suboptimal model performance. In this paper, we propose a Hybrid Classification-Regression Adaptive Loss, termed HCRAL. Specifically, we introduce the Residual of Classification and IoU (RCI) module for cross-task supervision, addressing task inconsistencies, and the Conditioning Factor (CF) to focus on difficult-to-train samples within each task. Furthermore, we introduce a new strategy named Expanded Adaptive Training Sample Selection (EATSS) to provide additional samples that exhibit classification and regression inconsistencies. To validate the effectiveness of the proposed method, we conduct extensive experiments on COCO test-dev. Experimental evaluations demonstrate the superiority of our approach. Additionally, we design experiments that separately combine the classification and regression losses with regular loss functions in popular one-stage models, demonstrating improved performance.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.15668 [pdf, ps, other]

Movable Antennas Meet Intelligent Reflecting Surface: When Do We Need Movable Antennas?

Authors: Xin Wei, Weidong Mei, Qingqing Wu, Boyu Ning, Zhi Chen

Abstract: Intelligent reflecting surface (IRS) and movable antenna (MA)/fluid antenna (FA) techniques have both received increasing attention in wireless communications due to their ability to reconfigure and improve wireless channel conditions. In this paper, we investigate the integration of MAs/FAs into an IRS-assisted wireless communication system. In particular, we consider the downlink transmission from a multi-MA base station (BS) to a single-antenna user with the aid of an IRS, aiming to maximize the user's received signal-to-noise ratio (SNR) by jointly optimizing the BS/IRS active/passive beamforming and the MAs' positions. Because MAs and IRS have similar capabilities for channel reconfiguration, we first conduct theoretical analyses of the performance gain of MAs over conventional fixed-position antennas (FPAs) under the line-of-sight (LoS) BS-IRS channel and derive the conditions under which the performance gain becomes more or less significant. Next, to solve the received SNR maximization problem, we propose an alternating optimization (AO) algorithm that decomposes it into two subproblems and solves them alternately. Numerical results are provided to validate our analytical results and evaluate the performance gains of MAs over FPAs under different setups.

Submitted 1 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

Comments: 6 pages, 6 figures
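The alternating optimization described above can be illustrated in a simplified setting: fixed antenna positions (the FPA baseline, so MA position optimization is omitted), one single-antenna user, an M-antenna BS and an N-element IRS, with the received SNR $P\,|(\mathbf{h}_r^H \boldsymbol{\Theta} \mathbf{G} + \mathbf{h}_d^H)\mathbf{w}|^2/\sigma^2$. This is a textbook-style AO for IRS-aided MISO, not necessarily the authors' decomposition. With the beamformer fixed, each IRS phase is set to co-phase its reflected term with the direct path; with the phases fixed, maximum-ratio transmission is optimal. Both steps are exact maximizations, so the SNR never decreases. Channel sizes, Rayleigh fading, and all names are assumptions.

```python
import numpy as np


def effective_channel(phases, G, h_r, h_d):
    """Row vector h_r^H diag(e^{j*phases}) G + h_d^H seen by the BS beamformer."""
    return (h_r.conj() * np.exp(1j * phases)) @ G + h_d.conj()


def received_snr(w, phases, G, h_r, h_d, power=1.0, noise=1.0):
    return power * abs(effective_channel(phases, G, h_r, h_d) @ w) ** 2 / noise


def alternating_optimization(G, h_r, h_d, iters=10):
    """AO over IRS phases and a unit-norm BS beamformer w, antennas fixed."""
    num_bs = G.shape[1]
    w = np.ones(num_bs, dtype=complex) / np.sqrt(num_bs)
    phases = np.zeros(G.shape[0])
    history = [received_snr(w, phases, G, h_r, h_d)]
    for _ in range(iters):
        # Passive step (w fixed): rotate each reflected term onto the direct path.
        direct = h_d.conj() @ w
        reflected = h_r.conj() * (G @ w)
        phases = np.angle(direct) - np.angle(reflected)
        # Active step (phases fixed): maximum-ratio transmission.
        h_eff = effective_channel(phases, G, h_r, h_d)
        w = h_eff.conj() / np.linalg.norm(h_eff)
        history.append(received_snr(w, phases, G, h_r, h_d))
    return w, phases, history


rng = np.random.default_rng(1)


def rayleigh(*shape):
    """i.i.d. CN(0, 1) entries."""
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)


M, N = 4, 32                          # BS antennas, IRS elements (assumed sizes)
G, h_r, h_d = rayleigh(N, M), rayleigh(N), rayleigh(M)
w, phases, history = alternating_optimization(G, h_r, h_d)
print(history[0] <= history[-1])      # True
```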

arXiv:2408.15490 [pdf, ps, other]

Symbiotic Sensing and Communication: Framework and Beamforming Design

Authors: Fanghao Xia, Zesong Fei, Xinyi Wang, Weijie Yuan, Qingqing Wu, Yuanwei Liu, Tony Q. S. Quek

Abstract: In this paper, we propose a novel symbiotic sensing and communication (SSAC) framework, comprising a base station (BS) and a passive sensing node. In particular, the BS transmits communication waveforms to serve vehicle users (VUEs), while the sensing node executes sensing tasks based on the echoes in a bistatic manner, thereby avoiding the issue of self-interference. Besides the weak target of interest, the sensing node also tracks VUEs and shares sensing results with the BS to facilitate sensing-assisted beamforming. Considering both fully digital arrays and hybrid analog-digital (HAD) arrays, we investigate the beamforming design in the SSAC system. We first derive the Cramér-Rao lower bound (CRLB) for estimating the two-dimensional angles of arrival as the sensing metric. Next, we formulate an achievable sum rate maximization problem under the CRLB constraint, where the channel state information is reconstructed based on the sensing results. Then, we propose two penalty dual decomposition (PDD)-based alternating algorithms for fully digital and HAD arrays, respectively. Simulation results demonstrate that the proposed algorithms can achieve an outstanding data rate with effective localization capability for both VUEs and the weak target. In particular, the HAD beamforming design exhibits a remarkable performance gain compared to conventional schemes, especially with fewer radio frequency chains.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 16 pages, 11 figures, submitted to IEEE journals for possible publication

arXiv:2408.14831 [pdf, other]

DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing

Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

Abstract: Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to Road Side Units (RSUs), ensuring timely services. Our previous FLSimCo algorithm relies on local resources for Federated Self-Supervised Learning (SSL), but vehicles often cannot complete all training iterations. Our improved algorithm offloads part of the task to the RSU and optimizes energy consumption by adjusting transmission power, CPU frequency, and task assignment ratios, balancing local and RSU-based training. Meanwhile, an offloading threshold further prevents inefficiencies. Simulation results show that the enhanced algorithm reduces energy consumption and improves both offloading efficiency and the accuracy of Federated SSL.

Submitted 27 August, 2024; originally announced August 2024.

Comments: This paper has been submitted to Digital Communications and Networks. The source code has been released at: https://github.com/qiongwu86/Federated-SSL-task-offloading-and-resource-allocation

arXiv:2408.14689 [pdf, other]

Federated User Preference Modeling for Privacy-Preserving Cross-Domain Recommendation

Authors: Li Wang, Shoujin Wang, Quangui Zhang, Qiang Wu, Min Xu

Abstract: Cross-domain recommendation (CDR) aims to address the data-sparsity problem by transferring knowledge across domains. Existing CDR methods generally assume that the user-item interaction data is shareable between domains, which leads to privacy leakage. Recently, some privacy-preserving CDR (PPCDR) models have been proposed to solve this problem. However, they primarily transfer simple representations learned only from user-item interaction histories, overlooking other useful side information, leading to inaccurate user preferences. Additionally, they transfer differentially private user-item interaction matrices or embeddings across domains to protect privacy. However, these methods offer limited privacy protection, as attackers may exploit external information to infer the original data. To address these challenges, we propose a novel Federated User Preference Modeling (FUPM) framework. In FUPM, first, a novel comprehensive preference exploration module is proposed to learn users' comprehensive preferences from both interaction data and additional data including review texts and potentially positive items. Next, a private preference transfer module is designed to first learn differentially private local and global prototypes, and then privately transfer the global prototypes using a federated learning strategy. These prototypes are generalized representations of user groups, making it difficult for attackers to infer individual information. Extensive experiments on four CDR tasks conducted on the Amazon and Douban datasets validate the superiority of FUPM over SOTA baselines. Code is available at https://github.com/Lili1013/FUPM.

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13483 [pdf, other]

Transmissive RIS Enabled Transceiver Systems: Architecture, Design Issues and Opportunities

Authors: Zhendong Li, Wen Chen, Qingqing Wu, Ziwei Liu, Chong He, Xudong Bai, Jun Li

Abstract: Reconfigurable intelligent surface (RIS) is anticipated to augment the performance of beyond fifth-generation (B5G) and sixth-generation (6G) networks by intelligently manipulating the state of its components. Rather than employing reflective RIS for aided communications, this paper proposes an innovative transmissive RIS-enabled transceiver (TRTC) architecture that can accomplish the functions of traditional multi-antenna systems in a cost-effective and energy-efficient manner. First, the proposed network architecture and its corresponding transmission scheme are elaborated from the perspectives of downlink (DL) and uplink (UL) transmissions. Then, we illustrate several significant advantages and differences of TRTC compared to other multi-antenna systems. Furthermore, the downlink modulation and extraction principle based on a time-modulation array (TMA) is introduced in detail to support multi-stream communications. Moreover, a near-far field channel model appropriate for this architecture is proposed. Based on the channel model, we summarize some state-of-the-art channel estimation schemes and also provide the channel estimation scheme of TRTC. Considering the optimization of DL and UL communications, we present numerical simulations that confirm the superiority of the proposed optimization algorithm. Lastly, numerous prospective research avenues for TRTC systems are delineated to inspire further exploration.

Submitted 24 August, 2024; originally announced August 2024.

Journal ref: IEEE VTM, 2024

arXiv:2408.12355 [pdf, other]

Class-balanced Open-set Semi-supervised Object Detection for Medical Images

Authors: Zhanyun Lu, Renshu Gu, Huimin Cheng, Siyu Pang, Mingyu Xu, Peifang Xu, Yaqi Wang, Yuichiro Kinoshita, Juan Ye, Gangyong Jia, Qing Wu

Abstract: Medical image datasets in the real world are often unlabeled and imbalanced, and Semi-Supervised Object Detection (SSOD) can utilize unlabeled data to improve an object detector. However, existing approaches predominantly assume that the unlabeled data and test data do not contain out-of-distribution (OOD) classes. The few open-set semi-supervised object detection methods have two weaknesses: first, class imbalance is not considered; second, OOD instances are distinguished and simply discarded during pseudo-labeling. In this paper, we consider the open-set semi-supervised object detection problem, which leverages unlabeled data that contain OOD classes to improve object detection for medical images. Our study incorporates two key innovations: Category Control Embed (CCE) and out-of-distribution Detection Fusion Classifier (OODFC). CCE is designed to tackle dataset imbalance by constructing a Foreground information Library, while OODFC tackles open-set challenges by integrating the "unknown" information into basic pseudo-labels. Our method outperforms state-of-the-art SSOD methods, achieving a 4.25 mAP improvement on the public Parasite dataset.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.11432 [pdf, other]

doi 10.1145/3664647.3680673

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

Authors: Yili Li, Jing Yu, Keke Gai, Bang Liu, Gang Xiong, Qi Wu

Abstract: Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate their similarity scores, which are then sorted to obtain retrieval results. This approach considers the matching between each candidate video and the query, but it incurs a significant time cost that grows notably as the number of candidates increases. Generative models are common in natural language processing and computer vision, and have been successfully applied in document retrieval, but their application in multimodal retrieval remains unexplored. To enhance retrieval efficiency, in this paper, we introduce a model-based video indexer named T2VIndexer, a sequence-to-sequence generative model that directly generates video identifiers and retrieves candidate videos with constant time complexity. T2VIndexer aims to reduce retrieval time while maintaining high accuracy. To achieve this goal, we propose video identifier encoding and query-identifier augmentation approaches to represent videos as short sequences while preserving their semantic information. Our method consistently enhances the retrieval efficiency of current state-of-the-art models on four standard datasets. It enables baselines to achieve better retrieval performance with only 30%-50% of the original retrieval time on MSR-VTT (+1.0%), MSVD (+1.8%), ActivityNet (+1.5%), and DiDeMo (+0.2%). The code is available at https://github.com/Lilidamowang/T2VIndexer-generativeSearch.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.10659 [pdf, other]

Productions of bottom and bottom-strange mesons in pion and kaon induced reactions

Authors: Jing Liu, Quan-Yun Guo, Qi Wu, Jun He, Dian-Yong Chen

Abstract: In the present work, we propose to explore the production of bottom and bottom-strange mesons in high-energy pion- and kaon-induced reactions on a proton target. The cross sections are evaluated with an effective Lagrangian constructed in the heavy-quark limit and with chiral symmetry. Our estimations show that at $P_π=80$ GeV, the cross sections for the $B(5279)$, $B^\ast (5325)$, $B_0^\ast (5738)$, $B_1^\prime (5757)$, $B_1(5721)$ and $B_2^\ast (5747)$ production processes are $3.19 \sim 86.26$, $1.86\sim 51.29$, $0.87 \sim 24.25$, $0.84 \sim 23.14$, $162.35 \sim 4477.66$, and $57.16 \sim 1604.43$ nb, respectively, where the uncertainties arise from the model parameter. In addition, the cross sections for the corresponding bottom-strange meson production processes are very similar. Moreover, our estimations indicate that the ratios of these cross sections are almost independent of the model parameters. In particular, the cross-section ratios related to the states in the same doublet are of order one, which is consistent with the expectation of the heavy-quark limit. The cross sections related to the states in the $T$ doublet are about two orders of magnitude larger than those related to the states in the $S$ doublet.
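The claim that the cross-section ratios barely depend on the model parameter can be checked directly from the ranges quoted above. The sketch below assumes that the lower and upper ends of each range correspond to the same two values of the model parameter (the abstract does not state this explicitly) and uses the standard heavy-quark grouping of $B_0^\ast(5738)$, $B_1^\prime(5757)$ into the $S$ doublet and $B_1(5721)$, $B_2^\ast(5747)$ into the $T$ doublet.

```python
# Cross sections in nb at P_pi = 80 GeV, copied from the abstract as
# (lower end, upper end) of the model-parameter range.
CROSS_SECTIONS_NB = {
    "B(5279)": (3.19, 86.26),
    "B*(5325)": (1.86, 51.29),
    "B0*(5738)": (0.87, 24.25),
    "B1'(5757)": (0.84, 23.14),
    "B1(5721)": (162.35, 4477.66),
    "B2*(5747)": (57.16, 1604.43),
}

# Standard heavy-quark doublet assignment (an assumption stated in the lead-in).
S_DOUBLET = ("B0*(5738)", "B1'(5757)")
T_DOUBLET = ("B1(5721)", "B2*(5747)")


def ratio(numerator, denominator):
    """Ratio of summed cross sections, evaluated at both ends of the range."""
    return tuple(
        sum(CROSS_SECTIONS_NB[s][end] for s in numerator)
        / sum(CROSS_SECTIONS_NB[s][end] for s in denominator)
        for end in (0, 1)
    )


print("B*/B   :", ratio(("B*(5325)",), ("B(5279)",)))     # ≈ (0.58, 0.59), same doublet
print("B1'/B0*:", ratio(("B1'(5757)",), ("B0*(5738)",)))  # ≈ (0.97, 0.95), same doublet
print("T/S    :", ratio(T_DOUBLET, S_DOUBLET))            # ≈ (128.4, 128.3), across doublets
```

Both same-doublet ratios stay of order one and shift by only a few percent across the parameter range, while the $T$-to-$S$ ratio is about $1.3\times10^2$, in line with the "two orders of magnitude" statement.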

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.09194 [pdf, other]

DRL-Based Resource Allocation for Motion Blur Resistant Federated Self-Supervised Learning in IoV

Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Qiang Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

Abstract: In the Internet of Vehicles (IoV), Federated Learning (FL) provides a privacy-preserving solution by aggregating local models without sharing data. Traditional supervised learning requires labeled image data, but data labeling involves significant manual effort. Federated Self-Supervised Learning (FSSL) utilizes Self-Supervised Learning (SSL) for local training in FL, eliminating the need for labels while protecting privacy. Compared to other SSL methods, Momentum Contrast (MoCo) reduces the demand for computing resources and storage space by creating a dictionary. However, using MoCo in FSSL requires uploading the local dictionary from vehicles to the Base Station (BS), which poses a risk of privacy leakage. Simplified Contrast (SimCo) addresses the privacy leakage issue in MoCo-based FSSL by using dual temperature instead of a dictionary to control sample distribution. Additionally, considering the negative impact of motion blur on model aggregation, we propose a motion blur-resistant FSSL method based on SimCo, referred to as BFSSL. Furthermore, we address energy consumption and delay in the BFSSL process by proposing a Deep Reinforcement Learning (DRL)-based resource allocation scheme, called DRL-BFSSL. In this scheme, the BS allocates the Central Processing Unit (CPU) frequency and transmission power of vehicles to minimize energy consumption and latency, while aggregating received models based on the motion blur level. Simulation results validate the effectiveness of our proposed aggregation and resource allocation methods.
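The abstract does not specify how the motion blur level enters the aggregation. As a purely illustrative sketch, assume each vehicle reports a non-negative blur level and the BS weights each uploaded model by a hypothetical score $1/(1+b_i)$, normalized over all vehicles, so that sharper local data contributes more. Models are flattened to plain lists of parameters; `blur_weights` and `aggregate` are hypothetical names, not the DRL-BFSSL code.

```python
from typing import List, Sequence


def blur_weights(blur_levels: Sequence[float]) -> List[float]:
    """Hypothetical weighting: lower blur -> larger share of the aggregate."""
    if any(b < 0 for b in blur_levels):
        raise ValueError("blur levels must be non-negative")
    scores = [1.0 / (1.0 + b) for b in blur_levels]
    total = sum(scores)
    return [s / total for s in scores]


def aggregate(models: Sequence[Sequence[float]], blur_levels: Sequence[float]) -> List[float]:
    """Blur-weighted average of flattened local model parameters."""
    if not models or len(models) != len(blur_levels):
        raise ValueError("need one blur level per uploaded model")
    weights = blur_weights(blur_levels)
    dim = len(models[0])
    return [sum(w * m[k] for w, m in zip(weights, models)) for k in range(dim)]


global_model = aggregate([[1.0, 2.0], [3.0, 4.0]], blur_levels=[0.0, 1.0])
print(global_model)  # ≈ [1.667, 2.667]; the sharper vehicle gets weight 2/3
```

Setting all blur levels equal reduces this to uniform averaging, so the blur term only changes how much each vehicle counts.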

Submitted 17 August, 2024; originally announced August 2024.

Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/DRL-BFSSL

arXiv:2408.08713 [pdf, other]

Beyond KAN: Introducing KarSein for Adaptive High-Order Feature Interaction Modeling in CTR Prediction

Authors: Yunxiao Shi, Wujiang Xu, Mingyu Jin, Haimin Zhang, Qiang Wu, Yongfeng Zhang, Min Xu

Abstract: Modeling feature interactions is crucial for click-through rate (CTR) prediction, particularly when it comes to high-order explicit interactions. Traditional methods struggle with this task because they often predefine a maximum interaction order, which relies heavily on prior knowledge and can limit the model's effectiveness. Additionally, modeling high-order interactions typically leads to increased computational costs. Therefore, the challenge lies in adaptively modeling high-order feature interactions while maintaining efficiency. To address this issue, we introduce the Kolmogorov-Arnold Represented Sparse Efficient Interaction Network (KarSein), designed to optimize both predictive accuracy and computational efficiency. We first identify the limitations of directly applying Kolmogorov-Arnold Networks (KAN) to CTR prediction and then introduce KarSein to overcome them. It features a novel architecture that reduces the computational costs of KAN and supports embedding vectors as feature inputs. Additionally, KarSein employs guided symbolic regression to address the difficulty KAN has in spontaneously learning multiplicative relationships. Extensive experiments demonstrate KarSein's superior performance, achieving high predictive accuracy with minimal computational overhead. Furthermore, KarSein maintains strong global explainability while enabling the removal of redundant features, resulting in a sparse network structure. These advantages also position KarSein as a promising method for efficient inference.

Submitted 25 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

Comments: KarSein for CTR

arXiv:2408.07481 [pdf, other]

DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency

Authors: Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang, Guosheng Lin, Qingyao Wu

Abstract: Diffusion models usher in a new era of video editing, flexibly manipulating video content with text prompts. Despite the widespread demand for editing human-centered videos, these models face significant challenges in handling complex objects like humans. In this paper, we introduce DeCo, a novel video editing framework specifically designed to treat humans and the background as separate editable targets, ensuring global spatial-temporal consistency by maintaining the coherence of each individual component. Specifically, we propose a decoupled dynamic human representation that utilizes a parametric human body prior to generate tailored humans while preserving motions consistent with the original video. In addition, we treat the background as a layered atlas so that text-guided image editing approaches can be applied to it. To further enhance the geometry and texture of humans during optimization, we extend the calculation of score distillation sampling into normal space and image space. Moreover, we leverage a lighting-aware video harmonizer to tackle inconsistent lighting between the edited targets, a problem previously overlooked in decompose-edit-combine approaches. Extensive qualitative and numerical experiments demonstrate that DeCo outperforms prior video editing methods on human-centered videos, especially longer ones.

Submitted 14 August, 2024; originally announced August 2024.

Comments: European Conference on Computer Vision

arXiv:2408.06870 [pdf, ps, other]

Spectrum Prediction With Deep 3D Pyramid Vision Transformer Learning

Authors: Guangliang Pan, Qihui Wu, Bo Zhou, Jie Li, Wei Wang, Guoru Ding, David K. Y. Yau

Abstract: In this paper, we propose a deep learning (DL)-based task-driven spectrum prediction framework, named DeepSPred. DeepSPred comprises a feature encoder and a task predictor: the encoder extracts spectrum usage pattern features, and the predictor configures different networks according to the task requirements to predict future spectrum. Based on DeepSPred, we first propose a novel 3D spectrum prediction method, named 3D-SwinSTB, which combines a flow processing strategy with a 3D vision Transformer (ViT, i.e., Swin) and a pyramid to serve possible applications such as spectrum monitoring. The unique 3D Patch Merging ViT-to-3D ViT Patch Expanding and pyramid designs of 3D-SwinSTB help the model accurately learn the potential correlation of the evolution of the spectrogram over time. Then, we propose a novel spectrum occupancy rate (SOR) prediction method, named 3D-SwinLinear, by redesigning a predictor consisting exclusively of 3D convolutional and linear layers to serve possible applications such as dynamic spectrum access (DSA). Unlike 3D-SwinSTB, which outputs a spectrogram, 3D-SwinLinear projects the spectrogram directly to the SOR. Finally, we employ transfer learning (TL) to ensure the applicability of our two methods to diverse spectrum services. The results show that our 3D-SwinSTB outperforms recent benchmarks by more than 5%, while our 3D-SwinLinear achieves 90% accuracy, a performance improvement exceeding 10%.
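3D-SwinLinear maps a spectrogram directly to an SOR value. The abstract does not define SOR precisely; a common energy-detection definition, assumed here, is the fraction of time slots in which the measured power of a channel exceeds a detection threshold. The sketch below computes that per-channel quantity from a time-by-frequency power grid; the function name, threshold, and data are made up for illustration.

```python
from typing import List, Sequence


def occupancy_rate(spectrogram: Sequence[Sequence[float]], threshold_dbm: float) -> List[float]:
    """Per-channel SOR: fraction of time slots whose power exceeds the threshold.

    spectrogram[t][f] is the measured power (dBm) in time slot t on channel f.
    """
    if not spectrogram:
        raise ValueError("spectrogram must contain at least one time slot")
    n_channels = len(spectrogram[0])
    busy = [0] * n_channels
    for row in spectrogram:
        if len(row) != n_channels:
            raise ValueError("all time slots must cover the same channels")
        for f, power in enumerate(row):
            if power > threshold_dbm:  # energy detection: channel counted as occupied
                busy[f] += 1
    return [count / len(spectrogram) for count in busy]


# Toy measurements: 4 time slots x 3 channels, in dBm (made-up values).
spec = [
    [-90.0, -60.0, -100.0],
    [-85.0, -55.0, -70.0],
    [-95.0, -58.0, -99.0],
    [-60.0, -97.0, -98.0],
]
print(occupancy_rate(spec, threshold_dbm=-80.0))  # [0.25, 0.75, 0.25]
```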

Submitted 20 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.05813 [pdf, other]

$X(4630)$ and $Y(4626)$ production in the $B^+$ and $B_s^0$ decays

Authors: Zhuo Yu, Qi Wu, Dian-Yong Chen

Abstract: In the present work, we investigate the production of $X(4630)$ and $Y(4626)$ in $B^+$ and $B_s^0$ decays, where $X(4630)$ and $Y(4626)$ are considered as the $C$-parity pigeon pair in the $D_{s}^{\ast+} D_{s1}(2536)^-$ molecular frame. The branching fractions of $B^+ \to K^+ X(4630)/Y(4626)$ and $B_s^0 \to ηX(4630)/Y(4626)$ have been evaluated using an effective Lagrangian approach; they are of the order of $10^{-5}$, and the ratios of these branching fractions are almost independent of the model parameter. Based on the present estimations, we propose to search for $Y(4626)$ in the process $B^+ \to K^+ J/ψη^{(\prime)}$, which should be accessible to the LHCb and Belle II Collaborations.

Submitted 11 August, 2024; originally announced August 2024.

arXiv:2408.05765 [pdf, other]

Scalable and Adaptive Spectral Embedding for Attributed Graph Clustering

Authors: Yunhui Liu, Tieke He, Qing Wu, Tao Zheng, Jianhua Zhao

Abstract: Attributed graph clustering, which aims to group the nodes of an attributed graph into disjoint clusters, has made promising advancements in recent years. However, most existing methods face challenges when applied to large graphs due to their expensive computational cost and high memory usage. In this paper, we introduce Scalable and Adaptive Spectral Embedding (SASE), a simple attributed graph clustering method devoid of parameter learning. SASE comprises three main components: node feature smoothing via $k$-order simple graph convolution, scalable spectral clustering using random Fourier features, and adaptive order selection. With these designs, SASE not only effectively captures global cluster structures but also exhibits linear time and space complexity relative to the graph size. Empirical results demonstrate the superiority of SASE. For example, on the ArXiv dataset with 169K nodes and 1.17M edges, SASE achieves a 6.9% improvement in ACC and a $5.87\times$ speedup compared to the runner-up, S3GC.
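The first two SASE components can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it uses the symmetric normalized adjacency with self-loops, $\tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}$, as the graph filter applied $k$ times (SASE's exact filter may differ), and standard random Fourier features $z(x)=\sqrt{2/D}\,\cos(Wx+b)$ for a Gaussian kernel of bandwidth $\sigma$. Each propagation step touches every edge once, which is where a linear dependence on graph size comes from. The spectral clustering step and adaptive order selection are omitted, and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def smooth_features(adj: Dict[int, List[int]], x: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Apply D^-1/2 (A + I) D^-1/2 to the node features k times.

    adj is an undirected adjacency list without self-loops; x[i] is node i's features.
    """
    n = len(x)
    deg = [len(adj.get(i, [])) + 1 for i in range(n)]  # +1 accounts for the self-loop
    h = [list(row) for row in x]
    for _ in range(k):
        new_h = []
        for i in range(n):
            acc = [v / deg[i] for v in h[i]]  # self-loop term: 1 / deg_i
            for j in adj.get(i, []):
                c = 1.0 / math.sqrt(deg[i] * deg[j])
                for d, v in enumerate(h[j]):
                    acc[d] += c * v
            new_h.append(acc)
        h = new_h
    return h


def random_fourier_features(
    x: Sequence[Sequence[float]], num_features: int, sigma: float, seed: int = 0
) -> List[List[float]]:
    """Map rows of x so that z(x) . z(y) ≈ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    dim = len(x[0])
    w = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [
        [scale * math.cos(sum(wi * xi for wi, xi in zip(w_r, row)) + b_r) for w_r, b_r in zip(w, b)]
        for row in x
    ]


# Path graph 0 - 1 - 2 with a one-hot feature on node 0, smoothed once.
print(smooth_features({0: [1], 1: [0, 2], 2: [1]}, [[1.0], [0.0], [0.0]], k=1))
```

Because $z(x)^\top z(y)$ approximates the Gaussian kernel, a spectral step can work on the $n \times D$ feature matrix instead of an $n \times n$ kernel matrix, which keeps memory linear in the number of nodes.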

Submitted 11 August, 2024; originally announced August 2024.

Comments: Accepted by CIKM 2024 (Short Paper)

arXiv:2408.03771 [pdf]

Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial

Authors: Xian Zhong, Zohaib Salahuddin, Yi Chen, Henry C Woodruff, Haiyi Long, Jianyun Peng, Nuwan Udawatte, Roberto Casale, Ayoub Mokhtari, Xiaoer Zhang, Jiayao Huang, Qingyu Wu, Li Tan, Lili Chen, Dongming Li, Xiaoyan Xie, Manxia Lin, Philippe Lambin

Abstract: Artificial intelligence (AI)-based decision support systems have demonstrated value in predicting post-hepatectomy liver failure (PHLF) in hepatocellular carcinoma (HCC). However, they often lack transparency, and the impact of model explanations on clinicians' decisions has not been thoroughly evaluated. Building on prior research, we developed a variational autoencoder-multilayer perceptron (VAE-MLP) model for preoperative PHLF prediction. This model integrated counterfactuals and layerwise relevance propagation (LRP) to provide insights into its decision-making mechanism. Additionally, we proposed a methodological framework for evaluating the explainability of AI systems. This framework includes qualitative and quantitative assessments of explanations against recognized biomarkers, usability evaluations, and an in silico clinical trial. Our evaluations demonstrated that the model's explanations correlated with established biomarkers and exhibited high usability at both the case and system levels. Furthermore, results from the three-track in silico clinical trial showed that clinicians' prediction accuracy and confidence increased when AI explanations were provided.
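Layerwise relevance propagation redistributes a model's output score backward, layer by layer, in proportion to each input's contribution. The sketch below applies the common LRP-ε rule, $R_i = \sum_j \frac{a_i w_{ji}}{z_j + \epsilon\,\mathrm{sign}(z_j)} R_j$, to a tiny two-layer ReLU network with made-up weights; it illustrates the generic technique only and is not the VAE-MLP model from the paper. With zero biases and a small ε, the input relevances approximately sum to the output (the conservation property).

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # w[j][i]: weight from input i to unit j


def dense(a: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Pre-activations z_j = sum_i w[j][i] * a[i] + b[j]."""
    return [sum(wji * ai for wji, ai in zip(w_j, a)) + b_j for w_j, b_j in zip(w, b)]


def relu(z: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in z]


def lrp_epsilon(a, w, b, relevance_out, eps=1e-6):
    """Redistribute the relevance of a dense layer's outputs onto its inputs a."""
    z = dense(a, w, b)
    # Stabilized ratio R_j / (z_j + eps * sign(z_j)).
    s = [r / (zj + (eps if zj >= 0 else -eps)) for r, zj in zip(relevance_out, z)]
    return [a_i * sum(w[j][i] * s[j] for j in range(len(w))) for i, a_i in enumerate(a)]


# Hypothetical network: 2 inputs -> 2 ReLU hidden units -> 1 output score.
W1, B1 = [[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.0]
W2, B2 = [[1.0, 2.0]], [0.0]

x = [1.0, 2.0]
h = relu(dense(x, W1, B1))  # hidden activations
y = dense(h, W2, B2)        # network output

r_hidden = lrp_epsilon(h, W2, B2, y)        # output -> hidden layer
r_input = lrp_epsilon(x, W1, B1, r_hidden)  # hidden layer -> inputs
print(y[0], r_input, sum(r_input))  # 5.0, relevances ≈ [0.0, 5.0], sum ≈ 5.0
```

Here the first input receives almost no relevance because its positive contribution through hidden unit 0 is cancelled by its negative weight into hidden unit 1; this kind of per-feature attribution is what LRP-based explanations present to clinicians.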

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.03519 [pdf, other]

RepoMasterEval: Evaluating Code Completion via Real-World Repositories

Authors: Qinyun Wu, Chao Peng, Pengfei Gao, Ruida Hu, Haoyu Gan, Bo Jiang, Jinhe Tang, Zhiwen Deng, Zhanming Guan, Cuiyun Gao, Xia Liu, Ping Yang

Abstract: With the growing reliance on automated code completion tools in software development, the need for robust evaluation benchmarks has become critical. However, existing benchmarks focus more on code generation tasks at the function and class level and provide rich textual descriptions to prompt the model. By contrast, such descriptive prompts are commonly unavailable in real development, and code completion can occur in a wider range of situations, such as in the middle of a function or a code block. These limitations make the evaluation poorly aligned with the practical scenarios of code completion tools. In this paper, we propose RepoMasterEval, a novel benchmark for evaluating code completion models constructed from real-world Python and TypeScript repositories. Each benchmark datum is generated by masking a code snippet (ground truth) from one source code file with existing test suites. To improve the test accuracy of model-generated code, we employ mutation testing to measure the effectiveness of the test cases, and we manually craft new test cases for test suites with a low mutation score. Our empirical evaluation on 6 state-of-the-art models shows that test augmentation is critical in improving the accuracy of the benchmark and that RepoMasterEval is able to report differences in model performance in real-world scenarios. The deployment of RepoMasterEval at a collaborating company for one month also revealed that the benchmark is useful for giving accurate feedback during model training and that its score correlates highly with the model's performance in practice.
Based on our findings, we call on the software engineering community to build more LLM benchmarks tailored for code generation tools that take the practical and complex development environment into consideration.
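RepoMasterEval uses mutation testing to decide which test suites need augmentation. The mutation score is the fraction of deliberately injected faults (mutants) that at least one test detects. The sketch below is a self-contained toy, not the benchmark's tooling: the function under test, the hand-written mutants, and both test suites are hypothetical, and the abstract does not give the threshold for a "low" score.

```python
from typing import Callable, Sequence

Impl = Callable[[int, int, int], int]
Test = Callable[[Impl], bool]  # returns True if the implementation passes


def clamp(x: int, lo: int, hi: int) -> int:
    """Function under test: restrict x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


# Hand-written mutants, each injecting one small fault into clamp.
MUTANTS: Sequence[Impl] = [
    lambda x, lo, hi: max(lo, x),               # upper bound dropped
    lambda x, lo, hi: min(x, hi),               # lower bound dropped
    lambda x, lo, hi: max(lo, min(x, hi + 1)),  # off-by-one on the upper bound
    lambda x, lo, hi: min(lo, min(x, hi)),      # max replaced by min
]


def mutation_score(tests: Sequence[Test], mutants: Sequence[Impl] = MUTANTS) -> float:
    """Fraction of mutants killed, i.e. failed by at least one test."""
    if not all(test(clamp) for test in tests):
        raise ValueError("the test suite must pass on the original implementation")
    killed = sum(1 for m in mutants if not all(test(m) for test in tests))
    return killed / len(mutants)


weak_suite = [lambda f: f(5, 0, 10) == 5]
augmented_suite = weak_suite + [
    lambda f: f(-3, 0, 10) == 0,   # exercises the lower bound
    lambda f: f(15, 0, 10) == 10,  # exercises the upper bound
]

print(mutation_score(weak_suite))       # 0.25
print(mutation_score(augmented_suite))  # 1.0
```

The weak suite passes on three of the four faulty versions, so a model completion containing one of those faults would be wrongly accepted; adding the two boundary tests mirrors, in miniature, the manual augmentation the abstract describes.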

Submitted 6 August, 2024; originally announced August 2024.

arXiv:2408.02695 [pdf, other]

Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion

Authors: Shaoxu Cheng, Kanglei Geng, Chiyuan He, Zihuan Qiu, Linfeng Xu, Heqian Qiu, Lanxiao Wang, Qingbo Wu, Fanman Meng, Hongliang Li

Abstract: Continual Learning (CL) aims to enable Deep Neural Networks (DNNs) to learn new data without forgetting previously learned knowledge. The key to achieving this goal is to avoid confusion at the feature level, i.e., avoiding confusion within old tasks and between new and old tasks. Previous prototype-based CL methods generate pseudo features for old knowledge replay by adding Gaussian noise to the centroids of old classes. However, the distribution in the feature space becomes anisotropic during the incremental process, which prevents the pseudo features from faithfully reproducing the distribution of old knowledge in the feature space and leads to confusion in classification boundaries within old tasks. To address this issue, we propose the Distribution-Level Memory Recall (DMR) method, which uses a Gaussian mixture model to precisely fit the feature distribution of old knowledge at the distribution level and generate pseudo features in the next stage. Furthermore, resistance to confusion at the distribution level is also crucial for multimodal learning, as multimodal imbalance results in significant differences in feature responses between modalities, exacerbating confusion within old tasks in prototype-based CL methods. Therefore, we mitigate the multimodal imbalance problem with the Inter-modal Guidance and Intra-modal Mining (IGIM) method, which guides weaker modalities with prior information from dominant modalities and further explores useful information within modalities.
For the second key, we propose the Confusion Index to quantitatively describe a model's ability to distinguish between new and old tasks, and we use the Incremental Mixup Feature Enhancement (IMFE) method to enhance pseudo features with new sample features, alleviating classification confusion between new and old knowledge.
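The contrast between noise-around-a-centroid replay and distribution-level replay can be seen even in one dimension. The sketch below fits a Gaussian mixture with a plain EM loop and samples pseudo features from it. It is a simplified stand-in, not the DMR implementation: real features are high-dimensional, the number of components is fixed at two by assumption, the function names are hypothetical, and the old-class "features" are synthetic. When an old class has two modes, its centroid lies in the empty gap between them, so centroid-plus-small-noise replay would place pseudo features where no real features exist, whereas the mixture places them on the modes.

```python
import math
import random
from typing import List, Sequence, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def _normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data: Sequence[float], n_components: int = 2, iters: int = 100) -> List[Component]:
    """Fit a 1-D Gaussian mixture with expectation-maximization."""
    xs = sorted(data)
    n = len(xs)
    means = [xs[int((c + 0.5) * n / n_components)] for c in range(n_components)]  # quantile init
    mean_all = sum(xs) / n
    variances = [sum((x - mean_all) ** 2 for x in xs) / n] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means and variances.
        for c in range(n_components):
            nc = max(sum(r[c] for r in resp), 1e-12)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            variances[c] = max(sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nc, 1e-6)
            weights[c] = nc / n
    return list(zip(weights, means, variances))


def sample_pseudo_features(gmm: Sequence[Component], n: int, seed: int = 0) -> List[float]:
    """Draw pseudo features for replay from the fitted mixture."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in gmm]
    out = []
    for _ in range(n):
        _, mu, var = rng.choices(gmm, weights=weights, k=1)[0]
        out.append(rng.gauss(mu, math.sqrt(var)))
    return out


rng = random.Random(1)
old_class_features = [rng.gauss(-5.0, 1.0) for _ in range(500)] + [rng.gauss(5.0, 1.0) for _ in range(500)]
gmm = fit_gmm_1d(old_class_features)
pseudo = sample_pseudo_features(gmm, 2000)
print(sorted(round(m, 2) for _, m, _ in gmm))  # means near -5 and 5
```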

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.02223 [pdf, other]

Large Language Model Aided QoS Prediction for Service Recommendation

Authors: Huiying Liu, Zekun Zhang, Honghao Li, Qilin Wu, Yiwen Zhang

Abstract: Large language models (LLMs) have improved rapidly in recent years and have been used in a wide range of applications. After being trained on large text corpora, LLMs acquire the capability of extracting rich features from textual data. Such capability is potentially useful for the web service recommendation task, where web users and services have intrinsic attributes that can be described using natural language sentences and are useful for recommendation. In this paper, we explore the possibility and practicality of using LLMs for web service recommendation. We propose the large language model aided QoS prediction (llmQoS) model, which uses LLMs to extract useful information from attributes of web users and services via descriptive sentences. This information is then used in combination with the QoS values of historical interactions of users and services to predict QoS values for any given user-service pair. On the WSDream dataset, llmQoS is shown to overcome the data sparsity issue inherent to the QoS prediction problem and consistently outperforms comparable baseline models.

Submitted 15 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.02049 [pdf, other]

3D Single-object Tracking in Point Clouds with High Temporal Variation

Authors: Qiao Wu, Kun Sun, Pei An, Mathieu Salzmann, Yanning Zhang, Jiaqi Yang

Abstract: The high temporal variation of point clouds is the key challenge of 3D single-object tracking (3D SOT). Existing approaches rely on the assumption that the shape variation of the point clouds and the motion of the objects across neighboring frames are smooth, so they fail to cope with data of high temporal variation. In this paper, we present a novel framework for 3D SOT in point clouds with high temporal variation, called HVTrack. HVTrack introduces three novel components to tackle the challenges of the high temporal variation scenario: 1) a Relative-Pose-Aware Memory module to handle temporal point cloud shape variations; 2) a Base-Expansion Feature Cross-Attention module to deal with similar object distractions in expanded search areas; 3) a Contextual Point Guided Self-Attention module for suppressing heavy background noise. We construct a dataset with high temporal variation (KITTI-HV) by setting different frame intervals for sampling in the KITTI dataset. On KITTI-HV with a 5-frame interval, our HVTrack surpasses the state-of-the-art tracker CXTracker by 11.3%/15.7% in Success/Precision.

Submitted 6 September, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

Comments: Accepted by ECCV24

arXiv:2408.02037 [pdf, ps, other]

Distributionally Robust Optimization for Computation Offloading in Aerial Access Networks

Authors: Guanwang Jiang, Ziye Jia, Lijun He, Chao Dong, Qihui Wu, Zhu Han

Abstract: With the rapid increase in users requiring data offloading and computation, it is challenging to guarantee quality of service (QoS) in remote areas. To deal with this challenge, it is promising to combine aerial access networks (AANs) with multi-access edge computing (MEC) equipment to provide computation services with high QoS. However, given the uncertain data sizes of tasks, it is intractable to optimize the offloading decisions and the aerial resources. Hence, in this paper, we consider an AAN that provides MEC services for uncertain tasks. Specifically, we construct uncertainty sets based on historical data to characterize the possible probability distributions of the uncertain tasks. Then, based on the constructed uncertainty sets, we formulate a distributionally robust optimization problem to minimize the system delay. Next, we relax the problem and reformulate it into a linear programming problem. Accordingly, we design a MEC-based distributionally robust latency optimization algorithm. Finally, simulation results reveal that the proposed algorithm achieves a superior balance between reducing system latency and minimizing energy consumption compared to other benchmark mechanisms in the existing literature.

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.01702 [pdf, ps, other]

Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Authors: Qiucen Wu, Tian Lin, Xianghao Yu, Yu Zhu, Robert Schober

Abstract: Intelligent reflecting surfaces (IRSs) have been regarded as a promising enabler for future wireless communication systems. In the literature, IRSs have been considered power-free or assumed to have constant power consumption. However, recent experimental results have shown that for positive-intrinsic-negative (PIN) diode-based IRSs, the power consumption dynamically changes with the phase shift configuration. This phase shift-dependent power consumption (PS-DPC) introduces a challenging power allocation problem between the base station (BS) and the IRS. To tackle this issue, in this paper, we investigate a rate maximization problem for IRS-assisted systems under a practical PS-DPC model. For the single-user case, we propose a generalized Benders decomposition-based beamforming method to maximize the achievable rate while satisfying a total system power consumption constraint. Moreover, we propose a low-complexity beamforming design, where the powers allocated to the BS and IRS are optimized offline based on statistical channel state information. Furthermore, for the multi-user case, we solve an equivalent weighted mean square error minimization problem with two different joint power allocation and phase shift optimization methods. Simulation results indicate that compared to baseline schemes, our proposed methods can flexibly optimize the power allocation between the BS and IRS, thus achieving better performance. The optimized power allocation strategy strongly depends on the system power budget.
When the system power budget is high, the PS-DPC is not the dominant factor in the system power consumption, allowing the IRS to turn on as many PIN diodes as needed to achieve high beamforming quality. When the system power budget is limited, however, more power tends to be allocated to the BS to enhance the transmit power, resulting in a lower beamforming quality at the IRS due to the reduced PS-DPC budget.

Submitted 3 August, 2024; originally announced August 2024.

arXiv:2408.00413 [pdf, other]

Joint Antenna Position and Beamforming Optimization with Self-Interference Mitigation in MA-ISAC System

Authors: Size Peng, Cixiao Zhang, Yin Xu, Qingqing Wu, Xiaowu Ou, Dazhi He

Abstract: Movable antennas (MAs) have demonstrated significant potential in enhancing the performance of integrated sensing and communication (ISAC) systems. However, their application in integrated and cost-effective full-duplex (FD) monostatic systems remains underexplored. To address this research gap, we develop an MA-ISAC model within a monostatic framework, where the self-interference channel is modeled in the near field and characterized by antenna position vectors. This model allows us to investigate the use of MAs with the goal of maximizing the weighted sum of communication capacity and sensing mutual information. The resulting optimization problem is non-convex, making it challenging to solve optimally. To overcome this, we employ fractional programming (FP) to propose an alternating optimization (AO) algorithm that jointly optimizes the beamforming and antenna positions for both transceivers. Specifically, closed-form solutions for the transmit and receive beamforming matrices are derived using the Karush-Kuhn-Tucker (KKT) conditions, and a novel coarse-to-fine grained search (CFGS) approach is employed to determine high-quality suboptimal antenna positions. Numerical results demonstrate that, with strong self-interference cancellation (SIC) capabilities, MAs significantly enhance the overall performance and reliability of the ISAC system when utilizing our proposed algorithm, compared to conventional fixed-position antenna designs.
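The abstract does not describe CFGS in detail. The sketch below shows the generic coarse-to-fine idea for a single antenna moving along a line segment: evaluate the objective on a coarse grid, then repeatedly re-grid a narrower window around the best point. The objective is a hypothetical callable standing in for the weighted sum of capacity and sensing mutual information, and `coarse_to_fine_search` is an illustrative name. Like any grid method, a too-coarse first level can miss a narrow global peak, which is consistent with the result being only a high-quality suboptimal position.

```python
from typing import Callable


def coarse_to_fine_search(
    objective: Callable[[float], float],
    lower: float,
    upper: float,
    points_per_level: int = 21,
    levels: int = 4,
) -> float:
    """Maximize a 1-D objective over [lower, upper] by successive grid refinement."""
    if points_per_level < 2 or upper <= lower:
        raise ValueError("need at least two grid points and a non-empty interval")
    lo, hi = lower, upper
    best = lower
    for _ in range(levels):
        step = (hi - lo) / (points_per_level - 1)
        grid = [lo + i * step for i in range(points_per_level)]
        best = max(grid, key=objective)
        # Zoom into one grid step on each side of the incumbent, staying in bounds.
        lo, hi = max(lower, best - step), min(upper, best + step)
    return best


# Hypothetical objective with a single peak at x = 0.37 (e.g. a normalized position).
position = coarse_to_fine_search(lambda x: -(x - 0.37) ** 2, 0.0, 1.0)
print(round(position, 3))  # 0.37
```

With 21 points per level, each level shrinks the search window by a factor of 10, so four levels cost 84 objective evaluations instead of the roughly 20,000 a single uniform grid would need for the same final resolution.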

Submitted 9 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.
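The abstract above names the CFGS step but does not specify it. As a rough illustration of the general coarse-to-fine idea (not the authors' CFGS), the sketch below maximizes a black-box objective over a single antenna coordinate: it evaluates a uniform grid, then repeatedly re-grids a narrower interval around the best point found so far. The function name `coarse_to_fine_search`, the grid size `points`, the number of refinement `levels`, and the toy quadratic objective are all assumptions. In an MA-ISAC setting the objective would presumably be the weighted sum of capacity and sensing mutual information, evaluated with the beamformers held fixed during the position step of the AO loop.

```python
from typing import Callable, Tuple


def coarse_to_fine_search(
    f: Callable[[float], float],
    lo: float,
    hi: float,
    points: int = 11,
    levels: int = 4,
) -> Tuple[float, float]:
    """Maximize f on [lo, hi] with a generic coarse-to-fine grid search.

    Not the paper's CFGS (hypothetical sketch): each level evaluates a
    uniform grid of `points` samples, then narrows the interval to one
    grid step on either side of the best point found so far.
    """
    if points < 2 or hi <= lo:
        raise ValueError("need points >= 2 and hi > lo")
    best_x, best_val = lo, f(lo)
    for _ in range(levels):
        step = (hi - lo) / (points - 1)
        # evaluate the objective on a uniform grid over the current interval
        for i in range(points):
            x = lo + i * step
            val = f(x)
            if val > best_val:
                best_x, best_val = x, val
        # zoom in around the incumbent, staying inside the current interval
        lo, hi = max(lo, best_x - step), min(hi, best_x + step)
    return best_x, best_val


if __name__ == "__main__":
    # toy objective with its peak at x = 0.37 (illustrative only)
    x, val = coarse_to_fine_search(lambda t: -(t - 0.37) ** 2, 0.0, 1.0, levels=6)
    print(f"best position ~ {x:.4f}, objective {val:.2e}")
```

Each level shrinks the interval to two grid steps, so resolution improves by roughly a factor of (points - 1) / 2 per level. Like any grid search, it can miss a narrow peak that falls between coarse grid points, which is one reason such searches yield sub-optimal rather than globally optimal positions.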

arXiv:2408.00256 [pdf, other]

Mobility-Aware Federated Self-supervised Learning in Vehicular Network

Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Qiang Fan

Abstract: Federated Learning (FL) is an advanced distributed machine learning approach that protects the privacy of each vehicle by allowing the model to be trained on multiple devices simultaneously without uploading all data to a roadside unit (RSU). This enables FL to handle scenarios with sensitive or widely distributed data. However, in these fields, labeling costs are well known to be a significant expense, and models relying on labels are not suitable for rapidly evolving fields, especially vehicular networks or the mobile Internet of Things (MIoT), where new data emerges constantly. To handle this issue, self-supervised learning paves the way for training without labels. Additionally, for vehicles traveling at high velocity, images are blurred, so simple aggregation not only impacts the accuracy of the aggregated model but also reduces the convergence speed of FL. This paper proposes an FL algorithm that performs aggregation based on image blur level, called FLSimCo, which does not require labels and serves as a pre-training stage for self-supervised learning in the vehicular environment. Simulation results demonstrate that the proposed algorithm exhibits fast and stable convergence.

Submitted 31 July, 2024; originally announced August 2024.

Comments: This paper has been submitted to Urban Lifeline. The source code has been released at: https://github.com/qiongwu86/FLSimCo
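The abstract says FLSimCo aggregates client models according to image blur level but does not give the rule. The sketch below shows one hypothetical form of blur-aware aggregation: a FedAvg-style weighted average in which each vehicle's weight is its sample count divided by (1 + blur level), so blurrier, faster-moving clients contribute less. The weighting formula, the function name `blur_weighted_average`, and the representation of models as flat lists of floats are illustrative assumptions, not the paper's method.

```python
from typing import List, Sequence


def blur_weighted_average(
    client_params: Sequence[Sequence[float]],
    num_samples: Sequence[int],
    blur_levels: Sequence[float],
) -> List[float]:
    """Aggregate flattened client models, down-weighting blurrier clients.

    Hypothetical rule (not from the paper): weight_k is proportional to
    n_k / (1 + blur_k), normalized so the weights sum to 1. Assumes
    positive sample counts and non-negative blur levels.
    """
    if not (len(client_params) == len(num_samples) == len(blur_levels)):
        raise ValueError("need one model, sample count and blur level per client")
    raw = [n / (1.0 + b) for n, b in zip(num_samples, blur_levels)]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for w, params in zip(weights, client_params):
        # accumulate this client's weighted contribution to each parameter
        for i in range(dim):
            aggregated[i] += w * params[i]
    return aggregated


if __name__ == "__main__":
    # two vehicles with equal data; the second one's images are blurrier
    print(blur_weighted_average([[0.0, 2.0], [1.0, 5.0]], [100, 100], [0.0, 1.0]))
```

With equal blur levels this reduces to standard sample-weighted FedAvg; any decreasing function of blur could replace 1 / (1 + blur) in this sketch.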

arXiv:2408.00223 [pdf, other]

Age of Information Analysis for Multi-Priority Queue and NOMA Enabled C-V2X in IoV

Authors: Zheng Zhang, Qiong Wu, Pingyi Fan, Ke Xiong

Abstract: As Internet-of-Vehicles (IoV) technology develops and demand for Intelligent Transportation Systems (ITS) increases, vehicle users have a growing need for real-time data and communication. Traditional request-based methods face challenges such as latency and bandwidth limitations. Mode 4 in Cellular Vehicle-to-Everything (C-V2X) addresses latency and overhead issues through autonomous resource selection. However, Semi-Persistent Scheduling (SPS) based on distributed sensing may lead to more collisions. Non-Orthogonal Multiple Access (NOMA) can alleviate the reduction in packet reception probability caused by collisions. Moreover, Age of Information (AoI) is introduced as a comprehensive metric reflecting reliability and latency performance and is used to analyze the impact of NOMA on the C-V2X communication system. AoI indicates the time a message spends in both the local waiting and transmission processes. In C-V2X, the waiting process can be extended to a queuing process, influenced by the packet generation rate and the Resource Reservation Interval (RRI). The transmission process is mainly affected by transmission delay and success rate. In C-V2X, a smaller selection window (SW) limits the number of resources available to vehicles, resulting in higher collision rates as the number of vehicles increases. The SW is generally equal to the RRI, which therefore affects AoI not only in the queuing process but also in the transmission process.
Therefore, this paper proposes an AoI estimation method based on multi-priority data type queues and considers the influence of NOMA on the AoI generated in both processes in the C-V2X system under different RRI conditions. This work aims to achieve better C-V2X system performance than some known algorithms.

Submitted 31 July, 2024; originally announced August 2024.

Comments: This paper has been submitted to WCSP 2024. The source code has been released at: https://github.com/qiongwu86/Analysis-of-the-Impact-of-Multi-Priority-Queue-and-NOMA-on-Age-of-Information-in-C-V2X
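The abstract defines AoI as the time a message spends waiting locally and in transmission. A standard way to measure time-average AoI from a trace is to integrate the sawtooth age curve, where the age at time t is t minus the generation time of the freshest update received so far. The sketch below computes that average between the first and last reception from (generation time, reception time) pairs. It is a generic metric computation that does not model the paper's multi-priority queues, SPS, or NOMA; the function name `average_aoi` and the input format are assumptions.

```python
from typing import Sequence, Tuple


def average_aoi(updates: Sequence[Tuple[float, float]]) -> float:
    """Time-average Age of Information between the first and last reception.

    updates: (generation_time, reception_time) pairs sorted by reception time.
    The age at time t is t minus the generation time of the freshest update
    received so far, so it grows linearly and drops at each fresher reception.
    """
    if len(updates) < 2:
        raise ValueError("need at least two received updates")
    duration = updates[-1][1] - updates[0][1]
    if duration <= 0:
        raise ValueError("reception times must span a positive interval")
    area = 0.0
    freshest_gen, prev_rx = updates[0]
    for gen, rx in updates[1:]:
        # area under the linear age curve t - freshest_gen on [prev_rx, rx]
        area += ((rx - freshest_gen) ** 2 - (prev_rx - freshest_gen) ** 2) / 2.0
        # a fresher update resets the age; a stale (older) one does not
        freshest_gen = max(freshest_gen, gen)
        prev_rx = rx
    return area / duration


if __name__ == "__main__":
    # one update per time unit, each delivered 0.5 time units after generation
    print(average_aoi([(0.0, 0.5), (1.0, 1.5), (2.0, 2.5)]))  # prints 1.0
```

In this toy trace the age ramps from 0.5 to 1.5 between deliveries, so its time average is 1.0; queuing delay and failed transmissions would both show up as larger sawtooth peaks.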
