
Showing 1–50 of 3,220 results for author: Zhang, C

Searching in archive cs.
  1. arXiv:2409.03155  [pdf, other]

    cs.CL cs.AI

    Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models

    Authors: Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, Lizhen Cui

    Abstract: Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. In contrast, knowledge graphs encompass extensive, multi-relational structures that store a vast array of symbolic facts. Consequently, integrating LLMs with knowledge graphs has been extensively explored, with Knowledge Graph Question Answering (KGQA) serving as a critical…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 12 pages

    ACM Class: I.2.4

  2. arXiv:2409.02889  [pdf, other]

    cs.CL cs.AI cs.CV cs.MM

    LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

    Authors: Xidong Wang, Dingjie Song, Shunian Chen, Chen Zhang, Benyou Wang

    Abstract: Expanding the long-context capabilities of Multi-modal Large Language Models (MLLMs) is crucial for video understanding, high-resolution image understanding, and multi-modal agents. This involves a series of systematic optimizations, including model architecture, data construction and training strategy, particularly addressing challenges such as "degraded performance with more images" and \…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 19 pages, 7 figures, 6 tables

  3. arXiv:2409.02708  [pdf, other]

    cs.LG stat.ME

    Few-shot Multi-Task Learning of Linear Invariant Features with Meta Subspace Pursuit

    Authors: Chaozhi Zhang, Lin Liu, Xiaoqun Zhang

    Abstract: Data scarcity poses a serious threat to modern machine learning and artificial intelligence, as their practical success typically relies on the availability of big datasets. One effective strategy to mitigate the issue of insufficient data is to first harness information from other data sources possessing certain similarities in the study design stage, and then employ the multi-task or meta learni…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.02020  [pdf, other]

    cs.CV

    Efficient Point Cloud Classification via Offline Distillation Framework and Negative-Weight Self-Distillation Technique

    Authors: Qiang Zheng, Chao Zhang, Jian Sun

    Abstract: The rapid advancement in point cloud processing technologies has significantly increased the demand for efficient and compact models that achieve high-accuracy classification. Knowledge distillation has emerged as a potent model compression technique. However, traditional KD often requires extensive computational resources for forward inference of large teacher models, thereby reducing training ef…

    Submitted 3 September, 2024; originally announced September 2024.

  5. arXiv:2409.02007  [pdf, other]

    cs.CV

    PMT-MAE: Dual-Branch Self-Supervised Learning with Distillation for Efficient Point Cloud Classification

    Authors: Qiang Zheng, Chao Zhang, Jian Sun

    Abstract: Advances in self-supervised learning are essential for enhancing feature extraction and understanding in point cloud processing. This paper introduces PMT-MAE (Point MLP-Transformer Masked Autoencoder), a novel self-supervised learning framework for point cloud classification. PMT-MAE features a dual-branch architecture that integrates Transformer and MLP components to capture rich features. The T…

    Submitted 3 September, 2024; originally announced September 2024.

  6. arXiv:2409.01998  [pdf, other]

    cs.CV

    SA-MLP: Enhancing Point Cloud Classification with Efficient Addition and Shift Operations in MLP Architectures

    Authors: Qiang Zheng, Chao Zhang, Jian Sun

    Abstract: This study addresses the computational inefficiencies in point cloud classification by introducing novel MLP-based architectures inspired by recent advances in CNN optimization. Traditional neural networks heavily rely on multiplication operations, which are computationally expensive. To tackle this, we propose Add-MLP and Shift-MLP, which replace multiplications with addition and shift operations…

    Submitted 3 September, 2024; originally announced September 2024.

  7. arXiv:2409.01421  [pdf, other]

    cs.GR cs.CV

    DiffCSG: Differentiable CSG via Rasterization

    Authors: Haocheng Yuan, Adrien Bousseau, Hao Pan, Chengquan Zhang, Niloy J. Mitra, Changjian Li

    Abstract: Differentiable rendering is a key ingredient for inverse rendering and machine learning, as it allows to optimize scene parameters (shape, materials, lighting) to best fit target images. Differentiable rendering requires that each scene parameter relates to pixel values through differentiable operations. While 3D mesh rendering algorithms have been implemented in a differentiable way, these algori…

    Submitted 2 September, 2024; originally announced September 2024.

  8. arXiv:2409.00978  [pdf, ps, other]

    cs.IT eess.SP

    Uplink Over-the-Air Aggregation for Multi-Model Wireless Federated Learning

    Authors: Chong Zhang, Min Dong, Ben Liang, Ali Afana, Yahia Ahmed

    Abstract: We propose an uplink over-the-air aggregation (OAA) method for wireless federated learning (FL) that simultaneously trains multiple models. To maximize the multi-model training convergence rate, we derive an upper bound on the optimality gap of the global model update, and then, formulate an uplink joint transmit-receive beamforming optimization problem to minimize this upper bound. We solve this…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 5 pages, 5 figures. Accepted by IEEE SPAWC 2024. arXiv admin note: text overlap with arXiv:2312.13424

  9. arXiv:2409.00856  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages

    Authors: William Zhang, Maria Leon, Ryan Xu, Adrian Cardenas, Amelia Wissink, Hanna Martin, Maya Srikanth, Kaya Dorogi, Christian Valadez, Pedro Perez, Citlalli Grijalva, Corey Zhang, Mark Santolucito

    Abstract: Node-based programming languages are increasingly popular in media arts coding domains. These languages are designed to be accessible to users with limited coding experience, allowing them to achieve creative output without an extensive programming background. Using LLM-based code generation to further lower the barrier to creative output is an exciting opportunity. However, the best strategy for…

    Submitted 1 September, 2024; originally announced September 2024.

  10. arXiv:2409.00670  [pdf, other]

    cs.LG cs.SI

    Towards Faster Graph Partitioning via Pre-training and Inductive Inference

    Authors: Meng Qin, Chaorui Zhang, Yu Gao, Yibin Ding, Weipeng Jiang, Weixi Zhang, Wei Han, Bo Bai

    Abstract: Graph partitioning (GP) is a classic problem that divides the node set of a graph into densely-connected blocks. Following the IEEE HPEC Graph Challenge and recent advances in pre-training techniques (e.g., large-language models), we propose PR-GPT (Pre-trained & Refined Graph ParTitioning) based on a novel pre-training & refinement paradigm. We first conduct the offline pre-training of a deep gra…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Champion winner of IEEE HPEC 2024 Graph Challenge (https://graphchallenge.mit.edu/champions)

  11. arXiv:2409.00633  [pdf, other]

    cs.CV

    Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

    Authors: Dingyuan Zhang, Dingkang Liang, Zichang Tan, Xiaoqing Ye, Cheng Zhang, Jingdong Wang, Xiang Bai

    Abstract: Slow inference speed is one of the most crucial concerns for deploying multi-view 3D detectors to tasks with high real-time requirements like autonomous driving. Although many sparse query-based methods have already attempted to improve the efficiency of 3D detectors, they neglect to consider the backbone, especially when using Vision Transformers (ViT) for better performance. To tackle this probl…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  12. arXiv:2409.00125  [pdf]

    cs.LG cs.AI stat.ML

    A Hybrid Framework for Spatial Interpolation: Merging Data-driven with Domain Knowledge

    Authors: Cong Zhang, Shuyi Du, Hongqing Song, Yuhe Wang

    Abstract: Estimating spatially distributed information through the interpolation of scattered observation datasets often overlooks the critical role of domain knowledge in understanding spatial dependencies. Additionally, the features of these data sets are typically limited to the spatial coordinates of the scattered observation locations. In this paper, we propose a hybrid framework that integrates data-d…

    Submitted 4 September, 2024; v1 submitted 28 August, 2024; originally announced September 2024.

    Comments: 21 pages, 13 figures; typos corrected, references updated

  13. arXiv:2408.17431  [pdf, other]

    eess.AS cs.AI

    Advancing Multi-talker ASR Performance with Large Language Models

    Authors: Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

    Abstract: Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problem for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with the idea of concatenating transcriptions from multiple speakers according to the emission times of their speech for training. However, SOT-style transcr…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, accepted by IEEE SLT 2024

  14. arXiv:2408.16732  [pdf, other]

    q-bio.NC cs.SD eess.AS q-bio.QM

    Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech

    Authors: Cong Zhang, Wenxing Guo, Hongsheng Dai

    Abstract: This study addresses the TAUKADIAL challenge, focusing on the classification of speech from people with Mild Cognitive Impairment (MCI) and neurotypical controls. We conducted three experiments comparing five machine-learning methods: Random Forests, Sparse Logistic Regression, k-Nearest Neighbors, Sparse Support Vector Machine, and Decision Tree, utilizing 1076 acoustic features automatically ext…

    Submitted 29 August, 2024; originally announced August 2024.

  15. arXiv:2408.15585  [pdf, other]

    cs.SD eess.AS

    Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

    Authors: Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

    Abstract: In this paper, Whisper, a large-scale pre-trained model for automatic speech recognition, is proposed to apply to speaker verification. A partial multi-scale feature aggregation (PMFA) approach is proposed based on a subset of Whisper encoder blocks to derive highly discriminative speaker embeddings. Experimental results demonstrate that using the middle to later blocks of the Whisper encoder keeps…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by Interspeech 2024

  16. arXiv:2408.15425  [pdf, other]

    cs.RO cs.AI cs.SE

    Fast and Modular Autonomy Software for Autonomous Racing Vehicles

    Authors: Andrew Saba, Aderotimi Adetunji, Adam Johnson, Aadi Kothari, Matthew Sivaprakasam, Joshua Spisak, Prem Bharatia, Arjun Chauhan, Brendan Duff Jr., Noah Gasparro, Charles King, Ryan Larkin, Brian Mao, Micah Nye, Anjali Parashar, Joseph Attias, Aurimas Balciunas, Austin Brown, Chris Chang, Ming Gao, Cindy Heredia, Andrew Keats, Jose Lavariega, William Muckelroy III, Andre Slavescu, et al. (5 additional authors not shown)

    Abstract: Autonomous motorsports aim to replicate the human racecar driver with software and sensors. As in traditional motorsports, Autonomous Racing Vehicles (ARVs) are pushed to their handling limits in multi-agent scenarios at extremely high ($\geq 150mph$) speeds. This Operational Design Domain (ODD) presents unique challenges across the autonomy stack. The Indy Autonomous Challenge (IAC) is an interna…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Published in Journal of Field Robotics

    Journal ref: Field Robotics Volume 4 (2024) 1-45

  17. arXiv:2408.14507  [pdf, other]

    cs.DB cs.AI

    Cost-Aware Uncertainty Reduction in Schema Matching with GPT-4: The Prompt-Matcher Framework

    Authors: Longyu Feng, Huahang Li, Chen Jason Zhang

    Abstract: Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. The inherent uncertainty of current schema matching algorithms leads to the generation of a set of candidate matches. Storing these results necessitates the use of databases and systems capable of handling proba…

    Submitted 24 August, 2024; originally announced August 2024.

  18. arXiv:2408.14380  [pdf, other]

    cs.CL cs.AI

    Probing Causality Manipulation of Large Language Models

    Authors: Chenyang Zhang, Haibo Tong, Bin Zhang, Dongyu Zhang

    Abstract: Large language models (LLMs) have shown various ability on natural language processing, including problems about causality. It is not intuitive for LLMs to command causality, since pretrained models usually work on statistical associations, and do not focus on causes and effects in sentences. So that probing internal manipulation of causality is necessary for LLMs. This paper proposes a novel appr…

    Submitted 26 August, 2024; originally announced August 2024.

  19. arXiv:2408.14211  [pdf, other]

    cs.CV cs.AI

    MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

    Authors: Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang

    Abstract: Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. As its core, we leverage a pre-trained 2D d…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Project Page: https://thuhcsi.github.io/MagicMan

  20. arXiv:2408.13960  [pdf, other]

    cs.LG cs.AI cs.CY

    Time Series Analysis for Education: Methods, Applications, and Future Directions

    Authors: Shengzhong Mao, Chaoli Zhang, Yichi Song, Jindong Wang, Xiao-Jun Zeng, Zenglin Xu, Qingsong Wen

    Abstract: Recent advancements in the collection and analysis of sequential educational data have brought time series analysis to a pivotal position in educational research, highlighting its essential role in facilitating data-driven decision-making. However, there is a lack of comprehensive summaries that consolidate these advancements. To the best of our knowledge, this paper is the first to provide a comp…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 24 pages, 3 figures, 6 tables, project page: see https://github.com/ai-for-edu/time-series-analysis-for-education

  21. arXiv:2408.13855  [pdf, other]

    cs.SE

    An Empirical Study of False Negatives and Positives of Static Code Analyzers From the Perspective of Historical Issues

    Authors: Han Cui, Menglei Xie, Ting Su, Chengyu Zhang, Shin Hwei Tan

    Abstract: Static code analyzers are widely used to help find program flaws. However, in practice the effectiveness and usability of such analyzers is affected by the problems of false negatives (FNs) and false positives (FPs). This paper aims to investigate the FNs and FPs of such analyzers from a new perspective, i.e., examining the historical issues of FNs and FPs of these analyzers reported by the mainta…

    Submitted 25 August, 2024; originally announced August 2024.

  22. arXiv:2408.13854  [pdf, other]

    cs.CV cs.AI

    Tangram: A Challenging Benchmark for Geometric Element Recognizing

    Authors: Jiamin Tang, Chao Zhang, Xudong Zhu, Mengchi Liu

    Abstract: Significant advancements in Large Multimodal Models (LMMs) have enabled them to tackle complex problems involving visual-mathematical reasoning. However, their ability to identify geometric elements remains understudied. To bridge this gap, we introduce Tangram, a novel benchmark designed to evaluate the performance of LMMs on geometric element recognition. Tangram includes 1,080 diverse geometric…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures

  23. arXiv:2408.13774  [pdf, other]

    cs.CV

    Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild

    Authors: Fares Bougourzi, Fadi Dornaika, Chongsheng Zhang

    Abstract: Text recognition in the wild is an important technique for digital maps and urban scene understanding, in which the natural resembling properties between glyphs is one of the major reasons that lead to wrong recognition results. To address this challenge, we introduce two extremely fine-grained visual recognition benchmark datasets that contain very challenging resembling glyphs (characters/letter…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 Figures, 8 Tables

  24. arXiv:2408.13770  [pdf, other]

    cs.CV

    TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

    Authors: Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, Haoqian Wang

    Abstract: Compared with previous 3D reconstruction methods like Nerf, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. Especially for the scenes that have many non-overlap…

    Submitted 25 August, 2024; originally announced August 2024.

  25. arXiv:2408.13533  [pdf, other]

    cs.CL

    Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models

    Authors: Jinyang Wu, Feihu Che, Chuyuan Zhang, Jianhua Tao, Shuai Zhang, Pengpeng Shao

    Abstract: Retrieval-Augmented Generation (RAG) has emerged as a crucial method for addressing hallucinations in large language models (LLMs). While recent research has extended RAG models to complex noisy scenarios, these explorations often confine themselves to limited noise types and presuppose that noise is inherently detrimental to LLMs, potentially deviating from real-world retrieval environments and r…

    Submitted 24 August, 2024; originally announced August 2024.

  26. arXiv:2408.12109  [pdf, other]

    cs.CV cs.CL

    RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

    Authors: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

    Abstract: Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the sc…

    Submitted 21 August, 2024; originally announced August 2024.

  27. arXiv:2408.11882  [pdf]

    q-bio.NC cs.SD eess.AS q-bio.QM

    Prosody of speech production in latent post-stroke aphasia

    Authors: Cong Zhang, Tong Li, Gayle DeDe, Christos Salis

    Abstract: This study explores prosodic production in latent aphasia, a mild form of aphasia associated with left-hemisphere brain damage (e.g. stroke). Unlike prior research on moderate to severe aphasia, we investigated latent aphasia, which can seem to have very similar speech production with neurotypical speech. We analysed the f0, intensity and duration of utterance-initial and utterance-final words of…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Interspeech 2024

  28. arXiv:2408.11824   

    cs.HC cs.AI

    AppAgent v2: Advanced Agent for Flexible Mobile Interactions

    Authors: Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

    Abstract: With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible actio…

    Submitted 23 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Pre-print version, some content needs to be supplemented

  29. arXiv:2408.11416  [pdf, other]

    cs.MA cs.RO

    Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

    Authors: Cheng Xu, Changtian Zhang, Yuchen Shi, Ran Wang, Shihong Duan, Yadong Wan, Xiaotong Zhang

    Abstract: Recent advancements in reinforcement learning have made significant impacts across various domains, yet they often struggle in complex multi-agent environments due to issues like algorithm instability, low sampling efficiency, and the challenges of exploration and dimensionality explosion. Hierarchical reinforcement learning (HRL) offers a structured approach to decompose complex tasks into simple…

    Submitted 21 August, 2024; originally announced August 2024.

  30. arXiv:2408.11407  [pdf, other]

    cs.CV

    Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection

    Authors: Liang Yao, Fan Liu, Chuanyi Zhang, Zhiquan Ou, Ting Wu

    Abstract: Knowledge distillation (KD) is an effective method for compressing models in object detection tasks. Due to limited computational capability, UAV-based object detection (UAV-OD) widely adopt the KD technique to obtain lightweight detectors. Existing methods often overlook the significant differences in feature space caused by the large gap in scale between the teacher and student models. This limi…

    Submitted 21 August, 2024; originally announced August 2024.

  31. arXiv:2408.11297  [pdf, other]

    cs.CV

    Making Large Vision Language Models to be Good Few-shot Learners

    Authors: Fan Liu, Wenwen Cai, Jian Huo, Chuanyi Zhang, Delong Chen, Jun Zhou

    Abstract: Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk lear…

    Submitted 20 August, 2024; originally announced August 2024.

  32. arXiv:2408.09269  [pdf, other]

    cs.SD cs.LG eess.AS

    Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs

    Authors: Anshuman Sinha, Camille Migozzi, Aubin Rey, Chao Zhang

    Abstract: Research on multi-modal contrastive learning strategies for audio and text has rapidly gained interest. Contrastively trained Audio-Language Models (ALMs), such as CLAP, which establish a unified representation across audio and language modalities, have enhanced the efficacy in various subsequent tasks by providing good text aligned audio encoders and vice versa. These improvements are evident in…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 31 pages, 11 figures

  33. arXiv:2408.08931  [pdf, other]

    cs.IR cs.AI cs.LG

    Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

    Authors: Zhiwei Li, Guodong Long, Tianyi Zhou, Jing Jiang, Chengqi Zhang

    Abstract: Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework with preserving privacy in a federated setting. Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms, and then preserve personalized information into a user embedding vector. However, the user embedding is usu…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures, 4 tables, conference

  34. arXiv:2408.08882  [pdf, other]

    cs.DC

    A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR

    Authors: Yichao Zhang, Marco Bertuletti, Chi Zhang, Samuel Riedel, Alessandro Vanelli-Coralli, Luca Benini

    Abstract: We introduce an open-source architecture for next-generation Radio-Access Network baseband processing: 1024 latency-tolerant 32-bit RISC-V cores share 4 MiB of L1 memory via an ultra-low latency interconnect (7-11 cycles), a modular Direct Memory Access engine provides an efficient link to a high bandwidth memory, such as HBM2E (98% peak bandwidth at 910GBps). The system achieves leading-edge ener…

    Submitted 4 August, 2024; originally announced August 2024.

  35. arXiv:2408.08588  [pdf, other]

    cs.IT eess.SP

    Movable Antenna for Wireless Communications:Prototyping and Experimental Results

    Authors: Zhenjun Dong, Zhiwen Zhou, Zhiqiang Xiao, Chaoyue Zhang, Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA), which can flexibly change the position of antenna in three-dimensional (3D) continuous space, is an emerging technology for achieving full spatial performance gains. In this paper, a prototype of MA communication system with ultra-accurate movement control is presented to verify the performance gain of MA in practical environments. The prototype utilizes the feedback control…

    Submitted 16 August, 2024; originally announced August 2024.

  36. arXiv:2408.08315  [pdf, other]

    cs.CV cs.AI

    Segment Anything for Videos: A Systematic Survey

    Authors: Chunhui Zhang, Yawen Cui, Weilin Lin, Guanjie Huang, Yan Rong, Li Liu, Shiguang Shan

    Abstract: The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various…

    Submitted 30 July, 2024; originally announced August 2024.

    Comments: https://github.com/983632847/SAM-for-Videos

  37. arXiv:2408.08209  [pdf, other]

    cs.IR

    Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation

    Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Qi Liu, Ruobing Xie, Jun Xu, Ji-Rong Wen

    Abstract: Nowadays, many recommender systems encompass various domains to cater to users' diverse needs, leading to user behaviors transitioning across different domains. In fact, user behaviors across different domains reveal changes in preference toward recommended items. For instance, a shift from negative feedback to positive feedback indicates improved user satisfaction. However, existing cross-domain…

    Submitted 15 August, 2024; originally announced August 2024.

  38. arXiv:2408.08192  [pdf, other]

    cs.LG cs.GT cs.MA math.OC

    Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

    Authors: Chenyu Zhang, Xu Chen, Xuan Di

    Abstract: Mean field games (MFGs) model the interactions within a large-population multi-agent system using the population distribution. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), which calculates best responses and induced population distribution separately and sequentially. However, FPI-type methods suffer from inefficiency and instability, due to oscillations caused b…

    Submitted 15 August, 2024; originally announced August 2024.

  39. arXiv:2408.08105  [pdf, other]

    cs.CV cs.AI

    Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

    Authors: Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai

    Abstract: Large Language Models (LLMs) have showcased exceptional ability in causal reasoning from textual information. However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided? Motivated by this, we propose a novel Multimodal Causal Reasoning benchmark, namely MuCR, to challenge VLLMs to infer semantic cause-and-effect relationship…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 20 pages

  40. arXiv:2408.07733  [pdf, other]

    cs.LG cs.CR

    Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: In recent times, the swift evolution of adversarial attacks has captured widespread attention, particularly concerning their transferability and other performance attributes. These techniques are primarily executed at the sample level, frequently overlooking the intrinsic parameters of models. Such neglect suggests that the perturbations introduced in adversarial samples might have the potential f…

    Submitted 14 August, 2024; originally announced August 2024.

  41. arXiv:2408.07605  [pdf, other]

    cs.CV

    Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

  42. arXiv:2408.07401  [pdf, other]

    cs.CL cs.AI cs.DB

    DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

    Authors: Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong

    Abstract: Data visualization (DV) is the fundamental and premise tool to improve the efficiency in conveying the insights behind the big data, which has been widely accepted in existing data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in…

    Submitted 14 August, 2024; originally announced August 2024.

  43. arXiv:2408.06901  [pdf, other]

    cs.CV

    Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries

    Authors: Qi Song, Qingyong Hu, Chi Zhang, Yongquan Chen, Rui Huang

    Abstract: 3D perception tasks, such as 3D object detection and Bird's-Eye-View (BEV) segmentation using multi-camera images, have drawn significant attention recently. Despite the fact that accurately estimating both semantic and 3D scene layouts are crucial for this task, existing techniques often neglect the synergistic effects of semantic and depth cues, leading to the occurrence of classification and po…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted by TIP 2024

  44. arXiv:2408.06385  [pdf, other]

    cs.SE cs.AI cs.CL

    ViC: Virtual Compiler Is All You Need For Assembly Code Search

    Authors: Zeyu Gao, Hao Wang, Yuanda Wang, Chao Zhang

    Abstract: Assembly code search is vital for reducing the burden on reverse engineers, allowing them to quickly identify specific functions using natural language within vast binary programs. Despite its significance, this critical task is impeded by the complexities involved in building high-quality datasets. This paper explores training a Large Language Model (LLM) to emulate a general compiler. By leverag…

    Submitted 10 August, 2024; originally announced August 2024.

  45. arXiv:2408.06294  [pdf, other]

    cs.HC

    AniBalloons: Animated Chat Balloons as Affective Augmentation for Social Messaging and Chatbot Interaction

    Authors: Pengcheng An, Chaoyu Zhang, Haichen Gao, Ziqi Zhou, Yage Xiao, Jian Zhao

    Abstract: Despite being prominent and ubiquitous, message-based interaction is limited in nonverbally conveying emotions. Besides emoticons or stickers, messaging users continue seeking richer options for affective communication. Recent research explored using chat balloons' shape and color to communicate emotional states. However, little work explored whether and how chat-balloon animations could be design…

    Submitted 14 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: under the 2nd review after minor revision by International Journal of Human-Computer Studies

  46. arXiv:2408.05758  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

    Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

    Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the spe…

    Submitted 11 August, 2024; originally announced August 2024.

  47. arXiv:2408.05705  [pdf, other]

    eess.IV cs.AI cs.CV

    TC-KANRecon: High-Quality and Accelerated MRI Reconstruction via Adaptive KAN Mechanisms and Intelligent Feature Scaling

    Authors: Ruiquan Ge, Xiao Yu, Yifei Chen, Fan Jia, Shenghao Zhu, Guanyu Zhou, Yiyu Huang, Chenyan Zhang, Dong Zeng, Changmiao Wang, Qiegen Liu, Shanzhou Niu

    Abstract: Magnetic Resonance Imaging (MRI) has become essential in clinical diagnosis due to its high resolution and multiple contrast mechanisms. However, the relatively long acquisition time limits its broader application. To address this issue, this study presents an innovative conditional guided diffusion model, named as TC-KANRecon, which incorporates the Multi-Free U-KAN (MF-UKAN) module and a dynamic…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures

  48. arXiv:2408.05584  [pdf]

    cs.LG stat.ME

    Dynamical causality under invisible confounders

    Authors: Jinling Yan, Shao-Wu Zhang, Chihao Zhang, Weitian Huang, Jifan Shi, Luonan Chen

    Abstract: Causality inference is prone to spurious causal interactions, due to the substantial confounders in a complex system. While many existing methods based on the statistical methods or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result,…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 23 pages, 5 figures

  49. arXiv:2408.05508  [pdf, other]

    cs.CV

    PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture

    Authors: Qiang Zheng, Chao Zhang, Jian Sun

    Abstract: In recent years, point cloud analysis methods based on the Transformer architecture have made significant progress, particularly in the context of multimedia applications such as 3D modeling, virtual reality, and autonomous systems. However, the high computational resource demands of the Transformer architecture hinder its scalability, real-time processing capabilities, and deployment on mobile de…

    Submitted 10 August, 2024; originally announced August 2024.

  50. arXiv:2408.05233  [pdf, other]

    cs.AI

    Large Language Model based Agent Framework for Electric Vehicle Charging Behavior Simulation

    Authors: Junkang Feng, Chenggang Cui, Chuanlin Zhang, Zizhu Fan

    Abstract: This paper introduces a new LLM based agent framework for simulating electric vehicle (EV) charging behavior, integrating user preferences, psychological characteristics, and environmental factors to optimize the charging process. The framework comprises several modules, enabling sophisticated, adaptive simulations. Dynamic decision making is supported by continuous reflection and memory updates,…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 7 pages, 3 figures