Showing 1–50 of 1,754 results for author: Ma, Y

  1. arXiv:2409.02914  [pdf, other]

    cs.CV

    Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving

    Authors: Yuhang Lu, Yichen Yao, Jiadong Tu, Jiangnan Shao, Yuexin Ma, Xinge Zhu

    Abstract: Large Vision-Language Models (LVLMs) have recently garnered significant attention, with many efforts aimed at harnessing their general knowledge to enhance the interpretability and robustness of autonomous driving models. However, LVLMs typically rely on large, general-purpose datasets and lack the specialized expertise required for professional and safe driving. Existing vision-language driving d…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02426  [pdf, other]

    cs.LG cs.CV

    Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

    Authors: Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

    Abstract: Recent empirical studies have demonstrated that diffusion models can effectively learn the image distribution and generate new samples. Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging key empirical observ…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 39 pages, 9 figures

  3. arXiv:2409.02123  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

    Authors: Shengchen Zhu, Yiming Chen, Peiying Yu, Xiang Qu, Yuxiao Zhou, Yiming Ma, Zhizhan Zhao, Yukai Liu, Hao Mi, Bin Wang

    Abstract: Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechani…

    Submitted 1 September, 2024; originally announced September 2024.

  4. arXiv:2409.01807  [pdf, other]

    cs.CV

    EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video

    Authors: Zhen Zhou, Yunkai Ma, Junfeng Fan, Shaolin Zhang, Fengshui Jing, Min Tan

    Abstract: Panoptic 3D reconstruction from a monocular video is a fundamental perceptual task in robotic scene understanding. However, existing efforts suffer from inefficiency in terms of inference speed and accuracy, limiting their practical applicability. We present EPRecon, an efficient real-time panoptic 3D reconstruction framework. Current volumetric-based reconstruction methods usually utilize multi-v…

    Submitted 3 September, 2024; originally announced September 2024.

  5. arXiv:2409.01055  [pdf, other]

    cs.CV

    Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

    Authors: Qihua Chen, Yue Ma, Hongfa Wang, Junkun Yuan, Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, Wei Liu

    Abstract: This paper explores higher-resolution video outpainting with extensive content generation. We point out common issues faced by existing methods when attempting to largely outpaint videos: the generation of low-quality content and limitations imposed by GPU memory. To address these challenges, we propose a diffusion-based method called Follow-Your-Canvas. It builds upon two core designs. F…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: GitHub: https://github.com/mayuelala/FollowYourCanvas Page: https://follow-your-canvas.github.io/

  6. arXiv:2409.00405  [pdf, other]

    cs.NI

    UAV-Enabled Wireless Networks for Integrated Sensing and Learning-Oriented Communication

    Authors: Wenhao Zhuang, Xinyu He, Yuyi Mao, Juan Liu

    Abstract: Future wireless networks are envisioned to support both sensing and artificial intelligence (AI) services. However, conventional integrated sensing and communication (ISAC) networks may not be suitable due to the ignorance of diverse task-specific data utilities in different AI applications. In this letter, a full-duplex unmanned aerial vehicle (UAV)-enabled wireless network providing sensing and…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 5 pages and 6 figures. This article was submitted to IEEE for possible publication

  7. arXiv:2408.16992  [pdf, other]

    cs.DL physics.soc-ph

    Exaptation: Academic mentees' career pathway to be independent and impactful

    Authors: Yanmeng Xing, Ye Sun, Tongxin Pan, Xianglong Liang, Giacomo Livan, Yifang Ma

    Abstract: In science, mentees often follow their mentors' career paths, but exceptional mentees frequently break from this routine, sometimes even outperforming their mentors. However, the pathways to independence for these excellent mentees and their interactions with mentors remain unclear. We analyzed the careers of over 500,000 mentees in Chemistry, Neuroscience, and Physics over the past 60 years to ex…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 20 pages, 5 figures

    MSC Class: 94-00

  8. arXiv:2408.16990  [pdf, other]

    cs.MM

    Video to Music Moment Retrieval

    Authors: Zijie Xin, Minquan Wang, Ye Ma, Bo Wang, Quan Chen, Peng Jiang, Xirong Li

    Abstract: Adding proper background music helps complete a short video to be shared. Towards automating the task, previous research focuses on video-to-music retrieval (VMR), aiming to find amidst a collection of music the one best matching the content of a given video. Since music tracks are typically much longer than short videos, meaning the returned music has to be cut to a shorter moment, there is a cle…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.15978  [pdf, other]

    cs.AI

    WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

    Authors: Yao Zhang, Zijian Ma, Yunpu Ma, Zhen Han, Yu Wu, Volker Tresp

    Abstract: LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by…

    Submitted 28 August, 2024; originally announced August 2024.

  10. arXiv:2408.15257  [pdf]

    cs.CL cs.AI

    Text classification optimization algorithm based on graph neural network

    Authors: Erdi Gao, Haowei Yang, Dan Sun, Haohao Xia, Yuhan Ma, Yuanjing Zhu

    Abstract: In the field of natural language processing, text classification, as a basic task, has important research value and application prospects. Traditional text classification methods usually rely on feature representations such as the bag of words model or TF-IDF, which overlook the semantic connections between words and make it challenging to grasp the deep structural details of the text. Recently, G…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.17460 by other authors

  11. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, et al. (17 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 3 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  12. arXiv:2408.14197  [pdf, other]

    cs.CV

    Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

    Authors: Yu Yang, Jianbiao Mei, Yukai Ma, Siliang Du, Wenqing Chen, Yijie Qian, Yuxiang Feng, Yong Liu

    Abstract: World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike the aforementioned prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D fo…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 18 pages, 10 figures

  13. arXiv:2408.14180  [pdf, other]

    cs.CV cs.AI

    I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

    Authors: Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji

    Abstract: Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models poses a significant challenge. A crucial requirement in this field is the establishment of a comprehensive evaluation benchmark for accurately assessing editing results and providing valuable insights for its further development. In response to this need, we propose I2EBench,…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Tech report, 39 pages, 41 figures

  14. arXiv:2408.13226  [pdf, other]

    cs.CV

    D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching

    Authors: Jingyu Liu, Minquan Wang, Ye Ma, Bo Wang, Aozhu Chen, Quan Chen, Peng Jiang, Xirong Li

    Abstract: Videos showcasing specific products are increasingly important for E-commerce. Key moments naturally exist as the first appearance of a specific product, presentation of its distinctive features, the presence of a buying link, etc. Adding proper sound effects (SFX) to these key moments, or video decoration with SFX (VDSFX), is crucial for enhancing the user engaging experience. Previous studies ab…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures

  15. arXiv:2408.12942  [pdf, other]

    cs.CL cs.AI

    Causal-Guided Active Learning for Debiasing Large Language Models

    Authors: Li Du, Zhouhao Sun, Xiao Ding, Yixuan Ma, Yang Zhao, Kaitao Qiu, Ting Liu, Bing Qin

    Abstract: Although achieving promising performance, recent analyses show that current generative large language models (LLMs) may still capture dataset biases and utilize them for generation, leading to poor generalizability and harmfulness of LLMs. However, due to the diversity of dataset biases and the over-optimization problem, previous prior-knowledge-based debiasing methods and fine-tuning-based debias…

    Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted to the ACL 2024 main conference and awarded Outstanding Paper

  16. arXiv:2408.12255  [pdf, ps, other]

    cs.IT eess.SP

    Fast Iterative ELAA-MIMO Detection Exploiting Static Channel Components

    Authors: Jiuyu Liu, Yi Ma, Rahim Tafazolli

    Abstract: Extremely large aperture array (ELAA) is a promising multiple-input multiple-output (MIMO) technique for next generation mobile networks. In this paper, we propose two novel approaches to accelerate the convergence of current iterative MIMO detectors in ELAA channels. Our approaches exploit the static components of the ELAA channel, which include line of sight (LoS) paths and deterministic non-LoS…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: This work has been accepted by the IEEE Information Theory Workshop (ITW) 2024. Copyright may be transferred without notice, after which this version may no longer be accessible

  17. arXiv:2408.11871  [pdf, other]

    cs.CL cs.AI

    MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

    Authors: Lionel Z. Wang, Yiming Ma, Renfei Gao, Beichen Guo, Zhuoran Li, Han Zhu, Wenqi Fan, Zexin Lu, Ka Chung Ng

    Abstract: The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psyc…

    Submitted 19 August, 2024; originally announced August 2024.

  18. arXiv:2408.11446  [pdf, other]

    cs.ET

    Green Probabilistic Semantic Communication over Wireless Networks

    Authors: Ruopeng Xu, Zhaohui Yang, Yijie Mao, Chongwen Huang, Qianqian Yang, Lexi Xu, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, we propose a multi-user green semantic communication system facilitated by a probabilistic knowledge graph (PKG). By integrating probability into the knowledge graph, we enable probabilistic semantic communication (PSC) and represent semantic information accordingly. On this basis, a semantic compression model designed for multi-user downlink task-oriented communication is introduce…

    Submitted 21 August, 2024; originally announced August 2024.

  19. arXiv:2408.11296  [pdf, other]

    cs.SE cs.CL

    RePair: Automated Program Repair with Process-based Feedback

    Authors: Yuze Zhao, Zhenya Huang, Yixiao Ma, Rui Li, Kai Zhang, Hao Jiang, Qi Liu, Linbo Zhu, Yu Su

    Abstract: The gap between the trepidation of program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR). APR is instrumental in transforming vulnerable programs into more robust ones, bolstering program reliability while simultaneously diminishing the financial burden of manual repairs. Commercial-scale language models (LM) have taken APR to unprecedent…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 15 pages, 13 figures

    Journal ref: ACL 2024 Findings

  20. arXiv:2408.11243  [pdf, other]

    cs.LG cs.AI

    Do Neural Scaling Laws Exist on Graph Self-Supervised Learning?

    Authors: Qian Ma, Haitao Mao, Jingzhe Liu, Zhehua Zhang, Chunlin Feng, Yu Song, Yihan Shao, Yao Ma

    Abstract: Self-supervised learning (SSL) is essential to obtain foundation models in NLP and CV domains via effectively leveraging knowledge in large-scale unlabeled data. The reason for its success is that a suitable SSL design can help the model to follow the neural scaling law, i.e., the performance consistently improves with increasing model and dataset sizes. However, it remains a mystery whether exist…

    Submitted 26 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  21. arXiv:2408.10613  [pdf, other]

    cs.IR

    Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

    Authors: Guangyuan Ma, Yongliang Ma, Xing Wu, Zhenpeng Su, Ming Zhou, Songlin Hu

    Abstract: Large Language Model-based Dense Retrieval (LLM-DR) optimizes over numerous heterogeneous fine-tuning collections from different domains. However, the discussion about its training data distribution is still minimal. Previous studies rely on empirically assigned dataset choices or sampling ratios, which inevitably leads to sub-optimal retrieval performances. In this paper, we propose a new task-le…

    Submitted 20 August, 2024; originally announced August 2024.

  22. arXiv:2408.10599  [pdf, other]

    hep-ex cs.CV

    Vision Calorimeter for Anti-neutron Reconstruction: A Baseline

    Authors: Hongtian Yu, Yangu Li, Mingrui Wu, Letian Shen, Yue Liu, Yunxuan Song, Qixiang Ye, Xiaorui Lyu, Yajun Mao, Yangheng Zheng, Yunfan Liu

    Abstract: In high-energy physics, anti-neutrons ($\bar{n}$) are fundamental particles that frequently appear as final-state particles, and the reconstruction of their kinematic properties provides an important probe for understanding the governing principles. However, this confronts significant challenges instrumentally with the electromagnetic calorimeter (EMC), a typical experimental sensor but recovering…

    Submitted 20 August, 2024; originally announced August 2024.

  23. arXiv:2408.10130  [pdf]

    cs.CL cs.AI

    Rhyme-aware Chinese lyric generator based on GPT

    Authors: Yixiao Yuan, Yangchen Huang, Yu Ma, Xinjin Li, Zhenglin Li, Yiming Shi, Huapeng Zhou

    Abstract: Neural language representation models such as GPT, pre-trained on large-scale corpora, can effectively capture rich semantic patterns from plain text and be fine-tuned to consistently improve natural language generation performance. However, existing pre-trained language models used to generate lyrics rarely consider rhyme information, which is crucial in lyrics. Using a pre-trained model directly…

    Submitted 19 August, 2024; originally announced August 2024.

  24. arXiv:2408.09667  [pdf, other]

    cs.CL

    BLADE: Benchmarking Language Model Agents for Data-Driven Science

    Authors: Ken Gu, Ruoxi Shang, Ruien Jiang, Keying Kuang, Richard-John Lin, Donghe Lyu, Yue Mao, Youran Pan, Teng Wu, Jiaqian Yu, Yikun Zhang, Tianmai M. Zhang, Lanyi Zhu, Mike A. Merrill, Jeffrey Heer, Tim Althoff

    Abstract: Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-dri…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

  25. arXiv:2408.09476  [pdf, other]

    cs.CV cs.LG

    Advances in Multiple Instance Learning for Whole Slide Image Analysis: Techniques, Challenges, and Future Directions

    Authors: Jun Wang, Yu Mao, Nan Guan, Chun Jason Xue

    Abstract: Whole slide images (WSIs) are gigapixel-scale digital images of H&E-stained tissue samples widely used in pathology. The substantial size and complexity of WSIs pose unique analytical challenges. Multiple Instance Learning (MIL) has emerged as a powerful approach for addressing these challenges, particularly in cancer classification and detection. This survey provides a comprehensive overview of…

    Submitted 18 August, 2024; originally announced August 2024.

  26. arXiv:2408.09347  [pdf, other]

    cs.CV

    S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis

    Authors: Dongze Li, Kang Zhao, Wei Wang, Yifeng Ma, Bo Peng, Yingya Zhang, Jing Dong

    Abstract: Talking head synthesis is a practical technique with wide applications. Current Neural Radiance Field (NeRF) based approaches have shown their superiority on driving one-shot talking heads with videos or signals regressed from audio. However, most of them failed to take the audio as driven information directly, unable to enjoy the flexibility and availability of speech. Since mapping audio signals…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  27. arXiv:2408.08746  [pdf, other]

    cs.IT eess.SP

    Accelerating Iteratively Linear Detectors in Multi-User (ELAA-)MIMO Systems with UW-SVD

    Authors: Jiuyu Liu, Yi Ma, Jinfei Wang, Rahim Tafazolli

    Abstract: Current iterative multiple-input multiple-output (MIMO) detectors suffer from slow convergence when the wireless channel is ill-conditioned. The ill-conditioning is mainly caused by spatial correlation between channel columns corresponding to the same user equipment, known as intra-user interference. In addition, in the emerging MIMO systems using an extremely large aperture array (ELAA), spatial…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: This work has been accepted by IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

  28. arXiv:2408.08202  [pdf, other]

    cs.CV

    Towards Practical Human Motion Prediction with LiDAR Point Clouds

    Authors: Xiao Han, Yiming Ren, Yichen Yao, Yujing Sun, Yuexin Ma

    Abstract: Human motion prediction is crucial for human-centric multimedia understanding and interacting. Current methods typically rely on ground truth human poses as observed input, which is not practical for real-world scenarios where only raw visual sensor data is available. To implement these methods in practice, a pre-phase of pose estimation is essential. However, such two-stage approaches often lead…

    Submitted 15 August, 2024; originally announced August 2024.

  29. arXiv:2408.08147  [pdf, other]

    cs.DC cs.CL cs.LG

    P/D-Serve: Serving Disaggregated Large Language Model at Scale

    Authors: Yibo Jin, Tao Wang, Huimin Lin, Mingyang Song, Peiyang Li, Yipeng Ma, Yicheng Shan, Zhengfan Yuan, Cailong Li, Yajing Sun, Tiandeng Wu, Xing Chu, Ruizhi Huan, Li Ma, Xiao You, Wenting Zhou, Yunpeng Ye, Wen Liu, Xiangkun Xu, Yongsheng Zhang, Tiantian Dong, Jiawei Zhu, Zhe Wang, Xijian Ju, Jianxun Song, et al. (5 additional authors not shown)

    Abstract: Serving disaggregated large language models (LLMs) over tens of thousands of xPU devices (GPUs or NPUs) with reliable performance faces multiple challenges. 1) Ignoring the diversity (various prefixes and tidal requests), treating all the prompts in a mixed pool is inadequate. To facilitate the similarity per scenario and minimize the inner mismatch on P/D (prefill and decoding) processing, fine-g…

    Submitted 15 August, 2024; originally announced August 2024.

  30. arXiv:2408.07325  [pdf, other]

    eess.IV cs.GR

    RoCoSDF: Row-Column Scanned Neural Signed Distance Fields for Freehand 3D Ultrasound Imaging Shape Reconstruction

    Authors: Hongbo Chen, Yuchong Gao, Shuhang Zhang, Jiangjie Wu, Yuexin Ma, Rui Zheng

    Abstract: The reconstruction of high-quality shape geometry is crucial for developing freehand 3D ultrasound imaging. However, the shape reconstruction of multi-view ultrasound data remains challenging due to the elevation distortion caused by thick transducer probes. In this paper, we present a novel learning-based framework RoCoSDF, which can effectively generate an implicit surface through continuous sha…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by MICCAI 2024

  31. arXiv:2408.07266  [pdf, other]

    cs.CV cs.RO

    Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

    Authors: Ruofeng Wei, Bin Li, Kai Chen, Yiyao Ma, Yunhui Liu, Qi Dou

    Abstract: Scale-aware monocular depth estimation poses a significant challenge in computer-aided endoscopic navigation. However, existing depth estimation methods that do not consider the geometric priors struggle to learn the absolute scale from training with monocular endoscopic sequences. Additionally, conventional methods face difficulties in accurately estimating details on tissue and instruments bound…

    Submitted 13 August, 2024; originally announced August 2024.

  32. arXiv:2408.07196  [pdf, other]

    cs.CV

    SeLoRA: Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis

    Authors: Yuchen Mao, Hongwei Li, Wei Pang, Giorgos Papanastasiou, Guang Yang, Chengjia Wang

    Abstract: The persistent challenge of medical image synthesis posed by the scarcity of annotated data and the need to synthesize 'missing modalities' for multi-modal analysis, underscored the imperative development of effective synthesis methods. Recently, the combination of Low-Rank Adaptation (LoRA) with latent diffusion models (LDMs) has emerged as a viable approach for efficiently adapting pre-trained l…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Project Page: https://yuchen20.github.io/SeLoRA.github.io/

  33. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  34. arXiv:2408.06935  [pdf, other]

    cs.AR

    UFO-MAC: A Unified Framework for Optimization of High-Performance Multipliers and Multiply-Accumulators

    Authors: Dongsheng Zuo, Jiadong Zhu, Chenglin Li, Yuzhe Ma

    Abstract: Multipliers and multiply-accumulators (MACs) are critical arithmetic circuit components in the modern era. As essential components of AI accelerators, they significantly influence the area and performance of compute-intensive circuits. This paper presents UFO-MAC, a unified framework for the optimization of multipliers and MACs. Specifically, UFO-MAC employs an optimal compressor tree structure an…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: In Proceedings of ICCAD 2024

  35. arXiv:2408.06030  [pdf, other]

    cs.RO

    Developing Smart MAVs for Autonomous Inspection in GPS-denied Constructions

    Authors: Paoqiang Pan, Kewei Hu, Xiao Huang, Wei Ying, Xiaoxuan Xie, Yue Ma, Naizhong Zhang, Hanwen Kang

    Abstract: Smart Micro Aerial Vehicles (MAVs) have transformed infrastructure inspection by enabling efficient, high-resolution monitoring at various stages of construction, including hard-to-reach areas. Traditional manual operation of drones in GPS-denied environments, such as industrial facilities and infrastructure, is labour-intensive, tedious and prone to error. This study presents an innovative framew…

    Submitted 12 August, 2024; originally announced August 2024.

  36. arXiv:2408.05966  [pdf, other]

    cs.CV cs.AI cs.GR cs.MM

    Freehand Sketch Generation from Mechanical Components

    Authors: Zhichao Liao, Di Huang, Heming Fang, Yue Ma, Fengyuan Piao, Xinghui Li, Long Zeng, Pingfa Feng

    Abstract: Drawing freehand sketches of mechanical components on multimedia devices for AI-based engineering modeling has become a new trend. However, its development is being impeded because existing works cannot produce suitable sketches for data-driven research. These works either generate sketches lacking a freehand style or utilize generative models not originally designed for this task resulting in poo…

    Submitted 21 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Published at ACM Multimedia (ACM MM) 2024

  37. arXiv:2408.05517  [pdf, other]

    cs.CL

    SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning

    Authors: Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, Yingda Chen

    Abstract: Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leveraged attention-based Transformer architectures and achieved superior performance and generalization capabilities. They have since covered extensive areas of traditional learning tasks. For instance, text-based tasks such as text-classification and sequence-labeling, as well as multi-modal task…

    Submitted 18 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

  38. arXiv:2408.05029  [pdf, other]

    cs.CV

    Collaborative Static-Dynamic Teaching: A Semi-Supervised Framework for Stripe-Like Space Target Detection

    Authors: Zijian Zhu, Ali Zia, Xuesong Li, Bingbing Dan, Yuebo Ma, Hongfeng Long, Kaili Lu, Enhai Liu, Rujin Zhao

    Abstract: Stripe-like space target detection (SSTD) is crucial for space situational awareness. Traditional unsupervised methods often fail in low signal-to-noise ratio and variable stripe-like space targets scenarios, leading to weak generalization. Although fully supervised learning methods improve model generalization, they require extensive pixel-level labels for training. In the SSTD task, manually cre…

    Submitted 9 August, 2024; originally announced August 2024.

  39. arXiv:2408.04813  [pdf, other]

    cs.CV

    Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training

    Authors: Yingfan Ma, Xiaoyuan Luo, Mingzhi Yuan, Xinrong Chen, Manning Wang

    Abstract: The multiple instance learning (MIL) problem is currently solved from either a bag-classification or an instance-classification perspective, both of which ignore important information contained in some instances and result in limited performance. For example, existing methods often face difficulty in learning hard positive instances. In this paper, we formulate MIL as a semi-supervised instance classificat…

    Submitted 8 August, 2024; originally announced August 2024.

  40. arXiv:2408.04713  [pdf, other]

    cs.LG cs.AI

    DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models

    Authors: Zifeng Ding, Yifeng Li, Yuan He, Antonio Norelli, Jingcheng Wu, Volker Tresp, Yunpu Ma, Michael Bronstein

    Abstract: Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2)…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Preprint. Work in progress

  41. arXiv:2408.04539  [pdf, other]

    cs.NE cs.HC

    ParetoTracker: Understanding Population Dynamics in Multi-objective Evolutionary Algorithms through Visual Analytics

    Authors: Zherui Zhang, Fan Yang, Ran Cheng, Yuxin Ma

    Abstract: Multi-objective evolutionary algorithms (MOEAs) have emerged as powerful tools for solving complex optimization problems characterized by multiple, often conflicting, objectives. While advancements have been made in computational efficiency as well as diversity and convergence of solutions, a critical challenge persists: the internal evolutionary mechanisms are opaque to human users. Drawing upon…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE VIS 2024 (will appear in IEEE TVCG)

  42. arXiv:2408.04388  [pdf, other]

    cs.MM cs.AI cs.IR

    MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

    Authors: Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua

    Abstract: We study an emerging and intriguing problem of multimodal temporal event forecasting with large language models. Compared to using text or graph modalities, the investigation of utilizing images for temporal event forecasting has not been fully explored, especially in the era of large language models (LLMs). To bridge this gap, we are particularly interested in two key questions of: 1) why images…

    Submitted 8 August, 2024; originally announced August 2024.

    ACM Class: H.3.3

  43. arXiv:2408.04313  [pdf, other]

    stat.ML cs.LG stat.ME

    Better Locally Private Sparse Estimation Given Multiple Samples Per User

    Authors: Yuheng Ma, Ke Jia, Hanfang Yang

    Abstract: Previous studies yielded discouraging results for item-level locally differentially private linear regression with $s^*$-sparsity assumption, where the minimax rate for $nm$ samples is $\mathcal{O}(s^{*}d / nm\varepsilon^2)$. This can be challenging for high-dimensional data, where the dimension $d$ is extremely large. In this work, we investigate user-level locally differentially private sparse l…

    Submitted 8 August, 2024; originally announced August 2024.

    Journal ref: ICML2024 Proceedings

  44. arXiv:2408.04171  [pdf, other]

    cs.CV

    Rotation center identification based on geometric relationships for rotary motion deblurring

    Authors: Jinhui Qin, Yong Ma, Jun Huang, Fan Fan, You Du

    Abstract: Non-blind rotary motion deblurring (RMD) aims to recover the latent clear image from a rotary motion blurred (RMB) image. The rotation center is a crucial input parameter in non-blind RMD methods. Existing methods directly estimate the rotation center from the RMB image. However they always suffer significant errors, and the performance of RMD is limited. For the assembled imaging systems, the pos…

    Submitted 7 August, 2024; originally announced August 2024.

  45. arXiv:2408.03768  [pdf, other]

    cs.RO

    HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks

    Authors: Jingsong Liang, Yuhong Cao, Yixiao Ma, Hanqi Zhao, Guillaume Sartoretti

    Abstract: In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL) based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation, where the robot must optimize its trajectory adaptively to achieve the task objective through continuous interactions in unknown environments. Specifically, HDPlanner relies on novel hierarchical atten…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Submitted to RA-L

  46. arXiv:2408.03680  [pdf, other]

    cs.SE

    Iterative Knowledge Distillation through Feedback-Driven Learning Cycles

    Authors: Yujia Chen, Yang Ye, Zhongqi Li, Yuchi Ma, Cuiyun Gao

    Abstract: Large code models (LCMs) have remarkably advanced the field of code intelligence. Despite their impressive capabilities, they still face practical employment challenges, such as high costs, limited accessibility of proprietary LCMs, and adaptability issues of ultra-large LCMs. These challenges highlight the critical need for more accessible, lightweight yet effective LCMs. In this paper, we propos…

    Submitted 7 August, 2024; originally announced August 2024.

  47. Hierarchical Neural Constructive Solver for Real-world TSP Scenarios

    Authors: Yong Liang Goh, Zhiguang Cao, Yining Ma, Yanfei Dong, Mohammed Haroon Dupty, Wee Sun Lee

    Abstract: Existing neural constructive solvers for routing problems have predominantly employed transformer architectures, conceptualizing the route construction as a set-to-sequence learning task. However, their efficacy has primarily been demonstrated on entirely random problem instances that inadequately capture real-world scenarios. In this paper, we introduce realistic Traveling Salesman Problem (TSP)…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to KDD 2024

  48. arXiv:2408.03393  [pdf, other]

    eess.IV cs.CV cs.GR

    Biomedical Image Segmentation: A Systematic Literature Review of Deep Learning Based Object Detection Methods

    Authors: Fazli Wahid, Yingliang Ma, Dawar Khan, Muhammad Aamir, Syed U. K. Bukhari

    Abstract: Biomedical image segmentation plays a vital role in diagnosis of diseases across various organs. Deep learning-based object detection methods are commonly used for such segmentation. There exists an extensive research in this topic. However, there is no standard review on this topic. Existing surveys often lack a standardized approach or focus on broader segmentation techniques. In this paper, we…

    Submitted 28 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  49. arXiv:2408.01653  [pdf, other]

    cs.CV

    MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas

    Authors: Feng Qiao, Zhexiao Xiong, Xinge Zhu, Yuexin Ma, Qiumeng He, Nathan Jacobs

    Abstract: We introduce Multi-Cylindrical Panoramic Depth Estimation (MCPDepth), a two-stage framework for omnidirectional depth estimation via stereo matching between multiple cylindrical panoramas. MCPDepth uses cylindrical panoramas for initial stereo matching and then fuses the resulting depth maps across views. A circular attention module is employed to overcome the distortion along the vertical axis. M…

    Submitted 2 August, 2024; originally announced August 2024.

  50. arXiv:2408.01147  [pdf, other]

    cs.RO

    Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning

    Authors: Yueen Ma, Dafeng Chi, Shiguang Wu, Yuecheng Liu, Yuzheng Zhuang, Jianye Hao, Irwin King

    Abstract: Vision-language-action models have gained significant attention for their ability to model trajectories in robot learning. However, most existing models rely on Transformer models with vanilla causal attention, which we find suboptimal for processing segmented multi-modal sequences. Additionally, the autoregressive generation approach falls short in generating multi-dimensional actions. In this pa…

    Submitted 2 August, 2024; originally announced August 2024.