Showing 1–50 of 592 results for author: Peng, Y

Searching in archive cs.

  1. arXiv:2410.13842  [pdf, other]

    cs.CV

    D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

    Authors: Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, Feng Wu

    Abstract: We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). FDR transforms the regression process from predicting fixed coordinates to iteratively ref…

    Submitted 17 October, 2024; originally announced October 2024.

  2. arXiv:2410.12793  [pdf, other]

    cs.CY cs.AI cs.HC

    Environment Scan of Generative AI Infrastructure for Clinical and Translational Science

    Authors: Betina Idnay, Zihan Xu, William G. Adams, Mohammad Adibuzzaman, Nicholas R. Anderson, Neil Bahroos, Douglas S. Bell, Cody Bumgardner, Thomas Campion, Mario Castro, James J. Cimino, I. Glenn Cohen, David Dorr, Peter L Elkin, Jungwei W. Fan, Todd Ferris, David J. Foran, David Hanauer, Mike Hogarth, Kun Huang, Jayashree Kalpathy-Cramer, Manoj Kandpal, Niranjan S. Karnik, Avnish Katoch, Albert M. Lai, et al. (32 additional authors not shown)

    Abstract: This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) in the United States. With t…

    Submitted 27 September, 2024; originally announced October 2024.

  3. arXiv:2410.08634  [pdf, other]

    cs.LG cs.IT

    GAI-Enabled Explainable Personalized Federated Semi-Supervised Learning

    Authors: Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang

    Abstract: Federated learning (FL) is a commonly used distributed algorithm for mobile users (MUs) to train artificial intelligence (AI) models; however, several challenges arise when applying FL to real-world scenarios, such as label scarcity, non-IID data, and unexplainability. As a result, we propose an explainable personalized FL framework, called XPFL. First, we introduce a generative AI (GAI) assisted perso…

    Submitted 11 October, 2024; originally announced October 2024.

  4. arXiv:2410.07654  [pdf, other]

    cs.IR

    Firzen: Firing Strict Cold-Start Items with Frozen Heterogeneous and Homogeneous Graphs for Recommendation

    Authors: Hulingxiao He, Xiangteng He, Yuxin Peng, Zifei Shan, Xin Su

    Abstract: Recommendation models utilizing unique identities (IDs) to represent distinct users and items have dominated the recommender systems literature for over a decade. Since multi-modal content of items (e.g., texts and images) and knowledge graphs (KGs) may reflect the interaction-related users' preferences and items' characteristics, they have been utilized as useful side information to further impro…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Accepted by ICDE 2024. The code is available at https://github.com/PKU-ICST-MIPL/Firzen_ICDE2024

  5. arXiv:2410.07528  [pdf, other]

    cs.CV

    CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting

    Authors: Hulingxiao He, Yaqi Zhang, Jinglin Xu, Yuxin Peng

    Abstract: Plant counting is essential in every stage of agriculture, including seed breeding, germination, cultivation, fertilization, pollination yield estimation, and harvesting. Inspired by the fact that humans count objects in high-resolution images by sequential scanning, we explore the potential of handling plant counting tasks via state space models (SSMs) for generating counting results. In this pap…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted by PRCV 2024

  6. arXiv:2410.05966  [pdf, other]

    cs.LG cs.AI

    FLOPS: Forward Learning with OPtimal Sampling

    Authors: Tao Ren, Zishi Zhang, Jinyang Jiang, Guanghao Li, Zeliang Zhang, Mingqian Feng, Yijie Peng

    Abstract: Given the limitations of backpropagation, perturbation-based gradient computation methods have recently gained focus for learning with only forward passes, also referred to as queries. Conventional forward learning consumes enormous queries on each data point for accurate gradient estimation through Monte Carlo sampling, which hinders the scalability of those algorithms. However, not all data poin…

    Submitted 17 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

  7. SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection

    Authors: Zishuo Wang, Wenhao Zhou, Jinglin Xu, Yuxin Peng

    Abstract: Open-vocabulary detection (OVD) aims to detect novel objects without instance-level annotations to achieve open-world object detection at a lower cost. Existing OVD methods mainly rely on the powerful open-vocabulary image-text alignment capability of Vision-Language Pretrained Models (VLM) such as CLIP. However, CLIP is trained on image-text pairs and lacks the perceptual ability for local region…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 9 pages, 7 figures

    ACM Class: I.2.10

  8. arXiv:2410.02825  [pdf, other]

    cs.CL cs.CR

    Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG

    Authors: Chenhao Fang, Derek Larson, Shitong Zhu, Sophie Zeng, Wendy Summer, Yanqing Peng, Yuriy Hulovatyy, Rajeev Rao, Gabriel Forgues, Arya Pudota, Alex Goncalves, Hervé Robert

    Abstract: This paper presents new methods that have the potential to improve privacy process efficiency with LLM and RAG. To reduce hallucination, we continually pre-train the base LLM with a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach enhances model performance (up to doubling metrics compared to an out-of-the-box LLM)…

    Submitted 11 October, 2024; v1 submitted 30 September, 2024; originally announced October 2024.

  9. arXiv:2410.02450  [pdf, other]

    cs.LG cs.DC cs.IT

    Personalized Federated Learning for Generative AI-Assisted Semantic Communications

    Authors: Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang

    Abstract: Semantic Communication (SC) focuses on transmitting only the semantic information rather than the raw data. This approach offers an efficient solution to the issue of spectrum resource utilization caused by the various intelligent applications on Mobile Users (MUs). Generative Artificial Intelligence (GAI) models have recently exhibited remarkable content generation and signal processing capabilit…

    Submitted 3 October, 2024; originally announced October 2024.

  10. arXiv:2410.01231  [pdf, other]

    cs.DB

    Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search

    Authors: Shuo Yang, Jiadong Xie, Yingfan Liu, Jeffrey Xu Yu, Xiyue Gao, Qianru Wang, Yanguo Peng, Jiangtao Cui

    Abstract: Proximity graphs (PG) have gained increasing popularity as the state-of-the-art (SOTA) solutions to $k$-approximate nearest neighbor ($k$-ANN) search on high-dimensional data, which serves as a fundamental function in various fields, e.g. information retrieval and retrieval-augmented generation (RAG). Although PG-based approaches have the best $k$-ANN search performance, their index construction c…

    Submitted 2 October, 2024; originally announced October 2024.

  11. arXiv:2410.00201  [pdf]

    cs.CV cs.CL

    DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

    Authors: Yi-Hao Peng, Faria Huq, Yue Jiang, Jason Wu, Amanda Xin Yue Li, Jeffrey Bigham, Amy Pavel

    Abstract: Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with ta…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: ECCV 2024

  12. arXiv:2409.19599  [pdf, other]

    cs.CV

    Gradient is All You Need: Gradient-Based Attention Fusion for Infrared Small Target Detection

    Authors: Chen Hu, Yian Huang, Kexuan Li, Luping Zhang, Yiming Zhu, Yufei Peng, Tian Pu, Zhenming Peng

    Abstract: Infrared small target detection (IRSTD) is widely used in civilian and military applications. However, IRSTD encounters several challenges, including the tendency for small and dim targets to be obscured by complex backgrounds. To address this issue, we propose the Gradient Network (GaNet), which aims to extract and preserve edge and gradient information of small targets. GaNet employs the Gradien…

    Submitted 29 September, 2024; originally announced September 2024.

  13. HybridFlow: A Flexible and Efficient RLHF Framework

    Authors: Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs. RLHF complicates the dataflow by expanding each node into a distributed LLM training or generation program, and each edge into a…

    Submitted 2 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

    ACM Class: I.2

  14. arXiv:2409.17484  [pdf, other]

    cs.CY

    Crafting Synthetic Realities: Examining Visual Realism and Misinformation Potential of Photorealistic AI-Generated Images

    Authors: Qiyao Peng, Yingdan Lu, Yilang Peng, Sijia Qian, Xinyi Liu, Cuihua Shen

    Abstract: Advances in generative models have created Artificial Intelligence-Generated Images (AIGIs) nearly indistinguishable from real photographs. Leveraging a large corpus of 30,824 AIGIs collected from Instagram and Twitter, and combining quantitative content analysis with qualitative analysis, this study unpacks AI photorealism of AIGIs from four key dimensions: content, human, aesthetic, and producti…

    Submitted 25 September, 2024; originally announced September 2024.

  15. arXiv:2409.16563  [pdf, other]

    cs.AI

    Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels

    Authors: Yishu Wei, Xindi Wang, Hanley Ong, Yiliang Zhou, Adam Flanders, George Shih, Yifan Peng

    Abstract: Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent them from practical applications. Among these are the constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, such as Llama 3.1-8B, through fine-tuning with datasets using synth…

    Submitted 24 September, 2024; originally announced September 2024.

  16. arXiv:2409.14411  [pdf, other]

    cs.RO

    Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation

    Authors: Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Ning Liu, Ran Cheng, Chaomin Shen, Yaxin Peng, Feifei Feng, Jian Tang

    Abstract: Diffusion Policy is a powerful tool for learning end-to-end visuomotor robot control. It is expected that Diffusion Policy possesses scalability, a key attribute for deep neural networks, typically suggesting that increasing model size would lead to enhanced performance. However, our observations indicate that Diffusion Policy in transformer architecture (\DP) struggles to scale effectiv…

    Submitted 22 September, 2024; originally announced September 2024.

  17. arXiv:2409.13700  [pdf, other]

    cs.IR cs.AI cs.SI

    MAS4POI: a Multi-Agents Collaboration System for Next POI Recommendation

    Authors: Yuqian Wu, Yuhong Peng, Jiapeng Yu, Raymond S. T. Lee

    Abstract: LLM-based Multi-Agent Systems offer potential benefits for managing complex decision-making tasks across various domains, but their application to next Point-of-Interest (POI) recommendation remains underexplored. This paper proposes a novel MAS4POI system designed to enhance next POI recommendations through multi-agent interactions. MAS4POI supports Large Language Models (LLMs) specializing in…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 14 pages, 4 figures

  18. arXiv:2409.13538  [pdf, other]

    cs.CV cs.AI cs.LG

    First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge

    Authors: Yingzhe Peng, Yixiao Yuan, Zitian Ao, Huapeng Zhou, Kangqi Wang, Qipeng Zhu, Xu Yang

    Abstract: In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge. This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content. To address this challenge, we leveraged the powerful QwenVL2 (7B) model and fine-tuned it on the provided…

    Submitted 20 September, 2024; originally announced September 2024.

  19. arXiv:2409.12514  [pdf, other]

    cs.RO cs.CV

    TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

    Authors: Junjie Wen, Yichen Zhu, Jinming Li, Minjie Zhu, Kun Wu, Zhiyuan Xu, Ning Liu, Ran Cheng, Chaomin Shen, Yaxin Peng, Feifei Feng, Jian Tang

    Abstract: Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor control and instruction comprehension through end-to-end learning processes. However, current VLA models face significant challenges: they are slow during inference and require extensive pre-training on large amounts of robotic data, making real-world deployment difficult. In this paper, we introduce a new family of…

    Submitted 27 September, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

    Comments: add more citations

  20. arXiv:2409.12370  [pdf, other]

    eess.AS cs.CL cs.CV cs.SD

    Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

    Authors: Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe

    Abstract: Visual signals can enhance audiovisual speech recognition accuracy by providing additional contextual information. Given the complexity of visual signals, an audiovisual speech recognition model requires robust generalization capabilities across diverse video scenarios, presenting a significant challenge. In this paper, we introduce EVA, leveraging the mixture-of-Experts for audioVisual ASR to per…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 6 pages, 2 figures, accepted by IEEE Spoken Language Technology Workshop 2024

  21. arXiv:2409.09506  [pdf, other]

    cs.SD cs.AI eess.AS

    ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

    Authors: Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe

    Abstract: We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on various tasks and (ii) easy integration with popular deep neural network frameworks such as PyTorch-Lightning, Hugging Face transformers and datasets, a…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: Accepted to SLT 2024

  22. arXiv:2409.09216  [pdf, other]

    eess.IV cs.CV

    Spectral U-Net: Enhancing Medical Image Segmentation via Spectral Decomposition

    Authors: Yaopeng Peng, Milan Sonka, Danny Z. Chen

    Abstract: This paper introduces Spectral U-Net, a novel deep learning network based on spectral decomposition, by exploiting Dual Tree Complex Wavelet Transform (DTCWT) for down-sampling and inverse Dual Tree Complex Wavelet Transform (iDTCWT) for up-sampling. We devise the corresponding Wave-Block and iWave-Block, integrated into the U-Net architecture, aiming at mitigating information loss during down-sam…

    Submitted 13 September, 2024; originally announced September 2024.

  23. arXiv:2409.09188  [pdf, other]

    eess.IV cs.CV

    FiAt-Net: Detecting Fibroatheroma Plaque Cap in 3D Intravascular OCT Images

    Authors: Yaopeng Peng, Zhi Chen, Andreas Wahle, Tomas Kovarnik, Milan Sonka, Danny Z. Chen

    Abstract: The key manifestation of coronary artery disease (CAD) is the development of fibroatheromatous plaque, the cap of which may rupture and subsequently lead to coronary artery blockage and heart attack. As such, quantitative analysis of coronary plaque, its plaque cap, and consequently the cap's likelihood to rupture are of critical importance when assessing the risk of cardiovascular events. This paper re…

    Submitted 13 September, 2024; originally announced September 2024.

  24. arXiv:2409.04056  [pdf, other]

    cs.AI cs.CL cs.IR

    Refining Wikidata Taxonomy using Large Language Models

    Authors: Yiwen Peng, Thomas Bonald, Mehwish Alam

    Abstract: Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: ACM International Conference on Information and Knowledge Management, Oct 2024, Boise, Idaho, United States

  25. arXiv:2409.03121  [pdf, other]

    quant-ph cs.MS math.OC

    QHDOPT: A Software for Nonlinear Optimization with Quantum Hamiltonian Descent

    Authors: Samuel Kushnir, Jiaqi Leng, Yuxiang Peng, Lei Fan, Xiaodi Wu

    Abstract: We develop an open-source, end-to-end software package (named QHDOPT), which can solve nonlinear optimization problems using the quantum Hamiltonian descent (QHD) algorithm. QHDOPT offers an accessible interface and automatically maps tasks to various supported quantum backends (i.e., quantum hardware machines). These features enable users, even those without prior knowledge or experience in quantum compu…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 23 pages, 7 figures. The full repository is available at https://github.com/jiaqileng/QHDOPT

  26. arXiv:2409.01704  [pdf, other]

    cs.CV

    General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

    Authors: Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang

    Abstract: Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's needs due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with…

    Submitted 3 September, 2024; originally announced September 2024.

  27. arXiv:2409.01410  [pdf, other]

    cs.LG stat.CO

    Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

    Authors: Vyacheslav Kungurtsev, Yuanfang Peng, Jianyang Gu, Saeed Vahidian, Anthony Quinn, Fadwa Idlahcen, Yiran Chen

    Abstract: Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather t…

    Submitted 2 September, 2024; originally announced September 2024.

  28. arXiv:2408.16343  [pdf, other]

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  29. ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

    Authors: Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu

    Abstract: Visual grounding aims to localize the object referred to in an image based on a natural language query. Although progress has been made recently, accurately localizing target objects within multiple-instance distractions (multiple objects of the same category as the target) remains a significant challenge. Existing methods demonstrate a significant performance drop when there are multiple distract…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

    ACM Class: I.2

  30. arXiv:2408.16219  [pdf, other]

    cs.CV

    Training-free Video Temporal Grounding using Large-scale Pre-trained Models

    Authors: Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu

    Abstract: Video temporal grounding aims to identify video segments within untrimmed videos that are most relevant to a given natural language query. Existing video temporal localization models rely on specific datasets for training and have high data collection costs, but they exhibit poor generalization capability under cross-dataset and out-of-distribution (OOD) settings. In this paper, we propose a…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  31. arXiv:2408.11744  [pdf]

    cs.AI cs.CV

    JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet

    Authors: Yujia Gu, Haofeng Li, Xinyu Fang, Zihan Peng, Yinan Peng

    Abstract: This study proposes a novel approach to extract stylistic features of Jiehua: the utilization of the Fine-tuned Stable Diffusion Model with ControlNet (FSDMC) to refine depiction techniques from artists' Jiehua. The training data for FSDMC is based on open-source works by Jiehua artists collected from the Internet, which were subsequently manually constructed in the format of (Original Image, Cann…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: accepted by ICCSMT 2024

  32. arXiv:2408.08561  [pdf]

    cs.CV

    A New Chinese Landscape Paintings Generation Model based on Stable Diffusion using DreamBooth

    Authors: Yujia Gu, Xinyu Fang, Xueyuan Deng, Zihan Peng, Yinan Peng

    Abstract: This study mainly introduces a method combining the Stable Diffusion Model (SDM) and Parameter-Efficient Fine-Tuning method for generating Chinese Landscape Paintings. This training process is accelerated by combining LoRA with pre-trained SDM and DreamBooth with pre-trained SDM, respectively. On the Chinese Landscape Paintings Internet dataset used in this paper, this study finds that SDM combine…

    Submitted 22 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: accepted by AHPCAI

  33. arXiv:2408.08407  [pdf, other]

    physics.optics cs.ET

    Photonic KAN: a Kolmogorov-Arnold network inspired efficient photonic neuromorphic architecture

    Authors: Yiwei Peng, Sean Hooten, Xinling Yu, Thomas Van Vaerenbergh, Yuan Yuan, Xian Xiao, Bassem Tossoun, Stanley Cheung, Marco Fiorentino, Raymond Beausoleil

    Abstract: Kolmogorov-Arnold Networks (KAN) models were recently proposed and claimed to provide improved parameter scaling and interpretability compared to conventional multilayer perceptron (MLP) models. Inspired by the KAN architecture, we propose the Photonic KAN -- an integrated all-optical neuromorphic platform leveraging highly parametric optical nonlinear transfer functions along KAN edges. In this w…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures, 1 table

  34. arXiv:2408.06303  [pdf, other]

    cs.CL cs.CV

    Long-Form Answers to Visual Questions from Blind and Low Vision People

    Authors: Mina Huh, Fangyuan Xu, Yi-Hao Peng, Chongyan Chen, Hansika Murugu, Danna Gurari, Eunsol Choi, Amy Pavel

    Abstract: Vision language models can now generate long-form answers to questions about images - long-form visual question answers (LFVQA). We contribute VizWiz-LF, a dataset of long-form answers to visual questions posed by blind and low vision (BLV) users. VizWiz-LF contains 4.2k long-form answers to 600 visual questions, collected from human expert describers and six VQA models. We develop and annotate fu…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  35. arXiv:2408.05543  [pdf, other]

    cs.CV

    PixelFade: Privacy-preserving Person Re-identification with Noise-guided Progressive Replacement

    Authors: Delong Zhang, Yi-Xing Peng, Xiao-Ming Wu, Ancong Wu, Wei-Shi Zheng

    Abstract: Online person re-identification services face privacy breaches from potential data leakage and recovery attacks, exposing cloud-stored images to malicious attackers and triggering public concern. The privacy protection of pedestrian images is crucial. Previous privacy-preserving person re-identification methods are unable to resist recovery attacks and also compromise accuracy. In this paper, we propos…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: accepted by ACMMM24

  36. arXiv:2408.03616  [pdf, other]

    eess.IV cs.CV

    Distillation Learning Guided by Image Reconstruction for One-Shot Medical Image Segmentation

    Authors: Feng Zhou, Yanjie Zhou, Longjie Wang, Yun Peng, David E. Carlson, Liyun Tu

    Abstract: Traditional one-shot medical image segmentation (MIS) methods use registration networks to propagate labels from a reference atlas or rely on comprehensive sampling strategies to generate synthetic labeled data for training. However, these methods often struggle with registration errors and low-quality synthetic images, leading to poor performance and generalization. To overcome this, we introduce…

    Submitted 7 August, 2024; originally announced August 2024.

  37. arXiv:2408.03505  [pdf, other]

    cs.CL cs.AI cs.DC

    Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

    Authors: Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu

    Abstract: Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as image, text and audio, achieving significant performance in various domains, including multimodal translation, visual question answering and content generation. Nonetheless, existing systems are inefficient at training MLLMs due to substantial GPU bubbles caused by the he…

    Submitted 6 August, 2024; originally announced August 2024.

  38. arXiv:2408.02484  [pdf, other]

    cs.CV

    Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection

    Authors: Ting Lei, Shaofeng Yin, Yuxin Peng, Yang Liu

    Abstract: Zero-shot Human-Object Interaction (HOI) detection has emerged as a frontier topic due to its capability to detect HOIs beyond a predefined set of categories. This task entails not only identifying the interactiveness of human-object pairs and localizing them but also recognizing both seen and unseen interaction categories. In this paper, we introduce a novel framework for zero-shot HOI detection…

    Submitted 5 August, 2024; originally announced August 2024.

  39. arXiv:2408.01430  [pdf, other]

    cs.CV cs.AI

    SUSTechGAN: Image Generation for Object Recognition in Adverse Conditions of Autonomous Driving

    Authors: Gongjin Lan, Yang Peng, Qi Hao, Chengzhong Xu

    Abstract: Autonomous driving significantly benefits from data-driven deep neural networks. However, the data in autonomous driving typically follows a long-tailed distribution, in which the critical driving data in adverse conditions is hard to collect. Although generative adversarial networks (GANs) have been applied to augment data for autonomous driving, generating driving images in adverse conditions is…

    Submitted 18 July, 2024; originally announced August 2024.

    Comments: 10 pages, 9 figures

  40. Deep progressive reinforcement learning-based flexible resource scheduling framework for IRS and UAV-assisted MEC system

    Authors: Li Dong, Feibo Jiang, Minjie Wang, Yubo Peng, Xiaolong Li

    Abstract: The intelligent reflection surface (IRS) and unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system is widely used in temporary and emergency scenarios. Our goal is to minimize the energy consumption of the MEC system by jointly optimizing UAV locations, IRS phase shift, task offloading, and resource allocation with a variable number of UAVs. To this end, we propose a Flexible R…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures

    Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2024

  41. arXiv:2408.00588  [pdf, other]

    cs.CL cs.AI

    Closing the gap between open-source and commercial large language models for medical evidence summarization

    Authors: Gongbo Zhang, Qiao Jin, Yiliang Zhou, Song Wang, Betina R. Idnay, Yiming Luo, Elizabeth Park, Jordan G. Nestor, Matthew E. Spotnitz, Ali Soroush, Thomas Campion, Zhiyong Lu, Chunhua Weng, Yifan Peng

    Abstract: Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor dependency. While open-source LLMs allow better transparency and customization, their performance falls short compared to proprietary ones. In this stud…

    Submitted 25 July, 2024; originally announced August 2024.

  42. arXiv:2407.21416  [pdf, other]

    cs.CV cs.RO

    VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning

    Authors: Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong

    Abstract: Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems. It enables the systems to robustly localize themselves in large-scale environments. Existing VPR methods demonstrate attractive performance at the cost of heavy pre-training and limited generalizability. When deployed in unseen environments, these methods exhibit significant performan…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  43. arXiv:2407.20143

    cs.AI

    ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development

    Authors: Borui Wan, Mingji Han, Yiyao Sheng, Yanghua Peng, Haibin Lin, Mofan Zhang, Zhichao Lai, Menghan Yu, Junda Zhang, Zuquan Song, Xin Liu, Chuan Wu

    Abstract: Checkpointing to preserve training states is crucial during the development of Large Foundation Models (LFMs), for training resumption upon various failures or changes in GPU resources and parallelism configurations. In addition, saved checkpoints are dispatched to evaluation tasks or transferred across different training stages (e.g., from pre-training to post-training). All these scenarios requi…

    Submitted 10 October, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  44. arXiv:2407.19728

    cs.HC cs.CY

    PersonalityScanner: Exploring the Validity of Personality Assessment Based on Multimodal Signals in Virtual Reality

    Authors: Xintong Zhang, Di Lu, Huiqi Hu, Nan Jiang, Xianhao Yu, Jinan Xu, Yujia Peng, Qing Li, Wenjuan Han

    Abstract: Human cognition significantly influences expressed behavior and is intrinsically tied to authentic personality traits. Personality assessment plays a pivotal role in various fields, including psychology, education, and social media. However, traditional self-report questionnaires can only provide data based on what individuals are willing and able to disclose, and thus lack objectivity. Moreover,…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to COGSCI 2024

  45. arXiv:2407.17126

    cs.CL cs.AI

    SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

    Authors: Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

    Abstract: Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying…

    Submitted 24 July, 2024; originally announced July 2024.

  46. arXiv:2407.16639

    cs.SD eess.AS

    Distortion Recovery: A Two-Stage Method for Guitar Effect Removal

    Authors: Ying-Shuo Lee, Yueh-Po Peng, Jui-Te Wu, Ming Cheng, Li Su, Yi-Hsuan Yang

    Abstract: Removing audio effects from electric guitar recordings makes post-production and sound editing easier. An audio distortion recovery model not only improves the clarity of the guitar sounds but also opens up new opportunities for creative adjustments in mixing and mastering. While progress has been made in creating such models, previous efforts have largely focused on synthetic distortions…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: DAFx 2024

  47. arXiv:2407.12117

    cs.LG cs.DC

    Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

    Authors: Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

    Abstract: Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing f…

    Submitted 16 July, 2024; originally announced July 2024.

  48. arXiv:2407.09760

    cs.CV cs.AI

    ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report

    Authors: Yixiao Yuan, Yingzhe Peng

    Abstract: The Visual-Dialog Based Emotion Explanation Generation Challenge focuses on generating emotion explanations through visual-dialog interactions in art discussions. Our approach combines state-of-the-art multi-modal models, including a Language Model (LM) and a Large Vision Language Model (LVLM), to achieve superior performance. By leveraging these models, we outperform existing benchmarks, securing the…

    Submitted 12 July, 2024; originally announced July 2024.

  49. arXiv:2407.09059

    cs.CV

    Domain-adaptive Video Deblurring via Test-time Blurring

    Authors: Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

    Abstract: Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adapta…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  50. arXiv:2407.07468

    cs.CV

    Rethinking Few-shot Class-incremental Learning: Learning from Yourself

    Authors: Yu-Ming Tang, Yi-Xing Peng, Jingke Meng, Wei-Shi Zheng

    Abstract: Few-shot class-incremental learning (FSCIL) aims to learn sequential classes with limited samples in a few-shot fashion. Inherited from the classical class-incremental learning setting, the popular benchmark of FSCIL uses averaged accuracy (aAcc) and last-task averaged accuracy (lAcc) as the evaluation metrics. However, we reveal that such evaluation metrics may not provide adequate emphasis on th…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024