Showing 1–50 of 728 results for author: Yang, F

Searching in archive cs.
  1. arXiv:2409.03164  [pdf, other]

    cs.LG cs.GR

    A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

    Authors: Zhen Li, Weikai Yang, Jun Yuan, Jing Wu, Changjian Chen, Yao Ming, Fan Yang, Hui Zhang, Shixia Liu

    Abstract: The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequenc…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 15 pages, 10 figures

  2. arXiv:2409.02006  [pdf, other]

    cs.CV

    Robust Fitting on a Gate Quantum Computer

    Authors: Frances Fengyi Yang, Michele Sasdelli, Tat-Jun Chin

    Abstract: Gate quantum computers generate significant interest due to their potential to solve certain difficult problems such as prime factorization in polynomial time. Computer vision researchers have long been attracted to the power of quantum computers. Robust fitting, which is fundamentally important to many computer vision pipelines, has recently been shown to be amenable to gate quantum computing. Th…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by the European Conference on Computer Vision 2024 (ECCV2024) as Oral. The paper is written for a computer vision audience who generally has minimal quantum physics background

  3. arXiv:2409.01315  [pdf, other]

    physics.comp-ph cs.AI cs.LG

    Multi-frequency Neural Born Iterative Method for Solving 2-D Inverse Scattering Problems

    Authors: Daoqi Liu, Tao Shan, Maokun Li, Fan Yang, Shenheng Xu

    Abstract: In this work, we propose a deep learning-based imaging method for addressing the multi-frequency electromagnetic (EM) inverse scattering problem (ISP). By combining deep learning technology with EM physical laws, we have successfully developed a multi-frequency neural Born iterative method (NeuralBIM), guided by the principles of the single-frequency NeuralBIM. This method integrates multitask lea…

    Submitted 2 September, 2024; originally announced September 2024.

    MSC Class: 35Q61

    ACM Class: I.2.6; G.1.8; G.1.3

  4. arXiv:2409.00926  [pdf, other]

    cs.CV

    Towards Student Actions in Classroom Scenes: New Dataset and Baseline

    Authors: Zhuolin Tan, Chenqiang Gao, Anyong Qin, Ruixin Chen, Tiecheng Song, Feng Yang, Deyu Meng

    Abstract: Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets to capture the nuanced action dynamics in classrooms. In this paper, we present a new multi-label student action video (SAV) dataset for complex classroom scenes. The dataset consists of 4,324 carefully trimmed video clips from 758 different…

    Submitted 1 September, 2024; originally announced September 2024.

  5. arXiv:2408.16517  [pdf, other]

    cs.LG cs.AI

    Adaptive Variational Continual Learning via Task-Heuristic Modelling

    Authors: Fan Yang

    Abstract: Variational continual learning (VCL) is a turn-key learning algorithm that has state-of-the-art performance among the best continual learning models. In our work, we explore an extension of the generalized variational continual learning (GVCL) model, named AutoVCL, which combines task heuristics for informed learning and model optimization. We demonstrate that our model outperforms the standard GV…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 4 pages, 2 figures, 3 tables

  6. arXiv:2408.16131  [pdf, other]

    cs.CL

    Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark

    Authors: Funing Yang, Carolyn Jane Anderson

    Abstract: Several systems have been developed to extract information about characters to aid computational analysis of English literature. We propose character similarity grouping as a holistic evaluation task for these pipelines. We present AustenAlike, a benchmark suite of character similarities in Jane Austen's novels. Our benchmark draws on three notions of character similarity: a structurally defined n…

    Submitted 28 August, 2024; originally announced August 2024.

  7. arXiv:2408.15079  [pdf, other]

    cs.CL cs.AI

    BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

    Authors: Guosheng Dong, Da Pan, Yiding Sun, Shusen Zhang, Zheng Liang, Xin Wu, Yanjun Shen, Fan Yang, Haoze Sun, Tianpeng Li, Mingan Lin, Jianhua Xu, Yufan Zhang, Xiaonan Nie, Lei Su, Bingning Wang, Wentao Zhang, Jiaxin Mao, Zenan Zhou, Weipeng Chen

    Abstract: The general capabilities of Large Language Models (LLM) highly rely on the composition and selection on extensive pretraining datasets, treated as commercial secrets by several institutions. To mitigate this issue, we open-source the details of a universally applicable data processing pipeline and validate its effectiveness and potential by introducing a competitive LLM baseline. Specifically, the…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 19 pages, 6 figures

  8. arXiv:2408.13229  [pdf, other]

    cs.RO

    Multi-finger Manipulation via Trajectory Optimization with Differentiable Rolling and Geometric Constraints

    Authors: Fan Yang, Thomas Power, Sergio Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin, Dmitry Berenson

    Abstract: Parameterizing finger rolling and finger-object contacts in a differentiable manner is important for formulating dexterous manipulation as a trajectory optimization problem. In contrast to previous methods which often assume simplified geometries of the robot and object or do not explicitly model finger rolling, we propose a method to further extend the capabilities of dexterous manipulation by ac…

    Submitted 23 August, 2024; originally announced August 2024.

  9. arXiv:2408.11032  [pdf, other]

    cs.LG cs.CV physics.ao-ph

    Atmospheric Transport Modeling of CO$_2$ with Neural Networks

    Authors: Vitus Benson, Ana Bastos, Christian Reimers, Alexander J. Winkler, Fanny Yang, Markus Reichstein

    Abstract: Accurately describing the distribution of CO$_2$ in the atmosphere with atmospheric tracer transport models is essential for greenhouse gas monitoring and verification support systems to aid implementation of international climate agreements. Large deep neural networks are poised to revolutionize weather prediction, which requires 3D modeling of the atmosphere. While similar in this regard, atmosp…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Code: https://github.com/vitusbenson/carbonbench

  10. arXiv:2408.08972  [pdf, other]

    cs.AI cs.IR cs.LG cs.MA

    ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs

    Authors: Debashis Gupta, Aditi Golder, Luis Fernendez, Miles Silman, Greg Lersen, Fan Yang, Bob Plemmons, Sarra Alqahtani, Paul Victor Pauca

    Abstract: Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly destructive mining practice, leading to environmental disasters across the world's tropical watersheds. The topic of ASGM spans multiple domains of research and information, including natural and social systems, and knowledge is often atomized across a diversity of media and documents. We therefore introduce a knowledge graph (A…

    Submitted 16 August, 2024; originally announced August 2024.

  11. arXiv:2408.08592  [pdf, other]

    cs.RO

    Case Study: Runtime Safety Verification of Neural Network Controlled System

    Authors: Frank Yang, Sinong Simon Zhan, Yixuan Wang, Chao Huang, Qi Zhu

    Abstract: Neural networks are increasingly used in safety-critical applications such as robotics and autonomous vehicles. However, the deployment of neural-network-controlled systems (NNCSs) raises significant safety concerns. Many recent advances overlook critical aspects of verifying control and ensuring safety in real-time scenarios. This paper presents a case study on using POLAR-Express, a state-of-the…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 15 pages, 5 figures, submitted to Runtime Verification 2024

  12. Physically Aware Synthesis Revisited: Guiding Technology Mapping with Primitive Logic Gate Placement

    Authors: Hongyang Pan, Cunqing Lan, Yiting Liu, Zhiang Wang, Li Shang, Xuan Zeng, Fan Yang, Keren Zhu

    Abstract: A typical VLSI design flow is divided into separated front-end logic synthesis and back-end physical design (PD) stages, which often require costly iterations between these stages to achieve design closure. Existing approaches face significant challenges, notably in utilizing feedback from physical metrics to better adapt and refine synthesis operations, and in establishing a unified and comprehen…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures, 2 tables

    Journal ref: 2024 International Conference on Computer-Aided Design, New Jersey, NY, USA, Oct 2024

  13. arXiv:2408.07790  [pdf, other]

    cs.CV

    Cropper: Vision-Language Model for Image Cropping through In-Context Learning

    Authors: Seung Hyun Lee, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang

    Abstract: The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets, which struggle to be adapted to new requirements. Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training. However, effective strategies for vision downstream ta…

    Submitted 14 August, 2024; originally announced August 2024.

  14. arXiv:2408.07422  [pdf, other]

    cs.CV cs.AI

    LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image

    Authors: Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding

    Abstract: Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have necessitated 3D perception algorithms. However, current 3D perception methods, particularly small models, struggle with processing logical reasoning, question-answering, and handling open scenario categories. On the other hand, generative multimodal large language models (MLLMs) excel in general…

    Submitted 14 August, 2024; originally announced August 2024.

  15. arXiv:2408.07259  [pdf, other]

    cs.CV cs.AI

    GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models

    Authors: Lei Kang, Fei Yang, Kai Wang, Mohamed Ali Souibgui, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Fonts are integral to creative endeavors, design processes, and artistic productions. The appropriate selection of a font can significantly enhance artwork and endow advertisements with a higher level of expressivity. Despite the availability of numerous diverse font designs online, traditional retrieval-based methods for font selection are increasingly being supplanted by generation-based approac…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted to ECAI2024

  16. arXiv:2408.06969  [pdf, ps, other]

    cs.NI cs.LG

    IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization

    Authors: Guanchang Li, Wensheng Lin, Lixin Li, Yixuan He, Fucheng Yang, Zhu Han

    Abstract: This paper focuses on an intelligent reflecting surface (IRS)-assisted lossy communication system with correlated Rayleigh fading. We analyze the correlated channel model and derive the outage probability of the system. Then, we design a deep reinforce learning (DRL) method to optimize the phase shift of IRS, in order to maximize the received signal power. Moreover, this paper presents results of…

    Submitted 13 August, 2024; originally announced August 2024.

  17. arXiv:2408.06195  [pdf, other]

    cs.CL

    Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

    Authors: Zhenting Qi, Mingyuan Ma, Jiahang Xu, Li Lyna Zhang, Fan Yang, Mao Yang

    Abstract: This paper introduces rStar, a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments the Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct…

    Submitted 12 August, 2024; originally announced August 2024.

  18. arXiv:2408.06003  [pdf, other]

    cs.AR cs.LG

    LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration

    Authors: Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

    Abstract: As large language model (LLM) inference demands ever-greater resources, there is a rapid growing trend of using low-bit weights to shrink memory usage and boost inference efficiency. However, these low-bit LLMs introduce the need for mixed-precision matrix multiplication (mpGEMM), which is a crucial yet under-explored operation that involves multiplying lower-precision weights with higher-precisio…

    Submitted 12 August, 2024; originally announced August 2024.

  19. arXiv:2408.04539  [pdf, other]

    cs.NE cs.HC

    ParetoTracker: Understanding Population Dynamics in Multi-objective Evolutionary Algorithms through Visual Analytics

    Authors: Zherui Zhang, Fan Yang, Ran Cheng, Yuxin Ma

    Abstract: Multi-objective evolutionary algorithms (MOEAs) have emerged as powerful tools for solving complex optimization problems characterized by multiple, often conflicting, objectives. While advancements have been made in computational efficiency as well as diversity and convergence of solutions, a critical challenge persists: the internal evolutionary mechanisms are opaque to human users. Drawing upon…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE VIS 2024 (will appear in IEEE TVCG)

  20. arXiv:2408.04259  [pdf, other]

    cs.CL cs.AI

    EfficientRAG: Efficient Retriever for Multi-Hop Question Answering

    Authors: Ziyuan Zhuang, Zhiyang Zhang, Sitao Cheng, Fangkai Yang, Jia Liu, Shujian Huang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Retrieval-augmented generation (RAG) methods encounter difficulties when addressing complex questions like multi-hop queries. While iterative retrieval methods improve performance by gathering additional information, current approaches often rely on multiple calls of large language models (LLMs). In this paper, we introduce EfficientRAG, an efficient retriever for multi-hop question answering. Eff…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 20 pages, 4 figures

  21. arXiv:2408.04102  [pdf, other]

    cs.CV cs.AI

    ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling

    Authors: William Y. Zhu, Keren Ye, Junjie Ke, Jiahui Yu, Leonidas Guibas, Peyman Milanfar, Feng Yang

    Abstract: Recognizing and disentangling visual attributes from objects is a foundation to many computer vision applications. While large vision language representations like CLIP had largely resolved the task of zero-shot object recognition, zero-shot visual attribute recognition remains a challenge because CLIP's contrastively-learned vision-language representation cannot effectively capture object-attribu…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted at ECCV 2024

  22. arXiv:2408.01639  [pdf, other]

    eess.SY cs.LG

    Coordinating Planning and Tracking in Layered Control Policies via Actor-Critic Learning

    Authors: Fengjun Yang, Nikolai Matni

    Abstract: We propose a reinforcement learning (RL)-based algorithm to jointly train (1) a trajectory planner and (2) a tracking controller in a layered control architecture. Our algorithm arises naturally from a rewrite of the underlying optimal control problem that lends itself to an actor-critic learning approach. By explicitly learning a dual network to coordinate the interaction between the pla…

    Submitted 2 August, 2024; originally announced August 2024.

  23. arXiv:2408.01122  [pdf, other]

    cs.CL

    CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

    Authors: Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: The adeptness of Large Language Models (LLMs) in comprehending and following natural language instructions is critical for their deployment in sophisticated real-world applications. Existing evaluations mainly focus on fragmented constraints or narrow scenarios, but they overlook the comprehensiveness and authenticity of constraints from the user's perspective. To bridge this gap, we propose CFBen…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 15 pages, 10 figures

  24. arXiv:2408.00966  [pdf, other]

    cs.CL

    Automatic Extraction of Relationships among Motivations, Emotions and Actions from Natural Language Texts

    Authors: Fei Yang

    Abstract: We propose a new graph-based framework to reveal relationships among motivations, emotions and actions explicitly given natural language texts. A directed acyclic graph is designed to describe human's nature. Nurture beliefs are incorporated to connect outside events and the human's nature graph. No annotation resources are required due to the power of large language models. Amazon Fine Foods Revi…

    Submitted 1 August, 2024; originally announced August 2024.

  25. arXiv:2408.00343  [pdf, other]

    cs.RO cs.CV cs.LG

    IN-Sight: Interactive Navigation through Sight

    Authors: Philipp Schoch, Fan Yang, Yuntao Ma, Stefan Leutenegger, Marco Hutter, Quentin Leboutet

    Abstract: Current visual navigation systems often treat the environment as static, lacking the ability to adaptively interact with obstacles. This limitation leads to navigation failure when encountering unavoidable obstructions. In response, we introduce IN-Sight, a novel approach to self-supervised path planning, enabling more effective navigation strategies through interaction with obstacles. Utilizing R…

    Submitted 12 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: The 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

    ACM Class: I.2.10

  26. arXiv:2407.21500  [pdf, other]

    cs.RO

    DIABLO: A 6-DoF Wheeled Bipedal Robot Composed Entirely of Direct-Drive Joints

    Authors: Dingchuan Liu, Fangfang Yang, Xuanhong Liao, Ximin Lyu

    Abstract: Wheeled bipedal robots offer the advantages of both wheeled and legged robots, combining the ability to traverse a wide range of terrains and environments with high efficiency. However, the conventional approach in existing wheeled bipedal robots involves motor-driven joints with high-ratio gearboxes. While this approach provides specific benefits, it also presents several challenges, including in…

    Submitted 1 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: This paper has already been accepted by IROS 2024

  27. arXiv:2407.20105  [pdf, other]

    cs.LG cs.CR

    Strong Copyright Protection for Language Models via Adaptive Model Fusion

    Authors: Javier Abad, Konstantin Donhauser, Francesco Pinto, Fanny Yang

    Abstract: The risk of language models unintentionally reproducing copyrighted material from their training data has led to the development of various protective measures. In this paper, we propose model fusion as an effective solution to safeguard against copyright infringement. In particular, we introduce Copyright-Protecting Fusion (CP-Fuse), an algorithm that adaptively combines language models to minimi…

    Submitted 29 July, 2024; originally announced July 2024.

  28. arXiv:2407.19467  [pdf, other]

    cs.IR cs.LG

    Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights

    Authors: Xiang-Rong Sheng, Feifan Yang, Litong Gong, Biao Wang, Zhangming Chan, Yujing Zhang, Yueyao Cheng, Yong-Nan Zhu, Tiezheng Ge, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance the recommendation accuracy. We start from identifying the key challenges in adopting mul…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  29. arXiv:2407.15792  [pdf, other]

    cs.LG cs.DS stat.ML

    Robust Mixture Learning when Outliers Overwhelm Small Groups

    Authors: Daniil Dmitriev, Rares-Darius Buhai, Stefan Tiegel, Alexander Wolters, Gleb Novikov, Amartya Sanyal, David Steurer, Fanny Yang

    Abstract: We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight, much less is known when outliers may crowd out low-weight clusters - a setting we refer to as list-decodable mixture learning (LD-ML). In this case, adversarial…

    Submitted 22 July, 2024; originally announced July 2024.

  30. arXiv:2407.14402  [pdf, other]

    cs.AI cs.CL cs.DC cs.MA cs.SE

    The Vision of Autonomic Computing: Can LLMs Make It a Reality?

    Authors: Zhiyang Zhang, Fangkai Yang, Xiaoting Qin, Jue Zhang, Qingwei Lin, Gong Cheng, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: The Vision of Autonomic Computing (ACV), proposed over two decades ago, envisions computing systems that self-manage akin to biological organisms, adapting seamlessly to changing environments. Despite decades of research, achieving ACV remains challenging due to the dynamic and complex nature of modern computing systems. Recent advancements in Large Language Models (LLMs) offer promising solutions…

    Submitted 19 July, 2024; originally announced July 2024.

  31. arXiv:2407.14177  [pdf, other]

    cs.CV

    EVLM: An Efficient Vision-Language Model for Visual Understanding

    Authors: Kaibing Chen, Dong Shen, Hanwen Zhong, Huasong Zhong, Kui Xia, Di Xu, Wei Yuan, Yifei Hu, Bin Wen, Tianke Zhang, Changyi Liu, Dewen Fan, Huihui Xiao, Jiahong Wu, Fan Yang, Size Li, Di Zhang

    Abstract: In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the language models alongside textual tokens. However, when dealing with long sequences of visual signals or inputs such as videos, the self-attention mechanism of language models can lead to sig…

    Submitted 19 July, 2024; originally announced July 2024.

  32. arXiv:2407.14020  [pdf, other]

    q-bio.NC cs.LG

    NeuroBind: Towards Unified Multimodal Representations for Neural Signals

    Authors: Fengyu Yang, Chao Feng, Daniel Wang, Tianye Wang, Ziyao Zeng, Zhiyang Xu, Hyoungseob Park, Pengliang Ji, Hanbin Zhao, Yuanning Li, Alex Wong

    Abstract: Understanding neural activity and information representation is crucial for advancing knowledge of brain function and cognition. Neural activity, measured through techniques like electrophysiology and neuroimaging, reflects various aspects of information processing. Recent advances in deep neural networks offer new approaches to analyzing these signals using pre-trained models. However, challenges…

    Submitted 19 July, 2024; originally announced July 2024.

  33. arXiv:2407.13622  [pdf, other]

    cs.LG cs.AI

    Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

    Authors: Ally Yalei Du, Lin F. Yang, Ruosong Wang

    Abstract: The recent work by Dong & Yang (2023) showed for misspecified sparse linear bandits, one can obtain an $O\left(ε\right)$-optimal policy using a polynomial number of samples when the sparsity is a constant, where $ε$ is the misspecification error. This result is in sharp contrast to misspecified linear bandits without sparsity, which require an exponential number of samples to get the same guarante…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 21 pages

  34. arXiv:2407.13133  [pdf, other]

    cs.CV

    FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

    Authors: Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Zicheng Jiao, Hong Cheng

    Abstract: Detecting objects seamlessly blended into their surroundings represents a complex task for both human cognitive capabilities and advanced artificial intelligence algorithms. Currently, the majority of methodologies for detecting camouflaged objects mainly focus on utilizing discriminative models with various unique designs. However, it has been observed that generative models, such as Stable Diffu…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 18 pages, 7 figures

  35. arXiv:2407.12117  [pdf, other]

    cs.LG cs.DC

    Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

    Authors: Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

    Abstract: Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing f…

    Submitted 16 July, 2024; originally announced July 2024.

  36. arXiv:2407.12002  [pdf, other]

    cs.MM cs.CV

    A Multimodal Transformer for Live Streaming Highlight Prediction

    Authors: Jiaxin Deng, Shiyao Wang, Dong Shen, Liqin Zhao, Fan Yang, Guorui Zhou, Gaofeng Meng

    Abstract: Recently, live streaming platforms have gained immense popularity. Traditional video highlight detection mainly focuses on visual features and utilizes both past and future content for prediction. However, live streaming requires models to infer without future frames and process complex multimodal interactions, including images, audio and text comments. To address these issues, we propose a multim…

    Submitted 15 June, 2024; originally announced July 2024.

    Comments: Accepted at ICME 2024 as poster presentation. arXiv admin note: text overlap with arXiv:2306.14392

  37. arXiv:2407.11282  [pdf, other]

    cs.CL

    Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

    Authors: Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang

    Abstract: Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates…

    Submitted 19 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  38. arXiv:2407.10897  [pdf, other]

    physics.optics cs.CV cs.LG

    Optical Diffusion Models for Image Generation

    Authors: Ilker Oguz, Niyazi Ulas Dinc, Mustafa Yildirim, Junjie Ke, Innfarn Yoo, Qifei Wang, Feng Yang, Christophe Moser, Demetri Psaltis

    Abstract: Diffusion models generate new samples by progressively decreasing the noise from the initially provided random distribution. This inference procedure generally utilizes a trained neural network numerous times to obtain the final output, creating significant latency and energy consumption on digital electronic hardware such as GPUs. In this study, we demonstrate that the propagation of a light beam…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 14 pages, 6 figures

  39. arXiv:2407.06027  [pdf, other]

    cs.CL

    PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

    Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficul…

    Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  40. arXiv:2407.05986  [pdf, other]

    cs.CV cs.LG

    KidSat: satellite imagery to map childhood poverty dataset and benchmark

    Authors: Makkunda Sharma, Fan Yang, Duy-Nhat Vo, Esra Suel, Swapnil Mishra, Samir Bhatt, Oliver Fiala, William Rudgard, Seth Flaxman

    Abstract: Satellite imagery has emerged as an important tool to analyse demographic, health, and development indicators. While various deep learning models have been built for these tasks, each is specific to a particular problem, with few standard benchmarks available. We propose a new dataset pairing satellite imagery and high-quality survey data on child poverty to benchmark satellite feature representat…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 15 pages, 1 figure

  41. arXiv:2407.02763  [pdf, other]

    cs.CV

    ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers

    Authors: Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li

    Abstract: Vision Transformers (ViTs) have exhibited exceptional performance across diverse computer vision tasks, while their substantial parameter size incurs significantly increased memory and computational demands, impeding effective inference on resource-constrained devices. Quantization has emerged as a promising solution to mitigate these challenges, yet existing methods still suffer from significant…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 28 pages, 9 figures

  42. arXiv:2407.02081  [pdf, other]

    cs.DC

    On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers

    Authors: Zhengxian Lu, Fangyu Wang, Zhiwei Xu, Fei Yang, Tao Li

    Abstract: Transformer models have emerged as potent solutions to a wide array of multidisciplinary challenges. The deployment of Transformer architectures is significantly hindered by their extensive computational and memory requirements, necessitating the reliance on advanced efficient distributed training methodologies. Prior research has delved into the performance bottlenecks associated with distributed…

    Submitted 2 July, 2024; originally announced July 2024.

  43. arXiv:2407.00614  [pdf, other]

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  44. arXiv:2407.00574  [pdf, other]

    cs.CV

    OfCaM: Global Human Mesh Recovery via Optimization-free Camera Motion Scale Calibration

    Authors: Fengyuan Yang, Kerui Gu, Ha Linh Nguyen, Angela Yao

    Abstract: Accurate camera motion estimation is critical to estimate human motion in the global space. A standard and widely used method for estimating camera motion is Simultaneous Localization and Mapping (SLAM). However, SLAM only provides a trajectory up to an unknown scale factor. Different from previous attempts that optimize the scale factor, this paper presents Optimization-free Camera Motion Scale C…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 12 pages, 7 figures, 4 tables

  45. arXiv:2406.19905  [pdf, other]

    cs.CV

    Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

    Authors: Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

    Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually em…

    Submitted 5 August, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

  46. Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

    Authors: Quanmin Liang, Zhilin Huang, Xiawu Zheng, Feidiao Yang, Jun Peng, Kai Huang, Yonghong Tian

    Abstract: Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detail loss and inefficiency. To address these issues, we propose an efficient Recursive Multi-Branch Information Fusion Network (RMFNet) that separates po…

    Submitted 28 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  47. arXiv:2406.19251  [pdf, other]

    cs.CL cs.AI

    AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

    Authors: Jia Fu, Xiaoting Qin, Fangkai Yang, Lu Wang, Jue Zhang, Qingwei Lin, Yubo Chen, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Recent advancements in Large Language Models have transformed ML/AI development, necessitating a reevaluation of AutoML principles for the Retrieval-Augmented Generation (RAG) systems. To address the challenges of hyper-parameter optimization and online adaptation in RAG, we propose the AutoRAG-HP framework, which formulates the hyper-parameter tuning as an online multi-armed bandit (MAB) problem…

    Submitted 27 June, 2024; originally announced June 2024.

  48. arXiv:2406.18529  [pdf, ps, other]

    cs.LG

    Confident Natural Policy Gradient for Local Planning in $q_π$-realizable Constrained MDPs

    Authors: Tian Tian, Lin F. Yang, Csaba Szepesvári

    Abstract: The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how to learn efficiently in a CMDP environment with a potentially infinite number of states remains under investigation, particularly when function approximation is…

    Submitted 26 June, 2024; originally announced June 2024.

  49. arXiv:2406.18072  [pdf, ps, other]

    stat.ML cs.LG

    Learning for Bandits under Action Erasures

    Authors: Osama Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

    Abstract: We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether th…

    Submitted 26 June, 2024; originally announced June 2024.

  50. arXiv:2406.16087  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS…

    Submitted 6 August, 2024; v1 submitted 23 June, 2024; originally announced June 2024.