Showing 1–50 of 408 results for author: Gao, W

Searching in archive cs.
  1. arXiv:2409.03439  [pdf, other]

    cs.RO cs.AI cs.PL

    KiloBot: A Programming Language for Deploying Perception-Guided Industrial Manipulators at Scale

    Authors: Wei Gao, Jingqiang Wang, Xinv Zhu, Jun Zhong, Yue Shen, Youshuang Ding

    Abstract: We would like industrial robots to handle unstructured environments with cameras and perception pipelines. In contrast to traditional industrial robots that replay offline-crafted trajectories, online behavior planning is required for these perception-guided industrial applications. Aside from perception and planning algorithms, deploying perception-guided manipulators also requires substantial ef…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  3. arXiv:2408.12158  [pdf, other]

    cs.CE cs.CY

    Could Bibliometrics Reveal Top Science and Technology Achievements and Researchers? The Case for Evaluatology-based Science and Technology Evaluation

    Authors: Guoxin Kang, Wanling Gao, Lei Wang, Chunjie Luo, Hainan Ye, Qian He, Shaopeng Dai, Jianfeng Zhan

    Abstract: By utilizing statistical methods to analyze bibliographic data, bibliometrics faces inherent limitations in identifying the most significant science and technology achievements and researchers. To overcome this challenge, we present an evaluatology-based science and technology evaluation methodology. At the heart of this approach lies the concept of an extended evaluation condition, encompassing e…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, and 2 tables

  4. arXiv:2408.12077  [pdf, other]

    eess.SP cs.CV cs.LG

    Through-the-Wall Radar Human Activity Micro-Doppler Signature Representation Method Based on Joint Boulic-Sinusoidal Pendulum Model

    Authors: Xiaopeng Yang, Weicheng Gao, Xiaodong Qu, Zeyu Ma, Hao Zhang

    Abstract: With the help of micro-Doppler signature, ultra-wideband (UWB) through-the-wall radar (TWR) enables the reconstruction of range and velocity information of limb nodes to accurately identify indoor human activities. However, existing methods are usually trained and validated directly using range-time maps (RTM) and Doppler-time maps (DTM), which have high feature redundancy and poor generalization…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 17 pages, 14 figures, 7 tables, in IEEE Transactions on Microwave Theory and Techniques, 2024

    MSC Class: 94; ACM Class: I.5.1

  5. arXiv:2408.12063  [pdf, other]

    stat.ML cs.AI cs.LG physics.ao-ph

    A Deconfounding Approach to Climate Model Bias Correction

    Authors: Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen

    Abstract: Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglec…

    Submitted 21 August, 2024; originally announced August 2024.

  6. arXiv:2408.11492  [pdf, other]

    cs.AI

    Estimating Peer Direct and Indirect Effects in Observational Network Data

    Authors: Xiaojing Du, Jiuyong Li, Debo Cheng, Lin Liu, Wentao Gao, Xiongren Chen

    Abstract: Estimating causal effects is crucial for decision-makers in many applications, but it is particularly challenging with observational network data due to peer interactions. Many algorithms have been proposed to estimate causal effects involving network data, particularly peer effects, but they often overlook the variety of peer effects. To address this issue, we propose a general setting which cons…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: AAAI

  7. arXiv:2408.11481  [pdf, other]

    cs.CV

    E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment

    Authors: Shangkun Sun, Xiaoyu Liang, Songlin Fan, Wenxu Gao, Wei Gao

    Abstract: Text-driven video editing has recently experienced rapid development. Despite this, evaluating edited videos remains a considerable challenge. Current metrics tend to fail to align with human perceptions, and effective quantitative metrics for video editing are still notably absent. To address this, we introduce E-Bench, a benchmark suite tailored to the assessment of text-driven video editing. Th…

    Submitted 21 August, 2024; originally announced August 2024.

  8. arXiv:2408.09676  [pdf, other]

    cs.CV

    Image-based Freeform Handwriting Authentication with Energy-oriented Self-Supervised Learning

    Authors: Jingyao Wang, Luntian Mou, Changwen Zheng, Wen Gao

    Abstract: Freeform handwriting authentication verifies a person's identity from their writing style and habits in messy handwriting data. This technique has gained widespread attention in recent years as a valuable tool for various fields, e.g., fraud prevention and cultural heritage protection. However, it still remains a challenging task in reality due to three reasons: (i) severe damage, (ii) complex hig…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by TMM

  9. arXiv:2408.08682  [pdf, other]

    cs.AI cs.CL cs.CV

    LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression

    Authors: Yuqi Ye, Wei Gao

    Abstract: The key to effective point cloud compression is to obtain a robust context model consistent with complex 3D data structures. Recently, the advancement of large language models (LLMs) has highlighted their capabilities not only as powerful generators for in-context learning and generation but also as effective compressors. These dual attributes of LLMs make them particularly well-suited to meet the…

    Submitted 16 August, 2024; originally announced August 2024.

  10. arXiv:2408.08152  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

    Authors: Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-…

    Submitted 15 August, 2024; originally announced August 2024.

  11. arXiv:2408.00275  [pdf, other]

    cs.RO

    A Reinforcement Learning Based Motion Planner for Quadrotor Autonomous Flight in Dense Environment

    Authors: Zhaohong Liu, Wenxuan Gao, Yinshuai Sun, Peng Dong

    Abstract: Quadrotor motion planning is critical for autonomous flight in complex environments, such as rescue operations. Traditional methods often employ trajectory generation optimization and passive time allocation strategies, which can limit the exploitation of the quadrotor's dynamic capabilities and introduce delays and inaccuracies. To address these challenges, we propose a novel motion planning fram…

    Submitted 5 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

  12. arXiv:2407.20573  [pdf, other]

    cs.DC

    Federated Learning as a Service for Hierarchical Edge Networks with Heterogeneous Models

    Authors: Wentao Gao, Omid Tavallaie, Shuaijun Chen, Albert Zomaya

    Abstract: Federated learning (FL) is a distributed Machine Learning (ML) framework that is capable of training a new global model by aggregating clients' locally trained models without sharing users' original data. Federated learning as a service (FLaaS) offers a privacy-preserving approach for training machine learning models on devices with various computational resources. Most proposed FL-based methods t…

    Submitted 30 July, 2024; originally announced July 2024.

  13. arXiv:2407.19633  [pdf, other]

    cs.AI

    OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale

    Authors: Ali AhmadiTeshnizi, Wenzhi Gao, Herman Brunborg, Shayan Talaei, Madeleine Udell

    Abstract: Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce a Large Language Model (LLM)-based…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This paper documents OptiMUS-0.3, improving on OptiMUS-0.1 (arXiv:2310.06116) and OptiMUS-0.2 (arXiv:2402.10172). arXiv admin note: text overlap with arXiv:2402.10172

  14. arXiv:2407.17078  [pdf, other]

    cs.RO

    Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments

    Authors: Wei Gao, Zezhou Sun, Mingle Zhao, Cheng-Zhong Xu, Hui Kong

    Abstract: The autonomous mapping of large-scale urban scenes presents significant challenges for autonomous robots. To mitigate the challenges, global planning, such as utilizing prior GPS trajectories from OpenStreetMap (OSM), is often used to guide the autonomous navigation of robots for mapping. However, due to factors like complex terrain, unexpected body movement, and sensor noise, the uncertainty of t…

    Submitted 24 July, 2024; originally announced July 2024.

  15. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  16. arXiv:2407.15138  [pdf, other]

    cs.CV

    D$^4$M: Dataset Distillation via Disentangled Diffusion Model

    Authors: Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, Bowen Tang

    Abstract: Dataset distillation offers a lightweight synthetic dataset for fast network training with promising test accuracy. To imitate the performance of the original dataset, most approaches employ bi-level optimization and the distillation space relies on the matching architecture. Nevertheless, these approaches either suffer significant computational costs on large-scale datasets or experience performa…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted to CVPR 2024

  17. arXiv:2407.14774  [pdf, other]

    cs.CV cs.AI cs.GR

    Intelligent Artistic Typography: A Comprehensive Review of Artistic Text Design and Generation

    Authors: Yuhang Bai, Zichuan Huang, Wenshuo Gao, Shuai Yang, Jiaying Liu

    Abstract: Artistic text generation aims to amplify the aesthetic qualities of text while maintaining readability. It can make the text more attractive and better convey its expression, thus enjoying a wide range of application scenarios such as social media display, consumer electronics, fashion, and graphic design. Artistic text generation includes artistic text stylization and semantic typography. Artisti…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: GitHub Page: https://github.com/williamyang1991/Awesome-Artistic-Typography/

  18. arXiv:2407.10975  [pdf]

    cs.OH cs.AI cs.CL

    Stream State-tying for Sign Language Recognition

    Authors: Jiyong Ma, Wen Gao, Chunli Wang

    Abstract: In this paper, a novel approach to sign language recognition based on state tying in each of the data streams is presented. In this framework, it is assumed that the hand gesture signal is represented in terms of six synchronous data streams, i.e., the left/right hand position, left/right hand orientation and left/right handshape. This approach offers a very accurate representation of the sign space and k…

    Submitted 21 April, 2024; originally announced July 2024.

  19. arXiv:2407.10157  [pdf, other]

    eess.IV cs.CV

    SACNet: A Spatially Adaptive Convolution Network for 2D Multi-organ Medical Segmentation

    Authors: Lin Zhang, Wenbo Gao, Jie Yi, Yunyun Yang

    Abstract: Multi-organ segmentation in medical image analysis is crucial for diagnosis and treatment planning. However, many factors complicate the task, including variability in different target categories and interference from complex backgrounds. In this paper, we utilize the knowledge of Deformable Convolution V3 (DCNv3) and multi-object segmentation to optimize our Spatially Adaptive Convolution Network…

    Submitted 14 July, 2024; originally announced July 2024.

  20. arXiv:2407.08744  [pdf, ps, other]

    cs.NE cs.AI cs.LG

    Toward Efficient Deep Spiking Neuron Networks:A Survey On Compression

    Authors: Hui Xie, Ge Yang, Wenjuan Gao

    Abstract: With the rapid development of deep learning, Deep Spiking Neural Networks (DSNNs) have emerged as promising due to their unique spike event processing and asynchronous computation. When deployed on neuromorphic chips, DSNNs offer significant power advantages over Deep Artificial Neural Networks (DANNs) and eliminate time and energy consuming multiplications due to the binary nature of spikes (0 or…

    Submitted 3 June, 2024; originally announced July 2024.

  21. arXiv:2407.08554  [pdf, other]

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl…

    Submitted 28 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 24 pages

  22. arXiv:2407.07723  [pdf, other]

    cs.IT cs.AI

    Understanding is Compression

    Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

    Abstract: Modern data compression methods are slowly reaching their limits after 80 years of research, millions of papers, and a wide range of applications. Yet, the extravagant 6G communication speed requirement raises a major open question for revolutionary new ideas of data compression. We have previously shown all understanding or learning are compression, under reasonable assumptions. Large language mo…

    Submitted 20 August, 2024; v1 submitted 23 June, 2024; originally announced July 2024.

  23. arXiv:2407.06886  [pdf, other]

    cs.CV cs.AI cs.LG cs.MA cs.RO

    Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

    Authors: Yang Liu, Weixing Chen, Yongjie Bai, Xiaodan Liang, Guanbin Li, Wen Gao, Liang Lin

    Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilit…

    Submitted 25 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: The first comprehensive review of Embodied AI in the era of MLMs, 39 pages. We also provide the paper list for Embodied AI: https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List

  24. arXiv:2407.06334  [pdf, other]

    cs.AI q-bio.QM

    Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search

    Authors: Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor W. Coley

    Abstract: Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages main, 4 figures

  25. arXiv:2407.05458  [pdf, other]

    cs.AI

    A Survey of Models for Cognitive Diagnosis: New Developments and Future Directions

    Authors: Fei Wang, Weibo Gao, Qi Liu, Jiatong Li, Guanhao Zhao, Zheng Zhang, Zhenya Huang, Mengxiao Zhu, Shijin Wang, Wei Tong, Enhong Chen

    Abstract: Cognitive diagnosis has been developed for decades as an effective measurement tool to evaluate human cognitive status such as ability level and knowledge mastery. It has been applied to a wide range of fields including education, sport, psychological diagnosis, etc. By providing better awareness of cognitive status, it can serve as the basis for personalized services such as well-designed medical…

    Submitted 7 July, 2024; originally announced July 2024.

  26. arXiv:2407.03978  [pdf, other]

    cs.CL cs.AI

    Benchmarking Complex Instruction-Following with Multiple Constraints Composition

    Authors: Bosi Wen, Pei Ke, Xiaotao Gu, Lindong Wu, Hao Huang, Jinfeng Zhou, Wenchuang Li, Binxin Hu, Wendy Gao, Jiaxin Xu, Yiming Liu, Jie Tang, Hongning Wang, Minlie Huang

    Abstract: Instruction following is one of the fundamental capabilities of large language models (LLMs). As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on m…

    Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: 20 pages, 7 figures

  27. arXiv:2407.03122  [pdf, other]

    cs.RO

    IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale

    Authors: Wei Gao, Bo Ai, Joel Loo, Vinay, David Hsu

    Abstract: This work explores the challenges of creating a scalable and robust robot navigation system that can traverse both indoor and outdoor environments to reach distant goals. We propose a navigation system architecture called IntentionNet that employs a monolithic neural network as the low-level planner/controller, and uses a general interface that we call intentions to steer the controller. The paper…

    Submitted 3 July, 2024; originally announced July 2024.

  28. arXiv:2407.00905  [pdf, other]

    cs.CV

    Learning Robust 3D Representation from CLIP via Dual Denoising

    Authors: Shuqing Luo, Bowen Qu, Wei Gao

    Abstract: In this paper, we explore a critical yet under-investigated issue: how to learn robust and well-generalized 3D representation from pre-trained vision language models such as CLIP. Previous works have demonstrated that cross-modal distillation can provide rich and useful knowledge for 3D data. However, like most deep learning models, the resultant 3D learning network is still vulnerable to adversar…

    Submitted 30 June, 2024; originally announced July 2024.

  29. arXiv:2406.16976  [pdf, other]

    cs.NE cs.AI cs.LG physics.chem-ph

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

    Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations…

    Submitted 2 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  30. arXiv:2406.15132  [pdf, other]

    cs.LG cs.AI

    Younger: The First Dataset for Artificial Intelligence-Generated Neural Network Architecture

    Authors: Zhengxin Yang, Wanling Gao, Luzhou Peng, Yunyou Huang, Fei Tang, Jianfeng Zhan

    Abstract: Designing and optimizing neural network architectures typically requires extensive expertise, starting with handcrafted designs and then manual or automated refinement. This dependency presents a significant barrier to rapid innovation. Recognizing the complexity of automatically generating neural network architecture from scratch, we introduce Younger, a pioneering dataset to advance this ambitio…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 31 pages, 29 figures, 11 tables

  31. arXiv:2406.14194  [pdf, other]

    cs.CV cs.AI

    VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

    Authors: Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

    Abstract: The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by the outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and nar…

    Submitted 20 June, 2024; originally announced June 2024.

  32. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.09136  [pdf, other]

    cs.CL cs.LG

    Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

    Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin

    Abstract: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT dec…

    Submitted 13 June, 2024; originally announced June 2024.

  34. arXiv:2406.07362  [pdf, other]

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f…

    Submitted 28 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages

  35. arXiv:2406.06562  [pdf, other]

    cs.CL cs.AI

    Achieving Sparse Activation in Small Language Models

    Authors: Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao

    Abstract: Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 15 pages

  36. arXiv:2406.04628  [pdf, other]

    cs.CE q-bio.QM

    Projecting Molecules into Synthesizable Chemical Spaces

    Authors: Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, Jianzhu Ma

    Abstract: Discovering new drug molecules is a pivotal yet challenging process due to the near-infinitely large chemical space and notorious demands on time and resources. Numerous generative models have recently been introduced to accelerate the drug discovery process, but their progression to experimental validation remains limited, largely due to a lack of consideration for synthetic accessibility in prac…

    Submitted 7 June, 2024; originally announced June 2024.

  37. arXiv:2406.02143  [pdf, other]

    cs.CL

    Reinforcement Tuning for Detecting Stances and Debunking Rumors Jointly with Large Language Models

    Authors: Ruichao Yang, Wei Gao, Jing Ma, Hongzhan Lin, Bo Wang

    Abstract: Learning multi-task models for jointly detecting stance and verifying rumors poses challenges due to the need for training data of stance at post level and rumor veracity at claim level, which are difficult to obtain. To address this issue, we leverage large language models (LLMs) as the foundation annotators for the joint stance detection (SD) and rumor verification (RV) tasks, dubbed as JSDRV. W…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (Findings)

  38. arXiv:2405.21074  [pdf, other]

    cs.CV

    Latent Intrinsics Emerge from Training to Relight

    Authors: Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David. A. Forsyth, Anand Bhattad

    Abstract: Image relighting is the task of showing what a scene from a source image would look like if illuminated differently. Inverse graphics schemes recover an explicit representation of geometry and a set of chosen intrinsics, then relight with some form of renderer. However, error control for inverse graphics is difficult, and inverse graphics methods can represent only the effects of the chosen intrins…

    Submitted 31 May, 2024; originally announced May 2024.

  39. arXiv:2405.17472  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing

    Authors: Kai Huang, Wei Gao

    Abstract: Text-to-image diffusion models can be fine-tuned in custom domains to adapt to specific user preferences, but such unconstrained adaptability has also been utilized for illegal purposes, such as forging public figures' portraits and duplicating copyrighted artworks. Most existing work focuses on detecting the illegally generated contents, but cannot prevent or mitigate illegal adaptations of diffu…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 18 pages

  40. arXiv:2405.12491  [pdf, other]

    cs.SE

    Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices

    Authors: Xu Wen, Wanling Gao, Lei Wang, Jianfeng Zhan

    Abstract: The rapid development of domain-specific frameworks has presented us with a significant challenge: The current approach of implementing solutions on a case-by-case basis incurs a theoretical complexity of O(M*N), thereby increasing the cost of porting applications to different hardware platforms. To address these challenges, we propose a systematic methodology that effectively bridges the gap betw…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures

  41. arXiv:2405.09054  [pdf, other]

    cs.CV

    Dim Small Target Detection and Tracking: A Novel Method Based on Temporal Energy Selective Scaling and Trajectory Association

    Authors: Weihua Gao, Wenlong Niu, Wenlong Lu, Pengcheng Wang, Zhaoyuan Qi, Xiaodong Peng, Zhen Yang

    Abstract: The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features an…

    Submitted 14 May, 2024; originally announced May 2024.

  42. arXiv:2405.08403  [pdf, other]

    cs.LG

    TFWT: Tabular Feature Weighting with Transformer

    Authors: Xinhao Zhang, Zaitian Wang, Lu Jiang, Wanfu Gao, Pengfei Wang, Kunpeng Liu

    Abstract: In this paper, we propose a novel feature weighting method to address the limitation of existing feature processing methods for tabular data. Typically the existing methods assume equal importance across all samples and features in one dataset. These simplified processing methods overlook the unique contributions of each feature, and thus may miss important feature information. As a result, it lead…

    Submitted 17 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  43. arXiv:2405.07447  [pdf]

    cs.HC

    From traces to measures: Large language models as a tool for psychological measurement from text

    Authors: Joseph J. P. Simons, Wong Liang Ze, Prasanta Bhattacharya, Brandon Siyuan Loh, Wei Gao

    Abstract: Digital trace data provide potentially valuable resources for understanding human behaviour, but their value has been limited by issues of unclear measurement. The growth of large language models provides an opportunity to address this limitation in the case of text data. Specifically, recognizing cases where their responses are a form of psychological measurement (the use of observable indicators…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 12 pages, 2 figures, 1 table

  44. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  45. GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

    Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

    Abstract: Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated computation during inference. In this paper, we p…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TCSVT

  46. arXiv:2404.17283  [pdf, other]

    cs.CL

    Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

    Authors: Xuan Zhang, Wei Gao

    Abstract: Retrieval-augmented language models have exhibited promising performance across various areas of natural language processing (NLP), including fact-critical tasks. However, due to the black-box nature of advanced large language models (LLMs) and the non-retrieval-oriented supervision signal of specific tasks, the training of the retrieval model faces significant challenges under the setting of black-bo…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by COLING 2024

  47. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.15892  [pdf, other]

    cs.CG

    Filling holes in LoD2 building models

    Authors: Weixiao Gao, Ravi Peters, Hugo Ledoux, Jantien Stoter

    Abstract: This paper presents a new algorithm for filling holes in Level of Detail 2 (LoD2) building mesh models, addressing the challenges posed by geometric inaccuracies and topological errors. Unlike traditional methods that often alter the original geometric structure or impose stringent input requirements, our approach preserves the integrity of the original model while effectively managing a range of…

    Submitted 24 April, 2024; originally announced April 2024.

  49. arXiv:2404.15644  [pdf, other]

    cs.CV

    Building-PCC: Building Point Cloud Completion Benchmarks

    Authors: Weixiao Gao, Ravi Peters, Jantien Stoter

    Abstract: With the rapid advancement of 3D sensing technologies, obtaining 3D shape information of objects has become increasingly convenient. Lidar technology, with its capability to accurately capture the 3D information of objects at long distances, has been widely applied in the collection of 3D data in urban scenes. However, the collected point cloud data often exhibit incompleteness due to factors such…

    Submitted 24 April, 2024; originally announced April 2024.

  50. arXiv:2404.15297  [pdf, ps, other]

    eess.SP cs.IT cs.LG

    Multi-stream Transmission for Directional Modulation Network via Distributed Multi-UAV-aided Multi-active-IRS

    Authors: Ke Yang, Rongen Dong, Wei Gao, Feng Shu, Weiping Shi, Yan Wang, Xuehui Wang, Jiangzhou Wang

    Abstract: Active intelligent reflecting surface (IRS) is a revolutionary technique for future 6G networks. The conventional far-field single-IRS-aided directional modulation (DM) networks have only one (no direct path) or two (existing direct path) degrees of freedom (DoFs). This means that there are only one or two streams transmitted simultaneously from base station to user and will seriously limit its…

    Submitted 28 April, 2024; v1 submitted 26 March, 2024; originally announced April 2024.