
Showing 1–50 of 786 results for author: Han, X

Searching in archive cs.
  1. arXiv:2409.08065

    cond-mat.supr-con cond-mat.mtrl-sci cs.AI physics.comp-ph

    AI-accelerated discovery of high critical temperature superconductors

    Authors: Xiao-Qi Han, Zhenfeng Ouyang, Peng-Jie Guo, Hao Sun, Ze-Feng Gao, Zhong-Yi Lu

    Abstract: The discovery of new superconducting materials, particularly those exhibiting high critical temperature ($T_c$), has been a vibrant area of study within the field of condensed matter physics. Conventional approaches primarily rely on physical intuition to search for potential superconductors within the existing databases. However, the known materials only scratch the surface of the extensive array…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: 11 pages, 7 figures, 4 tables

  2. arXiv:2409.07508

    cs.CR cs.OS

    SafeBPF: Hardware-assisted Defense-in-depth for eBPF Kernel Extensions

    Authors: Soo Yee Lim, Tanya Prasad, Xueyuan Han, Thomas Pasquier

    Abstract: The eBPF framework enables execution of user-provided code in the Linux kernel. In the last few years, a large ecosystem of cloud services has leveraged eBPF to enhance container security, system observability, and network management. Meanwhile, incessant discoveries of memory safety vulnerabilities have left the systems community with no choice but to disallow unprivileged eBPF programs, which un…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 15 pages, 9 figures

  3. arXiv:2409.05250

    cs.CV

    MRStyle: A Unified Framework for Color Style Transfer with Multi-Modality Reference

    Authors: Jiancheng Huang, Yu Gao, Zequn Jie, Yujie Zhong, Xintong Han, Lin Ma

    Abstract: In this paper, we introduce MRStyle, a comprehensive framework that enables color style transfer using multi-modality reference, including image and text. To achieve a unified style feature space for both modalities, we first develop a neural network called IRStyle, which generates stylized 3D lookup tables for image reference. This is accomplished by integrating an interaction dual-mapping networ…

    Submitted 8 September, 2024; originally announced September 2024.
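    The abstract says IRStyle generates stylized 3D lookup tables (LUTs) for image reference. The network itself is not described here, so the following is only a minimal sketch, under our own assumptions, of how a 3D LUT is commonly applied to one RGB color with trilinear interpolation; `identity_lut` and `apply_lut` are hypothetical helper names, not MRStyle's API.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
LUT = List[List[List[RGB]]]  # lut[r][g][b] -> output color at that grid point


def identity_lut(size: int) -> LUT:
    """Hypothetical helper: a size^3 LUT that maps every grid point to itself."""
    step = 1.0 / (size - 1)
    return [[[(r * step, g * step, b * step) for b in range(size)]
             for g in range(size)] for r in range(size)]


def apply_lut(lut: LUT, color: RGB) -> RGB:
    """Map one RGB color in [0, 1]^3 through a 3D LUT by trilinear interpolation."""
    n = len(lut) - 1  # number of cells along each axis
    # Continuous grid coordinates, the lower corner of the enclosing cell, and offsets.
    coords = [min(max(c, 0.0), 1.0) * n for c in color]
    lo = [min(int(x), n - 1) for x in coords]
    frac = [x - i for x, i in zip(coords, lo)]
    out = [0.0, 0.0, 0.0]
    # Blend the 8 cell corners, each weighted by its trilinear coefficient.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((frac[0] if dr else 1.0 - frac[0]) *
                     (frac[1] if dg else 1.0 - frac[1]) *
                     (frac[2] if db else 1.0 - frac[2]))
                corner = lut[lo[0] + dr][lo[1] + dg][lo[2] + db]
                for k in range(3):
                    out[k] += w * corner[k]
    return (out[0], out[1], out[2])
```

    A stylized LUT would simply carry different entries; the identity table is used here only because its output is easy to check.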

  4. arXiv:2409.05137

    cs.CL cs.CV

    READoc: A Unified Benchmark for Realistic Document Structured Extraction

    Authors: Zichao Li, Aizier Abulaiti, Yaojie Lu, Xuanang Chen, Jia Zheng, Hongyu Lin, Xianpei Han, Le Sun

    Abstract: Document Structured Extraction (DSE) aims to extract structured content from raw documents. Despite the emergence of numerous DSE systems, their unified evaluation remains inadequate, significantly hindering the field's advancement. This problem is largely attributed to existing benchmark paradigms, which exhibit fragmented and localized characteristics. To address these limitations and offer a th…

    Submitted 8 September, 2024; originally announced September 2024.

  5. arXiv:2409.04249

    cs.DC cs.AI cs.LG

    Hermes: Memory-Efficient Pipeline Inference for Large Models on Edge Devices

    Authors: Xueyuan Han, Zinuo Cai, Yichu Zhang, Chongxin Fan, Junhan Liu, Ruhui Ma, Rajkumar Buyya

    Abstract: The application of Transformer-based large models has achieved numerous successes in recent years. However, the exponential growth in the parameters of large models introduces a formidable memory challenge for edge deployment. Prior works to address this challenge mainly focus on optimizing the model structure and adopting memory swapping methods. However, the former reduces the inference accuracy, an…

    Submitted 9 September, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: Accepted by the 42nd IEEE International Conference on Computer Design (ICCD 2024)

  6. arXiv:2409.03512

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang , et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  7. arXiv:2409.02877

    cs.AI cs.CL cs.LG

    Configurable Foundation Models: Building LLMs from a Modular Perspective

    Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

    Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their huge numbers of parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendenc…

    Submitted 4 September, 2024; originally announced September 2024.

  8. arXiv:2409.01641

    cs.CV

    Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement

    Authors: Kun Zhou, Xinyu Lin, Wenbo Li, Xiaogang Xu, Yuanhao Cai, Zhonghang Liu, Xiaoguang Han, Jiangbo Lu

    Abstract: Previous low-light image enhancement (LLIE) approaches, while employing frequency decomposition techniques to address the intertwined challenges of low frequency (e.g., illumination recovery) and high frequency (e.g., noise reduction), primarily focused on the development of dedicated and complex networks to achieve improved performance. In contrast, we reveal that an advanced disentanglement para…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted to ECCV 2024; GitHub: https://github.com/redrock303/ADF-LLIE

  9. arXiv:2409.01011

    cs.CL cs.CV

    Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts

    Authors: Yingfa Chen, Chenlong Hu, Cong Feng, Chenyang Song, Shi Yu, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: This study presents a multi-modal multi-granularity tokenizer specifically designed for analyzing ancient Chinese scripts, focusing on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States period (771-256 BCE) in Ancient China. Considering the complex hierarchical structure of ancient Chinese scripts, where a single character may be a combination of multiple sub-cha…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 12 pages, 3 figures

  10. arXiv:2408.16659

    physics.med-ph cs.GR

    Motion-Driven Neural Optimizer for Prophylactic Braces Made by Distributed Microstructures

    Authors: Xingjian Han, Yu Jiang, Weiming Wang, Guoxin Fang, Simeon Gill, Zhiqiang Zhang, Shengfa Wang, Jun Saito, Deepak Kumar, Zhongxuan Luo, Emily Whiting, Charlie C. L. Wang

    Abstract: Joint injuries, and their long-term consequences, present a substantial global health burden. Wearable prophylactic braces are an attractive potential solution to reduce the incidence of joint injuries by limiting joint movements that are related to injury risk. Given human motion and ground reaction forces, we present a computational framework that enables the design of personalized braces by opt…

    Submitted 29 August, 2024; originally announced August 2024.

  11. arXiv:2408.16326

    cs.CL

    Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

    Authors: Xin Zheng, Jie Lou, Boxi Cao, Xueru Wen, Yuqiu Ji, Hongyu Lin, Yaojie Lu, Xianpei Han, Debing Zhang, Le Sun

    Abstract: Self-critic has become an important mechanism for enhancing the reasoning performance of LLMs. However, current approaches mainly involve basic prompts without further training, which tend to be over-simplified, leading to limited accuracy. Moreover, there is a lack of in-depth investigation of the relationship between an LLM's ability to critique and its task-solving performance. To address these iss…

    Submitted 29 August, 2024; originally announced August 2024.

  12. arXiv:2408.15966

    cs.CV cs.AI cs.CL

    More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

    Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Jinfeng Xu, Yixue Hao, Long Hu, Min Chen

    Abstract: Enabling Large Language Models (LLMs) to comprehend the 3D physical world remains a significant challenge. Due to the lack of large-scale 3D-text pair datasets, the success of LLMs has yet to be replicated in 3D understanding. In this paper, we rethink this issue and propose a new task: 3D Data-Efficient Point-Language Understanding. The goal is to enable LLMs to achieve robust 3D object understan…

    Submitted 5 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  13. arXiv:2408.15708

    cs.CV

    Towards Realistic Example-based Modeling via 3D Gaussian Stitching

    Authors: Xinyu Gao, Ziyi Yang, Bingchen Gong, Xiaoguang Han, Sipeng Yang, Xiaogang Jin

    Abstract: Using parts of existing models to rebuild new models, commonly termed example-based modeling, is a classical methodology in the realm of computer graphics. Previous works mostly focus on shape composition, making them very hard to use for realistic composition of 3D objects captured from real-world scenes. This leads to combining multiple NeRFs into a single 3D scene to achieve seamless appeara…

    Submitted 28 August, 2024; originally announced August 2024.

  14. arXiv:2408.15643

    cs.CV

    RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

    Authors: Zhaoxuan Wang, Xu Han, Hongxin Liu, Xianzhi Li

    Abstract: The rotation robustness property has drawn much attention in point cloud analysis, whereas it still poses a critical challenge in 3D object detection. When subjected to arbitrary rotation, most existing detectors fail to produce expected outputs due to poor rotation robustness. In this paper, we present RIDE, a pioneering exploration of Rotation-Invariance for the 3D LiDAR-point-based object D…

    Submitted 28 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.
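    RIDE's specific rotation-invariant features are cut off in the abstract above. To illustrate what rotation invariance means for a LiDAR point cloud, here is a hypothetical sketch (our assumption, not the paper's design) of a descriptor that stays unchanged when the cloud is rotated about the vertical axis, the yaw rotation most relevant to LiDAR scenes. `rotate_z` and `centroid_distance_features` are illustrative names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_z(p: Point, theta: float) -> Point:
    """Rotate a point about the vertical z-axis (a yaw rotation) by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def centroid_distance_features(points: List[Point]) -> List[float]:
    """Rotation-invariant descriptor: sorted distances from each point to the centroid.

    Rotations preserve distances and map the centroid to the rotated centroid,
    so these values do not change when the whole cloud is rotated.
    Sorting removes any dependence on point order.
    """
    n = len(points)
    centroid = tuple(sum(p[k] for p in points) / n for k in range(3))
    return sorted(math.dist(p, centroid) for p in points)
```

    A rotated copy of a cloud yields the same sorted distances while its raw x/y coordinates change, so a model fed such inputs is insensitive to yaw by construction, at the cost of discarding orientation information that a detector ultimately needs for its boxes.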

  15. arXiv:2408.14122

    cs.CR

    FG-SAT: Efficient Flow Graph for Encrypted Traffic Classification under Environment Shifts

    Authors: Susu Cui, Xueying Han, Dongqi Han, Zhiliang Wang, Weihang Wang, Yun Li, Bo Jiang, Baoxu Liu, Zhigang Lu

    Abstract: Encrypted traffic classification plays a critical role in network security and management. Currently, mining deep patterns from side-channel contents and plaintext fields through neural networks is a major solution. However, existing methods have two major limitations: (1) They fail to recognize the critical link between transport layer mechanisms and applications, missing the opportunity to learn…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)

  16. arXiv:2408.13204

    cs.AI cs.SE

    DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation

    Authors: Qiming Zhu, Jialun Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Shing-Chi Cheung

    Abstract: Code benchmarks such as HumanEval are widely adopted to evaluate the capabilities of Large Language Models (LLMs), providing insights into their strengths and weaknesses. However, current benchmarks primarily exercise LLMs' capability on common coding tasks (e.g., bubble sort, greatest common divisor), leaving domain-specific coding tasks (e.g., computation, system, cryptography) unexplored. To fi…

    Submitted 23 August, 2024; originally announced August 2024.

  17. arXiv:2408.13001

    cs.AI

    CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution

    Authors: Ruiyang Xu, Jialun Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Ben He, Shing-Chi Cheung, Le Sun

    Abstract: Code benchmarks such as HumanEval are widely adopted to evaluate Large Language Models' (LLMs) coding capabilities. However, there is a non-negligible programming language bias in existing code benchmarks: over 95% of code generation benchmarks are dominated by Python, leaving LLMs' capabilities in other programming languages such as Java and C/C++ unknown. Moreover, coding task bias is also cruc…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 13 pages

  18. arXiv:2408.12817

    cs.LG physics.chem-ph

    Data-Driven Parametrization of Molecular Mechanics Force Fields for Expansive Chemical Space Coverage

    Authors: Tianze Zheng, Ailun Wang, Xu Han, Yu Xia, Xingyuan Xu, Jiawei Zhan, Yu Liu, Yang Chen, Zhi Wang, Xiaojie Wu, Sheng Gong, Wen Yan

    Abstract: A force field is a critical component in molecular dynamics simulations for computational drug discovery. It must achieve high accuracy within the constraints of molecular mechanics' (MM) limited functional forms, which offer high computational efficiency. With the rapid expansion of synthetically accessible chemical space, traditional look-up table approaches face significant challenges. In this…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: ByteFF, a machine learning parametrized MMFF

  19. arXiv:2408.11396

    cs.CL

    MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

    Authors: Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen

    Abstract: Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of the model's abilities in the original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, ind…

    Submitted 21 August, 2024; originally announced August 2024.
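    MoE-LPR extends an LLM through a Mixture-of-Experts. The language-prior routing itself is not described in the truncated abstract, so this sketch shows only generic top-k softmax gating, a common MoE router; the function names and the choice to renormalize the kept gates are assumptions, not the paper's method.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k most probable experts and renormalize their gates to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(x: List[float], experts: Sequence[Expert],
                router_logits: List[float], k: int = 2) -> List[float]:
    """Gate-weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for i, weight in top_k_route(router_logits, k):
        y = experts[i](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

    Only k experts execute per token, which is what keeps MoE compute roughly constant as experts are added.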

  20. arXiv:2408.11312

    cs.CV cs.AI

    Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework

    Authors: Xiao Han, Chen Zhu, Xiangyu Zhao, Hengshu Zhu

    Abstract: Visual geo-localization demands in-depth knowledge and advanced reasoning skills to associate images with real-world geographic locations precisely. In general, traditional methods based on data-matching are hindered by the impracticality of storing adequate visual records of global landmarks. Recently, Large Vision-Language Models (LVLMs) have demonstrated the capability of geo-localization throu…

    Submitted 20 August, 2024; originally announced August 2024.

  21. arXiv:2408.11306

    cs.LG cs.AI

    KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?

    Authors: Xiao Han, Xinfeng Zhang, Yiling Wu, Zhenduo Zhang, Zhe Wu

    Abstract: Time series forecasting is a crucial task that predicts the future values of variables based on historical data. Time series forecasting techniques have been developing in parallel with the machine learning community, from early statistical learning methods to current deep learning methods. Although existing methods have made significant progress, they still suffer from two challenges. The mathema…

    Submitted 20 August, 2024; originally announced August 2024.

  22. arXiv:2408.10663

    cs.CL

    REInstruct: Building Instruction Data from Unlabeled Corpus

    Authors: Shu Chen, Xinyan Guan, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

    Abstract: Manually annotating instruction data for large language models is difficult, costly, and hard to scale. Meanwhile, current automatic annotation methods typically rely on distilling synthetic data from proprietary LLMs, which not only limits the upper bound of the quality of the instruction data but also raises potential copyright issues. In this paper, we propose REInstruct, a simple and scalable…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL2024 Findings

  23. arXiv:2408.10286

    cs.LG cs.AI

    GPT-Augmented Reinforcement Learning with Intelligent Control for Vehicle Dispatching

    Authors: Xiao Han, Zijian Zhang, Xiangyu Zhao, Guojiang Shen, Xiangjie Kong, Xuetao Wei, Liqiang Nie, Jieping Ye

    Abstract: As urban residents demand higher travel quality, vehicle dispatch has become a critical component of online ride-hailing services. However, current vehicle dispatch systems struggle to navigate the complexities of urban traffic dynamics, including unpredictable traffic conditions, diverse driver behaviors, and fluctuating supply and demand patterns. These challenges have resulted in travel difficu…

    Submitted 19 August, 2024; originally announced August 2024.

  24. arXiv:2408.09765

    cs.LG cs.HC

    Baby Bear: Seeking a Just Right Rating Scale for Scalar Annotations

    Authors: Xu Han, Felix Yu, Joao Sedoc, Benjamin Van Durme

    Abstract: Our goal is a mechanism for efficiently assigning scalar ratings to each of a large set of elements. For example, "what percent positive or negative is this product review?" When sample sizes are small, prior work has advocated for methods such as Best Worst Scaling (BWS) as being more robust than direct ordinal annotation ("Likert scales"). Here we first introduce IBWS, which iteratively collects…

    Submitted 19 August, 2024; originally announced August 2024.
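    For reference, the BWS baseline mentioned above is often scored by simple counting: an item's score is the number of times it was picked best minus the number of times it was picked worst, divided by the number of times it was shown. The sketch below implements that counting score only (not IBWS, whose procedure is truncated); `bws_scores` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Trial = Tuple[Sequence[str], str, str]  # (items shown, item picked best, item picked worst)


def bws_scores(trials: Iterable[Trial]) -> Dict[str, float]:
    """Counting-based Best-Worst Scaling score in [-1, 1] for every item ever shown."""
    shown: Dict[str, int] = defaultdict(int)
    best: Dict[str, int] = defaultdict(int)
    worst: Dict[str, int] = defaultdict(int)
    for items, picked_best, picked_worst in trials:
        for item in items:
            shown[item] += 1
        best[picked_best] += 1
        worst[picked_worst] += 1
    # (#best - #worst) / #shown; items never picked best or worst score 0.
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}
```

    For example, with trials (a, b, c, d; best a; worst d), (a, b, c, e; best b; worst e) and (a, c, d, e; best a; worst e), item a scores 2/3 and item e scores -1.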

  25. arXiv:2408.09198

    cs.RO

    Learning Based Toolpath Planner on Diverse Graphs for 3D Printing

    Authors: Yuming Huang, Yuhu Guo, Renbo Su, Xingjian Han, Junhao Ding, Tianyu Zhang, Tao Liu, Weiming Wang, Guoxin Fang, Xu Song, Emily Whiting, Charlie C. L. Wang

    Abstract: This paper presents a learning based planner for computing optimized 3D printing toolpaths on prescribed graphs, the challenges of which include the varying graph structures on different models and the large scale of nodes & edges on a graph. We adopt an on-the-fly strategy to tackle these challenges, formulating the planner as a Deep Q-Network (DQN) based optimizer to decide the next 'best' node…

    Submitted 17 August, 2024; originally announced August 2024.
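    The abstract formulates toolpath planning as a DQN-based optimizer that picks the next 'best' node. As a hedged sketch of two generic DQN ingredients such a planner relies on, the code computes the one-step target r + gamma * max_a' Q(s', a') and greedily picks the highest-valued node. Restricting the choice to unvisited nodes is our assumption, the function names are hypothetical, and in practice the Q-values would come from a trained network.

```python
from typing import Dict, List, Optional, Set


def td_target(reward: float, next_q_values: List[float], gamma: float, done: bool) -> float:
    """One-step DQN target: y = r + gamma * max_a' Q(s', a'), with no bootstrap at episode end."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def pick_next_node(q_values: Dict[int, float], visited: Set[int]) -> Optional[int]:
    """Greedy action: the unvisited candidate node with the highest Q-value (None if none remain)."""
    candidates = {node: q for node, q in q_values.items() if node not in visited}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

    During training, an epsilon-greedy variant would occasionally pick a random unvisited node instead, and the network would be regressed toward `td_target`.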

  26. arXiv:2408.08495

    cs.CV

    Achieving Complex Image Edits via Function Aggregation with Diffusion Models

    Authors: Mohammadreza Samadi, Fred X. Han, Mohammad Salameh, Hao Wu, Fengyu Sun, Chunhua Zhou, Di Niu

    Abstract: Diffusion models have demonstrated strong performance in generative tasks, making them ideal candidates for image editing. Recent studies highlight their ability to apply desired edits effectively by following textual instructions, yet two key challenges persist. First, these models struggle to apply multiple edits simultaneously, resulting in computational inefficiencies due to their reliance on…

    Submitted 15 August, 2024; originally announced August 2024.

  27. arXiv:2408.08459

    cs.CL cs.CV cs.LG

    JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

    Authors: Xiaochuang Han, Marjan Ghazvininejad, Pang Wei Koh, Yulia Tsvetkov

    Abstract: Recent work in image and video generation has been adopting the autoregressive LLM architecture due to its generality and potentially easy integration into multi-modal systems. The crux of applying autoregressive training in language generation to visual generation is discretization: representing continuous data like images and videos as discrete tokens. Common methods of discretizing images and…

    Submitted 20 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.
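    The abstract names discretization as the crux. Judging from the title, JPEG-LM presumably models the bytes of a standard codec's output as discrete tokens; the sketch below illustrates only that byte-to-token idea, under that assumption. Python's standard library has no JPEG encoder, so `zlib` stands in for the codec here; it is not JPEG and not the paper's pipeline, and the helper names are hypothetical.

```python
import zlib
from typing import List


def bytes_to_tokens(encoded: bytes) -> List[int]:
    """Byte-level tokenization: each byte of the encoded file becomes a token id in 0..255."""
    return list(encoded)


def tokens_to_bytes(tokens: List[int]) -> bytes:
    """Inverse mapping, e.g. for decoding a token sequence sampled from a language model."""
    return bytes(tokens)


# zlib is a stand-in codec, NOT JPEG: it only shows that a codec's byte stream
# can be treated as a 256-symbol token sequence and mapped back without loss.
raw = bytes(range(16)) * 4  # toy "pixel" data
tokens = bytes_to_tokens(zlib.compress(raw))
restored = zlib.decompress(tokens_to_bytes(tokens))
```

    The vocabulary is fixed at 256 symbols and no learned image tokenizer is needed; whatever the codec discards (as JPEG does) is lost before tokenization, not by it.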

  28. arXiv:2408.08342

    cs.GR cs.CV

    CT4D: Consistent Text-to-4D Generation with Animatable Meshes

    Authors: Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong

    Abstract: Text-to-4D generation has recently been demonstrated to be viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user…

    Submitted 15 August, 2024; originally announced August 2024.

  29. arXiv:2408.08202

    cs.CV

    Towards Practical Human Motion Prediction with LiDAR Point Clouds

    Authors: Xiao Han, Yiming Ren, Yichen Yao, Yujing Sun, Yuexin Ma

    Abstract: Human motion prediction is crucial for human-centric multimedia understanding and interaction. Current methods typically rely on ground truth human poses as observed input, which is not practical for real-world scenarios where only raw visual sensor data is available. To implement these methods in practice, a preliminary pose estimation phase is essential. However, such two-stage approaches often lead…

    Submitted 15 August, 2024; originally announced August 2024.

  30. arXiv:2408.06614

    cs.CV cs.MM

    ViMo: Generating Motions from Casual Videos

    Authors: Liangdong Qiu, Chengxing Yu, Yanran Li, Zhao Wang, Haibin Huang, Chongyang Ma, Di Zhang, Pengfei Wan, Xiaoguang Han

    Abstract: Although humans have the innate ability to imagine multiple possible actions from videos, it remains an extraordinary challenge for computers due to the intricate camera movements and montages. Most existing motion generation methods predominantly rely on manually collected motion datasets, usually tediously sourced from motion capture (Mocap) systems or Multi-View cameras, unavoidably resulting i…

    Submitted 12 August, 2024; originally announced August 2024.

    MSC Class: 68Txx

  31. arXiv:2408.06333

    cs.CL

    FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection

    Authors: Yufei Huang, Xu Han, Maosong Sun

    Abstract: Open Domain Question Answering (ODQA) has been advancing rapidly in recent times, driven by significant developments in dense passage retrieval and pretrained language models. Current models typically incorporate the FiD framework, which is composed of a neural retriever alongside an encoder-decoder neural reader. In the answer generation process, the retriever will retrieve numerous passages (aro…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: ACL 2024 Main Conference

  32. arXiv:2408.06141

    cs.FL

    [Draft] High-order observers and high-order state-estimation-based properties of discrete-event systems

    Authors: Kuize Zhang, Xiaoguang Han, Alessandro Giua, Carla Seatzu

    Abstract: State-estimation-based properties are central properties in discrete-event systems modeled by labeled finite-state automata studied over the past 3 decades. Most existing results are based on a single agent who knows the structure of a system and can observe a subset of events and estimate the system's state based on the system's structure and the agent's observation of the system. The main tool u…

    Submitted 13 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 32 pages, 38 figures
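    The single-agent setting in the abstract rests on the classical current-state estimate: the set of states consistent with what the agent has observed so far. A minimal sketch follows, assuming the simplest labeling (each event is either observed as itself or is unobservable); the paper's high-order observers are not reproduced, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Transitions = Dict[Tuple[str, str], Set[str]]  # (state, event) -> successor states


def unobservable_reach(states: Iterable[str], delta: Transitions,
                       unobservable: Set[str]) -> FrozenSet[str]:
    """All states reachable from `states` via zero or more unobservable events."""
    reach = set(states)
    stack = list(reach)
    while stack:
        q = stack.pop()
        for e in unobservable:
            for q2 in delta.get((q, e), set()):
                if q2 not in reach:
                    reach.add(q2)
                    stack.append(q2)
    return frozenset(reach)


def current_state_estimate(initial: Set[str], delta: Transitions, observable: Set[str],
                           unobservable: Set[str], observation: Iterable[str]) -> FrozenSet[str]:
    """Set of states the system may be in after the observed event sequence."""
    estimate = unobservable_reach(initial, delta, unobservable)
    for e in observation:
        if e not in observable:
            raise ValueError(f"event {e!r} is not observable")
        successors: Set[str] = set()
        for q in estimate:
            successors |= delta.get((q, e), set())
        # After each observed event, silently close under unobservable moves.
        estimate = unobservable_reach(successors, delta, unobservable)
    return estimate
```

    An empty estimate means the observed sequence cannot be generated by the automaton; properties such as detectability or opacity are then phrased as conditions on these estimate sets.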

  33. arXiv:2408.05933

    cs.IR cs.AI cs.MA

    Optimizing RAG Techniques for Automotive Industry PDF Chatbots: A Case Study with Locally Deployed Ollama Models

    Authors: Fei Liu, Zejun Kang, Xing Han

    Abstract: With the growing demand for offline PDF chatbots in automotive industrial production environments, optimizing the deployment of large language models (LLMs) in local, low-performance settings has become increasingly important. This study focuses on enhancing Retrieval-Augmented Generation (RAG) techniques for processing complex automotive industry documents using locally deployed Ollama models. Ba…

    Submitted 12 August, 2024; originally announced August 2024.
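    The study's specific RAG optimizations are truncated above. As a rough illustration of the retrieve-then-prompt step common to RAG pipelines, this sketch ranks document chunks by bag-of-words cosine similarity; a real deployment would presumably use an embedding model instead, and sending the prompt to a local Ollama model is not shown. All names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def vectorize(text: str) -> Counter:
    """Lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Assemble the retrieved context and the question into one prompt for a local model."""
    context = "\n".join(c for _, c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

    Keeping k small bounds the prompt length, which matters in the low-performance local settings the abstract targets.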

  34. arXiv:2408.04846

    math.NA cs.AI cs.LG cs.MS

    UGrid: An Efficient-And-Rigorous Neural Multigrid Solver for Linear PDEs

    Authors: Xi Han, Fei Hou, Hong Qin

    Abstract: Numerical solvers of Partial Differential Equations (PDEs) are of fundamental significance to science and engineering. To date, the historical reliance on legacy techniques has circumscribed possible integration of big data knowledge and exhibits sub-optimal efficiency for certain PDE formulations, while data-driven neural methods typically lack mathematical guarantees of convergence and correctnes…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024
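    UGrid is described as a neural multigrid solver; its neural components are not given in the truncated abstract. As background only, here is a classical geometric multigrid V-cycle for -u'' = f on [0, 1] with zero boundary values, using weighted Jacobi smoothing, full-weighting restriction, and linear interpolation. This is the textbook scheme, not UGrid, and the function names are our own.

```python
import math
from typing import Callable, List, Tuple


def smooth(u: List[float], f: List[float], h: float, sweeps: int,
           omega: float = 2.0 / 3.0) -> List[float]:
    """Weighted-Jacobi sweeps for the stencil (2u[i] - u[i-1] - u[i+1]) / h^2 = f[i]."""
    n = len(u) - 1
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, n):
            jacobi = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jacobi
        u = new
    return u


def residual(u: List[float], f: List[float], h: float) -> List[float]:
    """r = f - A u at interior points; boundary entries stay 0."""
    n = len(u) - 1
    r = [0.0] * (n + 1)
    for i in range(1, n):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def v_cycle(u: List[float], f: List[float], h: float, nu: int = 2) -> List[float]:
    """One V-cycle on a grid with n = len(u) - 1 intervals (n a power of two)."""
    n = len(u) - 1
    if n == 2:  # coarsest grid: a single unknown, solved exactly
        return [0.0, 0.5 * h * h * f[1], 0.0]
    u = smooth(u, f, h, nu)  # pre-smoothing
    r = residual(u, f, h)
    m = n // 2
    rc = [0.0] * (m + 1)
    for j in range(1, m):  # full-weighting restriction
        rc[j] = 0.25 * (r[2 * j - 1] + 2.0 * r[2 * j] + r[2 * j + 1])
    ec = v_cycle([0.0] * (m + 1), rc, 2.0 * h, nu)  # coarse-grid error equation
    u = u[:]
    for j in range(m):  # linear-interpolation prolongation of the correction
        u[2 * j] += ec[j]
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1])
    return smooth(u, f, h, nu)  # post-smoothing


def solve_poisson(f_func: Callable[[float], float], n: int,
                  cycles: int = 15) -> Tuple[List[float], List[float]]:
    """Solve -u'' = f on [0, 1] with u(0) = u(1) = 0 by repeated V-cycles."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [f_func(xi) for xi in x]
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    return x, u
```

    With f = π² sin(πx), 15 V-cycles on 64 intervals bring u to within discretization error (on the order of h²) of the exact solution sin(πx).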

  35. arXiv:2408.03281

    cs.CL cs.AI cs.LG

    StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation

    Authors: Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun

    Abstract: Evaluation is the baton for the development of large language models. Current evaluations typically employ a single-item assessment paradigm for each atomic test objective, which struggles to discern whether a model genuinely possesses the required capabilities or merely memorizes/guesses the answers to specific questions. To this end, we propose a novel evaluation framework referred to as StructE…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: ACL 2024; Benchmark at https://github.com/c-box/StructEval; Leaderboard at https://huggingface.co/spaces/Bowieee/StructEval_leaderboard

  36. arXiv:2408.01800

    cs.CV

    MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of par…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: preprint

  37. arXiv:2408.01262

    cs.CL cs.IR

    RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

    Authors: Kunlun Zhu, Yifan Luo, Dingling Xu, Ruobing Wang, Shi Yu, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) systems have demonstrated their advantages in alleviating the hallucination of Large Language Models (LLMs). Existing RAG benchmarks mainly focus on evaluating whether LLMs can correctly answer general-knowledge questions. However, they are unable to evaluate the effectiveness of the RAG system in dealing with the data from different vertical domains. This paper intr…

    Submitted 26 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: add github repo

  38. arXiv:2407.16508

    cs.CV

    ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation

    Authors: Zhenhua Wu, Yanlin Jin, Liangdong Qiu, Xiaoguang Han, Xiang Wan, Guanbin Li

    Abstract: Visualizing colonoscopy is crucial for medical auxiliary diagnosis to prevent undetected polyps in areas that are not fully observed. Traditional feature-based and depth-based reconstruction approaches usually end up with undesirable results due to incorrect point matching or imprecise depth estimation in realistic colonoscopy videos. Modern deep learning-based methods often require a sufficient number of…

    Submitted 23 July, 2024; originally announced July 2024.

  39. arXiv:2407.16260

    cs.CV

    DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

    Authors: Zizheng Yan, Jiapeng Zhou, Fanpeng Meng, Yushuang Wu, Lingteng Qiu, Zisheng Ye, Shuguang Cui, Guanying Chen, Xiaoguang Han

    Abstract: Text-to-3D generation has recently seen significant progress. To enhance its practicality in real-world applications, it is crucial to generate multiple independent objects with interactions, similar to layer-compositing in 2D image editing. However, existing text-to-3D methods struggle with this task, as they are designed to generate either non-independent objects or independent objects lacking s…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://chester256.github.io/dreamdissector

  40. On Flange-based 3D Hand-Eye Calibration for Soft Robotic Tactile Welding

    Authors: Xudong Han, Ning Guo, Yu Jie, He Wang, Fang Wan, Chaoyang Song

    Abstract: This paper investigates the direct application of standardized designs on the robot for conducting robot hand-eye calibration by employing 3D scanners with collaborative robots. The well-established geometric features of the robot flange are exploited by directly capturing its point cloud data. In particular, an iterative method is proposed to facilitate point cloud processing toward a refined cal…

    Submitted 27 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 25 pages, 14 figures, 2 tables, Accepted by Measurement

  41. arXiv:2407.15062

    cs.CR

    AGORA: Open More and Trust Less in Binary Verification Service

    Authors: Hongbo Chen, Quan Zhou, Sen Yang, Xing Han, Fan Zhang, Danfeng Zhang, Xiaofeng Wang

    Abstract: Binary verification plays a pivotal role in software security, yet building a verification service that is both open and trustworthy poses a formidable challenge. In this paper, we introduce a novel binary verification service, AGORA, scrupulously designed to overcome the challenge. At the heart of this approach lies a strategic insight: certain tasks can be delegated to untrusted entities, while…

    Submitted 21 July, 2024; originally announced July 2024.

  42. arXiv:2407.14247

    cs.LG cs.AI

    Continual Learning for Adaptable Car-Following in Dynamic Traffic Environments

    Authors: Xianda Chen, PakHin Tiu, Xu Han, Junjie Chen, Yuanfei Wu, Xinhu Zheng, Meixin Zhu

    Abstract: The continual evolution of autonomous driving technology requires car-following models that can adapt to diverse and dynamic traffic environments. Traditional learning-based models often suffer from performance degradation when encountering unseen traffic patterns due to a lack of continual learning capabilities. This paper proposes a novel car-following model based on continual learning that addr…

    Submitted 17 July, 2024; originally announced July 2024.

  43. arXiv:2407.13217

    cs.CV

    LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning

    Authors: Wei Huang, Wei Liu, Xiaoming Zhang, Xiaoli Yin, Xu Han, Chunli Li, Yuan Gao, Yu Shi, Le Lu, Ling Zhang, Lei Zhang, Ke Yan

    Abstract: The early detection and precise diagnosis of liver tumors are tasks of critical clinical value, yet they pose significant challenges due to the high heterogeneity and variability of liver tumors. In this work, a precise LIver tumor DIAgnosis network on multi-phase contrast-enhanced CT, named LIDIA, is proposed for real-world scenarios. To fully utilize all available phases in contrast-enhanced CT, L…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  44. arXiv:2407.11470

    cs.SE cs.AI cs.CL

    Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models

    Authors: Jiasheng Zheng, Boxi Cao, Zhengzhao Ma, Ruotong Pan, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun

    Abstract: In recent years, researchers have proposed numerous benchmarks to evaluate the impressive coding capabilities of large language models (LLMs). However, existing benchmarks primarily focus on assessing the correctness of code generated by LLMs, while neglecting other critical dimensions that also significantly impact code quality. Therefore, this paper proposes the RACE benchmark, which comprehensi…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: We release the benchmark at https://github.com/jszheng21/RACE and the leaderboard at https://huggingface.co/spaces/jszheng/RACE_leaderboard

  45. arXiv:2407.09833

    cs.CV

    LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment

    Authors: Yiming Ren, Xiao Han, Yichen Yao, Xiaoxiao Long, Yujing Sun, Yuexin Ma

    Abstract: LiDAR-based human motion capture has garnered significant interest in recent years for its practicability in large-scale and unconstrained environments. However, most methods rely on cleanly segmented human point clouds as input, so the accuracy and smoothness of their motion results are compromised when faced with noisy data, rendering them unsuitable for practical applications. To address these lim…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  46. arXiv:2407.07747

    cs.NI cs.AI

    HGFF: A Deep Reinforcement Learning Framework for Lifetime Maximization in Wireless Sensor Networks

    Authors: Xiaoxu Han, Xin Mu, Jinghui Zhong

    Abstract: Planning the movement of the sink to maximize the lifetime of wireless sensor networks is an essential problem that poses a significant research challenge and has great practical value. Many existing mobile sink techniques based on mathematical programming or heuristics have demonstrated the feasibility of the task. Nevertheless, the heavy computational cost or the over-reliance on human knowledge can result in relativ…

    Submitted 11 April, 2024; originally announced July 2024.

    Comments: Preprint. Under review

  47. arXiv:2407.06654

    cs.CL cs.AI

    SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training

    Authors: Nan He, Weichen Xiong, Hanwen Liu, Yi Liao, Lei Ding, Kai Zhang, Guohua Tang, Xiao Han, Wei Yang

    Abstract: The effectiveness of large language models (LLMs) is often hindered by duplicated data in their extensive pre-training datasets. Current approaches primarily focus on detecting and removing duplicates, which risks the loss of valuable information and neglects the varying degrees of duplication. To address this, we propose a soft deduplication method that maintains dataset integrity while selective…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 12 pages, 7 figures

  48. arXiv:2407.05254

    cs.CV

    GaussReg: Fast 3D Registration with Gaussian Splatting

    Authors: Jiahao Chang, Yinglin Xu, Yihao Li, Yuantao Chen, Xiaoguang Han

    Abstract: Point cloud registration is a fundamental problem for large-scale 3D scene scanning and reconstruction. With the help of deep learning, registration methods have evolved significantly, reaching a nearly mature stage. Since the introduction of Neural Radiance Fields (NeRF), NeRF has become the most popular 3D scene representation owing to its powerful view synthesis capabilities. Regarding NeRF representation…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  49. arXiv:2407.02716

    cs.CV cs.LG

    Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models

    Authors: Xu Han, Linghao Jin, Xuezhe Ma, Xiaofeng Liu

    Abstract: Fine-tuning pre-trained Vision-Language Models (VLMs) has shown remarkable capabilities in medical image and textual depiction synergy. Nevertheless, many pre-training datasets are restricted by patient privacy concerns, potentially containing noise that can adversely affect downstream performance. Moreover, the growing reliance on multi-modal generation exacerbates this issue because of its susce…

    Submitted 2 July, 2024; originally announced July 2024.

  50. arXiv:2407.02516

    cs.RO cs.AI

    EditFollower: Tunable Car Following Models for Customizable Adaptive Cruise Control Systems

    Authors: Xianda Chen, Xu Han, Meixin Zhu, Xiaowen Chu, PakHin Tiu, Xinhu Zheng, Yinhai Wang

    Abstract: In the realm of driving technologies, fully autonomous vehicles have not been widely adopted yet, making advanced driver assistance systems (ADAS) crucial for enhancing driving experiences. Adaptive Cruise Control (ACC) emerges as a pivotal component of ADAS. However, current ACC systems often employ fixed settings, failing to intuitively capture drivers' social preferences and leading to potentia…

    Submitted 23 June, 2024; originally announced July 2024.