Showing 1–50 of 177 results for author: Geng, X

Searching in archive cs.
  1. arXiv:2409.02601  [pdf, other]

    cs.CY

    ChatGPT vs Social Surveys: Probing the Objective and Subjective Human Society

    Authors: Muzhi Zhou, Lu Yu, Xiaomin Geng, Lan Luo

    Abstract: The extent to which Large Language Models (LLMs) can simulate the data-generating process for social surveys remains unclear. Current research has not thoroughly assessed potential biases in the sociodemographic population represented within the language model's framework. Additionally, the subjective worlds of LLMs often show inconsistencies in how closely their responses match those of groups of…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2408.07966  [pdf, other]

    cs.LG cs.DC

    Addressing Skewed Heterogeneity via Federated Prototype Rectification with Personalization

    Authors: Shunxin Guo, Hongsong Wang, Shuxia Lin, Zhiqiang Kou, Xin Geng

    Abstract: Federated learning is an efficient framework designed to facilitate collaborative model training across multiple distributed devices while preserving user data privacy. A significant challenge of federated learning is data-level heterogeneity, i.e., skewed or long-tailed distribution of private data. Although various methods have been proposed to address this challenge, most of them assume that th…

    Submitted 22 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  3. arXiv:2408.07337  [pdf, other]

    cs.CV

    KIND: Knowledge Integration and Diversion in Diffusion Models

    Authors: Yucheng Xie, Fu Feng, Jing Wang, Xin Geng, Yong Rui

    Abstract: Pre-trained models have become the preferred backbone due to the expansion of model parameters, with techniques like Parameter-Efficient Fine-Tuning (PEFTs) typically fixing the parameters of these models. However, pre-trained models may not always be optimal, especially when there are discrepancies between training tasks and target tasks, potentially resulting in negative transfer. To address thi…

    Submitted 14 August, 2024; originally announced August 2024.

  4. arXiv:2408.02599  [pdf, other]

    cs.CL cs.AI

    Progressively Selective Label Enhancement for Language Model Alignment

    Authors: Biao Liu, Ning Xu, Xin Geng

    Abstract: Large Language Models have demonstrated impressive capabilities in various language tasks but may produce content that misaligns with human expectations, raising ethical and legal concerns. Therefore, it is important to explore the limitations and implement restrictions on the models to ensure safety and compliance, with Reinforcement Learning from Human Feedback (RLHF) being the primary method. D…

    Submitted 5 August, 2024; originally announced August 2024.

  5. arXiv:2408.00804  [pdf, other]

    cs.AR cs.AI cs.LG

    ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model

    Authors: Ning Xu, Zhaoyang Zhang, Lei Qi, Wensuo Wang, Chao Zhang, Zihao Ren, Huaiyuan Zhang, Xin Cheng, Yanqi Zhang, Zhichao Liu, Qingwen Wei, Shiyang Wu, Lanlan Yang, Qianfeng Lu, Yiqun Ma, Mengyao Zhao, Junbo Liu, Yufan Song, Xin Geng, Jun Yang

    Abstract: The field of integrated circuit (IC) design is highly specialized, presenting significant barriers to entry and research and development challenges. Although large language models (LLMs) have achieved remarkable success in various domains, existing LLMs often fail to meet the specific needs of students, engineers, and researchers. Consequently, the potential of LLMs in the IC design domain remains…

    Submitted 26 July, 2024; originally announced August 2024.

  6. arXiv:2407.20439  [pdf, other]

    cs.RO cs.HC eess.SY

    Haptic feedback of front car motion can improve driving control

    Authors: Xiaoxiao Cheng, Xianzhe Geng, Yanpei Huang, Etienne Burdet

    Abstract: This study investigates the role of haptic feedback in a car-following scenario, where information about the motion of the front vehicle is provided through a virtual elastic connection with it. Using a robotic interface in a simulated driving environment, we examined the impact of varying levels of such haptic feedback on the driver's ability to follow the road while avoiding obstacles. The resul…

    Submitted 29 July, 2024; originally announced July 2024.

  7. arXiv:2407.02098  [pdf, other]

    cs.CV

    DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection

    Authors: Kaixin Xu, Qingtian Feng, Hao Chen, Zhe Wang, Xue Geng, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Applying deep neural networks to 3D point cloud processing has attracted increasing attention due to its advanced performance in many areas, such as AR/VR, autonomous driving, and robotics. However, as neural network models and 3D point clouds expand in size, it becomes a crucial challenge to reduce the computational and memory overhead to meet latency and energy constraints in real-world applicat…

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2407.02068  [pdf, other]

    cs.CV

    LPViT: Low-Power Semi-structured Pruning for Vision Transformers

    Authors: Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more…

    Submitted 12 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  9. arXiv:2406.17503  [pdf, other]

    cs.LG

    WAVE: Weight Template for Adaptive Initialization of Variable-sized Models

    Authors: Fu Feng, Yucheng Xie, Jing Wang, Xin Geng

    Abstract: The expansion of model parameters underscores the significance of pre-trained models; however, the constraints encountered during model deployment necessitate models of variable sizes. Consequently, the traditional pre-training and fine-tuning paradigm fails to address the initialization problem when target models are incompatible with pre-trained models. We tackle this issue from a multitasking p…

    Submitted 15 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.14532  [pdf, other]

    cs.LG cs.CL

    RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

    Authors: Amrith Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith, Aviral Kumar

    Abstract: Training on model-generated synthetic data is a promising approach for finetuning LLMs, but it remains unclear when it helps or hurts. In this paper, we investigate this question for math reasoning via an empirical study, followed by building a conceptual understanding of our observations. First, we find that while the typical approach of finetuning a model on synthetic correct or positive problem…

    Submitted 20 June, 2024; originally announced June 2024.

  11. arXiv:2406.13185  [pdf, other]

    cs.CL

    Learnable In-Context Vector for Visual Question Answering

    Authors: Yingzhe Peng, Chenduo Hao, Xu Yang, Jiawei Peng, Xinting Hu, Xin Geng

    Abstract: As language models continue to scale, Large Language Models (LLMs) have exhibited emerging capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs) as context. Inspired by these advancements, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, applying ICL us…

    Submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.12199  [pdf, other]

    cs.LG cs.AI

    Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers

    Authors: Haowei Ni, Shuchen Meng, Xieming Geng, Panfeng Li, Zhuoying Li, Xupeng Chen, Xiaotong Wang, Shiyao Zhang

    Abstract: Cardiovascular disease (CVD) is a leading cause of death globally, necessitating precise forecasting models for monitoring vital signs like heart rate, blood pressure, and ECG. Traditional models, such as ARIMA and Prophet, are limited by their need for manual parameter tuning and challenges in handling noisy, sparse, and highly variable medical data. This study investigates advanced deep learning…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 6th International Conference on Electronic Engineering and Informatics

  13. arXiv:2406.09397  [pdf, other]

    cs.CV cs.AI

    Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

    Authors: Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

    Abstract: Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetic, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 26 figures, under review

  14. arXiv:2406.07871  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Flexible Music-Conditioned Dance Generation with Style Description Prompts

    Authors: Hongsong Wang, Yin Zhu, Xin Geng

    Abstract: Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods primarily rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framew…

    Submitted 12 June, 2024; originally announced June 2024.

  15. arXiv:2405.16474  [pdf, other]

    cs.LG

    Inaccurate Label Distribution Learning with Dependency Noise

    Authors: Zhiqiang Kou, Jing Wang, Yuheng Jia, Xin Geng

    Abstract: In this paper, we introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning, which arise from dependencies on instances and labels. We start by modeling the inaccurate label distribution matrix as a combination of the true label distribution and a noise matrix influenced by specific instance…

    Submitted 26 May, 2024; originally announced May 2024.

  16. arXiv:2405.13923  [pdf, other]

    cs.CL

    Why Not Transform Chat Large Language Models to Non-English?

    Authors: Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, Shujian Huang

    Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized fo…

    Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  17. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

    Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, et al. (6 additional authors not shown)

    Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of down…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

  18. arXiv:2405.06038  [pdf, other]

    cs.LG cs.AI

    From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks

    Authors: Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, Chao Jin, Manas Gupta, Xulei Yang, Zhenghua Chen, Mohamed M. Sabry Aly, Jie Lin, Min Wu, Xiaoli Li

    Abstract: Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research of compress…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This manuscript is the accepted version for TNNLS (IEEE Transactions on Neural Networks and Learning Systems)

  19. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu…

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  20. arXiv:2404.16897  [pdf, other]

    cs.LG cs.AI cs.CV

    Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models

    Authors: Shi-Yu Xia, Wenxuan Zhu, Xu Yang, Xin Geng

    Abstract: In practice, we usually need to build variable-sized models adapting for diverse resource constraints in different application scenarios, where weight initialization is an important step prior to training. The Learngene framework, introduced recently, firstly learns one compact part termed as learngene from a large well-trained model, after which learngene is expanded to initialize variable-sized…

    Submitted 25 April, 2024; originally announced April 2024.

  21. arXiv:2404.13565  [pdf, other]

    cs.CV cs.AI cs.CL

    Exploring Diverse Methods in Visual Question Answering

    Authors: Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian

    Abstract: This study explores innovative methods for improving Visual Question Answering (VQA) using Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms. Leveraging a balanced VQA dataset, we investigate three distinct strategies. Firstly, GAN-based approaches aim to generate answer embeddings conditioned on image and question inputs, showing potential but struggling with more com…

    Submitted 20 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic Communication and Artificial Intelligence

  22. arXiv:2403.16697  [pdf, other]

    cs.CV

    DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization

    Authors: Yunlong Tang, Yuxuan Wan, Lei Qi, Xin Geng

    Abstract: Source-Free Domain Generalization (SFDG) aims to develop a model that works for unseen target domains without relying on any source domain. Research in SFDG primarily builds upon the existing knowledge of large-scale vision-language models and utilizes the pre-trained model's joint vision-language space to simulate style transfer across domains, thus eliminating the dependency on source domain ima…

    Submitted 14 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TMM

  23. arXiv:2403.14118  [pdf, other]

    cs.CL

    From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation

    Authors: Haofei Zhao, Yilun Liu, Shimin Tao, Weibin Meng, Yimeng Chen, Xiang Geng, Chang Su, Min Zhang, Hao Yang

    Abstract: Machine Translation Quality Estimation (MTQE) is the task of estimating the quality of machine-translated text in real time without the need for reference translations, which is of great importance for the development of MT. After two decades of evolution, QE has yielded a wealth of results. This article provides a comprehensive overview of QE datasets, annotation methods, shared tasks, methodolog…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by IJCNN 2024

  24. arXiv:2403.13351  [pdf, other]

    cs.CV

    OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning

    Authors: Xinyu Geng, Jiaming Wang, Jiawei Gong, Yuerong Xue, Jun Xu, Fanglin Chen, Xiaolin Huang

    Abstract: Redundancy is a persistent challenge in Capsule Networks (CapsNet), leading to high computational costs and parameter counts. Although previous works have introduced pruning after the initial capsule layer, dynamic routing's fully connected nature and non-orthogonal weight matrices reintroduce redundancy in deeper layers. Besides, dynamic routing requires iterating to converge, further increasing c…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages

  25. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  26. arXiv:2402.19145  [pdf, other]

    cs.CV

    A SAM-guided Two-stream Lightweight Model for Anomaly Detection

    Authors: Chenghao Li, Lei Qi, Xin Geng

    Abstract: In industrial anomaly detection, model efficiency and mobile-friendliness become the primary concerns in real-world applications. Simultaneously, the impressive generalization capabilities of Segment Anything (SAM) have garnered broad academic attention, making it an ideal choice for localizing unseen anomalies and diverse real-world patterns. In this paper, considering these two critical factors,…

    Submitted 29 February, 2024; originally announced February 2024.

  27. arXiv:2401.13011  [pdf, other]

    cs.CV

    CCA: Collaborative Competitive Agents for Image Editing

    Authors: Tiankai Hang, Shuyang Gu, Dong Chen, Xin Geng, Baining Guo

    Abstract: This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages the capabilities of multiple Large Language Models (LLMs) based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructio…

    Submitted 23 January, 2024; originally announced January 2024.

  28. arXiv:2401.08139  [pdf, other]

    cs.LG cs.NE

    Transferring Core Knowledge via Learngenes

    Authors: Fu Feng, Jing Wang, Xin Geng

    Abstract: The pre-training paradigm fine-tunes the models trained on large-scale datasets to downstream tasks with enhanced performance. It transfers all knowledge to downstream tasks without discriminating which part is necessary or unnecessary, which may lead to negative transfer. In comparison, knowledge transfer in nature is much more efficient. When passing genetic information to descendants, ancestors…

    Submitted 16 January, 2024; originally announced January 2024.

  29. arXiv:2401.06838  [pdf, other]

    cs.CL

    MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

    Authors: Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen

    Abstract: Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimizatio…

    Submitted 13 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: The project is available at https://github.com/NJUNLP/MAPO

  30. arXiv:2401.06568  [pdf, other]

    cs.CL cs.AI

    Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation

    Authors: Xu Huang, Zhirui Zhang, Xiang Geng, Yichao Du, Jiajun Chen, Shujian Huang

    Abstract: This study investigates how Large Language Models (LLMs) leverage source and reference data in machine translation evaluation task, aiming to better understand the mechanisms behind their remarkable performance in this task. We design the controlled experiments across various input modes and model types, and employ both coarse-grained and fine-grained prompts to discern the utility of source versu…

    Submitted 6 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL 2024 Findings

  31. arXiv:2312.15156  [pdf, other]

    cs.CL

    Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study

    Authors: Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, Liping Jing

    Abstract: Zero-shot keyphrase extraction aims to build a keyphrase extractor without training by human-annotated data, which is challenging due to the limited human intervention involved. Challenging but worthwhile, zero-shot setting efficiently reduces the time and effort that data labeling takes. Recent efforts on pre-trained large language models (e.g., ChatGPT and ChatGLM) show promising performance on…

    Submitted 10 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Technical Report, 6 pages

  32. arXiv:2312.09881  [pdf, other]

    cs.LG cs.AI

    Dynamic Heterogeneous Federated Learning with Multi-Level Prototypes

    Authors: Shunxin Guo, Hongsong Wang, Xin Geng

    Abstract: Federated learning shows promise as a privacy-preserving collaborative learning technique. Existing heterogeneous federated learning mainly focuses on skewing the label distribution across clients. However, most approaches suffer from catastrophic forgetting and concept drift, mainly when the global distribution of all classes is extremely unbalanced and the data distribution of the client dynamic…

    Submitted 15 December, 2023; originally announced December 2023.

  33. arXiv:2312.06343  [pdf, other]

    cs.LG

    RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Inter-label Correlations

    Authors: Kouzhiqiang Yucheng Xie, Jing Wang, Yuheng Jia, Boyu Shi, Xin Geng

    Abstract: This paper introduces RankMatch, an innovative approach for Semi-Supervised Label Distribution Learning (SSLDL). Addressing the challenge of limited labeled data, RankMatch effectively utilizes a small number of labeled examples in conjunction with a larger quantity of unlabeled data, reducing the need for extensive manual labeling in Deep Neural Network (DNN) applications. Specifically, RankMatch…

    Submitted 11 December, 2023; originally announced December 2023.

  34. arXiv:2312.05743  [pdf, other]

    cs.LG cs.CV

    Building Variable-sized Models via Learngene Pool

    Authors: Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng

    Abstract: Recently, Stitchable Neural Networks (SN-Net) is proposed to stitch some pre-trained networks for quickly building numerous networks with different complexity and performance trade-offs. In this way, the burdens of designing or training the variable-sized networks, which can be used in application scenarios with diverse resource constraints, are alleviated. However, SN-Net still faces a few challe…

    Submitted 11 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  35. arXiv:2312.05614  [pdf, other]

    cs.AI cs.LG

    Transformer as Linear Expansion of Learngene

    Authors: Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng

    Abstract: We propose expanding the shared Transformer module to produce and initialize Transformers of varying depths, enabling adaptation to diverse resource constraints. Drawing an analogy to genetic expansibility, we term such module as learngene. To identify the expansion mechanism, we delve into the relationship between the layer's position and its corresponding weight value, and find that linear funct…

    Submitted 20 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  36. arXiv:2312.01598  [pdf, other]

    cs.CV

    Good Questions Help Zero-Shot Image Reasoning

    Authors: Kaiwen Yang, Tao Shen, Xinmei Tian, Xiubo Geng, Chongyang Tao, Dacheng Tao, Tianyi Zhou

    Abstract: Aligning the recent large language models (LLMs) with computer vision models leads to large vision-language models (LVLMs), which have paved the way for zero-shot image reasoning tasks. However, LVLMs are usually trained on short high-level captions only referring to sparse focus regions in images. Such a "tunnel vision" limits LVLMs to exploring other relevant contexts in complex scenes. To add…

    Submitted 8 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  37. arXiv:2312.00785  [pdf, other]

    cs.CV

    Sequential Modeling Enables Scalable Learning for Large Vision Models

    Authors: Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

    Abstract: We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Website: https://yutongbai.com/lvm.html

  38. arXiv:2312.00351  [pdf, other]

    cs.CV

    Manipulating the Label Space for In-Context Classification

    Authors: Haokun Chen, Xu Yang, Yuhang Huang, Zihan Wu, Jing Wang, Xin Geng

    Abstract: After pre-training by generating the next word conditional on previous words, the Language Model (LM) acquires the ability of In-Context Learning (ICL) that can learn a new task conditional on the context of the given in-context examples (ICEs). Similarly, visually-conditioned Language Modelling is also used to train Vision-Language Models (VLMs) with ICL ability. However, such VLMs typically exhi…

    Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

  39. arXiv:2311.16556  [pdf, other]

    cs.LG

    Scalable Label Distribution Learning for Multi-Label Classification

    Authors: Xingyu Zhao, Yuexuan An, Lei Qi, Xin Geng

    Abstract: Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric, which is violated in many real-world scenarios. Moreover, most existing methods design learning processes associated with the number of labels, which makes their co…

    Submitted 28 November, 2023; originally announced November 2023.

  40. arXiv:2311.13198  [pdf, other]

    cs.CV

    DoubleAUG: Single-domain Generalized Object Detector in Urban via Color Perturbation and Dual-style Memory

    Authors: Lei Qi, Peng Dong, Tan Xiong, Hui Xue, Xin Geng

    Abstract: Object detection in urban scenarios is crucial for autonomous driving in intelligent traffic systems. However, unlike conventional object detection tasks, urban-scene images vary greatly in style. For example, images taken on sunny days differ significantly from those taken on rainy days. Therefore, models trained on sunny day images may not generalize well to rainy day images. In this paper, we a…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

  41. arXiv:2311.08734  [pdf, other]

    cs.CL

    Thread of Thought Unraveling Chaotic Contexts

    Authors: Yucheng Zhou, Xiubo Geng, Tao Shen, Chongyang Tao, Guodong Long, Jian-Guang Lou, Jianbing Shen

    Abstract: Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In r…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 5 tables

  42. arXiv:2310.19491  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Generator Identification for Linear SDEs with Additive and Multiplicative Noise

    Authors: Yuanyuan Wang, Xi Geng, Wei Huang, Biwei Huang, Mingming Gong

    Abstract: In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifica…

    Submitted 21 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  43. arXiv:2310.11731  [pdf, other]

    cs.AI

    Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

    Authors: Jianlan Luo, Perry Dong, Jeffrey Wu, Aviral Kumar, Xinyang Geng, Sergey Levine

    Abstract: The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. While policy constraints, conservatism, and other methods for mitigating distributional shifts have made offline reinforcement learning more effective, the continuous action setting often necessitates various a…

    Submitted 18 October, 2023; originally announced October 2023.

  44. arXiv:2310.10056  [pdf, other]

    cs.LG

    Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction

    Authors: Han Qi, Xinyang Geng, Stefano Rando, Iku Ohama, Aviral Kumar, Sergey Levine

    Abstract: In computational chemistry, crystal structure prediction (CSP) is an optimization problem that involves discovering the lowest energy stable crystal structure for a given chemical formula. This problem is challenging as it requires discovering globally optimal designs with the lowest energies on complex manifolds. One approach to tackle this problem involves building simulators based on density fu…

    Submitted 16 October, 2023; originally announced October 2023.

  45. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  46. arXiv:2309.13886  [pdf, other]

    cs.LG

    Can Class-Priors Help Single-Positive Multi-Label Learning?

    Authors: Biao Liu, Ning Xu, Jie Wang, Xin Geng

    Abstract: Single-positive multi-label learning (SPMLL) is a typical weakly supervised multi-label learning problem, where each training example is annotated with only one positive label. Existing SPMLL methods typically assign pseudo-labels to unannotated labels with the assumption that prior probabilities of all classes are identical. However, the class-prior of each category may differ significantly in re…

    Submitted 26 May, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

  47. arXiv:2309.13230  [pdf, other]

    cs.CL

    Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task

    Authors: Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang

    Abstract: We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on all two sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on NJUQE framework (https://github.com/NJUNLP/njuqe).…

    Submitted 11 December, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: WMT2023 System Paper

    Journal ref: https://aclanthology.org/2023.wmt-1.71

  48. arXiv:2308.16718  [pdf, other]

    cs.LG

    Robust Representation Learning for Unreliable Partial Label Learning

    Authors: Yu Shi, Dong-Dong Wu, Xin Geng, Min-Ling Zhang

    Abstract: Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth. However, this idealistic assumption may not always hold due to potential annotation inaccuracies, meaning the ground-truth may not be present in the candidate label set. This is known as Unreliable Partial Label Learning (U…

    Submitted 31 August, 2023; originally announced August 2023.

  49. arXiv:2308.10438  [pdf, other]

    cs.CV

    Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks

    Authors: Kaixin Xu, Zhe Wang, Xue Geng, Jie Lin, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: In this paper, we propose a novel layer-adaptive weight-pruning approach for Deep Neural Networks (DNNs) that addresses the challenge of optimizing the output distortion minimization while adhering to a target pruning ratio constraint. Our approach takes into account the collective influence of all layers to design a layer-adaptive pruning scheme. We discover and utilize a very important additivit…

    Submitted 24 August, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

  50. arXiv:2308.09583  [pdf, other]

    cs.CL cs.AI cs.LG

    WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

    Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

    Abstract: Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: LLM, Mathematical Reasoning