Showing 1–50 of 371 results for author: Xiao, T

  1. arXiv:2409.11308  [pdf, other]

    cs.CL

    SpMis: An Investigation of Synthetic Spoken Misinformation Detection

    Authors: Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, Zhizheng Wu

    Abstract: In recent years, speech generation technology has advanced rapidly, fueled by generative models and large-scale training techniques. While these developments have enabled the production of high-quality synthetic speech, they have also raised concerns about the misuse of this technology, particularly for generating synthetic misinformation. Current research primarily focuses on distinguishing machi…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Accepted in SLT 2024

  2. arXiv:2409.10764  [pdf, other]

    cs.LG cs.CR

    Federated Learning for Smart Grid: A Survey on Applications and Potential Vulnerabilities

    Authors: Zikai Zhang, Suman Rath, Jiaohao Xu, Tingsong Xiao

    Abstract: The Smart Grid (SG) is a critical energy infrastructure that collects real-time electricity usage data to forecast future energy demands using information and communication technologies (ICT). Due to growing concerns about data security and privacy in SGs, federated learning (FL) has emerged as a promising training framework. FL offers a balance between privacy, efficiency, and accuracy in SGs by…

    Submitted 16 September, 2024; originally announced September 2024.

    MSC Class: C.2.4

  3. arXiv:2409.10365  [pdf, other]

    cs.CV cs.AI

    Robust image representations with counterfactual contrastive learning

    Authors: Mélanie Roschewitz, Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara, Ben Glocker

    Abstract: Contrastive pretraining can substantially increase model generalisation and downstream performance. However, the quality of the learned representations is highly dependent on the data augmentation strategy applied to generate positive pairs. Positive contrastive pairs should preserve semantic meaning while discarding unwanted variations related to the data acquisition domain. Traditional contrasti…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: Code available at https://github.com/biomedia-mira/counterfactual-contrastive/

  4. arXiv:2409.08788  [pdf, other]

    cs.LG

    Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling

    Authors: Jialu Tang, Tong Xia, Yuan Lu, Cecilia Mascolo, Aaqib Saeed

    Abstract: Interpreting electrocardiograms (ECGs) and generating comprehensive reports remain challenging tasks in cardiology, often requiring specialized expertise and significant time investment. To address these critical issues, we propose ECG-ReGen, a retrieval-based approach for ECG-to-text report generation and question answering. Our method leverages self-supervised learning for the ECG encoder, ena…

    Submitted 13 September, 2024; originally announced September 2024.

  5. arXiv:2409.05585  [pdf, other]

    cs.CV cs.AI

    Latent 3D Brain MRI Counterfactual

    Authors: Wei Peng, Tian Xia, Fabio De Sousa Ribeiro, Tomas Bosschieter, Ehsan Adeli, Qingyu Zhao, Ben Glocker, Kilian M. Pohl

    Abstract: The number of samples in structural brain MRI studies is often too small to properly train deep learning models. Generative models show promise in addressing this issue by effectively learning the data distribution and generating high-fidelity MRI. However, they struggle to produce diverse, high-quality data outside the distribution defined by the training data. One way to address the issue is usi…

    Submitted 9 September, 2024; originally announced September 2024.

  6. arXiv:2409.04847  [pdf, other]

    cs.CV

    Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

    Authors: Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang

    Abstract: Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative modeling is layout-to-image (L2I) generation, where predefined layouts of objects guide the generative process. In this study, we introduce a novel regional cross-att…

    Submitted 7 September, 2024; originally announced September 2024.

  7. arXiv:2409.03970  [pdf, other]

    cs.DC cs.DS

    A Hybrid Vectorized Merge Sort on ARM NEON

    Authors: Jincheng Zhou, Jin Zhang, Xiang Zhang, Tiaojie Xiao, Di Ma, Chunye Gong

    Abstract: Sorting algorithms are among the most extensively researched topics in computer science and serve numerous practical applications. Although various sorts have been proposed for efficiency, different architectures offer distinct flavors to the implementation of parallel sorting. In this paper, we propose a hybrid vectorized merge sort on ARM NEON, named NEON Merge Sort for short (NEON-MS). In detail,…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by ICA3PP

  8. arXiv:2408.17377  [pdf, other]

    cs.CL cs.AI

    NDP: Next Distribution Prediction as a More Broad Target

    Authors: Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu

    Abstract: Large language models (LLMs) trained on the next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm contains several limitations, particularly related to planned task complications and error propagation during inference. In our work, we extend the critique of NTP, highlighting its limitation also due to training with a narrow objective: the pred…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  9. arXiv:2408.13006  [pdf, other]

    cs.CL

    Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates

    Authors: Hui Wei, Shenghua He, Tian Xia, Andy Wong, Jingyang Lin, Mei Han

    Abstract: Alignment approaches such as RLHF and DPO are actively investigated to align large language models (LLMs) with human preferences. Commercial large language models (LLMs) like GPT-4 have been recently employed to evaluate and compare different LLM alignment approaches. These models act as surrogates for human evaluators due to their promising abilities to approximate human preferences with remarkab…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Preprint, under review. 17 pages, 7 figures, 16 tables

  10. arXiv:2408.12109  [pdf, other]

    cs.CV cs.CL

    RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

    Authors: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

    Abstract: Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the sc…

    Submitted 21 August, 2024; originally announced August 2024.

  11. arXiv:2408.09490  [pdf, other]

    cs.LG cs.AI

    Leveraging Invariant Principle for Heterophilic Graph Structure Distribution Shifts

    Authors: Jinluan Yang, Zhengyu Chen, Teng Xiao, Wenqiao Zhang, Yong Lin, Kun Kuang

    Abstract: Heterophilic Graph Neural Networks (HGNNs) have shown promising results for semi-supervised learning tasks on graphs. Notably, most real-world heterophilic graphs are composed of a mixture of nodes with different neighbor patterns, exhibiting local node-level homophilic and heterophilic structures. However, existing works are only devoted to designing better HGNN backbones or architectures for nod…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 20 pages, 7 figures

  12. arXiv:2408.08313  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Can Large Language Models Understand Symbolic Graphics Programs?

    Authors: Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf

    Abstract: Assessing the capabilities of large language models (LLMs) is often challenging, in part, because it is hard to find tasks to which they have not been exposed during training. We take one step to address this challenge by turning to a new task: focusing on symbolic graphics programs, which are a popular representation for graphics content that procedurally generates visual data. LLMs have shown ex…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Technical Report v1 (44 pages, 23 figures, project page: https://sgp-bench.github.io/)

  13. arXiv:2408.07249  [pdf, other]

    cs.CV cs.IR

    GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval

    Authors: Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou

    Abstract: In the rapidly expanding domain of web video content, the task of text-video retrieval has become increasingly critical, bridging the semantic gap between textual queries and video data. This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video, enhancing the effectiveness of text-video retrieval sys…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 18 pages including appendix

  14. arXiv:2408.02714  [pdf, other]

    cs.LG cs.AI

    MDM: Advancing Multi-Domain Distribution Matching for Automatic Modulation Recognition Dataset Synthesis

    Authors: Dongwei Xu, Jiajun Chen, Yao Lu, Tianhao Xia, Qi Xuan, Wei Wang, Yun Lin, Xiaoniu Yang

    Abstract: Recently, deep learning technology has been successfully introduced into Automatic Modulation Recognition (AMR) tasks. However, the success of deep learning is largely attributed to training on large-scale datasets. Such a large amount of data puts heavy pressure on storage, transmission and model training. In order to solve the problem of the large amount of data, some researchers put forward the me…

    Submitted 5 August, 2024; originally announced August 2024.

  15. arXiv:2408.02689  [pdf, ps, other]

    cs.LG cs.AI

    Spatio-Temporal Partial Sensing Forecast for Long-term Traffic

    Authors: Zibo Liu, Zhe Jiang, Zelin Xu, Tingsong Xiao, Zhengkun Xiao, Haibo Wang, Shigang Chen

    Abstract: Traffic forecasting uses recent measurements by sensors installed at chosen locations to forecast the future road traffic. Existing work either assumes all locations are equipped with sensors or focuses on short-term forecast. This paper studies partial sensing traffic forecast of long-term traffic, assuming sensors only at some locations. The study is important in lowering the infrastructure inve…

    Submitted 2 August, 2024; originally announced August 2024.

  16. arXiv:2408.01944  [pdf, ps, other]

    cs.CV eess.IV

    RobNODDI: Robust NODDI Parameter Estimation with Adaptive Sampling under Continuous Representation

    Authors: Taohui Xiao, Jian Cheng, Wenxin Fan, Jing Yang, Cheng Li, Enqing Dong, Shanshan Wang

    Abstract: Neurite Orientation Dispersion and Density Imaging (NODDI) is an important imaging technology used to evaluate the microstructure of brain tissue, which is of great significance for the discovery and treatment of various neurological diseases. Current deep learning-based methods perform parameter estimation through diffusion magnetic resonance imaging (dMRI) with a small number of diffusion gradie…

    Submitted 4 August, 2024; originally announced August 2024.

  17. arXiv:2408.01890  [pdf, other]

    cs.CL

    Cross-layer Attention Sharing for Large Language Models

    Authors: Yongyu Mu, Yuzhang Wu, Yuchun Fan, Chenglong Wang, Hengyu Li, Qiaozhi He, Murun Yang, Tong Xiao, Jingbo Zhu

    Abstract: As large language models (LLMs) evolve, the increase in model depth and parameter number leads to substantial redundancy. To enhance the efficiency of the attention mechanism, previous works primarily compress the KV cache or group attention heads, while largely overlooking redundancy between layers. Our comprehensive analyses across various LLMs show that highly similar attention patterns persist…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Working in process

  18. arXiv:2408.00041  [pdf, other]

    cs.AI

    Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification

    Authors: Junru Chen, Tianyu Cao, Jing Xu, Jiahe Li, Zhilong Chen, Tao Xiao, Yang Yang

    Abstract: Time Series Classification (TSC) encompasses two settings: classifying entire sequences or classifying segmented subsequences. The raw time series for segmented TSC usually contain Multiple classes with Varying Duration of each class (MVD). Therefore, the characteristics of MVD pose unique challenges for segmented TSC, yet have been largely overlooked by existing works. Specifically, there exists…

    Submitted 31 July, 2024; originally announced August 2024.

  19. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  20. arXiv:2407.13164  [pdf, other]

    cs.CL cs.AI

    Translate-and-Revise: Boosting Large Language Models for Constrained Translation

    Authors: Pengcheng Huang, Yongyu Mu, Yuzhang Wu, Bei Li, Chunyang Xiao, Tong Xiao, Jingbo Zhu

    Abstract: Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prom…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 16 pages

  21. arXiv:2407.01896  [pdf, other]

    cs.CL cs.IR

    LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis

    Authors: Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, Dan Pei

    Abstract: Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maint…

    Submitted 1 July, 2024; originally announced July 2024.

  22. arXiv:2406.16148  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

    Authors: Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing…

    Submitted 23 June, 2024; originally announced June 2024.

  23. arXiv:2406.15846  [pdf, other]

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  24. arXiv:2406.15178  [pdf, other]

    cs.CL

    Hybrid Alignment Training for Large Language Models

    Authors: Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

    Abstract: Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot guara…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: accepted by ACL (Findings) 2024

  25. arXiv:2406.14250  [pdf, other]

    cs.CV cs.HC

    E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion

    Authors: Ke Wang, Tianyu Xia, Zhangxuan Gu, Yi Zhao, Shuheng Shen, Changhua Meng, Weiqiang Wang, Ke Xu

    Abstract: Online GUI navigation on mobile devices has drawn a lot of attention in recent years since it contributes to many real-world applications. With the rapid development of large language models (LLMs), multimodal large language models (MLLMs) have tremendous potential on this task. However, existing MLLMs need high-quality data to improve their ability to make the correct navigation decisions according…

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures, Under review

  26. arXiv:2406.13542  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

    Authors: Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou

    Abstract: One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-fol…

    Submitted 18 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Work in progress

  27. arXiv:2406.12297  [pdf, other]

    cs.LG cs.AI

    Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System

    Authors: Ji Xu, Tianlong Xiao, Jinye Yang, Panpan Zhu

    Abstract: Density peaks clustering (DP) has the ability of detecting clusters of arbitrary shape and clustering non-Euclidean space data, but its quadratic complexity in both computing and storage makes it difficult to scale for big data. Various approaches have been proposed in this regard, including MapReduce based distribution computing, multi-core parallelism, presentation transformation (e.g., kd-tree,…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper presents a novel approach FaithPDP that takes advantages of both hardware (multi-core architecture of CPU) and modern programming language (Python or Matlab for efficient vector and matrix computation) to achieve clustering result identical to vanilla DP algorithm, while the computing complexity is reduced to pseudo-linear

  28. arXiv:2406.10808  [pdf, other]

    cs.LG

    Diffusion Model With Optimal Covariance Matching

    Authors: Zijing Ou, Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yingzhen Li, David Barber

    Abstract: The probabilistic diffusion model has become highly effective across various domains. Typically, sampling from a diffusion model involves using a denoising distribution characterized by a Gaussian with a learned mean and either fixed or learned covariances. In this paper, we leverage the recently proposed full covariance moment matching technique and introduce a novel method for learning covarianc…

    Submitted 16 June, 2024; originally announced June 2024.

  29. FLea: Addressing Data Scarcity and Label Skew in Federated Learning via Privacy-preserving Feature Augmentation

    Authors: Tong Xia, Abhirup Ghosh, Xinchi Qiu, Cecilia Mascolo

    Abstract: Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still face challenges when dealing with scarce and label-skewed data across devices, resulting in local model overfitting and drift, consequently hindering the performance of the global model. In response to…

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This work was intended as a replacement of arXiv:2312.02327 and any subsequent updates will appear there

  30. arXiv:2406.09246  [pdf, other]

    cs.RO cs.LG

    OpenVLA: An Open-Source Vision-Language-Action Model

    Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

    Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has be…

    Submitted 5 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Website: https://openvla.github.io/

  31. arXiv:2406.09196  [pdf, other]

    cs.CV cs.LG

    Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

    Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

    Abstract: Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  32. arXiv:2406.07168  [pdf, other]

    cs.CL

    Teaching Language Models to Self-Improve by Learning from Language Feedback

    Authors: Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu

    Abstract: Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotati…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  33. arXiv:2406.04344  [pdf, other]

    cs.LG cs.CL cs.CV

    Verbalized Machine Learning: Revisiting Machine Learning with Language Models

    Authors: Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu

    Abstract: Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Technical Report v1 (92 pages, 15 figures)

  34. arXiv:2406.01276  [pdf, other]

    cs.CL

    EduNLP: Towards a Unified and Modularized Library for Educational Resources

    Authors: Zhenya Huang, Yuting Ning, Longhu Qin, Shiwei Tong, Shangzi Xue, Tong Xiao, Xin Lin, Jiayu Liu, Qi Liu, Enhong Chen, Shijing Wang

    Abstract: Educational resource understanding is vital to online learning platforms, which have demonstrated growing applications recently. However, researchers and developers always struggle with using existing general natural language toolkits or domain-specific models. The issue raises a need to develop an effective and easy-to-use one that benefits AI education-related research and applications. To bridg…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  35. arXiv:2406.00497  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, Yingfeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.…

    Submitted 20 August, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024

  36. arXiv:2405.16030  [pdf, other]

    cs.LG

    Constrained Ensemble Exploration for Unsupervised Skill Discovery

    Authors: Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li

    Abstract: Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free pre-training. Existing methods for unsupervised RL mainly conduct empowerment-driven skill discovery or entropy-based exploration. However, empowerment often leads to static skills, and pure exploration only maximizes the state coverage rather than learning useful behaviors. In this…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  37. arXiv:2405.14804  [pdf, other]

    cs.CL

    Can LLMs Solve longer Math Word Problems Better?

    Authors: Xin Xu, Tong Xiao, Zitong Chao, Zhenya Huang, Can Yang, Yang Wang

    Abstract: Math Word Problems (MWPs) are crucial for evaluating the capability of Large Language Models (LLMs), with current research primarily focusing on questions with concise contexts. However, as real-world math problems often involve complex circumstances, LLMs' ability to solve long MWPs is vital for their applications in these scenarios, yet remains under-explored. This study pioneers the exploration…

    Submitted 23 May, 2024; originally announced May 2024.

  38. arXiv:2405.13409  [pdf, other]

    cs.GR

    Specular Polynomials

    Authors: Zhimin Fan, Jie Guo, Yiming Wang, Tianyu Xiao, Hao Zhang, Chenxi Zhou, Zhenyu Chen, Pengpei Hong, Yanwen Guo, Ling-Qi Yan

    Abstract: Finding valid light paths that involve specular vertices in Monte Carlo rendering requires solving many non-linear, transcendental equations in high-dimensional space. Existing approaches heavily rely on Newton iterations in path space, which are limited to obtaining at most a single solution each time and easily diverge when initialized with improper seeds. We propose specular polynomials, a Ne…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, accepted by SIGGRAPH 2024

    ACM Class: I.3.3

  39. arXiv:2405.12609  [pdf, other]

    eess.AS cs.SD

    Mamba in Speech: Towards an Alternative to Self-Attention

    Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

    Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp…

    Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  40. arXiv:2405.12213  [pdf, other]

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen…

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  41. arXiv:2405.10516  [pdf, other]

    cs.CL cs.AI

    Language Models can Evaluate Themselves via Probability Discrepancy

    Authors: Tingyu Xia, Bowen Yu, Yuan Wu, Yi Chang, Chang Zhou

    Abstract: In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. T…

    Submitted 8 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings

  42. arXiv:2405.06232  [pdf, other]

    cs.AI

    Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process

    Authors: Tong Xiao, Jiayu Liu, Zhenya Huang, Jinze Wu, Jing Sha, Shijin Wang, Enhong Chen

    Abstract: Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand both text and diagram, master essential geometry knowledge, and appropriately apply it in reasoning. However, existing works follow a paradigm of neural machine translation and only focus on enhancing the capability of enc…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024 Accepted

  43. arXiv:2405.05941  [pdf, other]

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab…

    Submitted 9 May, 2024; originally announced May 2024.

  44. arXiv:2405.01649  [pdf, other]

    cs.CL

    Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

    Authors: Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, Dacheng Tao

    Abstract: Answering complex queries over incomplete knowledge graphs (KGs) is a challenging job. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propo…

    Submitted 8 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  45. arXiv:2404.18930  [pdf, other]

    cs.CV

    Hallucination of Multimodal Large Language Models: A Survey

    Authors: Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou

    Abstract: This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge k…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 140 references

  46. arXiv:2404.13885  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals

    Authors: Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, Tianyang Wang, Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang

    Abstract: Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison…

    Submitted 22 April, 2024; originally announced April 2024.

  47. arXiv:2404.08679  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector

    Authors: Andi Zhang, Tim Z. Xiao, Weiyang Liu, Robert Bamler, Damon Wischik

    Abstract: We revisit the likelihood ratio between a pretrained large language model (LLM) and its finetuned variant as a criterion for out-of-distribution (OOD) detection. The intuition behind such a criterion is that, the pretrained LLM has the prior knowledge about OOD data due to its large amount of training data, and once finetuned with the in-distribution data, the LLM has sufficient knowledge to disti…

    Submitted 7 April, 2024; originally announced April 2024.

  48. Direct May Not Be the Best: An Incremental Evolution View of Pose Generation

    Authors: Yuelong Li, Tengfei Xiao, Lei Geng, Jianming Wang

    Abstract: Pose diversity is an inherent representative characteristic of 2D images. Due to the 3D to 2D projection mechanism, there is evident content discrepancy among distinct pose images. This is the main obstacle bothering pose transformation related researches. To deal with this challenge, we propose a fine-grained incremental evolution centered pose generation framework, rather than traditional direct…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at AAAI2024

  49. arXiv:2404.01077  [pdf, other]

    cs.CL

    Efficient Prompting Methods for Large Language Models: A Survey

    Authors: Kaiyan Chang, Songcheng Xu, Chenglong Wang, Yingfeng Luo, Tong Xiao, Jingbo Zhu

    Abstract: Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks. While this approach opens the door to in-context learning of LLMs, it brings the additional computational burden of model inference and human effort of manual-designed prompts, particularly when using lengthy and complex prompts to guide and control the behavior of LL…

    Submitted 1 April, 2024; originally announced April 2024.

  50. arXiv:2404.00978  [pdf, other]

    cs.CL

    Prior Constraints-based Reward Model Training for Aligning Large Language Models

    Authors: Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu

    Abstract: Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-b…

    Submitted 18 September, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by CCL 2024