Skip to main content

Showing 1–50 of 737 results for author: Zhu, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.03449  [pdf, other

    cs.IR

    MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search

    Authors: Miao Fan, Jiacheng Guo, Shuai Zhu, Shuo Miao, Mingming Sun, Ping Li

    Abstract: Baidu runs the largest commercial web search engine in China, serving hundreds of millions of online users every day in response to a great variety of queries. In order to build a high-efficiency sponsored search engine, we used to adopt a three-layer funnel-shaped structure to screen and sort hundreds of ads from billions of ad candidates subject to the requirement of low response latency and the… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by KDD'19

  2. arXiv:2409.03365  [pdf, other

    cs.DC cs.LG

    Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management

    Authors: Yujie Wang, Shenhan Zhu, Fangcheng Fu, Xupeng Miao, Jie Zhang, Juan Zhu, Fan Hong, Yong Li, Bin Cui

    Abstract: Recent foundation models are capable of handling multiple machine learning (ML) tasks and multiple data modalities with the unified base model structure and several specialized model components. However, the development of such multi-task (MT) multi-modal (MM) models poses significant model management challenges to existing training systems. Due to the sophisticated model architecture and the hete… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.02123  [pdf, other

    cs.LG cs.AI physics.ao-ph

    PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

    Authors: Shengchen Zhu, Yiming Chen, Peiying Yu, Xiang Qu, Yuxiao Zhou, Yiming Ma, Zhizhan Zhao, Yukai Liu, Hao Mi, Bin Wang

    Abstract: Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechani… ▽ More

    Submitted 1 September, 2024; originally announced September 2024.

  4. Priority based inter-twin communication in vehicular digital twin networks

    Authors: Qasim Zia, Chenyu Wang, Saide Zhu, Yingshu Li

    Abstract: With the advancement and boom of autonomous vehicles, vehicular digital twins (VDTs) have become an emerging research area. VDT can solve the issues related to autonomous vehicles and provide improved and enhanced services to users. Recent studies have demonstrated the potential of using priorities in acquiring improved response time. However, since VDT is comprised of intra-twin and inter-twin co… ▽ More

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: This is an Accepted Manuscript of an article published by Taylor & Francis Group in the International Journal of Parallel, Emergent & Distributed Systems on 02 Sep 2024

  5. arXiv:2409.01559  [pdf, other

    cs.RO

    PR2: A Physics- and Photo-realistic Testbed for Embodied AI and Humanoid Robots

    Authors: Hangxin Liu, Qi Xie, Zeyu Zhang, Tao Yuan, Xiaokun Leng, Lining Sun, Song-Chun Zhu, Jingwen Zhang, Zhicheng He, Yao Su

    Abstract: This paper presents the development of a Physics-realistic and Photo-\underline{r}ealistic humanoid robot testbed, PR2, to facilitate collaborative research between Embodied Artificial Intelligence (Embodied AI) and robotics. PR2 offers high-quality scene rendering and robot dynamic simulation, enabling (i) the creation of diverse scenes using various digital assets, (ii) the integration of advanc… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  6. arXiv:2409.00598  [pdf, other

    cs.CL cs.CR cs.CY cs.LG

    Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

    Authors: Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang

    Abstract: Safety-aligned large language models (LLMs) sometimes falsely refuse pseudo-harmful prompts, like "how to kill a mosquito," which are actually harmless. Frequent false refusals not only frustrate users but also provoke a public backlash against the very values alignment seeks to protect. In this paper, we propose the first method to auto-generate diverse, content-controlled, and model-dependent ps… ▽ More

    Submitted 31 August, 2024; originally announced September 2024.

  7. arXiv:2408.16343  [pdf, other

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  8. arXiv:2408.15792  [pdf, other

    cs.LG

    Efficient LLM Scheduling by Learning to Rank

    Authors: Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang

    Abstract: In Large Language Model (LLM) inference, the output length of an LLM request is typically regarded as not known a priori. Consequently, most LLM serving systems employ a simple First-come-first-serve (FCFS) scheduling strategy, leading to Head-Of-Line (HOL) blocking and reduced throughput and service quality. In this paper, we reexamine this assumption -- we show that, although predicting the exac… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  9. arXiv:2408.12419  [pdf, other

    cs.LG cs.AI

    4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment

    Authors: Kaihui Cheng, Ce Liu, Qingkun Su, Jun Wang, Liwei Zhang, Yining Tang, Yao Yao, Siyu Zhu, Yuan Qi

    Abstract: Protein structure prediction is pivotal for understanding the structure-function relationship of proteins, advancing biological research, and facilitating pharmaceutical development and experimental design. While deep learning methods and the expanded availability of experimental 3D protein structures have accelerated structure prediction, the dynamic nature of protein structures has received limi… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  10. arXiv:2408.12413  [pdf, other

    q-bio.BM cs.AI

    Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures

    Authors: Ce Liu, Jun Wang, Zhiqiang Cai, Yingxu Wang, Huizhen Kuang, Kaihui Cheng, Liwei Zhang, Qingkun Su, Yining Tang, Fenglei Cao, Limei Han, Siyu Zhu, Yuan Qi

    Abstract: Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D… ▽ More

    Submitted 4 September, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  11. arXiv:2408.11480  [pdf, other

    eess.IV cs.CV

    OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal

    Authors: Qiao Mo, Yukang Ding, Jinhua Hao, Qiang Zhu, Ming Sun, Chao Zhou, Feiyu Chen, Shuyuan Zhu

    Abstract: Deep learning-based methods have shown remarkable performance in single JPEG artifacts removal task. However, existing methods tend to degrade on double JPEG images, which are prevalent in real-world scenarios. To address this issue, we propose Offset-Aware Partition Transformer for double JPEG artifacts removal, termed as OAPT. We conduct an analysis of double JPEG compression that results in up… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 14 pages, 9 figures. Codes and models are available at https://github.com/QMoQ/OAPT.git

  12. arXiv:2408.09452  [pdf, other

    cs.CL

    Identifying Speakers and Addressees of Quotations in Novels with Prompt Learning

    Authors: Yuchen Yan, Hanjie Zhao, Senbin Zhu, Hongde Liu, Zhihong Zhang, Yuxiang Jia

    Abstract: Quotations in literary works, especially novels, are important to create characters, reflect character relationships, and drive plot development. Current research on quotation extraction in novels primarily focuses on quotation attribution, i.e., identifying the speaker of the quotation. However, the addressee of the quotation is also important to construct the relationship between the speaker and… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by NLPCC 2024

  13. arXiv:2408.07324  [pdf, other

    cs.AI cs.LO

    On-the-fly Synthesis for LTL over Finite Traces: An Efficient Approach that Counts

    Authors: Shengping Xiao, Yongkang Li, Shufang Zhu, Jun Sun, Jianwen Li, Geguang Pu, Moshe Y. Vardi

    Abstract: We present an on-the-fly synthesis framework for Linear Temporal Logic over finite traces (LTLf) based on top-down deterministic automata construction. Existing approaches rely on constructing a complete Deterministic Finite Automaton (DFA) corresponding to the LTLf specification, a process with doubly exponential complexity relative to the formula size in the worst case. In this case, the synthes… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 32 pages, 3 figures, 3 tables

  14. arXiv:2408.07055  [pdf, other

    cs.CL cs.LG

    LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

    Authors: Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li

    Abstract: Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarc… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  15. arXiv:2408.07009  [pdf, other

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google, :, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh , et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  16. arXiv:2408.06567  [pdf, other

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu , et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  17. arXiv:2408.06273  [pdf, other

    cs.CL

    FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data

    Authors: Haoran Sun, Renren Jin, Shaoyang Xu, Leiyu Pan, Supryadi, Menglong Cui, Jiangcun Du, Yikun Lei, Lei Yang, Ling Shi, Juesi Xiao, Shaolin Zhu, Deyi Xiong

    Abstract: Large language models (LLMs) have demonstrated prowess in a wide range of tasks. However, many LLMs exhibit significant performance discrepancies between high- and low-resource languages. To mitigate this challenge, we present FuxiTranyu, an open-source multilingual LLM, which is designed to satisfy the need of the research community for balanced and high-performing multilingual capabilities. Fuxi… ▽ More

    Submitted 13 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  18. arXiv:2408.05705  [pdf, other

    eess.IV cs.AI cs.CV

    TC-KANRecon: High-Quality and Accelerated MRI Reconstruction via Adaptive KAN Mechanisms and Intelligent Feature Scaling

    Authors: Ruiquan Ge, Xiao Yu, Yifei Chen, Fan Jia, Shenghao Zhu, Guanyu Zhou, Yiyu Huang, Chenyan Zhang, Dong Zeng, Changmiao Wang, Qiegen Liu, Shanzhou Niu

    Abstract: Magnetic Resonance Imaging (MRI) has become essential in clinical diagnosis due to its high resolution and multiple contrast mechanisms. However, the relatively long acquisition time limits its broader application. To address this issue, this study presents an innovative conditional guided diffusion model, named as TC-KANRecon, which incorporates the Multi-Free U-KAN (MF-UKAN) module and a dynamic… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures

  19. arXiv:2408.00297  [pdf, other

    cs.CV

    EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

    Authors: Qianyun He, Xinya Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu

    Abstract: We present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect EmoTalk3D dataset with calibrated multi-view videos, emotional annotations,… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  20. arXiv:2408.00296  [pdf, other

    cs.CV

    Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°

    Authors: Yuxiao He, Yiyu Zhuang, Yanwen Wang, Yao Yao, Siyu Zhu, Xiaoyu Li, Qi Zhang, Xun Cao, Hao Zhu

    Abstract: Creating a 360° parametric model of a human head is a very challenging task. While recent advancements have demonstrated the efficacy of leveraging synthetic data for building such parametric head models, their performance remains inadequate in crucial areas such as expression-driven animation, hairstyle editing, and text-based modifications. In this paper, we build a dataset of artist-designed hi… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  21. arXiv:2407.21705  [pdf, other

    cs.CV

    Tora: Trajectory-oriented Diffusion Transformer for Video Generation

    Authors: Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang

    Abstract: Recent advancements in Diffusion Transformer (DiT) have demonstrated remarkable proficiency in producing high-quality video content. Nonetheless, the potential of transformer-based diffusion models for effectively generating videos with controllable motion remains an area of limited exploration. This paper introduces Tora, the first trajectory-oriented DiT framework that concurrently integrates te… ▽ More

    Submitted 27 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  22. arXiv:2407.19456  [pdf, other

    cs.MM

    An Inverse Partial Optimal Transport Framework for Music-guided Movie Trailer Generation

    Authors: Yutong Wang, Sidan Zhu, Hongteng Xu, Dixin Luo

    Abstract: Trailer generation is a challenging video clipping task that aims to select highlighting shots from long videos like movies and re-organize them in an attractive way. In this study, we propose an inverse partial optimal transport (IPOT) framework to achieve music-guided movie trailer generation. In particular, we formulate the trailer generation task as selecting and sorting key movie shots based… ▽ More

    Submitted 30 July, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: acmmm2024

  23. arXiv:2407.19166  [pdf, other

    cs.CV

    Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

    Authors: Shengjie Zhu, Xiaoming Liu

    Abstract: Both self-supervised depth estimation and Structure-from-Motion (SfM) recover scene depth from RGB videos. Despite sharing a similar objective, the two approaches are disconnected. Prior works of self-supervision backpropagate losses defined within immediate neighboring frames. Instead of learning-through-loss, this work proposes an alternative scheme by performing local SfM. First, with calibrate… ▽ More

    Submitted 6 August, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

  24. arXiv:2407.19154  [pdf, other

    cs.CV

    RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry

    Authors: Shengjie Zhu, Girish Chandar Ganesan, Abhinav Kumar, Xiaoming Liu

    Abstract: 3D sensing is a fundamental task for Autonomous Vehicles. Its deployment often relies on aligned RGB cameras and LiDAR. Despite meticulous synchronization and calibration, systematic misalignment persists in LiDAR projected depthmap. This is due to the physical baseline distance between the two sensors. The artifact is often reflected as background LiDAR incorrectly projected onto the foreground,… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

  25. arXiv:2407.18957  [pdf, other

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu… ▽ More

    Submitted 1 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  26. arXiv:2407.17417  [pdf, other

    cs.LG

    Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

    Authors: Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in generating diverse and contextually rich text. However, concerns regarding copyright infringement arise as LLMs may inadvertently produce copyrighted material. In this paper, we first investigate the effectiveness of watermarking LLMs as a deterrent against the generation of copyrighted texts. Through theoretical analysis an… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 21 pages, 6 figures

  27. arXiv:2407.16412  [pdf, other

    cs.RO

    Cross Anything: General Quadruped Robot Navigation through Complex Terrains

    Authors: Shaoting Zhu, Derun Li, Yong Liu, Ningyi Xu, Hang Zhao

    Abstract: The application of vision-language models (VLMs) has achieved impressive success in various robotics tasks, but there are few explorations for foundation models used in quadruped robot navigation. We introduce Cross Anything System (CAS), an innovative system composed of a high-level reasoning module and a low-level control policy, enabling the robot to navigate across complex 3D terrains and reac… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  28. arXiv:2407.16397  [pdf, other

    cs.LG cs.AI

    On ADMM in Heterogeneous Federated Learning: Personalization, Robustness, and Fairness

    Authors: Shengkun Zhu, Jinshan Zeng, Sheng Wang, Yuan Sun, Xiaodong Li, Yuan Yao, Zhiyong Peng

    Abstract: Statistical heterogeneity is a root cause of tension among accuracy, fairness, and robustness of federated learning (FL), and is key in paving a path forward. Personalized FL (PFL) is an approach that aims to reduce the impact of statistical heterogeneity by developing personalized models for individual users, while also inherently providing benefits in terms of fairness and robustness. However, e… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.06756

  29. arXiv:2407.15719  [pdf, other

    cs.CV cs.AI

    GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI

    Authors: Zhaojie Fang, Shenghao Zhu, Yifei Chen, Binfeng Zou, Fan Jia, Linwei Qiu, Chang Liu, Yiyu Huang, Xiang Feng, Feiwei Qin, Changmiao Wang, Yeru Wang, Jin Fan, Changbiao Chu, Wan-Zhen Wu, Hu Zhao

    Abstract: Alzheimer's Disease (AD) is an irreversible neurodegenerative disorder that often progresses from Mild Cognitive Impairment (MCI), leading to memory loss and significantly impacting patients' lives. Clinical trials indicate that early targeted interventions for MCI patients can potentially slow or halt the development and progression of AD. Previous research has shown that accurate medical classif… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 35 pages, 4 figures

  30. arXiv:2407.15341  [pdf, other

    cs.CL

    ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis with Coarse-to-Fine In-context Learning

    Authors: Senbin Zhu, Hanjie Zhao, Xingren Wang, Shanhong Liu, Yuxiang Jia, Hongying Zan

    Abstract: The DimABSA task requires fine-grained sentiment intensity prediction for restaurant reviews, including scores for Valence and Arousal dimensions for each Aspect Term. In this study, we propose a Coarse-to-Fine In-context Learning(CFICL) method based on the Baichuan2-7B model for the DimABSA task in the SIGHAN 2024 workshop. Our method improves prediction accuracy through a two-stage optimization… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  31. arXiv:2407.12687  [pdf, other

    cs.CY cs.AI cs.LG

    Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

    Authors: Irina Jurenka, Markus Kunesch, Kevin R. McKee, Daniel Gillick, Shaojian Zhu, Sara Wiltberger, Shubham Milind Phal, Katherine Hermann, Daniel Kasenberg, Avishkar Bhoopchand, Ankit Anand, Miruna Pîslar, Stephanie Chan, Lisa Wang, Jennifer She, Parsa Mahmoudieh, Aliya Rysbek, Wei-Jen Ko, Andrea Huber, Brett Wiltshire, Gal Elidan, Roni Rabin, Jasmin Rubinovitz, Amit Pitaru, Mac McAllister , et al. (49 additional authors not shown)

    Abstract: A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily… ▽ More

    Submitted 19 July, 2024; v1 submitted 21 May, 2024; originally announced July 2024.

  32. arXiv:2407.11522  [pdf, other

    cs.CV

    FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

    Authors: Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li

    Abstract: Vision language models (VLMs) have achieved impressive progress in diverse applications, becoming a prevalent research direction. In this paper, we build FIRE, a feedback-refinement dataset, consisting of 1.1M multi-turn conversations that are derived from 27 source datasets, empowering VLMs to spontaneously refine their responses based on user feedback across diverse tasks. To scale up the data c… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  33. arXiv:2407.11343  [pdf, other

    cs.CV

    Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering

    Authors: Jingqian Wu, Shuo Zhu, Chutian Wang, Edmund Y. Lam

    Abstract: Computational neuromorphic imaging (CNI) with event cameras offers advantages such as minimal motion blur and enhanced dynamic range, compared to conventional frame-based methods. Existing event-based radiance field rendering methods are built on neural radiance field, which is computationally heavy and slow in reconstruction speed. Motivated by the two aspects, we introduce Ev-GS, the first CNI-i… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  34. arXiv:2407.09157  [pdf, other

    cs.IR cs.AI cs.LG

    Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion

    Authors: Linhan Xia, Yicheng Yang, Ziou Chen, Zheng Yang, Shengxin Zhu

    Abstract: Pre-trained models learn general representations from large datsets which can be fine-turned for specific tasks to significantly reduce training time. Pre-trained models like generative pretrained transformers (GPT), bidirectional encoder representations from transformers (BERT), vision transfomers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  35. arXiv:2407.05625  [pdf, other

    stat.ME cs.LG

    New User Event Prediction Through the Lens of Causal Inference

    Authors: Henry Shaowu Yuchi, Shixiang Zhu, Li Dong, Yigit M. Arisoy, Matthew C. Spencer

    Abstract: Modeling and analysis for event series generated by heterogeneous users of various behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to classify users into behavior-based categories and analyze each of them separately. However, this… ▽ More

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2407.03314  [pdf, other

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  37. arXiv:2407.01929  [pdf, other

    cs.CL cs.AI

    What We Talk About When We Talk About LMs: Implicit Paradigm Shifts and the Ship of Language Models

    Authors: Shengqi Zhu, Jeffrey M. Rzeszotarski

    Abstract: The term Language Models (LMs), as a time-specific collection of models of interest, is constantly reinvented, with its referents updated much like the $\textit{Ship of Theseus}$ replaces its parts but remains the same ship in essence. In this paper, we investigate this $\textit{Ship of Language Models}$ problem, wherein scientific evolution takes the form of continuous, implicit retrofits of key… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  38. arXiv:2406.16557  [pdf, other

    cs.LG cs.CY

    Efficient k-means with Individual Fairness via Exponential Tilting

    Authors: Shengkun Zhu, Jinshan Zeng, Yuan Sun, Sheng Wang, Xiaodong Li, Zhiyong Peng

    Abstract: In location-based resource allocation scenarios, the distances between each individual and the facility are desired to be approximately equal, thereby ensuring fairness. Individually fair clustering is often employed to achieve the principle of treating all points equally, which can be applied in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve indivi… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  39. arXiv:2406.16294  [pdf, other

    cs.CL cs.AI

    LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments

    Authors: Zixia Jia, Mengmeng Wang, Baichen Tong, Song-Chun Zhu, Zilong Zheng

    Abstract: Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To address this gap, we introduce LangSuitE, a versatile and simulation-free testbed featuring 6 represen… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  40. arXiv:2406.16028  [pdf, other

    cs.LG cs.AI

    TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing

    Authors: Namjoon Suh, Yuning Yang, Din-Yin Hsieh, Qitong Luan, Shirong Xu, Shixiang Zhu, Guang Cheng

    Abstract: In this paper, we leverage the power of latent diffusion models to generate synthetic time series tabular data. Along with the temporal and feature correlations, the heterogeneous nature of the feature in the table has been one of the main obstacles in time series tabular data modeling. We tackle this problem by combining the ideas of the variational auto-encoder (VAE) and the denoising diffusion… ▽ More

    Submitted 15 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  41. arXiv:2406.10802  [pdf, other

    cs.CL cs.AI

    KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

    Authors: Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

    Abstract: Existing frameworks for assessing robustness of large language models (LLMs) overly depend on specific benchmarks, increasing costs and failing to evaluate performance of LLMs in professional domains due to dataset limitations. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  42. arXiv:2406.10484  [pdf, other

    cs.CV

    Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

    Authors: Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen

    Abstract: The emerging video LMMs (Large Multimodal Models) have achieved significant improvements on generic video understanding in the form of VQA (Visual Question Answering), where the raw videos are captured by cameras. However, a large portion of videos in real-world applications are edited videos, \textit{e.g.}, users usually cut and add effects/modifications to the raw video before publishing it on s… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  43. arXiv:2406.09931  [pdf, other

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, Jin Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  44. arXiv:2406.09693  [pdf, other

    cs.CV eess.IV

    Compressed Video Quality Enhancement with Temporal Group Alignment and Fusion

    Authors: Qiang Zhu, Yajun Qiu, Yu Liu, Shuyuan Zhu, Bing Zeng

    Abstract: In this paper, we propose a temporal group alignment and fusion network to enhance the quality of compressed videos by using the long-short term correlations between frames. The proposed model consists of the intra-group feature alignment (IntraGFA) module, the inter-group feature fusion (InterGFF) module, and the feature enhancement (FE) module. We form the group of pictures (GoP) by selecting fr… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  45. arXiv:2406.08801  [pdf, other

    cs.CV

    Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

    Authors: Mingwang Xu, Hui Li, Qingkun Su, Hanlin Shang, Liwei Zhang, Ce Liu, Jingdong Wang, Yao Yao, Siyu Zhu

    Abstract: The field of portrait image animation, driven by speech audio input, has experienced significant advancements in the generation of realistic and dynamic portraits. This research delves into the complexities of synchronizing facial movements and creating visually appealing, temporally consistent animations within the framework of diffusion-based methodologies. Moving away from traditional paradigms… ▽ More

    Submitted 16 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 20 pages

  46. arXiv:2406.08002  [pdf, other

    cs.AI cs.MA

    Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

    Authors: Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng

    Abstract: Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to hierarchically model co-players' behavior based on inferring their characteristics. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred in… ▽ More

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  47. arXiv:2406.07151  [pdf, other

    cs.MM cs.AI cs.IR

    EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels

    Authors: Shuqi Zhu, Ziyi Ye, Qingyao Ai, Yiqun Liu

    Abstract: Identifying and reconstructing what we see from brain activity gives us a special insight into investigating how the biological visual system represents the world. While recent efforts have achieved high-performance image classification and high-quality image reconstruction from brain signals collected by Functional Magnetic Resonance Imaging (fMRI) or magnetoencephalogram (MEG), the expensiveness… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  48. arXiv:2406.07081  [pdf, other

    cs.CL

    Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning

    Authors: Menglong Cui, Jiangcun Du, Shaolin Zhu, Deyi Xiong

    Abstract: Large language models (LLMs) exhibit outstanding performance in machine translation via in-context learning. In contrast to sentence-level translation, document-level translation (DOCMT) by LLMs based on in-context learning faces two major challenges: firstly, document translations generated by LLMs are often incoherent; secondly, the length of demonstration for in-context learning is usually limi… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL2024 long paper (Findings)

  49. arXiv:2406.06962  [pdf, other

    cs.CL cs.AI

    Evolving Subnetwork Training for Large Language Models

    Authors: Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu

    Abstract: Large language models have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of large language models, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the large language… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  50. Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

    Authors: Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye

    Abstract: As a sub-field of object detection, moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Currently-existing methods primarily rely on the features extracted only from spatio-temporal domain. Frequency domain has hardly been concerned yet, although it has been widely applied in image processing. To extend feature sourc… ▽ More

    Submitted 5 September, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has accepted IEEE TGRS

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing 2024