
Showing 1–50 of 869 results for author: Lin, X

Searching in archive cs.
  1. arXiv:2409.02969  [pdf, other]

    cs.MS cs.LG math.OC

    LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

    Authors: Xiaoyuan Zhang, Liang Zhao, Yingying Yu, Xi Lin, Zhenkun Wang, Han Zhao, Qingfu Zhang

    Abstract: Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultane…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02425  [pdf]

    cs.IR cs.LG

    Deep Adaptive Interest Network: Personalized Recommendation with Context-Aware Learning

    Authors: Shuaishuai Huang, Haowei Yang, You Yao, Xueting Lin, Yuming Tu

    Abstract: In personalized recommendation systems, accurately capturing users' evolving interests and combining them with contextual information is a critical research area. This paper proposes a novel model called the Deep Adaptive Interest Network (DAIN), which dynamically models users' interests while incorporating context-aware learning mechanisms to achieve precise and adaptive personalized recommendati…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.01641  [pdf, other]

    cs.CV

    Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement

    Authors: Kun Zhou, Xinyu Lin, Wenbo Li, Xiaogang Xu, Yuanhao Cai, Zhonghang Liu, Xiaoguang Han, Jiangbo Lu

    Abstract: Previous low-light image enhancement (LLIE) approaches, while employing frequency decomposition techniques to address the intertwined challenges of low frequency (e.g., illumination recovery) and high frequency (e.g., noise reduction), primarily focused on the development of dedicated and complex networks to achieve improved performance. In contrast, we reveal that an advanced disentanglement para…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted to ECCV 2024, GitHub: https://github.com/redrock303/ADF-LLIE

  4. arXiv:2409.00262  [pdf, other]

    cs.CL

    DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity

    Authors: Xiaoyu Lin, Xinkai Yu, Ankit Aich, Salvatore Giorgi, Lyle Ungar

    Abstract: Large Language Models (LLMs), which simulate human users, are frequently employed to evaluate chatbots in applications such as tutoring and customer service. Effective evaluation necessitates a high degree of human-like diversity within these simulations. In this paper, we demonstrate that conversations generated by GPT-4o mini, when used as simulated human participants, systematically differ from…

    Submitted 30 August, 2024; originally announced September 2024.

  5. arXiv:2408.17347  [pdf, other]

    cs.CV

    LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation

    Authors: Shuyi Ouyang, Jinyang Zhang, Xiangye Lin, Xilai Wang, Qingqing Chen, Yen-Wei Chen, Lanfen Lin

    Abstract: Conventional medical image segmentation methods have been found inadequate in facilitating physicians with the identification of specific lesions for diagnosis and treatment. Given the utility of text as an instructional format, we introduce a novel task termed Medical Image Referring Segmentation (MIRS), which requires segmenting specified lesions in images based on the given language expressions…

    Submitted 2 September, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: 14 pages, 5 figures

    ACM Class: I.4.6

  6. arXiv:2408.13423  [pdf, other]

    cs.CV

    Training-free Long Video Generation with Chain of Diffusion Model Experts

    Authors: Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu

    Abstract: Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models incur high computational costs and produce suboptimal results due to the high complexity of the video generation task. In this paper, we propose ConFiner, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure …

    Submitted 2 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  7. arXiv:2408.12821  [pdf, other]

    cs.CV cs.AI

    Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery

    Authors: Zhenyuan Yang, Xuhui Lin, Qinyi He, Ziye Huang, Zhengliang Liu, Hanqi Jiang, Peng Shu, Zihao Wu, Yiwei Li, Stephen Law, Gengchen Mai, Tianming Liu, Tao Yang

    Abstract: The emergence of Large Language Models (LLMs) and multimodal foundation models (FMs) has generated heightened interest in their applications that integrate vision and language. This paper investigates the capabilities of ChatGPT-4V and Gemini Pro for Street View Imagery, Built Environment, and Interior by evaluating their performance across various tasks. The assessments include street furniture i…

    Submitted 22 August, 2024; originally announced August 2024.

  8. arXiv:2408.10883  [pdf, other]

    cs.AI cs.CV

    DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection

    Authors: Xinqi Su, Yawen Cui, Ajian Liu, Xun Lin, Yuhao Wang, Haochen Liang, Wenhui Li, Zitong Yu

    Abstract: In the current web environment, fake news spreads rapidly across online social networks, posing serious threats to society. Existing multimodal fake news detection (MFND) methods can be classified into knowledge-based and semantic-based approaches. However, these methods are overly dependent on human expertise and feedback, lacking flexibility. To address this challenge, we propose a Dynamic Analysis…

    Submitted 20 August, 2024; originally announced August 2024.

  9. arXiv:2408.10381  [pdf, other]

    stat.ML cs.AI cs.LG

    Efficient Reinforcement Learning in Probabilistic Reward Machines

    Authors: Xiaofeng Lin, Xuezhou Zhang

    Abstract: In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret bound of $\widetilde{O}(\sqrt{HOAT} + H^2O^2A^{3/2} + H\sqrt{T})$, where $H$ is the time horizon, $O$ is the number of observations, $A$ is the number of actions…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 33 pages, 4 figures

  10. arXiv:2408.09723  [pdf, other]

    cs.LG

    sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting

    Authors: Jiaheng Yin, Zhengxin Shi, Jianshen Zhang, Xiaomin Lin, Yulin Huang, Yongzhi Qi, Wei Qi

    Abstract: In recent years, numerous Transformer-based models have been applied to long-term time-series forecasting (LTSF) tasks. However, recent studies with linear models have questioned their effectiveness, demonstrating that simple linear layers can outperform sophisticated Transformer-based models. In this work, we review and categorize existing Transformer-based models into two main types: (1) modific…

    Submitted 19 August, 2024; originally announced August 2024.

  11. arXiv:2408.09265  [pdf, other]

    cs.CR cs.LG cs.NI eess.SY

    ByCAN: Reverse Engineering Controller Area Network (CAN) Messages from Bit to Byte Level

    Authors: Xiaojie Lin, Baihe Ma, Xu Wang, Guangsheng Yu, Ying He, Ren Ping Liu, Wei Ni

    Abstract: As the primary standard protocol for modern cars, the Controller Area Network (CAN) is a critical research target for automotive cybersecurity threats and autonomous applications. As the decoding specification of CAN is a proprietary black-box maintained by Original Equipment Manufacturers (OEMs), conducting related research and industry developments can be challenging without a comprehensive unde…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Internet of Things Journal, 15 pages, 5 figures, 6 tables

  12. arXiv:2408.09241  [pdf, other]

    cs.CV eess.IV

    Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration

    Authors: Xin Lin, Yuyan Zhou, Jingtong Yue, Chao Ren, Kelvin C. K. Chan, Lu Qi, Ming-Hsuan Yang

    Abstract: Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets. Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks without significantly modifying model structures or increasing the computational complexity. To address these issues, we propose a self-…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: This paper is an extended and revised version of our previous work "Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches" (https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_Unsupervised_Image_Denoising_in_Real-World_Scenarios_via_Self-Collaboration_Parallel_Generative_ICCV_2023_paper.pdf)

  13. arXiv:2408.09031  [pdf]

    cs.NI

    A Primer on Generative AI for Telecom: From Theory to Practice

    Authors: Xingqin Lin, Lopamudra Kundu, Chris Dick, Maria Amparo Canaveras Galdon, Janaki Vamaraju, Swastika Dutta, Vinay Raman

    Abstract: The rise of generative artificial intelligence (GenAI) is transforming the telecom industry. GenAI models, particularly large language models (LLMs), have emerged as powerful tools capable of driving innovation, improving efficiency, and delivering superior customer services in telecom. This paper provides an overview of GenAI for telecom from theory to practice. We review GenAI models and discuss…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 7 pages, 6 figures, submitted for possible publication

  14. arXiv:2408.08623  [pdf, other]

    cs.CV cs.AI

    SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis

    Authors: Xingyue Lin, Xingjian Hu, Shuai Peng, Jianhua Zhu, Liangcai Gao

    Abstract: Sketch, a powerful artistic technique to capture essential visual information about real-world objects, is increasingly gaining attention in the image synthesis field. However, evaluating the quality of synthesized sketches presents unique unsolved challenges. Current evaluation methods for sketch synthesis are inadequate due to the lack of a unified benchmark dataset, over-reliance on classificat…

    Submitted 16 August, 2024; originally announced August 2024.

  15. arXiv:2408.06037  [pdf, other]

    cs.SE

    Hyperion: Unveiling DApp Inconsistencies using LLM and Dataflow-Guided Symbolic Execution

    Authors: Shuo Yang, Xingwei Lin, Jiachi Chen, Qingyuan Zhong, Lei Xiao, Renke Huang, Yanlin Wang, Zibin Zheng

    Abstract: The rapid advancement of blockchain platforms has significantly accelerated the growth of decentralized applications (DApps). Similar to traditional applications, DApps integrate front-end descriptions that showcase their features to attract users, and back-end smart contracts for executing their business logic. However, inconsistencies between the features promoted in front-end descriptions and t…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted by ICSE 2025

  16. arXiv:2408.05432  [pdf, other]

    cs.DB

    Simpler is More: Efficient Top-K Nearest Neighbors Search on Large Road Networks

    Authors: Yiqi Wang, Long Yuan, Wenjie Zhang, Xuemin Lin, Zi Chen, Qing Liu

    Abstract: The top-k nearest neighbors (kNN) problem on road networks has numerous applications in location-based services. As direct search using Dijkstra's algorithm results in a large search space, a plethora of complex-index-based approaches have been proposed to speed up query processing. However, even with the current state-of-the-art approach, long query processing delays persist, along with signifi…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 15 pages, 15 figures

  17. arXiv:2408.03005  [pdf, other]

    cs.DB

    Automatic String Data Validation with Pattern Discovery

    Authors: Xinwei Lin, Jing Zhao, Peng Di, Chuan Xiao, Rui Mao, Yan Ji, Makoto Onizuka, Zishuo Ding, Weiyi Shang, Jianbin Qin

    Abstract: In enterprise data pipelines, data insertions occur periodically and may impact downstream services if data quality issues are not addressed. Typically, such problems can be investigated and fixed by on-call engineers, but locating the cause of such problems and fixing errors are often time-consuming. Therefore, automatic data validation is a better solution to defend the system and downstream ser…

    Submitted 6 August, 2024; originally announced August 2024.

  18. arXiv:2408.01808  [pdf, other]

    cs.CR cs.AI cs.SD eess.AS

    ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features

    Authors: Peng Cheng, Yuwei Wang, Peng Huang, Zhongjie Ba, Xiaodong Lin, Feng Lin, Li Lu, Kui Ren

    Abstract: Extensive research has revealed that adversarial examples (AE) pose a significant threat to voice-controllable smart devices. Recent studies have proposed black-box adversarial attacks that require only the final transcription from an automatic speech recognition (ASR) system. However, these attacks typically involve many queries to the ASR, resulting in substantial costs. Moreover, AE-based adver…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Published in the 2024 IEEE Symposium on Security and Privacy (SP)

  19. arXiv:2408.00793  [pdf]

    physics.chem-ph cs.LG

    From 2015 to 2023: How Machine Learning Aids Natural Product Analysis

    Authors: Suwen Shi, Ziwei Huang, Xingxin Gu, Xu Lin, Chaoying Zhong, Junjie Hang, Jianli Lin, Claire Chenwen Zhong, Lin Zhang, Yu Li, Junjie Huang

    Abstract: In recent years, conventional chemistry techniques have faced significant challenges due to their inherent limitations, struggling to cope with the increasing complexity and volume of data generated in contemporary research endeavors. Computational methodologies represent robust tools in the field of chemistry, offering the capacity to harness potent machine-learning models to yield insightful ana…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: 19 pages, 4 figures

  20. arXiv:2407.21770  [pdf, other]

    cs.AI cs.LG

    MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

    Authors: Xi Victoria Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan

    Abstract: We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adap…

    Submitted 12 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: v2 -> update related work section v3 -> fix spelling

  21. arXiv:2407.20906  [pdf, other]

    cs.CL cs.AI physics.data-an

    Automated Review Generation Method Based on Large Language Models

    Authors: Shican Wu, Xiao Ma, Dehui Luo, Lulu Li, Xiangcheng Shi, Xin Chang, Xiaoyun Lin, Ran Luo, Chunlei Pei, Zhi-Jian Zhao, Jinlong Gong

    Abstract: Literature research, vital for scientific advancement, is overwhelmed by the vast ocean of available information. Addressing this, we propose an automated review generation method based on Large Language Models (LLMs) to streamline literature processing and reduce cognitive load. In a case study on propane dehydrogenation (PDH) catalysts, our method swiftly generated comprehensive reviews from 343 a…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 16 pages, 3 figures, 3 tables

  22. arXiv:2407.20141  [pdf, other]

    cs.CV

    DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models

    Authors: Jing Yang, Runping Xi, Yingxin Lai, Xun Lin, Zitong Yu

    Abstract: Diffusion-based personalized visual content generation technologies have achieved significant breakthroughs, allowing for the creation of specific objects by just learning from a few reference photos. However, when misused to fabricate fake news or unsettling content targeting individuals, these technologies could cause considerable societal harm. To address this problem, current methods generate…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCB 2024

  23. arXiv:2407.18209  [pdf, other]

    cs.ET cs.AR

    SuperFlow: A Fully-Customized RTL-to-GDS Design Automation Flow for Adiabatic Quantum-Flux-Parametron Superconducting Circuits

    Authors: Yanyue Xie, Peiyan Dong, Geng Yuan, Zhengang Li, Masoud Zabihi, Chao Wu, Sung-En Chang, Xufeng Zhang, Xue Lin, Caiwen Ding, Nobuyuki Yoshikawa, Olivia Chen, Yanzhi Wang

    Abstract: Superconducting circuits, like Adiabatic Quantum-Flux-Parametron (AQFP), offer exceptional energy efficiency but face challenges in physical design due to sophisticated spacing and timing constraints. Current design tools often neglect the importance of constraint adherence throughout the entire design flow. In this paper, we propose SuperFlow, a fully-customized RTL-to-GDS design flow tailored fo…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by DATE 2024

  24. arXiv:2407.18175  [pdf, other]

    cs.LG cs.AI cs.CV

    Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

    Authors: Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang

    Abstract: Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ICS 2024

  25. arXiv:2407.17678  [pdf, other]

    cs.CL

    Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads

    Authors: Xihui Lin, Yunan Zhang, Suyu Ge, Barun Patra, Vishrav Chaudhary, Hao Peng, Xia Song

    Abstract: Existing LLM training and inference frameworks struggle to boost efficiency with sparsity while maintaining the integrity of context and model architecture. Inspired by the sharding concept in databases and the fact that attention parallelizes over heads on accelerators, we propose Sparsely-Sharded (S2) Attention, an attention algorithm that allocates heterogeneous context partitions for differe…

    Submitted 27 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 10 pages

  26. arXiv:2407.17112  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Dueling Bandits

    Authors: Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, Patrick Jaillet, Bryan Kian Hsiang Low

    Abstract: The contextual dueling bandit is used to model bandit problems in which a learner's goal is to find the best arm for a given context using observed noisy preference feedback over the selected arms for the past contexts. However, existing algorithms assume the reward function is linear, whereas it can be complex and non-linear in many real-life applications like online recommendations or ranking web searc…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Workshop on Foundations of Reinforcement Learning and Control

  27. arXiv:2407.16697  [pdf, other]

    cs.CV

    AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

    Authors: Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou

    Abstract: We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manu…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Published in Medical Image Analysis

  28. arXiv:2407.15229  [pdf, other]

    cs.CL cs.AI

    The Hitchhiker's Guide to Human Alignment with *PO

    Authors: Kian Ahrabian, Xihui Lin, Barun Patra, Vishrav Chaudhary, Alon Benhaim, Jay Pujara, Xia Song

    Abstract: With the growing utilization of large language models (LLMs) across domains, alignment towards human preferences has become one of the most critical aspects of training models. At the forefront of state-of-the-art human alignment methods are preference optimization methods (*PO). However, prior research has often concentrated on identifying the best-performing method, typically involving a grid se…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 10 pages

  29. arXiv:2407.11588  [pdf, other]

    cs.CV

    Progressive Pretext Task Learning for Human Trajectory Prediction

    Authors: Xiaotong Lin, Tianming Liang, Jianhuang Lai, Jian-Fang Hu

    Abstract: Human trajectory prediction is a practical task of predicting the future positions of pedestrians on the road, which typically covers all temporal ranges from short-term to long-term within a trajectory. However, existing works attempt to address the entire trajectory prediction with a singular, uniform training paradigm, neglecting the distinction between short-term and long-term dynamics in huma…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  30. arXiv:2407.10873  [pdf, other]

    cs.NE cs.AI

    Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models

    Authors: Rui Zhang, Fei Liu, Xi Lin, Zhenkun Wang, Zhichao Lu, Qingfu Zhang

    Abstract: Automated heuristic design (AHD) has gained considerable attention for its potential to automate the development of effective heuristics. The recent advent of large language models (LLMs) has paved a new avenue for AHD, with initial efforts focusing on framing AHD as an evolutionary program search (EPS) problem. However, inconsistent benchmark settings, inadequate baselines, and a lack of detailed…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by the 18th International Conference on Parallel Problem Solving From Nature (PPSN 2024)

  31. arXiv:2407.10548  [pdf, other]

    cs.IT

    Fluid Antenna Multiple Access Assisted Integrated Data and Energy Transfer: Outage and Multiplexing Gain Analysis

    Authors: Xiao Lin, Yizhe Zhao, Halvin Yang, Jie Hu, Kai-Kit Wong

    Abstract: Fluid antenna multiple access (FAMA) exploits the spatial opportunities in wireless channels to overcome multiuser interference by position (a.k.a. port) switching, which can achieve better performance compared to traditional fixed multiple-input multiple-output (MIMO) systems. Additionally, integrated data and energy transfer (IDET) is capable of providing both wireless data transfer (WDT) and wi…

    Submitted 1 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  32. arXiv:2407.07667  [pdf, other]

    cs.CV eess.IV

    VEnhancer: Generative Space-Time Enhancement for Video Generation

    Authors: Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

    Abstract: We present VEnhancer, a generative space-time enhancement framework that improves existing text-to-video results by adding more details in the spatial domain and synthetic detailed motion in the temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffu…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: technical report

  33. arXiv:2407.05784  [pdf, other]

    cs.AR

    Hecaton: Training and Finetuning Large Language Models with Scalable Chiplet Systems

    Authors: Zongle Huang, Shupei Fan, Chen Tang, Xinyuan Lin, Shuwen Deng, Yongpan Liu

    Abstract: Large Language Models (LLMs) have achieved remarkable success in various fields, but their training and finetuning require massive computation and memory, necessitating parallelism which introduces heavy communication overheads. Driven by advances in packaging, the chiplet architecture emerges as a potential solution, as it can integrate computing power, as well as utilize on-package links with be…

    Submitted 8 July, 2024; originally announced July 2024.

  34. arXiv:2407.03954  [pdf, other]

    cs.DB

    Efficient Maximal Frequent Group Enumeration in Temporal Bipartite Graphs

    Authors: Yanping Wu, Renjie Sun, Xiaoyang Wang, Dong Wen, Ying Zhang, Lu Qin, Xuemin Lin

    Abstract: Cohesive subgraph mining is a fundamental problem in bipartite graph analysis. In reality, relationships between two types of entities often occur at some specific timestamps, which can be modeled as a temporal bipartite graph. However, the temporal information is widely neglected by previous studies. Moreover, directly extending the existing models may fail to find some critical groups in tempora…

    Submitted 4 July, 2024; originally announced July 2024.

  35. arXiv:2407.02744  [pdf, other]

    eess.IV cs.CV

    Highly Accelerated MRI via Implicit Neural Representation Guided Posterior Sampling of Diffusion Models

    Authors: Jiayue Chu, Chenhe Du, Xiyue Lin, Yuyao Zhang, Hongjiang Wei

    Abstract: Reconstructing high-fidelity magnetic resonance (MR) images from under-sampled k-space is a commonly used strategy to reduce scan time. The posterior sampling of diffusion models based on the real measurement data holds significant promise of improved reconstruction accuracy. However, traditional posterior sampling methods often lack effective data consistency guidance, leading to inaccurate and u…

    Submitted 2 July, 2024; originally announced July 2024.

  36. arXiv:2407.02586  [pdf, ps, other]

    cs.CV

    Improving Visual Storytelling with Multimodal Large Language Models

    Authors: Xiaochuan Lin, Xiangyong Chen

    Abstract: Visual storytelling is an emerging field that combines images and narratives to create engaging and contextually rich stories. Despite its potential, generating coherent and emotionally resonant visual stories remains challenging due to the complexity of aligning visual and textual information. This paper presents a novel approach leveraging large language models (LLMs) and large vision-language m…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 10 pages

  37. arXiv:2407.01917  [pdf, other]

    cs.NI cs.CR cs.DC

    Securing Distributed Network Digital Twin Systems Against Model Poisoning Attacks

    Authors: Zifan Zhang, Minghong Fang, Mingzhe Chen, Gaolei Li, Xi Lin, Yuchen Liu

    Abstract: In the era of 5G and beyond, the increasing complexity of wireless networks necessitates innovative frameworks for efficient management and deployment. Digital twins (DTs), embodying real-time monitoring, predictive configurations, and enhanced decision-making capabilities, stand out as a promising solution in this context. Within a time-series data-driven framework that effectively maps wireless…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by Internet of Things Journal (IoT-J). arXiv admin note: substantial text overlap with arXiv:2404.14389

  38. arXiv:2407.01026  [pdf, other]

    cs.CL cs.AI

    Augmenting Document-level Relation Extraction with Efficient Multi-Supervision

    Authors: Xiangyu Lin, Weijia Jia, Zhiguo Gong

    Abstract: Despite its popularity in sentence-level relation extraction, distantly supervised data is rarely utilized by existing work in document-level relation extraction due to its noisy nature and low information density. Among its current applications, distantly supervised data is mostly used as a whole for pretraining, which is of low time efficiency. To fill in the gap of efficient and robust utilizati…

    Submitted 1 July, 2024; originally announced July 2024.

  39. arXiv:2407.00466  [pdf, other]

    cs.CL cs.AI

    BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

    Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

    Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) with the LLM itself or evaluate in a biomedical experimental manner. How to precisely benchmark biomedical agents from an…

    Submitted 29 June, 2024; originally announced July 2024.

  40. arXiv:2406.19859  [pdf, other]

    cs.AI cs.HC cs.MM

    MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

    Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann

    Abstract: MetaDesigner revolutionizes artistic typography synthesis by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement. At the core of this framework lies a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively enable the creation of customized WordArt, ranging from semantic enhancements to the imposition…

    Submitted 4 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: 18 pages, 16 figures, Project: https://modelscope.cn/studios/WordArt/WordArt

  41. arXiv:2406.19126  [pdf, other]

    physics.optics cs.AI

    Super-resolution imaging using super-oscillatory diffractive neural networks

    Authors: Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin

    Abstract: Optical super-oscillation enables far-field super-resolution imaging beyond diffraction limits. However, the existing super-oscillatory lens for the spatial super-resolution imaging system still confronts critical limitations in performance due to the lack of a more advanced design method and the limited design degree of freedom. Here, we propose an optical super-oscillatory diffractive neural net…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures, 1 table

  42. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  43. arXiv:2406.16087  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS…

    Submitted 6 August, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  44. arXiv:2406.15906  [pdf, other]

    cs.NI cs.AI

    OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization

    Authors: Siyuan Li, Xi Lin, Yaju Liu, Gaolei Li, Jianhua Li

    Abstract: Deep Reinforcement Learning (DRL) is regarded as a promising tool for optical network optimization. However, the flexibility and efficiency of current DRL-based solutions for optical network optimization require further improvement. Currently, generative models have showcased their significant performance advantages across various domains. In this paper, we introduce OpticGAI, the AI-generated pol…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM SIGCOMM 2024 Workshop on Hot Topics in Optical Technologies and Applications in Networking

  45. arXiv:2406.14473  [pdf, other]

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  46. arXiv:2406.14408  [pdf, other]

    cs.AI cs.CL cs.LG

    FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

    Authors: Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang

    Abstract: Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-craft rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as anoth…

    Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  47. arXiv:2406.14264  [pdf, other]

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  48. arXiv:2406.13939  [pdf, other]

    cs.CV

    2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

    Authors: Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu

    Abstract: Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions. Unlike the previous referring video object segmentation (RVOS), this task focuses more on the motion in video content for language-guided video object segmentation, requiring an enhanced ability to model longer temporal, moti…

    Submitted 19 June, 2024; originally announced June 2024.

  49. arXiv:2406.13121  [pdf, other]

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  50. arXiv:2406.10923  [pdf, other]

    cs.CV cs.CL cs.LG

    Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

    Authors: Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu

    Abstract: Large Language Models (LLMs) have demonstrated effectiveness not only in language tasks but also in video reasoning. This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills: (1) Abstract Perception: understanding and tokenizing abstract concepts in videos, and (2) Long-range Compositional Reaso…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Project page: https://ander1119.github.io/TiM