Skip to main content

Showing 1–50 of 2,761 results for author: Liu, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.03164  [pdf, other

    cs.LG cs.GR

    A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

    Authors: Zhen Li, Weikai Yang, Jun Yuan, Jing Wu, Changjian Chen, Yao Ming, Fan Yang, Hui Zhang, Shixia Liu

    Abstract: The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequenc… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 15 pages, 10 figures

  2. arXiv:2409.02097  [pdf, other

    cs.CV cs.LG

    LinFusion: 1 GPU, 1 Minute, 16K Image

    Authors: Songhua Liu, Weihao Yu, Zhenxiong Tan, Xinchao Wang

    Abstract: Modern diffusion models, particularly those utilizing a Transformer-based UNet for denoising, rely heavily on self-attention operations to manage complex spatial relationships, thus achieving impressive generation performance. However, this existing paradigm faces significant challenges in generating high-resolution visual content due to its quadratic time and memory complexity with respect to the… ▽ More

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: Work in Progress. Codes are available at https://github.com/Huage001/LinFusion

  3. arXiv:2409.01212  [pdf, other

    cs.CV

    MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation

    Authors: Zewen Chen, Sunhan Xu, Yun Zeng, Haochen Guo, Jian Guo, Shuai Liu, Juan Wang, Bing Li, Weiming Hu, Dehua Liu, Hesong Li

    Abstract: With the rising demand for high-resolution (HR) images, No-Reference Image Quality Assessment (NR-IQA) gains more attention, as it can ecaluate image quality in real-time on mobile devices and enhance user experience. However, existing NR-IQA methods often resize or crop the HR images into small resolution, which leads to a loss of important details. And most of them are of high computational comp… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV Workshop 2024

  4. arXiv:2408.17165  [pdf, other

    cs.LG cs.DS stat.ML

    Efficient Testable Learning of General Halfspaces with Adversarial Label Noise

    Authors: Ilias Diakonikolas, Daniel M. Kane, Sihan Liu, Nikos Zarifis

    Abstract: We study the task of testable learning of general -- not necessarily homogeneous -- halfspaces with adversarial label noise with respect to the Gaussian distribution. In the testable learning framework, the goal is to develop a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data.Our main result is the first polynomial time tester-… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Presented to COLT'24

    Journal ref: "Testable Learning of General Halfspaces with Adversarial Label Noise." In The Thirty Seventh Annual Conference on Learning Theory, pp. 1308-1335. PMLR, 2024

  5. arXiv:2408.16849  [pdf

    cs.LG cs.DC

    Machine Learning-Based Research on the Adaptability of Adolescents to Online Education

    Authors: Mingwei Wang, Sitong Liu

    Abstract: With the rapid advancement of internet technology, the adaptability of adolescents to online learning has emerged as a focal point of interest within the educational sphere. However, the academic community's efforts to develop predictive models for adolescent online learning adaptability require further refinement and expansion. Utilizing data from the "Chinese Adolescent Online Education Survey"… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  6. arXiv:2408.16236  [pdf, other

    cs.CV

    Neural Spectral Decomposition for Dataset Distillation

    Authors: Shaolei Yang, Shen Cheng, Mingbo Hong, Haoqiang Fan, Xing Wei, Shuaicheng Liu

    Abstract: In this paper, we propose Neural Spectrum Decomposition, a generic decomposition framework for dataset distillation. Unlike previous methods, we consider the entire dataset as a high-dimensional observation that is low-rank across all dimensions. We aim to discover the low-rank representation of the entire dataset and perform distillation efficiently. Toward this end, we learn a set of spectrum te… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  7. arXiv:2408.16030  [pdf

    cs.SD cs.AI cs.LG eess.AS

    A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds

    Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu

    Abstract: This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The da… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  8. arXiv:2408.15881  [pdf, other

    cs.CV

    LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

    Authors: Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang

    Abstract: We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, s… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  9. arXiv:2408.15876  [pdf, other

    cs.CV

    Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

    Authors: Shaofei Huang, Rui Ling, Hongyu Li, Tianrui Hui, Zongheng Tang, Xiaoming Wei, Jizhong Han, Si Liu

    Abstract: In this paper, we propose an Audio-Language-Referenced SAM 2 (AL-Ref-SAM 2) pipeline to explore the training-free paradigm for audio and language-referenced video object segmentation, namely AVS and RVOS tasks. The intuitive solution leverages GroundingDINO to identify the target object from a single frame and SAM 2 to segment the identified object throughout the video, which is less robust to spa… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  10. arXiv:2408.15829  [pdf, other

    cs.CV

    SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

    Authors: Sicheng Liu, Lintao Wang, Xiaogan Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

    Abstract: Extreme Multimodal Summarization with Multimodal Output (XMSMO) becomes an attractive summarization approach by integrating various types of information to create extremely concise yet informative summaries for individual modalities. Existing methods overlook the issue that multimodal data often contains more topic irrelevant information, which can mislead the model into producing inaccurate summa… ▽ More

    Submitted 28 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures, submitted to ACM Multimedia Asia 2024

    ACM Class: I.2.10

  11. arXiv:2408.15101  [pdf, other

    cs.CV cs.AI

    MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

    Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, Ying-Cong Chen

    Abstract: Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring with a Mamba-based decoder. It contains two… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.02228

  12. arXiv:2408.14757  [pdf, other

    cs.CV cs.LG

    Learning effective pruning at initialization from iterative pruning

    Authors: Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

    Abstract: Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods still have a large accuracy gap with iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we get inspiration from iterative pruning to improve the PaI performance? In the… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

  13. arXiv:2408.14717  [pdf, other

    cs.DB cs.AI

    Text2SQL is Not Enough: Unifying AI and Databases with TAG

    Authors: Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia

    Abstract: AI systems that serve natural language questions over databases promise to unlock tremendous value. Such systems would allow users to leverage the powerful reasoning and knowledge capabilities of language models (LMs) alongside the scalable computational power of data management systems. These combined capabilities would empower users to ask arbitrary natural language questions over custom data so… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

  14. Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

    Authors: Ziheng Wang, Pedro Reviriego, Farzad Niknia, Javier Conde, Shanshan Liu, Fabrizio Lombardi

    Abstract: The implementation of machine learning in Internet of Things devices poses significant operational challenges due to limited energy and computation resources. In recent years, significant efforts have been made to implement simplified ML models that can achieve reasonable performance while reducing computation and energy, for example by pruning weights in neural networks, or using reduced precisio… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Journal ref: IEEE Internet of Things Journal 2023 Volume:11, Issue:8

  15. arXiv:2408.14492  [pdf, other

    cs.LG

    Evolvable Psychology Informed Neural Network for Memory Behavior Modeling

    Authors: Xiaoxuan Shen, Zhihai Hu, Qirong Chen, Shengyingjie Liu, Ruxia Liang, Jianwen Sun

    Abstract: Memory behavior modeling is a core issue in cognitive psychology and education. Classical psychological theories typically use memory equations to describe memory behavior, which exhibits insufficient accuracy and controversy, while data-driven memory modeling methods often require large amounts of training data and lack interpretability. Knowledge-informed neural network models have shown excelle… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  16. arXiv:2408.14060  [pdf

    cs.CV

    Evaluating the Visual Similarity of Southwest China's Ethnic Minority Brocade Based on Deep Learning

    Authors: Shichen Liu, Huaxing Lu

    Abstract: This paper employs deep learning methods to investigate the visual similarity of ethnic minority patterns in Southwest China. A customized SResNet-18 network was developed, achieving an accuracy of 98.7% on the test set, outperforming ResNet-18, VGGNet-16, and AlexNet. The extracted feature vectors from SResNet-18 were evaluated using three metrics: cosine similarity, Euclidean distance, and Manha… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 8 pages,2tables,5 figures

  17. arXiv:2408.14032  [pdf, other

    cs.CV

    More Pictures Say More: Visual Intersection Network for Open Set Object Detection

    Authors: Bingcheng Dong, Yuning Ding, Jinrong Zhang, Sifan Zhang, Shenglan Liu

    Abstract: Open Set Object Detection has seen rapid development recently, but it continues to pose significant challenges. Language-based methods, grappling with the substantial modal disparity between textual and visual modalities, require extensive computational resources to bridge this gap. Although integrating visual prompts into these frameworks shows promise for enhancing performance, it always comes w… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 7pages

  18. arXiv:2408.13893  [pdf, other

    cs.SD cs.CL eess.AS

    SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng

    Abstract: Scaling Text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At the high level, previous large-scale TTS models can be categorized into either Auto-regressive (AR) based (\textit{e.g.}, VALL-E) or Non-auto-regressive (NAR) based models (\textit{e.g.}, NaturalSpeech 2/3). Although these works dem… ▽ More

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Submit to TASLP

  19. arXiv:2408.13742  [pdf, other

    cs.RO

    Multi-modal Integrated Prediction and Decision-making with Adaptive Interaction Modality Explorations

    Authors: Tong Li, Lu Zhang, Sikang Liu, Shaojie Shen

    Abstract: Navigating dense and dynamic environments poses a significant challenge for autonomous driving systems, owing to the intricate nature of multimodal interaction, wherein the actions of various traffic participants and the autonomous vehicle are complex and implicitly coupled. In this paper, we propose a novel framework, Multi-modal Integrated predictioN and Decision-making (MIND), which addresses t… ▽ More

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 figures

  20. arXiv:2408.12672  [pdf

    cs.DC cs.CV

    Research on Improved U-net Based Remote Sensing Image Segmentation Algorithm

    Authors: Qiming Yang, Zixin Wang, Shinan Liu, Zizheng Li

    Abstract: In recent years, although U-Net network has made significant progress in the field of image segmentation, it still faces performance bottlenecks in remote sensing image segmentation. In this paper, we innovatively propose to introduce SimAM and CBAM attention mechanism in U-Net, and the experimental results show that after adding SimAM and CBAM modules alone, the model improves 17.41% and 12.23% i… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  21. arXiv:2408.12470  [pdf, other

    cs.IR

    DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems

    Authors: Jiaju Chen, Chongming Gao, Shuai Yuan, Shuchang Liu, Qingpeng Cai, Peng Jiang

    Abstract: The integration of Large Language Models (LLMs) into recommender systems has led to substantial performance improvements. However, this often comes at the cost of diminished recommendation diversity, which can negatively impact user satisfaction. To address this issue, controllable recommendation has emerged as a promising approach, allowing users to specify their preferences and receive recommend… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  22. arXiv:2408.12425  [pdf, other

    eess.AS cs.LG cs.SD

    Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement

    Authors: Longbiao Cheng, Ashutosh Pandey, Buye Xu, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces a new Dynamic Gated Recurrent Neural Network (DG-RNN) for compute-efficient speech enhancement models running on resource-constrained hardware platforms. It leverages the slow evolution characteristic of RNN hidden states over steps, and updates only a selected set of neurons at each step by adding a newly proposed select gate to the RNN model. This select gate allows the com… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted to Interspeech 2024

  23. arXiv:2408.12169  [pdf, other

    cs.HC

    ReorderBench: A Benchmark for Matrix Reordering

    Authors: Jiangning Zhu, Zheng Wang, Zhiyang Shen, Lai Wei, Fengyuan Tian, Mengchen Liu, Shixia Liu

    Abstract: Matrix reordering permutes the rows and columns of a matrix to reveal meaningful visual patterns, such as blocks that represent clusters. A comprehensive collection of matrices, along with a scoring method for measuring the quality of visual patterns in these matrices, contributes to building a benchmark. This benchmark is essential for selecting or designing suitable reordering algorithms for spe… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Submitted to IEEE TVCG

  24. arXiv:2408.11982  [pdf, other

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu , et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat… ▽ More

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  25. arXiv:2408.11540  [pdf, other

    cs.CV

    DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments

    Authors: Shuhong Liu, Xiang Chen, Hongming Chen, Quanfeng Xu, Mingrui Li

    Abstract: Reconstruction under adverse rainy conditions poses significant challenges due to reduced visibility and the distortion of visual perception. These conditions can severely impair the quality of geometric maps, which is essential for applications ranging from autonomous planning to environmental monitoring. In response to these challenges, this study introduces the novel task of 3D Reconstruction i… ▽ More

    Submitted 21 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  26. arXiv:2408.10230  [pdf, other

    cs.AR cs.AI cs.CL cs.HC cs.RO

    A General-Purpose Device for Interaction with LLMs

    Authors: Jiajun Xu, Qun Wang, Yuhang Cao, Baitao Zeng, Sicheng Liu

    Abstract: This paper investigates integrating large language models (LLMs) with advanced hardware, focusing on developing a general-purpose device designed for enhanced interaction with LLMs. Initially, we analyze the current landscape, where virtual assistants and LLMs are reshaping human-technology interactions, highlighting pivotal advancements and setting the stage for a new era of intelligent hardware.… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  27. arXiv:2408.10005  [pdf, ps, other

    cs.IT

    Optimal Few-GHW Linear Codes and Their Subcode Support Weight Distributions

    Authors: Xu Pan, Hao Chen, Hongwei Liu, Shengwei Liu

    Abstract: Few-weight codes have been constructed and studied for many years, since their fascinating relations to finite geometries, strongly regular graphs and Boolean functions. Simplex codes are one-weight Griesmer $[\frac{q^k-1}{q-1},k ,q^{k-1}]_q$-linear codes and they meet all Griesmer bounds of the generalized Hamming weights of linear codes. All the subcodes with dimension $r$ of a… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  28. arXiv:2408.09709  [pdf, other

    cs.CV

    Dataset Distillation for Histopathology Image Classification

    Authors: Cong Cong, Shiyu Xuan, Sidong Liu, Maurice Pagnucco, Shiliang Zhang, Yang Song

    Abstract: Deep neural networks (DNNs) have exhibited remarkable success in the field of histopathology image analysis. On the other hand, the contemporary trend of employing large models and extensive datasets has underscored the significance of dataset distillation, which involves compressing large-scale datasets into a condensed set of synthetic samples, offering distinct advantages in improving training… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  29. arXiv:2408.09706  [pdf, other

    cs.CV

    MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model

    Authors: Xinyang Wang, Yi Yang, Minfeng Zhu, Kecheng Zheng, Shi Liu, Wei Chen

    Abstract: Recent advancements in pre-trained Vision-Language Models (VLMs) have highlighted the significant potential of prompt tuning for adapting these models to a wide range of downstream tasks. However, existing prompt tuning methods typically map an image to a single representation, limiting the model's ability to capture the diverse ways an image can be described. To address this limitation, we invest… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  30. arXiv:2408.09554  [pdf, other

    q-bio.QM cs.CV eess.IV

    Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images

    Authors: Yi Kan Wang, Ludmila Tydlitatova, Jeremy D. Kunz, Gerard Oakley, Ran A. Godrich, Matthew C. H. Lee, Chad Vanderbilt, Razik Yousfi, Thomas Fuchs, David S. Klimstra, Siqi Liu

    Abstract: Many molecular alterations serve as clinically prognostic or therapy-predictive biomarkers, typically detected using single or multi-gene molecular assays. However, these assays are expensive, tissue destructive and often take weeks to complete. Using AI on routine H&E WSIs offers a fast and economical approach to screen for multiple molecular biomarkers. We present a high-throughput AI-based syst… ▽ More

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

  31. arXiv:2408.08926  [pdf, other

    cs.CR cs.AI cs.CL cs.CY cs.LG

    Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

    Authors: Andy K. Zhang, Neil Perry, Riya Dulepet, Eliot Jones, Justin W. Lin, Joey Ji, Celeste Menders, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh , et al. (2 additional authors not shown)

    Abstract: Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetrat… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 86 pages, 7 figures

  32. arXiv:2408.08877  [pdf

    cs.DL

    Hotspots and Trends in Magnetoencephalography Research (2013-2022): A Bibliometric Analysis

    Authors: Shen Liu, Jingwen Zhao

    Abstract: This study aimed to utilize bibliometric methods to analyze trends in international Magnetoencephalography (MEG) research from 2013 to 2022. Due to the limited volume of domestic literature on MEG, this analysis focuses solely on the global research landscape, providing insights from the past decade as a representative sample. This study utilized bibliometric methods to explore and analyze the pro… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  33. arXiv:2408.08847  [pdf, other

    eess.IV cs.CV cs.LG

    HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis

    Authors: Zhi-Bo Liu, Xiaobo Pang, Jizhao Wang, Shuai Liu, Chen Li

    Abstract: In pathological research, education, and clinical practice, the decision-making process based on pathological images is critically important. This significance extends to digital pathology image analysis: its adequacy is demonstrated by the extensive information contained within tissue structures, which is essential for accurate cancer classification and grading. Additionally, its necessity is hig… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  34. arXiv:2408.08706  [pdf, other

    cs.LG

    Efficient Multi-Policy Evaluation for Reinforcement Learning

    Authors: Shuze Liu, Yuxin Chen, Shangtong Zhang

    Abstract: To unbiasedly evaluate multiple target policies, the dominant approach among RL practitioners is to run and evaluate each target policy separately. However, this evaluation method is far from efficient because samples are not shared across policies, and running target policies to evaluate themselves is actually not optimal. In this paper, we address these two weaknesses by designing a tailored beh… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  35. arXiv:2408.08650  [pdf, other

    cs.CL

    An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation

    Authors: Peiming Guo, Sinuo Liu, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

    Abstract: Photo-Sharing Multi-modal dialogue generation requires a dialogue agent not only to generate text responses but also to share photos at the proper moment. Using image text caption as the bridge, a pipeline model integrates an image caption model, a text generation model, and an image generation model to handle this complex multi-modal task. However, representing the images with text captions may l… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Work in progress

  36. arXiv:2408.08554  [pdf, other

    cs.LG

    ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

    Authors: Chao Zeng, Songwei Liu, Yusheng Xie, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their practical application is constrained by substantial memory and computational demands. Post-training quantization (PTQ) is considered an effective method to accelerate LLM inference. Despite its growing popularity in LLM model compression, PTQ deployment faces two major challenges. First, low-bit quan… ▽ More

    Submitted 22 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  37. arXiv:2408.08506  [pdf, other

    cs.CL cs.AI

    Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding

    Authors: Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen

    Abstract: Generating long-term texts such as novels using artificial intelligence has always been a challenge. A common approach is to use large language models (LLMs) to construct a hierarchical framework that first plans and then writes. Despite the fact that the generated novels reach a sufficient length, they exhibit poor logical coherence and appeal in their plots and deficiencies in character and even… ▽ More

    Submitted 1 September, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  38. arXiv:2408.08201  [pdf, other

    cs.CV

    Heavy Labels Out! Dataset Distillation with Label Space Lightening

    Authors: Ruonan Yu, Songhua Liu, Zigeng Chen, Jingwen Ye, Xinchao Wang

    Abstract: Dataset distillation or condensation aims to condense a large-scale training dataset into a much smaller synthetic one such that the training performance of distilled and original sets on neural networks are similar. Although the number of training samples can be reduced substantially, current state-of-the-art methods heavily rely on enormous soft labels to achieve satisfactory performance. As a r… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  39. arXiv:2408.07171  [pdf, other

    eess.IV cs.CV

    BVI-UGC: A Video Quality Database for User-Generated Content Transcoding

    Authors: Zihao Qi, Chen Feng, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

    Abstract: In recent years, user-generated content (UGC) has become one of the major video types consumed via streaming networks. Numerous research contributions have focused on assessing its visual quality through subjective tests and objective modeling. In most cases, objective assessments are based on a no-reference scenario, where the corresponding reference content is assumed not to be available. Howeve… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 12 pages, 11 figures

  40. arXiv:2408.07116  [pdf, other

    cs.CV cs.GR

    Generative Photomontage

    Authors: Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu

    Abstract: Text-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework for creating the desired image by compositing it from various parts of generated images, in essence forming a Generative Photomontage. Given a stack of images… ▽ More

    Submitted 16 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: Project webpage: https://lseancs.github.io/generativephotomontage/ ; corrected typos in v2

  41. arXiv:2408.06920  [pdf, other

    cs.AI cs.MA

    Multi-Agent Continuous Control with Generative Flow Networks

    Authors: Shuang Luo, Yinchuan Li, Shunyu Liu, Xu Zhang, Yunfeng Shao, Chao Wu

    Abstract: Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the final states of the trajectories are proportional to the reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their applications for multi-agent systems, especially continuous jo… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  42. arXiv:2408.06831  [pdf, other

    cs.CG

    Polynomial 2D Green Coordinates for High-order Cages

    Authors: Shibo Liu, Ligang Liu, Xiao-Ming Fu

    Abstract: We propose conformal polynomial coordinates for 2D closed high-order cages, which consist of polynomial curves of any order. The coordinates enable the transformation of the input polynomial curves into polynomial curves of any order. We extend the classical 2D Green coordinates to define our coordinates, thereby leading to cage-aware conformal harmonic deformations. We extensively test our method… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  43. arXiv:2408.06646  [pdf, other

    cs.CV

    Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models

    Authors: Chenqian Yan, Songwei Liu, Hongjian Liu, Xurui Peng, Xiaojian Wang, Fangming Chen, Lean Fu, Xing Mei

    Abstract: Stable Diffusion Models (SDMs) have shown remarkable proficiency in image synthesis. However, their broad application is impeded by their large model sizes and intensive computational requirements, which typically require expensive cloud servers for deployment. On the flip side, while there are many compact models tailored for edge devices that can reduce these demands, they often compromise on se… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  44. arXiv:2408.06264  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance

    Authors: Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, Björn W. Schuller

    Abstract: Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front-end of the target audio applications. In this paper, we present an end-to-end learning solu… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  45. arXiv:2408.05945  [pdf, other

    cs.CV

    MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection

    Authors: Zitian Wang, Zehao Huang, Yulu Gao, Naiyan Wang, Si Liu

    Abstract: The rise of autonomous vehicles has significantly increased the demand for robust 3D object detection systems. While cameras and LiDAR sensors each offer unique advantages--cameras provide rich texture information and LiDAR offers precise 3D spatial data--relying on a single modality often leads to performance limitations. This paper introduces MV2DFusion, a multi-modal detection framework that in… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  46. arXiv:2408.05502  [pdf, other

    cs.CV cs.CL

    GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph

    Authors: Shaonan Liu, Wenting Chen, Jie Liu, Xiaoling Luo, Linlin Shen

    Abstract: Gaze estimation is pivotal in human scene comprehension tasks, particularly in medical diagnostic analysis. Eye-tracking technology facilitates the recording of physicians' ocular movements during image interpretation, thereby elucidating their visual attention patterns and information-processing strategies. In this paper, we initially define the context-aware gaze estimation problem in medical ra… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 9 figures

  47. Enhancing Representation Learning of EEG Data with Masked Autoencoders

    Authors: Yifei Zhou, Sitong Liu

    Abstract: Self-supervised learning has been a powerful training paradigm to facilitate representation learning. In this study, we design a masked autoencoder (MAE) to guide deep learning models to learn electroencephalography (EEG) signal representation. Our MAE includes an encoder and a decoder. A certain proportion of input EEG signals are randomly masked and sent to our MAE. The goal is to recover these… ▽ More

    Submitted 1 September, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

  48. arXiv:2408.05117  [pdf, other

    eess.IV cs.AI cs.CV

    Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images

    Authors: Shouyue Liu, Jinkui Hao, Yonghuai Liu, Huazhu Fu, Xinyu Guo, Shuting Zhang, Yitian Zhao

    Abstract: Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryologic… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

  49. arXiv:2408.04216  [pdf

    cs.CL cs.AI

    Attention Mechanism and Context Modeling System for Text Mining Machine Translation

    Authors: Shi Bo, Yuwei Zhang, Junming Huang, Sitong Liu, Zexi Chen, Zizheng Li

    Abstract: This paper advances a novel architectural schema anchored upon the Transformer paradigm and innovatively amalgamates the K-means categorization algorithm to augment the contextual apprehension capabilities of the schema. The transformer model performs well in machine translation tasks due to its parallel computing power and multi-head attention mechanism. However, it may encounter contextual ambig… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

  50. arXiv:2408.04158  [pdf, other

    eess.IV cs.CV

    Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

    Authors: Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li

    Abstract: Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), seriously hindering their practical application and deployment. In this work, we present an efficient S… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024