Showing 1–50 of 207 results for author: Qu, H

Searching in archive cs.
  1. arXiv:2409.01192  [pdf, other]

    cs.IR

    SSD4Rec: A Structured State Space Duality Model for Efficient Sequential Recommendation

    Authors: Haohao Qu, Yifeng Zhang, Liangbo Ning, Wenqi Fan, Qing Li

    Abstract: Sequential recommendation methods are crucial in modern recommender systems for their remarkable capability to understand a user's changing interests based on past interactions. However, a significant challenge faced by current methods (e.g., RNN- or Transformer-based models) is to effectively and efficiently capture users' preferences by modeling long behavior sequences, which impedes their vario…

    Submitted 2 September, 2024; originally announced September 2024.

  2. arXiv:2409.00657  [pdf, other]

    cs.DC

    HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration

    Authors: Weijian Chen, Shuibing He, Haoyang Qu, Xuechen Zhang, Dan Feng

    Abstract: Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features to GNN models, which leads to a significant communication bottleneck. Recognizing that the model size is often significantly smaller than the feature size, we propose LeapGNN, a featu…

    Submitted 1 September, 2024; originally announced September 2024.

  3. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version.

  4. arXiv:2408.05105  [pdf, other]

    cs.HC cs.GR

    Evaluating Layout Dimensionalities in PC+VR Asymmetric Collaborative Decision Making

    Authors: Daniel Enriquez, Wai Tong, Chris North, Huamin Qu, Yalong Yang

    Abstract: With the commercialization of virtual/augmented reality (VR/AR) devices, there is an increasing interest in combining immersive and non-immersive devices (e.g., desktop computers) for asymmetric collaborations. While such asymmetric settings have been examined in social platforms, significant questions around layout dimensionality in data-driven decision-making remain underexplored. A crucial inqu…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: To be presented at ACM ISS 2024

  5. arXiv:2408.03876  [pdf, other]

    cs.HC

    From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems

    Authors: Leixian Shen, Haotian Li, Yun Wang, Huamin Qu

    Abstract: Creating data stories from raw data is challenging due to humans' limited attention spans and the need for specialized skills. Recent advancements in large language models (LLMs) offer great opportunities to develop systems with autonomous agents to streamline the data storytelling workflow. Though multi-agent systems have benefits such as fully realizing LLM potentials with decomposed tasks for i…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 6 pages, 2 figures, IEEE VIS 2024 Gen4DS Workshop

  6. WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization

    Authors: Liwenhan Xie, Chengbo Zheng, Haijun Xia, Huamin Qu, Chen Zhu-Tian

    Abstract: Large language models (LLMs) support data analysis through conversational user interfaces, as exemplified in OpenAI's ChatGPT (formerly known as Advanced Data Analysis or Code Interpreter). Essentially, LLMs produce code for accomplishing diverse analysis tasks. However, presenting raw code can obscure the logic and hinder user verification. To empower users with enhanced comprehension and augment…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted in the 37th Annual ACM Symposium on User Interface Software and Technology (UIST'24)

  7. arXiv:2408.01129  [pdf, other]

    cs.LG cs.AI

    A Survey of Mamba

    Authors: Haohao Qu, Liangbo Ning, Rui An, Wenqi Fan, Tyler Derr, Hui Liu, Xin Xu, Qing Li

    Abstract: As one of the most representative DL techniques, Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) that comprise billions of parameters, becoming a cornerstone in deep learning. Despite the impressive achievements, Transformers still face inherent limitations, particularly the time-consuming inference resulting from the quadratic computati…

    Submitted 22 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

  8. arXiv:2407.18581  [pdf, other]

    cs.CL cs.AI

    Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing

    Authors: Hukai Huang, Shenghui Lu, Yahui Shan, He Qu, Wenhao Guan, Qingyang Hong, Lin Li

    Abstract: The Mixture of Experts (MoE) approach is well-suited for multilingual and code-switching (CS) tasks due to its multi-expert architecture. This work introduces the DLG-MoE, a Dynamic Language Group-based MoE optimized for bilingual and CS scenarios. DLG-MoE operates based on a hierarchical routing mechanism. First, the language router explicitly models the language and dispatches the representation…

    Submitted 7 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

  9. arXiv:2407.17291  [pdf, other]

    cs.HC cs.AI cs.CL cs.CV

    How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?

    Authors: Leo Yu-Ho Lo, Huamin Qu

    Abstract: In this study, we address the growing issue of misleading charts, a prevalent problem that undermines the integrity of information dissemination. Misleading charts can distort the viewer's perception of data, leading to misinterpretations and decisions based on false information. The development of effective automatic detection methods for misleading charts is an urgent field of research. The rece…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: To be presented at IEEE VIS 2024

  10. arXiv:2407.12423  [pdf, other]

    cs.HC cs.AI

    StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions

    Authors: Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu

    Abstract: The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this e…

    Submitted 21 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages. To be published at IEEE Visualization 2024

  11. arXiv:2407.10805  [pdf, other]

    cs.CL cs.AI

    Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval

    Authors: Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Jian Guo

    Abstract: Retrieval-augmented generation (RAG) has significantly advanced large language models (LLMs) by enabling dynamic information retrieval to mitigate knowledge gaps and hallucinations in generated content. However, these systems often falter with complex reasoning and consistency across diverse queries. In this work, we present Think-on-Graph 2.0, an enhanced RAG framework that aligns questions with…

    Submitted 6 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  12. arXiv:2407.03045  [pdf, other]

    cs.HC cs.CL cs.LG

    JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets

    Authors: Zhihua Jin, Shiyi Liu, Haotian Li, Xun Zhao, Huamin Qu

    Abstract: Large Language Models (LLMs) have gained significant attention but also raised concerns due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards LLMs, have appeared and constantly evolved to breach the safety protocols of LLMs. To address this issue, LLMs are regularly updated with safety patches based on reported jailbreak prompts. However, malicious users often…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 9 figures

  13. arXiv:2406.13050  [pdf, other]

    cs.CL

    Think-then-Act: A Dual-Angle Evaluated Retrieval-Augmented Generation

    Authors: Yige Shen, Hao Jiang, Hua Qu, Jihong Zhao

    Abstract: Despite their impressive capabilities, large language models (LLMs) often face challenges such as temporal misalignment and generating hallucinatory content. Enhancing LLMs with retrieval mechanisms to fetch relevant information from external sources offers a promising solution. Inspired by the proverb "Think twice before you act," we propose a dual-angle evaluated retrieval-augmented generation f…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  14. arXiv:2406.12285  [pdf, other]

    cs.CV cs.AI

    DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection

    Authors: Haodong Li, Haicheng Qu

    Abstract: The detection of small objects in aerial images is a fundamental task in the field of computer vision. Moving objects in aerial photography have problems such as different shapes and sizes, dense overlap, occlusion by the background, and object blur; however, the original YOLO algorithm has low overall detection accuracy due to its weak ability to perceive targets of different scales. In order to…

    Submitted 22 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  15. arXiv:2406.11637  [pdf, other]

    cs.HC

    PyGWalker: On-the-fly Assistant for Exploratory Visual Data Analysis

    Authors: Yue Yu, Leixian Shen, Fei Long, Huamin Qu, Hao Chen

    Abstract: Exploratory visual data analysis tools empower data analysts to efficiently and intuitively explore data insights throughout the entire analysis cycle. However, the gap between common programmatic analysis (e.g., within computational notebooks) and exploratory visual analysis leads to a disjointed and inefficient data analysis experience. To bridge this gap, we developed PyGWalker, a Python librar…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: To appear at the IEEE VIS Conference 2024

  16. arXiv:2406.10450  [pdf, other]

    cs.IR cs.AI cs.CL

    TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation

    Authors: Haohao Qu, Wenqi Fan, Zihuai Zhao, Qing Li

    Abstract: There is a growing interest in utilizing large-scale language models (LLMs) to advance next-generation Recommender Systems (RecSys), driven by their outstanding language understanding and in-context learning capabilities. In this scenario, tokenizing (i.e., indexing) users and items becomes essential for ensuring a seamless alignment of LLMs with recommendations. While several studies have made pr…

    Submitted 18 August, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE TKDE. Our code and dataset will be made available upon acceptance of the paper

  17. arXiv:2406.03843  [pdf, other]

    cs.HC cs.AI

    POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models

    Authors: Jianben He, Xingbo Wang, Shiyi Liu, Guande Wu, Claudio Silva, Huamin Qu

    Abstract: Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modaliti…

    Submitted 14 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures

    MSC Class: 68

    ACM Class: H.5; I.2.1

  18. arXiv:2406.03317  [pdf, other]

    cs.HC cs.MM

    Save It for the "Hot" Day: An LLM-Empowered Visual Analytics System for Heat Risk Management

    Authors: Haobo Li, Wong Kam-Kwai, Yan Luo, Juntong Chen, Chengzhong Liu, Yaxuan Zhang, Alexis Kai Hon Lau, Huamin Qu, Dongyu Liu

    Abstract: The escalating frequency and intensity of heat-related climate events, particularly heatwaves, emphasize the pressing need for advanced heat risk management strategies. Current approaches, primarily relying on numerical models, face challenges in spatial-temporal resolution and in capturing the dynamic interplay of environmental, social, and behavioral factors affecting heat risks. This has led to…

    Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  19. arXiv:2406.01954  [pdf, other]

    cs.CV

    Plug-and-Play Diffusion Distillation

    Authors: Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte, Wei-An Lin, Hui Qu, Mingi Kwon, Ratheesh Kalarot

    Abstract: Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We sh…

    Submitted 14 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024; project page: https://5410tiffany.github.io/plug-and-play-diffusion-distillation.github.io/

  20. arXiv:2406.01341  [pdf, other]

    cs.SI

    Important node identification for complex networks based on improved Electre Multi-Attribute fusion

    Authors: Qi Cao, Yurong Song, Min Li, Ruqi Li, Hongbo Qu, Guo-Ping Jiang, Jinye Xiong

    Abstract: The influence maximization problem involves selecting a subset of seed nodes within a social network to maximize information spread under a given diffusion model, so how to identify the important nodes is the problem to be considered in this paper. Due to the great differences in the reality of the network, a class of multi-attribute decision fusion methods is often used to solve this problem. Electre…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2405.15267  [pdf, other]

    cs.CV

    Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor

    Authors: Haoxuan Qu, Zhaoyang He, Zeyu Hu, Yujun Cai, Jun Liu

    Abstract: To facilitate the application of motion prediction in practice, recently, the few-shot motion prediction task has attracted increasing research attention. Yet, in existing few-shot motion prediction works, a specific model that is dedicatedly trained over human motions is generally required. In this work, rather than tackling this task through training a specific human motion prediction model, we…

    Submitted 24 May, 2024; originally announced May 2024.

  22. arXiv:2405.15196  [pdf, other]

    cs.CV

    DisC-GS: Discontinuity-aware Gaussian Splatting

    Authors: Haoxuan Qu, Zhuoling Li, Hossein Rahmani, Yujun Cai, Jun Liu

    Abstract: Recently, Gaussian Splatting, a method that represents a 3D scene as a collection of Gaussian distributions, has gained significant attention in addressing the task of novel view synthesis. In this paper, we highlight a fundamental limitation of Gaussian Splatting: its inability to accurately render discontinuities and boundaries in images due to the continuous nature of Gaussian distributions. To…

    Submitted 23 May, 2024; originally announced May 2024.

  23. arXiv:2405.13672  [pdf, other]

    cs.CV

    Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning

    Authors: Yimeng Shan, Malu Zhang, Rui-jie Zhu, Xuerui Qiu, Jason K. Eshraghian, Haicheng Qu

    Abstract: Recent advancements in neuroscience research have propelled the development of Spiking Neural Networks (SNNs), which not only have the potential to further advance neuroscience research but also serve as an energy-efficient alternative to Artificial Neural Networks (ANNs) due to their spike-driven characteristics. However, previous studies often neglected the multiscale information and its spatiot…

    Submitted 27 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  24. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  25. arXiv:2405.02077  [pdf, other]

    cs.CV

    MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

    Authors: Hongyu Qu, Rui Yan, Xiangbo Shu, Hailiang Gao, Peng Huang, Guo-Sen Xie

    Abstract: Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocit…

    Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  26. arXiv:2404.18219  [pdf, other]

    physics.ins-det cs.LG hep-ex hep-ph physics.data-an

    BUFF: Boosted Decision Tree based Ultra-Fast Flow matching

    Authors: Cheng Jiang, Sitian Qian, Huilin Qu

    Abstract: Tabular data stands out as one of the most frequently encountered types in high energy physics. Unlike commonly homogeneous data such as pixelated images, simulating high-dimensional tabular data and accurately capturing their correlations are often quite challenging, even with the most advanced architectures. Based on the findings that tree-based models surpass the performance of deep learning mo…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 9 pages, 10 figures, 1 additional figure in appendix

  27. arXiv:2404.11614  [pdf, other]

    cs.CV

    Dynamic Typography: Bringing Text to Life via Video Diffusion Prior

    Authors: Zichen Liu, Yihao Meng, Hao Ouyang, Yue Yu, Bolin Zhao, Daniel Cohen-Or, Huamin Qu

    Abstract: Text animation serves as an expressive medium, transforming static communication into dynamic experiences by infusing words with motion to evoke emotions, emphasize meanings, and construct compelling narratives. Crafting animations that are semantically aware poses significant challenges, demanding expertise in graphic design and animation. We present an automated text animation scheme, termed "Dy…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Our demo page is available at: https://animate-your-word.github.io/demo/

  28. arXiv:2404.00532  [pdf, other]

    cs.CV

    LLMs are Good Action Recognizers

    Authors: Haoxuan Qu, Yujun Cai, Jun Liu

    Abstract: Skeleton-based action recognition has attracted lots of research attention. Recently, to build an accurate skeleton-based action recognizer, a variety of works have been proposed. Among them, some works use large model architectures as backbones of their recognizers to boost the skeleton data representation capability, while some other works pre-train their recognizers on external data to enrich t…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  29. arXiv:2403.16212  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis

    Authors: Shaojie Li, Haichen Qu, Xinqi Dong, Bo Dang, Hengyi Zang, Yulu Gong

    Abstract: Exploring the application of deep learning technologies in the field of medical diagnostics, Magnetic Resonance Imaging (MRI) provides a unique perspective for observing and diagnosing complex neurodegenerative diseases such as Alzheimer Disease (AD). With advancements in deep learning, particularly in Convolutional Neural Networks (CNNs) and the Xception network architecture, we are now able to a…

    Submitted 24 March, 2024; originally announced March 2024.

  30. arXiv:2403.14947  [pdf, other]

    cs.CV

    GPT-Connect: Interaction between Text-Driven Human Motion Generator and 3D Scenes in a Training-free Manner

    Authors: Haoxuan Qu, Ziyan Guo, Jun Liu

    Abstract: Recently, while text-driven human motion generation has received massive research attention, most existing text-driven motion generators are generally only designed to generate motion sequences in a blank background. While this is the case, in practice, human beings naturally perform their motions in 3D scenes, rather than in a blank background. Considering this, we here aim to perform scene-aware…

    Submitted 22 March, 2024; originally announced March 2024.

  31. arXiv:2403.11131  [pdf, other]

    cs.CV

    Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

    Authors: Yonggan Fu, Huaizhi Qu, Zhifan Ye, Chaojian Li, Kevin Zhao, Yingyan Lin

    Abstract: Recent breakthroughs in Neural Radiance Fields (NeRFs) have sparked significant demand for their integration into real-world 3D applications. However, the varied functionalities required by different 3D applications often necessitate diverse NeRF models with various pipelines, leading to tedious NeRF training for each target task and cumbersome trial-and-error experiments. Drawing inspiration from…

    Submitted 18 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV 2024

  32. arXiv:2403.10107  [pdf, other]

    cs.CV cs.AI cs.MM

    Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning

    Authors: Hang Zhang, Wenxiao Zhang, Haoxuan Qu, Jun Liu

    Abstract: Human-centered dynamic scene understanding plays a pivotal role in enhancing the capability of robotic and autonomous systems, in which Video-based Human-Object Interaction (V-HOI) detection is a crucial task in semantic scene understanding, aimed at comprehensively understanding HOI relationships within a video to benefit the behavioral decisions of mobile robots and autonomous driving systems. A…

    Submitted 19 July, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  33. arXiv:2403.09121  [pdf, other]

    cs.HC

    OutlineSpark: Igniting AI-powered Presentation Slides Creation from Computational Notebooks through Outlines

    Authors: Fengjie Wang, Yanna Lin, Leni Yang, Haotian Li, Mingyang Gu, Min Zhu, Huamin Qu

    Abstract: Computational notebooks are widely utilized for exploration and analysis. However, creating slides to communicate analysis results from these notebooks is quite tedious and time-consuming. Researchers have proposed automatic systems for generating slides from notebooks, which, however, often do not consider the process of users conceiving and organizing their messages from massive code cells. Thos…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: To appear in Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2024)

  34. arXiv:2403.08499  [pdf]

    cs.CV

    Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks

    Authors: Zongqing Qi, Danqing Ma, Jingyu Xu, Ao Xiang, Hedi Qu

    Abstract: In recent years, there have been frequent incidents of foreign objects intruding into railway and airport runways. These objects can include pedestrians, vehicles, animals, and debris. This paper introduces an improved YOLOv5 architecture incorporating FasterNet and attention mechanisms to enhance the detection of foreign objects on railways and airport runways. This study proposes a new dataset,…

    Submitted 13 March, 2024; originally announced March 2024.

  35. arXiv:2403.04812  [pdf, other]

    cs.LG cs.HC

    TrafPS: A Shapley-based Visual Analytics Approach to Interpret Traffic

    Authors: Zezheng Feng, Yifan Jiang, Hongjun Wang, Zipei Fan, Yuxin Ma, Shuang-Hua Yang, Huamin Qu, Xuan Song

    Abstract: Recent achievements in deep learning (DL) have shown its potential for predicting traffic flows. Such predictions are beneficial for understanding the situation and making decisions in traffic control. However, most state-of-the-art DL models are considered "black boxes" with little to no transparency for end users with respect to the underlying mechanisms. Some previous work tried to "open the bl…

    Submitted 6 March, 2024; originally announced March 2024.

  36. arXiv:2403.03822  [pdf, other]

    cs.HC

    HoLens: A Visual Analytics Design for Higher-order Movement Modeling and Visualization

    Authors: Zezheng Feng, Fang Zhu, Hongjun Wang, Jianing Hao, ShuangHua Yang, Wei Zeng, Huamin Qu

    Abstract: Higher-order patterns reveal sequential multistep state transitions, which are usually superior to origin-destination analysis, which depicts only first-order geospatial movement patterns. Conventional methods for higher-order movement modeling first construct a directed acyclic graph (DAG) of movements, then extract higher-order patterns from the DAG. However, DAG-based methods heavily rely on th…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 20 pages, 18 figures; accepted by the Computational Visual Media journal

  37. arXiv:2402.08978  [pdf, other]

    cs.HC cs.CE cs.LG

    Prismatic: Interactive Multi-View Cluster Analysis of Concept Stocks

    Authors: Wong Kam-Kwai, Yan Luo, Xuanwu Yue, Wei Chen, Huamin Qu

    Abstract: Financial cluster analysis allows investors to discover investment alternatives and avoid undertaking excessive risks. However, this analytical task faces substantial challenges arising from many pairwise comparisons, the dynamic correlations across time spans, and the ambiguity in deriving implications from business relational knowledge. We propose Prismatic, a visual analytics system that integr…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 14 pages. A preprint version submitted to IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024

  38. arXiv:2402.04991  [pdf, other]

    cs.HC

    Exploring the Opportunity of Augmented Reality (AR) in Supporting Older Adults Explore and Learn Smartphone Applications

    Authors: Xiaofu Jin, Wai Tong, Xiaoying Wei, Xian Wang, Emily Kuang, Xiaoyu Mo, Huamin Qu, Mingming Fan

    Abstract: The global aging trend compels older adults to navigate the evolving digital landscape, presenting a substantial challenge in mastering smartphone applications. While Augmented Reality (AR) holds promise for enhancing learning and user experience, its role in aiding older adults' smartphone app exploration remains insufficiently explored. Therefore, we conducted a two-phase study: (1) a workshop w…

    Submitted 7 February, 2024; originally announced February 2024.

  39. arXiv:2402.03325  [pdf, other]

    cs.CV cs.LG

    Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations

    Authors: Helen Qu, Sang Michael Xie

    Abstract: Models trained on a labeled source domain (e.g., labeled images from wildlife camera traps) often generalize poorly when deployed on an out-of-distribution (OOD) target domain (e.g., images from new camera trap locations). In the domain adaptation setting where unlabeled target data is available, self-supervised pretraining (e.g., masked autoencoding or contrastive learning) is a promising method…

    Submitted 21 June, 2024; v1 submitted 8 January, 2024; originally announced February 2024.

    Comments: ICML 2024

  40. arXiv:2401.09160  [pdf, other]

    cs.RO cs.CV

    DK-SLAM: Monocular Visual SLAM with Deep Keypoint Learning, Tracking and Loop-Closing

    Authors: Hao Qu, Lilian Zhang, Jun Mao, Junbo Tie, Xiaofeng He, Xiaoping Hu, Yifei Shi, Changhao Chen

    Abstract: The performance of visual SLAM in complex, real-world scenarios is often compromised by unreliable feature extraction and matching when using handcrafted features. Although deep learning-based local features excel at capturing high-level information and perform well on matching benchmarks, they struggle with generalization in continuous motion scenes, adversely affecting loop detection accuracy. O…

    Submitted 25 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: In submission

  41. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  42. arXiv:2401.01667  [pdf, other]

    cs.CL

    MLPs Compass: What is learned when MLPs are combined with PLMs?

    Authors: Li Zhou, Wenyu Chen, Yong Cao, Dingyi Zeng, Wanlong Liu, Hong Qu

    Abstract: While Transformer-based pre-trained language models and their variants exhibit strong semantic representation capabilities, the question of comprehending the information gain derived from the additional components of PLMs remains an open question in this field. Motivated by recent efforts that prove Multilayer-Perceptrons (MLPs) modules achieving robust structural capture capabilities, even outper…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  43. arXiv:2401.00029  [pdf, other]

    cs.CV

    6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

    Authors: Li Xu, Haoxuan Qu, Yujun Cai, Jun Liu

    Abstract: Estimating the 6D object pose from a single RGB image often involves noise and indeterminacy due to challenges such as occlusions and cluttered backgrounds. Meanwhile, diffusion models have shown appealing performance in generating high-quality images from random noise with high indeterminacy through step-by-step denoising. Inspired by their denoising capability, we propose a novel diffusion-based…

    Submitted 22 March, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: CVPR 2024 CAMERA-READY

  44. arXiv:2312.14401  [pdf, other]

    cs.HC

    Towards an Exploratory Visual Analytics System for Griefer Identification in MOBA Games

    Authors: Zixin Chen, Shiyi Liu, Zhihua Jin, Gaoping Huang, Yang Chao, Zhenchuan Yang, Quan Li, Huamin Qu

    Abstract: Multiplayer Online Battle Arenas (MOBAs) have gained a significant player base worldwide, generating over two billion US dollars in annual game revenue. However, the presence of griefers, who deliberately irritate and harass other players within the game, can have a detrimental impact on players' experience, compromising game fairness and potentially leading to the emergence of gray industries. Un…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: IEEE VIS 2023 (Poster)

  45. arXiv:2311.06570  [pdf, other]

    cs.CV

    SynA-ResNet: Spike-driven ResNet Achieved through OR Residual Connection

    Authors: Yimeng Shan, Xuerui Qiu, Rui-jie Zhu, Jason K. Eshraghian, Malu Zhang, Haicheng Qu

    Abstract: Spiking Neural Networks (SNNs) have garnered substantial attention in brain-like computing for their biological fidelity and the capacity to execute energy-efficient spike-driven operations. As the demand for heightened performance in SNNs surges, the trend towards training deeper networks becomes imperative, while residual learning stands as a pivotal method for training deep neural networks. In…

    Submitted 7 July, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures and 10 tables

  46. arXiv:2310.16316  [pdf, other]

    cs.LG cs.AI

    Sum-of-Parts Models: Faithful Attributions for Groups of Features

    Authors: Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong

    Abstract: An explanation of a machine learning model is considered "faithful" if it accurately reflects the model's decision-making process. However, explanations such as feature attributions for deep learning are not guaranteed to be faithful, and can produce potentially misleading interpretations. In this work, we develop Sum-of-Parts (SOP), a class of models whose predictions come with grouped feature at…

    Submitted 24 October, 2023; originally announced October 2023.

  47. arXiv:2310.12069  [pdf, other]

    astro-ph.IM cs.LG

    Transformers for scientific data: a pedagogical review for astronomers

    Authors: Dimitrios Tanoglidis, Bhuvnesh Jain, Helen Qu

    Abstract: The deep learning architecture associated with ChatGPT and related generative AI products is known as transformers. Initially applied to Natural Language Processing, transformers and the self-attention mechanism they exploit have gained widespread interest across the natural sciences. The goal of this pedagogical and informal review is to introduce transformers to scientists. The review includes t…

    Submitted 18 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: 17 pages, 5 figures

  48. arXiv:2310.11742  [pdf, other]

    cs.HC

    AdaVis: Adaptive and Explainable Visualization Recommendation for Tabular Data

    Authors: Songheng Zhang, Haotian Li, Huamin Qu, Yong Wang

    Abstract: Automated visualization recommendation facilitates the rapid creation of effective visualizations, which is especially beneficial for users with limited time and limited knowledge of data visualization. There is an increasing trend in leveraging machine learning (ML) techniques to achieve an end-to-end visualization recommendation. However, existing ML-based approaches implicitly assume that there…

    Submitted 18 October, 2023; originally announced October 2023.

  49. arXiv:2310.05991  [pdf, other]

    cs.CL

    Enhancing Document-level Event Argument Extraction with Contextual Clues and Role Relevance

    Authors: Wanlong Liu, Shaohuan Cheng, Dingyi Zeng, Hong Qu

    Abstract: Document-level event argument extraction poses new challenges of long input and cross-sentence inference compared to its sentence-level counterpart. However, most prior works focus on capturing the relations between candidate arguments and the event trigger in each event, ignoring two crucial points: a) non-argument contextual clue information; b) the relevance among argument roles. In this paper,…

    Submitted 19 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Findings of ACL2023, correct some mistakes. arXiv admin note: text overlap with arXiv:2310.05116

  50. MARVisT: Authoring Glyph-based Visualization in Mobile Augmented Reality

    Authors: Chen Zhu-Tian, Yijia Su, Yifang Wang, Qianwen Wang, Huamin Qu, Yingcai Wu

    Abstract: Recent advances in mobile augmented reality (AR) techniques have shed new light on personal visualization for their advantages of fitting visualization within personal routines, situating visualization in a real-world context, and arousing users' interests. However, enabling non-experts to create data visualization in mobile AR environments is challenging given the lack of tools that allow in-situ…

    Submitted 10 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Journal ref: IEEE Transactions on Visualization and Computer Graphics (Volume: 26, Issue: 8, 01 August 2020)