Skip to main content

Showing 1–50 of 305 results for author: Qin, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.03225  [pdf, other

    cs.CL

    Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration

    Authors: Jeremy Qin, Bang Liu, Quoc Dinh Nguyen

    Abstract: Black-box large language models (LLMs) are increasingly deployed in various environments, making it essential for these models to effectively convey their confidence and uncertainty, especially in high-stakes settings. However, these models often exhibit overconfidence, leading to potential risks and misjudgments. Existing techniques for eliciting and calibrating LLM confidence have primarily focu… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2408.12088  [pdf, other

    cs.CY

    Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment

    Authors: Jinghui Qin, Changsong Liu, Tianchi Tang, Dahuang Liu, Minghao Wang, Qianying Huang, Yang Xu, Rumin Zhang

    Abstract: Mental disorders, such as anxiety and depression, have become a global issue that affects the regular lives of people across different ages. Without proper detection and treatment, anxiety and depression can hinder the sufferer's study, work, and daily life. Fortunately, recent advancements of digital and AI technologies provide new opportunities for better mental health care and many efforts have… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

  3. arXiv:2408.11823  [pdf, other

    cs.NE cs.AI

    Mamba-Spike: Enhancing the Mamba Architecture with a Spiking Front-End for Efficient Temporal Data Processing

    Authors: Jiahao Qin, Feng Liu

    Abstract: The field of neuromorphic computing has gained significant attention in recent years, aiming to bridge the gap between the efficiency of biological neural networks and the performance of artificial intelligence systems. This paper introduces Mamba-Spike, a novel neuromorphic architecture that integrates a spiking front-end with the Mamba backbone to achieve efficient and robust temporal data proce… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 12 pages, 5 figures, accepted by CGI 2024

  4. arXiv:2408.11319  [pdf, other

    cs.CL cs.AI

    SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding

    Authors: Yazhou Zhang, Chunwang Zou, Zheng Lian, Prayag Tiwari, Jing Qin

    Abstract: In the era of large language models (LLMs), the task of ``System I''~-~the fast, unconscious, and intuitive tasks, e.g., sentiment analysis, text classification, etc., have been argued to be successfully solved. However, sarcasm, as a subtle linguistic phenomenon, often employs rhetorical devices like hyperbole and figuration to convey true sentiments and intentions, involving a higher level of ab… ▽ More

    Submitted 23 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.09458  [pdf, other

    cs.CV

    G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

    Authors: Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

    Abstract: Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2408.04171  [pdf, other

    cs.CV

    Rotation center identification based on geometric relationships for rotary motion deblurring

    Authors: Jinhui Qin, Yong Ma, Jun Huang, Fan Fan, You Du

    Abstract: Non-blind rotary motion deblurring (RMD) aims to recover the latent clear image from a rotary motion blurred (RMB) image. The rotation center is a crucial input parameter in non-blind RMD methods. Existing methods directly estimate the rotation center from the RMB image. However they always suffer significant errors, and the performance of RMD is limited. For the assembled imaging systems, the pos… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

  7. arXiv:2408.03005  [pdf, other

    cs.DB

    Automatic String Data Validation with Pattern Discovery

    Authors: Xinwei Lin, Jing Zhao, Peng Di, Chuan Xiao, Rui Mao, Yan Ji, Makoto Onizuka, Zishuo Ding, Weiyi Shang, Jianbin Qin

    Abstract: In enterprise data pipelines, data insertions occur periodically and may impact downstream services if data quality issues are not addressed. Typically, such problems can be investigated and fixed by on-call engineers, but locating the cause of such problems and fixing errors are often time-consuming. Therefore, automatic data validation is a better solution to defend the system and downstream ser… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

  8. arXiv:2407.21075  [pdf, other

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek , et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  9. arXiv:2407.21022  [pdf, other

    cs.IR

    A Comprehensive Survey on Retrieval Methods in Recommender Systems

    Authors: Junjie Huang, Jizheng Chen, Jianghao Lin, Jiarui Qin, Ziming Feng, Weinan Zhang, Yong Yu

    Abstract: In an era dominated by information overload, effective recommender systems are essential for managing the deluge of data across digital platforms. Multi-stage cascade ranking systems are widely used in the industry, with retrieval and ranking being two typical stages. Retrieval methods sift through vast candidates to filter out irrelevant items, while ranking methods prioritize these candidates to… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 38 pages

  10. arXiv:2407.12828  [pdf, other

    cs.CL cs.AI

    Why Does New Knowledge Create Messy Ripple Effects in LLMs?

    Authors: Jiaxin Qin, Zixuan Zhang, Chi Han, Manling Li, Pengfei Yu, Heng Ji

    Abstract: Extensive previous research has focused on post-training knowledge editing (KE) for language models (LMs) to ensure that knowledge remains accurate and up-to-date. One desired property and open question in KE is to let edited LMs correctly handle ripple effects, where LM is expected to answer its logically related knowledge accurately. In this paper, we answer the question of why most KE methods s… ▽ More

    Submitted 18 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  11. arXiv:2407.12725  [pdf, other

    cs.CL

    Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?

    Authors: Ben Yao, Yazhou Zhang, Qiuchi Li, Jing Qin

    Abstract: Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, as such steps would evoke LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensiv… ▽ More

    Submitted 24 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures

  12. arXiv:2407.12271  [pdf, other

    cs.CV eess.IV

    RBAD: A Dataset and Benchmark for Retinal Vessels Branching Angle Detection

    Authors: Hao Wang, Wenhui Zhu, Jiayou Qin, Xin Li, Oana Dumitrascu, Xiwen Chen, Peijie Qiu, Abolfazl Razi

    Abstract: Detecting retinal image analysis, particularly the geometrical features of branching points, plays an essential role in diagnosing eye diseases. However, existing methods used for this purpose often are coarse-level and lack fine-grained analysis for efficient annotation. To mitigate these issues, this paper proposes a novel method for detecting retinal branching angles using a self-configured ima… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  13. arXiv:2407.12006  [pdf, other

    cs.CE cs.LG

    Form-Finding and Physical Property Predictions of Tensegrity Structures Using Deep Neural Networks

    Authors: Muhao Chen, Jing Qin

    Abstract: In the design of tensegrity structures, traditional form-finding methods utilize kinematic and static approaches to identify geometric configurations that achieve equilibrium. However, these methods often fall short when applied to actual physical models due to imperfections in the manufacturing of structural elements, assembly errors, and material non-linearities. In this work, we develop a deep… ▽ More

    Submitted 15 June, 2024; originally announced July 2024.

  14. arXiv:2407.10131  [pdf, other

    cs.CV

    WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

    Authors: Xinjian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu

    Abstract: Segmenting and recognizing diverse object parts is crucial in computer vision and robotics. Despite significant progress in object segmentation, part-level segmentation remains underexplored due to complex boundaries and scarce annotated data. To address this, we propose a novel Weakly-supervised Part Segmentation (WPS) setting and an approach called WPS-SAM, built on the large-scale pre-trained v… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  15. arXiv:2407.10081  [pdf, other

    cs.IR

    All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era

    Authors: Bo Chen, Xinyi Dai, Huifeng Guo, Wei Guo, Weiwen Liu, Yong Liu, Jiarui Qin, Ruiming Tang, Yichao Wang, Chuhan Wu, Yaxiong Wu, Hao Zhang

    Abstract: Recommender systems (RS) are vital for managing information overload and delivering personalized content, responding to users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems with vast general knowledge and reasoning capabilities. Standing across this LLM era, we aim to integrate recommender systems into a broader pic… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  16. arXiv:2407.07554  [pdf, other

    cs.GR cs.SD eess.AS

    Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

    Authors: Zikai Huang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He

    Abstract: Dance, as an art form, fundamentally hinges on the precise synchronization with musical beats. However, achieving aesthetically pleasing dance sequences from music is challenging, with existing methods often falling short in controllability and beat alignment. To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  17. arXiv:2407.06844  [pdf, other

    cs.CV

    Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

    Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

    Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This pa… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: submitted to TIP

  18. arXiv:2407.05703  [pdf, other

    cs.CV

    LGRNet: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos

    Authors: Huihui Xu, Yijun Yang, Angelica I Aviles-Rivero, Guang Yang, Jing Qin, Lei Zhu

    Abstract: Regular screening and early discovery of uterine fibroid are crucial for preventing potential malignant transformations and ensuring timely, life-saving interventions. To this end, we collect and annotate the first ultrasound video dataset with 100 videos for uterine fibroid segmentation (UFUV). We also present Local-Global Reciprocal Network (LGRNet) to efficiently and effectively propagate the l… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: MICCAI2024 Early Accept

  19. arXiv:2407.03886  [pdf, other

    cs.CV eess.IV

    DSMix: Distortion-Induced Sensitivity Map Based Pre-training for No-Reference Image Quality Assessment

    Authors: Jinsong Shi, Pan Gao, Xiaojiang Peng, Jie Qin

    Abstract: Image quality assessment (IQA) has long been a fundamental challenge in image understanding. In recent years, deep learning-based IQA methods have shown promising performance. However, the lack of large amounts of labeled data in the IQA field has hindered further advancements in these methods. This paper introduces DSMix, a novel data augmentation technique specifically designed for IQA tasks, ai… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  20. arXiv:2407.03318  [pdf, other

    cs.GT

    Fair Division of Indivisible Chores via Earning Restricted Equilibria

    Authors: Jugal Garg, Aniket Murhekar, John Qin

    Abstract: We study fair division of $m$ indivisible chores among $n$ agents with additive preferences. We consider the desirable fairness notions of envy-freeness up to any chore (EFX) and envy-freeness up to $k$ chores (EF$k$), alongside the efficiency notion of Pareto optimality (PO). We present the first constant approximations of these notions, showing the existence of: - 4-EFX allocations, which impr… ▽ More

    Submitted 27 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Improves the existence result from 5-EFX to 4-EFX. 59 pages

  21. arXiv:2407.02392  [pdf, other

    cs.CV

    TokenPacker: Efficient Visual Projector for Multimodal LLM

    Authors: Wentong Li, Yuqian Yuan, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang

    Abstract: The visual projector serves as an essential bridge between the visual encoder and the Large Language Model (LLM) in a Multimodal LLM (MLLM). Typically, MLLMs adopt a simple MLP to preserve all visual contexts via one-to-one transformation. However, the visual tokens are redundant and can be considerably increased when dealing with high-resolution images, impairing the efficiency of MLLMs significa… ▽ More

    Submitted 28 August, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 16 pages, Codes:https://github.com/CircleRadon/TokenPacker

  22. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  23. arXiv:2406.18950  [pdf, other

    eess.IV cs.CV

    MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Zhanli Hu, Xiaohan Xing, Jing Qin

    Abstract: Multi-modal MRI offers valuable complementary information for diagnosis and treatment; however, its utility is limited by prolonged scanning times. To accelerate the acquisition process, a practical approach is to reconstruct images of the target modality, which requires longer scanning times, from under-sampled k-space data using the fully-sampled reference modality with shorter scanning times as… ▽ More

    Submitted 7 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figure

  24. arXiv:2406.17858  [pdf, other

    cs.CV

    Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection

    Authors: Jialun Pei, Ruize Cui, Yaoqian Li, Weixin Si, Jing Qin, Pheng-Ann Heng

    Abstract: Laparoscopic liver surgery poses a complex intraoperative dynamic environment for surgeons, where remains a significant challenge to distinguish critical or even hidden structures inside the liver. Liver anatomical landmarks, e.g., ridge and ligament, serve as important markers for 2D-3D alignment, which can significantly enhance the spatial perception of surgeons for precise surgery. To facilitat… ▽ More

    Submitted 27 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by MICCAI 2024

  25. arXiv:2406.15474  [pdf, other

    cs.AI cs.CL cs.HC

    WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist

    Authors: Chenyu Ren, Yazhou Zhang, Daihai He, Jing Qin

    Abstract: Large language models (LLMs) are raging over the medical domain, and their momentum has carried over into the mental health domain, leading to the emergence of few mental health LLMs. Although such mental health LLMs could provide reasonable suggestions for psychological counseling, how to develop an authentic and effective doctor-patient relationship (DPR) through LLMs is still an important probl… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  26. arXiv:2406.14534  [pdf, other

    eess.IV cs.CV

    Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

    Authors: Long Lei, Jun Zhou, Jialun Pei, Baoliang Zhao, Yueming Jin, Yuen-Chun Jeremy Teoh, Jing Qin, Pheng-Ann Heng

    Abstract: A comprehensive guidance view for cardiac interventional surgery can be provided by the real-time fusion of the intraoperative 2D images and preoperative 3D volume based on the ultrasound frame-to-volume registration. However, cardiac ultrasound images are characterized by a low signal-to-noise ratio and small differences between adjacent frames, coupled with significant dimension variations betwe… ▽ More

    Submitted 27 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by MICCAI 2024

  27. arXiv:2406.09870  [pdf, other

    cs.LG cs.AI

    IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

    Authors: Jiawen Qin, Haonan Yuan, Qingyun Sun, Lyujin Xu, Jiaqi Yuan, Pengfeng Huang, Zhaonan Wang, Xingcheng Fu, Hao Peng, Jianxin Li, Philip S. Yu

    Abstract: Deep graph learning has gained grand popularity over the past years due to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionally abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms, leading to bia… ▽ More

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

  28. arXiv:2406.08866  [pdf, other

    cs.CV cs.AI

    Zoom and Shift are All You Need

    Authors: Jiahao Qin

    Abstract: Feature alignment serves as the primary mechanism for fusing multimodal data. We put forth a feature alignment approach that achieves full integration of multimodal information. This is accomplished via an alternating process of shifting and expanding feature representations across modalities to obtain a consistent unified representation in a joint feature space. The proposed technique can reliabl… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  29. arXiv:2406.07006  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan , et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAWImage Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  30. arXiv:2405.19775  [pdf, other

    cs.CV

    Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network

    Authors: Sizhe Zheng, Pan Gao, Peng Zhou, Jie Qin

    Abstract: Style transfer aims to render an image with the artistic features of a style image, while maintaining the original structure. Various methods have been put forward for this task, but some challenges still exist. For instance, it is difficult for CNN-based methods to handle global information and long-range dependencies between input images, for which transformer-based methods have been proposed. A… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 11 pages, 11 figures, to be published in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2024)

  31. arXiv:2405.15265  [pdf, other

    cs.CV

    Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation

    Authors: Jiayi Chen, Rong Quan, Jie Qin

    Abstract: Cross-Domain Few-shot Semantic Segmentation (CD-FSS) aims to train generalized models that can segment classes from different domains with a few labeled images. Previous works have proven the effectiveness of feature transformation in addressing CD-FSS. However, they completely rely on support images for feature transformation, and repeatedly utilizing a few support images for each class may easil… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  32. arXiv:2405.14932  [pdf, other

    cs.LG hep-ph stat.ML

    Fast Inference Using Automatic Differentiation and Neural Transport in Astroparticle Physics

    Authors: Dorian W. P. Amaral, Shixiao Liang, Juehang Qin, Christopher Tunnell

    Abstract: Multi-dimensional parameter spaces are commonly encountered in astroparticle physics theories that attempt to capture novel phenomena. However, they often possess complicated posterior geometries that are expensive to traverse using techniques traditional to this community. Effectively sampling these spaces is crucial to bridge the gap between experiment and theory. Several recent innovations, whi… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 20 pages, 7 figures, 4 tables, 6 appendices

  33. arXiv:2405.10832  [pdf, other

    cs.CV

    Open-Vocabulary Spatio-Temporal Action Detection

    Authors: Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, Limin Wang

    Abstract: Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across new action classes not seen in training because the action category space is large and hard to enumerate. Also, the cost of data annotation and model… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  34. arXiv:2405.08275  [pdf, other

    math.OC cs.CV math.NA

    Power of $\ell_1$-Norm Regularized Kaczmarz Algorithms for High-Order Tensor Recovery

    Authors: Katherine Henneberger, Jing Qin

    Abstract: Tensors serve as a crucial tool in the representation and analysis of complex, multi-dimensional data. As data volumes continue to expand, there is an increasing demand for developing optimization algorithms that can directly operate on tensors to deliver fast and effective computations. Many problems in real-world applications can be formulated as the task of recovering high-order tensors charact… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.00783

  35. arXiv:2405.07648  [pdf, other

    cs.CV eess.IV

    CDFormer:When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

    Authors: Qingguo Liu, Chenyi Zhuang, Pan Gao, Jie Qin

    Abstract: Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details. In this paper, we propose a novel BSR approach, Content-aware Degradation-driven Transformer (CDFormer), to capture both degradation and content representations. However, low-resolution images cannot provide enough content details… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  36. arXiv:2405.07293  [pdf, other

    cs.CV cs.AI

    Sparse Sampling is All You Need for Fast Wrong-way Cycling Detection in CCTV Videos

    Authors: Jing Xu, Wentao Shi, Sheng Ren, Pan Gao, Peng Zhou, Jie Qin

    Abstract: In the field of transportation, it is of paramount importance to address and mitigate illegal actions committed by both motor and non-motor vehicles. Among those actions, wrong-way cycling (i.e., riding a bicycle or e-bike in the opposite direction of the designated traffic flow) poses significant risks to both cyclists and other road users. To this end, this paper formulates a problem of detectin… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  37. arXiv:2405.06674  [pdf, other

    cs.CL cs.AI

    Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

    Authors: Xiaojun Chen, Tianle Wang, Tianhao Qiu, Jianbin Qin, Min Yang

    Abstract: Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present \ours, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy f… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  38. arXiv:2405.01312  [pdf, other

    cs.DB cs.CR

    Privacy-Enhanced Database Synthesis for Benchmark Publishing

    Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao

    Abstract: Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating syn… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  39. arXiv:2405.00951  [pdf, other

    cs.CV math.NA math.OC

    Hyperspectral Band Selection based on Generalized 3DTV and Tensor CUR Decomposition

    Authors: Katherine Henneberger, Jing Qin

    Abstract: Hyperspectral Imaging (HSI) serves as an important technique in remote sensing. However, high dimensionality and data volume typically pose significant computational challenges. Band selection is essential for reducing spectral redundancy in hyperspectral imagery while retaining intrinsic critical information. In this work, we propose a novel hyperspectral band selection model by decomposing the d… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  40. arXiv:2405.00358  [pdf, other

    cs.AI cs.LG

    Arbitrary Time Information Modeling via Polynomial Approximation for Temporal Knowledge Graph Embedding

    Authors: Zhiyu Fang, Jingyan Qin, Xiaobin Zhu, Chun Yang, Xu-Cheng Yin

    Abstract: Distinguished from traditional knowledge graphs (KGs), temporal knowledge graphs (TKGs) must explore and reason over temporally evolving facts adequately. However, existing TKG approaches still face two main challenges, i.e., the limited capability to model arbitrary timestamps continuously and the lack of rich inference patterns under temporal constraints. In this paper, we propose an innovative… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by LREC-COLING 2024 (long paper, camera-ready version)

  41. Transformer-based Reasoning for Learning Evolutionary Chain of Events on Temporal Knowledge Graph

    Authors: Zhiyu Fang, Shuai-Long Lei, Xiaobin Zhu, Chun Yang, Shi-Xue Zhang, Xu-Cheng Yin, Jingyan Qin

    Abstract: Temporal Knowledge Graph (TKG) reasoning often involves completing missing factual elements along the timeline. Although existing methods can learn good embeddings for each factual element in quadruples by integrating temporal information, they often fail to infer the evolution of temporal facts. This is mainly because of (1) insufficiently exploring the internal structure and semantic relationshi… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGIR 2024 (the Full paper track, camera ready version)

  42. Retrieval-Oriented Knowledge for Click-Through Rate Prediction

    Authors: Huanshuo Liu, Bo Chen, Menghui Zhu, Jianghao Lin, Jiarui Qin, Yang Yang, Hao Zhang, Ruiming Tang

    Abstract: Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  43. arXiv:2404.17031  [pdf, other

    cs.CV

    Motor Focus: Ego-Motion Prediction with All-Pixel Matching

    Authors: Hao Wang, Jiayou Qin, Xiwen Chen, Ashish Bastola, John Suchanek, Zihao Gong, Abolfazl Razi

    Abstract: Motion analysis plays a critical role in various applications, from virtual reality and augmented reality to assistive visual navigation. Traditional self-driving technologies, while advanced, typically do not translate directly to pedestrian applications due to their reliance on extensive sensor arrays and non-feasible computational frameworks. This highlights a significant gap in applying these… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  44. Unified Unsupervised Salient Object Detection via Knowledge Transfer

    Authors: Yao Yuan, Wutao Liu, Pan Gao, Qun Dai, Jie Qin

    Abstract: Recently, unsupervised salient object detection (USOD) has gained increasing attention due to its annotation-free nature. However, current methods mainly focus on specific tasks such as RGB and RGB-D, neglecting the potential for task migration. In this paper, we propose a unified USOD framework for generic USOD tasks. Firstly, we propose a Progressive Curriculum Learning-based Saliency Distilling… ▽ More

    Submitted 13 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  45. arXiv:2404.07581  [pdf, other

    cs.IR

    M-scan: A Multi-Scenario Causal-driven Adaptive Network for Recommendation

    Authors: Jiachen Zhu, Yichao Wang, Jianghao Lin, Jiarui Qin, Ruiming Tang, Weinan Zhang, Yong Yu

    Abstract: We primarily focus on the field of multi-scenario recommendation, which poses a significant challenge in effectively leveraging data from different scenarios to enhance predictions in scenarios with limited data. Current mainstream efforts mainly center around innovative model network architectures, with the aim of enabling the network to implicitly acquire knowledge from diverse scenarios. Howeve… ▽ More

    Submitted 14 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by WWW'24

  46. arXiv:2403.12865  [pdf, other

    cs.RO

    PE-Planner: A Performance-Enhanced Quadrotor Motion Planner for Autonomous Flight in Complex and Dynamic Environments

    Authors: Jiaxin Qiu, Qingchen Liu, Jiahu Qin, Dewang Cheng, Yawei Tian, Qichao Ma

    Abstract: The role of a motion planner is pivotal in quadrotor applications, yet existing methods often struggle to adapt to complex environments, limiting their ability to achieve fast, safe, and robust flight. In this letter, we introduce a performance-enhanced quadrotor motion planner designed for autonomous flight in complex environments including dense obstacles, dynamic obstacles, and unknown disturba… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  47. arXiv:2403.12415  [pdf, other

    cs.CV cs.HC

    VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

    Authors: Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, John Suchanek, Zihao Gong, Abolfazl Razi

    Abstract: This paper explores the potential of Large Language Models(LLMs) in zero-shot anomaly detection for safe visual navigation. With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework can identify anomalies within camera-captured frames that include any possible obstacles, then generate concise, audio-delivered… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  48. arXiv:2403.07684  [pdf, other

    cs.CV

    Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal

    Authors: Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero, Yulun Zhang, Jing Qin, Lei Zhu

    Abstract: Real-world vision tasks frequently suffer from the appearance of unexpected adverse weather conditions, including rain, haze, snow, and raindrops. In the last decade, convolutional neural networks and vision transformers have yielded outstanding results in single-weather video removal. However, due to the absence of appropriate adaptation, most of them fail to generalize to other weather condition… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  49. arXiv:2403.06197  [pdf, other

    eess.IV cs.CV cs.LG

    DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

    Authors: Wenfang Yao, Kejing Yin, William K. Cheung, Jia Liu, Jing Qin

    Abstract: The combination of electronic health records (EHR) and medical images is crucial for clinicians in making diagnoses and forecasting prognosis. Strategically fusing these two data modalities has great potential to improve the accuracy of machine learning models in clinical prediction tasks. However, the asynchronous and complementary nature of EHR and medical images presents unique challenges. Miss… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI-24

  50. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.