Showing 1–50 of 70 results for author: Huo, J

Searching in archive cs.
  1. arXiv:2408.11297

    cs.CV

    Making Large Vision Language Models to be Good Few-shot Learners

    Authors: Fan Liu, Wenwen Cai, Jian Huo, Chuanyi Zhang, Delong Chen, Jun Zhou

    Abstract: Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk lear…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2407.17418

    cs.CV

    3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

    Authors: Yanqi Bao, Tianyu Ding, Jing Huo, Yaoli Liu, Yuxin Li, Wenbin Li, Yang Gao, Jiebo Luo

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations. It can effectively transform multi-view images into explicit 3D Gaussian representations through efficient training, and achieve real-time rendering of novel views. This survey aims to analyze existing 3DGS-related works from multiple intersecting perspectives,…

    Submitted 24 July, 2024; originally announced July 2024.

  3. arXiv:2406.14826

    eess.IV cs.AI

    Self-supervised Brain Lesion Generation for Effective Data Augmentation of Medical Images

    Authors: Jiayu Huo, Sebastien Ourselin, Rachel Sparks

    Abstract: Accurate brain lesion delineation is important for planning neurosurgical treatment. Automatic brain lesion segmentation methods based on convolutional neural networks have demonstrated remarkable performance. However, neural network performance is constrained by the lack of large-scale well-annotated training datasets. In this manuscript, we propose a comprehensive framework to efficiently genera…

    Submitted 18 August, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 11 pages, 7 figures, 8 tables

  4. arXiv:2406.11193

    cs.CL

    MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

    Authors: Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu

    Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechan…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.04888

    cs.CV

    Zero-Shot Video Editing through Adaptive Sliding Score Distillation

    Authors: Lianghan Zhu, Yanqi Bao, Jing Huo, Jing Wu, Yu-Kun Lai, Wenbin Li, Yang Gao

    Abstract: The rapidly evolving field of Text-to-Video generation (T2V) has catalyzed renewed interest in controllable video editing research. While the application of editing prompts to guide diffusion model denoising has gained prominence, mirroring advancements in image editing, this noise-based inference process inherently compromises the original video's integrity, resulting in unintended over-editing a…

    Submitted 6 September, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  6. arXiv:2405.10316

    cs.CV cs.GR

    Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

    Authors: Zheng Gu, Shiyuan Yang, Jing Liao, Jing Huo, Yang Gao

    Abstract: Visual In-Context Learning (ICL) has emerged as a promising research area due to its capability to accomplish various tasks with limited example pairs through analogical reasoning. However, training-based visual ICL has limitations in its ability to generalize to unseen tasks and requires the collection of a diverse task dataset. On the other hand, existing methods in the inference-based visual IC…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://analogist2d.github.io

  7. arXiv:2404.10160

    cs.AI

    Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

    Authors: Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo

    Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics or yield overconfident and random outputs. We find that involving LLMs in role-playing scenarios boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debat…

    Submitted 16 August, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: The first three authors contributed equally to this work

  8. arXiv:2404.08016

    cs.LG

    ONNXPruner: ONNX-Based General Model Pruning Adapter

    Authors: Dongdong Ren, Wenbin Li, Tianyu Ding, Lei Wang, Qi Fan, Jing Huo, Hongbing Pan, Yang Gao

    Abstract: Recent advancements in model pruning have focused on developing new algorithms and improving upon benchmarks. However, the practical application of these algorithms across various models and platforms remains a significant challenge. To address this challenge, we propose ONNXPruner, a versatile pruning adapter designed for the ONNX format models. ONNXPruner streamlines the adaptation process acros…

    Submitted 10 April, 2024; originally announced April 2024.

  9. arXiv:2404.00563

    cs.CV

    Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

    Authors: Wenxiao Deng, Wenbin Li, Tianyu Ding, Lei Wang, Hongguang Zhang, Kuihua Huang, Jing Huo, Yang Gao

    Abstract: Dataset distillation has emerged as a promising approach in deep learning, enabling efficient training with small synthetic datasets derived from larger real ones. Particularly, distribution matching-based distillation methods attract attention thanks to their effectiveness and low computational cost. However, these methods face two primary limitations: the dispersed feature distribution within the…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  10. arXiv:2403.19425

    eess.IV cs.CV

    A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

    Authors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, et al. (33 additional authors not shown)

    Abstract: Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemi…

    Submitted 3 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  11. arXiv:2403.18211

    cs.CV cs.LG

    NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation

    Authors: Jingyang Huo, Yikai Wang, Xuelin Qian, Yun Wang, Chong Li, Jianfeng Feng, Yanwei Fu

    Abstract: Recent fMRI-to-image approaches mainly focused on associating fMRI signals with specific conditions of pre-trained diffusion models. These approaches, while producing high-quality images, capture only a limited aspect of the complex information in fMRI signals and offer little detailed control over image creation. In contrast, this paper proposes to directly modulate the generation process of diff…

    Submitted 17 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV 2024

  12. arXiv:2403.18198

    eess.IV cs.CV

    Generative Medical Segmentation

    Authors: Jiayu Huo, Xi Ouyang, Sébastien Ourselin, Rachel Sparks

    Abstract: Rapid advancements in medical image segmentation performance have been significantly driven by the development of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These models follow the discriminative pixel-wise classification learning paradigm and often have limited ability to generalize across diverse medical imaging datasets. In this manuscript, we introduce Generative Medi…

    Submitted 19 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  13. arXiv:2403.15901

    cs.AI cs.CV

    MatchSeg: Towards Better Segmentation via Reference Image Matching

    Authors: Jiayu Huo, Ruiqiang Xiao, Haotian Zheng, Yang Liu, Sebastien Ourselin, Rachel Sparks

    Abstract: Recently, automated medical image segmentation methods based on deep learning have achieved great success. However, they heavily rely on large annotated datasets, which are costly and time-consuming to acquire. Few-shot learning aims to overcome the need for annotated data by using a small labeled dataset, known as a support set, to guide predicting labels for new, unlabeled images, known as the q…

    Submitted 17 August, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: International Conference on Bioinformatics and Biomedicine (BIBM 2024)

  14. arXiv:2403.15647

    cs.CV

    RetiGen: A Framework for Generalized Retinal Diagnosis Using Multi-View Fundus Images

    Authors: Ze Chen, Gongyu Zhang, Jiayu Huo, Joan Nunez do Rio, Charalampos Komninos, Yang Liu, Rachel Sparks, Sebastien Ourselin, Christos Bergeles, Timothy Jackson

    Abstract: This study introduces a novel framework for enhancing domain generalization in medical imaging, specifically focusing on utilizing unlabelled multi-view colour fundus photographs. Unlike traditional approaches that rely on single-view imaging data and face challenges in generalizing across diverse clinical settings, our method leverages the rich information in the unlabelled multi-view imaging dat…

    Submitted 22 March, 2024; originally announced March 2024.

  15. arXiv:2403.12787

    cs.CV

    DDSB: An Unsupervised and Training-free Method for Phase Detection in Echocardiography

    Authors: Zhenyu Bu, Yang Liu, Jiayu Huo, Jingjing Peng, Kaini Wang, Guangquan Zhou, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

    Abstract: Accurate identification of End-Diastolic (ED) and End-Systolic (ES) frames is key for cardiac function assessment through echocardiography. However, traditional methods face several limitations: they require extensive amounts of data, extensive annotations by medical experts, significant training resources, and often lack robustness. Addressing these challenges, we proposed an unsupervised and tra…

    Submitted 19 March, 2024; originally announced March 2024.

  16. arXiv:2403.11229

    cs.CV

    Concatenate, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation

    Authors: Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao

    Abstract: Segment Anything Model (SAM) fine-tuning has shown remarkable performance in medical image segmentation in a fully supervised manner, but requires precise annotations. To reduce the annotation cost and maintain satisfactory performance, in this work, we leverage the capabilities of SAM for establishing semi-supervised medical image segmentation models. Rethinking the requirements of effectiveness,…

    Submitted 17 March, 2024; originally announced March 2024.

  17. arXiv:2403.10039

    cs.CV cs.AI

    Rethinking Low-quality Optical Flow in Unsupervised Surgical Instrument Segmentation

    Authors: Peiran Wu, Yang Liu, Jiayu Huo, Gongyu Zhang, Christos Bergeles, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

    Abstract: Video-based surgical instrument segmentation plays an important role in robot-assisted surgeries. Unlike supervised settings, unsupervised segmentation relies heavily on motion cues, which are challenging to discern due to the typically lower quality of optical flow in surgical footage compared to natural scenes. This presents a considerable burden for the advancement of unsupervised segmentation…

    Submitted 15 March, 2024; originally announced March 2024.

  18. arXiv:2402.15746

    cs.CV cs.AI cs.MM

    Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT

    Authors: Sixiao Zheng, Jingyang Huo, Yu Wang, Yanwei Fu

    Abstract: With the rise of short video platforms represented by TikTok, the trend of users expressing their creativity through photos and videos has increased dramatically. However, ordinary users lack the professional skills to produce high-quality videos using professional creation software. To meet the demand for intelligent and user-friendly video creation tools, we propose the Dynamic Visual Compositio…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: Project Page: https://sixiaozheng.github.io/IntelligentDirector/

  19. arXiv:2401.00496

    cs.CV cs.AI cs.LG

    SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

    Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

    Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segme…

    Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  20. arXiv:2311.00342

    cs.CV

    fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding

    Authors: Xuelin Qian, Yun Wang, Jingyang Huo, Jianfeng Feng, Yanwei Fu

    Abstract: The exploration of brain activity and its decoding from fMRI data has been a longstanding pursuit, driven by its potential applications in brain-computer interfaces, medical diagnostics, and virtual reality. Previous approaches have primarily focused on individual subject analysis, highlighting the need for a more universal and adaptable framework, which is the core motivation behind our work. In…

    Submitted 1 November, 2023; originally announced November 2023.

  21. arXiv:2309.16299

    cs.RO cs.HC cs.LG

    CasIL: Cognizing and Imitating Skills via a Dual Cognition-Action Architecture

    Authors: Zixuan Chen, Ze Ji, Shuyang Liu, Jing Huo, Yiyu Chen, Yang Gao

    Abstract: Enabling robots to effectively imitate expert skills in long-horizon tasks such as locomotion, manipulation, and more, poses a long-standing challenge. Existing imitation learning (IL) approaches for robots still grapple with sub-optimal performance in complex tasks. In this paper, we consider how this challenge can be addressed within the human cognitive priors. Heuristically, we extend the usual…

    Submitted 28 September, 2023; originally announced September 2023.

  22. arXiv:2308.13897

    cs.CV

    InsertNeRF: Instilling Generalizability into NeRF with HyperNet Modules

    Authors: Yanqi Bao, Tianyu Ding, Jing Huo, Wenbin Li, Yuxin Li, Yang Gao

    Abstract: Generalizing Neural Radiance Fields (NeRF) to new scenes is a significant challenge that existing approaches struggle to address without extensive modifications to the vanilla NeRF framework. We introduce InsertNeRF, a method for INStilling gEneRalizabiliTy into NeRF. By utilizing multiple plug-and-play HyperNet modules, InsertNeRF dynamically tailors NeRF's weights to specific reference scenes, trans…

    Submitted 24 March, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: This work was accepted at ICLR 2024

  23. arXiv:2308.09923

    cs.CR cs.AI cs.LG

    East: Efficient and Accurate Secure Transformer Framework for Inference

    Authors: Yuanchao Ding, Hua Guo, Yewei Guan, Weixin Liu, Jiarong Huo, Zhenyu Guan, Xiyong Zhang

    Abstract: Transformer has been successfully used in practical applications, such as ChatGPT, due to its powerful advantages. However, users' input is leaked to the model provider during the service. With people's attention to privacy, privacy-preserving Transformer inference is in demand for such services. Secure protocols for non-linear functions are crucial in privacy-preserving Transformer inference,…

    Submitted 19 August, 2023; originally announced August 2023.

  24. arXiv:2308.02908

    cs.CV

    Where and How: Mitigating Confusion in Neural Radiance Fields from Sparse Inputs

    Authors: Yanqi Bao, Yuxin Li, Jing Huo, Tianyu Ding, Xinyue Liang, Wenbin Li, Yang Gao

    Abstract: Neural Radiance Fields from Sparse input (NeRF-S) have shown great potential in synthesizing novel views with a limited number of observed viewpoints. However, due to the inherent limitations of sparse inputs and the gap between non-adjacent views, rendering results often suffer from over-fitting and foggy surfaces, a phenomenon we refer to as "CONFUSION" during volume rendering. In this paper, w…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted In Proceedings of the 31st ACM International Conference on Multimedia (MM' 23)

  25. arXiv:2307.01220

    eess.IV cs.CV

    ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance

    Authors: Jiayu Huo, Yang Liu, Xi Ouyang, Alejandro Granados, Sebastien Ourselin, Rachel Sparks

    Abstract: Accurately segmenting brain lesions in MRI scans is critical for providing patients with prognoses and neurological monitoring. However, the performance of CNN-based segmentation methods is constrained by the limited training set size. Advanced data augmentation is an effective strategy to improve the model's robustness. However, they often introduce intensity disparities between foreground and ba…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: 9 pages, 4 figures, 3 tables

  26. arXiv:2306.11510

    cs.CV

    Pushing the Limits of 3D Shape Generation at Scale

    Authors: Yu Wang, Xuelin Qian, Jingyang Huo, Tiejun Huang, Bo Zhao, Yanwei Fu

    Abstract: We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions. Through the adaptation of the Auto-Regressive model and the utilization of large language models, we have developed a remarkable model with an astounding 3.6 billion trainable parameters, establishing it as the largest 3D shape generation model to date, named Argus-3D. Our approach addresses the…

    Submitted 19 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Project page: https://argus-3d.github.io/

  27. arXiv:2305.17102

    cs.CV

    GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation

    Authors: Jingyang Huo, Qiang Sun, Boyan Jiang, Haitao Lin, Yanwei Fu

    Abstract: Most existing works solving the Room-to-Room VLN problem only utilize RGB images and do not consider local context around candidate views, which lack sufficient visual cues about the surrounding environment. Moreover, natural language contains complex semantic information, thus its correlations with visual inputs are hard to model merely with cross attention. In this paper, we propose GeoVLN, which learns…

    Submitted 2 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by CVPR 2023

  28. arXiv:2211.15486

    eess.IV cs.CV

    MAPPING: Model Average with Post-processing for Stroke Lesion Segmentation

    Authors: Jiayu Huo, Liyun Chen, Yang Liu, Maxence Boels, Alejandro Granados, Sebastien Ourselin, Rachel Sparks

    Abstract: Accurate stroke lesion segmentation plays a pivotal role in stroke rehabilitation research, to provide lesion shape and size information which can be used for quantification of the extent of the stroke and to assess treatment efficacy. Recently, automatic segmentation algorithms using deep learning techniques have been developed and achieved promising results. In this report, we present our stroke…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: Challenge Report, 1st place in 2022 MICCAI ATLAS Challenge

  29. arXiv:2211.14516

    cs.CV

    A Unified Framework for Contrastive Learning from a Perspective of Affinity Matrix

    Authors: Wenbin Li, Meihao Kong, Xuesong Yang, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo

    Abstract: In recent years, a variety of contrastive learning based unsupervised visual representation learning methods have been designed and achieved great success in many visual tasks. Generally, these methods can be roughly classified into four categories: (1) standard contrastive methods with an InfoNCE-like loss, such as MoCo and SimCLR; (2) non-contrastive methods with only positive pairs, such as BYO…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: 12 pages

  30. arXiv:2210.03591

    cs.CV

    Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery

    Authors: Wenbin Li, Zhichen Fan, Jing Huo, Yang Gao

    Abstract: Novel class discovery (NCD) aims at learning a model that transfers the common knowledge from a class-disjoint labelled dataset to another unlabelled dataset and discovers new classes (clusters) within it. Many methods, as well as elaborate training pipelines and appropriate objectives, have been proposed and considerably boosted performance on NCD tasks. Despite all this, we find that the existin…

    Submitted 23 March, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to CVPR 2023

  31. arXiv:2208.09612

    cs.IR

    AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments

    Authors: Huadai Liu, Wenqiang Xu, Xuan Lin, Jingjing Huo, Hong Chen, Zhou Zhao

    Abstract: Argument mining aims to detect all possible argumentative components and identify their relationships automatically. As a thriving task in natural language processing, there has been a large amount of corpus for academic study and application development in this field. However, the research in this area is still constrained by the inherent limitations of existing datasets. Specifically, all the pu…

    Submitted 30 May, 2024; v1 submitted 20 August, 2022; originally announced August 2022.

  32. arXiv:2208.03988

    cs.SE

    Fuzzing Microservices: A Series of User Studies in Industry on Industrial Systems with EvoMaster

    Authors: Man Zhang, Andrea Arcuri, Yonggang Li, Yang Liu, Kaiming Xue, Zhao Wang, Jian Huo, Weiwei Huang

    Abstract: With several microservice architectures comprising thousands of web services, used to serve 630 million customers, companies like Meituan face several challenges in the verification and validation of their software. This paper reports on our experience of integrating EvoMaster (a search-based white-box fuzzer) in the testing processes at Meituan over almost 2 years. Two user studies were carrie…

    Submitted 22 August, 2024; v1 submitted 8 August, 2022; originally announced August 2022.

  33. arXiv:2208.03203

    cs.CV eess.IV

    Brain Lesion Synthesis via Progressive Adversarial Variational Auto-Encoder

    Authors: Jiayu Huo, Vejay Vakharia, Chengyuan Wu, Ashwini Sharan, Andrew Ko, Sebastien Ourselin, Rachel Sparks

    Abstract: Laser interstitial thermal therapy (LITT) is a novel minimally invasive treatment that is used to ablate intracranial structures to treat mesial temporal lobe epilepsy (MTLE). Region of interest (ROI) segmentation before and after LITT would enable automated lesion quantification to objectively assess treatment efficacy. Deep learning techniques, such as convolutional neural networks (CNNs), are st…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: 11 pages, 4 figures, accepted by International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI 2022)

  34. arXiv:2203.13802

    cs.CV eess.IV

    Playing Lottery Tickets in Style Transfer Models

    Authors: Meihao Kong, Jing Huo, Wenbin Li, Jing Wu, Yu-Kun Lai, Yang Gao

    Abstract: Style transfer has achieved great success and attracted a wide range of attention from both academic and industrial communities due to its flexible application scenarios. However, the dependence on a pretty large VGG-based autoencoder leads to existing style transfer models having high parameter complexities, which limits their applications on resource-constrained devices. Compared with many other…

    Submitted 10 April, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

  35. Learning Hierarchical Attention for Weakly-supervised Chest X-Ray Abnormality Localization and Diagnosis

    Authors: Xi Ouyang, Srikrishna Karanam, Ziyan Wu, Terrence Chen, Jiayu Huo, Xiang Sean Zhou, Qian Wang, Jie-Zhi Cheng

    Abstract: We consider the problem of abnormality localization for clinical applications. While deep learning has driven much recent progress in medical imaging, many clinical challenges are not fully addressed, limiting its broader usage. While recent methods report high diagnostic accuracies, physicians have concerns trusting these algorithm results for diagnostic decision-making purposes because of a gene…

    Submitted 22 December, 2021; originally announced December 2021.

    Journal ref: IEEE Transactions on Medical Imaging 2021

  36. arXiv:2110.01303

    cs.LG cs.CV

    Incremental Class Learning using Variational Autoencoders with Similarity Learning

    Authors: Jiahao Huo, Terence L. van Zyl

    Abstract: Catastrophic forgetting in neural networks during incremental learning remains a challenging problem. Previous research investigated catastrophic forgetting in fully connected networks, with some earlier work exploring activation functions and learning algorithms. Applications of neural networks have been extended to include similarity learning. Understanding how similarity learning loss functions…

    Submitted 14 March, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

  37. arXiv:2109.04898

    cs.CV

    LibFewShot: A Comprehensive Library for Few-shot Learning

    Authors: Wenbin Li, Ziyi Wang, Xuesong Yang, Chuanqi Dong, Pinzhuo Tian, Tiexin Qin, Jing Huo, Yinghuan Shi, Lei Wang, Yang Gao, Jiebo Luo

    Abstract: Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years. Some recent studies implicitly show that many generic techniques or "tricks", such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method. Moreover, different w…

    Submitted 15 September, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: 17 pages

  38. arXiv:2107.12219

    cs.RO

    Integer-Programming-Based Narrow-Passage Multi-Robot Path Planning with Effective Heuristics

    Authors: Jiaxi Huo, Ronghao Zheng, Meiqin Liu, Senlin Zhang

    Abstract: We study optimal Multi-robot Path Planning (MPP) on graphs, in order to improve the efficiency of multi-robot system (MRS) in the warehouse-like environment. We propose a novel algorithm, OMRPP (One-way Multi-robot Path Planning) based on Integer programming (IP) method. We focus on reducing the cost caused by a set of robots moving from their initial configuration to goal configuration in the war…

    Submitted 26 July, 2021; originally announced July 2021.

  39. arXiv:2107.10419

    cs.CV

    Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings

    Authors: Wenbin Li, Xuesong Yang, Meihao Kong, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo

    Abstract: Contrastive self-supervised learning (SSL) methods, such as MoCo and SimCLR, have achieved great success in unsupervised visual representation learning. They rely on a large number of negative pairs and thus require either large memory banks or large batches. Some recent non-contrastive SSL methods, such as BYOL and SimSiam, attempt to discard negative pairs and have also shown remarkable performa…

    Submitted 23 August, 2023; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Accepted to Transactions on Machine Learning Research (TMLR) 2023

  40. arXiv:2105.07715

    cs.CV

    Cross-Modality Brain Tumor Segmentation via Bidirectional Global-to-Local Unsupervised Domain Adaptation

    Authors: Kelei He, Wen Ji, Tao Zhou, Zhuoyuan Li, Jing Huo, Xin Zhang, Yang Gao, Dinggang Shen, Bing Zhang, Junfeng Zhang

    Abstract: Accurate segmentation of brain tumors from multi-modal Magnetic Resonance (MR) images is essential in brain tumor diagnosis and treatment. However, due to the existence of domain shifts among different modalities, the performance of networks decreases dramatically when training on one modality and performing on another, e.g., train on T1 image while performing on T2 image, which is often required…

    Submitted 17 May, 2021; originally announced May 2021.

  41. arXiv:2104.09700

    q-fin.PR cs.LG q-fin.CP q-fin.PM q-fin.TR

    Stock Market Trend Analysis Using Hidden Markov Model and Long Short Term Memory

    Authors: Mingwen Liu, Junbang Huo, Yulin Wu, Jinge Wu

    Abstract: This paper intends to apply the Hidden Markov Model to the stock market and make predictions. Moreover, four different methods of improvement, which are GMM-HMM, XGB-HMM, GMM-HMM+LSTM and XGB-HMM+LSTM, will be discussed later with the results of experiment respectively. After that we will analyze the pros and cons of different models. And finally, one of the best will be used into stock market f…

    Submitted 19 April, 2021; originally announced April 2021.

  42. arXiv:2010.09482  [pdf, other]

    cs.CL cs.AI

    Diving Deep into Context-Aware Neural Machine Translation

    Authors: Jingjing Huo, Christian Herold, Yingbo Gao, Leonard Dahlmann, Shahram Khadivi, Hermann Ney

    Abstract: Context-aware neural machine translation (NMT) is a promising direction to improve the translation quality by making use of the additional context, e.g., document-level translation, or having meta-information. Although there exist various architectures and analyses, the effectiveness of different context-aware NMT models is not well explored yet. This paper analyzes the performance of document-lev…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted at 5th Conference on Machine Translation (WMT20)

  43. arXiv:2010.00246  [pdf, other]

    cs.CV cs.LG cs.MM

    CariMe: Unpaired Caricature Generation with Multiple Exaggerations

    Authors: Zheng Gu, Chuanqi Dong, Jing Huo, Wenbin Li, Yang Gao

    Abstract: Caricature generation aims to translate real photos into caricatures with artistic styles and shape exaggerations while maintaining the identity of the subject. Different from generic image-to-image translation, drawing a caricature automatically is a more challenging task due to the existence of various spatial deformations. Previous caricature generation methods are obsessed with predicting…

    Submitted 1 October, 2020; originally announced October 2020.

  44. arXiv:2007.09344  [pdf, other]

    cs.CV

    Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition

    Authors: Wen Ji, Kelei He, Jing Huo, Zheng Gu, Yang Gao

    Abstract: Caricature attributes provide distinctive facial features to help research in Psychology and Neuroscience. However, unlike facial photo attribute datasets, which have a large number of annotated images, annotations of caricature attributes are rare. To facilitate research in attribute learning of caricatures, we propose a caricature attribute dataset, namely WebCariA. Moreover, to utilize mode…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: This paper has been accepted by ECCV 2020

  45. arXiv:2006.05713  [pdf, other]

    cs.CV

    Unique Faces Recognition in Videos

    Authors: Jiahao Huo, Terence L van Zyl

    Abstract: This paper tackles face recognition in videos employing metric learning methods and similarity ranking models. The paper compares the use of the Siamese network with contrastive loss and Triplet Network with triplet loss implementing the following architectures: Google/Inception architecture, 3D Convolutional Network (C3D), and a 2-D Long short-term memory (LSTM) Recurrent Neural Network. We make…

    Submitted 10 June, 2020; originally announced June 2020.

    Comments: Paper was accepted into Fusion 2020 conference but will only be published after the virtual conference in July 2020. 7 pages long

  46. arXiv:2005.11926  [pdf, other]

    eess.IV cs.CV

    mr2NST: Multi-Resolution and Multi-Reference Neural Style Transfer for Mammography

    Authors: Sheng Wang, Jiayu Huo, Xi Ouyang, Jifei Che, Xuhua Ren, Zhong Xue, Qian Wang, Jie-Zhi Cheng

    Abstract: Computer-aided diagnosis with deep learning techniques has been shown to be helpful for the diagnosis of mammography in many clinical studies. However, the image styles of different vendors are very distinctive, and there may exist a domain gap among different vendors that could potentially compromise the universal applicability of one deep learning model. In this study, we explicitly address st…

    Submitted 25 May, 2020; originally announced May 2020.

  47. arXiv:2005.10777  [pdf, other]

    cs.CV cs.MM

    Manifold Alignment for Semantically Aligned Style Transfer

    Authors: Jing Huo, Shiyin Jin, Wenbin Li, Jing Wu, Yu-Kun Lai, Yinghuan Shi, Yang Gao

    Abstract: Most existing style transfer methods follow the assumption that styles can be represented with global statistics (e.g., Gram matrices or covariance matrices), and thus address the problem by forcing the output and style images to have similar global statistics. An alternative is the assumption of local style patterns, where algorithms are designed to swap similar local features of content and styl…

    Submitted 2 September, 2021; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: 9 pages

    Journal ref: ICCV 2021

  48. arXiv:2005.10089  [pdf, other]

    eess.AS cs.CL cs.SD

    Investigation of Large-Margin Softmax in Neural Language Modeling

    Authors: Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney

    Abstract: To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods are developed and widely applied in the face recognition community. The introduction of the large-margin concept into the softmax is reported to have good properties such as enhanced discriminative power, less overfitting and well-defined geometric intuitions. Nowadays, l…

    Submitted 21 April, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Proceedings of INTERSPEECH 2020

  49. arXiv:2005.09212  [pdf, other]

    eess.IV cs.CV

    A Self-ensembling Framework for Semi-supervised Knee Cartilage Defects Assessment with Dual-Consistency

    Authors: Jiayu Huo, Liping Si, Xi Ouyang, Kai Xuan, Weiwu Yao, Zhong Xue, Qian Wang, Dinggang Shen, Lichi Zhang

    Abstract: Knee osteoarthritis (OA) is one of the most common musculoskeletal disorders and requires early-stage diagnosis. Deep convolutional neural networks have recently achieved great success in the computer-aided diagnosis field. However, the construction of deep learning models usually requires large amounts of annotated data, which are generally costly to obtain. In this paper, we propose a novel approach…

    Submitted 12 October, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: Accepted by International Workshop on PRedictive Intelligence In MEdicine, 2020

  50. arXiv:2005.07462  [pdf, other]

    eess.IV cs.CV cs.LG

    MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling

    Authors: Kelei He, Chunfeng Lian, Ehsan Adeli, Jing Huo, Yang Gao, Bing Zhang, Junfeng Zhang, Dinggang Shen

    Abstract: Fully convolutional networks (FCNs), including UNet and VNet, are widely used network architectures for semantic segmentation in recent studies. However, a conventional FCN is typically trained with the cross-entropy or Dice loss, which only calculates the error between predictions and ground-truth labels for pixels individually. This often results in non-smooth neighborhoods in the predicted segmenta…

    Submitted 23 January, 2021; v1 submitted 15 May, 2020; originally announced May 2020.