-
MAISI: Medical AI for Synthetic Imaging
Authors:
Pengfei Guo,
Can Zhao,
Dong Yang,
Ziyue Xu,
Vishwesh Nath,
Yucheng Tang,
Benjamin Simon,
Mason Belue,
Stephanie Harmon,
Baris Turkbey,
Daguang Xu
Abstract:
Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns. This paper introduces the Medical AI for Synthetic Imaging (MAISI), an innovative approach using the diffusion model to generate synthetic 3D computed tomography (CT) images to address those challenges. MAISI leverages the foundation volume compression network and the latent diffusion model to produce high-resolution CT images (up to a landmark volume dimension of 512 x 512 x 768) with flexible volume dimensions and voxel spacing. By incorporating ControlNet, MAISI can process organ segmentation, including 127 anatomical structures, as additional conditions and enables the generation of accurately annotated synthetic images that can be used for various downstream tasks. Our experimental results show that MAISI's capabilities in generating realistic, anatomically accurate images for diverse regions and conditions reveal its promising potential to mitigate challenges using synthetic data.
Submitted 13 September, 2024;
originally announced September 2024.
-
Planning In Natural Language Improves LLM Search For Code Generation
Authors:
Evan Wang,
Federico Cassano,
Catherine Wu,
Yunfeng Bai,
Will Song,
Vaskar Nath,
Ziwen Han,
Sean Hendryx,
Summer Yue,
Hugh Zhang
Abstract:
While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing component is a lack of diverse LLM outputs, leading to inefficient search due to models repeatedly sampling highly similar, yet incorrect generations. We empirically demonstrate that this lack of diversity can be mitigated by searching over candidate plans for solving a problem in natural language. Based on this insight, we propose PLANSEARCH, a novel search algorithm which shows strong results across HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive coding). PLANSEARCH generates a diverse set of observations about the problem and then uses these observations to construct plans for solving the problem. By searching over plans in natural language rather than directly over code solutions, PLANSEARCH explores a significantly more diverse range of potential solutions compared to baseline search methods. Using PLANSEARCH on top of Claude 3.5 Sonnet achieves a state-of-the-art pass@200 of 77.0% on LiveCodeBench, outperforming both the best score achieved without search (pass@1 = 41.4%) and using standard repeated sampling (pass@200 = 60.6%). Finally, we show that, across all models, search algorithms, and benchmarks analyzed, we can accurately predict performance gains due to search as a direct function of the diversity over generated ideas.
Submitted 5 September, 2024;
originally announced September 2024.
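The abstract above reports pass@1 and pass@200 scores but does not say how they are computed. As a minimal sketch, assuming the widely used unbiased estimator 1 - C(n-c, k) / C(n, k) (an assumption, not confirmed by this listing), pass@k can be estimated from n sampled solutions of which c pass the tests. The function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled solutions, c of which are correct.

    Assumption: this is the commonly used unbiased estimator, not
    necessarily the exact procedure used in the paper.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that k samples drawn
    # without replacement from the n generations are all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples of which 1 is correct, `pass_at_k(2, 1, 1)` evaluates to 0.5. Larger k can only raise the estimate, which is consistent with pass@200 exceeding pass@1 in the reported numbers.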
-
A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation
Authors:
Yufan He,
Pengfei Guo,
Yucheng Tang,
Andriy Myronenko,
Vishwesh Nath,
Ziyue Xu,
Dong Yang,
Can Zhao,
Daguang Xu,
Wenqi Li
Abstract:
Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed varying evaluation pipelines, resulting in conflicting outcomes that obscure a clear understanding of SAM2's capabilities and potential applications. We briefly review existing benchmarks and point out that the SAM2 paper clearly outlines a zero-shot evaluation pipeline, which simulates user clicks iteratively for up to eight iterations. We reproduced this interactive annotation simulation on 3D CT datasets and provide the results and code at https://github.com/Project-MONAI/VISTA. Our findings reveal that directly applying SAM2 to 3D medical imaging in a zero-shot manner is far from satisfactory. It is prone to generating false positives when foreground objects disappear, and annotating more slices cannot fully offset this tendency. For smaller single-connected objects like the kidney and aorta, SAM2 performs reasonably well, but for most organs it is still far behind state-of-the-art 3D annotation methods. More research and innovation are needed for the 3D medical imaging community to use SAM2 correctly.
Submitted 20 August, 2024;
originally announced August 2024.
-
Learning Goal-Conditioned Representations for Language Reward Models
Authors:
Vaskar Nath,
Dylan Slack,
Jeff Da,
Yuntao Ma,
Hugh Zhang,
Spencer Whitehead,
Sean Hendryx
Abstract:
Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear how improved representation learning can benefit reinforcement learning from human feedback (RLHF) on language models (LMs). In this work, we propose training reward models (RMs) in a contrastive, goal-conditioned fashion by increasing the representation similarity of future states along sampled preferred trajectories and decreasing the similarity along randomly sampled dispreferred trajectories. This objective significantly improves RM performance by up to 0.09 AUROC across challenging benchmarks, such as MATH and GSM8k. These findings extend to general alignment as well -- on the Helpful-Harmless dataset, we observe a 2.3% increase in accuracy. Beyond improving reward model performance, we show this way of training RM representations enables improved steerability because it allows us to evaluate the likelihood of an action achieving a particular goal-state (e.g., whether a solution is correct or helpful). Leveraging this insight, we find that we can filter up to 55% of generated tokens during majority voting by discarding trajectories likely to end up in an "incorrect" state, which leads to significant cost savings. We additionally find that these representations can perform fine-grained control by conditioning on desired future goal-states. For example, we show that steering a Llama 3 model towards helpful generations with our approach improves helpfulness by 9.6% over a supervised-fine-tuning trained baseline. Similarly, steering the model towards complex generations improves complexity by 21.6% over the baseline. Overall, we find that training RMs in this contrastive, goal-conditioned fashion significantly improves performance and enables model steerability.
Submitted 18 July, 2024;
originally announced July 2024.
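The abstract above describes discarding trajectories that are likely to end in an "incorrect" state before majority voting. The following is a simplified sketch of that idea, not the authors' implementation: it assumes each candidate already carries a scalar score for reaching the "correct" goal-state (how that score is obtained from the reward model is not specified in this listing), and it filters completed answers rather than pruning partial generations. The names `filtered_majority_vote`, `candidates`, and `threshold` are hypothetical.

```python
from collections import Counter


def filtered_majority_vote(candidates, threshold):
    """Majority vote over answers whose goal-state score clears a threshold.

    candidates: list of (answer, score) pairs, where score is an assumed
    estimate of how likely the trajectory ends in a "correct" state.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    # Discard trajectories judged unlikely to end in a correct state.
    kept = [answer for answer, score in candidates if score >= threshold]
    # Fall back to plain majority voting if the filter removes everything.
    if not kept:
        kept = [answer for answer, _ in candidates]
    return Counter(kept).most_common(1)[0][0]


samples = [("41", 0.2), ("41", 0.1), ("41", 0.3), ("42", 0.9), ("42", 0.8)]
unfiltered = filtered_majority_vote(samples, threshold=0.0)  # "41" wins 3 to 2
filtered = filtered_majority_vote(samples, threshold=0.5)    # only "42" survives
```

In this toy input the unfiltered vote picks the more frequent but low-scoring answer, while the filtered vote picks the answer backed by high-scoring trajectories. The cost savings mentioned in the abstract would come from stopping low-scoring trajectories early, which this sketch does not model.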
-
HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization
Authors:
Yucheng Tang,
Yufan He,
Vishwesh Nath,
Pengfei Guo,
Ruining Deng,
Tianyuan Yao,
Quan Liu,
Can Cui,
Mengmeng Yin,
Ziyue Xu,
Holger Roth,
Daguang Xu,
Haichun Yang,
Yuankai Huo
Abstract:
In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000 × 70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to an end-to-end learning fashion with 1) a large (4K) resolution base patch for elevated visual information inclusion and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently model the rich information from the 4K input. To the best of our knowledge, HoloHisto presents the first holistic approach for gigapixel resolution WSI segmentation, supporting direct I/O of complete WSIs and their corresponding gigapixel masks. Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively, for advancing computational capabilities. To facilitate efficient 4K resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid. To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. From the results, HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models.
Submitted 3 July, 2024;
originally announced July 2024.
-
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions
Authors:
Hareem Nisar,
Syed Muhammad Anwar,
Zhifan Jiang,
Abhijeet Parida,
Ramon Sanchez-Jacob,
Vishwesh Nath,
Holger R. Roth,
Marius George Linguraru
Abstract:
Large vision language models (VLMs) have progressed incredibly from research to applicability for general-purpose use cases. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently limited by well-known challenges that exist in the large language model space. Hallucinations and imprecision in responses can lead to misdiagnosis, which currently hinders the clinical adaptability of VLMs. To create precise, user-friendly models in healthcare, we propose D-Rax -- a domain-specific, conversational, radiologic assistance tool that can be used to gain insights about a particular radiologic image. In this study, we enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting, offering comprehensive insights from medical imaging and aiding in the formulation of accurate diagnoses. D-Rax is achieved by fine-tuning the LLaVA-Med architecture on our curated enhanced instruction-following data, comprising images, instructions, as well as disease diagnosis and demographic predictions derived from MIMIC-CXR imaging data, CXR-related visual question answer (VQA) pairs, and predictive outcomes from multiple expert AI models. We observe statistically significant improvement in responses when evaluated for both open and closed-ended conversations. Leveraging the power of state-of-the-art diagnostic models combined with VLMs, D-Rax empowers clinicians to interact with medical images using natural language, which could potentially streamline their decision-making process, enhance diagnostic accuracy, and conserve their time.
Submitted 2 August, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography
Authors:
Yufan He,
Pengfei Guo,
Yucheng Tang,
Andriy Myronenko,
Vishwesh Nath,
Ziyue Xu,
Dong Yang,
Can Zhao,
Benjamin Simon,
Mason Belue,
Stephanie Harmon,
Baris Turkbey,
Daguang Xu,
Wenqi Li
Abstract:
Medical image segmentation is a core component of precision medicine, and 3D computed tomography (CT) is one of the most important imaging techniques. A highly accurate and clinically applicable segmentation foundation model will greatly facilitate clinicians and researchers using CT images. Although existing foundation models have attracted great interest, none are adequate for 3D CT, either because they lack accurate automatic segmentation for large cohort analysis or the ability to segment novel classes. An ideal segmentation solution should possess two features: accurate out-of-the-box performance covering major organ classes, and effective adaptation or zero-shot ability to novel structures. To achieve this goal, we introduce Versatile Imaging SegmenTation and Annotation model (VISTA3D). VISTA3D is trained systematically on 11454 volumes and provides accurate out-of-the-box segmentation for 127 common types of human anatomical structures and various lesions. Additionally, VISTA3D supports 3D interactive segmentation, allowing convenient editing of automatic results and achieving state-of-the-art annotation results on unseen classes. The novel model design and training recipe represent a promising step toward developing a versatile medical image foundation model and will serve as a valuable foundation for CT image analysis. Code and model weights are available at https://github.com/Project-MONAI/VISTA
Submitted 7 August, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Authors:
Quan Liu,
Ruining Deng,
Can Cui,
Tianyuan Yao,
Vishwesh Nath,
Yucheng Tang,
Yuankai Huo
Abstract:
Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing accompanying textual pathology information. mTREE innovatively combines the localization of key areas (global-to-local) and the development of a WSI-level image-text representation (local-to-global) into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: firstly, functioning as an attention map to accurately identify key areas, and secondly, acting as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses in two image-related tasks: classification and survival prediction, showcasing its remarkable superiority over baselines.
Submitted 28 May, 2024;
originally announced May 2024.
-
Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training
Authors:
Jeya Maria Jose Valanarasu,
Yucheng Tang,
Dong Yang,
Ziyue Xu,
Can Zhao,
Wenqi Li,
Vishal M. Patel,
Bennett Landman,
Daguang Xu,
Yufan He,
Vishwesh Nath
Abstract:
Harnessing the power of pre-training on large-scale datasets like ImageNet forms a fundamental building block for the progress of representation learning-driven solutions in computer vision. Medical images are inherently different from natural images as they are acquired in the form of many modalities (CT, MR, PET, Ultrasound etc.) and contain granulated information like tissue, lesion, organs etc. These characteristics of medical images require special attention towards learning features representative of local context. In this work, we focus on designing an effective pre-training framework for 3D radiology images. First, we propose a new masking strategy called local masking where the masking is performed across channel embeddings instead of tokens to improve the learning of local feature representations. We combine this with classical low-level perturbations like adding noise and downsampling to further enable low-level representation learning. To this end, we introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. Additionally, we also devise a cross-modal contrastive loss (CMCL) to accommodate the pre-training of multiple modalities in a single framework. We curate a large-scale dataset to enable pre-training of 3D medical radiology images (MRI and CT). The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance. Notably, our proposed method tops the public test leaderboard of BTCV multi-organ segmentation challenge.
Submitted 31 July, 2023;
originally announced July 2023.
-
COLosSAL: A Benchmark for Cold-start Active Learning for 3D Medical Image Segmentation
Authors:
Han Liu,
Hao Li,
Xing Yao,
Yubo Fan,
Dewei Hu,
Benoit Dawant,
Vishwesh Nath,
Zhoubing Xu,
Ipek Oguz
Abstract:
Medical image segmentation is a critical task in medical image analysis. In recent years, deep learning based approaches have shown exceptional performance when trained on a fully-annotated dataset. However, data annotation is often a significant bottleneck, especially for 3D medical images. Active learning (AL) is a promising solution for efficient annotation but requires an initial set of labeled samples to start active selection. When the entire data pool is unlabeled, how do we select the samples to annotate as our initial set? This is also known as the cold-start AL, which permits only one chance to request annotations from experts without access to previously annotated data. Cold-start AL is highly relevant in many practical scenarios but has been under-explored, especially for 3D medical segmentation tasks requiring substantial annotation effort. In this paper, we present a benchmark named COLosSAL by evaluating six cold-start AL strategies on five 3D medical image segmentation tasks from the public Medical Segmentation Decathlon collection. We perform a thorough performance analysis and explore important open questions for cold-start AL, such as the impact of budget on different strategies. Our results show that cold-start AL is still an unsolved problem for 3D segmentation tasks but some important trends have been observed. The code repository, data partitions, and baseline results for the complete benchmark are publicly available at https://github.com/MedICL-VU/COLosSAL.
Submitted 22 July, 2023;
originally announced July 2023.
-
Robust Fiber ODF Estimation Using Deep Constrained Spherical Deconvolution for Diffusion MRI
Authors:
Tianyuan Yao,
Francois Rheault,
Leon Y Cai,
Vishwesh Nath,
Zuhayr Asad,
Nancy Newlin,
Can Cui,
Ruining Deng,
Karthik Ramadass,
Andrea Shafer,
Susan Resnick,
Kurt Schilling,
Bennett A. Landman,
Yuankai Huo
Abstract:
Diffusion-weighted magnetic resonance imaging (DW-MRI) is a critical imaging method for capturing and modeling tissue microarchitecture at a millimeter scale. A common practice to model the measured DW-MRI signal is via the fiber orientation distribution function (fODF). This function is the essential first step for the downstream tractography and connectivity analyses. With recent advances in data sharing, large-scale multi-site DW-MRI datasets are being made available for multi-site studies. However, measurement variabilities (e.g., inter- and intra-site variability, hardware performance, and sequence design) are inevitable during the acquisition of DW-MRI. Most existing model-based methods (e.g., constrained spherical deconvolution (CSD)) and learning-based methods (e.g., deep learning (DL)) do not explicitly consider such variabilities in fODF modeling, which consequently leads to inferior performance on multi-site and/or longitudinal diffusion studies. In this paper, we propose a novel data-driven deep constrained spherical deconvolution method to explicitly constrain the scan-rescan variabilities for a more reproducible and robust estimation of brain microstructure from repeated DW-MRI scans. Specifically, the proposed method introduces a new 3D volumetric scanner-invariant regularization scheme during the fODF estimation. We study the Human Connectome Project (HCP) young adults test-retest group as well as the MASiVar dataset (with inter- and intra-site scan/rescan data). The Baltimore Longitudinal Study of Aging (BLSA) dataset is employed for external validation. From the experimental results, the proposed data-driven framework outperforms the existing benchmarks in repeated fODF estimation. The proposed method is also assessed on downstream connectivity analysis and shows increased performance in distinguishing subjects with different biomarkers.
Submitted 5 June, 2023;
originally announced June 2023.
-
DeepEdit: Deep Editable Learning for Interactive Segmentation of 3D Medical Images
Authors:
Andres Diaz-Pinto,
Pritesh Mehta,
Sachidanand Alle,
Muhammad Asad,
Richard Brown,
Vishwesh Nath,
Alvin Ihsani,
Michela Antonelli,
Daniel Palkovics,
Csaba Pinter,
Ron Alkalay,
Steve Pieper,
Holger R. Roth,
Daguang Xu,
Prerna Dogra,
Tom Vercauteren,
Andrew Feng,
Abood Quraini,
Sebastien Ourselin,
M. Jorge Cardoso
Abstract:
Automatic segmentation of medical images is a key step for diagnostic and interventional tasks. However, achieving this requires large amounts of annotated volumes, which can be a tedious and time-consuming task for expert annotators. In this paper, we introduce DeepEdit, a deep learning-based method for volumetric medical image annotation that allows automatic and semi-automatic segmentation, and click-based refinement. DeepEdit combines the power of two methods: a non-interactive (i.e. automatic segmentation using nnU-Net, UNET or UNETR) and an interactive segmentation method (i.e. DeepGrow), into a single deep learning model. It allows easy integration of uncertainty-based ranking strategies (i.e. aleatoric and epistemic uncertainty computation) and active learning. We propose and implement a method for training DeepEdit by using standard training combined with user interaction simulation. Once trained, DeepEdit allows clinicians to quickly segment their datasets by using the algorithm in auto segmentation mode or by providing clicks via a user interface (i.e. 3D Slicer, OHIF). We show the value of DeepEdit through evaluation on the PROSTATEx dataset for prostate/prostatic lesions and the Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) dataset for abdominal CT segmentation, using state-of-the-art network architectures as a baseline for comparison. DeepEdit could reduce the time and effort of annotating 3D medical images compared to DeepGrow alone. Source code is available at https://github.com/Project-MONAI/MONAILabel
Submitted 17 May, 2023;
originally announced May 2023.
-
Fair Federated Medical Image Segmentation via Client Contribution Estimation
Authors:
Meirui Jiang,
Holger R Roth,
Wenqi Li,
Dong Yang,
Can Zhao,
Vishwesh Nath,
Daguang Xu,
Qi Dou,
Ziyue Xu
Abstract:
How to ensure fairness is an important topic in federated learning (FL). Recent studies have investigated how to reward clients based on their contribution (collaboration fairness), and how to achieve uniformity of performance across clients (performance fairness). Despite achieving progress on either one, we argue that it is critical to consider them together, in order to engage and motivate more diverse clients to join FL and derive a high-quality global model. In this work, we propose a novel method to optimize both types of fairness simultaneously. Specifically, we propose to estimate client contribution in gradient and data space. In gradient space, we monitor the gradient direction differences of each client with respect to others. In data space, we measure the prediction error on client data using an auxiliary model. Based on this contribution estimation, we propose an FL method, federated training via contribution estimation (FedCE), i.e., using estimation as global model aggregation weights. We have theoretically analyzed our method and empirically evaluated it on two real-world medical datasets. The effectiveness of our approach has been validated with significant performance improvements, better collaboration fairness, better performance fairness, and comprehensive analytical studies.
Submitted 29 March, 2023;
originally announced March 2023.
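The abstract above says FedCE uses the estimated client contributions as global model aggregation weights. A minimal sketch of that aggregation step follows, under stated assumptions: the contribution scores are taken as given non-negative numbers (the abstract does not say how the gradient-space and data-space estimates are combined), they are normalized to sum to one, and model parameters are represented as flat lists of floats. The function name `aggregate_by_contribution` is hypothetical.

```python
def aggregate_by_contribution(client_params, contributions):
    """Weighted average of client parameters using contribution scores.

    client_params: list of equal-length lists of floats, one per client.
    contributions: one non-negative score per client (assumed to be given;
    the estimation procedure itself is not sketched here).
    """
    if len(client_params) != len(contributions) or not client_params:
        raise ValueError("need one contribution score per client")
    if any(c < 0 for c in contributions):
        raise ValueError("contribution scores must be non-negative")
    total = sum(contributions)
    if total <= 0:
        raise ValueError("at least one contribution score must be positive")
    # Normalize scores so the aggregation weights sum to one (an assumption).
    weights = [c / total for c in contributions]
    num_params = len(client_params[0])
    # Each global parameter is the weighted sum of the clients' values.
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(num_params)
    ]


# Two clients; the second has three times the estimated contribution.
global_params = aggregate_by_contribution([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

With equal contribution scores this reduces to a plain unweighted average of the client parameters, so the scores are the only thing that moves the global model toward particular clients.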
-
A Unified Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Authors:
Tianyuan Yao,
Nancy Newlin,
Praitayini Kanakaraj,
Vishwesh Nath,
Leon Y Cai,
Karthik Ramadass,
Kurt Schilling,
Bennett A. Landman,
Yuankai Huo
Abstract:
Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relies on various middle representations, such as simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.
Submitted 29 January, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples
Authors:
Jingwei Sun,
Ziyue Xu,
Dong Yang,
Vishwesh Nath,
Wenqi Li,
Can Zhao,
Daguang Xu,
Yiran Chen,
Holger R. Roth
Abstract:
Federated learning is a popular collaborative learning approach that enables clients to train a global model without sharing their local data. Vertical federated learning (VFL) deals with scenarios in which the data on clients have different feature spaces but share some overlapping samples. Existing VFL approaches suffer from high communication costs and cannot deal efficiently with limited overlapping samples commonly seen in the real world. We propose a practical vertical federated learning (VFL) framework called one-shot VFL that can solve the communication bottleneck and the problem of limited overlapping samples simultaneously based on semi-supervised learning. We also propose few-shot VFL to improve the accuracy further with just one more communication round between the server and the clients. In our proposed framework, the clients only need to communicate with the server once or only a few times. We evaluate the proposed VFL framework on both image and tabular datasets. Our methods can improve the accuracy by more than 46.5% and reduce the communication cost by more than 330× compared with state-of-the-art VFL methods when evaluated on CIFAR-10. Our code will be made publicly available at https://nvidia.github.io/NVFlare/research/one-shot-vfl.
Submitted 29 March, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
MONAI: An open-source framework for deep learning in healthcare
Authors:
M. Jorge Cardoso,
Wenqi Li,
Richard Brown,
Nic Ma,
Eric Kerfoot,
Yiheng Wang,
Benjamin Murrey,
Andriy Myronenko,
Can Zhao,
Dong Yang,
Vishwesh Nath,
Yufan He,
Ziyue Xu,
Ali Hatamizadeh,
Wentao Zhu,
Yun Liu,
Mingxin Zheng,
Yucheng Tang,
Isaac Yang,
Michael Zephyr,
Behrooz Hashemian,
Sachidanand Alle,
Mohammad Zalbagi Darestani,
Charlie Budd
, et al. (32 additional authors not shown)
Abstract:
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
Submitted 4 November, 2022;
originally announced November 2022.
-
Warm Start Active Learning with Proxy Labels & Selection via Semi-Supervised Fine-Tuning
Authors:
Vishwesh Nath,
Dong Yang,
Holger R. Roth,
Daguang Xu
Abstract:
Which volume to annotate next is a challenging problem in building medical imaging datasets for deep learning. One of the promising methods to approach this question is active learning (AL). However, AL has been a hard nut to crack in terms of which AL algorithm and acquisition functions are most useful for which datasets. The problem is compounded by the question of which volumes to label first when there is zero labeled data to start with. This is known as the cold start problem in AL. We propose two novel strategies for AL specifically for 3D image segmentation. First, we tackle the cold start problem by proposing a proxy task and then utilizing uncertainty generated from the proxy task to rank the unlabeled data to be annotated. Second, we craft a two-stage learning framework for each active iteration where the unlabeled data is also used in the second stage as a semi-supervised fine-tuning strategy. We show the promise of our approach on two well-known large public datasets from the Medical Segmentation Decathlon. The results indicate that the initial selection of data and the semi-supervised framework both showed significant improvement for several AL strategies.
Submitted 13 September, 2022;
originally announced September 2022.
-
MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images
Authors:
Andres Diaz-Pinto,
Sachidanand Alle,
Vishwesh Nath,
Yucheng Tang,
Alvin Ihsani,
Muhammad Asad,
Fernando Pérez-García,
Pritesh Mehta,
Wenqi Li,
Mona Flores,
Holger R. Roth,
Tom Vercauteren,
Daguang Xu,
Prerna Dogra,
Sebastien Ourselin,
Andrew Feng,
M. Jorge Cardoso
Abstract:
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models, considering that manual annotation is extremely expensive and time-consuming. To address this problem, we present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models that aim at reducing the time required to annotate radiology datasets. Through MONAI Label, researchers can develop AI annotation applications focusing on their domain of expertise. It allows researchers to readily deploy their apps as services, which can be made available to clinicians via their preferred user interface. Currently, MONAI Label readily supports locally installed (3D Slicer) and web-based (OHIF) frontends and offers two active learning strategies to facilitate and speed up the training of segmentation algorithms. MONAI Label allows researchers to make incremental improvements to their AI-based annotation application by making them available to other researchers and clinicians alike. Additionally, MONAI Label provides sample AI-based interactive and non-interactive labeling applications that can be used directly off the shelf, as plug-and-play tools on any given dataset. Significantly reduced annotation times using the interactive model can be observed on two public datasets.
Submitted 28 April, 2023; v1 submitted 23 March, 2022;
originally announced March 2022.
-
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images
Authors:
Ali Hatamizadeh,
Vishwesh Nath,
Yucheng Tang,
Dong Yang,
Holger Roth,
Daguang Xu
Abstract:
Semantic segmentation of brain tumors is a fundamental medical image analysis task involving multiple MRI imaging modalities that can assist clinicians in diagnosing the patient and successively studying the progression of the malignant entity. In recent years, Fully Convolutional Neural Network (FCNN) approaches have become the de facto standard for 3D medical image segmentation. The popular "U-shaped" network architecture has achieved state-of-the-art performance benchmarks on different 2D and 3D semantic segmentation tasks and across various imaging modalities. However, due to the limited kernel size of convolution layers in FCNNs, their performance in modeling long-range information is sub-optimal, and this can lead to deficiencies in the segmentation of tumors with variable sizes. On the other hand, transformer models have demonstrated excellent capabilities in capturing such long-range information in multiple domains, including natural language processing and computer vision. Inspired by the success of vision transformers and their variants, we propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR). Specifically, the task of 3D brain tumor semantic segmentation is reformulated as a sequence-to-sequence prediction problem wherein multi-modal input data is projected into a 1D sequence of embeddings and used as an input to a hierarchical Swin transformer as the encoder. The Swin transformer encoder extracts features at five different resolutions by utilizing shifted windows for computing self-attention and is connected to an FCNN-based decoder at each resolution via skip connections. We have participated in the BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase. Code: https://monai.io/research/swin-unetr
Submitted 4 January, 2022;
originally announced January 2022.
-
HyperSegNAS: Bridging One-Shot Neural Architecture Search with 3D Medical Image Segmentation using HyperNet
Authors:
Cheng Peng,
Andriy Myronenko,
Ali Hatamizadeh,
Vish Nath,
Md Mahfuzur Rahman Siddiquee,
Yufan He,
Daguang Xu,
Rama Chellappa,
Dong Yang
Abstract:
Semantic segmentation of 3D medical images is a challenging task due to the high variability of the shape and pattern of objects (such as organs or tumors). Given the recent success of deep learning in medical image segmentation, Neural Architecture Search (NAS) has been introduced to find high-performance 3D segmentation network architectures. However, because of the massive computational requirements of 3D data and the discrete optimization nature of architecture search, previous NAS methods require a long search time or a continuous relaxation, and commonly lead to sub-optimal network architectures. While one-shot NAS can potentially address these disadvantages, its application in the segmentation domain has not been well studied in the expansive multi-scale multi-path search space. To enable one-shot NAS for medical image segmentation, our method, named HyperSegNAS, introduces a HyperNet to assist super-net training by incorporating architecture topology information. Such a HyperNet can be removed once the super-net is trained and introduces no overhead during architecture search. We show that HyperSegNAS yields better performing and more intuitive architectures compared to the previous state-of-the-art (SOTA) segmentation networks; furthermore, it can quickly and accurately find good architecture candidates under different computing constraints. Our method is evaluated on public datasets from the Medical Segmentation Decathlon (MSD) challenge, and achieves SOTA performance.
Submitted 24 March, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
Authors:
Yucheng Tang,
Dong Yang,
Wenqi Li,
Holger Roth,
Bennett Landman,
Daguang Xu,
Vishwesh Nath,
Ali Hatamizadeh
Abstract:
Vision Transformers (ViTs) have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. Specifically, we propose: (i) a new 3D transformer-based model, dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for self-supervised pre-training; (ii) tailored proxy tasks for learning the underlying pattern of human anatomy. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images from various body organs. The effectiveness of our approach is validated by fine-tuning the pre-trained models on the Beyond the Cranial Vault (BTCV) Segmentation Challenge with 13 abdominal organs and segmentation tasks from the Medical Segmentation Decathlon (MSD) dataset. Our model is currently the state-of-the-art (i.e. ranked 1st) on the public test leaderboards of both MSD and BTCV datasets. Code: https://monai.io/research/swin-unetr
Submitted 28 March, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
The Power of Proxy Data and Proxy Networks for Hyper-Parameter Optimization in Medical Image Segmentation
Authors:
Vishwesh Nath,
Dong Yang,
Ali Hatamizadeh,
Anas A. Abidin,
Andriy Myronenko,
Holger Roth,
Daguang Xu
Abstract:
Deep learning models for medical image segmentation are primarily data-driven. Models trained with more data lead to improved performance and generalizability. However, training is a computationally expensive process because multiple hyper-parameters need to be tested to find the optimal setting for best performance. In this work, we focus on accelerating the estimation of hyper-parameters by proposing two novel methodologies: proxy data and proxy networks. Both can be useful for estimating hyper-parameters more efficiently. We test the proposed techniques on CT and MR imaging modalities using well-known public datasets. In both cases, we use one dataset for building proxy data and another data source for external evaluation. For CT, the approach is tested on spleen segmentation with two datasets. The first dataset is from the Medical Segmentation Decathlon (MSD), from which the proxy data is constructed; the second dataset is used for external validation. Similarly, for MR, the approach is evaluated on prostate segmentation where the first dataset is from MSD and the second dataset is PROSTATEx. First, we show that, when testing on the external validation set, smaller proxy data correlates more highly with training on the full data than a random selection of the same data does. Second, we show that a high correlation exists for proxy networks when compared with the full network on validation Dice score. Third, we show that the proposed approach of utilizing a proxy network can speed up an AutoML framework for hyper-parameter search by 3.3x, and by 4.4x if proxy data and proxy network are utilized together.
Submitted 12 July, 2021;
originally announced July 2021.
-
UNETR: Transformers for 3D Medical Image Segmentation
Authors:
Ali Hatamizadeh,
Yucheng Tang,
Vishwesh Nath,
Dong Yang,
Andriy Myronenko,
Bennett Landman,
Holger Roth,
Daguang Xu
Abstract:
Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have been prominent in the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features and contextual representations which can be utilized for semantic output prediction by the decoder. Despite their success, the locality of convolutional layers in FCNNs limits the capability of learning long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have validated the performance of our method on the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset for multi-organ segmentation and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Our benchmarks demonstrate new state-of-the-art performance on the BTCV leaderboard. Code: https://monai.io/research/unetr
Submitted 9 October, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
Diminishing Uncertainty within the Training Pool: Active Learning for Medical Image Segmentation
Authors:
Vishwesh Nath,
Dong Yang,
Bennett A. Landman,
Daguang Xu,
Holger R. Roth
Abstract:
Active learning is a unique abstraction of machine learning techniques where, unlike passive machine learning, the model/algorithm can guide users to annotate the set of data points that would be most beneficial to the model. The primary advantage is that active learning frameworks select data points that can accelerate the learning process of a model and can reduce the amount of data needed to achieve full accuracy as compared to a model trained on a randomly acquired data set. Multiple frameworks for active learning combined with deep learning have been proposed, and the majority of them are dedicated to classification tasks. Herein, we explore active learning for the task of segmentation of medical imaging data sets. We investigate our proposed framework using two datasets: (1) MRI scans of the hippocampus, and (2) CT scans of pancreas and tumors. This work presents a query-by-committee approach for active learning where a joint optimizer is used for the committee. At the same time, we propose three new strategies for active learning: (1) increasing the frequency of uncertain data to bias the training data set; (2) using mutual information among the input images as a regularizer for acquisition to ensure diversity in the training dataset; (3) adapting the Dice log-likelihood for Stein variational gradient descent (SVGD). The results indicate an improvement in terms of data reduction by achieving full accuracy while only using 22.69% and 48.85% of the available data for each dataset, respectively.
Submitted 6 January, 2021;
originally announced January 2021.
-
Semi-supervised Contrastive Learning Using Partial Label Information
Authors:
Colin B. Hansen,
Vishwesh Nath,
Diego A. Mesa,
Yuankai Huo,
Bennett A. Landman,
Thomas A. Lasko
Abstract:
In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of using partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error over good semi-supervised methods usually by a factor of 2, up to a factor of 5.5 in the best case. We also show that adding Nullspace Tuning to the newer and state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.
Submitted 3 June, 2024; v1 submitted 17 March, 2020;
originally announced March 2020.
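The nullspace idea stated in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy weight matrix, and the choice of a squared-norm penalty are all assumptions made for illustration. It only shows that when the difference between two examples lies in the nullspace of a linear model, the model assigns both examples identical outputs, so penalizing the model's response to that difference pushes same-label pairs toward the same prediction.

```python
def linear_outputs(weights, x):
    """Apply a linear model (list of weight rows, no bias) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def nullspace_penalty(weights, x_a, x_b):
    """Squared norm of W(x_a - x_b); zero exactly when the difference
    vector lies in the nullspace of the (hypothetical) linear model W."""
    diff = [a - b for a, b in zip(x_a, x_b)]
    return sum(v * v for v in linear_outputs(weights, diff))


# Toy model and examples, chosen by hand for illustration only.
W = [[1.0, -1.0, 0.0],
     [0.0, 0.0, 2.0]]
x_a = [3.0, 1.0, 0.5]
x_b = [5.0, 3.0, 0.5]  # x_a - x_b = [-2, -2, 0], which W maps to [0, 0]
x_c = [3.0, 1.0, 1.5]  # x_a - x_c = [0, 0, -1], which W maps to [0, -2]

same_pair_penalty = nullspace_penalty(W, x_a, x_b)
other_pair_penalty = nullspace_penalty(W, x_a, x_c)
print(same_pair_penalty, other_pair_penalty)
```

In a training loop, a term of this kind would be added to the usual loss for pairs known to share a label; how the paper weights or formulates that term is not stated in the abstract.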
-
Deep Learning Estimation of Multi-Tissue Constrained Spherical Deconvolution with Limited Single Shell DW-MRI
Authors:
Vishwesh Nath,
Sudhir K. Pathak,
Kurt G. Schilling,
Walt Schneider,
Bennett A. Landman
Abstract:
Diffusion-weighted magnetic resonance imaging (DW-MRI) is the only non-invasive approach for estimation of intra-voxel tissue microarchitecture and reconstruction of in vivo neural pathways for the human brain. With improvement in accelerated MRI acquisition technologies, DW-MRI protocols that make use of multiple levels of diffusion sensitization have gained popularity. A well-known advanced method for reconstruction of white matter microstructure that uses multi-shell data is multi-tissue constrained spherical deconvolution (MT-CSD). MT-CSD substantially improves the resolution of intra-voxel structure over the traditional single shell version, constrained spherical deconvolution (CSD). Herein, we explore the possibility of using deep learning on single shell data (using the b=1000 s/mm² shell from the Human Connectome Project (HCP)) to estimate the information content captured by 8th order MT-CSD using the full three shell data (b=1000, 2000, and 3000 s/mm² from HCP). Briefly, we examine two network architectures: (1) a sequential network of fully connected dense layers with a residual block in the middle (ResDNN), and (2) a patch-based convolutional neural network with a residual block (ResCNN). For both networks an additional output block for estimation of voxel fraction was used with a modified loss function. Each approach was compared against the baseline of using MT-CSD on all data on 15 subjects from the HCP divided into 5 training, 2 validation, and 8 testing subjects with a total of 6.7 million voxels. The fiber orientation distribution function (fODF) can be recovered with high correlation (0.77 vs 0.74 and 0.65) as compared to the ground truth of MT-CSD, which was derived from the multi-shell DW-MRI acquisitions. Source code and models have been made publicly available.
Submitted 20 February, 2020;
originally announced February 2020.
-
Deep Learning Captures More Accurate Diffusion Fiber Orientations Distributions than Constrained Spherical Deconvolution
Authors:
Vishwesh Nath,
Kurt G. Schilling,
Colin B. Hansen,
Prasanna Parvathaneni,
Allison E. Hainline,
Camilo Bermudez,
Andrew J. Plassard,
Vaibhav Janve,
Yurui Gao,
Justin A. Blaber,
Iwona Stępniewska,
Adam W. Anderson,
Bennett A. Landman
Abstract:
Confocal histology provides an opportunity to establish intra-voxel fiber orientation distributions that can be used to quantitatively assess the biological relevance of diffusion weighted MRI models, e.g., constrained spherical deconvolution (CSD). Here, we apply deep learning to investigate the potential of single shell diffusion weighted MRI to explain histologically observed fiber orientation distributions (FOD) and compare the derived deep learning model with a leading CSD approach. This study (1) demonstrates that there exists additional information in the diffusion signal that is not currently exploited by CSD, and (2) provides an illustrative data-driven model that makes use of this information.
Submitted 13 November, 2019;
originally announced November 2019.
-
Enabling Multi-Shell b-Value Generalizability of Data-Driven Diffusion Models with Deep SHORE
Authors:
Vishwesh Nath,
Ilwoo Lyu,
Kurt G. Schilling,
Prasanna Parvathaneni,
Colin B. Hansen,
Yucheng Tang,
Yuankai Huo,
Vaibhav A. Janve,
Yurui Gao,
Iwona Stepniewska,
Adam W. Anderson,
Bennett A. Landman
Abstract:
Intra-voxel models of the diffusion signal are essential for interpreting organization of the tissue environment at micrometer level with data at millimeter resolution. Recent advances in data driven methods have enabled direct comparison and optimization of methods for in-vivo data with externally validated histological sections with both 2-D and 3-D histology. Yet, all existing methods make limiting assumptions of either (1) model-based linkages between b-values or (2) limited associations with single shell data. We generalize prior deep learning models that used single shell spherical harmonic transforms to integrate the recently developed simple harmonic oscillator reconstruction (SHORE) basis. To enable learning on the SHORE manifold, we present an alternative formulation of the fiber orientation distribution (FOD) object using the SHORE basis while representing the observed diffusion weighted data in the SHORE basis. To ensure consistency of hyper-parameter optimization for SHORE, we present our Deep SHORE approach to learn on a data-optimized manifold. Deep SHORE is evaluated with eight-fold cross-validation of a preclinical MRI-histology data with four b-values. Generalizability of in-vivo human data is evaluated on two separate 3T MRI scanners. Specificity in terms of angular correlation (ACC) with the preclinical data improved on single shell: 0.78 relative to 0.73 and 0.73, multi-shell: 0.80 relative to 0.74 (p < 0.001). In the in-vivo human data, Deep SHORE was more consistent across scanners with 0.63 relative to other multi-shell methods 0.39, 0.52 and 0.57 in terms of ACC. In conclusion, Deep SHORE is a promising method to enable data driven learning with DW-MRI under conditions with varying b-values, number of diffusion shells, and gradient directions per shell.
Submitted 22 February, 2020; v1 submitted 14 July, 2019;
originally announced July 2019.
-
Distributed deep learning for robust multi-site segmentation of CT imaging after traumatic brain injury
Authors:
Samuel Remedios,
Snehashis Roy,
Justin Blaber,
Camilo Bermudez,
Vishwesh Nath,
Mayur B. Patel,
John A. Butman,
Bennett A. Landman,
Dzung L. Pham
Abstract:
Machine learning models are becoming commonplace in the domain of medical imaging, and with these methods comes an ever-increasing need for more data. However, to preserve patient anonymity it is frequently impractical or prohibited to transfer protected health information (PHI) between institutions. Additionally, due to the nature of some studies, there may not be a large public dataset available on which to train models. To address this conundrum, we analyze the efficacy of transferring the model itself in lieu of data between different sites. By doing so we accomplish two goals: 1) the model gains access to training on a larger dataset that it could not normally obtain and 2) the model better generalizes, having trained on data from separate locations. In this paper, we implement multi-site learning with disparate datasets from the National Institutes of Health (NIH) and Vanderbilt University Medical Center (VUMC) without compromising PHI. Three neural networks are trained to convergence on a computed tomography (CT) brain hematoma segmentation task: one only with NIH data, one only with VUMC data, and one multi-site model alternating between NIH and VUMC data. Resultant lesion masks with the multi-site model attain an average Dice similarity coefficient of 0.64 and the automatically segmented hematoma volumes correlate to those done manually with a Pearson correlation coefficient of 0.87, corresponding to an 8% and 5% improvement, respectively, over the single-site model counterparts.
Submitted 11 March, 2019;
originally announced March 2019.
-
Coronary Calcium Detection using 3D Attention Identical Dual Deep Network Based on Weakly Supervised Learning
Authors:
Yuankai Huo,
James G. Terry,
Jiachen Wang,
Vishwesh Nath,
Camilo Bermudez,
Shunxing Bao,
Prasanna Parvathaneni,
J. Jeffery Carr,
Bennett A. Landman
Abstract:
Coronary artery calcium (CAC) is a biomarker of advanced subclinical coronary artery disease and predicts myocardial infarction and death prior to age 60 years. Slice-wise manual delineation has been regarded as the gold standard of coronary calcium detection. However, manual efforts are time- and resource-consuming and even impracticable to apply to large-scale cohorts. In this paper, we propose the attention identical dual network (AID-Net) to perform CAC detection using scan-rescan longitudinal non-contrast CT scans with weakly supervised attention, using only scan-level labels. To improve performance, 3D attention mechanisms were integrated into the AID-Net to provide complementary information for classification tasks. Moreover, 3D Gradient-weighted Class Activation Mapping (Grad-CAM) was also proposed at the testing stage to interpret the behaviors of the deep neural network. A total of 5075 non-contrast chest CT scans were used as training, validation and testing datasets. Baseline performance was assessed on the same cohort. From the results, the proposed AID-Net achieved superior performance on classification accuracy (0.9272) and AUC (0.9627).
Submitted 10 November, 2018;
originally announced November 2018.
-
Inter-Scanner Harmonization of High Angular Resolution DW-MRI using Null Space Deep Learning
Authors:
Vishwesh Nath,
Prasanna Parvathaneni,
Colin B. Hansen,
Allison E. Hainline,
Camilo Bermudez,
Samuel Remedios,
Justin A. Blaber,
Kurt G. Schilling,
Ilwoo Lyu,
Vaibhav Janve,
Yurui Gao,
Iwona Stepniewska,
Baxter P. Rogers,
Allen T. Newton,
L. Taylor Davis,
Jeff Luci,
Adam W. Anderson,
Bennett A. Landman
Abstract:
Diffusion-weighted magnetic resonance imaging (DW-MRI) allows for non-invasive imaging of the local fiber architecture of the human brain at a millimetric scale. Multiple classical approaches have been proposed to detect both single (e.g., tensors) and multiple (e.g., constrained spherical deconvolution, CSD) fiber population orientations per voxel. However, existing techniques generally exhibit low reproducibility across MRI scanners. Herein, we propose a data-driven technique using a neural network design which exploits two categories of data. First, training data were acquired on three squirrel monkey brains using ex-vivo DW-MRI and histology of the brain. Second, repeated scans of human subjects were acquired on two different scanners to augment the learning of the network proposed. To use these data, we propose a new network architecture, the null space deep network (NSDN), to simultaneously learn on traditional observed/truth pairs (e.g., MRI-histology voxels) along with repeated observations without a known truth (e.g., scan-rescan MRI). The NSDN was tested on twenty percent of the histology voxels that were kept completely blind to the network. NSDN significantly improved absolute performance relative to histology by 3.87% over CSD and 1.42% over a recently proposed deep neural network approach. Moreover, it improved reproducibility on the paired data by 21.19% over CSD and 10.09% over a recently proposed deep approach. Finally, NSDN improved generalizability of the model to a third in vivo human scanner (which was not used in training) by 16.08% over CSD and 10.41% over a recently proposed deep learning approach. This work suggests that data-driven approaches for local fiber reconstruction are more reproducible, informative and precise and offers a novel, practical method for determining these models.
Submitted 9 October, 2018;
originally announced October 2018.