Search | arXiv e-print repository

Mastoidectomy Multi-View Synthesis from a Single Microscopy Image

Abstract: Cochlear Implant (CI) procedures involve performing an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline that is capable of generating synthetic multi-view videos from a single CI microscope image. In our approach, we use a patient's pre-operative CT scan to predict the post-mastoidectomy surface using a method designed for this purp… ▽ More Cochlear Implant (CI) procedures involve performing an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline that is capable of generating synthetic multi-view videos from a single CI microscope image. In our approach, we use a patient's pre-operative CT scan to predict the post-mastoidectomy surface using a method designed for this purpose. We manually align the surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope. We then perform UV projection to transfer the colors from the frame to surface textures. Novel views of the textured surface can be used to generate a large dataset of synthetic frames with ground truth poses. We evaluated the quality of synthetic views rendered using Pytorch3D and PyVista. We found both rendering engines lead to similarly high-quality synthetic novel-view frames compared to ground truth with a structural similarity index for both methods averaging about 0.86. A large dataset of novel views with known poses is critical for ongoing training of a method to automatically estimate microscope pose for 2D to 3D registration with the pre-operative CT to facilitate augmented reality surgery. This dataset will empower various downstream tasks, such as integrating Augmented Reality (AR) in the OR, tracking surgical tools, and supporting other video analysis studies. △ Less

Submitted 31 August, 2024; originally announced September 2024.

Comments: Submitted to Medical Imaging 2025: Image-Guided Procedures, Robotic Interventions, and Modeling

arXiv:2408.09931 [pdf, other]

Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation

Authors: Qianhui Men, Xiaoqing Guo, Aris T. Papageorghiou, J. Alison Noble

Abstract: 3D pose estimation from a 2D cross-sectional view enables healthcare professionals to navigate through the 3D space, and such techniques initiate automatic guidance in many image-guided radiology applications. In this work, we investigate how estimating 3D fetal pose from freehand 2D ultrasound scanning can guide a sonographer to locate a head standard plane. Fetal head pose is estimated by the pr… ▽ More 3D pose estimation from a 2D cross-sectional view enables healthcare professionals to navigate through the 3D space, and such techniques initiate automatic guidance in many image-guided radiology applications. In this work, we investigate how estimating 3D fetal pose from freehand 2D ultrasound scanning can guide a sonographer to locate a head standard plane. Fetal head pose is estimated by the proposed Pose-GuideNet, a novel 2D/3D registration approach to align freehand 2D ultrasound to a 3D anatomical atlas without the acquisition of 3D ultrasound. To facilitate the 2D to 3D cross-dimensional projection, we exploit the prior knowledge in the atlas to align the standard plane frame in a freehand scan. A semantic-aware contrastive-based approach is further proposed to align the frames that are off standard planes based on their anatomical similarity. In the experiment, we enhance the existing assessment of freehand image localization by comparing the transformation of its estimated pose towards standard plane with the corresponding probe motion, which reflects the actual view change in 3D anatomy. Extensive results on two clinical head biometry tasks show that Pose-GuideNet not only accurately predicts pose but also successfully predicts the direction of the fetal head. Evaluations with probe motions further demonstrate the feasibility of adopting Pose-GuideNet for freehand ultrasound-assisted navigation in a sensor-free environment. △ Less

Submitted 19 August, 2024; originally announced August 2024.

Comments: Accepted by MICCAI2024

arXiv:2408.08652 [pdf, other]

TextCAVs: Debugging vision models using text

Authors: Angus Nicolson, Yarin Gal, J. Alison Noble

Abstract: Concept-based interpretability methods are a popular form of explanation for deep learning models which provide explanations in the form of high-level human interpretable concepts. These methods typically find concept activation vectors (CAVs) using a probe dataset of concept examples. This requires labelled data for these concepts -- an expensive task in the medical domain. We introduce TextCAVs:… ▽ More Concept-based interpretability methods are a popular form of explanation for deep learning models which provide explanations in the form of high-level human interpretable concepts. These methods typically find concept activation vectors (CAVs) using a probe dataset of concept examples. This requires labelled data for these concepts -- an expensive task in the medical domain. We introduce TextCAVs: a novel method which creates CAVs using vision-language models such as CLIP, allowing for explanations to be created solely using text descriptions of the concept, as opposed to image exemplars. This reduced cost in testing concepts allows for many concepts to be tested and for users to interact with the model, testing new ideas as they are thought of, rather than a delay caused by image collection and annotation. In early experimental results, we demonstrate that TextCAVs produces reasonable explanations for a chest x-ray dataset (MIMIC-CXR) and natural images (ImageNet), and that these explanations can be used to debug deep learning-based models. △ Less

Submitted 16 August, 2024; originally announced August 2024.

Comments: 11 pages, 2 figures. Accepted at iMIMIC Workshop at MICCAI 2024

ACM Class: I.2.1; I.2.6

arXiv:2408.03761 [pdf, other]

MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video

Authors: Xiaoqing Guo, Qianhui Men, J. Alison Noble

Abstract: We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis. Imitating the examination process performed by a human sonographer, MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement. In the keyf… ▽ More We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis. Imitating the examination process performed by a human sonographer, MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement. In the keyframe detection stage, an innovative automated workflow is proposed to progressively select a concise set of keyframes, preserving sufficient video information without redundancy. Subsequently, we adapt a large language model to generate meaningful captions for fetal ultrasound keyframes in the keyframe captioning stage. If a keyframe is captioned as fetal biometry, the segmentation and measurement stage estimates biometric parameters by segmenting the region of interest according to the textual prior. The MMSummary system provides comprehensive summaries for fetal ultrasound examinations and based on reported experiments is estimated to reduce scanning time by approximately 31.5%, thereby suggesting the potential to enhance clinical workflow efficiency. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: MICCAI 2024

arXiv:2408.01648 [pdf]

Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

Authors: Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble

Abstract: The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot perfo… ▽ More The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot performance and efficient memory usage make SAM 2 particularly appealing for surgical tool segmentation in videos, especially given the scarcity of labeled data and the diversity of surgical procedures. In this study, we evaluate the zero-shot video segmentation performance of the SAM 2 model across different types of surgeries, including endoscopy and microscopy. We also assess its performance on videos featuring single and multiple tools of varying lengths to demonstrate SAM 2's applicability and effectiveness in the surgical domain. We found that: 1) SAM 2 demonstrates a strong capability for segmenting various surgical videos; 2) When new tools enter the scene, additional prompts are necessary to maintain segmentation accuracy; and 3) Specific challenges inherent to surgical videos can impact the robustness of SAM 2. △ Less

Submitted 2 August, 2024; originally announced August 2024.

Comments: The first work evaluates the performance of SAM 2 in surgical videos

arXiv:2407.15787 [pdf, other]

M&M: Unsupervised Mamba-based Mastoidectomy for Cochlear Implant Surgery with Noisy Data

Authors: Yike Zhang, Eduardo Davalos, Dingjie Su, Ange Lou, Jack H. Noble

Abstract: Cochlear Implant (CI) procedures involve inserting an array of electrodes into the cochlea located inside the inner ear. Mastoidectomy is a surgical procedure that uses a high-speed drill to remove part of the mastoid region of the temporal bone, providing safe access to the cochlea through the middle and inner ear. We aim to develop an intraoperative navigation system that registers plans created… ▽ More Cochlear Implant (CI) procedures involve inserting an array of electrodes into the cochlea located inside the inner ear. Mastoidectomy is a surgical procedure that uses a high-speed drill to remove part of the mastoid region of the temporal bone, providing safe access to the cochlea through the middle and inner ear. We aim to develop an intraoperative navigation system that registers plans created using 3D preoperative Computerized Tomography (CT) volumes with the 2D surgical microscope view. Herein, we propose a method to synthesize the mastoidectomy volume using only the preoperative CT scan, where the mastoid is intact. We introduce an unsupervised learning framework designed to synthesize mastoidectomy. For model training purposes, this method uses postoperative CT scans to avoid manual data cleaning or labeling, even when the region removed during mastoidectomy is visible but affected by metal artifacts, low signal-to-noise ratio, or electrode wiring. Our approach estimates mastoidectomy regions with a mean dice score of 70.0%. This approach represents a major step forward for CI intraoperative navigation by predicting realistic mastoidectomy-removed regions in preoperative planning that can be used to register the pre-surgery plan to intraoperative microscopy. △ Less

Submitted 18 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2406.11636 [pdf, other]

Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities

Authors: Felix Wagner, Wentian Xu, Pramit Saha, Ziyun Liang, Daniel Whitehouse, David Menon, Virginia Newcombe, Natalie Voets, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Segmentation models for brain lesions in MRI are commonly developed for a specific disease and trained on data with a predefined set of MRI modalities. Each such model cannot segment the disease using data with a different set of MRI modalities, nor can it segment any other type of disease. Moreover, this training paradigm does not allow a model to benefit from learning from heterogeneous database… ▽ More Segmentation models for brain lesions in MRI are commonly developed for a specific disease and trained on data with a predefined set of MRI modalities. Each such model cannot segment the disease using data with a different set of MRI modalities, nor can it segment any other type of disease. Moreover, this training paradigm does not allow a model to benefit from learning from heterogeneous databases that may contain scans and segmentation labels for different types of brain pathologies and diverse sets of MRI modalities. Additionally, the sensitivity of patient data often prevents centrally aggregating data, necessitating a decentralized approach. Is it feasible to use Federated Learning (FL) to train a single model on client databases that contain scans and labels of different brain pathologies and diverse sets of MRI modalities? We demonstrate promising results by combining appropriate, simple, and practical modifications to the model and training strategy: Designing a model with input channels that cover the whole set of modalities available across clients, training with random modality drop, and exploring the effects of feature normalization methods. Evaluation on 7 brain MRI databases with 5 different diseases shows that such FL framework can train a single model that is shown to be very promising in segmenting all disease types seen during training. Importantly, it is able to segment these diseases in new databases that contain sets of modalities different from those in training clients. These results demonstrate, for the first time, the feasibility and effectiveness of using Federated Learning to train a single 3D segmentation model on decentralised data with diverse brain diseases and MRI modalities, a necessary step towards leveraging heterogeneous real-world databases. Code will be made available at: https://github.com/FelixWag/FL-MultiDisease-MRI △ Less

Submitted 20 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

ACM Class: I.4.9; I.4.6; I.2.11; I.4.0

arXiv:2406.02422 [pdf, other]

IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI

Authors: Ziyun Liang, Xiaoqing Guo, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They inten… ▽ More Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMask2 △ Less

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2404.03713 [pdf, other]

Explaining Explainability: Understanding Concept Activation Vectors

Authors: Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal

Abstract: Recent interpretability methods propose using concept-based explanations to translate the internal representations of deep learning models into a language that humans are familiar with: concepts. This requires understanding which concepts are present in the representation space of a neural network. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using… ▽ More Recent interpretability methods propose using concept-based explanations to translate the internal representations of deep learning models into a language that humans are familiar with: concepts. This requires understanding which concepts are present in the representation space of a neural network. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs. CAVs may be: (1) inconsistent between layers, (2) entangled with different concepts, and (3) spatially dependent. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact. Understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on ImageNet and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: (54 pages, 39 figures)

ACM Class: I.2.6

arXiv:2403.07219 [pdf, other]

Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery

Authors: Yike Zhang, Eduardo Davalos, Dingjie Su, Ange Lou, Jack H. Noble

Abstract: For those experiencing severe-to-profound sensorineural hearing loss, the cochlear implant (CI) is the preferred treatment. Augmented reality (AR) aided surgery can potentially improve CI procedures and hearing outcomes. Typically, AR solutions for image-guided surgery rely on optical tracking systems to register pre-operative planning information to the display so that hidden anatomy or other imp… ▽ More For those experiencing severe-to-profound sensorineural hearing loss, the cochlear implant (CI) is the preferred treatment. Augmented reality (AR) aided surgery can potentially improve CI procedures and hearing outcomes. Typically, AR solutions for image-guided surgery rely on optical tracking systems to register pre-operative planning information to the display so that hidden anatomy or other important information can be overlayed and co-registered with the view of the surgical scene. In this paper, our goal is to develop a method that permits direct 2D-to-3D registration of the microscope video to the pre-operative Computed Tomography (CT) scan without the need for external tracking equipment. Our proposed solution involves using surface mapping of a portion of the incus in surgical recordings and determining the pose of this structure relative to the surgical microscope by performing pose estimation via the perspective-n-point (PnP) algorithm. This registration can then be applied to pre-operative segmentations of other anatomy-of-interest, as well as the planned electrode insertion trajectory to co-register this information for the AR display. Our results demonstrate the accuracy with an average rotation error of less than 25 degrees and a translation error of less than 2 mm, 3 mm, and 0.55% for the x, y, and z axes, respectively. Our proposed method has the potential to be applicable and generalized to other surgical procedures while only needing a monocular microscope during intra-operation. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.02265 [pdf, other]

DaReNeRF: Direction-aware Representation for Dynamic Scenes

Authors: Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu

Abstract: Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training time issues associated with methods like Neural Radiance Fields (NeRF) and implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D… ▽ More Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training time issues associated with methods like Neural Radiance Fields (NeRF) and implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D plane-based representations proves insufficient for re-rendering high-fidelity scenes with complex motions. In response, we present a novel direction-aware representation (DaRe) approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information. DaReNeRF computes features for each space-time point by fusing vectors from these recovered planes. Combining DaReNeRF with a tiny MLP for color regression and leveraging volume rendering in training yield state-of-the-art performance in novel view synthesis for complex dynamic scenes. Notably, to address redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach, mitigating storage issues without significant performance decline. Moreover, DaReNeRF maintains a 2x reduction in training time compared to prior art while delivering superior performance. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024. Paper + supplementary material

arXiv:2402.10728 [pdf, other]

Semi-weakly-supervised neural network training for medical image registration

Authors: Yiwen Li, Yunguan Fu, Iani J. M. B. Gayo, Qianye Yang, Zhe Min, Shaheer U. Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Dean C. Barratt, Victor A. Prisacariu, Yipeng Hu

Abstract: For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) have been proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails cost in annotation that requires significant specialis… ▽ More For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) have been proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails cost in annotation that requires significant specialised effort. This paper describes a semi-weakly-supervised registration pipeline that improves the model performance, when only a small corresponding-ROI-labelled dataset is available, by exploiting unlabelled image pairs. We examine two types of augmentation methods by perturbation on network weights and image resampling, such that consistency-based unsupervised losses can be applied on unlabelled data. The novel WarpDDF and RegCut approaches are proposed to allow commutative perturbation between an image pair and the predicted spatial transformation (i.e. respective input and output of registration networks), distinct from existing perturbation methods for classification or segmentation. Experiments using 589 male pelvic MR images, labelled with eight anatomical ROIs, show the improvement in registration performance and the ablated contributions from the individual strategies. Furthermore, this study attempts to construct one of the first computational atlases for pelvic structures, enabled by registering inter-subject MRs, and quantifies the significant differences due to the proposed semi-weak supervision with a discussion on the potential clinical use of example atlas-derived statistics. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10425 [pdf]

DABS-LS: Deep Atlas-Based Segmentation Using Regional Level Set Self-Supervision

Authors: Hannah G. Mason, Jack H. Noble

Abstract: Cochlear implants (CIs) are neural prosthetics used to treat patients with severe-to-profound hearing loss. Patient-specific modeling of CI stimulation of the auditory nerve fiber (ANFs) can help audiologists improve the CI programming. These models require localization of the ANFs relative to surrounding anatomy and the CI. Localization is challenging because the ANFs are so small they are not di… ▽ More Cochlear implants (CIs) are neural prosthetics used to treat patients with severe-to-profound hearing loss. Patient-specific modeling of CI stimulation of the auditory nerve fiber (ANFs) can help audiologists improve the CI programming. These models require localization of the ANFs relative to surrounding anatomy and the CI. Localization is challenging because the ANFs are so small they are not directly visible in clinical imaging. In this work, we hypothesize the position of the ANFs can be accurately inferred from the location of the internal auditory canal (IAC), which has high contrast in CT, since the ANFs pass through this canal between the cochlea and the brain. Inspired by VoxelMorph, in this paper we propose a deep atlas-based IAC segmentation network. We create a single atlas in which the IAC and ANFs are pre-localized. Our network is trained to produce deformation fields (DFs) mapping coordinates from the atlas to new target volumes and that accurately segment the IAC. We hypothesize that DFs that accurately segment the IAC in target images will also facilitate accurate atlas-based localization of the ANFs. As opposed to VoxelMorph, which aims to produce DFs that accurately register the entire volume, our novel contribution is an entirely self-supervised training scheme that aims to produce DFs that accurately segment the target structure. This self-supervision is facilitated using a regional level set (LS) inspired loss function. We call our method Deep Atlas Based Segmentation using Level Sets (DABS-LS). Results show that DABS-LS outperforms VoxelMorph for IAC segmentation. Tests with publicly available datasets for trachea and kidney segmentation also show significant improvement in segmentation accuracy, demonstrating the generalizability of the method. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.05294 [pdf, other]

Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection

Authors: Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, J. Alison Noble

Abstract: Multimodal Federated Learning (MMFL) utilizes multiple modalities in each client to build a more powerful Federated Learning (FL) model than its unimodal counterpart. However, the impact of missing modality in different clients, also called modality incongruity, has been greatly overlooked. This paper, for the first time, analyses the impact of modality incongruity and reveals its connection with… ▽ More Multimodal Federated Learning (MMFL) utilizes multiple modalities in each client to build a more powerful Federated Learning (FL) model than its unimodal counterpart. However, the impact of missing modality in different clients, also called modality incongruity, has been greatly overlooked. This paper, for the first time, analyses the impact of modality incongruity and reveals its connection with data heterogeneity across participating clients. We particularly inspect whether incongruent MMFL with unimodal and multimodal clients is more beneficial than unimodal FL. Furthermore, we examine three potential routes of addressing this issue. Firstly, we study the effectiveness of various self-attention mechanisms towards incongruity-agnostic information fusion in MMFL. Secondly, we introduce a modality imputation network (MIN) pre-trained in a multimodal client for modality translation in unimodal clients and investigate its potential towards mitigating the missing modality problem. Thirdly, we assess the capability of client-level and server-level regularization techniques towards mitigating modality incongruity effects. Experiments are conducted under several MMFL settings on two publicly available real-world datasets, MIMIC-CXR and Open-I, with Chest X-Ray and radiology reports. △ Less

Submitted 7 February, 2024; originally announced February 2024.

Comments: 42 pages

arXiv:2402.00247 [pdf, other]

doi 10.1145/3643763

Towards AI-Assisted Synthesis of Verified Dafny Methods

Authors: Md Rakib Hossain Misu, Cristina V. Lopes, Iris Ma, James Noble

Abstract: Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models honest is to incorporate formal verification: generating programs' specifications as well as code so that the code can be proved correct with respect to the sp… ▽ More Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models honest is to incorporate formal verification: generating programs' specifications as well as code so that the code can be proved correct with respect to the specifications. Unfortunately, existing large language models show a severe lack of proficiency in verified programming. In this paper, we demonstrate how to improve two pretrained models' proficiency in the Dafny verification-aware language. Using 178 problems from the MBPP dataset, we prompt two contemporary models (GPT-4 and PaLM-2) to synthesize Dafny methods. We use three different types of prompts: a direct Contextless prompt; a Signature prompt that includes a method signature and test cases, and a Chain of Thought (CoT) prompt that decomposes the problem into steps and includes retrieval augmentation generated example problems and solutions. Our results show that GPT-4 performs better than PaLM-2 on these tasks and that both models perform best with the retrieval augmentation generated CoT prompt. GPT-4 was able to generate verified, human-evaluated, Dafny methods for 58% of the problems, however, GPT-4 managed only 19% of the problems with the Contextless prompt, and even fewer (10%) for the Signature prompt. We are thus able to contribute 153 verified Dafny solutions to MBPP problems, 50 that we wrote manually, and 103 synthesized by GPT-4. Our results demonstrate that the benefits of formal program verification are now within reach of code generating large language models... △ Less

Submitted 10 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: This is an author provided preprint. The final version will be published at Proc. ACM Softw. Eng; FSE 2024, in July 2024

arXiv:2311.00469 [pdf, other]

Dual Conditioned Diffusion Models for Out-Of-Distribution Detection: Application to Fetal Ultrasound Videos

Authors: Divyanshu Mishra, He Zhao, Pramit Saha, Aris T. Papageorghiou, J. Alison Noble

Abstract: Out-of-distribution (OOD) detection is essential to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution. Detecting OOD samples effectively in certain tasks can pose a challenge because of the substantial heterogeneity within the in-distribution (ID), and the high structural similarity between ID and OOD classes. For instance, when… ▽ More Out-of-distribution (OOD) detection is essential to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution. Detecting OOD samples effectively in certain tasks can pose a challenge because of the substantial heterogeneity within the in-distribution (ID), and the high structural similarity between ID and OOD classes. For instance, when detecting heart views in fetal ultrasound videos there is a high structural similarity between the heart and other anatomies such as the abdomen, and large in-distribution variance as a heart has 5 distinct views and structural variations within each view. To detect OOD samples in this context, the resulting model should generalise to the intra-anatomy variations while rejecting similar OOD samples. In this paper, we introduce dual-conditioned diffusion models (DCDM) where we condition the model on in-distribution class information and latent features of the input image for reconstruction-based OOD detection. This constrains the generative manifold of the model to generate images structurally and semantically similar to those within the in-distribution. The proposed model outperforms reference methods with a 12% improvement in accuracy, 22% higher precision, and an 8% better F1 score. △ Less

Submitted 1 November, 2023; originally announced November 2023.

Comments: Published in MICCAI 2023

arXiv:2310.18815 [pdf, other]

Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data

Authors: Pramit Saha, Divyanshu Mishra, J. Alison Noble

Abstract: The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unl… ▽ More The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients as the objective function for each client varies based on the availability of labels. This paper investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts - (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings. △ Less

Submitted 28 October, 2023; originally announced October 2023.

Comments: Published in MICCAI 2023 with early acceptance and selected as 1 of the top 20 poster highlights under the category: Which work has the potential to impact other applications of AI and CV

arXiv:2310.16477 [pdf, other]

Show from Tell: Audio-Visual Modelling in Clinical Settings

Authors: Jianbo Jiao, Mohammad Alsharid, Lior Drukker, Aris T. Papageorghiou, Andrew Zisserman, J. Alison Noble

Abstract: Auditory and visual signals usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, the audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals -- usually speech. In this paper, we consider a… ▽ More Auditory and visual signals usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, the audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals -- usually speech. In this paper, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without human expert annotation. A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose. The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference. Experimental evaluations on a large-scale clinical multi-modal ultrasound video dataset show that the proposed self-supervised method learns good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully-supervised solutions. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2309.02983 [pdf, other]

doi 10.1145/3622846

Reference Capabilities for Flexible Memory Management: Extended Version

Authors: Ellen Arvidsson, Elias Castegren, Sylvan Clebsch, Sophia Drossopoulou, James Noble, Matthew J. Parkinson, Tobias Wrigstad

Abstract: Verona is a concurrent object-oriented programming language that organises all the objects in a program into a forest of isolated regions. Memory is managed locally for each region, so programmers can control a program's memory use by adjusting objects' partition into regions, and by setting each region's memory management strategy. A thread can only mutate (allocate, deallocate) objects within on… ▽ More Verona is a concurrent object-oriented programming language that organises all the objects in a program into a forest of isolated regions. Memory is managed locally for each region, so programmers can control a program's memory use by adjusting objects' partition into regions, and by setting each region's memory management strategy. A thread can only mutate (allocate, deallocate) objects within one active region -- its "window of mutability". Memory management costs are localised to the active region, ensuring overheads can be predicted and controlled. Moving the mutability window between regions is explicit, so code can be executed wherever it is required, yet programs remain in control of memory use. An ownership type system based on reference capabilities enforces region isolation, controlling aliasing within and between regions, yet supporting objects moving between regions and threads. Data accesses never need expensive atomic operations, and are always thread-safe. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: 87 pages, 10 figures, 5 listings, 4 tables. Extended version of paper to be published at OOPSLA 2023

arXiv:2308.11776 [pdf]

WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters

Authors: Ange Lou, Jack Noble

Abstract: Depth estimation in surgical video plays a crucial role in many image-guided surgery procedures. However, it is difficult and time consuming to create depth map ground truth datasets in surgical videos due in part to inconsistent brightness and noise in the surgical scene. Therefore, building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining more atten… ▽ More Depth estimation in surgical video plays a crucial role in many image-guided surgery procedures. However, it is difficult and time consuming to create depth map ground truth datasets in surgical videos due in part to inconsistent brightness and noise in the surgical scene. Therefore, building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining more attention from the computer vision community. Although several self-supervision methods alleviate the need for ground truth depth maps and poses, they still need known camera intrinsic parameters, which are often missing or not recorded. Moreover, the camera intrinsic prediction methods in existing works depend heavily on the quality of datasets. In this work, we aimed to build a self-supervised depth and ego-motion estimation system which can predict not only accurate depth maps and camera pose, but also camera intrinsic parameters. We proposed a cost-volume-based supervision manner to give the system auxiliary supervision for camera parameters prediction. The experimental results showed that the proposed method improved the accuracy of estimated camera parameters, ego-motion, and depth estimation. △ Less

Submitted 5 February, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by SPIE 2024

arXiv:2308.11774 [pdf]

SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF)

Authors: Ange Lou, Yamin Li, Xing Yao, Yike Zhang, Jack Noble

Abstract: The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, mainly relying on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D positio… ▽ More The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, mainly relying on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D position prediction for surgical tools in all frames, we propose a novel approach called SAMSNeRF that combines Segment Anything Model (SAM) and Neural Radiance Field (NeRF) techniques. Our approach generates accurate segmentation masks of surgical tools using SAM, which guides the refinement of the dynamic surgical scene reconstruction by NeRF. Our experimental results on public endoscopy surgical videos demonstrate that our approach successfully reconstructs high-fidelity dynamic surgical scenes and accurately reflects the spatial information of surgical tools. Our proposed approach can significantly enhance surgical navigation and automation by providing surgeons with accurate 3D position information of surgical tools during surgery.The source code will be released soon. △ Less

Submitted 5 February, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by SPIE 2024

arXiv:2305.07152 [pdf, other]

Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge

Authors: Aneeq Zia, Kiran Bhattacharyya, Xi Liu, Max Berniker, Ziheng Wang, Rogerio Nespolo, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget, Zhenqiang Li, Yoichi Sato, Ryo Fujii, Ryo Hachiuma, Mana Masuda, Hideo Saito, An Wang, Mengya Xu, Mobarakol Islam, Long Bai, Winnie Pang , et al. (46 additional authors not shown)

Abstract: The ability to automatically detect and track surgical instruments in endoscopic videos can enable transformational interventions. Assessing surgical performance and efficiency, identifying skilled tool use and choreography, and planning operational and logistical aspects of OR resources are just a few of the applications that could benefit. Unfortunately, obtaining the annotations needed to train… ▽ More The ability to automatically detect and track surgical instruments in endoscopic videos can enable transformational interventions. Assessing surgical performance and efficiency, identifying skilled tool use and choreography, and planning operational and logistical aspects of OR resources are just a few of the applications that could benefit. Unfortunately, obtaining the annotations needed to train machine learning models to identify and localize surgical tools is a difficult task. Annotating bounding boxes frame-by-frame is tedious and time-consuming, yet large amounts of data with a wide variety of surgical tools and surgeries must be captured for robust training. Moreover, ongoing annotator training is needed to stay up to date with surgical instrument innovation. In robotic-assisted surgery, however, potentially informative data like timestamps of instrument installation and removal can be programmatically harvested. The ability to rely on tool installation data alone would significantly reduce the workload to train robust tool-tracking models. With this motivation in mind we invited the surgical data science community to participate in the challenge, SurgToolLoc 2022. The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools and localize them in video frames with bounding boxes. We present the results of this challenge along with many of the team's efforts. We conclude by discussing these results in the broader context of machine learning and surgical data science. The training data used for this challenge consisting of 24,695 video clips with tool presence labels is also being released publicly and can be accessed at https://console.cloud.google.com/storage/browser/isi-surgtoolloc-2022. △ Less

Submitted 31 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2302.07967 [pdf, other]

Self-supervised Registration and Segmentation of the Ossicles with A Single Ground Truth Label

Authors: Yike Zhang, Jack Noble

Abstract: AI-assisted surgeries have drawn the attention of the medical image research community due to their real-world impact on improving surgery success rates. For image-guided surgeries, such as Cochlear Implants (CIs), accurate object segmentation can provide useful information for surgeons before an operation. Recently published image segmentation methods that leverage machine learning usually rely o… ▽ More AI-assisted surgeries have drawn the attention of the medical image research community due to their real-world impact on improving surgery success rates. For image-guided surgeries, such as Cochlear Implants (CIs), accurate object segmentation can provide useful information for surgeons before an operation. Recently published image segmentation methods that leverage machine learning usually rely on a large number of manually predefined ground truth labels. However, it is a laborious and time-consuming task to prepare the dataset. This paper presents a novel technique using a self-supervised 3D-UNet that produces a dense deformation field between an atlas and a target image that can be used for atlas-based segmentation of the ossicles. Our results show that our method outperforms traditional image segmentation methods and generates a more accurate boundary around the ossicles based on Dice similarity coefficient and point-to-point error comparison. The mean Dice coefficient is improved by 8.51% with our proposed method. △ Less

Submitted 15 February, 2023; originally announced February 2023.

Comments: conference

arXiv:2211.14467 [pdf]

Self-Supervised Surgical Instrument 3D Reconstruction from a Single Camera Image

Authors: Ange Lou, Xing Yao, Ziteng Liu, Jintong Han, Jack Noble

Abstract: Surgical instrument tracking is an active research area that can provide surgeons feedback about the location of their tools relative to anatomy. Recent tracking methods are mainly divided into two parts: segmentation and object detection. However, both can only predict 2D information, which is limiting for application to real-world surgery. An accurate 3D surgical instrument model is a prerequisi… ▽ More Surgical instrument tracking is an active research area that can provide surgeons feedback about the location of their tools relative to anatomy. Recent tracking methods are mainly divided into two parts: segmentation and object detection. However, both can only predict 2D information, which is limiting for application to real-world surgery. An accurate 3D surgical instrument model is a prerequisite for precise predictions of the pose and depth of the instrument. Recent single-view 3D reconstruction methods are only used in natural object reconstruction and do not achieve satisfying reconstruction accuracy without 3D attribute-level supervision. Further, those methods are not suitable for the surgical instruments because of their elongated shapes. In this paper, we firstly propose an end-to-end surgical instrument reconstruction system -- Self-supervised Surgical Instrument Reconstruction (SSIR). With SSIR, we propose a multi-cycle-consistency strategy to help capture the texture information from a slim instrument while only requiring a binary instrument label map. Experiments demonstrate that our approach improves the reconstruction quality of surgical instruments compared to other self-supervised methods and achieves promising results. △ Less

Submitted 25 November, 2022; originally announced November 2022.

Comments: Accepted by SPIE Medical Imaging 2023

arXiv:2209.08205 [pdf, other]

Necessity Specifications for Robustness

Authors: Julian Mackay, Sophia Drossopoulou, James Noble, Susan Eisenbach

Abstract: Robust modules guarantee to do only what they are supposed to do - even in the presence of untrusted, malicious clients, and considering not just the direct behaviour of individual methods, but also the emergent behaviour from calls to more than one method. Necessity is a language for specifying robustness, based on novel necessity operators capturing temporal implication, and a proof logic that d… ▽ More Robust modules guarantee to do only what they are supposed to do - even in the presence of untrusted, malicious clients, and considering not just the direct behaviour of individual methods, but also the emergent behaviour from calls to more than one method. Necessity is a language for specifying robustness, based on novel necessity operators capturing temporal implication, and a proof logic that derives explicit robustness specifications from functional specifications. Soundness and an exemplar proof are mechanised in Coq. △ Less

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2209.05160 [pdf, other]

Prototypical few-shot segmentation for cross-institution male pelvic structures with spatial registration

Authors: Yiwen Li, Yunguan Fu, Iani Gayo, Qianye Yang, Zhe Min, Shaheer Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Henkjan Huisman, Dean Barratt, Victor Adrian Prisacariu, Yipeng Hu

Abstract: The prowess that makes few-shot learning desirable in medical image analysis is the efficient use of the support image data, which are labelled to classify or segment new classes, a task that otherwise requires substantially more training images and expert annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, such that the trained networks can be effectively ada… ▽ More The prowess that makes few-shot learning desirable in medical image analysis is the efficient use of the support image data, which are labelled to classify or segment new classes, a task that otherwise requires substantially more training images and expert annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, such that the trained networks can be effectively adapted to clinically interesting structures that are absent in training, using only a few labelled images from a different institute. First, to compensate for the widely recognised spatial variability between institutions in episodic adaptation of novel classes, a novel spatial registration mechanism is integrated into prototypical learning, consisting of a segmentation head and an spatial alignment module. Second, to assist the training with observed imperfect alignment, support mask conditioning module is proposed to further utilise the annotation available from the support images. Extensive experiments are presented in an application of segmenting eight anatomical structures important for interventional planning, using a data set of 589 pelvic T2-weighted MR images, acquired at seven institutes. The results demonstrate the efficacy in each of the 3D formulation, the spatial registration, and the support mask conditioning, all of which made positive contributions independently or collectively. Compared with the previously proposed 2D alternatives, the few-shot segmentation performance was improved with statistical significance, regardless whether the support data come from the same or different institutes. △ Less

Submitted 25 August, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: accepted by Medical Image Analysis

arXiv:2208.10642 [pdf, other]

Anatomy-Aware Contrastive Representation Learning for Fetal Ultrasound

Authors: Zeyu Fu, Jianbo Jiao, Robail Yasrab, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Abstract: Self-supervised contrastive representation learning offers the advantage of learning meaningful visual representations from unlabeled medical datasets for transfer learning. However, applying current contrastive learning approaches to medical data without considering its domain-specific anatomical characteristics may lead to visual representations that are inconsistent in appearance and semantics.… ▽ More Self-supervised contrastive representation learning offers the advantage of learning meaningful visual representations from unlabeled medical datasets for transfer learning. However, applying current contrastive learning approaches to medical data without considering its domain-specific anatomical characteristics may lead to visual representations that are inconsistent in appearance and semantics. In this paper, we propose to improve visual representations of medical images via anatomy-aware contrastive learning (AWCL), which incorporates anatomy information to augment the positive/negative pair sampling in a contrastive learning manner. The proposed approach is demonstrated for automated fetal ultrasound imaging tasks, enabling the positive pairs from the same or different ultrasound scans that are anatomically similar to be pulled together and thus improving the representation learning. We empirically investigate the effect of inclusion of anatomy information with coarse- and fine-grained granularity, for contrastive learning and find that learning with fine-grained anatomy information which preserves intra-class difference is more effective than its counterpart. We also analyze the impact of anatomy ratio on our AWCL framework and find that using more distinct but anatomically similar samples to compose positive pairs results in better quality representations. Experiments on a large-scale fetal ultrasound dataset demonstrate that our approach is effective for learning representations that transfer well to three clinical downstream tasks, and achieves superior performance compared to ImageNet supervised and the current state-of-the-art contrastive learning methods. In particular, AWCL outperforms ImageNet supervised method by 13.8% and state-of-the-art contrastive-based method by 7.1% on a cross-domain segmentation task. △ Less

Submitted 22 August, 2022; originally announced August 2022.

Comments: ECCV-MCV 2022

arXiv:2207.12833 [pdf, other]

Multimodal-GuideNet: Gaze-Probe Bidirectional Guidance in Obstetric Ultrasound Scanning

Authors: Qianhui Men, Clare Teng, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Abstract: Eye trackers can provide visual guidance to sonographers during ultrasound (US) scanning. Such guidance is potentially valuable for less experienced operators to improve their scanning skills on how to manipulate the probe to achieve the desired plane. In this paper, a multimodal guidance approach (Multimodal-GuideNet) is proposed to capture the stepwise dependency between a real-world US video si… ▽ More Eye trackers can provide visual guidance to sonographers during ultrasound (US) scanning. Such guidance is potentially valuable for less experienced operators to improve their scanning skills on how to manipulate the probe to achieve the desired plane. In this paper, a multimodal guidance approach (Multimodal-GuideNet) is proposed to capture the stepwise dependency between a real-world US video signal, synchronized gaze, and probe motion within a unified framework. To understand the causal relationship between gaze movement and probe motion, our model exploits multitask learning to jointly learn two related tasks: predicting gaze movements and probe signals that an experienced sonographer would perform in routine obstetric scanning. The two tasks are associated by a modality-aware spatial graph to detect the co-occurrence among the multi-modality inputs and share useful cross-modal information. Instead of a deterministic scanning path, Multimodal-GuideNet allows for scanning diversity by estimating the probability distribution of real scans. Experiments performed with three typical obstetric scanning examinations show that the new approach outperforms single-task learning for both probe motion guidance and gaze movement prediction. Multimodal-GuideNet also provides a visual guidance signal with an error rate of less than 10 pixels for a 224x288 US image. △ Less

Submitted 26 July, 2022; originally announced July 2022.

Comments: Early accepted by MICCAI 2022

arXiv:2205.00795 [pdf, ps, other]

Rusty Links in Local Chains

Authors: James Noble, Julian Mackay, Tobias Wrigstad

Abstract: Rust successfully applies ownership types to control memory allocation. This restricts the programs' topologies to the point where doubly-linked lists cannot be programmed in Safe Rust. We sketch how more flexible "local" ownership could be added to Rust, permitting multiple mutable references to objects, provided each reference is bounded by the object's lifetime. To maintain thread-safety, local… ▽ More Rust successfully applies ownership types to control memory allocation. This restricts the programs' topologies to the point where doubly-linked lists cannot be programmed in Safe Rust. We sketch how more flexible "local" ownership could be added to Rust, permitting multiple mutable references to objects, provided each reference is bounded by the object's lifetime. To maintain thread-safety, locally owned objects must remain thread-local; to maintain memory safety, local objects can be deallocated when their owner's lifetime expires. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2205.00787 [pdf, other]

More Programming Than Programming: Teaching Formal Methods in a Software Engineering Programme

Authors: James Noble, David Streader, Isaac Oscar Gariano, Miniruwani Samarakoon

Abstract: Formal methods for software correctness are critical to the future of software engineering - and so must be an essential part of software engineering education. Unfortunately, formal methods are often resisted by students due to perceived difficulty, mathematicity, and practical irrelevance. We redeveloped our software correctness course by taking a programming intensive approach, using the solver… ▽ More Formal methods for software correctness are critical to the future of software engineering - and so must be an essential part of software engineering education. Unfortunately, formal methods are often resisted by students due to perceived difficulty, mathematicity, and practical irrelevance. We redeveloped our software correctness course by taking a programming intensive approach, using the solver-aided language Dafny to provide instant formative feedback via automated assessment. Our redeveloped course increased student retention and resulted in the best evaluation for the course for at least ten years. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2203.15177 [pdf]

Min-Max Similarity: A Contrastive Semi-Supervised Deep Learning Network for Surgical Tools Segmentation

Authors: Ange Lou, Kareem Tawfik, Xing Yao, Ziteng Liu, Jack Noble

Abstract: A common problem with segmentation of medical images using neural networks is the difficulty to obtain a significant number of pixel-level annotated data for training. To address this issue, we proposed a semi-supervised segmentation network based on contrastive learning. In contrast to the previous state-of-the-art, we introduce Min-Max Similarity (MMS), a contrastive learning form of dual-view t… ▽ More A common problem with segmentation of medical images using neural networks is the difficulty to obtain a significant number of pixel-level annotated data for training. To address this issue, we proposed a semi-supervised segmentation network based on contrastive learning. In contrast to the previous state-of-the-art, we introduce Min-Max Similarity (MMS), a contrastive learning form of dual-view training by employing classifiers and projectors to build all-negative, and positive and negative feature pairs, respectively, to formulate the learning as solving a MMS problem. The all-negative pairs are used to supervise the networks learning from different views and to capture general features, and the consistency of unlabeled predictions is measured by pixel-wise contrastive loss between positive and negative pairs. To quantitatively and qualitatively evaluate our proposed method, we test it on four public endoscopy surgical tool segmentation datasets and one cochlear implant surgery dataset, which we manually annotated. Results indicate that our proposed method consistently outperforms state-of-the-art semi-supervised and fully supervised segmentation algorithms. And our semi-supervised segmentation algorithm can successfully recognize unknown surgical tools and provide good predictions. Also, our MMS approach could achieve inference speeds of about 40 frames per second (fps) and is suitable to deal with the real-time video segmentation. △ Less

Submitted 22 February, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2203.14258 [pdf, other]

doi 10.1016/j.media.2022.102427

Image quality assessment for machine learning tasks using meta-reinforcement learning

Authors: Shaheer U. Saeed, Yunguan Fu, Vasilis Stavrinides, Zachary M. C. Baum, Qianye Yang, Mirabela Rusu, Richard E. Fan, Geoffrey A. Sonn, J. Alison Noble, Dean C. Barratt, Yipeng Hu

Abstract: In this paper, we consider image quality assessment (IQA) as a measure of how images are amenable with respect to a given downstream task, or task amenability. When the task is performed using machine learning algorithms, such as a neural-network-based task predictor for image classification or segmentation, the performance of the task predictor provides an objective estimate of task amenability.… ▽ More In this paper, we consider image quality assessment (IQA) as a measure of how images are amenable with respect to a given downstream task, or task amenability. When the task is performed using machine learning algorithms, such as a neural-network-based task predictor for image classification or segmentation, the performance of the task predictor provides an objective estimate of task amenability. In this work, we use an IQA controller to predict the task amenability which, itself being parameterised by neural networks, can be trained simultaneously with the task predictor. We further develop a meta-reinforcement learning framework to improve the adaptability for both IQA controllers and task predictors, such that they can be fine-tuned efficiently on new datasets or meta-tasks. We demonstrate the efficacy of the proposed task-specific, adaptable IQA approach, using two clinical applications for ultrasound-guided prostate intervention and pneumonia detection on X-ray images. △ Less

Submitted 27 March, 2022; originally announced March 2022.

Comments: Accepted to Medical Image Analysis; Final published version available at: https://doi.org/10.1016/j.media.2022.102427

Journal ref: Medical Image Analysis, Volume 78, 2022, 102427, ISSN 1361-8415

arXiv:2112.01534 [pdf]

Learning to automate cryo-electron microscopy data collection with Ptolemy

Authors: Paul T. Kim, Alex J. Noble, Anchi Cheng, Tristan Bepler

Abstract: Over the past decade, cryogenic electron microscopy (cryo-EM) has emerged as a primary method for determining near-native, near-atomic resolution 3D structures of biological macromolecules. In order to meet increasing demand for cryo-EM, automated methods to improve throughput and efficiency while lowering costs are needed. Currently, all high-magnification cryo-EM data collection softwares requir… ▽ More Over the past decade, cryogenic electron microscopy (cryo-EM) has emerged as a primary method for determining near-native, near-atomic resolution 3D structures of biological macromolecules. In order to meet increasing demand for cryo-EM, automated methods to improve throughput and efficiency while lowering costs are needed. Currently, all high-magnification cryo-EM data collection softwares require human input and manual tuning of parameters. Expert operators must navigate low- and medium-magnification images to find good high-magnification collection locations. Automating this is non-trivial: the images suffer from low signal-to-noise ratio and are affected by a range of experimental parameters that can differ for each collection session. Here, we use various computer vision algorithms, including mixture models, convolutional neural networks, and U-Nets to develop the first pipeline to automate low- and medium-magnification targeting. Learned models in this pipeline are trained on a large internal dataset of images from real world cryo-EM data collection sessions, labeled with locations that were selected by operators. Using these models, we show that we can effectively detect and classify regions of interest in low- and medium-magnification images, and can generalize to unseen sessions, as well as to images captured using different microscopes from external facilities. We expect our open-source pipeline, Ptolemy, will be both immediately useful as a tool for automation of cryo-EM data collection, and serve as a foundation for future advanced methods for efficient and automated cryo-EM microscopy. △ Less

Submitted 14 January, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: Main: 12 pages, 11 figures. Appendix: 2 pages, 1 figure

ACM Class: I.4.9; J.3

arXiv:2109.07541 [pdf, other]

Dala: A Simple Capability-Based Dynamic Language Design For Data Race-Freedom

Authors: Kiko Fernandez-Reyes, Isaac Oscar Gariano, James Noble, Erin Greenwood-Thessman, Michael Homer, Tobias Wrigstad

Abstract: Dynamic languages like Erlang, Clojure, JavaScript, and E adopted data-race freedom by design. To enforce data-race freedom, these languages either deep copy objects during actor (thread) communication or proxy back to their owning thread. We present Dala, a simple programming model that ensures data-race freedom while supporting efficient inter-thread communication. Dala is a dynamic, concurrent,… ▽ More Dynamic languages like Erlang, Clojure, JavaScript, and E adopted data-race freedom by design. To enforce data-race freedom, these languages either deep copy objects during actor (thread) communication or proxy back to their owning thread. We present Dala, a simple programming model that ensures data-race freedom while supporting efficient inter-thread communication. Dala is a dynamic, concurrent, capability-based language that relies on three core capabilities: immutable values can be shared freely; isolated mutable objects can be transferred between threads but not aliased; local objects can be aliased within their owning thread but not dereferenced by other threads. Objects with capabilities can co-exist with unsafe objects, that are unchecked and may suffer data races, without compromising the safety of safe objects. We present a formal model of Dala, prove data race-freedom and state and prove a dynamic gradual guarantee. These theorems guarantee data race-freedom when using safe capabilities and show that the addition of capabilities is semantics preserving modulo permission and cast errors. △ Less

Submitted 15 September, 2021; originally announced September 2021.

arXiv:2109.05485 [pdf, other]

Facial Anatomical Landmark Detection using Regularized Transfer Learning with Application to Fetal Alcohol Syndrome Recognition

Authors: Zeyu Fu, Jianbo Jiao, Michael Suttie, J. Alison Noble

Abstract: Fetal alcohol syndrome (FAS) caused by prenatal alcohol exposure can result in a series of cranio-facial anomalies, and behavioral and neurocognitive problems. Current diagnosis of FAS is typically done by identifying a set of facial characteristics, which are often obtained by manual examination. Anatomical landmark detection, which provides rich geometric information, is important to detect the… ▽ More Fetal alcohol syndrome (FAS) caused by prenatal alcohol exposure can result in a series of cranio-facial anomalies, and behavioral and neurocognitive problems. Current diagnosis of FAS is typically done by identifying a set of facial characteristics, which are often obtained by manual examination. Anatomical landmark detection, which provides rich geometric information, is important to detect the presence of FAS associated facial anomalies. This imaging application is characterized by large variations in data appearance and limited availability of labeled data. Current deep learning-based heatmap regression methods designed for facial landmark detection in natural images assume availability of large datasets and are therefore not wellsuited for this application. To address this restriction, we develop a new regularized transfer learning approach that exploits the knowledge of a network learned on large facial recognition datasets. In contrast to standard transfer learning which focuses on adjusting the pre-trained weights, the proposed learning approach regularizes the model behavior. It explicitly reuses the rich visual semantics of a domain-similar source model on the target task data as an additional supervisory signal for regularizing landmark detection optimization. Specifically, we develop four regularization constraints for the proposed transfer learning, including constraining the feature outputs from classification and intermediate layers, as well as matching activation attention maps in both spatial and channel levels. Experimental evaluation on a collected clinical imaging dataset demonstrate that the proposed approach can effectively improve model generalizability under limited training samples, and is advantageous to other approaches in the literature. △ Less

Submitted 12 September, 2021; originally announced September 2021.

Comments: To appear in IEEE journal of Biomedical and Health Informatics 2021

arXiv:2108.04359 [pdf, other]

Adaptable image quality assessment using meta-reinforcement learning of task amenability

Authors: Shaheer U. Saeed, Yunguan Fu, Vasilis Stavrinides, Zachary M. C. Baum, Qianye Yang, Mirabela Rusu, Richard E. Fan, Geoffrey A. Sonn, J. Alison Noble, Dean C. Barratt, Yipeng Hu

Abstract: The performance of many medical image analysis tasks are strongly associated with image data quality. When developing modern deep learning algorithms, rather than relying on subjective (human-based) image quality assessment (IQA), task amenability potentially provides an objective measure of task-specific image quality. To predict task amenability, an IQA agent is trained using reinforcement learn… ▽ More The performance of many medical image analysis tasks are strongly associated with image data quality. When developing modern deep learning algorithms, rather than relying on subjective (human-based) image quality assessment (IQA), task amenability potentially provides an objective measure of task-specific image quality. To predict task amenability, an IQA agent is trained using reinforcement learning (RL) with a simultaneously optimised task predictor, such as a classification or segmentation neural network. In this work, we develop transfer learning or adaptation strategies to increase the adaptability of both the IQA agent and the task predictor so that they are less dependent on high-quality, expert-labelled training data. The proposed transfer learning strategy re-formulates the original RL problem for task amenability in a meta-reinforcement learning (meta-RL) framework. The resulting algorithm facilitates efficient adaptation of the agent to different definitions of image quality, each with its own Markov decision process environment including different images, labels and an adaptable task predictor. Our work demonstrates that the IQA agents pre-trained on non-expert task labels can be adapted to predict task amenability as defined by expert task labels, using only a small set of expert labels. Using 6644 clinical ultrasound images from 249 prostate cancer patients, our results for image classification and segmentation tasks show that the proposed IQA method can be adapted using data with as few as respective 19.7% and 29.6% expert-reviewed consensus labels and still achieve comparable IQA and task performance, which would otherwise require a training dataset with 100% expert labels. △ Less

Submitted 31 July, 2021; originally announced August 2021.

Comments: Accepted at ASMUS 2021 (The 2nd International Workshop of Advances in Simplifying Medical UltraSound)

arXiv:2107.03987 [pdf]

Atlas-Based Segmentation of Intracochlear Anatomy in Metal Artifact Affected CT Images of the Ear with Co-trained Deep Neural Networks

Authors: Jianing Wang, Dingjie Su, Yubo Fan, Srijata Chakravorti, Jack H. Noble, Benoit M. Dawant

Abstract: We propose an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes. To solve this problem, which is challenging because of the strong artifacts produced by the implant, we use a pair of co-trained deep n… ▽ More We propose an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes. To solve this problem, which is challenging because of the strong artifacts produced by the implant, we use a pair of co-trained deep networks that generate dense deformation fields (DDFs) in opposite directions. One network is tasked with registering an atlas image to the Post-CT images and the other network is tasked with registering the Post-CT images to the atlas image. The networks are trained using loss functions based on voxel-wise labels, image content, fiducial registration error, and cycle-consistency constraint. The segmentation of the ICA in the Post-CT images is subsequently obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT images using the corresponding DDFs generated by the trained registration networks. Our model can learn the underlying geometric features of the ICA even though they are obscured by the metal artifacts. We show that our end-to-end network produces results that are comparable to the current state of the art (SOTA) that relies on a two-steps approach that first uses conditional generative adversarial networks to synthesize artifact-free images from the Post-CT images and then uses an active shape model-based method to segment the ICA in the synthetic images. Our method requires a fraction of the time needed by the SOTA, which is important for end-user acceptance. △ Less

Submitted 9 July, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 10 pages, 5 figures

arXiv:2105.12415 [pdf, other]

doi 10.1137/21M1421842

A multiresolution Discrete Element Method for triangulated objects with implicit timestepping

Authors: Peter J. Noble, Tobias Weinzierl

Abstract: Simulations of many rigid bodies colliding with each other sometimes yield particularly interesting results if the colliding objects differ significantly in size and are non-spherical. The most expensive part within such a simulation code is the collision detection. We propose a family of novel multiscale collision detection algorithms that can be applied to triangulated objects within explicit an… ▽ More Simulations of many rigid bodies colliding with each other sometimes yield particularly interesting results if the colliding objects differ significantly in size and are non-spherical. The most expensive part within such a simulation code is the collision detection. We propose a family of novel multiscale collision detection algorithms that can be applied to triangulated objects within explicit and implicit time stepping methods. They are well-suited to handle objects that cannot be represented by analytical shapes or assemblies of analytical objects. Inspired by multigrid methods and adaptive mesh refinement, we determine collision points iteratively over a resolution hierarchy, and combine a functional minimisation plus penalty parameters with the actual comparision-based geometric distance calculation. Coarse surrogate geometry representations identify "no collision" scenarios early on and otherwise yield an educated guess which triangle subsets of the next finer level potentially yield collisions. They prune the search tree, and furthermore feed conservative contact force estimates into the iterative solve behind an implicit time stepping. Implicit time stepping and non-analytical shapes often yield prohibitive high compute cost for rigid body simulations. Our approach reduces these cost algorithmically by one to two orders of magnitude. It also exhibits high vectorisation efficiency due to its iterative nature. △ Less

Submitted 9 March, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

MSC Class: 70E55; 70F35; 68U05; 51P05; 37N15

arXiv:2103.07895 [pdf, other]

Principled Ultrasound Data Augmentation for Classification of Standard Planes

Authors: Lok Hin Lee, Yuan Gao, J. Alison Noble

Abstract: Deep learning models with large learning capacities often overfit to medical imaging datasets. This is because training sets are often relatively small due to the significant time and financial costs incurred in medical data acquisition and labelling. Data augmentation is therefore often used to expand the availability of training data and to increase generalization. However, augmentation strategi… ▽ More Deep learning models with large learning capacities often overfit to medical imaging datasets. This is because training sets are often relatively small due to the significant time and financial costs incurred in medical data acquisition and labelling. Data augmentation is therefore often used to expand the availability of training data and to increase generalization. However, augmentation strategies are often chosen on an ad-hoc basis without justification. In this paper, we present an augmentation policy search method with the goal of improving model classification performance. We include in the augmentation policy search additional transformations that are often used in medical image analysis and evaluate their performance. In addition, we extend the augmentation policy search to include non-linear mixed-example data augmentation strategies. Using these learned policies, we show that principled data augmentation for medical image model training can lead to significant improvements in ultrasound standard plane detection, with an an average F1-score improvement of 7.0% overall over naive data augmentation strategies in ultrasound fetal standard plane classification. We find that the learned representations of ultrasound images are better clustered and defined with optimized data augmentation. △ Less

Submitted 14 March, 2021; originally announced March 2021.

Comments: Information Processing in Medical Imaging (IPMI) 2021

arXiv:2009.13635 [pdf, other]

Cross-Task Representation Learning for Anatomical Landmark Detection

Authors: Zeyu Fu, Jianbo Jiao, Michael Suttie, J. Alison Noble

Abstract: Recently, there is an increasing demand for automatically detecting anatomical landmarks which provide rich structural information to facilitate subsequent medical image analysis. Current methods related to this task often leverage the power of deep neural networks, while a major challenge in fine tuning such models in medical applications arises from insufficient number of labeled samples. To add… ▽ More Recently, there is an increasing demand for automatically detecting anatomical landmarks which provide rich structural information to facilitate subsequent medical image analysis. Current methods related to this task often leverage the power of deep neural networks, while a major challenge in fine tuning such models in medical applications arises from insufficient number of labeled samples. To address this, we propose to regularize the knowledge transfer across source and target tasks through cross-task representation learning. The proposed method is demonstrated for extracting facial anatomical landmarks which facilitate the diagnosis of fetal alcohol syndrome. The source and target tasks in this work are face recognition and landmark detection, respectively. The main idea of the proposed method is to retain the feature representations of the source model on the target task data, and to leverage them as an additional source of supervisory signals for regularizing the target model learning, thereby improving its performance under limited training samples. Concretely, we present two approaches for the proposed representation learning by constraining either final or intermediate model features on the target model. Experimental results on a clinical face image dataset demonstrate that the proposed approach works well with few labeled data, and outperforms other compared approaches. △ Less

Submitted 28 September, 2020; originally announced September 2020.

Comments: MICCAI-MLMI 2020

arXiv:2008.13002 [pdf, other]

doi 10.1007/978-3-030-59716-0_24

Longitudinal Image Registration with Temporal-order and Subject-specificity Discrimination

Authors: Qianye Yang, Yunguan Fu, Francesco Giganti, Nooshin Ghavami, Qingchao Chen, J. Alison Noble, Tom Vercauteren, Dean Barratt, Yipeng Hu

Abstract: Morphological analysis of longitudinal MR images plays a key role in monitoring disease progression for prostate cancer patients, who are placed under an active surveillance program. In this paper, we describe a learning-based image registration algorithm to quantify changes on regions of interest between a pair of images from the same patient, acquired at two different time points. Combining inte… ▽ More Morphological analysis of longitudinal MR images plays a key role in monitoring disease progression for prostate cancer patients, who are placed under an active surveillance program. In this paper, we describe a learning-based image registration algorithm to quantify changes on regions of interest between a pair of images from the same patient, acquired at two different time points. Combining intensity-based similarity and gland segmentation as weak supervision, the population-data-trained registration networks significantly lowered the target registration errors (TREs) on holdout patient data, compared with those before registration and those from an iterative registration algorithm. Furthermore, this work provides a quantitative analysis on several longitudinal-data-sampling strategies and, in turn, we propose a novel regularisation method based on maximum mean discrepancy, between differently-sampled training image pairs. Based on 216 3D MR images from 86 patients, we report a mean TRE of 5.6 mm and show statistically significant differences between the different training data sampling strategies. △ Less

Submitted 29 August, 2020; originally announced August 2020.

Comments: Accepted at MICCAI 2020

arXiv:2008.08698 [pdf, other]

Self-Supervised Ultrasound to MRI Fetal Brain Image Synthesis

Authors: Jianbo Jiao, Ana I. L. Namburete, Aris T. Papageorghiou, J. Alison Noble

Abstract: Fetal brain magnetic resonance imaging (MRI) offers exquisite images of the developing brain but is not suitable for second-trimester anomaly screening, for which ultrasound (US) is employed. Although expert sonographers are adept at reading US images, MR images which closely resemble anatomical images are much easier for non-experts to interpret. Thus in this paper we propose to generate MR-like… ▽ More Fetal brain magnetic resonance imaging (MRI) offers exquisite images of the developing brain but is not suitable for second-trimester anomaly screening, for which ultrasound (US) is employed. Although expert sonographers are adept at reading US images, MR images which closely resemble anatomical images are much easier for non-experts to interpret. Thus in this paper we propose to generate MR-like images directly from clinical US images. In medical image analysis such a capability is potentially useful as well, for instance for automatic US-MRI registration and fusion. The proposed model is end-to-end trainable and self-supervised without any external annotations. Specifically, based on an assumption that the US and MRI data share a similar anatomical latent space, we first utilise a network to extract the shared latent features, which are then used for MRI synthesis. Since paired data is unavailable for our study (and rare in practice), pixel-level constraints are infeasible to apply. We instead propose to enforce the distributions to be statistically indistinguishable, by adversarial learning in both the image domain and feature space. To regularise the anatomical structures between US and MRI during synthesis, we further propose an adversarial structural constraint. A new cross-modal attention technique is proposed to utilise non-local spatial information, by encouraging multi-modal knowledge fusion and propagation. We extend the approach to consider the case where 3D auxiliary information (e.g., 3D neighbours and a 3D location index) from volumetric data is also available, and show that this improves image synthesis. The proposed approach is evaluated quantitatively and qualitatively with comparison to real fetal MR images and other approaches to synthesis, demonstrating its feasibility of synthesising realistic MR images. △ Less

Submitted 19 August, 2020; originally announced August 2020.

Comments: IEEE Transactions on Medical Imaging 2020

arXiv:2008.06607 [pdf, other]

Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound

Authors: Jianbo Jiao, Yifan Cai, Mohammad Alsharid, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Abstract: In medical imaging, manual annotations can be expensive to acquire and sometimes infeasible to access, making conventional deep learning-based models difficult to scale. As a result, it would be beneficial if useful representations could be derived from raw data without the need for manual annotations. In this paper, we propose to address the problem of self-supervised representation learning with… ▽ More In medical imaging, manual annotations can be expensive to acquire and sometimes infeasible to access, making conventional deep learning-based models difficult to scale. As a result, it would be beneficial if useful representations could be derived from raw data without the need for manual annotations. In this paper, we propose to address the problem of self-supervised representation learning with multi-modal ultrasound video-speech raw data. For this case, we assume that there is a high correlation between the ultrasound video and the corresponding narrative speech audio of the sonographer. In order to learn meaningful representations, the model needs to identify such correlation and at the same time understand the underlying anatomical features. We designed a framework to model the correspondence between video and audio without any kind of human annotations. Within this framework, we introduce cross-modal contrastive learning and an affinity-aware self-paced learning scheme to enhance correlation modelling. Experimental evaluations on multi-modal fetal ultrasound video and audio show that the proposed approach is able to learn strong representations and transfers well to downstream tasks of standard plane detection and eye-gaze prediction. △ Less

Submitted 14 August, 2020; originally announced August 2020.

Comments: MICCAI 2020 (early acceptance)

arXiv:2007.04480 [pdf, other]

Automatic Probe Movement Guidance for Freehand Obstetric Ultrasound

Authors: Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Abstract: We present the first system that provides real-time probe movement guidance for acquiring standard planes in routine freehand obstetric ultrasound scanning. Such a system can contribute to the worldwide deployment of obstetric ultrasound scanning by lowering the required level of operator expertise. The system employs an artificial neural network that receives the ultrasound video signal and the m… ▽ More We present the first system that provides real-time probe movement guidance for acquiring standard planes in routine freehand obstetric ultrasound scanning. Such a system can contribute to the worldwide deployment of obstetric ultrasound scanning by lowering the required level of operator expertise. The system employs an artificial neural network that receives the ultrasound video signal and the motion signal of an inertial measurement unit (IMU) that is attached to the probe, and predicts a guidance signal. The network termed US-GuideNet predicts either the movement towards the standard plane position (goal prediction), or the next movement that an expert sonographer would perform (action prediction). While existing models for other ultrasound applications are trained with simulations or phantoms, we train our model with real-world ultrasound video and probe motion data from 464 routine clinical scans by 17 accredited sonographers. Evaluations for 3 standard plane types show that the model provides a useful guidance signal with an accuracy of 88.8% for goal prediction and 90.9% for action prediction. △ Less

Submitted 8 July, 2020; originally announced July 2020.

Comments: Accepted at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020)

arXiv:2003.05477 [pdf, other]

doi 10.1007/978-3-030-58558-7_25

Unified Image and Video Saliency Modeling

Authors: Richard Droste, Jianbo Jiao, J. Alison Noble

Abstract: Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approach… ▽ More Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We identify different sources of domain shift between image and video saliency data and between different video saliency datasets as a key challenge for effective joint modelling. To address this we propose four novel domain adaptation techniques - Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing and Bypass-RNN - in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and MIT300. With one set of parameters, UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state-of-the-art for image saliency datasets, despite faster runtime and a 5 to 20-fold smaller model size compared to all competing deep methods. We provide retrospective analyses and ablation studies which confirm the importance of the domain shift modeling. The code is available at https://github.com/rdroste/unisal △ Less

Submitted 7 November, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

Comments: Presented at the European Conference on Computer Vision (ECCV) 2020. R. Droste and J. Jiao contributed equally to this work. v3: Updated Fig. 5a) and added new MTI300 benchmark results to supp. material

Journal ref: In: ECCV 2020, Springer LNCS 12350, pp. 419-435

arXiv:2003.00105 [pdf, other]

Self-supervised Representation Learning for Ultrasound Video

Authors: Jianbo Jiao, Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Abstract: Recent advances in deep learning have achieved promising performance for medical image analysis, while in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are expensive to collect and can be scarce for medical imaging applications. Therefore, there is significant interest in learning representations from unlabelled raw data… ▽ More Recent advances in deep learning have achieved promising performance for medical image analysis, while in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are expensive to collect and can be scarce for medical imaging applications. Therefore, there is significant interest in learning representations from unlabelled raw data. In this paper, we propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video without any type of human annotation. We assume that in order to learn such a representation, the model should identify anatomical structures from the unlabelled data. Therefore we force the model to address anatomy-aware tasks with free supervision from the data itself. Specifically, the model is designed to correct the order of a reshuffled video clip and at the same time predict the geometric transformation applied to the video clip. Experiments on fetal ultrasound video show that the proposed approach can effectively learn meaningful and strong representations, which transfer well to downstream tasks like standard plane detection and saliency prediction. △ Less

Submitted 28 February, 2020; originally announced March 2020.

Comments: ISBI 2020

arXiv:2002.08334 [pdf, other]

Holistic Specifications for Robust Programs

Authors: Sophia Drossopoulou, James Noble, Julian Mackay, Susan Eisenbach

Abstract: Functional specifications describe what program components do: the sufficient conditions to invoke a component's operations. They allow us to reason about the use of components in the closed world setting, where the component interacts with known client code, and where the client code must establish the appropriate pre-conditions before calling into the component. Sufficient conditions are not e… ▽ More Functional specifications describe what program components do: the sufficient conditions to invoke a component's operations. They allow us to reason about the use of components in the closed world setting, where the component interacts with known client code, and where the client code must establish the appropriate pre-conditions before calling into the component. Sufficient conditions are not enough to reason about the use of components in the open world setting, where the component interacts with external code, possibly of unknown provenance, and where the component itself may evolve over time. In this open world setting, we must also consider the necessary} conditions, i.e, what are the conditions without which an effect will not happen. In this paper we propose the language Chainmail for writing holistic specifications that focus on necessary conditions (as well as sufficient conditions). We give a formal semantics for \Chainmail. The core of Chainmail has been mechanised in the Coq proof assistant. △ Less

Submitted 19 February, 2020; originally announced February 2020.

Comments: 44 pages, 1 Table, 11 Figures

MSC Class: 68; 68N19; 68Q60; 68Q65

arXiv:2001.08188 [pdf, other]

Discovering Salient Anatomical Landmarks by Predicting Human Gaze

Authors: Richard Droste, Pierre Chatelain, Lior Drukker, Harshita Sharma, Aris T. Papageorghiou, J. Alison Noble

Abstract: Anatomical landmarks are a crucial prerequisite for many medical imaging tasks. Usually, the set of landmarks for a given task is predefined by experts. The landmark locations for a given image are then annotated manually or via machine learning methods trained on manual annotations. In this paper, in contrast, we present a method to automatically discover and localize anatomical landmarks in medi… ▽ More Anatomical landmarks are a crucial prerequisite for many medical imaging tasks. Usually, the set of landmarks for a given task is predefined by experts. The landmark locations for a given image are then annotated manually or via machine learning methods trained on manual annotations. In this paper, in contrast, we present a method to automatically discover and localize anatomical landmarks in medical images. Specifically, we consider landmarks that attract the visual attention of humans, which we term visually salient landmarks. We illustrate the method for fetal neurosonographic images. First, full-length clinical fetal ultrasound scans are recorded with live sonographer gaze-tracking. Next, a convolutional neural network (CNN) is trained to predict the gaze point distribution (saliency map) of the sonographers on scan video frames. The CNN is then used to predict saliency maps of unseen fetal neurosonographic images, and the landmarks are extracted as the local maxima of these saliency maps. Finally, the landmarks are matched across images by clustering the landmark CNN features. We show that the discovered landmarks can be used within affine image registration, with average landmark alignment errors between 4.1% and 10.9% of the fetal head long axis length. △ Less

Submitted 22 January, 2020; originally announced January 2020.

Comments: Accepted at IEEE International Symposium on Biomedical Imaging 2020 (ISBI 2020)

arXiv:1912.09903 [pdf, other]

doi 10.1016/j.ultrasmedbio.2016.02.007

Heterogeneous tissue characterization using ultrasound: a comparison of fractal analysis backscatter models on liver tumors

Authors: Omar S. Al-Kadi, Daniel Y. F. Chung, Constantin C. Coussios, J. Alison Noble

Abstract: Assessing tumor tissue heterogeneity via ultrasound has recently been suggested for predicting early response to treatment. The ultrasound backscattering characteristics can assist in better understanding the tumor texture by highlighting local concentration and spatial arrangement of tissue scatterers. However, it is challenging to quantify the various tissue heterogeneities ranging from fine-to-… ▽ More Assessing tumor tissue heterogeneity via ultrasound has recently been suggested for predicting early response to treatment. The ultrasound backscattering characteristics can assist in better understanding the tumor texture by highlighting local concentration and spatial arrangement of tissue scatterers. However, it is challenging to quantify the various tissue heterogeneities ranging from fine-to-coarse of the echo envelope peaks in tumor texture. Local parametric fractal features extracted via maximum likelihood estimation from five well-known statistical model families are evaluated for the purpose of ultrasound tissue characterization. The fractal dimension (self-similarity measure) was used to characterize the spatial distribution of scatterers, while the Lacunarity (sparsity measure) was applied to determine scatterer number density. Performance was assessed based on 608 cross-sectional clinical ultrasound RF images of liver tumors (230 and 378 demonstrating respondent and non-respondent cases, respectively). Crossvalidation via leave-one-tumor-out and with different k-folds methodologies using a Bayesian classifier were employed for validation. The fractal properties of the backscattered echoes based on the Nakagami model (Nkg) and its extend four-parameter Nakagami-generalized inverse Gaussian (NIG) distribution achieved best results - with nearly similar performance - for characterizing liver tumor tissue. Accuracy, sensitivity and specificity for the Nkg/NIG were: 85.6%/86.3%, 94.0%/96.0%, and 73.0%/71.0%, respectively. Other statistical models, such as the Rician, Rayleigh, and K-distribution were found to not be as effective in characterizing the subtle changes in tissue texture as an indication of response to treatment. Employing the most relevant and practical statistical model could have potential consequences for the design of an early and effective clinical therapy. △ Less

Submitted 20 December, 2019; originally announced December 2019.

Comments: 31 pages, 7 figures, 3 tables, journal article

Journal ref: Ultrasound in Medicine & Biology, vol. 42(7), pp. 1612-1626, 2016

arXiv:1909.10137 [pdf]

Validation of image-guided cochlear implant programming techniques

Authors: Yiyuan Zhao, Jianing Wang, Rui Li, Robert F. Labadie, Benoit M. Dawant, Jack H. Noble

Abstract: Cochlear implants (CIs) are a standard treatment for patients who experience severe to profound hearing loss. Recent studies have shown that hearing outcome is correlated with intra-cochlear anatomy and electrode placement. Our group has developed image-guided CI programming (IGCIP) techniques that use image analysis methods to both segment the inner ear structures in pre- or post-implantation CT… ▽ More Cochlear implants (CIs) are a standard treatment for patients who experience severe to profound hearing loss. Recent studies have shown that hearing outcome is correlated with intra-cochlear anatomy and electrode placement. Our group has developed image-guided CI programming (IGCIP) techniques that use image analysis methods to both segment the inner ear structures in pre- or post-implantation CT images and localize the CI electrodes in post-implantation CT images. This permits to assist audiologists with CI programming by suggesting which among the contacts should be deactivated to reduce electrode interaction that is known to affect outcomes. Clinical studies have shown that IGCIP can improve hearing outcomes for CI recipients. However, the sensitivity of IGCIP with respect to the accuracy of the two major steps: electrode localization and intra-cochlear anatomy segmentation, is unknown. In this article, we create a ground truth dataset with conventional CT and micro-CT images of 35 temporal bone specimens to both rigorously characterize the accuracy of these two steps and assess how inaccuracies in these steps affect the overall results. Our study results show that when clinical pre- and post-implantation CTs are available, IGCIP produces results that are comparable to those obtained with the corresponding ground truth in 86.7% of the subjects tested. When only post-implantation CTs are available, this number is 83.3%. These results suggest that our current method is robust to errors in segmentation and localization but also that it can be improved upon. Keywords: cochlear implant, ground truth, segmentation, validation △ Less

Submitted 13 July, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

Comments: 37 pages, 12 figures, 7 tables

Showing 1–50 of 71 results for author: Noble, J