Chen et al., 2024 - Google Patents

Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model

Document ID
12255755920595553974
Author
Chen K
Thapa R
Chalamala R
Athiwaratkun B
Song S
Zou J
Publication year
2024
Publication venue
arXiv preprint arXiv:2406.00977

Snippet

Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, crucial for tasks such as visual commonsense reasoning and analyzing biomedical images. However, increasing input …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/30 Medical informatics, i.e. computer-based analysis or dissemination of patient or disease data
    • G06F19/32 Medical data management, e.g. systems or protocols for archival or communication of medical images, computerised patient records or computerised general medical references
    • G06F19/321 Management of medical image data, e.g. communication or archiving systems such as picture archiving and communication systems [PACS] or related medical protocols such as digital imaging and communications in medicine protocol [DICOM]; Editing of medical image data, e.g. adding diagnosis information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/30 Medical informatics, i.e. computer-based analysis or dissemination of patient or disease data
    • G06F19/34 Computer-assisted medical diagnosis or treatment, e.g. computerised prescription or delivery of medication or diets, computerised local control of medical devices, medical expert systems or telemedicine
    • G06F19/345 Medical expert systems, neural networks or other automated diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30244 Information retrieval; Database structures therefor; File system structures therefor in image databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for a specific business sector, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/22 Health care, e.g. hospitals; Social work
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image, e.g. from bit-mapped to bit-mapped creating a different image
    • G06T3/0031 Geometric image transformation in the plane of the image, e.g. from bit-mapped to bit-mapped creating a different image for topological mapping of a higher dimensional structure on a lower dimensional surface
    • G06T3/0037 Reshaping or unfolding a 3D tree structure onto a 2D plane
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K2209/00 Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints

Similar Documents

Publication Publication Date Title
Zhang et al. BiomedGPT: A unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal tasks
Chen et al. Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model
CN105760874A (en) CT image processing system and method for pneumoconiosis
Lin et al. BATFormer: Towards boundary-aware lightweight transformer for efficient medical image segmentation
CN113506310A (en) Medical image processing method and device, electronic equipment and storage medium
Amara et al. COVIR: A virtual rendering of a novel NN architecture O-Net for COVID-19 CT-scan automatic lung lesions segmentation
Liu et al. A systematic review of deep learning-based research on radiology report generation
Varçin et al. Diagnosis of lumbar spondylolisthesis via convolutional neural networks
Karam et al. A progressive and cross-domain deep transfer learning framework for wrist fracture detection
Le Van et al. Detecting lumbar implant and diagnosing scoliosis from Vietnamese X-ray imaging using the pre-trained API models and transfer learning
Lin et al. Towards medical artificial general intelligence via knowledge-enhanced multimodal pretraining
Wu et al. K-Diag: Knowledge-enhanced disease diagnosis in radiographic imaging
Jha et al. CT Liver Segmentation via PVT-based Encoding and Refined Decoding
Feng et al. UTransNet: Transformer within U-Net for stroke lesion segmentation
Pelka et al. Branding-fusion of meta data and musculoskeletal radiographs for multi-modal diagnostic recognition
Dai et al. UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification
Nguyen et al. Collaborative consultation doctors model: Unifying CNN and ViT for COVID-19 diagnostic
Jia et al. AMO-Net: abdominal multi-organ segmentation in MRI with a extend UNet
Guo Applying Medical Language Models to Medical Image Analysis
Machado Mandible-focused osteoporosis risk assessment using dental panoramic radiography and artificial intelligence models
Dalla Serra et al. Improving Image Representations via MoCo Pre-training for Multimodal CXR Classification
Hamamci et al. Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography
Crespi Towards a comprehensive application of deep learning methods in medical imaging: semantic segmentation, features extraction, synthetic image generation
Ma et al. Symmetrical awareness network for cross-site ultrasound thyroid nodule segmentation
Si et al. Non-symmetrical Sibling-Stream Network with Adaptive Positional Encoding for Automatic Medical Report Generation