Showing 1–50 of 204 results for author: Hsu, C

Searching in archive cs.
  1. arXiv:2409.02530  [pdf]

    cs.LG cs.AI

    Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models

    Authors: Chih-Yuan Li, Jun-Ting Wu, Chan Hsu, Ming-Yen Lin, Yihuang Kang

    Abstract: The estimated Glomerular Filtration Rate (eGFR) is an essential indicator of kidney function in clinical practice. Although traditional equations and Machine Learning (ML) models using clinical and laboratory data can estimate eGFR, accurately predicting future eGFR levels remains a significant challenge for nephrologists and ML researchers. Recent advances demonstrate that Large Language Models (… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: This preprint version includes corrections of typographical errors related to numerical values in Table 2, which were present in the version published at the BDH workshop in MIPR 2024. These corrections do not affect the overall conclusions of the study

  2. arXiv:2409.00489  [pdf]

    cs.CV cs.AI

    Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability

    Authors: Chia-Yu Hsu, Wenwen Li, Sizhe Wang

    Abstract: Research on geospatial foundation models (GFMs) has become a trending topic in geospatial artificial intelligence (AI) research due to their potential for achieving high generalizability and domain adaptability, reducing model training costs for individual researchers. Unlike large language models, such as ChatGPT, constructing visual foundation models for image analysis, particularly in remote se… ▽ More

    Submitted 31 August, 2024; originally announced September 2024.

  3. arXiv:2409.00395  [pdf, other]

    cs.CV cs.LG

    Self-supervised Fusarium Head Blight Detection with Hyperspectral Image and Feature Mining

    Authors: Yu-Fan Lin, Ching-Heng Cheng, Bo-Cheng Qiu, Cheng-Jun Kang, Chia-Ming Lee, Chih-Chung Hsu

    Abstract: Fusarium Head Blight (FHB) is a serious fungal disease affecting wheat (including durum), barley, oats, other small cereal grains, and corn. Effective monitoring and accurate detection of FHB are crucial to ensuring stable and reliable food security. Traditionally, trained agronomists and surveyors perform manual identification, a method that is labor-intensive, impractical, and challenging to sca… ▽ More

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Beyond Visible Spectrum: AI for Agriculture Challenge, in conjunction with ICPR 2024

  4. arXiv:2408.15057  [pdf]

    cs.LG

    Subgroup Analysis via Model-based Rule Forest

    Authors: I-Ling Cheng, Chan Hsu, Chantung Ku, Pei-Ju Lee, Yihuang Kang

    Abstract: Machine learning models are often criticized for their black-box nature, raising concerns about their applicability in critical decision-making scenarios. Consequently, there is a growing demand for interpretable models in such contexts. In this study, we introduce Model-based Deep Rule Forests (mobDRF), an interpretable representation learning algorithm designed to extract transparent models from… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

  5. arXiv:2408.15055  [pdf]

    cs.LG cs.AI

    Causal Rule Forest: Toward Interpretable and Precise Treatment Effect Estimation

    Authors: Chan Hsu, Jun-Ting Wu, Yihuang Kang

    Abstract: Understanding and inferencing Heterogeneous Treatment Effects (HTE) and Conditional Average Treatment Effects (CATE) are vital for developing personalized treatment recommendations. Many state-of-the-art approaches achieve inspiring performance in estimating HTE on benchmark datasets or simulation studies. However, the indirect predicting manner and complex model architecture reduce the interpreta… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: The 25th IEEE International Conference on Information Reuse and Integration for Data Science (IRI 2024)

  6. arXiv:2408.13708  [pdf, other]

    cs.CV cs.LG

    InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth

    Authors: Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann

    Abstract: Indoor monocular depth estimation helps home automation, including robot navigation or AR/VR for surrounding perception. Most previous methods primarily experiment with the NYUv2 Dataset and concentrate on the overall performance in their evaluation. However, their robustness and generalization to diversely unseen types or categories for indoor spaces (spaces types) have yet to be discovered. Rese… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: BMVC 2024. This version supersedes 2309.13516

  7. arXiv:2408.08968  [pdf, other]

    cs.NI cs.AI cs.LG

    Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Systems

    Authors: Cyril Shih-Huan Hsu, Danny De Vleeschauwer, Chrysa Papagianni

    Abstract: When a network slice spans multiple domains, each domain must uphold the End-to-End (E2E) Service Level Agreement (SLA) associated with the slice. This requires decomposing the E2E SLA into partial SLAs for each domain. In a two-level network slicing management system with an E2E orchestrator and local controllers, we propose an online learning-decomposition framework that dynamically updates risk… ▽ More

    Submitted 20 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: The paper has been submitted to IEEE Networking Letters

  8. arXiv:2407.21402  [pdf, other]

    cs.CV

    DD-rPPGNet: De-interfering and Descriptive Feature Learning for Unsupervised rPPG Estimation

    Authors: Pei-Kai Huang, Tzu-Hsien Chen, Ya-Ting Chan, Kuan-Wen Chen, Chiou-Ting Hsu

    Abstract: Remote Photoplethysmography (rPPG) aims to measure physiological signals and Heart Rate (HR) from facial videos. Recent unsupervised rPPG estimation methods have shown promising potential in estimating rPPG signals from facial regions without relying on ground truth rPPG signals. However, these methods seem oblivious to interference existing in rPPG signals and still result in unsatisfactory perfo… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  9. arXiv:2407.16148  [pdf, other]

    cs.CL

    CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support

    Authors: Chao-Chun Hsu, Erin Bransom, Jenna Sparks, Bailey Kuehl, Chenhao Tan, David Wadden, Lucy Lu Wang, Aakanksha Naik

    Abstract: Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review. We define hierarchical organizations as tree structures where nodes refer to topical ca… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 2024 ACL Findings

  10. SLA Decomposition for Network Slicing: A Deep Neural Network Approach

    Authors: Cyril Shih-Huan Hsu, Danny De Vleeschauwer, Chrysa Papagianni

    Abstract: For a network slice that spans multiple technology and/or administrative domains, these domains must ensure that the slice's End-to-End (E2E) Service Level Agreement (SLA) is met. Thus, the E2E SLA should be decomposed to partial SLAs, assigned to each of these domains. Assuming a two level management architecture consisting of an E2E service orchestrator and local domain controllers, we consider… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: IEEE Networking Letters

  11. arXiv:2407.13322  [pdf, other]

    cs.CV

    Fully Test-Time rPPG Estimation via Synthetic Signal-Guided Feature Learning

    Authors: Pei-Kai Huang, Tzu-Hsien Chen, Ya-Ting Chan, Kuan-Wen Chen, Chiou-Ting Hsu

    Abstract: Many remote photoplethysmography (rPPG) estimation models have achieved promising performance in the training domain but often fail to accurately estimate physiological signals or heart rates (HR) in the target domains. Domain generalization (DG) or domain adaptation (DA) techniques are therefore adopted during the offline training stage to adapt the model to either unobserved or observed target d… ▽ More

    Submitted 15 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  12. arXiv:2407.12579  [pdf, other]

    cs.CV cs.AI

    The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

    Authors: Yi Yao, Chan-Feng Hsu, Jhe-Hao Lin, Hongxia Xie, Terence Lin, Yi-Ning Huang, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: In spite of recent advancements in text-to-image generation, limitations persist in handling complex and imaginative prompts due to the restricted diversity and complexity of training data. This work explores how diffusion models can generate images from prompts requiring artistic creativity or specialized knowledge. We introduce the Realistic-Fantasy Benchmark (RFBench), a novel evaluation framew… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  13. arXiv:2407.07331  [pdf, ps, other]

    cs.CV cs.AI

    Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

    Authors: Po-Hsuan Huang, Chia-Ching Lin, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

    Abstract: Learning from noisy-labeled data is crucial for real-world applications. Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples. However, they often neglect that clean samples, especially those with intricate visual patterns, may also yield substantial losses. This oversight is particularly significant in… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ICIP 2024

  14. arXiv:2407.00556  [pdf, other]

    cs.MM

    Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Yu Jian, Chi-Han Tsai

    Abstract: Social media popularity (SMP) prediction is a complex task involving multi-modal data integration. While pre-trained vision-language models (VLMs) like CLIP have been widely adopted for this task, their effectiveness in capturing the unique characteristics of social media content remains unexplored. This paper critically examines the applicability of CLIP-based features in SMP prediction, focusing… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submission of the 7th Social Media Prediction Challenge

  15. arXiv:2406.19941  [pdf, other]

    cs.CV

    GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

    Authors: Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu, Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou

    Abstract: As DeepFake video manipulation techniques escalate, posing profound threats, the urgent need to develop efficient detection strategies is underscored. However, one particular issue lies with facial images being mis-detected, often originating from degraded videos or adversarial attacks, leading to unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques.… ▽ More

    Submitted 1 September, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to TPAMI 2024

  16. arXiv:2406.19666  [pdf, other]

    cs.CV eess.IV

    CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion

    Authors: Chih-Chung Hsu, Chih-Chien Ni, Chia-Ming Lee, Li-Wei Kang

    Abstract: Hyperspectral imaging, capturing detailed spectral information for each pixel, is pivotal in diverse scientific and industrial applications. Yet, the acquisition of high-resolution (HR) hyperspectral images (HSIs) often needs to be addressed due to the hardware limitations of existing imaging systems. A prevalent workaround involves capturing both a high-resolution multispectral image (HR-MSI) and… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to TIP 2024

  17. arXiv:2406.07431  [pdf, other]

    cs.MA cs.CV

    Active Scout: Multi-Target Tracking Using Neural Radiance Fields in Dense Urban Environments

    Authors: Christopher D. Hsu, Pratik Chaudhari

    Abstract: We study pursuit-evasion games in highly occluded urban environments, e.g. tall buildings in a city, where a scout (quadrotor) tracks multiple dynamic targets on the ground. We show that we can build a neural radiance field (NeRF) representation of the city -- online -- using RGB and depth images from different vantage points. This representation is used to calculate the information gain to both e… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 8 pages, 8 figures, 1 table

  18. arXiv:2406.00024  [pdf, other]

    cs.CL cs.AI cs.ET cs.LG

    Embedding-Aligned Language Models

    Authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier

    Abstract: We propose a novel approach for training large language models (LLMs) to adhere to objectives defined within a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space, w.r.t. so… ▽ More

    Submitted 24 May, 2024; originally announced June 2024.

  19. arXiv:2405.16833  [pdf, other]

    cs.LG

    Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

    Authors: Chia-Yi Hsu, Yu-Lin Tsai, Chih-Hsun Lin, Pin-Yu Chen, Chia-Mu Yu, Chun-Ying Huang

    Abstract: While large language models (LLMs) such as Llama-2 or GPT-4 have shown impressive zero-shot performance, fine-tuning is still necessary to enhance their performance for customized datasets, domain-specific tasks, or other private needs. However, fine-tuning all parameters of LLMs requires significant hardware resources, which can be impractical for typical users. Therefore, parameter-efficient fin… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  20. arXiv:2405.14259  [pdf, other]

    cs.CL cs.AI

    Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition

    Authors: Chan-Jan Hsu, Yi-Chang Chen, Feng-Ting Liao, Pei-Chen Ho, Yu-Hsiang Wang, Po-Chun Hsu, Da-shan Shiu

    Abstract: We introduce "Generative Fusion Decoding" (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR). We derive the formulas necessary to enable GFD to operate across mismatched token spaces of different models by mapping text token space to… ▽ More

    Submitted 2 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  21. arXiv:2405.06554  [pdf, ps, other]

    cs.IT

    Tradeoffs among Action Taking Policies Matter in Active Sequential Multi-Hypothesis Testing: the Optimal Error Exponent Region

    Authors: Chia-Yu Hsu, I-Hsiang Wang

    Abstract: Reliability of sequential hypothesis testing can be greatly improved when decision maker is given the freedom to adaptively take an action that determines the distribution of the current collected sample. Such advantage of sampling adaptivity has been realized since Chernoff's seminal paper in 1959 [1]. While a large body of works have explored and investigated the gain of adaptivity, in the gener… ▽ More

    Submitted 29 August, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: Submitted to the IEEE Transactions on Information Theory

  22. arXiv:2404.16670  [pdf, other]

    cs.CV cs.AI

    EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

    Authors: Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but is still unexplored in vision emotion understanding. In this work, we focus on enhancing the model's proficiency in understanding and adhering to ins… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  23. arXiv:2404.15781  [pdf, other]

    cs.CV cs.AI eess.IV

    Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat

    Authors: Chih-Chung Hsu, Chih-Yu Jian, Eng-Shen Tu, Chia-Ming Lee, Guan-Lin Chen

    Abstract: This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and require only relatively few training samples for efficient and robust HSI reconstruction in the presence of… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by TGRS 2024

  24. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  25. arXiv:2404.10108  [pdf, other]

    cs.CV cs.LG

    GeoAI Reproducibility and Replicability: a computational and spatial perspective

    Authors: Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Peter Kedron

    Abstract: GeoAI has emerged as an exciting interdisciplinary research area that combines spatial theories and data with cutting-edge AI models to address geospatial problems in a novel, data-driven manner. While GeoAI research has flourished in the GIScience literature, its reproducibility and replicability (R&R), fundamental principles that determine the reusability, reliability, and scientific rigor of re… ▽ More

    Submitted 22 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by Annals of the American Association of Geographers

  26. arXiv:2404.09790  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  27. arXiv:2404.05183  [pdf, other]

    cs.CV cs.LG

    Progressive Alignment with VLM-LLM Feature to Augment Defect Classification for the ASE Dataset

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu

    Abstract: Traditional defect classification approaches are facing with two barriers. (1) Insufficient training data and unstable data quality. Collecting sufficient defective sample is expensive and time-costing, consequently leading to dataset variance. It introduces the difficulty on recognition and learning. (2) Over-dependence on visual modality. When the image pattern and texture is monotonic for all d… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: MULA 2024

  28. arXiv:2404.01643  [pdf, other]

    eess.IV cs.CV cs.LG

    A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

    Abstract: Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT-scan contains large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions… ▽ More

    Submitted 20 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Camera-ready version, accepted by DEF-AI-MIA workshop, in conjunction with CVPR2024

  29. arXiv:2404.00722  [pdf, other]

    cs.CV cs.AI

    DRCT: Saving Image Super-resolution away from Information Bottleneck

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou

    Abstract: In recent years, Vision Transformer-based approaches for low-level vision tasks have achieved widespread success. Unlike CNN-based models, Transformers are more adept at capturing long-range dependencies, enabling the reconstruction of images utilizing non-local information. In the domain of super-resolution, Swin-transformer-based models have become mainstream due to their capability of global sp… ▽ More

    Submitted 15 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: Camera-ready version, NTIRE 2024 Image Super-resolution (x4)

  30. arXiv:2403.15791  [pdf, other]

    cs.RO

    DriveEnv-NeRF: Exploration of A NeRF-Based Autonomous Driving Environment for Real-World Performance Validation

    Authors: Mu-Yi Shen, Chia-Chi Hsu, Hao-Yu Hou, Yu-Chen Huang, Wei-Fang Sun, Chia-Che Chang, Yu-Lun Liu, Chun-Yi Lee

    Abstract: In this study, we introduce the DriveEnv-NeRF framework, which leverages Neural Radiance Fields (NeRF) to enable the validation and faithful forecasting of the efficacy of autonomous driving agents in a targeted real-world scene. Standard simulator-based rendering often fails to accurately reflect real-world performance due to the sim-to-real gap, which represents the disparity between virtual sim… ▽ More

    Submitted 30 May, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: Project page: https://github.com/muyishen2040/DriveEnvNeRF

  31. arXiv:2403.11576  [pdf, other]

    cs.CV cs.MM

    MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation

    Authors: Chih-Chung Hsu, Chia-Ming Lee

    Abstract: Instance segmentation, a cornerstone task in computer vision, has wide-ranging applications in diverse industries. The advent of deep learning and artificial intelligence has underscored the criticality of training effective models, particularly in data-scarce scenarios - a concern that resonates in both academic and industrial circles. A significant impediment in this domain is the resource-inten… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  32. arXiv:2403.11572  [pdf, other]

    cs.CV cs.MM

    Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Ming-Shyen Wu

    Abstract: Instance segmentation is a fundamental task in computer vision with broad applications across various industries. In recent years, with the proliferation of deep learning and artificial intelligence applications, how to train effective models with limited data has become a pressing issue for both academia and industry. In the Visual Inductive Priors challenge (VIPriors2023), participants must trai… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  33. arXiv:2403.11536  [pdf, other]

    cs.CV cs.AI cs.LG

    OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu

    Abstract: Automatic optical inspection (AOI) plays a pivotal role in the manufacturing process, predominantly leveraging high-resolution imaging instruments for scanning purposes. It detects anomalies by analyzing image textures or patterns, making it an essential tool in industrial manufacturing and quality control. Despite its importance, the deployment of models for AOI often faces challenges. These incl… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  34. arXiv:2403.11230  [pdf, other]

    eess.IV cs.CV cs.LG

    Simple 2D Convolutional Neural Network-based Approach for COVID-19 Detection

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

    Abstract: This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this me… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  35. Thought Graph: Generating Thought Process for Biological Reasoning

    Authors: Chi-Yang Hsu, Kyle Cox, Jiawei Xu, Zhen Tan, Tianhua Zhai, Mengzhou Hu, Dexter Pratt, Tianlong Chen, Ziniu Hu, Ying Ding

    Abstract: We present the Thought Graph as a novel framework to support complex reasoning and use gene set analysis as an example to uncover semantic relationships between biological processes. Our framework stands out for its ability to provide a deeper understanding of gene sets, significantly surpassing GSEA by 40.28% and LLM baselines by 5.38% based on cosine similarity to human annotations. Our analysis… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 4 pages. Accepted by Web Conf 2024

  36. arXiv:2403.06497  [pdf, other]

    cs.CV cs.MM

    QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning

    Authors: Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin

    Abstract: Transformer-based models have gained widespread popularity in both the computer vision (CV) and natural language processing (NLP) fields. However, significant challenges arise during post-training linear quantization, leading to noticeable reductions in inference accuracy. Our study focuses on uncovering the underlying causes of these accuracy drops and proposing a quantization-friendly fine-tunin… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  37. arXiv:2403.03516  [pdf, other]

    cs.CL cs.IR

    Unsupervised Multilingual Dense Retrieval via Generative Pseudo Labeling

    Authors: Chao-Wei Huang, Chen-An Li, Tsu-Yuan Hsu, Chen-Yu Hsu, Yun-Nung Chen

    Abstract: Dense retrieval methods have demonstrated promising performance in multilingual information retrieval, where queries and documents can be in different languages. However, dense retrievers typically require a substantial amount of paired data, which poses even greater challenges in multilingual scenarios. This paper introduces UMR, an Unsupervised Multilingual dense Retriever trained without any pa… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to Findings of EACL 2024

  38. arXiv:2403.02712  [pdf, other]

    cs.CL

    Breeze-7B Technical Report

    Authors: Chan-Jan Hsu, Chang-Le Liu, Feng-Ting Liao, Po-Chun Hsu, Yi-Chang Chen, Da-Shan Shiu

    Abstract: Breeze-7B is an open-source language model based on Mistral-7B, designed to address the need for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese. This technical report provides an overview of the additional pretraining, finetuning, and evaluation stages for the Breeze-7B model. The Breeze-7B family of base and chat models exhibits good performance on langua… ▽ More

    Submitted 3 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  39. arXiv:2402.15957  [pdf, other]

    cs.LG

    DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

    Authors: Anthony Liang, Guy Tennenholtz, Chih-wei Hsu, Yinlam Chow, Erdem Bıyık, Craig Boutilier

    Abstract: We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates. We model episode sessions - parts of the episode where the latent state is fixed - and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent co… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  40. arXiv:2402.13025  [pdf, other]

    cs.CL cs.AI

    CFEVER: A Chinese Fact Extraction and VERification Dataset

    Authors: Ying-Jia Lin, Chun-Yi Lin, Chia-Jen Yeh, Yi-Ting Li, Yun-Yu Hu, Chih-Hao Hsu, Mei-Feng Lee, Hung-Yu Kao

    Abstract: We present CFEVER, a Chinese dataset designed for Fact Extraction and VERification. CFEVER comprises 30,012 manually created claims based on content in Chinese Wikipedia. Each claim in CFEVER is labeled as "Supports", "Refutes", or "Not Enough Info" to depict its degree of factualness. Similar to the FEVER dataset, claims in the "Supports" and "Refutes" categories are also annotated with correspon… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: AAAI-24

  41. arXiv:2402.12627  [pdf, other]

    cs.LG cs.AI cs.CV

    A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective

    Authors: Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

    Abstract: Recent artificial intelligence (AI) technologies show remarkable evolution in various academic fields and industries. However, in the real world, dynamic data lead to principal challenges for deploying AI models. An unexpected data change brings about severe performance degradation in AI models. We identify two major related research fields, domain shift and concept drift according to the setting… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

  42. arXiv:2402.07087  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Self-Correcting Self-Consuming Loops for Generative Model Training

    Authors: Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun

    Abstract: As synthetic data becomes higher quality and proliferates on the internet, machine learning models are increasingly trained on a mix of human- and machine-generated data. Despite the successful stories of using synthetic data for representation learning, using synthetic data for generative model training creates "self-consuming loops" which may lead to training instability or even collapse, unless…

    Submitted 10 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: Camera ready version (ICML 2024). Code at https://nategillman.com/sc-sc.html

  43. arXiv:2402.03616  [pdf, other]

    cs.CL cs.AI cs.HC cs.IR

    Leveraging Large Language Models for Hybrid Workplace Decision Support

    Authors: Yujin Kim, Chin-Chia Hsu

    Abstract: Large Language Models (LLMs) hold the potential to perform a variety of text processing tasks and provide textual explanations for proposed actions or decisions. In the era of hybrid work, LLMs can provide intelligent decision support for workers who are designing their hybrid work plans. In particular, they can offer suggestions and explanations to workers balancing numerous decision factors, the…

    Submitted 5 February, 2024; originally announced February 2024.

  44. arXiv:2401.08787  [pdf, other]

    cs.CV

    Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping

    Authors: Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Yezhou Yang, Hyunho Lee, Anna Liljedahl, Chandi Witharana, Yili Yang, Brendan M. Rogers, Samantha T. Arundel, Matthew B. Jones, Kenton McHenry, Patricia Solis

    Abstract: This paper assesses trending AI foundation models, especially emerging computer vision foundation models and their performance in natural landscape feature segmentation. While the term foundation model has quickly garnered interest from the geospatial domain, its definition remains vague. Hence, this paper will first introduce AI foundation models and their defining characteristics. Built upon the…

    Submitted 16 January, 2024; originally announced January 2024.

  45. arXiv:2312.06519  [pdf, other]

    cs.LG cs.AI cs.SI

    A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling

    Authors: Hung Chun Hsu, Bo-Jun Wu, Ming-Yi Hong, Che Lin, Chih-Yu Wang

    Abstract: Our research addresses class imbalance issues in heterogeneous graphs using graph neural networks (GNNs). We propose a novel method combining the strengths of Generative Adversarial Networks (GANs) with GNNs, creating synthetic nodes and edges that effectively balance the dataset. This approach directly targets and rectifies imbalances at the data level. The proposed framework resolves issues such…

    Submitted 11 December, 2023; originally announced December 2023.

  46. arXiv:2312.03077  [pdf, other]

    cs.CL cs.AI cs.CY

    Clinical Notes Reveal Physician Fatigue

    Authors: Chao-Chun Hsu, Ziad Obermeyer, Chenhao Tan

    Abstract: Physicians write notes about patients. In doing so, they reveal much about themselves. Using data from 129,228 emergency room visits, we train a model to identify notes written by fatigued physicians -- those who worked 5 or more of the prior 7 days. In a hold-out set, the model accurately identifies notes written by these high-workload physicians, and also flags notes written in other high-fatigu…

    Submitted 5 December, 2023; originally announced December 2023.

  47. arXiv:2312.02109  [pdf, other]

    cs.CV

    ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation

    Authors: Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu

    Abstract: This work introduces ArtAdapter, a transformative text-to-image (T2I) style transfer framework that transcends traditional limitations of color, brushstrokes, and object shape, capturing high-level style elements such as composition and distinctive artistic expression. The integration of a multi-level style encoder with our proposed explicit adaptation mechanism enables ArtAdapter to achieve unpre…

    Submitted 26 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  48. arXiv:2311.14902  [pdf, other]

    cs.CV

    Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features

    Authors: Jun-En Ding, Chien-Chin Hsu, Feng Liu

    Abstract: Parkinson's Disease (PD) affects millions globally, impacting movement. Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure. This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification. We introduce a novel…

    Submitted 24 August, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  49. arXiv:2311.04928  [pdf, other]

    cs.CL cs.AI cs.HC cs.SI

    Leveraging Large Language Models for Collective Decision-Making

    Authors: Marios Papachristou, Longqi Yang, Chin-Chia Hsu

    Abstract: In various work contexts, such as meeting scheduling, collaborating, and project planning, collective decision-making is essential but often challenging due to diverse individual preferences, varying work focuses, and power dynamics among members. To address this, we propose a system leveraging Large Language Models (LLMs) to facilitate group decision-making by managing conversations and balancing…

    Submitted 24 January, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Comparison with baselines, requirements analysis, expand related work

  50. arXiv:2311.02085  [pdf, other]

    cs.IR cs.AI

    Preference Elicitation with Soft Attributes in Interactive Recommendation

    Authors: Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Preference elicitation plays a central role in interactive recommender systems. Most preference elicitation approaches use either item queries that ask users to select preferred items from a slate, or attribute queries that ask them to express their preferences for item characteristics. Unfortunately, users often wish to describe their preferences using soft attributes for which no ground-truth se…

    Submitted 22 October, 2023; originally announced November 2023.