Showing 1–50 of 1,244 results for author: Zhou, H

Searching in archive cs.
  1. arXiv:2409.03509  [pdf, other]

    cs.CV

    Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization

    Authors: Chamuditha Jayanaga Galappaththige, Zachary Izzo, Xilin He, Honglu Zhou, Muhammad Haris Khan

    Abstract: Unarguably, deep learning models capable of generalizing to unseen domain data while leveraging a few labels are of great practical significance due to low developmental costs. In search of this endeavor, we study the challenging problem of semi-supervised domain generalization (SSDG), where the goal is to learn a domain-generalizable model while using only a small fraction of labeled data and a r…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted at WACV25

  2. arXiv:2409.00643  [pdf, other]

    cs.RO

    Learning to Singulate Objects in Packed Environments using a Dexterous Hand

    Authors: Hao Jiang, Yuhai Wang, Hanyang Zhou, Daniel Seita

    Abstract: Robotic object singulation, where a robot must isolate, grasp, and retrieve a target object in a cluttered environment, is a fundamental challenge in robotic manipulation. This task is difficult due to occlusions and how other objects act as obstacles for manipulation. A robot must also reason about the effect of object-object interactions as it tries to singulate the target. Prior work has explor…

    Submitted 1 September, 2024; originally announced September 2024.

  3. Dynamical system prediction from sparse observations using deep neural networks with Voronoi tessellation and physics constraint

    Authors: Hanyang Wang, Hao Zhou, Sibo Cheng

    Abstract: Despite the success of various methods in addressing the issue of spatial reconstruction of dynamical systems with sparse observations, spatio-temporal prediction for sparse fields remains a challenge. Existing Kriging-based frameworks for spatio-temporal sparse field prediction fail to meet the accuracy and inference time required for nonlinear dynamic prediction problems. In this paper, we intro…

    Submitted 31 August, 2024; originally announced September 2024.

    Journal ref: Computer Methods in Applied Mechanics and Engineering. 2024 Dec 1

  4. arXiv:2409.00128  [pdf]

    cs.CL cs.AI econ.GN

    Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs

    Authors: Ziyan Cui, Ning Li, Huaikang Zhou

    Abstract: Artificial Intelligence (AI) is increasingly being integrated into scientific research, particularly in the social sciences, where understanding human behavior is critical. Large Language Models (LLMs) like GPT-4 have shown promise in replicating human-like responses in various psychological experiments. However, the extent to which LLMs can effectively replace human subjects across diverse experi…

    Submitted 3 September, 2024; v1 submitted 29 August, 2024; originally announced September 2024.

    Comments: 5 figures, 2 tables

  5. arXiv:2408.17027  [pdf, other]

    cs.CV

    ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images

    Authors: Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou, Soham Ghosh, Danushen Gnanapragasam, Varun Jampani, Hao Su, Leonidas Guibas

    Abstract: To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline, where 2D-3D feature consistency is enforced through a volume rende…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  6. arXiv:2408.16308  [pdf, other]

    cs.SI

    AdaMotif: Graph Simplification via Adaptive Motif Design

    Authors: Hong Zhou, Peifeng Lai, Zhida Sun, Xiangyuan Chen, Yang Chen, Huisi Wu, Yong Wang

    Abstract: With the increase of graph size, it becomes difficult or even impossible to visualize graph structures clearly within the limited screen space. Consequently, it is crucial to design effective visual representations for large graphs. In this paper, we propose AdaMotif, a novel approach that can capture the essential structure patterns of large graphs and effectively reveal the overall structures vi…

    Submitted 29 August, 2024; originally announced August 2024.

  7. arXiv:2408.15741  [pdf, other]

    cs.CV

    Segmentation-guided Layer-wise Image Vectorization with Gradient Fills

    Authors: Hengyu Zhou, Hui Zhang, Bin Wang

    Abstract: The widespread use of vector graphics creates a significant demand for vectorization methods. While recent learning-based techniques have shown their capability to create vector images of clear topology, filling these primitives with gradients remains a challenge. In this paper, we propose a segmentation-guided vectorization framework to convert raster images into concise vector graphics with radi…

    Submitted 28 August, 2024; originally announced August 2024.

  8. arXiv:2408.14397  [pdf, other]

    cs.AI cs.CL cs.CV

    Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs

    Authors: Xiaoman Zhang, Julián N. Acosta, Hong-Yu Zhou, Pranav Rajpurkar

    Abstract: Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods fail to reveal the models' understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from process…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Code is available at: https://github.com/rajpurkarlab/ReXKG

  9. arXiv:2408.13045  [pdf, other]

    cs.DS

    Adaptive complexity of log-concave sampling

    Authors: Huanjian Zhou, Baoxiang Wang, Masashi Sugiyama

    Abstract: In large-data applications, such as the inference process of diffusion models, it is desirable to design sampling algorithms with a high degree of parallelization. In this work, we study the adaptive complexity of sampling, which is the minimal number of sequential rounds required to achieve sampling given polynomially many queries executed in parallel at each round. For unconstrained sampling, we…

    Submitted 23 August, 2024; originally announced August 2024.

  10. arXiv:2408.12590  [pdf, other]

    cs.CV cs.AI

    xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

    Authors: Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, Ran Xu, Caiming Xiong

    Abstract: We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of vi…

    Submitted 31 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV24 AI4VA

  11. arXiv:2408.11947  [pdf, ps, other]

    cs.CE

    Assessing skin thermal injury risk in exposure tests of heating until flight

    Authors: Hongyun Wang, Shannon E. Foley, Hong Zhou

    Abstract: We assess the skin thermal injury risk in the situation where a test subject is exposed to an electromagnetic beam until the occurrence of flight action. The physical process is modeled as follows. The absorbed electromagnetic power increases the skin temperature. Wherever it is above a temperature threshold, thermal nociceptors are activated and transduce an electrical signal. When the activated…

    Submitted 21 August, 2024; originally announced August 2024.

  12. arXiv:2408.11797  [pdf]

    cs.RO eess.SY

    An Advanced Microscopic Energy Consumption Model for Automated Vehicle:Development, Calibration, Verification

    Authors: Ke Ma, Zhaohui Liang, Hang Zhou, Xiaopeng Li

    Abstract: The automated vehicle (AV) equipped with the Adaptive Cruise Control (ACC) system is expected to reduce the fuel consumption for the intelligent transportation system. This paper presents the Advanced ACC-Micro (AA-Micro) model, a new energy consumption model based on micro trajectory data, calibrated and verified by empirical data. Utilizing a commercial AV equipped with the ACC system as the tes…

    Submitted 21 August, 2024; originally announced August 2024.

  13. Timeline and Boundary Guided Diffusion Network for Video Shadow Detection

    Authors: Haipeng Zhou, Honqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, Lei Zhu

    Abstract: Video Shadow Detection (VSD) aims to detect the shadow masks with frame sequence. Existing works suffer from inefficient temporal learning. Moreover, few works address the VSD problem by considering the characteristic (i.e., boundary) of shadow. Motivated by this, we propose a Timeline and Boundary Guided Diffusion (TBGDiff) network for VSD where we take account of the past-future temporal guidanc…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: ACM MM2024

  14. arXiv:2408.11475  [pdf, other]

    cs.CV

    TrackGo: A Flexible and Efficient Method for Controllable Video Generation

    Authors: Haitao Zhou, Chuang Wang, Rui Nie, Jinxiao Lin, Dongdong Yu, Qian Yu, Changhu Wang

    Abstract: Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce TrackGo, a novel approach that leverages free-form masks and arrows for conditional video gene…

    Submitted 21 August, 2024; originally announced August 2024.

  15. arXiv:2408.11396  [pdf, other]

    cs.CL

    MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

    Authors: Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen

    Abstract: Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of the ability of original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, ind…

    Submitted 21 August, 2024; originally announced August 2024.

  16. arXiv:2408.10947  [pdf, other]

    cs.AI cs.CL cs.CY

    Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models

    Authors: Yuyan Chen, Chenwei Wu, Songzhou Yan, Panjun Liu, Haoyu Zhou, Yanghua Xiao

    Abstract: Teachers are important to imparting knowledge and guiding learners, and the role of large language models (LLMs) as potential educators is emerging as an important area of study. Recognizing LLMs' capability to generate educational content can lead to advances in automated and personalized learning. While LLMs have been tested for their comprehension and problem-solving skills, their capability in…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2024

  17. arXiv:2408.10775  [pdf, other]

    cs.CV cs.LG eess.IV

    Generative AI in Industrial Machine Vision -- A Review

    Authors: Hans Aoyang Zhou, Dominik Wolfschläger, Constantinos Florides, Jonas Werheid, Hannes Behnen, Jan-Henrick Woltersmann, Tiago C. Pinto, Marco Kemmerling, Anas Abdelrazeq, Robert H. Schmitt

    Abstract: Machine vision enhances automation, quality control, and operational efficiency in industrial applications by enabling machines to interpret and act on visual data. While traditional computer vision algorithms and approaches remain widely utilized, machine learning has become pivotal in current research activities. In particular, generative AI demonstrates promising potential by improving pattern…

    Submitted 21 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 44 pages, 7 figures, This work has been submitted to the Journal of Intelligent Manufacturing

  18. arXiv:2408.10567  [pdf, other]

    q-bio.NC cs.AI cs.CV cs.LG

    Prompt Your Brain: Scaffold Prompt Tuning for Efficient Adaptation of fMRI Pre-trained Model

    Authors: Zijian Dong, Yilei Wu, Zijiao Chen, Yichi Zhang, Yueming Jin, Juan Helen Zhou

    Abstract: We introduce Scaffold Prompt Tuning (ScaPT), a novel prompt-based framework for adapting large-scale functional magnetic resonance imaging (fMRI) pre-trained models to downstream tasks, with high parameter efficiency and improved performance compared to fine-tuning and baselines for prompt tuning. The full fine-tuning updates all pre-trained parameters, which may distort the learned feature space…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024

  19. arXiv:2408.10524  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition

    Authors: Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou

    Abstract: Contextualized ASR models have been demonstrated to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle with bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make the initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing(…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted to NCMMSC 2024

  20. arXiv:2408.10130  [pdf]

    cs.CL cs.AI

    Rhyme-aware Chinese lyric generator based on GPT

    Authors: Yixiao Yuan, Yangchen Huang, Yu Ma, Xinjin Li, Zhenglin Li, Yiming Shi, Huapeng Zhou

    Abstract: Neural language representation models such as GPT, pre-trained on large-scale corpora, can effectively capture rich semantic patterns from plain text and be fine-tuned to consistently improve natural language generation performance. However, existing pre-trained language models used to generate lyrics rarely consider rhyme information, which is crucial in lyrics. Using a pre-trained model directly…

    Submitted 19 August, 2024; originally announced August 2024.

  21. arXiv:2408.08872  [pdf, other]

    cs.CV cs.AI cs.CL

    xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

    Authors: Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles, et al. (2 additional authors not shown)

    Abstract: This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tas…

    Submitted 28 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  22. arXiv:2408.08704  [pdf, other]

    cs.CV cs.AI

    Beyond the Hype: A dispassionate look at vision-language models in medical scenario

    Authors: Yang Nan, Huichi Zhou, Xiaodan Xing, Guang Yang

    Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across diverse tasks, garnering significant attention in AI communities. However, their performance and reliability in specialized domains such as medicine remain insufficiently assessed. In particular, most assessments over-concentrate in evaluating VLMs based on simple Visual Question Answering…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 10 pages

  23. arXiv:2408.08685  [pdf, other]

    cs.LG cs.AI cs.CY cs.SI

    Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?

    Authors: Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi

    Abstract: Graph neural networks (GNNs) are vulnerable to adversarial perturbations, especially for topology attacks, and many methods that improve the robustness of GNNs have received considerable attention. Recently, we have witnessed the significant success of large language models (LLMs), leading many to explore the great potential of LLMs on GNNs. However, they mainly focus on improving the performance…

    Submitted 16 August, 2024; originally announced August 2024.

  24. Language-Driven Interactive Shadow Detection

    Authors: Hongqiu Wang, Wei Wang, Haipeng Zhou, Huihui Xu, Shaozhi Wu, Lei Zhu

    Abstract: Traditional shadow detectors often identify all shadow regions of static images or video sequences. This work presents the Referring Video Shadow Detection (RVSD), which is an innovative task that rejuvenates the classic paradigm by facilitating the segmentation of particular shadows in videos based on descriptive natural language prompts. This novel RVSD not only achieves segmentation of arbitrar…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: ACM MM 2024

  25. arXiv:2408.08500  [pdf, other]

    cs.CV

    CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving

    Authors: Shihan Peng, Hanyu Zhou, Hao Dong, Zhiwei Shi, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan

    Abstract: Conventional frame camera is the mainstream sensor of the autonomous driving scene perception, while it is limited in adverse conditions, such as low light. Event camera with high dynamic range has been applied in assisting frame camera for the multimodal fusion, which relies heavily on the pixel-level spatial alignment between various modalities. Typically, existing multimodal datasets mainly pla…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  26. arXiv:2408.07908  [pdf, other]

    cs.NE q-bio.NC

    Time-Dependent VAE for Building Latent Factor from Visual Neural Activity with Complex Dynamics

    Authors: Liwei Huang, ZhengYu Ma, Liutao Yu, Huihui Zhou, Yonghong Tian

    Abstract: Seeking high-quality neural latent representations to reveal the intrinsic correlation between neural activity and behavior or sensory stimulation has attracted much interest. Currently, some deep latent variable models rely on behavioral information (e.g., movement direction and position) as an aid to build expressive embeddings while being restricted by fixed time scales. Visual neural activity…

    Submitted 14 August, 2024; originally announced August 2024.

  27. arXiv:2408.06146  [pdf, ps, other]

    cs.DS math.CO

    Spectral Sparsification by Deterministic Discrepancy Walk

    Authors: Lap Chi Lau, Robert Wang, Hong Zhou

    Abstract: Spectral sparsification and discrepancy minimization are two well-studied areas that are closely related. Building on recent connections between these two areas, we generalize the "deterministic discrepancy walk" framework by Pesenti and Vladu [SODA 23] for vector discrepancy to matrix discrepancy, and use it to give a simpler proof of the matrix partial coloring theorem of Reis and Rothvoss [SODA…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 32 pages

  28. arXiv:2408.05717  [pdf, other]

    cs.CV cs.AI

    Deformable Image Registration with Multi-scale Feature Fusion from Shared Encoder, Auxiliary and Pyramid Decoders

    Authors: Hongchao Zhou, Shunbo Hu

    Abstract: In this work, we propose a novel deformable convolutional pyramid network for unsupervised image registration. Specifically, the proposed network enhances the traditional pyramid network by adding an additional shared auxiliary decoder for image pairs. This decoder provides multi-scale high-level feature information from unblended image pairs for the registration task. During the registration proc…

    Submitted 11 August, 2024; originally announced August 2024.

  29. Contrast, Imitate, Adapt: Learning Robotic Skills From Raw Human Videos

    Authors: Zhifeng Qian, Mingyu You, Hongjun Zhou, Xuanhui Xu, Hao Fu, Jinzhe Xue, Bin He

    Abstract: Learning robotic skills from raw human videos remains a non-trivial challenge. Previous works tackled this problem by leveraging behavior cloning or learning reward functions from videos. Despite their remarkable performances, they may introduce several issues, such as the necessity for robot actions, requirements for consistent viewpoints and similar layouts between human and robot videos, as wel…

    Submitted 10 August, 2024; originally announced August 2024.

    Journal ref: 2024 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING

  30. arXiv:2408.05328  [pdf]

    cs.CL cs.AI cs.ET cs.HC econ.GN

    From Text to Insight: Leveraging Large Language Models for Performance Evaluation in Management

    Authors: Ning Li, Huaikang Zhou, Mingze Xu

    Abstract: This study explores the potential of Large Language Models (LLMs), specifically GPT-4, to enhance objectivity in organizational task performance evaluations. Through comparative analyses across two studies, including various task performance outputs, we demonstrate that LLMs can serve as a reliable and even superior alternative to human raters in evaluating knowledge-based performance outputs, whi…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 39 pages, 8 figures, 5 tables

  31. arXiv:2408.04901  [pdf, other]

    cs.RO

    CTE-MLO: Continuous-time and Efficient Multi-LiDAR Odometry with Localizability-aware Point Cloud Sampling

    Authors: Hongming Shen, Zhenyu Wu, Wei Wang, Qiyang Lyu, Huiqin Zhou, Tianchen Deng, Yeqing Zhu, Danwei Wang

    Abstract: In recent years, LiDAR-based localization and mapping methods have achieved significant progress thanks to their reliable and real-time localization capability. Considering single LiDAR odometry often faces hardware failures and degradation in practical scenarios, Multi-LiDAR Odometry (MLO), as an emerging technology, is studied to enhance the performance of LiDAR-based localization and mapping sy…

    Submitted 9 August, 2024; originally announced August 2024.

  32. arXiv:2408.03284  [pdf, other]

    cs.CV cs.GR cs.MM

    ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

    Authors: Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu

    Abstract: Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose a unified and effective framework ReSyn…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://guanjz20.github.io/projects/ReSyncer

  33. arXiv:2408.02900  [pdf, other]

    cs.CV

    MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

    Authors: Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou

    Abstract: This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine, covering over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases. These enriched annotations encompass both global textual information, such as disease/lesion type, modality, region-specific descriptions, and inter-regional relationships, as well as deta…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: The project page is at https://yunfeixie233.github.io/MedTrinity-25M

  34. arXiv:2408.02507  [pdf, other]

    cs.CV

    Estimating Pore Location of PBF-LB/M Processes with Segmentation Models

    Authors: Hans Aoyang Zhou, Jan Theunissen, Marco Kemmerling, Anas Abdelrazeq, Johannes Henrich Schleifenbaum, Robert H. Schmitt

    Abstract: Reliably manufacturing defect free products is still an open challenge for Laser Powder Bed Fusion processes. Particularly, pores that occur frequently have a negative impact on mechanical properties like fatigue performance. Therefore, an accurate localisation of pores is mandatory for quality assurance, but requires time-consuming post-processing steps like computer tomography scans. Although ex…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 20 pages, 7 figures, This work has been submitted to the Journal Progress in Additive Manufacturing

  35. arXiv:2408.01800  [pdf, other]

    cs.CV

    MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of par…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: preprint

  36. arXiv:2407.21714  [pdf, other]

    cs.AI q-bio.QM

    UMMAN: Unsupervised Multi-graph Merge Adversarial Network for Disease Prediction Based on Intestinal Flora

    Authors: Dingkun Liu, Hongjie Zhou, Yilu Qu, Huimei Zhang, Yongdong Xu

    Abstract: The abundance of intestinal flora is closely related to human diseases, but diseases are not caused by a single gut microbe. Instead, they result from the complex interplay of numerous microbial entities. This intricate and implicit connection among gut microbes poses a significant challenge for disease prediction using abundance information from OTU data. Recently, several methods have shown pote…

    Submitted 31 July, 2024; originally announced July 2024.

  37. arXiv:2407.21490  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation

    Authors: Junxuan Yu, Rusi Chen, Yongsong Zhou, Yanlin Chen, Yaofei Duan, Yuhao Huang, Han Zhou, Tan Tao, Xin Yang, Dong Ni

    Abstract: Echocardiography video is a primary modality for diagnosing heart diseases, but the limited data poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, hindering the flexible movement control over specif…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI MLMI 2024

  38. arXiv:2407.21376  [pdf]

    cs.AI

    An Extended Kalman Filter Integrated Latent Feature Model on Dynamic Weighted Directed Graphs

    Authors: Hongxun Zhou, Xiangyu Chen, Ye Yuan

    Abstract: A dynamic weighted directed graph (DWDG) is commonly encountered in various application scenarios. It involves extensive dynamic interactions among numerous nodes. Most existing approaches explore the intricate temporal patterns hidden in a DWDG from the purely data-driven perspective, which suffers from accuracy loss when a DWDG exhibits strong fluctuations over time. To address this issue, this…

    Submitted 31 July, 2024; originally announced July 2024.

  39. arXiv:2407.21320  [pdf]

    cs.AI physics.flu-dyn

    MetaOpenFOAM: an LLM-based multi-agent framework for CFD

    Authors: Yuxuan Chen, Xu Zhu, Hua Zhou, Zhuyin Ren

    Abstract: Remarkable progress has been made in automated problem solving through societies of agents based on large language models (LLMs). Computational fluid dynamics (CFD), as a complex problem, presents unique challenges in automated simulations that require sophisticated solutions. MetaOpenFOAM, as a novel multi-agent collaborations framework, aims to complete CFD simulation tasks with only natural lan…

    Submitted 7 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: 31 pages, 11 figures, 11 tables

  40. arXiv:2407.21284  [pdf, other]

    cs.CV cs.AI cs.LG

    Robust Box Prompt based SAM for Medical Image Segmentation

    Authors: Yuhao Huang, Xin Yang, Han Zhou, Yan Cao, Haoran Dou, Fajin Dong, Dong Ni

    Abstract: The Segment Anything Model (SAM) can achieve satisfactory segmentation performance under high-quality box prompts. However, SAM's robustness is compromised by the decline in box quality, limiting its practicality in clinical reality. In this study, we propose a novel Robust Box prompt based SAM (RoBox-SAM) to ensure SAM's segmentation performance under prompts with different qualities. Ou…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI MLMI 2024

  41. arXiv:2407.20600  [pdf, other]

    cs.CV

    Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning

    Authors: Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu

    Abstract: Image recognition is an essential baseline for deep metric learning. Hierarchical knowledge about image classes depicts inter-class similarities or dissimilarities. Effective fusion of hierarchical knowledge about image classes to enhance image recognition remains a challenging topic to advance. In this paper, we propose a novel deep metric learning based method to effectively fuse hierarchical pr…

    Submitted 30 July, 2024; originally announced July 2024.

  42. arXiv:2407.19296  [pdf, other]

    cs.AI

    Multi-Modal CLIP-Informed Protein Editing

    Authors: Mingze Yin, Hanjing Zhou, Yiheng Zhu, Miao Lin, Yixuan Wu, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jintai Chen, Jian Wu

    Abstract: Proteins govern most biological functions essential for life, but achieving controllable protein discovery and optimization remains challenging. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures, 5 tables

  43. arXiv:2407.19103  [pdf, ps, other]

    cs.LG cs.DC

    FedAR: Addressing Client Unavailability in Federated Learning with Local Update Approximation and Rectification

    Authors: Chutian Jiang, Hansong Zhou, Xiaonan Zhang, Shayok Chakraborty

    Abstract: Federated learning (FL) enables clients to collaboratively train machine learning models under the coordination of a server in a privacy-preserving manner. One of the main challenges in FL is that the server may not receive local updates from each client in each round due to client resource limitations and intermittent network connectivity. The existence of unavailable clients severely deteriorate…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 18 pages, ECML 2024

  44. arXiv:2407.16337  [pdf, other]

    cs.LG

    STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments

    Authors: Hao Zhou, Kun Sun, Shaoming Li, Yangfeng Fan, Guibin Jiang, Jiaqi Zheng, Tao Li

    Abstract: Online controlled experiments play a crucial role in enabling data-driven decisions across a wide range of companies. Variance reduction is an effective technique to improve the sensitivity of experiments, achieving higher statistical power while using fewer samples and shorter experimental periods. However, typical variance reduction methods (e.g., regression-adjusted estimators) are built upon t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  45. arXiv:2407.15187  [pdf, other]

    cs.CV cs.GR

    HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

    Authors: Haiyang Zhou, Xinhua Cheng, Wangbo Yu, Yonghong Tian, Li Yuan

    Abstract: 3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. Owing to the powerful generative capabilities of text-to-image diffusion models that provide reliable priors, the creation of 3D scenes using only text prompts has become viable, thereby significantly advancing researches in text-driven 3D scene generation. In order to obtain mul…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Homepage: https://zhouhyocean.github.io/holodreamer

  46. arXiv:2407.15111  [pdf, other]

    cs.CV

    D$^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

    Authors: Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Huiyu Zhou, Junyu Dong, Huaidong Zhang, Yong Du

    Abstract: In this paper, we introduce D$^4$-VTON, an innovative solution for image-based virtual try-on. We address challenges from previous studies, such as semantic inconsistencies before and after garment warping, and reliance on static, annotation-driven clothing parsers. Additionally, we tackle the complexities in diffusion-based VTON models when handling simultaneous tasks like inpainting and denoisin…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  47. arXiv:2407.14133  [pdf, other]

    cs.CL

    I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction

    Authors: Zaiqiao Meng, Hao Zhou, Yifang Chen

    Abstract: Visual Language Models (VLMs) are essential for various tasks, particularly visual reasoning tasks, due to their robust multi-modal information integration, visual reasoning capabilities, and contextual awareness. However, existing VLMs' visual spatial reasoning capabilities are often inadequate, struggling even with basic tasks such as distinguishing left from right. To address this, we propos…

    Submitted 19 July, 2024; originally announced July 2024.

  48. arXiv:2407.13664  [pdf, other]

    cs.LG

    Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization

    Authors: Hao Zhou, Rongxiao Huang, Shaoming Li, Guibin Jiang, Jiaqi Zheng, Bing Cheng, Wei Lin

    Abstract: Marketing optimization plays an important role to enhance user engagement in online Internet platforms. Existing studies usually formulate this problem as a budget allocation problem and solve it by utilizing two fully decoupled stages, i.e., machine learning (ML) and operation research (OR). However, the learning objective in ML does not take account of the downstream optimization task in OR, whi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  49. arXiv:2407.13642  [pdf, other]

    cs.CV

    Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

    Authors: Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

    Abstract: In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  50. arXiv:2407.13246  [pdf, other]

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang, et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…

    Submitted 18 July, 2024; originally announced July 2024.