Search | arXiv e-print repository

Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization

Authors: Chamuditha Jayanaga Galappaththige, Zachary Izzo, Xilin He, Honglu Zhou, Muhammad Haris Khan

Abstract: Unarguably, deep learning models capable of generalizing to unseen domain data while leveraging a few labels are of great practical significance due to low developmental costs. In search of this endeavor, we study the challenging problem of semi-supervised domain generalization (SSDG), where the goal is to learn a domain-generalizable model while using only a small fraction of labeled data and a r… ▽ More Unarguably, deep learning models capable of generalizing to unseen domain data while leveraging a few labels are of great practical significance due to low developmental costs. In search of this endeavor, we study the challenging problem of semi-supervised domain generalization (SSDG), where the goal is to learn a domain-generalizable model while using only a small fraction of labeled data and a relatively large fraction of unlabeled data. Domain generalization (DG) methods show subpar performance under the SSDG setting, whereas semi-supervised learning (SSL) methods demonstrate relatively better performance, however, they are considerably poor compared to the fully-supervised DG methods. Towards handling this new, but challenging problem of SSDG, we propose a novel method that can facilitate the generation of accurate pseudo-labels under various domain shifts. This is accomplished by retaining the domain-level specialism in the classifier during training corresponding to each source domain. Specifically, we first create domain-level information vectors on the fly which are then utilized to learn a domain-aware mask for modulating the classifier's weights. We provide a mathematical interpretation for the effect of this modulation procedure on both pseudo-labeling and model training. Our method is plug-and-play and can be readily applied to different SSL baselines for SSDG. Extensive experiments on six challenging datasets in two different SSDG settings show that our method provides visible gains over the various strong SSL-based SSDG baselines. △ Less

Submitted 3 September, 2024; originally announced September 2024.

Comments: Accepted at WACV25

arXiv:2409.00643 [pdf, other]

Learning to Singulate Objects in Packed Environments using a Dexterous Hand

Authors: Hao Jiang, Yuhai Wang, Hanyang Zhou, Daniel Seita

Abstract: Robotic object singulation, where a robot must isolate, grasp, and retrieve a target object in a cluttered environment, is a fundamental challenge in robotic manipulation. This task is difficult due to occlusions and how other objects act as obstacles for manipulation. A robot must also reason about the effect of object-object interactions as it tries to singulate the target. Prior work has explor… ▽ More Robotic object singulation, where a robot must isolate, grasp, and retrieve a target object in a cluttered environment, is a fundamental challenge in robotic manipulation. This task is difficult due to occlusions and how other objects act as obstacles for manipulation. A robot must also reason about the effect of object-object interactions as it tries to singulate the target. Prior work has explored object singulation in scenarios where there is enough free space to perform relatively long pushes to separate objects, in contrast to when space is tight and objects have little separation from each other. In this paper, we propose the Singulating Objects in Packed Environments (SOPE) framework. We propose a novel method that involves a displacement-based state representation and a multi-phase reinforcement learning procedure that enables singulation using the 16-DOF Allegro Hand. We demonstrate extensive experiments in Isaac Gym simulation, showing the ability of our system to singulate a target object in clutter. We directly transfer the policy trained in simulation to the real world. Over 250 physical robot manipulation trials, our method obtains success rates of 79.2%, outperforming alternative learning and non-learning methods. △ Less

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2409.00458 [pdf, other]

doi 10.1016/j.cma.2024.117339

Dynamical system prediction from sparse observations using deep neural networks with Voronoi tessellation and physics constraint

Authors: Hanyang Wang, Hao Zhou, Sibo Cheng

Abstract: Despite the success of various methods in addressing the issue of spatial reconstruction of dynamical systems with sparse observations, spatio-temporal prediction for sparse fields remains a challenge. Existing Kriging-based frameworks for spatio-temporal sparse field prediction fail to meet the accuracy and inference time required for nonlinear dynamic prediction problems. In this paper, we intro… ▽ More Despite the success of various methods in addressing the issue of spatial reconstruction of dynamical systems with sparse observations, spatio-temporal prediction for sparse fields remains a challenge. Existing Kriging-based frameworks for spatio-temporal sparse field prediction fail to meet the accuracy and inference time required for nonlinear dynamic prediction problems. In this paper, we introduce the Dynamical System Prediction from Sparse Observations using Voronoi Tessellation (DSOVT) framework, an innovative methodology based on Voronoi tessellation which combines convolutional encoder-decoder (CED) and long short-term memory (LSTM) and utilizing Convolutional Long Short-Term Memory (ConvLSTM). By integrating Voronoi tessellations with spatio-temporal deep learning models, DSOVT is adept at predicting dynamical systems with unstructured, sparse, and time-varying observations. CED-LSTM maps Voronoi tessellations into a low-dimensional representation for time series prediction, while ConvLSTM directly uses these tessellations in an end-to-end predictive model. Furthermore, we incorporate physics constraints during the training process for dynamical systems with explicit formulas. Compared to purely data-driven models, our physics-based approach enables the model to learn physical laws within explicitly formulated dynamics, thereby enhancing the robustness and accuracy of rolling forecasts. Numerical experiments on real sea surface data and shallow water systems clearly demonstrate our framework's accuracy and computational efficiency with sparse and time-varying observations. △ Less

Submitted 31 August, 2024; originally announced September 2024.

Journal ref: Computer Methods in Applied Mechanics and Engineering. 2024 Dec 1

arXiv:2409.00128 [pdf]

Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs

Authors: Ziyan Cui, Ning Li, Huaikang Zhou

Abstract: Artificial Intelligence (AI) is increasingly being integrated into scientific research, particularly in the social sciences, where understanding human behavior is critical. Large Language Models (LLMs) like GPT-4 have shown promise in replicating human-like responses in various psychological experiments. However, the extent to which LLMs can effectively replace human subjects across diverse experi… ▽ More Artificial Intelligence (AI) is increasingly being integrated into scientific research, particularly in the social sciences, where understanding human behavior is critical. Large Language Models (LLMs) like GPT-4 have shown promise in replicating human-like responses in various psychological experiments. However, the extent to which LLMs can effectively replace human subjects across diverse experimental contexts remains unclear. Here, we conduct a large-scale study replicating 154 psychological experiments from top social science journals with 618 main effects and 138 interaction effects using GPT-4 as a simulated participant. We find that GPT-4 successfully replicates 76.0 percent of main effects and 47.0 percent of interaction effects observed in the original studies, closely mirroring human responses in both direction and significance. However, only 19.44 percent of GPT-4's replicated confidence intervals contain the original effect sizes, with the majority of replicated effect sizes exceeding the 95 percent confidence interval of the original studies. Additionally, there is a 71.6 percent rate of unexpected significant results where the original studies reported null findings, suggesting potential overestimation or false positives. Our results demonstrate the potential of LLMs as powerful tools in psychological research but also emphasize the need for caution in interpreting AI-driven findings. While LLMs can complement human studies, they cannot yet fully replace the nuanced insights provided by human subjects. △ Less

Submitted 3 September, 2024; v1 submitted 29 August, 2024; originally announced September 2024.

Comments: 5 figures, 2 tables

arXiv:2408.17027 [pdf, other]

ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images

Authors: Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou, Soham Ghosh, Danushen Gnanapragasam, Varun Jampani, Hao Su, Leonidas Guibas

Abstract: To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline, where 2D-3D feature consistency is enforced through a volume rende… ▽ More To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline, where 2D-3D feature consistency is enforced through a volume rendering NeRF-like ray marching process. Using dense per pixel features we are able to 1) directly distill the learned priors from 2D models to 3D models and create useful 3D backbones, 2) extract more consistent and less noisy 2D features, 3) formulate a consistent embedding space where 2D, 3D, and other modalities of data (e.g., natural language prompts) can be jointly queried. Furthermore, besides dense features, ConDense can be trained to extract sparse features (e.g., key points), also with 2D-3D consistency -- condensing 3D NeRF representations into compact sets of decorated key points. We demonstrate that our pre-trained model provides good initialization for various 3D tasks including 3D classification and segmentation, outperforming other 3D pre-training methods by a significant margin. It also enables, by exploiting our sparse features, additional useful downstream tasks, such as matching 2D images to 3D scenes, detecting duplicate 3D scenes, and querying a repository of 3D scenes through natural language -- all quite efficiently and without any per-scene fine-tuning. △ Less

Submitted 30 August, 2024; originally announced August 2024.

Comments: ECCV 2024

arXiv:2408.16308 [pdf, other]

AdaMotif: Graph Simplification via Adaptive Motif Design

Authors: Hong Zhou, Peifeng Lai, Zhida Sun, Xiangyuan Chen, Yang Chen, Huisi Wu, Yong Wang

Abstract: With the increase of graph size, it becomes difficult or even impossible to visualize graph structures clearly within the limited screen space. Consequently, it is crucial to design effective visual representations for large graphs. In this paper, we propose AdaMotif, a novel approach that can capture the essential structure patterns of large graphs and effectively reveal the overall structures vi… ▽ More With the increase of graph size, it becomes difficult or even impossible to visualize graph structures clearly within the limited screen space. Consequently, it is crucial to design effective visual representations for large graphs. In this paper, we propose AdaMotif, a novel approach that can capture the essential structure patterns of large graphs and effectively reveal the overall structures via adaptive motif designs. Specifically, our approach involves partitioning a given large graph into multiple subgraphs, then clustering similar subgraphs and extracting similar structural information within each cluster. Subsequently, adaptive motifs representing each cluster are generated and utilized to replace the corresponding subgraphs, leading to a simplified visualization. Our approach aims to preserve as much information as possible from the subgraphs while simplifying the graph efficiently. Notably, our approach successfully visualizes crucial community information within a large graph. We conduct case studies and a user study using real-world graphs to validate the effectiveness of our proposed approach. The results demonstrate the capability of our approach in simplifying graphs while retaining important structural and community information. △ Less

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.15741 [pdf, other]

Segmentation-guided Layer-wise Image Vectorization with Gradient Fills

Authors: Hengyu Zhou, Hui Zhang, Bin Wang

Abstract: The widespread use of vector graphics creates a significant demand for vectorization methods. While recent learning-based techniques have shown their capability to create vector images of clear topology, filling these primitives with gradients remains a challenge. In this paper, we propose a segmentation-guided vectorization framework to convert raster images into concise vector graphics with radi… ▽ More The widespread use of vector graphics creates a significant demand for vectorization methods. While recent learning-based techniques have shown their capability to create vector images of clear topology, filling these primitives with gradients remains a challenge. In this paper, we propose a segmentation-guided vectorization framework to convert raster images into concise vector graphics with radial gradient fills. With the guidance of an embedded gradient-aware segmentation subroutine, our approach progressively appends gradient-filled Bézier paths to the output, where primitive parameters are initiated with our newly designed initialization technique and are optimized to minimize our novel loss function. We build our method on a differentiable renderer with traditional segmentation algorithms to develop it as a model-free tool for raster-to-vector conversion. It is tested on various inputs to demonstrate its feasibility, independent of datasets, to synthesize vector graphics with improved visual quality and layer-wise topology compared to prior work. △ Less

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.14397 [pdf, other]

Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs

Authors: Xiaoman Zhang, Julián N. Acosta, Hong-Yu Zhou, Pranav Rajpurkar

Abstract: Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods fail to reveal the models' understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from process… ▽ More Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods fail to reveal the models' understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from processed reports to construct a comprehensive radiology knowledge graph. We then propose three metrics to evaluate the similarity of nodes (ReXKG-NSC), distribution of edges (ReXKG-AMS), and coverage of subgraphs (ReXKG-SCS) across various knowledge graphs. We conduct an in-depth comparative analysis of AI-generated and human-written radiology reports, assessing the performance of both specialist and generalist models. Our study provides a deeper understanding of the capabilities and limitations of current AI models in radiology report generation, offering valuable insights for improving model performance and clinical applicability. △ Less

Submitted 26 August, 2024; originally announced August 2024.

Comments: Code is available at: https://github.com/rajpurkarlab/ReXKG

arXiv:2408.13045 [pdf, other]

Adaptive complexity of log-concave sampling

Authors: Huanjian Zhou, Baoxiang Wang, Masashi Sugiyama

Abstract: In large-data applications, such as the inference process of diffusion models, it is desirable to design sampling algorithms with a high degree of parallelization. In this work, we study the adaptive complexity of sampling, which is the minimal number of sequential rounds required to achieve sampling given polynomially many queries executed in parallel at each round. For unconstrained sampling, we… ▽ More In large-data applications, such as the inference process of diffusion models, it is desirable to design sampling algorithms with a high degree of parallelization. In this work, we study the adaptive complexity of sampling, which is the minimal number of sequential rounds required to achieve sampling given polynomially many queries executed in parallel at each round. For unconstrained sampling, we examine distributions that are log-smooth or log-Lipschitz and log strongly or non-strongly concave. We show that an almost linear iteration algorithm cannot return a sample with a specific exponentially small accuracy under total variation distance. For box-constrained sampling, we show that an almost linear iteration algorithm cannot return a sample with sup-polynomially small accuracy under total variation distance for log-concave distributions. Our proof relies upon novel analysis with the characterization of the output for the hardness potentials based on the chain-like structure with random partition and classical smoothing techniques. △ Less

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12590 [pdf, other]

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Authors: Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, Ran Xu, Caiming Xiong

Abstract: We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of vi… ▽ More We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of visual tokens and the computational demands associated with generating long-sequence videos. To further address the computational costs, we propose a divide-and-merge strategy that maintains temporal consistency across video segments. Our Diffusion Transformer (DiT) model incorporates spatial and temporal self-attention layers, enabling robust generalization across different timeframes and aspect ratios. We have devised a data processing pipeline from the very beginning and collected over 13M high-quality video-text pairs. The pipeline includes multiple steps such as clipping, text detection, motion estimation, aesthetics scoring, and dense captioning based on our in-house video-LLM model. Training the VidVAE and DiT models required approximately 40 and 642 H100 days, respectively. Our model supports over 14-second 720p video generation in an end-to-end way and demonstrates competitive performance against state-of-the-art T2V models. △ Less

Submitted 31 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: Accepted by ECCV24 AI4VA

arXiv:2408.11947 [pdf, ps, other]

Assessing skin thermal injury risk in exposure tests of heating until flight

Authors: Hongyun Wang, Shannon E. Foley, Hong Zhou

Abstract: We assess the skin thermal injury risk in the situation where a test subject is exposed to an electromagnetic beam until the occurrence of flight action. The physical process is modeled as follows. The absorbed electromagnetic power increases the skin temperature. Wherever it is above a temperature threshold, thermal nociceptors are activated and transduce an electrical signal. When the activated… ▽ More We assess the skin thermal injury risk in the situation where a test subject is exposed to an electromagnetic beam until the occurrence of flight action. The physical process is modeled as follows. The absorbed electromagnetic power increases the skin temperature. Wherever it is above a temperature threshold, thermal nociceptors are activated and transduce an electrical signal. When the activated skin volume reaches a threshold, the flight signal is initiated. After the delay of human reaction time, the flight action is materialized when the subject moves away or the beam power is turned off. The injury risk is quantified by the thermal damage parameter calculated in the Arrhenius equation. It depends on the beam power density absorbed into the skin, which is not measurable. In addition, the volume threshold for flight initiation is unknown. To circumference these difficulties, we normalize the formulation and write the thermal damage parameter in terms of the occurrence time of flight action, which is reliably observed in exposure tests. This thermal injury formulation provides a viable framework for investigating the effects of model parameters. △ Less