Search | arXiv e-print repository

arXiv:2409.03236 [pdf, other]

Unveiling Context-Related Anomalies: Knowledge Graph Empowered Decoupling of Scene and Action for Human-Related Video Anomaly Detection

Authors: Chenglizhao Chen, Xinyu Liu, Mengke Song, Luming Li, Xu Yu, Shanchen Pang

Abstract: Detecting anomalies in human-related videos is crucial for surveillance applications. Current methods primarily include appearance-based and action-based techniques. Appearance-based methods rely on low-level visual features such as color, texture, and shape. They learn a large number of pixel patterns and features related to known scenes during training, making them effective in detecting anomali… ▽ More Detecting anomalies in human-related videos is crucial for surveillance applications. Current methods primarily include appearance-based and action-based techniques. Appearance-based methods rely on low-level visual features such as color, texture, and shape. They learn a large number of pixel patterns and features related to known scenes during training, making them effective in detecting anomalies within these familiar contexts. However, when encountering new or significantly changed scenes, i.e., unknown scenes, they often fail because existing SOTA methods do not effectively capture the relationship between actions and their surrounding scenes, resulting in low generalization. In contrast, action-based methods focus on detecting anomalies in human actions but are usually less informative because they tend to overlook the relationship between actions and their scenes, leading to incorrect detection. For instance, the normal event of running on the beach and the abnormal event of running on the street might both be considered normal due to the lack of scene information. In short, current methods struggle to integrate low-level visual and high-level action features, leading to poor anomaly detection in varied and complex scenes. To address this challenge, we propose a novel decoupling-based architecture for human-related video anomaly detection (DecoAD). DecoAD significantly improves the integration of visual and action features through the decoupling and interweaving of scenes and actions, thereby enabling a more intuitive and accurate understanding of complex behaviors and scenes. DecoAD supports fully supervised, weakly supervised, and unsupervised settings. △ Less

Submitted 5 September, 2024; originally announced September 2024.

Comments: 13pages, 9 figures

arXiv:2409.03164 [pdf, other]

A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

Authors: Zhen Li, Weikai Yang, Jun Yuan, Jing Wu, Changjian Chen, Yao Ming, Fan Yang, Hui Zhang, Shixia Liu

Abstract: The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequenc… ▽ More The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequency, play crucial roles in real-world applications. This paper introduces a scalable visual analysis method to explain tree ensemble classifiers that contain tens of thousands of rules. The key idea is to address the issue of losing fidelity by adaptively organizing the rules as a hierarchy rather than reducing them. To ensure the inclusion of anomalous rules, we develop an anomaly-biased model reduction method to prioritize these rules at each hierarchical level. Synergized with this hierarchical organization of rules, we develop a matrix-based hierarchical visualization to support exploration at different levels of detail. Our quantitative experiments and case studies demonstrate how our method fosters a deeper understanding of both common and anomalous rules, thereby enhancing interpretability without sacrificing comprehensiveness. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: 15 pages, 10 figures

arXiv:2409.03106 [pdf, other]

Spatial Diffusion for Cell Layout Generation

Authors: Chen Li, Xiaoling Hu, Shahira Abousamra, Meilong Xu, Chao Chen

Abstract: Generative models, such as GANs and diffusion models, have been used to augment training sets and boost performances in different tasks. We focus on generative models for cell detection instead, i.e., locating and classifying cells in given pathology images. One important information that has been largely overlooked is the spatial patterns of the cells. In this paper, we propose a spatial-pattern-… ▽ More Generative models, such as GANs and diffusion models, have been used to augment training sets and boost performances in different tasks. We focus on generative models for cell detection instead, i.e., locating and classifying cells in given pathology images. One important information that has been largely overlooked is the spatial patterns of the cells. In this paper, we propose a spatial-pattern-guided generative model for cell layout generation. Specifically, a novel diffusion model guided by spatial features and generates realistic cell layouts has been proposed. We explore different density models as spatial features for the diffusion model. In downstream tasks, we show that the generated cell layouts can be used to guide the generation of high-quality pathology images. Augmenting with these images can significantly boost the performance of SOTA cell detection methods. The code is available at https://github.com/superlc1995/Diffusion-cell. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: 12 pages, 4 figures, accepted by MICCAI 2024

arXiv:2409.01556 [pdf, other]

Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture

Authors: Chen-Chi Chang, Ching-Yuan Chen, Hung-Shin Lee, Chih-Cheng Lee

Abstract: This study introduces a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in understanding and processing cultural knowledge, with a specific focus on Hakka culture as a case study. Leveraging Bloom's Taxonomy, the study develops a multi-dimensional framework that systematically assesses LLMs across six cognitive domains: Remembering, Understanding, Apply… ▽ More This study introduces a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in understanding and processing cultural knowledge, with a specific focus on Hakka culture as a case study. Leveraging Bloom's Taxonomy, the study develops a multi-dimensional framework that systematically assesses LLMs across six cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. This benchmark extends beyond traditional single-dimensional evaluations by providing a deeper analysis of LLMs' abilities to handle culturally specific content, ranging from basic recall of facts to higher-order cognitive tasks such as creative synthesis. Additionally, the study integrates Retrieval-Augmented Generation (RAG) technology to address the challenges of minority cultural knowledge representation in LLMs, demonstrating how RAG enhances the models' performance by dynamically incorporating relevant external information. The results highlight the effectiveness of RAG in improving accuracy across all cognitive domains, particularly in tasks requiring precise retrieval and application of cultural knowledge. However, the findings also reveal the limitations of RAG in creative tasks, underscoring the need for further optimization. This benchmark provides a robust tool for evaluating and comparing LLMs in culturally diverse contexts, offering valuable insights for future research and development in AI-driven cultural knowledge preservation and dissemination. △ Less

Submitted 2 September, 2024; originally announced September 2024.

Comments: Submitted to O-COCOSDA 2024

arXiv:2409.00712 [pdf, other]

Unveiling the Bandwidth Nightmare: CDN Compression Format Conversion Attacks

Authors: Ziyu Lin, Zhiwei Lin, Ximeng Liu, Zuobing Ying, Cheng Chen

Abstract: Content Delivery Networks (CDNs) are designed to enhance network performance and protect against web attack traffic for their hosting websites. And the HTTP compression request mechanism primarily aims to reduce unnecessary network transfers. However, we find that the specification failed to consider the security risks introduced when CDNs meet compression requests. In this paper, we present a nov… ▽ More Content Delivery Networks (CDNs) are designed to enhance network performance and protect against web attack traffic for their hosting websites. And the HTTP compression request mechanism primarily aims to reduce unnecessary network transfers. However, we find that the specification failed to consider the security risks introduced when CDNs meet compression requests. In this paper, we present a novel HTTP amplification attack, CDN Compression Format Convert (CDN-Convet) Attacks. It allows attackers to massively exhaust not only the outgoing bandwidth of the origin servers deployed behind CDNs but also the bandwidth of CDN surrogate nodes. We examined the CDN-Convet attacks on 11 popular CDNs to evaluate the feasibility and real-world impacts. Our experimental results show that all these CDNs are affected by the CDN-Convet attacks. We have also disclosed our findings to affected CDN providers and have received constructive feedback. △ Less

Submitted 1 September, 2024; originally announced September 2024.

Comments: 10 pages

arXiv:2409.00116 [pdf, other]

FedMCP: Parameter-Efficient Federated Learning with Model-Contrastive Personalization

Authors: Qianyi Zhao, Chen Qu, Cen Chen, Mingyuan Fan, Yanhao Wang

Abstract: With increasing concerns and regulations on data privacy, fine-tuning pretrained language models (PLMs) in federated learning (FL) has become a common paradigm for NLP tasks. Despite being extensively studied, the existing methods for this problem still face two primary challenges. First, the huge number of parameters in large-scale PLMs leads to excessive communication and computational overhead.… ▽ More With increasing concerns and regulations on data privacy, fine-tuning pretrained language models (PLMs) in federated learning (FL) has become a common paradigm for NLP tasks. Despite being extensively studied, the existing methods for this problem still face two primary challenges. First, the huge number of parameters in large-scale PLMs leads to excessive communication and computational overhead. Second, the heterogeneity of data and tasks across clients poses a significant obstacle to achieving the desired fine-tuning performance. To address the above problems, we propose FedMCP, a novel parameter-efficient fine-tuning method with model-contrastive personalization for FL. Specifically, FedMCP adds two lightweight adapter modules, i.e., the global adapter and the private adapter, to the frozen PLMs within clients. In a communication round, each client sends only the global adapter to the server for federated aggregation. Furthermore, FedMCP introduces a model-contrastive regularization term between the two adapters. This, on the one hand, encourages the global adapter to assimilate universal knowledge and, on the other hand, the private adapter to capture client-specific knowledge. By leveraging both adapters, FedMCP can effectively provide fine-tuned personalized models tailored to individual clients. Extensive experiments on highly heterogeneous cross-task, cross-silo datasets show that FedMCP achieves substantial performance improvements over state-of-the-art FL fine-tuning approaches for PLMs. △ Less

Submitted 28 August, 2024; originally announced September 2024.

arXiv:2408.17180 [pdf, other]

Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis

Authors: Chiu-Chou Lin, Yu-Wei Shih, Kuei-Ting Kuo, Yu-Cheng Chen, Chien-Hua Chen, Wei-Chen Chiu, I-Chen Wu

Abstract: How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions-such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games-is essential for enhancing gameplay and achieving balance. We have developed two advance… ▽ More How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions-such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games-is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in zero-sum competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations. Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology hinges on a simple technique to enhance codebook utilization in discrete representation with a deterministic vector quantization process for an extremely small state space. Our framework has been validated in popular online games, including Age of Empires II, Hearthstone, Brawl Stars, and League of Legends. The accuracy of the observed strength relations in these games is comparable to traditional pairwise win value predictions, while also offering a more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design. △ Less

Submitted 30 August, 2024; originally announced August 2024.

Comments: TMLR 09/2024 https://openreview.net/forum?id=2D36otXvBE

arXiv:2408.16706 [pdf, other]

Incremental Context-free Grammar Inference in Black Box Settings

Authors: Feifei Li, Xiao Chen, Xi Xiao, Xiaoyu Sun, Chuan Chen, Shaohua Wang, Jitao Han

Abstract: Black-box context-free grammar inference presents a significant challenge in many practical settings due to limited access to example programs. The state-of-the-art methods, Arvada and Treevada, employ heuristic approaches to generalize grammar rules, initiating from flat parse trees and exploring diverse generalization sequences. We have observed that these approaches suffer from low quality and… ▽ More Black-box context-free grammar inference presents a significant challenge in many practical settings due to limited access to example programs. The state-of-the-art methods, Arvada and Treevada, employ heuristic approaches to generalize grammar rules, initiating from flat parse trees and exploring diverse generalization sequences. We have observed that these approaches suffer from low quality and readability, primarily because they process entire example strings, adding to the complexity and substantially slowing down computations. To overcome these limitations, we propose a novel method that segments example strings into smaller units and incrementally infers the grammar. Our approach, named Kedavra, has demonstrated superior grammar quality (enhanced precision and recall), faster runtime, and improved readability through empirical comparison. △ Less

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.16673 [pdf, other]

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

Authors: Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo

Abstract: Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting and limited output diversity due to its aggressive updates to the data distribution. This paper aim to address these issues by introducing the maximum entropy principle, which favors models with flatter distributions… ▽ More Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting and limited output diversity due to its aggressive updates to the data distribution. This paper aim to address these issues by introducing the maximum entropy principle, which favors models with flatter distributions that still effectively capture the data. Specifically, we develop a new distribution matching method called GEM, which solves reverse Kullback-Leibler divergence minimization with an entropy regularizer. For the SFT of Llama-3-8B models, GEM outperforms CE in several aspects. First, when applied to the UltraFeedback dataset to develop general instruction-following abilities, GEM exhibits reduced overfitting, evidenced by lower perplexity and better performance on the IFEval benchmark. Furthermore, GEM enhances output diversity, leading to performance gains of up to 7 points on math reasoning and code generation tasks using best-of-n sampling, even without domain-specific data. Second, when fine-tuning with domain-specific datasets for math reasoning and code generation, GEM also shows less overfitting and improvements of up to 10 points compared with CE. △ Less

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.16310 [pdf, other]

Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning

Authors: Luyao Tang, Yuxuan Yuan, Chaoqi Chen, Kunze Huang, Xinghao Ding, Yue Huang

Abstract: Foundation models have made incredible strides in achieving zero-shot or few-shot generalization, leveraging prompt engineering to mimic the problem-solving approach of human intelligence. However, when it comes to some foundation models like Segment Anything, there is still a challenge in performing well on out-of-distribution data, including camouflaged and medical images. Inconsistent prompting… ▽ More Foundation models have made incredible strides in achieving zero-shot or few-shot generalization, leveraging prompt engineering to mimic the problem-solving approach of human intelligence. However, when it comes to some foundation models like Segment Anything, there is still a challenge in performing well on out-of-distribution data, including camouflaged and medical images. Inconsistent prompting strategies during fine-tuning and testing further compound the issue, leading to decreased performance. Drawing inspiration from how human cognition processes new environments, we introduce SlotSAM, a method that reconstructs features from the encoder in a self-supervised manner to create object-centric representations. These representations are then integrated into the foundation model, bolstering its object-level perceptual capabilities while reducing the impact of distribution-related variables. The beauty of SlotSAM lies in its simplicity and adaptability to various tasks, making it a versatile solution that significantly enhances the generalization abilities of foundation models. Through limited parameter fine-tuning in a bootstrap manner, our approach paves the way for improved generalization in novel environments. The code is available at github.com/lytang63/SlotSAM. △ Less

Submitted 29 August, 2024; originally announced August 2024.

Comments: This work is accepted by ECCV 2024 EVAL-FoMo Workshop

arXiv:2408.15451 [pdf, other]

Certified Causal Defense with Generalizable Robustness

Authors: Yiran Qiao, Yu Yin, Chen Chen, Jing Ma

Abstract: While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Recently, there have emerged numerous efforts in adversarial defense. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range (e.g., $l_2$ ball).… ▽ More While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Recently, there have emerged numerous efforts in adversarial defense. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range (e.g., $l_2$ ball). However, most existing works in this line struggle to generalize their certified robustness in other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, in this work, we propose a novel certified defense framework GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby exclude the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noises on data in the training distribution but also can generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains. Code is available in the supplementary materials. △ Less

Submitted 27 August, 2024; originally announced August 2024.

Comments: Submitted to AAAI

arXiv:2408.14968 [pdf, other]

MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce

Authors: Hao Jiang, Haoxiang Zhang, Qingshan Hou, Chaofeng Chen, Weisi Lin, Jingchang Zhang, Annan Wang

Abstract: Providing high-quality item recall for text queries is crucial in large-scale e-commerce search systems. Current Embedding-based Retrieval Systems (ERS) embed queries and items into a shared low-dimensional space, but uni-modality ERS rely too heavily on textual features, making them unreliable in complex contexts. While multi-modality ERS incorporate various data sources, they often overlook indi… ▽ More Providing high-quality item recall for text queries is crucial in large-scale e-commerce search systems. Current Embedding-based Retrieval Systems (ERS) embed queries and items into a shared low-dimensional space, but uni-modality ERS rely too heavily on textual features, making them unreliable in complex contexts. While multi-modality ERS incorporate various data sources, they often overlook individual preferences for different modalities, leading to suboptimal results. To address these issues, we propose MRSE, a Multi-modality Retrieval System that integrates text, item images, and user preferences through lightweight mixture-of-expert (LMoE) modules to better align features across and within modalities. MRSE also builds user profiles at a multi-modality level and introduces a novel hybrid loss function that enhances consistency and robustness using hard negative sampling. Experiments on a large-scale dataset from Shopee and online A/B testing show that MRSE achieves an 18.9% improvement in offline relevance and a 3.7% gain in online core metrics compared to Shopee's state-of-the-art uni-modality system. △ Less

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14594 [pdf, other]

MMR: Evaluating Reading Ability of Large Multimodal Models

Authors: Jian Chen, Ruiyi Zhang, Yufan Zhou, Ryan Rossi, Jiuxiang Gu, Changyou Chen

Abstract: Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of image, including text-rich images. Most existing text-rich image benchmarks are simple extraction-based question answering, and many LMMs now easily achieve high scores. This means that current benchmarks fail to accurately reflect performance of different models, and a natural idea is to bui… ▽ More Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of image, including text-rich images. Most existing text-rich image benchmarks are simple extraction-based question answering, and many LMMs now easily achieve high scores. This means that current benchmarks fail to accurately reflect performance of different models, and a natural idea is to build a new benchmark to evaluate their complex reasoning and spatial understanding abilities. In this work, we propose the Multi-Modal Reading (MMR) benchmark in 11 diverse tasks to evaluate LMMs for text-rich image understanding. MMR is the first text-rich image benchmark built on human annotations with the help of language models. By evaluating several state-of-the-art LMMs, including GPT-4o, it reveals the limited capabilities of existing LMMs underscoring the value of our benchmark. △ Less

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.14520 [pdf, other]

Towards Graph Prompt Learning: A Survey and Beyond

Authors: Qingqing Long, Yuchen Yan, Peiyan Zhang, Chen Fang, Wentao Cui, Zhiyuan Ning, Meng Xiao, Ning Cao, Xiao Luo, Lingjun Xu, Shiyue Jiang, Zheng Fang, Chong Chen, Xian-Sheng Hua, Yuanchun Zhou

Abstract: Large-scale "pre-train and prompt learning" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability ac… ▽ More Large-scale "pre-train and prompt learning" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability across various tasks. Graphs, as versatile data structures that capture relationships between entities, play pivotal roles in fields such as social network analysis, recommender systems, and biological graphs. Despite the success of pre-train and prompt learning paradigms in Natural Language Processing (NLP) and Computer Vision (CV), their application in graph domains remains nascent. In graph-structured data, not only do the node and edge features often have disparate distributions, but the topological structures also differ significantly. This diversity in graph data can lead to incompatible patterns or gaps between pre-training and fine-tuning on downstream graphs. We aim to bridge this gap by summarizing methods for alleviating these disparities. This includes exploring prompt design methodologies, comparing related techniques, assessing application scenarios and datasets, and identifying unresolved problems and challenges. This survey categorizes over 100 relevant works in this field, summarizing general design principles and the latest applications, including text-attributed graphs, molecules, proteins, and recommendation systems. Through this extensive review, we provide a foundational understanding of graph prompt learning, aiming to impact not only the graph mining community but also the broader Artificial General Intelligence (AGI) community. △ Less

Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

Comments: 19 pages, 2 figures

arXiv:2408.14393 [pdf, other]

CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence

Authors: Chaochao Chen, Jiaming Zhang, Yizhao Zhang, Li Zhang, Lingjuan Lyu, Yuyuan Li, Biao Gong, Chenggang Yan

Abstract: With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in models, particularly in recommender systems where historical data contains sensitive user information. Despite recent advances in… ▽ More With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in models, particularly in recommender systems where historical data contains sensitive user information. Despite recent advances in recommendation unlearning, evaluating unlearning methods comprehensively remains challenging due to the absence of a unified evaluation framework and overlooked aspects of deeper influence, e.g., fairness. To address these gaps, we propose CURE4Rec, the first comprehensive benchmark for recommendation unlearning evaluation. CURE4Rec covers four aspects, i.e., unlearning Completeness, recommendation Utility, unleaRning efficiency, and recommendation fairnEss, under three data selection strategies, i.e., core data, edge data, and random data. Specifically, we consider the deeper influence of unlearning on recommendation fairness and robustness towards data with varying impact levels. We construct multiple datasets with CURE4Rec evaluation and conduct extensive experiments on existing recommendation unlearning methods. Our code is released at https://github.com/xiye7lai/CURE4Rec. △ Less

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.14279 [pdf, other]

Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes

Authors: Chao Chen, Yu-Shen Liu, Zhizhong Han

Abstract: It is challenging to reconstruct 3D point clouds in unseen classes from single 2D images. Instead of object-centered coordinate system, current methods generalized global priors learned in seen classes to reconstruct 3D shapes from unseen classes in viewer-centered coordinate system. However, the reconstruction accuracy and interpretability are still eager to get improved. To resolve this issue, w… ▽ More It is challenging to reconstruct 3D point clouds in unseen classes from single 2D images. Instead of object-centered coordinate system, current methods generalized global priors learned in seen classes to reconstruct 3D shapes from unseen classes in viewer-centered coordinate system. However, the reconstruction accuracy and interpretability are still eager to get improved. To resolve this issue, we introduce to learn local pattern modularization for reconstructing 3D shapes in unseen classes, which achieves both good generalization ability and high reconstruction accuracy. Our insight is to learn a local prior which is class-agnostic and easy to generalize in object-centered coordinate system. Specifically, the local prior is learned via a process of learning and customizing local pattern modularization in seen classes. During this process, we first learn a set of patterns in local regions, which is the basis in the object-centered coordinate system to represent an arbitrary region on shapes across different classes. Then, we modularize each region on an initially reconstructed shape using the learned local patterns. Based on that, we customize the local pattern modularization using the input image by refining the reconstruction with more details. Our method enables to reconstruct high fidelity point clouds from unseen classes in object-centered coordinate system without requiring a large number of patterns or any additional information, such as segmentation supervision or camera poses. Our experimental results under widely used benchmarks show that our method achieves the state-of-the-art reconstruction accuracy for shapes from unseen classes. The code is available at https://github.com/chenchao15/Unseen. △ Less

Submitted 4 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

Comments: 14pages, 11figures, accepted by ECCV 2024

arXiv:2408.14119 [pdf, other]

Contrastive Learning Subspace for Text Clustering

Authors: Qian Yong, Chen Chen, Xiabing Zhou

Abstract: Contrastive learning has been frequently investigated to learn effective representations for text clustering tasks. While existing contrastive learning-based text clustering methods only focus on modeling instance-wise semantic similarity relationships, they ignore contextual information and underlying relationships among all instances that needs to be clustered. In this paper, we propose a novel… ▽ More Contrastive learning has been frequently investigated to learn effective representations for text clustering tasks. While existing contrastive learning-based text clustering methods only focus on modeling instance-wise semantic similarity relationships, they ignore contextual information and underlying relationships among all instances that needs to be clustered. In this paper, we propose a novel text clustering approach called Subspace Contrastive Learning (SCL) which models cluster-wise relationships among instances. Specifically, the proposed SCL consists of two main modules: (1) a self-expressive module that constructs virtual positive samples and (2) a contrastive learning module that further learns a discriminative subspace to capture task-specific cluster-wise relationships among texts. Experimental results show that the proposed SCL method not only has achieved superior results on multiple task clustering datasets but also has less complexity in positive sample construction. △ Less

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13735 [pdf, other]

MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation

Authors: Chaowei Chen, Li Yu, Shiquan Min, Shunfang Wang

Abstract: State Space Models (SSMs), especially Mamba, have shown great promise in medical image segmentation due to their ability to model long-range dependencies with linear computational complexity. However, accurate medical image segmentation requires the effective learning of both multi-scale detailed feature representations and global contextual dependencies. Although existing works have attempted to… ▽ More State Space Models (SSMs), especially Mamba, have shown great promise in medical image segmentation due to their ability to model long-range dependencies with linear computational complexity. However, accurate medical image segmentation requires the effective learning of both multi-scale detailed feature representations and global contextual dependencies. Although existing works have attempted to address this issue by integrating CNNs and SSMs to leverage their respective strengths, they have not designed specialized modules to effectively capture multi-scale feature representations, nor have they adequately addressed the directional sensitivity problem when applying Mamba to 2D image data. To overcome these limitations, we propose a Multi-Scale Vision Mamba UNet model for medical image segmentation, termed MSVM-UNet. Specifically, by introducing multi-scale convolutions in the VSS blocks, we can more effectively capture and aggregate multi-scale feature representations from the hierarchical features of the VMamba encoder and better handle 2D visual data. Additionally, the large kernel patch expanding (LKPE) layers achieve more efficient upsampling of feature maps by simultaneously integrating spatial and channel information. Extensive experiments on the Synapse and ACDC datasets demonstrate that our approach is more effective than some state-of-the-art methods in capturing and aggregating multi-scale feature representations and modeling long-range dependencies between pixels. △ Less

Submitted 25 August, 2024; originally announced August 2024.

Comments: 8 pages, 5 figures

arXiv:2408.13454 [pdf, other]

AdaOcc: Adaptive-Resolution Occupancy Prediction

Authors: Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren

Abstract: Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computationa… ▽ More Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computational demands, challenging the balance between efficiency and resolution. High-resolution occupancy grids offer accuracy but demand substantial computational resources, while low-resolution grids are efficient but lack detail. To address this dilemma, we introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach. Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework, performing highly detailed and precise 3D reconstruction only in regions of interest (ROIs). These high-detailed 3D surfaces are represented in point clouds, thus their precision is not constrained by the predefined grid resolution of the occupancy map. We conducted comprehensive experiments on the nuScenes dataset, demonstrating significant improvements over existing methods. In close-range scenarios, we surpass previous baselines by over 13% in IOU, and over 40% in Hausdorff distance. In summary, AdaOcc offers a more versatile and effective framework for delivering accurate 3D semantic occupancy prediction across diverse driving scenarios. △ Less

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12935 [pdf, other]

Trustworthy, Responsible, and Safe AI: A Comprehensive Architectural Framework for AI Safety with Challenges and Mitigations

Authors: Chen Chen, Ziyao Liu, Weifeng Jiang, Goh Si Qi, KwoK-Yan Lam

Abstract: AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. With the rapid proliferation of AI and especially with the recent advancement of Generative AI (or GAI), the technology ecosystem behind the design, development, adoption, and deployment of AI systems has drastically changed, broadening the scope of AI Safety to address impacts on public safety… ▽ More AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. With the rapid proliferation of AI and especially with the recent advancement of Generative AI (or GAI), the technology ecosystem behind the design, development, adoption, and deployment of AI systems has drastically changed, broadening the scope of AI Safety to address impacts on public safety and national security. In this paper, we propose a novel architectural framework for understanding and analyzing AI Safety; defining its characteristics from three perspectives: Trustworthy AI, Responsible AI, and Safe AI. We provide an extensive review of current research and advancements in AI safety from these perspectives, highlighting their key challenges and mitigation approaches. Through examples from state-of-the-art technologies, particularly Large Language Models (LLMs), we present innovative mechanism, methodologies, and techniques for designing and testing AI safety. Our goal is to promote advancement in AI safety research, and ultimately enhance people's trust in digital transformation. △ Less

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12085 [pdf, other]

Controllability and Observability of Temporal Hypergraphs

Authors: Anqi Dong, Xin Mao, Can Chen

Abstract: Numerous complex systems, such as those arisen in ecological networks, genomic contact networks, and social networks, exhibit higher-order and time-varying characteristics, which can be effectively modeled using temporal hypergraphs. However, analyzing and controlling temporal hypergraphs poses significant challenges due to their inherent time-varying and nonlinear nature, while most existing meth… ▽ More Numerous complex systems, such as those arisen in ecological networks, genomic contact networks, and social networks, exhibit higher-order and time-varying characteristics, which can be effectively modeled using temporal hypergraphs. However, analyzing and controlling temporal hypergraphs poses significant challenges due to their inherent time-varying and nonlinear nature, while most existing methods predominantly target static hypergraphs. In this article, we generalize the notions of controllability and observability to temporal hypergraphs by leveraging tensor and nonlinear systems theory. Specifically, we establish tensor-based rank conditions to determine the weak controllability and observability of temporal hypergraphs. The proposed framework is further demonstrated with synthetic and real-world examples. △ Less

Submitted 21 August, 2024; originally announced August 2024.

Comments: 6 pages, 3 figures

MSC Class: 93Bxx; 92Dxx; 05C65

arXiv:2408.11950 [pdf]

Evaluation of Hash Algorithm Performance for Cryptocurrency Exchanges Based on Blockchain System

Authors: Abel C. H. Chen

Abstract: The blockchain system has emerged as one of the focal points of research in recent years, particularly in applications and services such as cryptocurrencies and smart contracts. In this context, the hash value serves as a crucial element in linking blocks within the blockchain, ensuring the integrity of block contents. Therefore, hash algorithms represent a vital security technology for ensuring t… ▽ More The blockchain system has emerged as one of the focal points of research in recent years, particularly in applications and services such as cryptocurrencies and smart contracts. In this context, the hash value serves as a crucial element in linking blocks within the blockchain, ensuring the integrity of block contents. Therefore, hash algorithms represent a vital security technology for ensuring the integrity and security of blockchain systems. This study primarily focuses on analyzing the security and execution efficiency of mainstream hash algorithms in the Proof of Work (PoW) calculations within blockchain systems. It proposes an evaluation factor and conducts comparative experiments to evaluate each hash algorithm. The experimental results indicate that there are no significant differences in the security aspects among SHA-2, SHA-3, and BLAKE2. However, SHA-2 and BLAKE2 demonstrate shorter computation times, indicating higher efficiency in execution. △ Less

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.11499 [pdf, other]

doi 10.1145/3689635

Power-Domain Interference Graph Estimation for Multi-hop BLE Networks

Authors: Haifeng Jia, Yichen Wei, Yibo Pi, Cailian Chen

Abstract: Traditional wisdom for network management allocates network resources separately for the measurement and communication tasks. Heavy measurement tasks may compete limited resources with communication tasks and significantly degrade overall network performance. It is therefore challenging for the interference graph, deemed as incurring heavy measurement overhead, to be used in practice in wireless n… ▽ More Traditional wisdom for network management allocates network resources separately for the measurement and communication tasks. Heavy measurement tasks may compete limited resources with communication tasks and significantly degrade overall network performance. It is therefore challenging for the interference graph, deemed as incurring heavy measurement overhead, to be used in practice in wireless networks. To address this challenge in wireless sensor networks, our core insight is to use power as a new dimension for interference graph estimation (IGE) such that IGE can be done simultaneously with the communication tasks using the same frequency-time resources. We propose to marry power-domain IGE with concurrent flooding to achieve simultaneous measurement and communication in BLE networks, where the power linearity prerequisite for power-domain IGE holds naturally true in concurrent flooding. With extensive experiments, we conclude the necessary conditions for the power linearity to hold and analyze several nonlinearity issues of power related to hardware imperfections. We design and implement network protocols and power control algorithms for IGE in multi-hop BLE networks and conduct experiments to show that the marriage is mutually beneficial for both IGE and concurrent flooding. Furthermore, we demonstrate the potential of IGE in improving channel map convergence and convergecast in BLE networks. △ Less

Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: This paper is accepted for publication in the ACM Transactions on Sensor Networks (TOSN), and is an extension of our conference paper accepted at EWSN'23 (arXiv:2312.16807)

arXiv:2408.11173 [pdf, other]

Delegation with Trust<T>: A Scalable, Type- and Memory-Safe Alternative to Locks

Authors: Noaman Ahmad, Ben Baenen, Chen Chen, Jakob Eriksson

Abstract: We present Trust<T>, a general, type- and memory-safe alternative to locking in concurrent programs. Instead of synchronizing multi-threaded access to an object of type T with a lock, the programmer may place the object in a Trust<T>. The object is then no longer directly accessible. Instead a designated thread, the object's trustee, is responsible for applying any requested operations to the obje… ▽ More We present Trust<T>, a general, type- and memory-safe alternative to locking in concurrent programs. Instead of synchronizing multi-threaded access to an object of type T with a lock, the programmer may place the object in a Trust<T>. The object is then no longer directly accessible. Instead a designated thread, the object's trustee, is responsible for applying any requested operations to the object, as requested via the Trust<T> API. Locking is often said to offer a limited throughput per lock. Trust<T> is based on delegation, a message-passing technique which does not suffer this per-lock limitation. Instead, per-object throughput is limited by the capacity of the object's trustee, which is typically considerably higher. Our evaluation shows Trust<T> consistently and considerably outperforming locking where lock contention exists, with up to 22x higher throughput in microbenchmarks, and 5-9x for a home grown key-value store, as well as memcached, in situations with high lock contention. Moreover, Trust<T> is competitive with locks even in the absence of lock contention. △ Less

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.10901 [pdf, other]

A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse

Authors: Zhongliang Guo, Lei Fang, Jingyu Lin, Yifei Qian, Shuai Zhao, Zeyu Wang, Junhao Dong, Cunjian Chen, Ognjen Arandjelović, Chun Pong Lau

Abstract: Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation. However, these generative techniques raises concerns about data misappropriation and intellectual property infringement. Adversarial attacks on machine learning models have been extensively studied, and a well-established body of research has extended these techn… ▽ More Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation. However, these generative techniques raises concerns about data misappropriation and intellectual property infringement. Adversarial attacks on machine learning models have been extensively studied, and a well-established body of research has extended these techniques as a benign metric to prevent the underlying misuse of generative AI. Current approaches to safeguarding images from manipulation by LDMs are limited by their reliance on model-specific knowledge and their inability to significantly degrade semantic quality of generated images. In response to these shortcomings, we propose the Posterior Collapse Attack (PCA) based on the observation that VAEs suffer from posterior collapse during training. Our method minimizes dependence on the white-box information of target models to get rid of the implicit reliance on model-specific knowledge. By accessing merely a small amount of LDM parameters, in specific merely the VAE encoder of LDMs, our method causes a substantial semantic collapse in generation quality, particularly in perceptual consistency, and demonstrates strong transferability across various model architectures. Experimental results show that PCA achieves superior perturbation effects on image generation of LDMs with lower runtime and VRAM. Our method outperforms existing techniques, offering a more robust and generalizable solution that is helpful in alleviating the socio-technical challenges posed by the rapidly evolving landscape of generative AI. △ Less

Submitted 2 September, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: 21 pages, 7 figures, 10 tables

arXiv:2408.10673 [pdf, other]

Iterative Window Mean Filter: Thwarting Diffusion-based Adversarial Purification

Authors: Hanrui Wang, Ruoxi Sun, Cunjian Chen, Minhui Xue, Lay-Ki Soon, Shuo Wang, Zhe Jin

Abstract: Face authentication systems have brought significant convenience and advanced developments, yet they have become unreliable due to their sensitivity to inconspicuous perturbations, such as adversarial attacks. Existing defenses often exhibit weaknesses when facing various attack algorithms and adaptive attacks or compromise accuracy for enhanced security. To address these challenges, we have devel… ▽ More Face authentication systems have brought significant convenience and advanced developments, yet they have become unreliable due to their sensitivity to inconspicuous perturbations, such as adversarial attacks. Existing defenses often exhibit weaknesses when facing various attack algorithms and adaptive attacks or compromise accuracy for enhanced security. To address these challenges, we have developed a novel and highly efficient non-deep-learning-based image filter called the Iterative Window Mean Filter (IWMF) and proposed a new framework for adversarial purification, named IWMF-Diff, which integrates IWMF and denoising diffusion models. These methods can function as pre-processing modules to eliminate adversarial perturbations without necessitating further modifications or retraining of the target system. We demonstrate that our proposed methodologies fulfill four critical requirements: preserved accuracy, improved security, generalizability to various threats in different settings, and better resistance to adaptive attacks. This performance surpasses that of the state-of-the-art adversarial purification method, DiffPure. △ Less

Submitted 20 August, 2024; originally announced August 2024.

Comments: Under review

arXiv:2408.10556 [pdf, other]

Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Authors: Yun Qu, Boyuan Wang, Jianzhun Shao, Yuhang Jiang, Chen Chen, Zhenbin Ye, Lin Liu, Junfeng Yang, Lin Lai, Hongyang Qin, Minwen Deng, Juchao Zhuo, Deheng Ye, Qiang Fu, Wei Yang, Guang Yang, Lanxiao Huang, Xiangyang Ji

Abstract: The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehens… ▽ More The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research. This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game known for its intricate nature, closely resembling real-life situations. Utilizing this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game. We reveal the incompetency of current offline RL approaches in handling task complexity, generalization and multi-task learning. △ Less

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.10349 [pdf, other]

AIR: Analytic Imbalance Rectifier for Continual Learning

Authors: Di Fang, Yinan Zhu, Runze Fang, Cen Chen, Ziqian Zeng, Huiping Zhuang

Abstract: Continual learning enables AI models to learn new data sequentially without retraining in real-world scenarios. Most existing methods assume the training data are balanced, aiming to reduce the catastrophic forgetting problem that models tend to forget previously generated data. However, data imbalance and the mixture of new and old data in real-world scenarios lead the model to ignore categories… ▽ More Continual learning enables AI models to learn new data sequentially without retraining in real-world scenarios. Most existing methods assume the training data are balanced, aiming to reduce the catastrophic forgetting problem that models tend to forget previously generated data. However, data imbalance and the mixture of new and old data in real-world scenarios lead the model to ignore categories with fewer training samples. To solve this problem, we propose an analytic imbalance rectifier algorithm (AIR), a novel online exemplar-free continual learning method with an analytic (i.e., closed-form) solution for data-imbalanced class-incremental learning (CIL) and generalized CIL scenarios in real-world continual learning. AIR introduces an analytic re-weighting module (ARM) that calculates a re-weighting factor for each class for the loss function to balance the contribution of each category to the overall loss and solve the problem of imbalanced training data. AIR uses the least squares technique to give a non-discriminatory optimal classifier and its iterative update method in continual learning. Experimental results on multiple datasets show that AIR significantly outperforms existing methods in long-tailed and generalized CIL scenarios. The source code is available at https://github.com/fang-d/AIR. △ Less

Submitted 19 August, 2024; originally announced August 2024.

ACM Class: I.2.6

arXiv:2408.09865 [pdf, other]

MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation

Authors: Ching-Wen Yang, Che Wei Chen, Kun-da Wu, Hao Xu, Jui-Feng Yao, Hung-Yu Kao

Abstract: Explainable Recommendation task is designed to receive a pair of user and item and output explanations to justify why an item is recommended to a user. Many models treat review-generation as a proxy of explainable recommendation. Although they are able to generate fluent and grammatical sentences, they suffer from generality and hallucination issues. We propose a personalized, aspect-controlled mo… ▽ More Explainable Recommendation task is designed to receive a pair of user and item and output explanations to justify why an item is recommended to a user. Many models treat review-generation as a proxy of explainable recommendation. Although they are able to generate fluent and grammatical sentences, they suffer from generality and hallucination issues. We propose a personalized, aspect-controlled model called Multi-Aspect Prompt LEarner (MAPLE), in which it integrates aspect category as another input dimension to facilitate the memorization of fine-grained aspect terms. Experiments on two real-world review datasets in restaurant domain show that MAPLE outperforms the baseline review-generation models in terms of text and feature diversity while maintaining excellent coherence and factual relevance. We further treat MAPLE as a retriever component in the retriever-reader framework and employ a Large-Language Model (LLM) as the reader, showing that MAPLE's explanation along with the LLM's comprehension ability leads to enriched and personalized explanation as a result. We will release the code and data in this http upon acceptance. △ Less

Submitted 19 August, 2024; originally announced August 2024.

Comments: 8 main pages, 10 pages for appendix. Under review

arXiv:2408.09393 [pdf, other]

Federated Graph Learning with Structure Proxy Alignment

Authors: Xingbo Fu, Zihan Chen, Binchi Zhang, Chen Chen, Jundong Li

Abstract: Federated Graph Learning (FGL) aims to learn graph learning models over graph data distributed in multiple data owners, which has been applied in various applications such as social recommendation and financial fraud detection. Inherited from generic Federated Learning (FL), FGL similarly has the data heterogeneity issue where the label distribution may vary significantly for distributed graph dat… ▽ More Federated Graph Learning (FGL) aims to learn graph learning models over graph data distributed in multiple data owners, which has been applied in various applications such as social recommendation and financial fraud detection. Inherited from generic Federated Learning (FL), FGL similarly has the data heterogeneity issue where the label distribution may vary significantly for distributed graph data across clients. For instance, a client can have the majority of nodes from a class, while another client may have only a few nodes from the same class. This issue results in divergent local objectives and impairs FGL convergence for node-level tasks, especially for node classification. Moreover, FGL also encounters a unique challenge for the node classification task: the nodes from a minority class in a client are more likely to have biased neighboring information, which prevents FGL from learning expressive node embeddings with Graph Neural Networks (GNNs). To grapple with the challenge, we propose FedSpray, a novel FGL framework that learns local class-wise structure proxies in the latent space and aligns them to obtain global structure proxies in the server. Our goal is to obtain the aligned structure proxies that can serve as reliable, unbiased neighboring information for node classification. To achieve this, FedSpray trains a global feature-structure encoder and generates unbiased soft targets with structure proxies to regularize local training of GNN models in a personalized way. We conduct extensive experiments over four datasets, and experiment results validate the superiority of FedSpray compared with other baselines. Our code is available at https://github.com/xbfu/FedSpray. △ Less

Submitted 18 August, 2024; originally announced August 2024.

Comments: Accepted by KDD 2024

arXiv:2408.09362 [pdf, other]

Angle of Arrival Estimation with Transformer: A Sparse and Gridless Method with Zero-Shot Capability

Authors: Zhaoxuan Zhu, Chulong Chen, Bo Yang

Abstract: Automotive Multiple-Input Multiple-Output (MIMO) radars have gained significant traction in Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles (AV) due to their cost-effectiveness, resilience to challenging operating conditions, and extended detection range. To fully leverage the advantages of MIMO radars, it is crucial to develop an Angle of Arrival (AOA) algorithm that delivers hi… ▽ More Automotive Multiple-Input Multiple-Output (MIMO) radars have gained significant traction in Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles (AV) due to their cost-effectiveness, resilience to challenging operating conditions, and extended detection range. To fully leverage the advantages of MIMO radars, it is crucial to develop an Angle of Arrival (AOA) algorithm that delivers high performance with reasonable computational workload. This work introduces AAETR (Angle of Arrival Estimation with TRansformer) for high performance gridless AOA estimation. Comprehensive evaluations across various signal-to-noise ratios (SNRs) and multi-target scenarios demonstrate AAETR's superior performance compared to super resolution AOA algorithms such as Iterative Adaptive Approach (IAA). The proposed architecture features efficient, scalable, sparse and gridless angle-finding capability, overcoming the issues of high computational cost and straddling loss in SNR associated with grid-based IAA. AAETR requires fewer tunable hyper-parameters and is end-to-end trainable in a deep learning radar perception pipeline. When trained on large-scale simulated datasets then evaluated on real dataset, AAETR exhibits remarkable zero-shot sim-to-real transferability and emergent sidelobe suppression capability. This highlights the effectiveness of the proposed approach and its potential as a drop-in module in practical systems. △ Less

Submitted 18 August, 2024; originally announced August 2024.

Comments: 8 pages, 8 figures

arXiv:2408.08946 [pdf, other]

Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges

Authors: Baixiang Huang, Canyu Chen, Kai Shu

Abstract: Accurate attribution of authorship is crucial for maintaining the integrity of digital content, improving forensic investigations, and mitigating the risks of misinformation and plagiarism. Addressing the imperative need for proper authorship attribution is essential to uphold the credibility and accountability of authentic authorship. The rapid advancements of Large Language Models (LLMs) have bl… ▽ More Accurate attribution of authorship is crucial for maintaining the integrity of digital content, improving forensic investigations, and mitigating the risks of misinformation and plagiarism. Addressing the imperative need for proper authorship attribution is essential to uphold the credibility and accountability of authentic authorship. The rapid advancements of Large Language Models (LLMs) have blurred the lines between human and machine authorship, posing significant challenges for traditional methods. We presents a comprehensive literature review that examines the latest research on authorship attribution in the era of LLMs. This survey systematically explores the landscape of this field by categorizing four representative problems: (1) Human-written Text Attribution; (2) LLM-generated Text Detection; (3) LLM-generated Text Attribution; and (4) Human-LLM Co-authored Text Attribution. We also discuss the challenges related to ensuring the generalization and explainability of authorship attribution methods. Generalization requires the ability to generalize across various domains, while explainability emphasizes providing transparent and understandable insights into the decisions made by these models. By evaluating the strengths and limitations of existing methods and benchmarks, we identify key open problems and future research directions in this field. This literature review serves a roadmap for researchers and practitioners interested in understanding the state of the art in this rapidly evolving field. Additional resources and a curated list of papers are available and regularly updated at https://llm-authorship.github.io △ Less

Submitted 16 August, 2024; originally announced August 2024.

Comments: 12 pages for the main paper. More resources and a curated list of papers are available and regularly updated at https://llm-authorship.github.io

arXiv:2408.08872 [pdf, other]

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Authors: Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles , et al. (2 additional authors not shown)

Abstract: This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tas… ▽ More This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tasks, including both single and multi-image benchmarks. Our pre-trained base model exhibits strong in-context learning capabilities and the instruction-tuned model demonstrates competitive performance among open-source LMMs with similar model sizes. In addition, we introduce a safety-tuned model with DPO, aiming to mitigate harmful behaviors such as hallucinations and improve safety. We open-source our models, curated large-scale datasets, and our fine-tuning codebase to facilitate further advancements in LMM research. Associated resources will be available on our project page above. △ Less

Submitted 28 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.08342 [pdf, other]

CT4D: Consistent Text-to-4D Generation with Animatable Meshes

Authors: Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong

Abstract: Text-to-4D generation has recently been demonstrated viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user… ▽ More Text-to-4D generation has recently been demonstrated viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts. The primary challenges of our mesh-based framework involve stably generating a mesh with details that align with the text prompt while directly driving it and maintaining surface continuity. Our CT4D framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes. To improve surface continuity, we divide a mesh into several smaller regions and implement a uniform driving function within each area. Additionally, we constrain the animating stage with a rigidity regulation to ensure cross-region continuity. Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry. Furthermore, we showcase that this enhanced representation inherently possesses the capability for combinational 4D generation and texture editing. △ Less

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.08208 [pdf, other]

LLM4DSR: Leveraing Large Language Model for Denoising Sequential Recommendation

Authors: Bohao Wang, Feng Liu, Jiawei Chen, Yudi Wu, Xingyu Lou, Jun Wang, Yan Feng, Chun Chen, Can Wang

Abstract: Sequential recommendation systems fundamentally rely on users' historical interaction sequences, which are often contaminated by noisy interactions. Identifying these noisy interactions accurately without additional information is particularly difficult due to the lack of explicit supervisory signals to denote noise. Large Language Models (LLMs), equipped with extensive open knowledge and semantic… ▽ More Sequential recommendation systems fundamentally rely on users' historical interaction sequences, which are often contaminated by noisy interactions. Identifying these noisy interactions accurately without additional information is particularly difficult due to the lack of explicit supervisory signals to denote noise. Large Language Models (LLMs), equipped with extensive open knowledge and semantic reasoning abilities, present a promising avenue to bridge this information gap. However, employing LLMs for denoising in sequential recommendation introduces notable challenges: 1) Direct application of pretrained LLMs may not be competent for the denoising task, frequently generating nonsensical responses; 2) Even after fine-tuning, the reliability of LLM outputs remains questionable, especially given the complexity of the task and th inherent hallucinatory issue of LLMs. To tackle these challenges, we propose LLM4DSR, a tailored approach for denoising sequential recommendation using LLMs. We constructed a self-supervised fine-tuning task to activate LLMs' capabilities to identify noisy items and suggest replacements. Furthermore, we developed an uncertainty estimation module that ensures only high-confidence responses are utilized for sequence corrections. Remarkably, LLM4DSR is model-agnostic, allowing the corrected sequences to be flexibly applied across various recommendation models. Extensive experiments validate the superiority of LLM4DSR over existing methods across three datasets and three recommendation backbones. △ Less

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.08205 [pdf, other]

doi 10.1145/3665496

A Multi-task Adversarial Attack Against Face Authentication

Authors: Hanrui Wang, Shuo Wang, Cunjian Chen, Massimo Tistarelli, Zhe Jin

Abstract: Deep-learning-based identity management systems, such as face authentication systems, are vulnerable to adversarial attacks. However, existing attacks are typically designed for single-task purposes, which means they are tailored to exploit vulnerabilities unique to the individual target rather than being adaptable for multiple users or systems. This limitation makes them unsuitable for certain at… ▽ More Deep-learning-based identity management systems, such as face authentication systems, are vulnerable to adversarial attacks. However, existing attacks are typically designed for single-task purposes, which means they are tailored to exploit vulnerabilities unique to the individual target rather than being adaptable for multiple users or systems. This limitation makes them unsuitable for certain attack scenarios, such as morphing, universal, transferable, and counter attacks. In this paper, we propose a multi-task adversarial attack algorithm called MTADV that are adaptable for multiple users or systems. By interpreting these scenarios as multi-task attacks, MTADV is applicable to both single- and multi-task attacks, and feasible in the white- and gray-box settings. Furthermore, MTADV is effective against various face datasets, including LFW, CelebA, and CelebA-HQ, and can work with different deep learning models, such as FaceNet, InsightFace, and CurricularFace. Importantly, MTADV retains its feasibility as a single-task attack targeting a single user/system. To the best of our knowledge, MTADV is the first adversarial attack method that can target all of the aforementioned scenarios in one algorithm. △ Less

Submitted 15 August, 2024; originally announced August 2024.

Comments: Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

arXiv:2408.08066 [pdf, other]

Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval

Authors: Hanqi Zhang, Chong Chen, Lang Mei, Qi Liu, Jiaxin Mao

Abstract: In the information retrieval (IR) area, dense retrieval (DR) models use deep learning techniques to encode queries and passages into embedding space to compute their semantic relations. It is important for DR models to balance both efficiency and effectiveness. Pre-trained language models (PLMs), especially Transformer-based PLMs, have been proven to be effective encoders of DR models. However, th… ▽ More In the information retrieval (IR) area, dense retrieval (DR) models use deep learning techniques to encode queries and passages into embedding space to compute their semantic relations. It is important for DR models to balance both efficiency and effectiveness. Pre-trained language models (PLMs), especially Transformer-based PLMs, have been proven to be effective encoders of DR models. However, the self-attention component in Transformer-based PLM results in a computational complexity that grows quadratically with sequence length, and thus exhibits a slow inference speed for long-text retrieval. Some recently proposed non-Transformer PLMs, especially the Mamba architecture PLMs, have demonstrated not only comparable effectiveness to Transformer-based PLMs on generative language tasks but also better efficiency due to linear time scaling in sequence length. This paper implements the Mamba Retriever to explore whether Mamba can serve as an effective and efficient encoder of DR model for IR tasks. We fine-tune the Mamba Retriever on the classic short-text MS MARCO passage ranking dataset and the long-text LoCoV0 dataset. Experimental results show that (1) on the MS MARCO passage ranking dataset and BEIR, the Mamba Retriever achieves comparable or better effectiveness compared to Transformer-based retrieval models, and the effectiveness grows with the size of the Mamba model; (2) on the long-text LoCoV0 dataset, the Mamba Retriever can extend to longer text length than its pre-trained length after fine-tuning on retrieval task, and it has comparable or better effectiveness compared to other long-text retrieval models; (3) the Mamba Retriever has superior inference speed for long-text retrieval. In conclusion, Mamba Retriever is both effective and efficient, making it a practical model, especially for long-text retrieval. △ Less

Submitted 22 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.07999 [pdf, other]

Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement

Authors: Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen

Abstract: In the realm of autonomous driving,accurately detecting occluded or distant objects,referred to as weak positive sample ,presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate… ▽ More In the realm of autonomous driving,accurately detecting occluded or distant objects,referred to as weak positive sample ,presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate this issue, we propose a novel approach, Co-Fix3D, which employs a collaborative hybrid multi-stage parallel query generation mechanism for BEV representations. Our method incorporates the Local-Global Feature Enhancement (LGE) module, which refines BEV features to more effectively highlight weak positive samples. It uniquely leverages the Discrete Wavelet Transform (DWT) for accurate noise reduction and features refinement in localized areas, and incorporates an attention mechanism to more comprehensively optimize global BEV features. Moreover, our method increases the volume of BEV queries through a multi-stage parallel processing of the LGE, significantly enhancing the probability of selecting weak positive samples. This enhancement not only improves training efficiency within the decoder framework but also boosts overall system performance. Notably, Co-Fix3D achieves superior results on the stringent nuScenes benchmark, outperforming all previous models with a 69.1% mAP and 72.9% NDS on the LiDAR-based benchmark, and 72.3% mAP and 74.1% NDS on the multi-modality benchmark, without relying on test-time augmentation or additional datasets. The source code will be made publicly available upon acceptance. △ Less

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.07703 [pdf, other]

Knowledge Distillation with Refined Logits

Authors: Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang

Abstract: Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods. Our approach is motivated by the observation that even high-performing teacher models can make incorrect… ▽ More Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods. Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions, creating a conflict between the standard distillation loss and the cross-entropy loss. This conflict can undermine the consistency of the student model's learning objectives. Previous attempts to use labels to empirically correct teacher predictions may undermine the class correlation. In contrast, our RLD employs labeling information to dynamically refine teacher logits. In this way, our method can effectively eliminate misleading information from the teacher while preserving crucial class correlations, thus enhancing the value and efficiency of distilled knowledge. Experimental results on CIFAR-100 and ImageNet demonstrate its superiority over existing methods. The code is provided at \text{https://github.com/zju-SWJ/RLD}. △ Less

Submitted 19 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

Comments: 11 pages, 7 figures

arXiv:2408.07536 [pdf, other]

Context-aware Container Orchestration in Serverless Edge Computing

Authors: Peiyuan Guan, Chen Chen, Ziru Chen, Lin X. Cai, Xing Hao, Amir Taherkordi

Abstract: Adopting serverless computing to edge networks benefits end-users from the pay-as-you-use billing model and flexible scaling of applications. This paradigm extends the boundaries of edge computing and remarkably improves the quality of services. However, due to the heterogeneous nature of computing and bandwidth resources in edge networks, it is challenging to dynamically allocate different resour… ▽ More Adopting serverless computing to edge networks benefits end-users from the pay-as-you-use billing model and flexible scaling of applications. This paradigm extends the boundaries of edge computing and remarkably improves the quality of services. However, due to the heterogeneous nature of computing and bandwidth resources in edge networks, it is challenging to dynamically allocate different resources while adapting to the burstiness and high concurrency in serverless workloads. This article focuses on serverless function provisioning in edge networks to optimize end-to-end latency, where the challenge lies in jointly allocating wireless bandwidth and computing resources among heterogeneous computing nodes. To address this challenge, We devised a context-aware learning framework that adaptively orchestrates a wide spectrum of resources and jointly considers them to avoid resource fragmentation. Extensive simulation results justified that the proposed algorithm reduces over 95% of converge time while the end-to-end delay is comparable to the state of the art. △ Less

Submitted 14 August, 2024; originally announced August 2024.

Comments: This paper has been accepted by the IEEE GLOBECOM 2024 Conference

arXiv:2408.06995 [pdf, other]

Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models

Authors: Cheng Chen, Christina Giannoula, Andreas Moshovos

Abstract: Diffusion models are emerging models that generate images by iteratively denoising random Gaussian noise using deep neural networks. These models typically exhibit high computational and memory demands, necessitating effective post-training quantization for high-performance inference. Recent works propose low-bitwidth (e.g., 8-bit or 4-bit) quantization for diffusion models, however 4-bit integer… ▽ More Diffusion models are emerging models that generate images by iteratively denoising random Gaussian noise using deep neural networks. These models typically exhibit high computational and memory demands, necessitating effective post-training quantization for high-performance inference. Recent works propose low-bitwidth (e.g., 8-bit or 4-bit) quantization for diffusion models, however 4-bit integer quantization typically results in low-quality images. We observe that on several widely used hardware platforms, there is little or no difference in compute capability between floating-point and integer arithmetic operations of the same bitwidth (e.g., 8-bit or 4-bit). Therefore, we propose an effective floating-point quantization method for diffusion models that provides better image quality compared to integer quantization methods. We employ a floating-point quantization method that was effective for other processing tasks, specifically computer vision and natural language tasks, and tailor it for diffusion models by integrating weight rounding learning during the mapping of the full-precision values to the quantized values in the quantization process. We comprehensively study integer and floating-point quantization methods in state-of-the-art diffusion models. Our floating-point quantization method not only generates higher-quality images than that of integer quantization methods, but also shows no noticeable degradation compared to full-precision models (32-bit floating-point), when both weights and activations are quantized to 8-bit floating-point values, while has minimal degradation with 4-bit weights and 8-bit activations. △ Less

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.06303 [pdf, other]

Long-Form Answers to Visual Questions from Blind and Low Vision People

Authors: Mina Huh, Fangyuan Xu, Yi-Hao Peng, Chongyan Chen, Hansika Murugu, Danna Gurari, Eunsol Choi, Amy Pavel

Abstract: Vision language models can now generate long-form answers to questions about images - long-form visual question answers (LFVQA). We contribute VizWiz-LF, a dataset of long-form answers to visual questions posed by blind and low vision (BLV) users. VizWiz-LF contains 4.2k long-form answers to 600 visual questions, collected from human expert describers and six VQA models. We develop and annotate fu… ▽ More Vision language models can now generate long-form answers to questions about images - long-form visual question answers (LFVQA). We contribute VizWiz-LF, a dataset of long-form answers to visual questions posed by blind and low vision (BLV) users. VizWiz-LF contains 4.2k long-form answers to 600 visual questions, collected from human expert describers and six VQA models. We develop and annotate functional roles of sentences of LFVQA and demonstrate that long-form answers contain information beyond the question answer such as explanations and suggestions. We further conduct automatic and human evaluations with BLV and sighted people to evaluate long-form answers. BLV people perceive both human-written and generated long-form answers to be plausible, but generated answers often hallucinate incorrect visual details, especially for unanswerable visual questions (e.g., blurry or irrelevant images). To reduce hallucinations, we evaluate the ability of VQA models to abstain from answering unanswerable questions across multiple prompting strategies. △ Less

Submitted 12 August, 2024; originally announced August 2024.

Comments: COLM 2024

arXiv:2408.05002 [pdf, other]

An Empirical Study on Challenges for LLM Developers

Authors: Xiang Chen, Chaoyang Gao, Chunyang Chen, Guangbei Zhang, Yong Liu

Abstract: In recent years, large language models (LLMs) have seen rapid advancements, significantly impacting various fields such as natural language processing, and software engineering. These LLMs, exemplified by OpenAI's ChatGPT, have revolutionized the way we approach language understanding and generation tasks. However, in contrast to traditional software development practices, LLM development introduc… ▽ More In recent years, large language models (LLMs) have seen rapid advancements, significantly impacting various fields such as natural language processing, and software engineering. These LLMs, exemplified by OpenAI's ChatGPT, have revolutionized the way we approach language understanding and generation tasks. However, in contrast to traditional software development practices, LLM development introduces new challenges for AI developers in design, implementation, and deployment. These challenges span different areas (such as prompts, APIs, and plugins), requiring developers to navigate unique methodologies and considerations specific to LLM development. Despite the profound influence of LLMs, to the best of our knowledge, these challenges have not been thoroughly investigated in previous empirical studies. To fill this gap, we present the first comprehensive study on understanding the challenges faced by LLM developers. Specifically, we crawl and analyze 29,057 relevant questions from a popular OpenAI developer forum. We first examine their popularity and difficulty. After manually analyzing 2,364 sampled questions, we construct a taxonomy of challenges faced by LLM developers. Based on this taxonomy, we summarize a set of findings and actionable implications for LLM-related stakeholders, including developers and providers (especially the OpenAI organization). △ Less

Submitted 11 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

Comments: 29 pages, 15 figures

arXiv:2408.04883 [pdf, other]

ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation

Authors: Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

Abstract: Open-vocabulary semantic segmentation requires models to effectively integrate visual representations with open-vocabulary semantic labels. While Contrastive Language-Image Pre-training (CLIP) models shine in recognizing visual concepts from text, they often struggle with segment coherence due to their limited localization ability. In contrast, Vision Foundation Models (VFMs) excel at acquiring sp… ▽ More Open-vocabulary semantic segmentation requires models to effectively integrate visual representations with open-vocabulary semantic labels. While Contrastive Language-Image Pre-training (CLIP) models shine in recognizing visual concepts from text, they often struggle with segment coherence due to their limited localization ability. In contrast, Vision Foundation Models (VFMs) excel at acquiring spatially consistent local visual representations, yet they fall short in semantic understanding. This paper introduces ProxyCLIP, an innovative framework designed to harmonize the strengths of both CLIP and VFMs, facilitating enhanced open-vocabulary semantic segmentation. ProxyCLIP leverages the spatial feature correspondence from VFMs as a form of proxy attention to augment CLIP, thereby inheriting the VFMs' robust local consistency and maintaining CLIP's exceptional zero-shot transfer capacity. We propose an adaptive normalization and masking strategy to get the proxy attention from VFMs, allowing for adaptation across different VFMs. Remarkably, as a training-free approach, ProxyCLIP significantly improves the average mean Intersection over Union (mIoU) across eight benchmarks from 40.3 to 44.4, showcasing its exceptional efficacy in bridging the gap between spatial precision and semantic richness for the open-vocabulary segmentation task. △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: Accepted to ECCV 2024. Code available at https://github.com/mc-lan/ProxyCLIP

arXiv:2408.04798 [pdf, other]

Manipulable Semantic Components: a Computational Representation of Data Visualization Scenes

Authors: Zhicheng Liu, Chen Chen, John Hooker

Abstract: Various data visualization applications such as reverse engineering and interactive authoring require a vocabulary that describes the structure of visualization scenes and the procedure to manipulate them. A few scene abstractions have been proposed, but they are restricted to specific applications for a limited set of visualization types. A unified and expressive model of data visualization scene… ▽ More Various data visualization applications such as reverse engineering and interactive authoring require a vocabulary that describes the structure of visualization scenes and the procedure to manipulate them. A few scene abstractions have been proposed, but they are restricted to specific applications for a limited set of visualization types. A unified and expressive model of data visualization scenes for different applications has been missing. To fill this gap, we present Manipulable Semantic Components (MSC), a computational representation of data visualization scenes, to support applications in scene understanding and augmentation. MSC consists of two parts: a unified object model describing the structure of a visualization scene in terms of semantic components, and a set of operations to generate and modify the scene components. We demonstrate the benefits of MSC in three applications: visualization authoring, visualization deconstruction and reuse, and animation specification. △ Less

Submitted 8 August, 2024; originally announced August 2024.

Comments: To appear at IEEE Transactions on Visualization & Computer Graphics (Proceedings IEEE VIS'24), 2025. The paper has been selected for the VIS 2024 Honorable Mention award

arXiv:2408.04224 [pdf, other]

Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance

Authors: Ahmad Arrabi, Xiaohan Zhang, Waqas Sultani, Chen Chen, Safwan Wshah

Abstract: Aerial imagery analysis is critical for many research fields. However, obtaining frequent high-quality aerial images is not always accessible due to its high effort and cost requirements. One solution is to use the Ground-to-Aerial (G2A) technique to synthesize aerial images from easily collectible ground images. However, G2A is rarely studied, because of its challenges, including but not limited… ▽ More Aerial imagery analysis is critical for many research fields. However, obtaining frequent high-quality aerial images is not always accessible due to its high effort and cost requirements. One solution is to use the Ground-to-Aerial (G2A) technique to synthesize aerial images from easily collectible ground images. However, G2A is rarely studied, because of its challenges, including but not limited to, the drastic view changes, occlusion, and range of visibility. In this paper, we present a novel Geometric Preserving Ground-to-Aerial (G2A) image synthesis (GPG2A) model that can generate realistic aerial images from ground images. GPG2A consists of two stages. The first stage predicts the Bird's Eye View (BEV) segmentation (referred to as the BEV layout map) from the ground image. The second stage synthesizes the aerial image from the predicted BEV layout map and text descriptions of the ground image. To train our model, we present a new multi-modal cross-view dataset, namely VIGORv2 which is built upon VIGOR with newly collected aerial images, maps, and text descriptions. Our extensive experiments illustrate that GPG2A synthesizes better geometry-preserved aerial images than existing models. We also present two applications, data augmentation for cross-view geo-localization and sketch-based region search, to further verify the effectiveness of our GPG2A. The code and data will be publicly available. △ Less

Submitted 20 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.03608 [pdf, other]

Mixstyle-Entropy: Domain Generalization with Causal Intervention and Perturbation

Authors: Luyao Tang, Yuxuan Yuan, Chaoqi Chen, Xinghao Ding, Yue Huang

Abstract: Despite the considerable advancements achieved by deep neural networks, their performance tends to degenerate when the test environment diverges from the training ones. Domain generalization (DG) solves this issue by learning representations independent of domain-related information, thus facilitating extrapolation to unseen environments. Existing approaches typically focus on formulating tailored… ▽ More Despite the considerable advancements achieved by deep neural networks, their performance tends to degenerate when the test environment diverges from the training ones. Domain generalization (DG) solves this issue by learning representations independent of domain-related information, thus facilitating extrapolation to unseen environments. Existing approaches typically focus on formulating tailored training objectives to extract shared features from the source data. However, the disjointed training and testing procedures may compromise robustness, particularly in the face of unforeseen variations during deployment. In this paper, we propose a novel and holistic framework based on causality, named InPer, designed to enhance model generalization by incorporating causal intervention during training and causal perturbation during testing. Specifically, during the training phase, we employ entropy-based causal intervention (EnIn) to refine the selection of causal variables. To identify samples with anti-interference causal variables from the target domain, we propose a novel metric, homeostatic score, through causal perturbation (HoPer) to construct a prototype classifier in test time. Experimental results across multiple cross-domain tasks confirm the efficacy of InPer. △ Less

Submitted 22 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: Accepted by BMVC2024

arXiv:2408.02965 [pdf, other]

Data-Driven Stochastic Closure Modeling via Conditional Diffusion Model and Neural Operator

Authors: Xinghao Dong, Chuanqi Chen, Jin-Long Wu

Abstract: Closure models are widely used in simulating complex multiscale dynamical systems such as turbulence and the earth system, for which direct numerical simulation that resolves all scales is often too expensive. For those systems without a clear scale separation, deterministic and local closure models often lack enough generalization capability, which limits their performance in many real-world appl… ▽ More Closure models are widely used in simulating complex multiscale dynamical systems such as turbulence and the earth system, for which direct numerical simulation that resolves all scales is often too expensive. For those systems without a clear scale separation, deterministic and local closure models often lack enough generalization capability, which limits their performance in many real-world applications. In this work, we propose a data-driven modeling framework for constructing stochastic and non-local closure models via conditional diffusion model and neural operator. Specifically, the Fourier neural operator is incorporated into a score-based diffusion model, which serves as a data-driven stochastic closure model for complex dynamical systems governed by partial differential equations (PDEs). We also demonstrate how accelerated sampling methods can improve the efficiency of the data-driven stochastic closure model. The results show that the proposed methodology provides a systematic approach via generative machine learning techniques to construct data-driven stochastic closure models for multiscale dynamical systems with continuous spatiotemporal fields. △ Less

Submitted 6 August, 2024; originally announced August 2024.

MSC Class: 68T01

arXiv:2408.02615 [pdf, other]

LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba

Authors: Yunxiang Fu, Chaoqi Chen, Yizhou Yu

Abstract: Recent Transformer-based diffusion models have shown remarkable performance, largely attributed to the ability of the self-attention mechanism to accurately capture both global and local contexts by computing all-pair interactions among input tokens. However, their quadratic complexity poses significant computational challenges for long-sequence inputs. Conversely, a recent state space model calle… ▽ More Recent Transformer-based diffusion models have shown remarkable performance, largely attributed to the ability of the self-attention mechanism to accurately capture both global and local contexts by computing all-pair interactions among input tokens. However, their quadratic complexity poses significant computational challenges for long-sequence inputs. Conversely, a recent state space model called Mamba offers linear complexity by compressing a filtered global context into a hidden state. Despite its efficiency, compression inevitably leads to information loss of fine-grained local dependencies among tokens, which are crucial for effective visual generative modeling. Motivated by these observations, we introduce Local Attentional Mamba (LaMamba) blocks that combine the strengths of self-attention and Mamba, capturing both global contexts and local details with linear complexity. Leveraging the efficient U-Net architecture, our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution, all while utilizing substantially fewer GFLOPs and a comparable number of parameters. Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62\% GFLOPs compared to DiT-XL/2, while achieving superior performance with comparable or fewer parameters. △ Less

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2408.02179 [pdf]

doi 10.1109/ICSSES62373.2024.10561274

X.509 Information Security Certification Based on Post-Quantum Cryptography

Authors: Abel C. H. Chen

Abstract: In recent years, with the advancement of quantum computing, mainstream asymmetric cryptographic methods in the current Public Key Infrastructure (PKI) systems are gradually being threatened. Therefore, this study explores X.509 security certificates based on Post-Quantum Cryptography (PQC) and discusses implemented solutions. This study compares mainstream asymmetric cryptographic methods (includi… ▽ More In recent years, with the advancement of quantum computing, mainstream asymmetric cryptographic methods in the current Public Key Infrastructure (PKI) systems are gradually being threatened. Therefore, this study explores X.509 security certificates based on Post-Quantum Cryptography (PQC) and discusses implemented solutions. This study compares mainstream asymmetric cryptographic methods (including RSA and Elliptic Curve Digital Signature Algorithm (ECDSA)) with standard PQC methods (including Falcon, Dilithium, SPHINCS+), comparing the efficiency of certificate generation, signature generation, and signature verification. Finally, recommendations for a solution based on PQC for X.509 security certificates are proposed. △ Less

Submitted 4 August, 2024; originally announced August 2024.

Comments: The manuscript was submitted to arXiv on 6 May 2024, but it was rejected on 11 July 2024. The appeal was submitted on 11 July 2024, and it was accepted on 2 August 2024. The manuscript is written in Chinese language

Showing 1–50 of 2,443 results for author: Chen, C