-
Existence of 5 minimal tori in 3-spheres of positive Ricci curvature
Authors:
Adrian Chun-Pong Chu,
Yangyang Li
Abstract:
In 1989, B. White conjectured that every Riemannian 3-sphere has at least 5 embedded minimal tori. We confirm this conjecture for 3-spheres of positive Ricci curvature. While our proof uses min-max theory, the underlying heuristics are largely inspired by mean curvature flow.
Submitted 14 September, 2024;
originally announced September 2024.
-
Holography for Boundary Lifshitz Field Theory
Authors:
Chong-Sun Chu,
Ignacio Garrido Gonzalez,
Himanshu Parihar
Abstract:
We propose a holographic duality for the boundary Lifshitz field theory (BLFT). Similar to holographic BCFT, holographic BLFT can be consistently defined by imposing either a Neumann boundary condition (NBC) or a conformal boundary condition (CBC) on the end of the world (EOW) brane. We propose $g$-functions and derive $g$-theorem for these two types of holographic BLFT. On the field theory side, we consider BLFT whose path integral is prescribed to include also paths bouncing off the boundary. The entanglement entropy for an interval for the Lifshitz invariant ground state is computed in the saddle point approximation, and is found to agree precisely with the holographic result in both limits when the interval is very close or very far away from the boundary.
Submitted 10 September, 2024;
originally announced September 2024.
-
LSTM-QGAN: Scalable NISQ Generative Adversarial Network
Authors:
Cheng Chu,
Aishwarya Hastak,
Fan Chen
Abstract:
Current quantum generative adversarial networks (QGANs) still struggle with practical-sized data. First, many QGANs use principal component analysis (PCA) for dimension reduction, which, as our studies reveal, can diminish the QGAN's effectiveness. Second, methods that segment inputs into smaller patches processed by multiple generators face scalability issues. In this work, we propose LSTM-QGAN, a QGAN architecture that eliminates PCA preprocessing and integrates quantum long short-term memory (QLSTM) to ensure scalable performance. Our experiments show that LSTM-QGAN significantly enhances both performance and scalability over state-of-the-art QGAN models, with visual data improvements, reduced Frechet Inception Distance scores, and reductions of 5x in qubit counts, 5x in single-qubit gates, and 12x in two-qubit gates.
Submitted 3 September, 2024;
originally announced September 2024.
-
Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model
Authors:
Luyang Luo,
Mingxiang Wu,
Mei Li,
Yi Xin,
Qiong Wang,
Varut Vardhanabhuti,
Winnie CW Chu,
Zhenhui Li,
Juan Zhou,
Pranav Rajpurkar,
Hao Chen
Abstract:
Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure, offering a noninvasive method for personalized breast cancer management. We have curated the largest multiparametric breast MRI dataset, involving 5,205 patients from three hospitals in the north, southeast, and southwest of China, for the development and extensive evaluation of our model. MOME demonstrated accurate and robust identification of breast cancer. It achieved comparable performance for malignancy recognition to that of four senior radiologists and significantly outperformed a junior radiologist, with 0.913 AUROC, 0.948 AUPRC, 0.905 F1 score, and 0.723 MCC. Our findings suggest that MOME could reduce the need for biopsies in BI-RADS 4 patients with a ratio of 7.3%, classify triple-negative breast cancer with an AUROC of 0.709, and predict pathological complete response to neoadjuvant chemotherapy with an AUROC of 0.694. The model further supports scalable and interpretable inference, adapting to missing modalities and providing decision explanations by highlighting lesions and measuring modality contributions. MOME exemplifies a discriminative, robust, scalable, and interpretable multimodal model, paving the way for noninvasive, personalized management of breast cancer patients based on multiparametric breast imaging data.
Submitted 1 September, 2024; v1 submitted 8 August, 2024;
originally announced August 2024.
-
Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?
Authors:
Chengzhi Zhong,
Fei Cheng,
Qianying Liu,
Junfeng Jiang,
Zhen Wan,
Chenhui Chu,
Yugo Murawaki,
Sadao Kurohashi
Abstract:
In this study, we investigate whether non-English-centric LLMs, despite their strong performance, `think' in their respective dominant language: more precisely, `think' refers to how the representations of intermediate layers, when un-embedded into the vocabulary space, exhibit higher probabilities for certain dominant languages during generation. We term such languages as internal $\textbf{latent languages}$.
We examine the latent language of three typical categories of models for Japanese processing: Llama2, an English-centric model; Swallow, an English-centric model with continued pre-training in Japanese; and LLM-jp, a model pre-trained on balanced English and Japanese corpora. Our empirical findings reveal that, unlike Llama2 which relies exclusively on English as the internal latent language, Japanese-specific Swallow and LLM-jp employ both Japanese and English, exhibiting dual internal latent languages. For any given target language, the model preferentially activates the latent language most closely related to it. In addition, we explore how intermediate layers respond to questions involving cultural conflicts between latent internal and target output languages. We further explore how the language identity shifts across layers while keeping consistent semantic meaning reflected in the intermediate layer representations.
This study deepens the understanding of non-English-centric large language models, highlighting the intricate dynamics of language representation within their intermediate layers.
Submitted 20 August, 2024;
originally announced August 2024.
-
Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration
Authors:
Xiaogen Zhou,
Yiyou Sun,
Min Deng,
Winnie Chiu Wing Chu,
Qi Dou
Abstract:
Multimodal learning leverages complementary information derived from different modalities, thereby enhancing performance in medical image segmentation. However, prevailing multimodal learning methods heavily rely on extensive well-annotated data from various modalities to achieve accurate segmentation performance. This dependence often poses a challenge in clinical settings due to limited availability of such data. Moreover, the inherent anatomical misalignment between different imaging modalities further complicates the endeavor to enhance segmentation performance. To address this problem, we propose a novel semi-supervised multimodal segmentation framework that is robust to scarce labeled data and misaligned modalities. Our framework employs a novel cross modality collaboration strategy to distill modality-independent knowledge, which is inherently associated with each modality, and integrates this information into a unified fusion layer for feature amalgamation. With a channel-wise semantic consistency loss, our framework ensures alignment of modality-independent information from a feature-wise perspective across modalities, thereby fortifying it against misalignments in multimodal scenarios. Furthermore, our framework effectively integrates contrastive consistent learning to regulate anatomical structures, facilitating anatomical-wise prediction alignment on unlabeled data in semi-supervised segmentation tasks. Our method achieves competitive performance compared to other multimodal methods across three tasks: cardiac, abdominal multi-organ, and thyroid-associated orbitopathy segmentations. It also demonstrates outstanding robustness in scenarios involving scarce labeled data and misaligned modalities.
Submitted 3 September, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
-
StyEmp: Stylizing Empathetic Response Generation via Multi-Grained Prefix Encoder and Personality Reinforcement
Authors:
Yahui Fu,
Chenhui Chu,
Tatsuya Kawahara
Abstract:
Recent approaches for empathetic response generation mainly focus on emotional resonance and user understanding, without considering the system's personality. Consistent personality is evident in real human expression and is important for creating trustworthy systems. To address this problem, we propose StyEmp, which aims to stylize the empathetic response generation with a consistent personality. Specifically, it incorporates a multi-grained prefix mechanism designed to capture the intricate relationship between a system's personality and its empathetic expressions. Furthermore, we introduce a personality reinforcement module that leverages contrastive learning to calibrate the generation model, ensuring that responses are both empathetic and reflective of a distinct personality. Automatic and human evaluations on the EMPATHETICDIALOGUES benchmark show that StyEmp outperforms competitive baselines in terms of both empathy and personality expressions.
Submitted 5 August, 2024;
originally announced August 2024.
-
Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base
Authors:
Zhiyu An,
Xianzhong Ding,
Yen-Chun Fu,
Cheng-Chung Chu,
Yan Li,
Wan Du
Abstract:
This paper introduces Golden-Retriever, designed to efficiently navigate vast industrial knowledge bases, overcoming challenges in traditional LLM fine-tuning and RAG frameworks with domain-specific jargon and context interpretation. Golden-Retriever incorporates a reflection-based question augmentation step before document retrieval, which involves identifying jargon, clarifying its meaning based on context, and augmenting the question accordingly. Specifically, our method extracts and lists all jargon and abbreviations in the input question, determines the context against a pre-defined list, and queries a jargon dictionary for extended definitions and descriptions. This comprehensive augmentation ensures the RAG framework retrieves the most relevant documents by providing clear context and resolving ambiguities, significantly improving retrieval accuracy. Evaluations using three open-source LLMs on a domain-specific question-answer dataset demonstrate Golden-Retriever's superior performance, providing a robust solution for efficiently integrating and querying industrial knowledge bases.
Submitted 20 July, 2024;
originally announced August 2024.
-
The Llama 3 Herd of Models
Authors:
Abhimanyu Dubey,
Abhinav Jauhri,
Abhinav Pandey,
Abhishek Kadian,
Ahmad Al-Dahle,
Aiesha Letman,
Akhil Mathur,
Alan Schelten,
Amy Yang,
Angela Fan,
Anirudh Goyal,
Anthony Hartshorn,
Aobo Yang,
Archi Mitra,
Archie Sravankumar,
Artem Korenev,
Arthur Hinsvark,
Arun Rao,
Aston Zhang,
Aurelien Rodriguez,
Austen Gregerson,
Ava Spataru,
Baptiste Roziere,
Bethany Biron,
Binh Tang,
et al. (510 additional authors not shown)
Abstract:
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
Submitted 15 August, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective
Authors:
Huu Tan Mai,
Cuong Xuan Chu,
Heiko Paulheim
Abstract:
Large Language Models (LLMs) have demonstrated unprecedented prowess across various natural language processing tasks in various application domains. Recent studies show that LLMs can be leveraged to perform lexical semantic tasks, such as Knowledge Base Completion (KBC) or Ontology Learning (OL). However, it has not effectively been verified whether their success is due to their ability to reason over unstructured or semi-structured data, or their effective learning of linguistic patterns and senses alone. This unresolved question is particularly crucial when dealing with domain-specific data, where the lexical senses and their meaning can completely differ from what an LLM has learned during its training stage. This paper investigates the following question: Do LLMs really adapt to domains and remain consistent in the extraction of structured knowledge, or do they only learn lexical senses instead of reasoning? To answer this question, we devise a controlled experiment setup that uses WordNet to synthesize parallel corpora, with English and gibberish terms. We examine the differences in the outputs of LLMs for each corpus in two OL tasks: relation extraction and taxonomy discovery. Empirical results show that, while adapting to the gibberish corpora, off-the-shelf LLMs do not consistently reason over semantic relationships between concepts, and instead leverage senses and their frame. However, fine-tuning improves the performance of LLMs on lexical semantic tasks even when the domain-specific terms are arbitrary and unseen during pre-training, hinting at the applicability of pre-trained LLMs for OL.
Submitted 29 July, 2024;
originally announced July 2024.
-
The IBEX Knowledge-Base: Achieving more together with open science
Authors:
Andrea J. Radtke,
Ifeanyichukwu Anidi,
Leanne Arakkal,
Armando Arroyo-Mejias,
Rebecca T. Beuschel,
Katy Borner,
Colin J. Chu,
Beatrice Clark,
Menna R. Clatworthy,
Jake Colautti,
Joshua Croteau,
Saven Denha,
Rose Dever,
Walderez O. Dutra,
Sonja Fritzsche,
Spencer Fullam,
Michael Y. Gerner,
Anita Gola,
Kenneth J. Gollob,
Jonathan M. Hernandez,
Jyh Liang Hor,
Hiroshi Ichise,
Zhixin Jing,
Danny Jonigk,
Evelyn Kandov,
et al. (33 additional authors not shown)
Abstract:
Iterative Bleaching Extends multipleXity (IBEX) is a versatile method for highly multiplexed imaging of diverse tissues. Based on open science principles, we created the IBEX Knowledge-Base, a resource for reagents, protocols and more, to empower innovation.
Submitted 26 July, 2024;
originally announced July 2024.
-
GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI
Authors:
Zhaojie Fang,
Shenghao Zhu,
Yifei Chen,
Binfeng Zou,
Fan Jia,
Linwei Qiu,
Chang Liu,
Yiyu Huang,
Xiang Feng,
Feiwei Qin,
Changmiao Wang,
Yeru Wang,
Jin Fan,
Changbiao Chu,
Wan-Zhen Wu,
Hu Zhao
Abstract:
Alzheimer's Disease (AD) is an irreversible neurodegenerative disorder that often progresses from Mild Cognitive Impairment (MCI), leading to memory loss and significantly impacting patients' lives. Clinical trials indicate that early targeted interventions for MCI patients can potentially slow or halt the development and progression of AD. Previous research has shown that accurate medical classification requires the inclusion of extensive multimodal data, such as assessment scales and various neuroimaging techniques like Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET). However, consistently tracking the diagnosis of the same individual over time and simultaneously collecting multimodal data poses significant challenges. To address this issue, we introduce GFE-Mamba, a classifier based on Generative Feature Extraction (GFE). This classifier effectively integrates data from assessment scales, MRI, and PET, enabling deeper multimodal fusion. It efficiently extracts both long and short sequence information and incorporates additional information beyond the pixel space. This approach not only improves classification accuracy but also enhances the interpretability and stability of the model. We constructed datasets of over 3000 samples based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) for a two-step training process. Our experimental results demonstrate that the GFE-Mamba model is effective in predicting the conversion from MCI to AD and outperforms several state-of-the-art methods. Our source code and ADNI dataset processing code are available at https://github.com/Tinysqua/GFE-Mamba.
Submitted 22 July, 2024;
originally announced July 2024.
-
Explaining Graph Neural Networks for Node Similarity on Graphs
Authors:
Daniel Daza,
Cuong Xuan Chu,
Trung-Kien Tran,
Daria Stepanova,
Michael Cochez,
Paul Groth
Abstract:
Similarity search is a fundamental task for exploiting information in various applications dealing with graph data, such as citation networks or knowledge graphs. While this task has been intensively approached from heuristics to graph embeddings and graph neural networks (GNNs), providing explanations for similarity has received less attention. In this work we are concerned with explainable similarity search over graphs, by investigating how GNN-based methods for computing node similarities can be augmented with explanations. Specifically, we evaluate the performance of two prominent approaches towards explanations in GNNs, based on the concepts of mutual information (MI), and gradient-based explanations (GB). We discuss their suitability and empirically validate the properties of their explanations over different popular graph benchmarks. We find that unlike MI explanations, gradient-based explanations have three desirable properties. First, they are actionable: selecting inputs depending on them results in predictable changes in similarity scores. Second, they are consistent: the effect of selecting certain inputs overlaps very little with the effect of discarding them. Third, they can be pruned significantly to obtain sparse explanations that retain the effect on similarity scores.
Submitted 10 July, 2024;
originally announced July 2024.
-
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
Authors:
Hao Feng,
Boyuan Zhang,
Fanjiang Ye,
Min Si,
Ching-Hsiang Chu,
Jiannan Tian,
Chunxing Yin,
Summer Deng,
Yuchen Hao,
Pavan Balaji,
Tong Geng,
Dingwen Tao
Abstract:
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training. We develop a novel error-bounded lossy compression algorithm, informed by an in-depth analysis of embedding data features, to achieve high compression ratios. Moreover, we introduce a dual-level adaptive strategy for error-bound adjustment, spanning both table-wise and iteration-wise aspects, to balance the compression benefits with the potential impacts on accuracy. We further optimize our compressor for PyTorch tensors on GPUs, minimizing compression overhead. Evaluation shows that our method achieves a 1.38$\times$ training speedup with a minimal accuracy impact.
Submitted 25 August, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior
Authors:
Chaoxing Huang,
Ziqiang Yu,
Zijian Gao,
Qiuyi Shen,
Queenie Chan,
Vincent Wai-Sun Wong,
Winnie Chiu-Wing Chu,
Weitian Chen
Abstract:
This study evaluated a deep learning-based method using Deep Image Prior (DIP) to quantify triglyceride double bonds from chemical-shift encoded multi-echo gradient echo images without network training. We employed a cost function based on signal constraints to iteratively update the neural network on a single dataset. The method was validated using phantom experiments and in vivo scans. Results showed close alignment between measured and reference double bond values, with phantom experiments yielding a Pearson correlation coefficient of 0.96 (p = .0005). In vivo results demonstrated good agreement in subcutaneous fat. We conclude that Deep Image Prior shows feasibility for quantifying double bonds and fatty acid content from chemical-shift encoded multi-echo MRI.
Submitted 25 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Quantum Kerr Black Hole from Matrix Theory of Quantum Gravity
Authors:
Chong-Sun Chu
Abstract:
Recently, a quantum mechanical theory of quantum spaces described by large $N$ noncommutative coordinates was proposed as a theory for quantum gravity [1]. In this paper, we construct the Kerr black hole as a rotating noncommutative geometry solution of this theory. Due to rotation, the fuzzy sphere is deformed into a fuzzy ellipsoid, which matches exactly the outer horizon of the Kerr black hole in the Boyer-Lindquist coordinates. Together with a half-filled Fermi sea, the fuzzy solution reproduces the Bekenstein-Hawking entropy as well as the mass of the Kerr black hole. These results provide further support that the proposed theory of quantum spaces is a plausible candidate for the theory of quantum gravity.
Submitted 18 June, 2024;
originally announced June 2024.
-
A Matrix Model Proposal for Quantum Gravity and the Quantum Mechanics of Black Holes
Authors:
Chong-Sun Chu
Abstract:
We propose a quantum mechanical theory of quantum spaces described by large $N$ noncommutative geometry as a model for quantum gravity. The theory admits the fuzzy sphere as a static solution. Over the fuzzy geometry, the quantum mechanics of the fermions is given by a sum of oscillators with equal frequency. The energy state where exactly half of the Fermi sea is filled contains the maximal amount of degeneracy. This state of the fuzzy sphere obeys the mass-radius relation of a Schwarzschild black hole if the fuzzy sphere is identified with the black hole horizon. Moreover, the set of states in the Fermi sea gives precisely the Bekenstein-Hawking entropy. We thus propose that quantum black holes are described by fuzzy spheres with a half-filled Fermi sea in our theory. We also consider a system of two fuzzy spheres by embedding them as blocks in the matrix quantum mechanics. When the distance $r$ between the two fuzzy spheres is small, the total energy of the system can be computed using perturbation theory. We show that at leading order in the large $N$ limit, the interaction energy depends on the product of the black hole masses and the Newton constant exactly as in Newtonian gravity. To extract the correct $r$ dependence at long range, a resummation of all the large $N$ corrections is needed. We outline how Newtonian gravity for static sources may be reproduced in the long distance limit. We also show that the interaction energy is generally finite in the short distance limit, suggesting that the black hole singularity in general relativity is removed in our quantum description of the black hole.
Submitted 17 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{\nu}_{e}$ rates and energy spectra variation among the near and far detectors give $\sin^2 2\theta_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $\Delta m^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $\Delta m^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2\theta_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2\theta_{13} = 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the Daya Bay full 3158-day capture-on-gadolinium result.
Submitted 3 June, 2024;
originally announced June 2024.
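As a rough illustration of how the two fitted quantities in this abstract enter the measurement, the following Python sketch evaluates the standard short-baseline, two-flavor approximation of the electron-antineutrino survival probability, P ≈ 1 − sin²2θ13 · sin²(1.267 Δm² L/E), using the normal-ordering best-fit values quoted above. The 1600 m baseline, the 4 MeV energy, and the function name are illustrative assumptions; the small solar-term correction is neglected; and this is not the collaboration's analysis code.

```python
import math

def survival_probability(sin2_2theta13, dm2_ev2, baseline_m, energy_mev):
    """Two-flavor approximation; 1.267 converts eV^2 * m / MeV to radians."""
    phase = 1.267 * dm2_ev2 * baseline_m / energy_mev
    return 1.0 - sin2_2theta13 * math.sin(phase) ** 2

# Best-fit values from the abstract (normal mass ordering).
SIN2_2THETA13 = 0.0759
DM2_32_EV2 = 2.72e-3

# Hypothetical baseline and antineutrino energy, chosen only for illustration.
p_far = survival_probability(SIN2_2THETA13, DM2_32_EV2, baseline_m=1600.0, energy_mev=4.0)
print(f"P(survival) at 1600 m, 4 MeV: {p_far:.4f}")
```

At zero baseline the phase vanishes and the function returns exactly 1, which is a quick sanity check on the formula.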
-
DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge
Authors:
Yifan Mao,
Ming Li,
Jian Liu,
Jiayang Liu,
Zihan Qin,
Chunxi Chu,
Jialei Xu,
Wenbo Zhao,
Junjun Jiang,
Xianming Liu
Abstract:
Surround-view depth estimation is a crucial task that aims to acquire depth maps of the surrounding views. It has many applications in real-world scenarios such as autonomous driving, AR/VR, and 3D reconstruction. However, most of the data in the autonomous driving dataset is collected in daytime scenarios, which leads to poor depth model performance on out-of-distribution (OoD) data. While some works try to improve the robustness of depth models under OoD data, these methods either require additional training data or lack generalizability. In this report, we introduce DINO-SD, a novel surround-view depth estimation model. DINO-SD does not need additional data and has strong robustness. DINO-SD achieved the best performance in Track 4 of the ICRA 2024 RoboDepth Challenge.
Submitted 27 May, 2024;
originally announced May 2024.
-
MELD-ST: An Emotion-aware Speech Translation Dataset
Authors:
Sirou Chen,
Sakiko Yahata,
Shuichiro Shimizu,
Zhengdong Yang,
Yihang Li,
Chenhui Chu,
Sadao Kurohashi
Abstract:
Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings, highlighting the need for further research in emotion-aware speech translation systems.
Submitted 21 May, 2024;
originally announced May 2024.
-
OFHE: An Electro-Optical Accelerator for Discretized TFHE
Authors:
Mengxin Zheng,
Cheng Chu,
Qian Lou,
Nathan Youngblood,
Mo Li,
Sajjad Moazeni,
Lei Jiang
Abstract:
This paper presents OFHE, an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations, which encrypt multi-bit messages and support homomorphic multiplications, lookup table operations and full-domain functional bootstrappings. While DTFHE is more efficient and versatile than other fully homomorphic encryption schemes, it requires 32-, 64-, and 128-bit polynomial multiplications, which can be time-consuming. Existing TFHE accelerators are not easily upgradable to support DTFHE operations due to limited datapaths, a lack of datapath bit-width reconfigurability, and power inefficiencies when processing FFT and inverse FFT (IFFT) kernels. Compared to prior TFHE accelerators, OFHE addresses these challenges by improving the DTFHE operation latency by 8.7%, the DTFHE operation throughput by 57%, and the DTFHE operation throughput per Watt by 94%.
Submitted 19 May, 2024;
originally announced May 2024.
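The polynomial multiplications mentioned in this abstract are, in TFHE-style schemes, products in the ring Z_q[X]/(X^N + 1). The Python sketch below shows the schoolbook (quadratic-time) form of that negacyclic product for a 32-bit modulus; FFT/IFFT kernels are the usual way to avoid this quadratic loop. The function name and the toy degree N = 4 are assumptions made for illustration and are not taken from the OFHE paper.

```python
def negacyclic_mul(a, b, modulus=2**32):
    """Multiply polynomials a and b in Z_q[X]/(X^N + 1), coefficients low-to-high."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both polynomials must have N coefficients")
    result = [0] * n
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            k = i + j
            if k < n:
                result[k] = (result[k] + a_i * b_j) % modulus
            else:
                # X^N = -1, so terms that wrap around change sign.
                result[k - n] = (result[k - n] - a_i * b_j) % modulus
    return result

# X^3 * X = X^4 = -1 in the ring (toy degree N = 4).
x_cubed = [0, 0, 0, 1]
x = [0, 1, 0, 0]
product = negacyclic_mul(x_cubed, x)
```

Here `product` holds the constant polynomial −1 reduced modulo 2^32, which shows the sign flip that distinguishes the negacyclic product from an ordinary cyclic convolution.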
-
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
Authors:
Lingdong Kong,
Shaoyuan Xie,
Hanjiang Hu,
Yaru Niu,
Wei Tsang Ooi,
Benoit R. Cottereau,
Lai Xing Ng,
Yuexin Ma,
Wenwei Zhang,
Liang Pan,
Kai Chen,
Ziwei Liu,
Weichao Qiu,
Wei Zhang,
Xu Cao,
Hao Lu,
Ying-Cong Chen,
Caixin Kang,
Xinning Zhou,
Chengyang Ying,
Wentao Shang,
Xingxing Wei,
Yinpeng Dong,
Bo Yang,
Shengyin Jiang
, et al. (66 additional authors not shown)
Abstract:
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.
Submitted 29 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation
Authors:
Yihao Zhou,
Timothy Tin-Yan Lee,
Kelly Ka-Lee Lai,
Chonglin Wu,
Hin Ting Lau,
De Yang,
Chui-Yi Chan,
Winnie Chiu-Wing Chu,
Jack Chun-Yiu Cheng,
Tsz-Ping Lam,
Yong-Ping Zheng
Abstract:
The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment.
Submitted 6 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
Search for a sub-eV sterile neutrino using Daya Bay's full dataset
Authors:
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding,
Y. Y. Ding
, et al. (176 additional authors not shown)
Abstract:
This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\overline{ν}_{e}$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties.
No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods.
Light sterile neutrino mixing with $\sin^2 2θ_{14} \gtrsim 0.01$ can be excluded at 95% confidence level in the region of $0.01$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.1 $ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.2 $ eV$^2$.
Submitted 20 August, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
R2D2 image reconstruction with model uncertainty quantification in radio astronomy
Authors:
Amir Aghabiglou,
Chung San Chu,
Arwa Dabbech,
Yves Wiaux
Abstract:
The "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2) approach was recently introduced for Radio-Interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. In this work, we investigate the robustness of the R2D2 image estimation process, by studying the uncertainty associated with its series of learned models. Adopting an ensemble averaging approach, multiple series can be trained, arising from different random DNN initializations of the training process at each iteration. The resulting multiple R2D2 instances can also be leveraged to generate "R2D2 samples", from which empirical mean and standard deviation endow the algorithm with a joint estimation and uncertainty quantification functionality. Focusing on RI imaging, and adopting a telescope-specific approach, multiple R2D2 instances were trained to encompass the most general observation setting of the Very Large Array (VLA). Simulations and real-data experiments confirm that: (i) R2D2's image estimation capability is superior to that of the state-of-the-art algorithms; (ii) its ultra-fast reconstruction capability (arising from series with only a few DNNs) makes the computation of multiple reconstruction samples and of uncertainty maps practical even at large image dimension; (iii) it is characterized by a very low model uncertainty.
Submitted 27 May, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
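The joint estimation and uncertainty quantification described in this abstract reduce, at their simplest, to pixel-wise statistics over the reconstructions produced by independently trained series. The Python sketch below computes an empirical mean image and a standard-deviation map from a few toy samples. The flat pixel lists, the sample values, and the function name are illustrative assumptions, and the use of the sample (n − 1) standard deviation is a choice made here, not necessarily the authors'.

```python
from statistics import fmean, stdev

def mean_and_uncertainty(samples):
    """Return the pixel-wise mean and sample standard deviation of the samples.

    `samples` is a list of reconstructions, each a flat list of pixel values
    of the same length (one reconstruction per trained series).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a spread")
    pixels = list(zip(*samples))  # regroup values pixel by pixel
    mean_image = [fmean(p) for p in pixels]
    std_map = [stdev(p) for p in pixels]
    return mean_image, std_map

# Three toy reconstructions of a 4-pixel image (hypothetical values).
toy_samples = [
    [1.0, 0.0, 2.0, 5.0],
    [1.0, 0.2, 2.0, 4.0],
    [1.0, 0.1, 2.0, 6.0],
]
mean_image, std_map = mean_and_uncertainty(toy_samples)
```

Pixels on which all samples agree get a zero entry in `std_map`, while pixels where the samples disagree get a larger one, which is the sense in which the map flags model uncertainty.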
-
Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2
Authors:
Yiwei Chen,
Chao Tang,
Amir Aghabiglou,
Chung San Chu,
Yves Wiaux
Abstract:
We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in Deep Neural Networks (DNNs) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limitation and have also proven effective, but their highly iterative nature also affects scalability. To address this scalability challenge, we leverage the "Residual-to-Residual DNN series for high-Dynamic range imaging (R2D2)" approach recently introduced in astronomical imaging. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of DNNs taking the previous iteration's image estimate and associated data residual as inputs. The method can be interpreted as a learned version of the Matching Pursuit algorithm. We demonstrate R2D2 in simulation, considering radial k-space sampling acquisition sequences. Our preliminary results suggest that R2D2 achieves: (i) suboptimal performance compared to its unrolled incarnation R2D2-Net, which is however non-scalable due to the necessary embedding of NUFFT-based data-consistency layers; (ii) superior reconstruction quality to a scalable version of R2D2-Net embedding an FFT-based approximation for data consistency; (iii) superior reconstruction quality to PnP, while requiring only a few iterations.
Submitted 28 May, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
Authors:
Hao Wang,
Tang Li,
Chenhui Chu,
Nengjun Zhu,
Rui Wang,
Pinpin Zhu
Abstract:
Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related to visual and spatial features, resulting in suboptimal performance, particularly when dealing with limited examples. To address this limitation, our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs. Given the absence of a suitable dataset for this task, we introduce two new few-shot benchmarks built upon existing supervised benchmark datasets. Furthermore, we propose a variational approach that incorporates relational 2D-spatial priors and prototypical rectification techniques. This approach aims to generate relation representations that are more aware of the spatial context and unseen relation in a manner similar to human perception. Experimental results demonstrate the effectiveness of our proposed method by showcasing its ability to outperform existing methods. This study also opens up new possibilities for practical applications.
Submitted 23 March, 2024;
originally announced March 2024.
-
ChatGPT in Veterinary Medicine: A Practical Guidance of Generative Artificial Intelligence in Clinics, Education, and Research
Authors:
Candice P. Chu
Abstract:
ChatGPT, the most accessible generative artificial intelligence (AI) tool, offers considerable potential for veterinary medicine, yet a dedicated review of its specific applications is lacking. This review concisely synthesizes the latest research and practical applications of ChatGPT within the clinical, educational, and research domains of veterinary medicine. It intends to provide specific guidance and actionable examples of how generative AI can be directly utilized by veterinary professionals without a programming background. For practitioners, ChatGPT can extract patient data, generate progress notes, and potentially assist in diagnosing complex cases. Veterinary educators can create custom GPTs for student support, while students can utilize ChatGPT for exam preparation. ChatGPT can aid in academic writing tasks in research, but veterinary publishers have set specific requirements for authors to follow. Despite its transformative potential, careful use is essential to avoid pitfalls like hallucination. This review addresses ethical considerations, provides learning resources, and offers tangible examples to guide responsible implementation. Carefully selected, up-to-date links to platforms that host large language models are provided for advanced readers with programming capability. A table of key takeaways was provided to summarize this review. By highlighting potential benefits and limitations, this review equips veterinarians, educators, and researchers to harness the power of ChatGPT effectively.
Submitted 25 February, 2024;
originally announced March 2024.
-
A geometric characterisation of real C*-algebras
Authors:
Cho-Ho Chu
Abstract:
We characterise the positive cone of a real C*-algebra geometrically. Given an open cone $Ω$ in a real Banach space $V$, with closure $\overline{Ω}$, we show that $Ω$ is the interior of the positive cone of a unital real C*-algebra if and only if it is a Finsler symmetric cone with an orientable extension, which is equivalent to the condition that $V$ is, in an equivalent norm, the hermitian part of a unital real C*-algebra with positive cone $\overline{Ω}$.
Submitted 19 March, 2024;
originally announced March 2024.
-
QuantumLeak: Stealing Quantum Neural Networks from Cloud-based NISQ Machines
Authors:
Zhenxiao Fu,
Min Yang,
Cheng Chu,
Yilun Xu,
Gang Huang,
Fan Chen
Abstract:
Variational quantum circuits (VQCs) have become a powerful tool for implementing Quantum Neural Networks (QNNs), addressing a wide range of complex problems. Well-trained VQCs serve as valuable intellectual assets hosted on cloud-based Noisy Intermediate Scale Quantum (NISQ) computers, making them susceptible to malicious VQC stealing attacks. However, traditional model extraction techniques designed for classical machine learning models encounter challenges when applied to NISQ computers due to significant noise in current devices. In this paper, we introduce QuantumLeak, an effective and accurate QNN model extraction technique from cloud-based NISQ machines. Compared to existing classical model stealing techniques, QuantumLeak improves local VQC accuracy by 4.99%$\sim$7.35% across diverse datasets and VQC architectures.
Submitted 15 March, 2024;
originally announced March 2024.
-
The R2D2 deep neural network series paradigm for fast precision imaging in radio astronomy
Authors:
Amir Aghabiglou,
Chung San Chu,
Arwa Dabbech,
Yves Wiaux
Abstract:
Radio-interferometric (RI) imaging entails solving high-resolution high-dynamic range inverse problems from large data volumes. Recent image reconstruction techniques grounded in optimization theory have demonstrated remarkable capability for imaging precision, well beyond CLEAN's capability. These range from advanced proximal algorithms propelled by handcrafted regularization operators, such as the SARA family, to hybrid plug-and-play (PnP) algorithms propelled by learned regularization denoisers, such as AIRI. Optimization and PnP structures are however highly iterative, which hinders their ability to handle the extreme data sizes expected from future instruments. To address this scalability challenge, we introduce a novel deep learning approach, dubbed "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2). R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. It thus takes a hybrid structure between a PnP algorithm and a learned version of the matching pursuit algorithm that underpins CLEAN. We present a comprehensive study of our approach, featuring its multiple incarnations distinguished by their DNN architectures. We provide a detailed description of its training process, targeting a telescope-specific approach. R2D2's capability to deliver high precision is demonstrated in simulation, across a variety of image and observation settings using the Very Large Array (VLA). Its reconstruction speed is also demonstrated: with only a few iterations required to clean data residuals at dynamic ranges up to 100000, R2D2 opens the door to fast precision imaging. R2D2 codes are available in the BASPLib library on GitHub.
Submitted 1 May, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese
Authors:
Yikun Sun,
Zhen Wan,
Nobuhiro Ueda,
Sakiko Yahata,
Fei Cheng,
Chenhui Chu,
Sadao Kurohashi
Abstract:
The creation of instruction data and evaluation benchmarks for serving large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propose an efficient self-instruct method based on GPT-4. We first translate a small number of English instructions into Japanese and post-edit them to obtain native-level quality. GPT-4 then utilizes them as demonstrations to automatically generate Japanese instruction data. We also construct an evaluation benchmark containing 80 questions across 8 categories, using GPT-4 to automatically assess the response quality of LLMs without human references. The empirical results suggest that the models fine-tuned on our GPT-4 self-instruct data significantly outperformed the Japanese-Alpaca across all three base pre-trained models. Our GPT-4 self-instruct data allowed the LLaMA 13B model to defeat GPT-3.5 (Davinci-003) with a 54.37% win-rate. The human evaluation shows consistency between GPT-4's assessments and human preference. Our high-quality instruction data and evaluation benchmark have been released here.
Submitted 6 March, 2024;
originally announced March 2024.
-
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation
Authors:
Liang Luo,
Buyun Zhang,
Michael Tsang,
Yinbin Ma,
Ching-Hsiang Chu,
Yuxin Chen,
Shen Li,
Yuchen Hao,
Yanli Zhao,
Guna Lakshminarayanan,
Ellie Dingqiao Wen,
Jongsoo Park,
Dheevatsa Mudigere,
Maxim Naumov
Abstract:
We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global embedding lookup process into disjoint towers to exploit data center locality; (2) Tower Module (TM), a synergistic dense component attached to each tower to reduce model complexity and communication volume through hierarchical feature interaction; and (3) Tower Partitioner (TP), a feature partitioner to systematically create towers with meaningful feature interactions and load balanced assignments to preserve model quality and training throughput via learned embeddings. We show that DMT can achieve up to 1.9x speedup compared to the state-of-the-art baselines without losing accuracy across multiple generations of hardware at large data center scales.
Submitted 2 May, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
TITAN: A Distributed Large-Scale Trapped-Ion NISQ Computer
Authors:
Cheng Chu,
Zhenxiao Fu,
Yilun Xu,
Gang Huang,
Hausi Muller,
Fan Chen,
Lei Jiang
Abstract:
Trapped-Ion (TI) technology offers potential breakthroughs for Noisy Intermediate Scale Quantum (NISQ) computing. TI qubits offer extended coherence times and high gate fidelity, making them appealing for large-scale NISQ computers. Constructing such computers demands a distributed architecture connecting Quantum Charge Coupled Devices (QCCDs) via quantum matter-links and photonic switches. However, current distributed TI NISQ computers face hardware and system challenges. Entangling qubits across a photonic switch introduces significant latency, while existing compilers generate suboptimal mappings due to their unawareness of the interconnection topology. In this paper, we introduce TITAN, a large-scale distributed TI NISQ computer, which employs an innovative photonic interconnection design to reduce entanglement latency and an advanced partitioning and mapping algorithm to optimize matter-link communications. Our evaluations show that TITAN greatly enhances quantum application performance by 56.6% and fidelity by 19.7% compared to existing systems.
Submitted 16 February, 2024;
originally announced February 2024.
-
CityFlowER: An Efficient and Realistic Traffic Simulator with Embedded Machine Learning Models
Authors:
Longchao Da,
Chen Chu,
Weinan Zhang,
Hua Wei
Abstract:
Traffic simulation is an essential tool for transportation infrastructure planning, intelligent traffic control policy learning, and traffic flow analysis. Its effectiveness relies heavily on the realism of the simulators used. Traditional traffic simulators, such as SUMO and CityFlow, are often limited by their reliance on rule-based models with hyperparameters that oversimplify driving behaviors, resulting in unrealistic simulations. To enhance realism, some simulators have provided Application Programming Interfaces (APIs) to interact with Machine Learning (ML) models, which learn from observed data and offer more sophisticated driving behavior models. However, this approach faces challenges in scalability and time efficiency as vehicle numbers increase. Addressing these limitations, we introduce CityFlowER, an advancement over the existing CityFlow simulator, designed for efficient and realistic city-wide traffic simulation. CityFlowER innovatively pre-embeds ML models within the simulator, eliminating the need for external API interactions and enabling faster data computation. This approach allows for a blend of rule-based and ML behavior models for individual vehicles, offering unparalleled flexibility and efficiency, particularly in large-scale simulations. We provide detailed comparisons with existing simulators, implementation insights, and comprehensive experiments to demonstrate CityFlowER's superiority in terms of realism, efficiency, and adaptability.
Submitted 8 February, 2024;
originally announced February 2024.
-
First measurement of the yield of $^8$He isotopes produced in liquid scintillator by cosmic-ray muons at Daya Bay
Authors:
Daya Bay Collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding
, et al. (177 additional authors not shown)
Abstract:
Daya Bay presents the first measurement of cosmogenic $^8$He isotope production in liquid scintillator, using an innovative method for identifying cascade decays of $^8$He and its child isotope, $^8$Li. We also measure the production yield of $^9$Li isotopes using well-established methodology. The results, in units of 10$^{-8}μ^{-1}$g$^{-1}$cm$^{2}$, are 0.307$\pm$0.042, 0.341$\pm$0.040, and 0.546$\pm$0.076 for $^8$He, and 6.73$\pm$0.73, 6.75$\pm$0.70, and 13.74$\pm$0.82 for $^9$Li at average muon energies of 63.9 GeV, 64.7 GeV, and 143.0 GeV, respectively. The measured production rate of $^8$He isotopes is more than an order of magnitude lower than any other measurement of cosmogenic isotope production. It replaces the results of previous attempts to determine the ratio of $^8$He to $^9$Li production that yielded a wide range of limits from 0 to 30%. The results provide future liquid-scintillator-based experiments with improved ability to predict cosmogenic backgrounds.
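As a rough illustration of the $^8$He to $^9$Li comparison mentioned in the abstract, the quoted yields can be divided directly. The sketch below is not the collaboration's analysis: the function name `yield_ratio` is hypothetical, and the uncertainty is propagated under the assumption that the two yields are uncorrelated, which the abstract does not state.

```python
import math

# Yields quoted in the abstract, in units of 1e-8 per muon per (g/cm^2),
# stored as (central value, uncertainty) for each average muon energy.
MUON_ENERGY_GEV = [63.9, 64.7, 143.0]
HE8 = [(0.307, 0.042), (0.341, 0.040), (0.546, 0.076)]
LI9 = [(6.73, 0.73), (6.75, 0.70), (13.74, 0.82)]


def yield_ratio(he8, li9):
    """Return (ratio, uncertainty) of the 8He yield to the 9Li yield.

    Assumption for illustration only: the two uncertainties are treated as
    uncorrelated and combined in quadrature as relative errors.
    """
    (a, da), (b, db) = he8, li9
    ratio = a / b
    return ratio, ratio * math.hypot(da / a, db / b)


for energy, he8, li9 in zip(MUON_ENERGY_GEV, HE8, LI9):
    r, dr = yield_ratio(he8, li9)
    print(f"{energy:6.1f} GeV: 8He/9Li = {100 * r:.1f}% +/- {100 * dr:.1f}%")
```

Under these assumptions the ratios come out at a few percent, inside the 0 to 30% range of earlier limits; any correlation between the two measurements would change the propagated uncertainty.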
Submitted 7 February, 2024;
originally announced February 2024.
-
MM-LLMs: Recent Advances in MultiModal Large Language Models
Authors:
Duzhen Zhang,
Yahan Yu,
Jiahua Dong,
Chenxing Li,
Dan Su,
Chenhui Chu,
Dong Yu
Abstract:
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Initially, we outline general design formulations for model architecture and training pipeline. Subsequently, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. Furthermore, we review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.
Submitted 28 May, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction
Authors:
Wangjin Zhou,
Zhengdong Yang,
Chenhui Chu,
Sheng Li,
Raj Dabre,
Yi Zhao,
Tatsuya Kawahara
Abstract:
Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of synthetic speech. This study extends the application of predicted MOS to the task of Fake Audio Detection (FAD), as we expect that MOS can be used to assess how close synthesized speech is to the natural human voice. We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion. In training data selection, we demonstrate that MOS enables effective filtering of samples from unbalanced datasets. In model fusion, our results demonstrate that incorporating MOS as a gating mechanism enhances overall performance.
Submitted 24 January, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Pressure-induced superconductivity in a novel germanium allotrope
Authors:
Liangzi Deng,
Jianbo Zhang,
Yuki Sakai,
Zhongjia Tang,
Moein Adnani,
Rabin Dahal,
Alexander P. Litvinchuk,
James R. Chelikowsky,
Marvin L. Cohen,
Russell J. Hemley,
Arnold Guloy,
Yang Ding,
Ching-Wu Chu
Abstract:
High-pressure studies on elements play an essential role in superconductivity research, with implications for both fundamental science and applications. Here we report the experimental discovery of surprisingly low pressure driving a novel germanium allotrope into a superconducting state in comparison to that for alpha-Ge. Raman measurements revealed structural phase transitions and possible electronic topological transitions under pressure up to 58 GPa. Based on pressure-dependent resistivity measurements, superconductivity was induced above 2 GPa and the maximum Tc of 6.8 K was observed under 4.6 GPa. Interestingly, a superconductivity enhancement was discovered during decompression, indicating the possibility of maintaining pressure-induced superconductivity at ambient pressure with better superconducting performance. Density functional theory analysis further suggested that the electronic structure of Ge (oP32) is sensitive to its detailed geometry and revealed that disorder in the beta-tin structure leads to a higher Tc in comparison to the perfect beta-tin Ge.
Submitted 17 January, 2024;
originally announced January 2024.
-
Distilling Vision-Language Models on Millions of Videos
Authors:
Yue Zhao,
Long Zhao,
Xingyi Zhou,
Jialin Wu,
Chun-Te Chu,
Hui Miao,
Florian Schroff,
Hartwig Adam,
Ting Liu,
Boqing Gong,
Philipp Krähenbühl,
Liangzhe Yuan
Abstract:
The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there simply is not enough human-curated video-text data available. We thus resort to fine-tuning a video-language model from a strong image-language baseline with synthesized instructional data. The resulting video model by video-instruction-tuning (VIIT) is then used to auto-label millions of videos to generate high-quality captions. We show the adapted video-language model performs well on a wide range of video-language benchmarks. For instance, it surpasses the best prior result on open-ended NExT-QA by 2.8%. Besides, our model generates detailed descriptions for previously unseen videos, which provide better textual supervision than existing methods. Experiments show that a video-language dual-encoder model contrastively trained on these auto-generated captions is 3.8% better than the strongest baseline that also leverages vision-language models. Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%. As a side product, we generate the largest video caption dataset to date.
Submitted 15 April, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Charged-current non-standard neutrino interactions at Daya Bay
Authors:
Daya Bay Collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding
, et al. (177 additional authors not shown)
Abstract:
The full data set of the Daya Bay reactor neutrino experiment is used to probe the effect of the charged current non-standard interactions (CC-NSI) on neutrino oscillation experiments. Two different approaches are applied and constraints on the corresponding CC-NSI parameters are obtained with the neutrino flux taken from the Huber-Mueller model with a $5\%$ uncertainty. For the quantum mechanics-based approach (QM-NSI), the constraints on the CC-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ are extracted with and without the assumption that the effects of the new physics are the same in the production and detection processes, respectively. The approach based on the weak effective field theory (WEFT-NSI) deals with four types of CC-NSI represented by the parameters $[\varepsilon_{X}]_{eα}$. For both approaches, the results for the CC-NSI parameters are shown for cases with various fixed values of the CC-NSI and the Dirac CP-violating phases, and when they are allowed to vary freely. We find that constraints on the QM-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ from the Daya Bay experiment alone can reach the order $\mathcal{O}(0.01)$ for the former and $\mathcal{O}(0.1)$ for the latter, while for WEFT-NSI parameters $[\varepsilon_{X}]_{eα}$, we obtain $\mathcal{O}(0.1)$ for both cases.
Submitted 19 March, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Massless Lifshitz Field Theory for Arbitrary $z$
Authors:
Jaydeep Kumar Basak,
Adrita Chakraborty,
Chong-Sun Chu,
Dimitrios Giataganas,
Himanshu Parihar
Abstract:
By using the notion of fractional derivatives, we introduce a class of massless Lifshitz scalar field theories in (1+1) dimensions with an arbitrary anisotropy index $z$. The Lifshitz scale invariant ground state of the theory is constructed explicitly and takes the Rokhsar-Kivelson (RK) form. We show that there is a continuous family of ground states with degeneracy parameterized by the choice of solution to the equation of motion of an auxiliary classical system. The quantum mechanical path integral establishes a 2d/1d correspondence with the equal time correlation functions of the Lifshitz scalar field theory. We study the entanglement properties of the Lifshitz theory for arbitrary $z$ using the path integral representation. The entanglement measures are expressed in terms of certain cross ratio functions we specify, and satisfy the $c$-function monotonicity theorems. We also consider the holographic description of the Lifshitz theory. In order to match with the field theory result for the entanglement entropy, we propose a $z$-dependent radius scale for the Lifshitz background. This relation is consistent with the $z$-dependent scaling symmetry respected by the Lifshitz vacuum. Furthermore, the time-like entanglement entropy is determined using holography. Our result suggests that there should exist a fundamental definition of time-like entanglement other than employing analytic continuation as performed in relativistic field theory.
Submitted 16 July, 2024; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Blockchain Smart Contract Threat Detection Technology Based on Symbolic Execution
Authors:
Chang Chu
Abstract:
The security of smart contracts, which are an important part of blockchain technology, has attracted much attention. In particular, the reentrancy vulnerability, which is hidden and complex, poses a great threat to smart contracts. To improve on existing detection methods, which exhibit low efficiency and accuracy, we propose a smart contract threat detection technology based on symbolic execution. In this method, a recursive descent algorithm is first used to recover the basic blocks and control flow graph of the contract code, and static type inference is performed for static single assignment (SSA) variables. Then, the control flow graph is encoded into constrained Horn clause (CHC) constraints in combination with symbolic execution. Model checking is conducted on the generated constraints using an automatic theorem prover based on the abstraction refinement technique for fast static detection of common security threats in smart contracts. Compared with existing detection methods, the proposed method allows the detection of both the checks-effects-interactions pattern and the vulnerability in relation to reentrant locks. It can simulate the state changes of reentrant locks as well as other global variables in multiple recursive transactions. The experimental results show that this method significantly increases both detection efficiency and accuracy, improving the security of smart contracts.
Submitted 23 December, 2023;
originally announced December 2023.
-
FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection
Authors:
Wenqing Wei,
Zhengdong Yang,
Yuan Gao,
Jiyi Li,
Chenhui Chu,
Shogo Okada,
Sheng Li
Abstract:
Early-stage Alzheimer's disease (AD) detection is considered an important field of medical studies. Like traditional machine learning methods, speech-based automatic detection also suffers from data privacy risks because the data of specific patients are exclusive to each medical institution. A common practice is to use federated learning to protect the patients' data privacy. However, its distributed learning process also reduces performance. To alleviate this problem while protecting user privacy, we propose a federated contrastive pre-training (FedCPC) performed before federated training for AD speech detection, which can learn a better representation from raw data and enables different clients to share data in the pre-training and training stages. Experimental results demonstrate that the proposed methods can achieve satisfactory performance while preserving data privacy.
Submitted 21 November, 2023;
originally announced November 2023.
-
Room-temperature ferromagnetism in epitaxial bilayer FeSb/SrTiO3(001) terminated with a Kagome lattice
Authors:
Huimin Zhang,
Qinxi Liu,
Liangzi Deng,
Yanjun Ma,
Samira Daneshmandi,
Cheng Cen,
Chenyu Zhang,
Paul M. Voyles,
Xue Jiang,
Jijun Zhao,
Ching-Wu Chu,
Zheng Gai,
Lian Li
Abstract:
Two-dimensional (2D) magnets exhibit unique physical properties for potential applications in spintronics. To date, most 2D ferromagnets are obtained by mechanical exfoliation of bulk materials with van der Waals interlayer interactions, and the synthesis of single or few-layer 2D ferromagnets with strong interlayer coupling remains experimentally challenging. Here, we report the epitaxial growth of 2D non-van der Waals ferromagnetic bilayer FeSb on SrTiO3(001) substrates stabilized by strong coupling to the substrate, which exhibits in-plane magnetic anisotropy and a Curie temperature above 300 K. In-situ low-temperature scanning tunneling microscopy/spectroscopy and density-functional theory calculations further reveal that a Fe Kagome layer terminates the bilayer FeSb. Our results open a new avenue for further exploring emergent quantum phenomena from the interplay of ferromagnetism and topology for application in spintronics.
Submitted 9 November, 2023;
originally announced November 2023.
-
Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts
Authors:
Haiyue Song,
Raj Dabre,
Chenhui Chu,
Atsushi Fujita,
Sadao Kurohashi
Abstract:
Lecture transcript translation helps learners understand online courses; however, building a high-quality lecture machine translation system is hindered by the lack of publicly available parallel corpora. To address this, we examine a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera. To create the parallel corpora, we propose a dynamic programming based sentence alignment algorithm which leverages the cosine similarity of machine-translated sentences. The sentence alignment F1 score reaches 96%, which is higher than using the BERTScore, LASER, or sentBERT methods. For both English-Japanese and English-Chinese lecture translations, we extracted parallel corpora of approximately 50,000 lines and created development and test sets through manual filtering for benchmarking translation performance. Through machine translation experiments, we show that the mined corpora enhance the quality of lecture transcript translation when used in conjunction with out-of-domain parallel corpora via multistage fine-tuning. Furthermore, this study also suggests guidelines for gathering and cleaning corpora, mining parallel sentences, cleaning noise in the mined data, and creating high-quality evaluation splits. For the sake of reproducibility, we have released the corpora as well as the code to create them. The dataset is available at https://github.com/shyyhs/CourseraParallelCorpusMining.
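The alignment idea described in the abstract (dynamic programming over cosine similarities of machine-translated sentences) can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the bag-of-words vectors, the `threshold` value, the function names, and the restriction to one-to-one matches are all assumptions made for brevity.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def align(src_mt, tgt, threshold=0.3):
    """Monotonic one-to-one alignment by dynamic programming.

    src_mt: source sentences already machine-translated into the target language.
    tgt:    target-language sentences.
    A match earns (similarity - threshold); skipping a sentence earns 0, so
    pairs below the (hypothetical) threshold are never worth aligning.
    """
    n, m = len(src_mt), len(tgt)
    sim = [[cosine(s, t) for t in tgt] for s in src_mt]
    # score[i][j]: best total for the first i source and first j target sentences.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                gain = sim[i - 1][j - 1] - threshold
                options.append((score[i - 1][j - 1] + gain, "match"))
            if i > 0:
                options.append((score[i - 1][j], "skip_src"))
            if j > 0:
                options.append((score[i][j - 1], "skip_tgt"))
            score[i][j], back[i][j] = max(options)
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Calling `align(src_mt, tgt)` returns index pairs `(i, j)` meaning that source sentence `i` is aligned with target sentence `j`; sentences with no sufficiently similar counterpart are left unaligned. A sentence-embedding similarity could replace the bag-of-words `cosine` without changing the dynamic program.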
Submitted 6 November, 2023;
originally announced November 2023.
-
Replication and study of anomalies in LK-99--the alleged ambient-pressure, room-temperature superconductor
Authors:
T. Habamahoro,
T. Bontke,
M. Chirom,
Z. Wu,
J. M. Bao,
L. Z. Deng,
C. W. Chu
Abstract:
We have studied LK-99 [Pb$_{10-x}$Cu$_x$(PO$_4$)$_6$O], alleged by Lee et al. to exhibit superconductivity above room temperature and at ambient pressure, and have reproduced all anomalies in electric and magnetic measurements that they reported as evidence for the claim of LK-99 being an ambient-pressure, room-temperature superconductor. We found that these anomalies are associated with the structural transition of the Cu$_2$S impurity in their sample and not with superconductivity.
Submitted 6 November, 2023;
originally announced November 2023.
-
Video-Helpful Multimodal Machine Translation
Authors:
Yihang Li,
Shuichiro Shimizu,
Chenhui Chu,
Sadao Kurohashi,
Wei Li
Abstract:
Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem, but it is still limited in that videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English (Ja-En) parallel subtitle pairs, 520k Chinese-English (Zh-En) parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous and videos are guaranteed to be helpful for disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods, frame attention loss and ambiguity augmentation, aiming to make full use of the videos in EVA for disambiguation. Experiments on EVA show that visual information and the proposed methods can boost translation performance, and our model performs significantly better than existing MMT models. The EVA dataset and the SAFA model are available at: https://github.com/ku-nlp/video-helpful-MMT.git.
Submitted 31 October, 2023;
originally announced October 2023.
-
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
Authors:
Hao Wang,
Qingxuan Wang,
Yue Li,
Changqing Wang,
Chenhui Chu,
Rui Wang
Abstract:
The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires overcoming technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce DocTrack, a VRD dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of Document AI models. The data is available at https://github.com/hint-lab/doctrack.
Submitted 23 October, 2023;
originally announced October 2023.
-
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Authors:
Hao Wang,
Xiahua Chen,
Rui Wang,
Chenhui Chu
Abstract:
Extracting meaningful entities belonging to predefined categories from Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and layout features such as font, background, color, and bounding box location and size provide important cues for identifying entities of the same type. However, existing models commonly train a visual encoder with weak cross-modal supervision signals, resulting in a limited capacity to capture these non-textual features and suboptimal performance. In this paper, we propose a novel Visually-Asymmetric coNsistenCy Learning (Vancl) approach that addresses the above limitation by enhancing the model's ability to capture fine-grained visual and layout features through the incorporation of color priors. Experimental results on benchmark datasets show that our approach substantially outperforms the strong LayoutLM series baseline, demonstrating the effectiveness of our approach. Additionally, we investigate the effects of different color schemes on our approach, providing insights for optimizing model performance. We believe our work will inspire future research on multimodal information extraction.
Submitted 23 October, 2023;
originally announced October 2023.