-
Optimal Dispatch Strategy for a Multi-microgrid Cooperative Alliance Using a Two-Stage Pricing Mechanism
Authors:
Yonghui Nie,
Zhi Li,
Jie Zhang,
Lei Gao,
Yang Li,
Hengyu Zhou
Abstract:
To coordinate resources among multi-level stakeholders and enhance the integration of electric vehicles (EVs) into multi-microgrids, this study proposes an optimal dispatch strategy within a multi-microgrid cooperative alliance using a nuanced two-stage pricing mechanism. Initially, the strategy assesses electric energy interactions between microgrids and distribution networks to establish a foundation for collaborative scheduling. The two-stage pricing mechanism initiates with a leader-follower game, wherein the microgrid operator acts as the leader and users as followers. Subsequently, it adjusts EV tariffs based on the game's equilibrium, taking into account factors such as battery degradation and travel needs to optimize EVs' electricity consumption. Furthermore, a bi-level optimization model refines power interactions and pricing strategies across the network, significantly enhancing demand response capabilities and economic outcomes. Simulation results demonstrate that this strategy not only increases renewable energy consumption but also reduces energy costs, thereby improving the overall efficiency and sustainability of the system.
Submitted 23 August, 2024;
originally announced August 2024.
-
All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents
Authors:
Zhiqiang Wang,
Hao Zheng,
Yunshuang Nie,
Wenjun Xu,
Qingwei Wang,
Hua Ye,
Zhe Li,
Kaidong Zhang,
Xuewen Cheng,
Wanxi Dong,
Chang Cai,
Liang Lin,
Feng Zheng,
Xiaodan Liang
Abstract:
Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by offering a unified data format, comprehensive sensory modalities, and a combination of real-world and simulated data. ARIO aims to improve the training of embodied AI agents, increasing their robustness and adaptability across various tasks and environments. Building upon the proposed new standard, we present a large-scale unified ARIO dataset, comprising approximately 3 million episodes collected from 258 series and 321,064 tasks. The ARIO standard and dataset represent a significant step towards bridging the gaps of existing data resources. By providing a cohesive framework for data collection and representation, ARIO paves the way for the development of more powerful and versatile embodied AI agents, capable of navigating and interacting with the physical world in increasingly complex and diverse ways. The project is available at https://imaei.github.io/project_pages/ario/
Submitted 20 August, 2024;
originally announced August 2024.
-
Unlocking the Power of LSTM for Long Term Time Series Forecasting
Authors:
Yaxuan Kong,
Zepu Wang,
Yuqi Nie,
Tian Zhou,
Stefan Zohren,
Yuxuan Liang,
Peng Sun,
Qingsong Wen
Abstract:
Traditional recurrent neural network architectures, such as long short-term memory neural networks (LSTM), have historically held a prominent role in time series forecasting (TSF) tasks. While the recently introduced sLSTM for Natural Language Processing (NLP) introduces exponential gating and memory mixing that are beneficial for long term sequential learning, its potential short memory issue is a barrier to applying sLSTM directly in TSF. To address this, we propose a simple yet efficient algorithm named P-sLSTM, which is built upon sLSTM by incorporating patching and channel independence. These modifications substantially enhance sLSTM's performance in TSF, achieving state-of-the-art results. Furthermore, we provide theoretical justifications for our design, and conduct extensive comparative and analytical experiments to fully validate the efficiency and superior performance of our model.
Submitted 19 August, 2024;
originally announced August 2024.
-
Moonshine: Distilling Game Content Generators into Steerable Generative Models
Authors:
Yuhe Nie,
Michael Middleton,
Tim Merino,
Nidhushan Kanagaraja,
Ashutosh Kumar,
Zhan Zhuang,
Julian Togelius
Abstract:
Procedural Content Generation via Machine Learning (PCGML) has enhanced game content creation, yet challenges in controllability and limited training data persist. This study addresses these issues by distilling a constructive PCG algorithm into a controllable PCGML model. We first generate a large amount of content with a constructive algorithm and label it using a Large Language Model (LLM). We use these synthetic labels to condition two PCGML models for content-specific generation, a diffusion model and the five-dollar model. This neural network distillation process ensures that the generation aligns with the original algorithm while introducing controllability through plain text. We define this text-conditioned PCGML as a Text-to-game-Map (T2M) task, offering an alternative to prevalent text-to-image multi-modal tasks. We compare our distilled models with the baseline constructive algorithm. Our analysis of the variety, accuracy, and quality of our generation demonstrates the efficacy of distilling constructive methods into controllable text-conditioned PCGML models.
Submitted 18 August, 2024;
originally announced August 2024.
-
A Novel Generative Artificial Intelligence Method for Interference Study on Multiplex Brightfield Immunohistochemistry Images
Authors:
Satarupa Mukherjee,
Jim Martin,
Yao Nie
Abstract:
Multiplex brightfield imaging offers the advantage of simultaneously analyzing multiple biomarkers on a single slide, as opposed to single biomarker labeling on multiple consecutive slides. To accurately analyze multiple biomarkers localized at the same cellular compartment, two representative biomarker sets were selected as assay models - cMET-PDL1-EGFR and CD8-LAG3-PDL1, where all three biomarkers can co-localize on the cell membrane. One of the most crucial preliminary stages for analyzing such assays is identifying each unique chromogen on individual cells. This is a challenging problem due to the co-localization of membrane stains from all three biomarkers. It requires advanced color unmixing to create the equivalent singleplex images from each triplex image for each biomarker.
In this project, we developed a cycle-Generative Adversarial Network (cycle-GAN) method for unmixing the triplex images generated from the above-mentioned assays. Three different models were designed to generate the singleplex image for each of the three stains Tamra (purple), QM-Dabsyl (yellow) and Green. A notable novelty of our approach was that the inputs to the network were images in the optical density domain instead of the conventionally used RGB images. The use of the optical density domain helped in reducing the blurriness of the synthetic singleplex images, which was often observed when the network was trained on RGB images.
The cycle-GAN models were validated on 10,800 lung, gastric and colon images for the cMET-PDL1-EGFR assay and 3600 colon images for the CD8-LAG3-PDL1 assay. Visual as well as quantified assessments demonstrated that the proposed method is effective and efficient when compared with the manual reviewing results and is readily applicable to various multiplex assays.
Submitted 14 August, 2024;
originally announced August 2024.
-
A 103-TOPS/mm$^2$ Integrated Photonic Computing Engine Enabling Next-Generation Reservoir Computing
Authors:
Dongliang Wang,
Yikun Nie,
Gaolei Hu,
Hon Ki Tsang,
Chaoran Huang
Abstract:
Reservoir computing (RC) is a leading machine learning algorithm for information processing due to its rich expressiveness. A new RC paradigm has recently emerged, showcasing superior performance and delivering more interpretable results with shorter training data sets and training times, representing the next generation of RC. This work presents the first realization of a high-speed next-generation RC system on an integrated photonic chip. Our experimental results demonstrate state-of-the-art forecasting and classification performance under various machine learning tasks and achieve the fastest speeds of 60 Gbaud and a computing density of 103 tera operations/second/mm$^2$ (TOPS/mm$^2$). The passive system, composed of a simple star coupler with on-chip delay lines, offers several advantages over traditional RC systems, including no speed limitations, compact footprint, extremely high fabrication error tolerance, fewer metaparameters, and greater interpretability. This work lays the foundation for ultrafast on-chip photonic RC, representing significant progress toward developing next-generation high-speed photonic computing and signal processing.
Submitted 31 May, 2024;
originally announced July 2024.
-
Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey
Authors:
Han Wang,
Yuman Nie,
Yun Li,
Hongjie Liu,
Min Liu,
Wen Cheng,
Yaoxiong Wang
Abstract:
Event-based cameras, inspired by the biological retina, have evolved into cutting-edge sensors distinguished by their minimal power requirements, negligible latency, superior temporal resolution, and expansive dynamic range. At present, cameras used for pedestrian detection are mainly frame-based imaging sensors, which have suffered from lethargic response times and hefty data redundancy. In contrast, event-based cameras address these limitations by eschewing extraneous data transmissions and obviating motion blur in high-speed imaging scenarios. On pedestrian detection via event-based cameras, this paper offers an exhaustive review of research and applications particularly in the autonomous driving context. Through methodically scrutinizing relevant literature, the paper outlines the foundational principles, developmental trajectory, and the comparative merits and demerits of event-based detection relative to traditional frame-based methodologies. This review conducts thorough analyses of various event stream inputs and their corresponding network models to evaluate their applicability across diverse operational environments. It also delves into pivotal elements such as crucial datasets and data acquisition techniques essential for advancing this technology, as well as advanced algorithms for processing event stream data. Culminating with a synthesis of the extant landscape, the review accentuates the unique advantages and persistent challenges inherent in event-based pedestrian detection, offering a prognostic view on potential future developments in this fast-progressing field.
Submitted 5 July, 2024;
originally announced July 2024.
-
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models
Authors:
Ying Nie,
Binwei Yan,
Tianyu Guo,
Hao Liu,
Haoyu Wang,
Wei He,
Binfan Zheng,
Weihao Wang,
Qiang Li,
Weijian Sun,
Yunhe Wang,
Dacheng Tao
Abstract:
Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific tasks, such as finance, has not been fully explored. In this paper, we present CFinBench, a meticulously crafted evaluation benchmark, the most comprehensive to date, for assessing the financial knowledge of LLMs in a Chinese context. In practice, to better align with the career trajectory of Chinese financial practitioners, we build a systematic evaluation from 4 first-level categories: (1) Financial Subject: whether LLMs can memorize the necessary basic knowledge of financial subjects, such as economics, statistics and auditing. (2) Financial Qualification: whether LLMs can obtain the required financial certifications, such as certified public accountant, securities qualification and banking qualification. (3) Financial Practice: whether LLMs can fulfill practical financial jobs, such as tax consultant, junior accountant and securities analyst. (4) Financial Law: whether LLMs can meet the requirements of financial laws and regulations, such as tax law, insurance law and economic law. CFinBench comprises 99,100 questions spanning 43 second-level categories with 3 question types: single-choice, multiple-choice and judgment. We conduct extensive experiments with 50 representative LLMs of various model sizes on CFinBench. The results show that GPT4 and some Chinese-oriented models lead the benchmark, with the highest average accuracy being 60.16%, highlighting the challenge presented by CFinBench. The dataset and evaluation code are available at https://cfinbench.github.io/.
Submitted 2 July, 2024;
originally announced July 2024.
-
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges
Authors:
Yuqi Nie,
Yaxuan Kong,
Xiaowen Dong,
John M. Mulvey,
H. Vincent Poor,
Qingsong Wen,
Stefan Zohren
Abstract:
Recent advances in large language models (LLMs) have unlocked novel opportunities for machine learning applications in the financial domain. These models have demonstrated remarkable capabilities in understanding context, processing vast amounts of data, and generating human-preferred content. In this survey, we explore the application of LLMs to various financial tasks, focusing on their potential to transform traditional practices and drive innovation. We provide a discussion of the progress and advantages of LLMs in financial contexts, analyzing their advanced technologies as well as prospective capabilities in contextual understanding, transfer learning flexibility, complex emotion detection, etc. We then categorize the existing literature into key application areas, including linguistic tasks, sentiment analysis, financial time series, financial reasoning, agent-based modeling, and other applications. For each application area, we delve into specific methodologies, such as textual analysis, knowledge-based analysis, forecasting, data augmentation, planning, decision support, and simulations. Furthermore, a comprehensive collection of datasets, model assets, and useful code associated with mainstream applications is presented as a resource for researchers and practitioners. Finally, we outline the challenges and opportunities for future research, particularly emphasizing a number of distinctive aspects in this field. We hope our work can help facilitate the adoption and further development of LLMs in the financial sector.
Submitted 15 June, 2024;
originally announced June 2024.
-
Resilience patterns in higher-order meta-population networks
Authors:
Yanyi Nie,
Yanbing Liu,
Qixuan Cao,
Tao Lin,
Wei Wang
Abstract:
Meta-population networks are effective tools for capturing population movement across distinct regions, but the assumption of well-mixed regions fails to capture the reality of higher-order population interactions. As a multidimensional system capturing mobility characteristics, meta-population networks are inherently complex and difficult to interpret when subjected to resilience analysis based on N-dimensional equations. We propose a higher-order meta-population model that captures large-scale global cross-regional mobility and small-scale higher-order interactions within regions. Remarkably, we extend the dimension-reduction approach, simplifying the N-dimensional higher-order meta-population system into a one-dimensional equation by decomposing different network behaviours into a single universal resilience function, thereby allowing for convenient and accurate prediction of the system resilience. The epidemic threshold can be expressed clearly and simply in terms of the network structure and human mobility parameters. Numerical experimental results on both real networks and star networks confirm the accuracy of the proposed dimension-reduction framework in predicting the evolution of epidemic dynamics on higher-order meta-population networks. Additionally, higher-order interactions among populations are shown to potentially lead to explosive growth in the epidemic infection size. Population mobility causes changes in the spatial distribution of infectious diseases across different regions.
Submitted 15 June, 2024;
originally announced June 2024.
-
RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
Authors:
Xuan Chen,
Yuzhou Nie,
Lu Yan,
Yunshu Mao,
Wenbo Guo,
Xiangyu Zhang
Abstract:
Modern large language model (LLM) developers typically conduct a safety alignment to prevent an LLM from generating unethical or harmful content. Recent studies have discovered that the safety alignment of LLMs can be bypassed by jailbreaking prompts. These prompts are designed to create specific conversation scenarios with a harmful question embedded. Querying an LLM with such prompts can mislead the model into responding to the harmful question. The stochastic and random nature of existing genetic methods largely limits the effectiveness and efficiency of state-of-the-art (SOTA) jailbreaking attacks. In this paper, we propose RL-JACK, a novel black-box jailbreaking attack powered by deep reinforcement learning (DRL). We formulate the generation of jailbreaking prompts as a search problem and design a novel RL approach to solve it. Our method includes a series of customized designs to enhance the RL agent's learning efficiency in the jailbreaking context. Notably, we devise an LLM-facilitated action space that enables diverse action variations while constraining the overall search space. We propose a novel reward function that provides meaningful dense rewards for the agent toward achieving successful jailbreaking. Through extensive evaluations, we demonstrate that RL-JACK is overall much more effective than existing jailbreaking attacks against six SOTA LLMs, including large open-source models and commercial models. We also show RL-JACK's resilience against three SOTA defenses and its transferability across different models. Finally, we validate the insensitivity of RL-JACK to variations in key hyper-parameters.
Submitted 12 June, 2024;
originally announced June 2024.
-
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Authors:
Xuan Chen,
Yuzhou Nie,
Wenbo Guo,
Xiangyu Zhang
Abstract:
Recent studies developed jailbreaking attacks, which construct jailbreaking prompts to "fool" LLMs into responding to harmful questions. Early-stage jailbreaking attacks require access to model internals or significant human effort. More advanced attacks utilize genetic algorithms for automatic and black-box attacks. However, the random nature of genetic algorithms significantly limits the effectiveness of these attacks. In this paper, we propose RLbreaker, a black-box jailbreaking attack driven by deep reinforcement learning (DRL). We model jailbreaking as a search problem and design an RL agent to guide the search, which is more effective and has less randomness than stochastic search, such as genetic algorithms. Specifically, we design a customized DRL system for the jailbreaking problem, including a novel reward function and a customized proximal policy optimization (PPO) algorithm. Through extensive experiments, we demonstrate that RLbreaker is much more effective than existing jailbreaking attacks against six state-of-the-art (SOTA) LLMs. We also show that RLbreaker is robust against three SOTA defenses and that its trained agents can transfer across different LLMs. We further validate the key design choices of RLbreaker via a comprehensive ablation study.
Submitted 12 June, 2024;
originally announced June 2024.
-
On Onsager's type conjecture for the inviscid Boussinesq equations
Authors:
Changxing Miao,
Yao Nie,
Weikui Ye
Abstract:
In this paper, we investigate the Cauchy problem for the three-dimensional inviscid Boussinesq system in the periodic setting. For $1\le p\le \infty$, we show that the threshold regularity exponent for $L^p$-norm conservation of temperature of this system is $1/3$, consistent with the Onsager exponent. More precisely, for $1\le p\le\infty$, every weak solution $(v,\theta)\in C_tC^\beta_x$ to the inviscid Boussinesq equations satisfies $\|\theta(t)\|_{L^p(\mathbb{T}^3)}=\|\theta_0\|_{L^p(\mathbb{T}^3)}$ if $\beta>\frac{1}{3}$, while if $\beta<\frac{1}{3}$, there exist infinitely many weak solutions $(v,\theta)\in C_tC^\beta_x$ such that the $L^p$-norm of temperature is not conserved. As a byproduct, we are able to construct many weak solutions in $C_tC^\beta_x$ for $\beta<\frac{1}{3}$ displaying wild behavior, such as fast kinetic energy dissipation and high oscillation of velocity. Moreover, we also show that if a weak solution $(v,\theta)$ of this system has at least one interval of regularity, then this weak solution $(v,\theta)$ is not unique in $C_tC^\beta_x$ for $\beta<\frac{1}{3}$.
Submitted 7 June, 2024;
originally announced June 2024.
-
Over-the-Air Collaborative Inference with Feature Differential Privacy
Authors:
Mohamed Seif,
Yuqi Nie,
Andrea Goldsmith,
Vincent Poor
Abstract:
Collaborative inference in next-generation networks can enhance Artificial Intelligence (AI) applications, including autonomous driving, personal identification, and activity classification. This method involves a three-stage process: a) data acquisition through sensing, b) feature extraction, and c) feature encoding for transmission. Transmission of the extracted features entails the potential risk of exposing sensitive personal data. To address this issue, in this work a new privacy-protecting collaborative inference mechanism is developed. Under this mechanism, each edge device in the network protects the privacy of extracted features before transmitting them to a central server for inference. This mechanism aims to achieve two main objectives while ensuring effective inference performance: 1) reducing communication overhead, and 2) maintaining strict privacy guarantees during feature transmission.
Submitted 31 May, 2024;
originally announced June 2024.
-
Correctable Landmark Discovery via Large Models for Vision-Language Navigation
Authors:
Bingqian Lin,
Yunshuang Nie,
Ziming Wei,
Yi Zhu,
Hang Xu,
Shikui Ma,
Jianzhuang Liu,
Xiaodan Liang
Abstract:
Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment, especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models, ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for an elegant combination of our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decision. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. In particular, CONSOLE establishes new state-of-the-art results on R2R and R4R in unseen scenarios. Code is available at https://github.com/expectorlin/CONSOLE.
Submitted 5 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
Authors:
Yuzhou Nie,
Yanting Wang,
Jinyuan Jia,
Michael J. De Lucia,
Nathaniel D. Bastian,
Wenbo Guo,
Dawn Song
Abstract:
One key challenge in backdoor attacks against large foundation models is the resource limits. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation model, such as Llama-3-70B, especially with limited computational resources. In this paper, we propose TrojFM, a novel backdoor attack tailored for very large foundation models. Our primary technical contribution is the development of a novel backdoor injection method. This method forces a backdoored model to generate similar hidden representations for poisoned inputs regardless of their actual semantics. Our approach injects such backdoors by fine-tuning only a very small proportion of model parameters. This enables TrojFM to efficiently launch downstream task-agnostic backdoor attacks against very large foundation models under limited computational resources. Moreover, we optimize the fine-tuning process with our customized QLoRA technique, enabling us to launch our attack with only one A100 GPU. Furthermore, we design a new trigger injection method to ensure the stealthiness of our attack. Through extensive experiments, we first demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models without jeopardizing their normal functionalities (and outperforming existing attacks on BERT-style models). Furthermore, we show that TrojFM is resilient to SOTA defenses and is insensitive to changes in key hyper-parameters. Finally, we conduct a resource analysis to quantify that our method can significantly save computational and memory costs compared to existing backdoor attacks.
Submitted 26 May, 2024;
originally announced May 2024.
-
Safety Alignment for Vision Language Models
Authors:
Zhendong Liu,
Yuanbi Nie,
Yingshui Tan,
Xiangyu Yue,
Qiushi Cui,
Chongjun Wang,
Xiaoyong Zhu,
Bo Zheng
Abstract:
Benefiting from the powerful capabilities of Large Language Models (LLMs), pre-trained visual encoder models connected to an LLM can realize Vision Language Models (VLMs). However, existing research shows that the visual modality of VLMs is vulnerable, with attackers easily bypassing LLMs' safety alignment through visual modality features to launch attacks. To address this issue, we enhance the existing VLMs' visual modality safety alignment by adding safety modules, including a safety projector, safety tokens, and a safety head, through a two-stage training process, effectively improving the model's defense against risky images. For example, building upon the LLaVA-v1.5 model, we achieve a safety score of 8.26, surpassing GPT-4V on the Red Teaming Visual Language Models (RTVLM) benchmark. Our method offers ease of use, high flexibility, and strong controllability, and it enhances safety while having minimal impact on the model's general performance. Moreover, our alignment strategy also uncovers some possibly risky content within commonly used open-source multimodal datasets. Our code will be open sourced after the anonymous review.
Submitted 22 May, 2024;
originally announced May 2024.
-
On the superconducting gap structure of the miassite Rh17S15: Nodal or nodeless?
Authors:
J. Y. Nie,
C. C. Zhao,
C. Q. Xu,
B. Li,
C. P. Tu,
X. Zhang,
D. Z. Dai,
H. R. Wang,
S. Xu,
Wenhe Jiao,
B. M. Wang,
Zhu'an Xu,
Xiaofeng Xu,
S. Y. Li
Abstract:
A recent penetration depth measurement claimed the observation of unconventional superconductivity in the miassite Rh$_{17}$S$_{15}$ single crystals, evidenced by a linear-in-temperature penetration depth at low temperatures, thereby arguing for the presence of line nodes in its superconducting gap structure. Here we measure the thermal conductivity of Rh$_{17}$S$_{15}$ single crystals down to 110 mK and up to a field of 8 T ($\simeq 0.4H_{\rm c2}$). In marked contrast to the penetration depth measurement, we observe a negligible residual linear term $κ_0/T$ in zero field, in line with a nodeless gap structure. The field dependence of $κ_0(H)/T$ shows a profile that is more consistent with either a highly anisotropic gap structure or multiple nodeless gaps with significantly different magnitudes. Moreover, first-principles calculations give two electronic bands with complex-shaped Fermi surfaces. These results suggest multigap nodeless superconductivity in this multiband Rh$_{17}$S$_{15}$ superconductor.
Submitted 14 May, 2024;
originally announced May 2024.
-
Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment
Authors:
L. T. Yang,
S. K. Liu,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
J. R. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (61 additional authors not shown)
Abstract:
We present the first limit on the $g_{Aγ}$ coupling constant using Bragg-Primakoff conversion, based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{Aγ}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) are derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95\% C.L.
Submitted 12 May, 2024;
originally announced May 2024.
-
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Authors:
Chen Min,
Dawei Zhao,
Liang Xiao,
Jian Zhao,
Xinli Xu,
Zheng Zhu,
Lei Jin,
Jianshu Li,
Yulan Guo,
Junliang Xing,
Liping Jing,
Yiming Nie,
Bin Dai
Abstract:
Vision-centric autonomous driving has recently attracted wide attention due to its lower cost. Pre-training is essential for extracting a universal representation. However, current vision-centric pre-training typically relies on either 2D or 3D pretext tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task. In this paper, we address this challenge by introducing a world model-based autonomous driving 4D representation learning framework, dubbed DriveWorld, which is capable of pre-training from multi-camera driving videos in a spatio-temporal fashion. Specifically, we propose a Memory State-Space Model for spatio-temporal modelling, which consists of a Dynamic Memory Bank module for learning temporal-aware latent dynamics to predict future changes and a Static Scene Propagation module for learning spatial-aware latent statics to offer comprehensive scene contexts. We additionally introduce a Task Prompt to decouple task-aware features for various downstream tasks. The experiments demonstrate that DriveWorld delivers promising results on various autonomous driving tasks. When pre-trained with the OpenScene dataset, DriveWorld achieves a 7.5% increase in mAP for 3D object detection, a 3.0% increase in IoU for online mapping, a 5.0% increase in AMOTA for multi-object tracking, a 0.1m decrease in minADE for motion forecasting, a 3.0% increase in IoU for occupancy prediction, and a 0.34m reduction in average L2 error for planning.
Submitted 7 May, 2024;
originally announced May 2024.
-
Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks
Authors:
Zijian Zhang,
Yujie Sun,
Zepu Wang,
Yuqi Nie,
Xiaobo Ma,
Peng Sun,
Ruolin Li
Abstract:
Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban planning. Machine learning and deep learning methods are favored for their flexibility and accuracy. Nowadays, with the advent of large language models (LLMs), many researchers have combined these models with previous techniques or applied LLMs to directly predict future traffic information and human travel behaviors. However, there is a lack of comprehensive studies on how LLMs can contribute to this field. This survey explores existing approaches using LLMs for mobility forecasting problems. We provide a literature review concerning the forecasting applications within transportation systems, elucidating how researchers utilize LLMs, showcasing recent state-of-the-art advancements, and identifying the challenges that must be overcome to fully leverage LLMs in this domain.
Submitted 2 May, 2024;
originally announced May 2024.
-
DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft
Authors:
Sam Earle,
Filippos Kokkinos,
Yuhe Nie,
Julian Togelius,
Roberta Raileanu
Abstract:
Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high amount of flexibility and expressivity. But unlike PCG, such approaches cannot guarantee functionality, which is crucial for certain applications like game design. In this paper, we present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft. Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions. We find that DreamCraft produces more aligned in-game artifacts than a baseline that post-processes the output of an unconstrained NeRF. Thanks to the quantized representation of the environment, functional constraints can be integrated using specialized loss terms. We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types. DreamCraft inherits a high degree of expressivity and controllability from the NeRF, while still being able to incorporate functional constraints through domain-specific objectives.
Submitted 23 April, 2024;
originally announced April 2024.
-
MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning
Authors:
Sunan He,
Yuxiang Nie,
Zhixuan Chen,
Zhiyuan Cai,
Hongmei Wang,
Shu Yang,
Hao Chen
Abstract:
The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets. Based on the constructed dataset, we developed MedDr, a generalist foundation model for healthcare capable of handling diverse medical data modalities, including radiology, pathology, dermatology, retinography, and endoscopy. Moreover, during inference, we propose a simple but effective retrieval-augmented medical diagnosis strategy, which enhances the model's generalization ability. Extensive experiments on visual question answering, medical report generation, and medical image diagnosis demonstrate the superiority of our method.
Submitted 23 April, 2024;
originally announced April 2024.
-
Sharp ill-posedness for the non-resistive MHD equations in Sobolev spaces
Authors:
Qionglei Chen,
Yao Nie,
Weikui Ye
Abstract:
In this paper, we prove a sharp ill-posedness result for the incompressible non-resistive MHD equations. In any dimension $d\ge 2$, we show the ill-posedness of the non-resistive MHD equations in $H^{\frac{d}{2}-1}(\mathbb{R}^d)\times H^{\frac{d}{2}}(\mathbb{R}^d)$, which is sharp in view of the local well-posedness in $H^{s-1}(\mathbb{R}^d)\times H^{s}(\mathbb{R}^d)$ $(s>\frac{d}{2})$ established by Fefferman et al. (Arch. Ration. Mech. Anal., 223 (2), 677-691, 2017). Furthermore, we generalize the ill-posedness results from $H^{\frac{d}{2}-1}(\mathbb{R}^d)\times H^{\frac{d}{2}}(\mathbb{R}^d)$ to Besov spaces $B^{\frac{d}{p}-1}_{p, q}(\mathbb{R}^d)\times B^{\frac{d}{p}}_{p, q}(\mathbb{R}^d)$ and $\dot B^{\frac{d}{p}-1}_{p, q}(\mathbb{R}^d)\times \dot B^{\frac{d}{p}}_{p, q}(\mathbb{R}^d)$ for $1\le p\le\infty, q>1$. In contrast to the ill-posedness mechanism of the incompressible Navier-Stokes equations in $\dot B^{-1}_{\infty, q}$ \cite{B,W}, we construct initial data such that the paraproduct terms (low-high frequency interaction) of the nonlinear term make the main contribution to the norm inflation of the magnetic field.
Submitted 23 April, 2024;
originally announced April 2024.
-
First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment
Authors:
J. X. Liu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
J. R. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (61 additional authors not shown)
Abstract:
We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints of cross section in the DM range of 0.1--10 keV/$c^2$ for vector and axial-vector interaction. The upper limit on the cross section is set to be $\rm 5.5\times10^{-46}~cm^2$ for vector interaction, and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at DM mass of 5 keV/$c^2$.
Submitted 15 April, 2024;
originally announced April 2024.
-
Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions
Authors:
Yuting He,
Fuxiang Huang,
Xinrui Jiang,
Yuxiang Nie,
Minghao Wang,
Jiguang Wang,
Hao Chen
Abstract:
Foundation models, which are pre-trained on broad data and can adapt to a wide range of tasks, are advancing healthcare. They promote the development of healthcare artificial intelligence (AI) models, resolving the mismatch between limited AI models and diverse healthcare practices. Many more healthcare scenarios will benefit from the development of a healthcare foundation model (HFM), which will improve their advanced intelligent healthcare services. Despite the impending widespread deployment of HFMs, there is currently a lack of clear understanding about how they work in the healthcare field, their current challenges, and where they are headed in the future. To answer these questions, this survey presents a comprehensive and in-depth review of the challenges, opportunities, and future directions of HFMs. It first gives a comprehensive overview of HFMs, including their methods, data, and applications, for a quick grasp of the current progress. It then explores in depth the challenges in data, algorithms, and computing infrastructure that affect the construction and widespread application of foundation models in healthcare. This survey also identifies emerging and promising directions in this field for future development. We believe that this survey will enhance the community's comprehension of the current progress of HFMs and serve as a valuable source of guidance for future development in this field. The latest HFM papers and related resources are maintained on our website: https://github.com/YutingHe-list/Awesome-Foundation-Models-for-Advancing-Healthcare.
Submitted 4 April, 2024;
originally announced April 2024.
-
Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment
Authors:
R. Xu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range.
Submitted 29 March, 2024;
originally announced March 2024.
-
Probing Dark Matter Particles from Evaporating Primordial Black Holes via Electron Scattering in the CDEX-10 Experiment
Authors:
Z. H. Zhang,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as "$χ$") has been found in DM direct detection (DD) experiments to date. A novel concept is to detect $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses range from 1$\times$10$^{15}$ to 7$\times$10$^{16}$ g under the current limits of PBH abundance $f_{PBH}$. Using 205.4 kg$\cdot$day data obtained from the CDEX-10 experiment conducted in the China Jinping Underground Laboratory, we exclude the $χ$--electron ($χ$--$e$) elastic-scattering cross section $σ_{χe} \sim 5\times10^{-29}$ cm$^2$ for $χ$ with a mass $m_χ\lesssim$ 0.1 keV. If ($m_χ$, $σ_{χe}$) can be determined in the future, DD experiments are expected to impose strong constraints on $f_{PBH}$ for large $M_{PBH}$s.
Submitted 29 March, 2024;
originally announced March 2024.
-
GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond
Authors:
Chongjie Ye,
Yinyu Nie,
Jiahao Chang,
Yuantao Chen,
Yihao Zhi,
Xiaoguang Han
Abstract:
We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdoor scenes and improves novel view synthesis. Finally, we propose Gaussian Splatting Surface Reconstruction (GauS), a novel render-then-fuse approach for high-fidelity mesh reconstruction from 3DGS inputs without fine-tuning. Overall, our GauStudio framework, hybrid representation, and GauS approach enhance 3DGS modeling and rendering capabilities, enabling higher-quality novel view synthesis and surface reconstruction.
Submitted 28 March, 2024;
originally announced March 2024.
-
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Authors:
Yujin Chen,
Yinyu Nie,
Benjamin Ummenhofer,
Reiner Birkl,
Michael Paulitsch,
Matthias Müller,
Matthias Nießner
Abstract:
We present Mesh2NeRF, an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks. Many 3D generative approaches represent 3D scenes as radiance fields for training. Their ground-truth radiance fields are usually fitted from multi-view renderings from a large-scale synthetic 3D dataset, which often results in artifacts due to occlusions or under-fitting issues. In Mesh2NeRF, we propose an analytic solution to directly obtain ground-truth radiance fields from 3D meshes, characterizing the density field with an occupancy function featuring a defined surface thickness, and determining view-dependent color through a reflection function considering both the mesh and environment lighting. Mesh2NeRF extracts accurate radiance fields, which provide direct supervision for training generative NeRFs and single scene representation. We validate the effectiveness of Mesh2NeRF across various tasks, achieving a noteworthy 3.12dB improvement in PSNR for view synthesis in single scene representation on the ABO dataset, a 0.69 PSNR enhancement in the single-view conditional generation of ShapeNet Cars, and notably improved mesh extraction from NeRF in the unconditional generation of Objaverse Mugs.
Submitted 5 September, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Mix-Initiative Response Generation with Dynamic Prefix Tuning
Authors:
Yuxiang Nie,
Heyan Huang,
Xian-Ling Mao,
Lizi Liao
Abstract:
Mixed initiative serves as one of the key factors in controlling conversation directions. For a speaker, responding passively or leading proactively would result in rather different responses. However, most dialogue systems focus on training a holistic response generation model without any distinction among different initiatives. This leads to the cross-contamination problem, where the model confuses different initiatives and generates inappropriate responses. Moreover, obtaining plenty of human annotations for initiative labels can be expensive. To address these issues, we propose a general mix-Initiative Dynamic Prefix Tuning framework (IDPT) to decouple different initiatives from the generation model, which learns initiative-aware prefixes in both supervised and unsupervised settings. Specifically, IDPT decouples initiative factors into different prefix parameters and uses the attention mechanism to dynamically adjust the selection of initiatives in guiding generation. The prefix parameters can be tuned towards accurate initiative prediction as well as mix-initiative response generation. Extensive experiments on two public dialogue datasets show that the proposed IDPT outperforms previous baselines on both automatic metrics and human evaluations. It also manages to generate appropriate responses with manipulated initiatives.
Submitted 27 March, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Multi-Convergence-Angle Ptychography with Simultaneous Strong Contrast and High Resolution
Authors:
Wei Mao,
Weiyang Zhang,
Chen Huang,
Liqi Zhou,
Judy. S. Kim,
Si Gao,
Yu Lei,
Xiaopeng Wu,
Yiming Hu,
Xudong Pei,
Weina Fang,
Xiaoguo Liu,
Jingdong Song,
Chunhai Fan,
Yuefeng Nie,
Angus. I. Kirkland,
Peng Wang
Abstract:
Advances in bioimaging methods and hardware facilities have revolutionised the determination of numerous biological structures at atomic or near-atomic resolution. Among these developments, electron ptychography has recently attracted considerable attention because of its superior resolution, remarkable sensitivity to light elements, and high electron dose efficiency. Here, we introduce an innovative approach called multi-convergence-angle (MCA) ptychography, which can simultaneously enhance both contrast and resolution with continuous information transfer across a wide spectrum of spatial frequencies. Our work demonstrates the feasibility of future applications of MCA-ptychography in providing high-quality two-dimensional images as input to three-dimensional reconstruction methods, thereby facilitating more accurate determination of biological structures.
Submitted 25 March, 2024;
originally announced March 2024.
-
Elysium: Exploring Object-level Perception in Videos via MLLM
Authors:
Han Wang,
Yanjie Wang,
Yongjie Ye,
Yuxiang Nie,
Can Huang
Abstract:
Multi-modal Large Language Models (MLLMs) have demonstrated their ability to perceive objects in still images, but their application in video-related tasks, such as object tracking, remains understudied. This lack of exploration is primarily due to two key challenges. Firstly, extensive pretraining on large-scale video datasets is required to equip MLLMs with the capability to perceive objects across multiple frames and understand inter-frame relationships. Secondly, processing a large number of frames within the context window of Large Language Models (LLMs) can impose a significant computational burden. To address the first challenge, we introduce ElysiumTrack-1M, a large-scale video dataset supporting three tasks: Single Object Tracking (SOT), Referring Single Object Tracking (RSOT), and Video Referring Expression Generation (Video-REG). ElysiumTrack-1M contains 1.27 million annotated video frames with corresponding object boxes and descriptions. Leveraging this dataset, we train MLLMs and propose a token-compression model, T-Selector, to tackle the second challenge. Our proposed approach, Elysium: Exploring Object-level Perception in Videos via MLLM, is an end-to-end trainable MLLM that attempts to conduct object-level tasks in videos without requiring any additional plug-in or expert models. All code and datasets are available at https://github.com/Hon-Wong/Elysium.
Submitted 29 March, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Foundation Models for Time Series Analysis: A Tutorial and Survey
Authors:
Yuxuan Liang,
Haomin Wen,
Yuqi Nie,
Yushan Jiang,
Ming Jin,
Dongjin Song,
Shirui Pan,
Qingsong Wen
Abstract:
Time series analysis stands as a focal point within the data mining community, serving as a cornerstone for extracting valuable insights crucial to a myriad of real-world applications. Recent advances in Foundation Models (FMs) have fundamentally reshaped the paradigm of model design for time series analysis, boosting various downstream tasks in practice. These innovative approaches often leverage pre-trained or fine-tuned FMs to harness generalized knowledge tailored for time series analysis. This survey aims to furnish a comprehensive and up-to-date overview of FMs for time series analysis. While prior surveys have predominantly focused on either application or pipeline aspects of FMs in time series analysis, they have often lacked an in-depth understanding of the underlying mechanisms that elucidate why and how FMs benefit time series analysis. To address this gap, our survey adopts a methodology-centric classification, delineating various pivotal elements of time-series FMs, including model architectures, pre-training techniques, adaptation methods, and data modalities. Overall, this survey serves to consolidate the latest advancements in FMs pertinent to time series analysis, accentuating their theoretical underpinnings, recent strides in development, and avenues for future exploration.
Submitted 18 June, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
Authors:
Rao Fu,
Jingyu Liu,
Xilun Chen,
Yixin Nie,
Wenhan Xiong
Abstract:
This paper introduces Scene-LLM, a 3D-visual-language model that enhances embodied agents' abilities in interactive 3D indoor environments by integrating the reasoning strengths of Large Language Models (LLMs). Scene-LLM adopts a hybrid 3D visual feature representation that incorporates dense spatial information and supports scene state updates. The model employs a projection layer to efficiently project these features into the pre-trained textual embedding space, enabling effective interpretation of 3D visual information. Unique to our approach is the integration of both scene-level and ego-centric 3D information. This combination is pivotal for interactive planning, where scene-level data supports global planning and ego-centric data is important for localization. Notably, we use ego-centric 3D frame features for feature alignment, an efficient technique that enhances the model's ability to align features of small objects within the scene. Our experiments with Scene-LLM demonstrate its strong capabilities in dense captioning, question answering, and interactive planning. We believe Scene-LLM advances the field of 3D visual understanding and reasoning, offering new possibilities for sophisticated agent interactions in indoor settings.
Submitted 22 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Measurement-device-independent quantum random number generation over 23 Mbps with imperfect single-photon sources
Authors:
You-Qi Nie,
Hongyi Zhou,
Bing Bai,
Qi Xu,
Xiongfeng Ma,
Jun Zhang,
Jian-Wei Pan
Abstract:
Quantum randomness relies heavily on the accurate characterization of the generator implementation, where the device imperfection or inaccurate characterization can lead to incorrect entropy estimation and practical bias, significantly affecting the reliability of the generated randomness. Measurement-device-independent (MDI) quantum random number generation (QRNG) endeavors to produce certified randomness, utilizing uncharacterized and untrusted measurement devices that are vulnerable to numerous attack schemes targeting measurement loopholes. However, existing implementations have shown insufficient performance thus far. Here, we propose a high-speed MDI-QRNG scheme based on a robust measurement tomography approach against the imperfection of single-photon sources. Compared with the conventional approach, the decoy-state method is introduced to obtain more accurate tomography results and a tighter lower bound of randomness. Finally, by using a high-speed time-bin encoding system, we experimentally demonstrated the scheme and obtained a reliable min-entropy lower bound of $7.37 \times 10^{-2}$ bits per pulse, corresponding to a generation rate over 23 Mbps, which substantially outperforms the existing realizations and makes a record in discrete-variable semi-device-independent QRNGs.
Submitted 15 March, 2024;
originally announced March 2024.
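As a rough consistency check on the two figures quoted in the abstract above (a min-entropy lower bound of $7.37 \times 10^{-2}$ bits per pulse and a generation rate over 23 Mbps), one can back-solve the pulse repetition rate they jointly imply. This is only a sketch: it assumes the idealised relation rate = (min-entropy per pulse) × (pulses per second), ignores any post-processing overhead, and the function name is illustrative. The abstract itself does not state the repetition rate.

```python
def implied_pulse_rate(min_entropy_per_pulse: float, generation_rate_bps: float) -> float:
    """Return the pulse repetition rate (pulses per second) implied by a target
    generation rate, assuming rate = min-entropy per pulse * pulses per second.
    This relation is an idealisation, not a figure taken from the paper."""
    if min_entropy_per_pulse <= 0:
        raise ValueError("min-entropy per pulse must be positive")
    return generation_rate_bps / min_entropy_per_pulse


# Figures quoted in the abstract: 7.37e-2 bits per pulse, 23 Mbps.
rate = implied_pulse_rate(7.37e-2, 23e6)
print(f"implied repetition rate: {rate / 1e6:.0f} MHz")
```

Under these assumptions the quoted numbers are consistent with a repetition rate on the order of a few hundred MHz.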
-
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Authors:
Bingqian Lin,
Yunshuang Nie,
Ziming Wei,
Jiaqi Chen,
Shikui Ma,
Jianhua Han,
Hang Xu,
Xiaojun Chang,
Xiaodan Liang
Abstract:
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. However, their predominant use in an offline manner usually suffers from a substantial domain gap between the VLN task and the LLM training corpus. This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we perform parameter-efficient in-domain training to enable self-guided navigational decisions, leading to a significant mitigation of the domain gap in a cost-effective manner. Specifically, at each timestep, the LLM is prompted to forecast the navigational chain-of-thought by: 1) acting as a world model to imagine the next observation according to the instruction, 2) selecting the candidate observation that best aligns with the imagination, and 3) determining the action based on the reasoning from the prior steps. Through constructing formalized labels for training, the LLM can learn to generate desired and reasonable chain-of-thought outputs for improving the action decision. Experimental results across various training settings and popular VLN benchmarks (e.g., Room-to-Room (R2R), Room-across-Room (RxR), Room-for-Room (R4R)) show the significant superiority of NavCoT over the direct action prediction variants. Through simple parameter-efficient finetuning, our NavCoT outperforms a recent GPT4-based approach with ~7% relative improvement on the R2R dataset. We believe that NavCoT will help unlock more task-adaptive and scalable LLM-based embodied agents, which are helpful for developing real-world robotics applications. Code is available at https://github.com/expectorlin/NavCoT.
Submitted 12 March, 2024;
originally announced March 2024.
-
Electronic Structure of Superconducting Infinite-Layer Lanthanum Nickelates
Authors:
Wenjie Sun,
Zhicheng Jiang,
Chengliang Xia,
Bo Hao,
Yueying Li,
Shengjun Yan,
Maosen Wang,
Hongquan Liu,
Jianyang Ding,
Jiayu Liu,
Zhengtai Liu,
Jishan Liu,
Hanghui Chen,
Dawei Shen,
Yuefeng Nie
Abstract:
Revealing the momentum-resolved electronic structure of infinite-layer nickelates is essential for understanding this new class of unconventional superconductors, but has been hindered by the formidable challenges in improving the sample quality. In this work, we report for the first time the angle-resolved photoemission spectroscopy of superconducting La$_{0.8}$Sr$_{0.2}$NiO$_{2}$ films prepared by molecular beam epitaxy and ${\mathrm{\textit{in situ}}}$ atomic-hydrogen reduction. The measured Fermi topology closely matches theoretical calculations, showing a large Ni-$d_{x^2-y^2}$ derived Fermi sheet that evolves from hole-like to electron-like along $k_{z}$, and a three-dimensional (3D) electron pocket centered at the Brillouin zone corner. The Ni-$d_{x^2-y^2}$ derived bands show a mass enhancement ($m^*/m_{\rm{DFT}}$) of 2-3, while the 3D electron band shows negligible band renormalization. Moreover, the Ni-$d_{x^2-y^2}$ derived states also display a band dispersion anomaly at higher binding energy, reminiscent of the waterfall feature and kinks observed in cuprates.
Submitted 12 March, 2024;
originally announced March 2024.
-
Tracking-in-range Formulations for Numerical Optimal Control
Authors:
Nikilesh Ramesh,
Eric C. Kerrigan,
Yuanbo Nie
Abstract:
In contrast to set-point tracking which aims to reduce the tracking error between the tracker and the reference, tracking-in-range problems only focus on whether the tracker is within a given range around the reference, making it more suitable for the mission specifications of many practical applications. In this work, we present novel optimal control formulations to solve tracking-in-range problems, for both problems requiring the tracker to be always in range, and problems allowing the tracker to go out of range to yield overall better outcomes. As the problem naturally involves discontinuous functions, we present alternative formulations and regularisation strategies to improve the performance of numerical solvers. The extension to in-range tracking with multiple trackers and in-range tracking in high dimensional space are also discussed and illustrated with numerical examples, demonstrating substantial increases in mission duration in comparison to traditional set-point tracking.
Submitted 5 March, 2024;
originally announced March 2024.
-
Model Predictive Bang-Bang Controller Synthesis via Approximate Value Functions
Authors:
Morgan Jones,
Yuanbo Nie,
Matthew M. Peet
Abstract:
In this paper, we propose a novel method for addressing Optimal Control Problems (OCPs) with input-affine dynamics and cost functions. This approach adopts a Model Predictive Control (MPC) strategy, wherein a controller is synthesized to handle an approximated OCP within a finite time horizon. Upon reaching this horizon, the controller is re-calibrated to tackle another approximation of the OCP, with the approximation updated based on the final state and time information. To tackle each OCP instance, all non-polynomial terms are Taylor-expanded about the current time and state and the resulting Hamilton-Jacobi-Bellman (HJB) PDE is solved via Sum-of-Squares (SOS) programming, providing us with an approximate polynomial value function that can be used to synthesize a bang-bang controller.
Submitted 16 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
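To illustrate the last step of the abstract above, here is a minimal sketch of how a bang-bang input can be read off from an approximate value function. It assumes dynamics of the form x' = f(x) + g(x)u with box-constrained inputs and a cost that does not depend on u, so that minimising the Hamiltonian reduces to picking the sign of the switching function grad V(x)^T g(x). The function name, the quadratic V, and the double-integrator example are all illustrative choices and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def bang_bang_control(
    grad_v: Callable[[Sequence[float]], Sequence[float]],
    g: Callable[[Sequence[float]], Sequence[Sequence[float]]],
    x: Sequence[float],
    u_min: Sequence[float],
    u_max: Sequence[float],
) -> List[float]:
    """Choose each input at one of its bounds so that grad_V(x)^T g(x) u is minimised.

    grad_v(x) returns a length-n gradient; g(x) returns an n-by-m matrix as a list of rows.
    """
    dv = grad_v(x)
    gx = g(x)
    u = []
    for j in range(len(u_min)):
        # Switching function for input j: j-th entry of grad_V(x)^T g(x).
        s = sum(dv[i] * gx[i][j] for i in range(len(dv)))
        # A positive coefficient is minimised at the lower bound, otherwise the upper bound.
        # (The tie s == 0 is resolved arbitrarily to the upper bound.)
        u.append(u_min[j] if s > 0 else u_max[j])
    return u


# Hypothetical example: double integrator with V(x) = x1^2 + x2^2.
grad_v = lambda x: [2.0 * x[0], 2.0 * x[1]]
g = lambda x: [[0.0], [1.0]]
u_pos = bang_bang_control(grad_v, g, [1.0, 0.5], [-1.0], [1.0])
u_neg = bang_bang_control(grad_v, g, [1.0, -0.5], [-1.0], [1.0])
```

In this toy case the input switches between its bounds according to the sign of the velocity state, which is the characteristic bang-bang pattern.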
-
Multiple-Crop Human Mesh Recovery with Contrastive Learning and Camera Consistency in A Single Image
Authors:
Yongwei Nie,
Changzhen Liu,
Chengjiang Long,
Qing Zhang,
Guiqing Li,
Hongmin Cai
Abstract:
We tackle the problem of single-image Human Mesh Recovery (HMR). Previous approaches are mostly based on a single crop. In this paper, we shift the single-crop HMR to a novel multiple-crop HMR paradigm. Cropping a human from an image multiple times by shifting and scaling the original bounding box is feasible in practice, easy to implement, and incurs negligible cost, but immediately enriches available visual details. With multiple crops as input, we manage to leverage the relation among these crops to extract discriminative features and reduce camera ambiguity. Specifically, (1) we incorporate a contrastive learning scheme to enhance the similarity between features extracted from crops of the same human. (2) We also propose a crop-aware fusion scheme to fuse the features of multiple crops for regressing the target mesh. (3) We compute local cameras for all the input crops and build a camera-consistency loss between the local cameras, which rewards us with less ambiguous cameras. Based on the above innovations, our proposed method outperforms previous approaches, as demonstrated by extensive experiments.
Submitted 3 February, 2024;
originally announced February 2024.
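The multiple-crop idea in the abstract above (shifting and scaling the original bounding box) can be sketched in a few lines. This is an illustration only: the box layout (centre x, centre y, width, height), the expression of shifts as fractions of the box size, and the function name are assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (centre_x, centre_y, width, height)


def make_crops(
    box: Box,
    shifts: Sequence[Tuple[float, float]],
    scales: Sequence[float],
) -> List[Box]:
    """Generate crop boxes by shifting the centre and scaling the size of `box`.

    Each shift (dx, dy) is a fraction of the original width and height;
    each scale multiplies both width and height.
    """
    cx, cy, w, h = box
    crops = []
    for s in scales:
        for dx, dy in shifts:
            crops.append((cx + dx * w, cy + dy * h, w * s, h * s))
    return crops


# Hypothetical example: two shifts and two scales give four crops of one person.
crops = make_crops((100.0, 100.0, 50.0, 80.0), [(0.0, 0.0), (0.1, 0.0)], [1.0, 1.2])
```

Each resulting box would then be cropped from the image and fed to the network as one of the multiple inputs.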
-
RimiRec: Modeling Refined Multi-interest in Hierarchical Structure for Recommendation
Authors:
Haolei Pei,
Yuanyuan Xu,
Yangping Zhu,
Yuan Nie
Abstract:
Industrial recommender systems usually consist of a retrieval stage and a ranking stage to handle billion-scale users and items. The retrieval stage retrieves candidate items relevant to user interests for recommendations and has attracted much attention. Frequently, a user shows refined multi-interests in a hierarchical structure. For example, a user likes Conan and Kuroba Kaito, which are roles in the hierarchical structure "Animation, Japanese Animation, Detective Conan". However, most existing methods ignore this hierarchical nature and simply average the fine-grained interest information. Therefore, we propose a novel two-stage approach to explicitly model refined multi-interest in a hierarchical structure for recommendation. In the first stage, hierarchical multi-interest mining, the hierarchical clustering and transformer-based model adaptively generate circles or sub-circles that users are interested in. In the second stage, the partition of the retrieval space allows the EBR models to deal only with items within each circle and accurately capture users' refined interests. Experimental results show that the proposed approach achieves state-of-the-art performance. Our framework has also been deployed at Lofter.
Submitted 5 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Superconductivity in freestanding infinite-layer nickelate membranes
Authors:
Shengjun Yan,
Wei Mao,
Wenjie Sun,
Yueying Li,
Haoying Sun,
Jiangfeng Yang,
Bo Hao,
Wei Guo,
Leyan Nian,
Zhengbin Gu,
Peng Wang,
Yuefeng Nie
Abstract:
The observation of superconductivity in infinite-layer nickelates has attracted significant attention due to its potential as a new platform for exploring high $ \mathrm{\textit{T}}_{c} $ superconductivity. However, thus far, superconductivity has only been observed in epitaxial thin films, which limits the manipulation capabilities and modulation methods compared to two-dimensional exfoliated materials. Given the exceptionally giant strain tunability and stacking capability of freestanding membranes, separating superconducting nickelates from the as-grown substrate is a novel way to engineer the superconductivity and uncover the underlying physics. Herein, we report the synthesis of the superconducting freestanding $ \mathrm{La}_{0.8}\mathrm{Sr}_{0.2}\mathrm{Ni}\mathrm{O}_{2} $ membranes ($ \mathrm{\textit{T}}_{c}\mathrm{=}\mathrm{10.9}\;\mathrm{K} $), emphasizing the crucial roles of the interface engineering in the precursor phase film growth and the quick transfer process in achieving superconductivity. Our work offers a new versatile platform for investigating the superconductivity in nickelates, such as the pairing symmetry via constructing Josephson tunneling junctions and higher $ \mathrm{\textit{T}}_{c} $ values via high-pressure experiments.
Submitted 29 January, 2024;
originally announced January 2024.
-
${\mathrm{\textit{In situ}}}$ preparation of superconducting infinite-layer nickelate thin films with atomically flat surface
Authors:
Wenjie Sun,
Zhichao Wang,
Bo Hao,
Shengjun Yan,
Haoying Sun,
Zhengbin Gu,
Yu Deng,
Yuefeng Nie
Abstract:
Since their discovery, the infinite-layer nickelates have been regarded as an appealing system for gaining deeper insights into high temperature superconductivity (HTSC). However, the synthesis of superconducting samples has been proved to be challenging. Here, we develop an ultrahigh vacuum (UHV) ${\mathrm{\textit{in situ}}}$ reduction method using atomic hydrogen as reducing agent and apply it in lanthanum nickelate system. The reduction parameters, including the reduction temperature (${\mathrm{\textit{T}_{R}}}$) and hydrogen pressure (${\mathrm{\textit{P}_{H}}}$), are systematically explored. We found that the reduction window for achieving superconducting transition is quite wide, reaching nearly 80$^\circ$C in ${\mathrm{\textit{T}_{R}}}$ and 3 orders of magnitude in ${\mathrm{\textit{P}_{H}}}$ when the reduction time is set to 30 mins. And there exists an optimal ${\mathrm{\textit{P}_{H}}}$ for achieving the highest ${\mathrm{\textit{T}_{c}}}$ if both ${\mathrm{\textit{T}_{R}}}$ and reduction time are fixed. More prominently, as confirmed by atomic force microscopy and scanning transmission electron microscopy, the atomically flat surface can be preserved during the ${\mathrm{\textit{in situ}}}$ reduction process, providing advantages over the ${\mathrm{\textit{ex situ}}}$ CaH$_2$ method for surface-sensitive experiments.
Submitted 29 January, 2024;
originally announced January 2024.
-
Incorporating Exemplar Optimization into Training with Dual Networks for Human Mesh Recovery
Authors:
Yongwei Nie,
Mingxian Fan,
Chengjiang Long,
Qing Zhang,
Jian Zhu,
Xuemiao Xu
Abstract:
We propose a novel optimization-based human mesh recovery method from a single image. Given a test exemplar, previous approaches optimize the pre-trained regression network to minimize the 2D re-projection loss, which however suffers from over-/under-fitting problems. This is because the ``exemplar optimization'' at testing time is only weakly related to the pre-training process, and the exemplar optimization loss function is different from the training loss function. (1) We incorporate exemplar optimization into the training stage. During training, our method first executes exemplar optimization and subsequently proceeds with training-time optimization. The exemplar optimization may run in a wrong direction, while the subsequent training optimization serves to correct the deviation. Involved in training, the exemplar optimization learns to adapt its behavior to training data, thereby acquiring generalizability to test exemplars. (2) We devise a dual-network architecture to convey the novel training paradigm, which is composed of a main regression network and an auxiliary network, in which we can formulate the exemplar optimization loss function in the same form as the training loss function. This further enhances the compatibility between the exemplar and training optimizations. Experiments demonstrate that our exemplar optimization after the novel training scheme significantly outperforms state-of-the-art approaches.
Submitted 25 January, 2024;
originally announced January 2024.
-
Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
Authors:
Yongwei Nie,
Hao Huang,
Chengjiang Long,
Qing Zhang,
Pradipta Maji,
Hongmin Cai
Abstract:
Without human annotations, a typical Unsupervised Video Anomaly Detection (UVAD) method needs to train two models that generate pseudo labels for each other. First, in previous work, the two models are closely entangled with each other, and it is not known how to upgrade such methods without modifying their training framework significantly. Second, previous work usually adopts fixed thresholding to obtain pseudo labels; however, the user-specified threshold is unreliable, which inevitably introduces errors into the training process. To alleviate these two problems, we propose a novel interleaved framework that alternately trains a One-Class Classification (OCC) model and a Weakly-Supervised (WS) model for UVAD. The OCC or WS models in our method can be easily replaced with other OCC or WS models, which allows our method to be upgraded with the most recent developments in both fields. For handling the fixed thresholding problem, we break through the conventional cognitive boundary and propose a weighted OCC model that can be trained on both normal and abnormal data. We also propose an adaptive mechanism for automatically finding the optimal threshold for the WS model in a loose-to-strict manner. Experiments demonstrate that the proposed UVAD method outperforms previous approaches.
Submitted 24 January, 2024;
originally announced January 2024.
-
Non-Neighbors Also Matter to Kriging: A New Contrastive-Prototypical Learning
Authors:
Zhishuai Li,
Yunhao Nie,
Ziyue Li,
Lei Bai,
Yisheng Lv,
Rui Zhao
Abstract:
Kriging aims at estimating the attributes of unsampled geo-locations from observations in the spatial vicinity or physical connections, which helps mitigate skewed monitoring caused by under-deployed sensors. Existing works assume that neighbors' information offers the basis for estimating the attributes of the unobserved target while ignoring non-neighbors. However, non-neighbors could also offer constructive information, and neighbors could also be misleading. To this end, we propose ``Contrastive-Prototypical'' self-supervised learning for Kriging (KCP) to refine valuable information from neighbors and recycle that from non-neighbors. As a pre-trained paradigm, we conduct the Kriging task from a new perspective of representation: we aim to first learn robust and general representations and then recover attributes from representations. A neighboring contrastive module is designed that coarsely learns the representations by narrowing the representation distance between the target and its neighbors while pushing away the non-neighbors. In parallel, a prototypical module is introduced to identify similar representations via exchanged prediction, thus refining the misleading neighbors and recycling the useful non-neighbors from the neighboring contrast component. As a result, not all the neighbors and some of the non-neighbors will be used to infer the target. To encourage the two modules above to learn general and robust representations, we design an adaptive augmentation module that incorporates data-driven attribute augmentation and centrality-based topology augmentation over the spatiotemporal Kriging graph data. Extensive experiments on real-world datasets demonstrate the superior performance of KCP compared to its peers, with 6% improvements and exceptional transferability and robustness. The code is available at https://github.com/bonaldli/KCP
Submitted 23 January, 2024;
originally announced January 2024.
-
A Day-to-Day Dynamical Approach to the Most Likely User Equilibrium Problem
Authors:
Jiayang Li,
Qianni Wang,
Liyang Feng,
Jun Xie,
Yu Marco Nie
Abstract:
The lack of a unique user equilibrium (UE) route flow in traffic assignment has posed a significant challenge to many transportation applications. The maximum-entropy principle, which advocates for the consistent selection of the most likely solution as a representative, is often used to address the challenge. Built on a recently proposed day-to-day (DTD) discrete-time dynamical model called cumulative logit (CULO), this study provides a new behavioral underpinning for the maximum-entropy UE (MEUE) route flow. It has been proven that CULO can reach a UE state without presuming travelers are perfectly rational. Here, we further establish that CULO always converges to the MEUE route flow if (i) travelers have zero prior information about routes and thus are forced to give all routes an equal choice probability, or (ii) all travelers gather information from the same source such that the so-called general proportionality condition is satisfied. Thus, CULO may be used as a practical solution algorithm for the MEUE problem. To put this idea into practice, we propose to eliminate the route enumeration requirement of the original CULO model through an iterative route discovery scheme. We also examine the discrete-time versions of four popular continuous-time dynamical models and compare them to CULO. The analysis shows that the replicator dynamic is the only one that has the potential to reach the MEUE solution with some regularity. The analytical results are confirmed through numerical experiments.
Submitted 15 January, 2024;
originally announced January 2024.
-
Inelastic electron scattering at large angles: the phonon polariton contribution
Authors:
Hongbin Yang,
Paul Zeiger,
Andrea Konečná,
Lu Han,
Guangyao Miao,
Yinong Zhou,
Yifeng Huang,
Xingxu Yan,
Weihua Wang,
Jiandong Guo,
Yuefeng Nie,
Ruqian Wu,
Jan Rusz,
Xiaoqing Pan
Abstract:
We explore the inelastic electron scattering in SrTiO3, PbTiO3, and SiC in their phonon energy range, challenging the assumption that phonon polaritons are excluded at large angles in high-resolution transmission electron energy-loss spectroscopy. We demonstrate that through multiple scattering, the electron beam can excite both phonons and phonon polaritons, and the relative proportion of each varies depending on the structure factor and scattering angle. Integrating dielectric theory, density functional theory, and multi-slice simulations, we provide a comprehensive framework for understanding these interactions in materials with polar optical phonons.
Submitted 9 January, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation
Authors:
Yunhe Wang,
Hanting Chen,
Yehui Tang,
Tianyu Guo,
Kai Han,
Ying Nie,
Xutao Wang,
Hailin Hu,
Zheyuan Bai,
Yun Wang,
Fangcheng Liu,
Zhicheng Liu,
Jianyuan Guo,
Sinan Zeng,
Yinchen Zhang,
Qinghua Xu,
Qun Liu,
Jun Yao,
Chao Xu,
Dacheng Tao
Abstract:
The recent trend of large language models (LLMs) is to increase the scale of both model size (a.k.a. the number of parameters) and dataset to achieve better generative ability, which has been demonstrated by a lot of work such as the famous GPT and Llama. However, large models often involve massive computational costs, and practical applications cannot afford such high prices. Moreover, the method of constructing a strong model architecture for LLMs is rarely discussed. We first analyze the state-of-the-art language model architectures and observe the feature collapse problem. Based on the theoretical analysis, we propose that nonlinearity, which is usually studied in convolutional neural networks for vision tasks, is also very important for language models. The series informed activation function is then introduced with tiny calculations that can be ignored, and an augmented shortcut is further used to enhance the model nonlinearity. We then demonstrate that the proposed approach is significantly effective for enhancing the model nonlinearity through carefully designed ablations; thus, we present a new efficient model architecture for establishing modern, namely, PanGu-$π$. Experiments are then conducted using the same dataset and training strategy to compare PanGu-$π$ with state-of-the-art LLMs. The results show that PanGu-$π$-7B can achieve a comparable performance to that of benchmarks with about 10% inference speed-up, and PanGu-$π$-1B can achieve state-of-the-art performance in terms of accuracy and efficiency. In addition, we have deployed PanGu-$π$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application. The results show that YunShan can surpass other models with similar scales on benchmarks.
Submitted 27 December, 2023;
originally announced December 2023.