-
ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs
Authors:
Debashis Gupta,
Aditi Golder,
Luis Fernendez,
Miles Silman,
Greg Lersen,
Fan Yang,
Bob Plemmons,
Sarra Alqahtani,
Paul Victor Pauca
Abstract:
Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly destructive mining practice, leading to environmental disasters across the world's tropical watersheds. The topic of ASGM spans multiple domains of research and information, including natural and social systems, and knowledge is often atomized across a diversity of media and documents. We therefore introduce a knowledge graph (A…
▽ More
Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly destructive mining practice, leading to environmental disasters across the world's tropical watersheds. The topic of ASGM spans multiple domains of research and information, including natural and social systems, and knowledge is often atomized across a diversity of media and documents. We therefore introduce a knowledge graph (ASGM-KG) that consolidates and provides crucial information about ASGM practices and their environmental effects. The current version of ASGM-KG consists of 1,899 triples extracted using a large language model (LLM) from documents and reports published by both non-governmental and governmental organizations. These documents were carefully selected by a group of tropical ecologists with expertise in ASGM. This knowledge graph was validated using two methods. First, a small team of ASGM experts reviewed and labeled triples as factual or non-factual. Second, we devised and applied an automated factual reduction framework that relies on a search engine and an LLM for labeling triples. Our framework performs as well as five baselines on a publicly available knowledge graph and achieves over 90 accuracy on our ASGM-KG validated by domain experts. ASGM-KG demonstrates an advancement in knowledge aggregation and representation for complex, interdisciplinary environmental crises such as ASGM.
△ Less
Submitted 16 August, 2024;
originally announced August 2024.
-
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Authors:
Md Tahmid Rahman Laskar,
Sawsan Alqahtani,
M Saiful Bari,
Mizanur Rahman,
Mohammad Abdullah Matin Khan,
Haidar Khan,
Israt Jahan,
Amran Bhuiyan,
Chee Wei Tan,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty,
Jimmy Huang
Abstract:
Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the comple…
▽ More
Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations causing these inconsistencies and unreliable evaluations in various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
PalmProbNet: A Probabilistic Approach to Understanding Palm Distributions in Ecuadorian Tropical Forest via Transfer Learning
Authors:
Kangning Cui,
Zishan Shao,
Gregory Larsen,
Victor Pauca,
Sarra Alqahtani,
David Segurado,
João Pinheiro,
Manqi Wang,
David Lutz,
Robert Plemmons,
Miles Silman
Abstract:
Palms play an outsized role in tropical forests and are important resources for humans and wildlife. A central question in tropical ecosystems is understanding palm distribution and abundance. However, accurately identifying and localizing palms in geospatial imagery presents significant challenges due to dense vegetation, overlapping canopies, and variable lighting conditions in mixed-forest land…
▽ More
Palms play an outsized role in tropical forests and are important resources for humans and wildlife. A central question in tropical ecosystems is understanding palm distribution and abundance. However, accurately identifying and localizing palms in geospatial imagery presents significant challenges due to dense vegetation, overlapping canopies, and variable lighting conditions in mixed-forest landscapes. Addressing this, we introduce PalmProbNet, a probabilistic approach utilizing transfer learning to analyze high-resolution UAV-derived orthomosaic imagery, enabling the detection of palm trees within the dense canopy of the Ecuadorian Rainforest. This approach represents a substantial advancement in automated palm detection, effectively pinpointing palm presence and locality in mixed tropical rainforests. Our process begins by generating an orthomosaic image from UAV images, from which we extract and label palm and non-palm image patches in two distinct sizes. These patches are then used to train models with an identical architecture, consisting of an unaltered pre-trained ResNet-18 and a Multilayer Perceptron (MLP) with specifically trained parameters. Subsequently, PalmProbNet employs a sliding window technique on the landscape orthomosaic, using both small and large window sizes to generate a probability heatmap. This heatmap effectively visualizes the distribution of palms, showcasing the scalability and adaptability of our approach in various forest densities. Despite the challenging terrain, our method demonstrated remarkable performance, achieving an accuracy of 97.32% and a Cohen's kappa of 94.59% in testing.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Securing the Invisible Thread: A Comprehensive Analysis of BLE Tracker Security in Apple AirTags and Samsung SmartTags
Authors:
Hosam Alamleh,
Michael Gogarty,
David Ruddell,
Ali Abdullah S. AlQahtani
Abstract:
This study presents an in-depth analysis of the security landscape in Bluetooth Low Energy (BLE) tracking systems, with a particular emphasis on Apple AirTags and Samsung SmartTags, including their cryptographic frameworks. Our investigation traverses a wide spectrum of attack vectors such as physical tampering, firmware exploitation, signal spoofing, eavesdropping, jamming, app security flaws, Bl…
▽ More
This study presents an in-depth analysis of the security landscape in Bluetooth Low Energy (BLE) tracking systems, with a particular emphasis on Apple AirTags and Samsung SmartTags, including their cryptographic frameworks. Our investigation traverses a wide spectrum of attack vectors such as physical tampering, firmware exploitation, signal spoofing, eavesdropping, jamming, app security flaws, Bluetooth security weaknesses, location spoofing, threats to owner devices, and cloud-related vulnerabilities. Moreover, we delve into the security implications of the cryptographic methods utilized in these systems. Our findings reveal that while BLE trackers like AirTags and SmartTags offer substantial utility, they also pose significant security risks. Notably, Apple's approach, which prioritizes user privacy by removing intermediaries, inadvertently leads to device authentication challenges, evidenced by successful AirTag spoofing instances. Conversely, Samsung SmartTags, designed to thwart beacon spoofing, raise critical concerns about cloud security and user privacy. Our analysis also highlights the constraints faced by these devices due to their design focus on battery life conservation, particularly the absence of secure boot processes, which leaves them susceptible to OS modification and a range of potential attacks. The paper concludes with insights into the anticipated evolution of these tracking systems. We predict that future enhancements will likely focus on bolstering security features, especially as these devices become increasingly integrated into the broader IoT ecosystem and face evolving privacy regulations. This shift is imperative to address the intricate balance between functionality and security in next-generation BLE tracking systems.
△ Less
Submitted 24 January, 2024;
originally announced January 2024.
-
Navigating Cybersecurity Training: A Comprehensive Review
Authors:
Saif Al-Dean Qawasmeh,
Ali Abdullah S. AlQahtani,
Muhammad Khurram Khan
Abstract:
In the dynamic realm of cybersecurity, awareness training is crucial for strengthening defenses against cyber threats. This survey examines a spectrum of cybersecurity awareness training methods, analyzing traditional, technology-based, and innovative strategies. It evaluates the principles, efficacy, and constraints of each method, presenting a comparative analysis that highlights their pros and…
▽ More
In the dynamic realm of cybersecurity, awareness training is crucial for strengthening defenses against cyber threats. This survey examines a spectrum of cybersecurity awareness training methods, analyzing traditional, technology-based, and innovative strategies. It evaluates the principles, efficacy, and constraints of each method, presenting a comparative analysis that highlights their pros and cons. The study also investigates emerging trends like artificial intelligence and extended reality, discussing their prospective influence on the future of cybersecurity training. Additionally, it addresses implementation challenges and proposes solutions, drawing on insights from real-world case studies. The goal is to bolster the understanding of cybersecurity awareness training's current landscape, offering valuable perspectives for both practitioners and scholars.
△ Less
Submitted 20 January, 2024;
originally announced January 2024.
-
Leveraging Machine Learning for Wi-Fi-based Environmental Continuous Two-Factor Authentication
Authors:
Ali Abdullah S. AlQahtani,
Thamraa Alshayeb,
Mahmoud Nabil,
Ahmad Patooghy
Abstract:
The traditional two-factor authentication (2FA) methods primarily rely on the user manually entering a code or token during the authentication process. This can be burdensome and time-consuming, particularly for users who must be authenticated frequently. To tackle this challenge, we present a novel 2FA approach replacing the user's input with decisions made by Machine Learning (ML) that continuou…
▽ More
The traditional two-factor authentication (2FA) methods primarily rely on the user manually entering a code or token during the authentication process. This can be burdensome and time-consuming, particularly for users who must be authenticated frequently. To tackle this challenge, we present a novel 2FA approach replacing the user's input with decisions made by Machine Learning (ML) that continuously verifies the user's identity with zero effort. Our system exploits unique environmental features associated with the user, such as beacon frame characteristics and Received Signal Strength Indicator (RSSI) values from Wi-Fi Access Points (APs). These features are gathered and analyzed in real-time by our ML algorithm to ascertain the user's identity. For enhanced security, our system mandates that the user's two devices (i.e., a login device and a mobile device) be situated within a predetermined proximity before granting access. This precaution ensures that unauthorized users cannot access sensitive information or systems, even with the correct login credentials. Through experimentation, we have demonstrated our system's effectiveness in determining the location of the user's devices based on beacon frame characteristics and RSSI values, achieving an accuracy of 92.4%. Additionally, we conducted comprehensive security analysis experiments to evaluate the proposed 2FA system's resilience against various cyberattacks. Our findings indicate that the system exhibits robustness and reliability in the face of these threats. The scalability, flexibility, and adaptability of our system render it a promising option for organizations and users seeking a secure and convenient authentication system.
△ Less
Submitted 12 January, 2024;
originally announced January 2024.
-
Comprehensive Survey: Biometric User Authentication Application, Evaluation, and Discussion
Authors:
Reem Alrawili,
Ali Abdullah S. AlQahtani,
Muhammad Khurram Khan
Abstract:
This paper conducts an extensive review of biometric user authentication literature, addressing three primary research questions: (1) commonly used biometric traits and their suitability for specific applications, (2) performance factors such as security, convenience, and robustness, and potential countermeasures against cyberattacks, and (3) factors affecting biometric system accuracy and po-tent…
▽ More
This paper conducts an extensive review of biometric user authentication literature, addressing three primary research questions: (1) commonly used biometric traits and their suitability for specific applications, (2) performance factors such as security, convenience, and robustness, and potential countermeasures against cyberattacks, and (3) factors affecting biometric system accuracy and po-tential improvements. Our analysis delves into physiological and behavioral traits, exploring their pros and cons. We discuss factors influencing biometric system effectiveness and highlight areas for enhancement. Our study differs from previous surveys by extensively examining biometric traits, exploring various application domains, and analyzing measures to mitigate cyberattacks. This paper aims to inform researchers and practitioners about the biometric authentication landscape and guide future advancements.
△ Less
Submitted 20 January, 2024; v1 submitted 7 July, 2023;
originally announced November 2023.
-
Automatic Restoration of Diacritics for Speech Data Sets
Authors:
Sara Shatnawi,
Sawsan Alqahtani,
Hanan Aldarmaki
Abstract:
Automatic text-based diacritic restoration models generally have high diacritic error rates when applied to speech transcripts as a result of domain and style shifts in spoken language. In this work, we explore the possibility of improving the performance of automatic diacritic restoration when applied to speech data by utilizing parallel spoken utterances. In particular, we use the pre-trained Wh…
▽ More
Automatic text-based diacritic restoration models generally have high diacritic error rates when applied to speech transcripts as a result of domain and style shifts in spoken language. In this work, we explore the possibility of improving the performance of automatic diacritic restoration when applied to speech data by utilizing parallel spoken utterances. In particular, we use the pre-trained Whisper ASR model fine-tuned on relatively small amounts of diacritized Arabic speech data to produce rough diacritized transcripts for the speech utterances, which we then use as an additional input for diacritic restoration models. The proposed framework consistently improves diacritic restoration performance compared to text-only baselines. Our results highlight the inadequacy of current text-based diacritic restoration models for speech data sets and provide a new baseline for speech-based diacritic restoration.
△ Less
Submitted 6 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Autonomous Vehicles an overview on system, cyber security, risks, issues, and a way forward
Authors:
Md Aminul Islam,
Sarah Alqahtani
Abstract:
This chapter explores the complex realm of autonomous cars, analyzing their fundamental components and operational characteristics. The initial phase of the discussion is elucidating the internal mechanics of these automobiles, encompassing the crucial involvement of sensors, artificial intelligence (AI) identification systems, control mechanisms, and their integration with cloud-based servers wit…
▽ More
This chapter explores the complex realm of autonomous cars, analyzing their fundamental components and operational characteristics. The initial phase of the discussion is elucidating the internal mechanics of these automobiles, encompassing the crucial involvement of sensors, artificial intelligence (AI) identification systems, control mechanisms, and their integration with cloud-based servers within the framework of the Internet of Things (IoT). It delves into practical implementations of autonomous cars, emphasizing their utilization in forecasting traffic patterns and transforming the dynamics of transportation. The text also explores the topic of Robotic Process Automation (RPA), illustrating the impact of autonomous cars on different businesses through the automation of tasks. The primary focus of this investigation lies in the realm of cybersecurity, specifically in the context of autonomous vehicles. A comprehensive analysis will be conducted to explore various risk management solutions aimed at protecting these vehicles from potential threats including ethical, environmental, legal, professional, and social dimensions, offering a comprehensive perspective on their societal implications. A strategic plan for addressing the challenges and proposing strategies for effectively traversing the complex terrain of autonomous car systems, cybersecurity, hazards, and other concerns are some resources for acquiring an understanding of the intricate realm of autonomous cars and their ramifications in contemporary society, supported by a comprehensive compilation of resources for additional investigation.
Keywords: RPA, Cyber Security, AV, Risk, Smart Cars
△ Less
Submitted 25 September, 2023;
originally announced September 2023.
-
Secure Mobile Payment Architecture Enabling Multi-factor Authentication
Authors:
Hosam Alamleh,
Ali Abdullah S. AlQahtani,
Baker Al Smadi
Abstract:
The rise of smartphones has led to a significant increase in the usage of mobile payments. Mobile payments allow individuals to access financial resources and make transactions through their mobile devices while on the go. However, the current mobile payment systems were designed to align with traditional payment structures, which limits the full potential of smartphones, including their security…
▽ More
The rise of smartphones has led to a significant increase in the usage of mobile payments. Mobile payments allow individuals to access financial resources and make transactions through their mobile devices while on the go. However, the current mobile payment systems were designed to align with traditional payment structures, which limits the full potential of smartphones, including their security features. This has become a major concern in the rapidly growing mobile payment market. To address these security concerns,in this paper we propose new mobile payment architecture. This architecture leverages the advanced capabilities of modern smartphones to verify various aspects of a payment, such as funds, biometrics, location, and others. The proposed system aims to guarantee the legitimacy of transactions and protect against identity theft by verifying multiple elements of a payment. The security of mobile payment systems is crucial, given the rapid growth of the market. Evaluating mobile payment systems based on their authentication, encryption, and fraud detection capabilities is of utmost importance. The proposed architecture provides a secure mobile payment solution that enhances the overall payment experience by taking advantage of the advanced capabilities of modern smartphones. This will not only improve the security of mobile payments but also offer a more user-friendly payment experience for consumers.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Zero-Effort Two-Factor Authentication Using Wi-Fi Radio Wave Transmission and Machine Learning
Authors:
Ali Abdullah S. AlQahtani,
Thamraa Alshayeb
Abstract:
The proliferation of sensitive information being stored online highlights the pressing need for secure and efficient user authentication methods. To address this issue, this paper presents a novel zero-effort two-factor authentication (2FA) approach that combines the unique characteristics of a users environment and Machine Learning (ML) to confirm their identity. Our proposed approach utilizes Wi…
▽ More
The proliferation of sensitive information being stored online highlights the pressing need for secure and efficient user authentication methods. To address this issue, this paper presents a novel zero-effort two-factor authentication (2FA) approach that combines the unique characteristics of a users environment and Machine Learning (ML) to confirm their identity. Our proposed approach utilizes Wi-Fi radio wave transmission and ML algorithms to analyze beacon frame characteristics and Received Signal Strength Indicator (RSSI) values from Wi-Fi access points to determine the users location. The aim is to provide a secure and efficient method of authentication without the need for additional hardware or software. A prototype was developed using Raspberry Pi devices and experiments were conducted to demonstrate the effectiveness and practicality of the proposed approach. Results showed that the proposed system can significantly enhance the security of sensitive information in various industries such as finance, healthcare, and retail. This study sheds light on the potential of Wi-Fi radio waves and RSSI values as a means of user authentication and the power of ML to identify patterns in wireless signals for security purposes. The proposed system holds great promise in revolutionizing the field of 2FA and user authentication, offering a new era of secure and seamless access to sensitive information.
△ Less
Submitted 4 March, 2023;
originally announced March 2023.
-
Injecting Domain Knowledge in Language Models for Task-Oriented Dialogue Systems
Authors:
Denis Emelin,
Daniele Bonadiman,
Sawsan Alqahtani,
Yi Zhang,
Saab Mansour
Abstract:
Pre-trained language models (PLM) have advanced the state-of-the-art across NLP applications, but lack domain-specific knowledge that does not naturally occur in pre-training data. Previous studies augmented PLMs with symbolic knowledge for different downstream NLP tasks. However, knowledge bases (KBs) utilized in these studies are usually large-scale and static, in contrast to small, domain-speci…
▽ More
Pre-trained language models (PLM) have advanced the state-of-the-art across NLP applications, but lack domain-specific knowledge that does not naturally occur in pre-training data. Previous studies augmented PLMs with symbolic knowledge for different downstream NLP tasks. However, knowledge bases (KBs) utilized in these studies are usually large-scale and static, in contrast to small, domain-specific, and modifiable knowledge bases that are prominent in real-world task-oriented dialogue (TOD) systems. In this paper, we showcase the advantages of injecting domain-specific knowledge prior to fine-tuning on TOD tasks. To this end, we utilize light-weight adapters that can be easily integrated with PLMs and serve as a repository for facts learned from different KBs. To measure the efficacy of proposed knowledge injection methods, we introduce Knowledge Probing using Response Selection (KPRS) -- a probe designed specifically for TOD models. Experiments on KPRS and the response generation task show improvements of knowledge injection with adapters over strong baselines.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.
-
ML for Location Prediction Using RSSI On WiFi 2.4 GHZ Frequency Band
Authors:
Ali Abdullah S. AlQahtani,
Nazim Choudhury
Abstract:
For decades, the determination of an objects location has been implemented utilizing different technologies. Despite GPS (Global Positioning System) provides a scalable efficient and cost effective location services however the satellite emitted signals cannot be exploited indoor to effectively determine the location. In contrast to GPS which is a cost effective localization technology for outdoor…
▽ More
For decades, the determination of an objects location has been implemented utilizing different technologies. Despite GPS (Global Positioning System) provides a scalable efficient and cost effective location services however the satellite emitted signals cannot be exploited indoor to effectively determine the location. In contrast to GPS which is a cost effective localization technology for outdoor locations several technologies have been studied for indoor localization. These include Wireless Fidelity (Wi-Fi) Bluetooth Low Energy (BLE) and Received Signal Strength Indicator (RSSI) etc. This paper presents an enhanced method of using RSSI as a mean to determine an objects location by applying some Machine Learning (ML) concepts. The binary classification is defined by considering the adjacency of the coordinates denoting objects locations. The proposed features were tested empirically via multiple classifiers that achieved a maximum of 96 percent accuracy.
△ Less
Submitted 1 October, 2022;
originally announced October 2022.
-
Technical Report-IoT Devices Proximity Authentication In Ad Hoc Network Environment
Authors:
Ali Abdullah S. AlQahtani,
Hosam Alamleh,
Baker Al Smadi
Abstract:
Internet of Things (IoT) is a distributed communication technology system that offers the possibility for physical devices (e.g. vehicles home appliances sensors actuators etc.) known as Things to connect and exchange data more importantly without human interaction. Since IoT plays a significant role in our daily lives we must secure the IoT environment to work effectively. Among the various secur…
▽ More
Internet of Things (IoT) is a distributed communication technology system that offers the possibility for physical devices (e.g. vehicles home appliances sensors actuators etc.) known as Things to connect and exchange data more importantly without human interaction. Since IoT plays a significant role in our daily lives we must secure the IoT environment to work effectively. Among the various security requirements authentication to the IoT devices is essential as it is the first step in preventing any negative impact of possible attackers. Using the current IEEE 802.11 infrastructure this paper implements an IoT devices authentication scheme based on something that is in the IoT devices environment (i.e. ambient access points). Data from the broadcast messages (i.e. beacon frame characteristics) are utilized to implement the authentication factor that confirms proximity between two devices in an ad hoc IoT network.
△ Less
Submitted 30 September, 2022;
originally announced October 2022.
-
Preprint: Privacy-preserving IoT Data Sharing Scheme
Authors:
Ali Abdullah S. AlQahtani,
Hosam Alamleh,
Reem Alrawili
Abstract:
Data sharing can be granted using different factors one of which is something in a users or an IoT devices environment which is in this paper broadcast signals. Using broadcast signals to measure Received Signal Strength Indicator values and Machine Learning models this paper implements an IoT data sharing scheme based on something that is in an IoT devices environment. The proposed scheme is expe…
▽ More
Data sharing can be granted using different factors one of which is something in a users or an IoT devices environment which is in this paper broadcast signals. Using broadcast signals to measure Received Signal Strength Indicator values and Machine Learning models this paper implements an IoT data sharing scheme based on something that is in an IoT devices environment. The proposed scheme is experimentally tested using different ML models and shows 97.78 percent as its highest accuracy.
△ Less
Submitted 26 September, 2022;
originally announced September 2022.
-
Semi-supervised Change Detection of Small Water Bodies Using RGB and Multispectral Images in Peruvian Rainforests
Authors:
Kangning Cui,
Seda Camalan,
Ruoning Li,
Victor P. Pauca,
Sarra Alqahtani,
Robert J. Plemmons,
Miles Silman,
Evan N. Dethier,
David Lutz,
Raymond H. Chan
Abstract:
Artisanal and Small-scale Gold Mining (ASGM) is an important source of income for many households, but it can have large social and environmental effects, especially in rainforests of developing countries. The Sentinel-2 satellites collect multispectral images that can be used for the purpose of detecting changes in water extent and quality which indicates the locations of mining sites. This work…
▽ More
Artisanal and Small-scale Gold Mining (ASGM) is an important source of income for many households, but it can have large social and environmental effects, especially in rainforests of developing countries. The Sentinel-2 satellites collect multispectral images that can be used for the purpose of detecting changes in water extent and quality which indicates the locations of mining sites. This work focuses on the recognition of ASGM activities in Peruvian Amazon rainforests. We tested several semi-supervised classifiers based on Support Vector Machines (SVMs) to detect the changes of water bodies from 2019 to 2021 in the Madre de Dios region, which is one of the global hotspots of ASGM activities. Experiments show that SVM-based models can achieve reasonable performance for both RGB (using Cohen's $κ$ 0.49) and 6-channel images (using Cohen's $κ$ 0.71) with very limited annotations. The efficacy of incorporating Lab color space for change detection is analyzed as well.
△ Less
Submitted 19 June, 2022;
originally announced June 2022.
-
BunchBFT: Across-Cluster Consensus Protocol
Authors:
Salem Alqahtani,
Murat Demirbas
Abstract:
In this paper, we present BunchBFT Byzantine fault-tolerant state-machine replication for high performance and scalability. At the heart of BunchBFT is a novel design called the cluster-based approach that divides the replicas into clusters of replicas. By combining this cluster-based approach with hierarchical communications across clusters, piggybacking techniques for sending messages across clu…
▽ More
In this paper, we present BunchBFT Byzantine fault-tolerant state-machine replication for high performance and scalability. At the heart of BunchBFT is a novel design called the cluster-based approach that divides the replicas into clusters of replicas. By combining this cluster-based approach with hierarchical communications across clusters, piggybacking techniques for sending messages across clusters, and decentralized leader election for each cluster, BunchBFT achieves high performance and scalability.
We also prove that BunchBFT satisfies the basic safety and liveness properties of Byzantine consensus. We implemented a prototype of BunchBFT in our PaxiBFT framework to show that the BunchBFT can improve the MirBFT's throughput by 10x, depending on the available bandwidth on wide-area links.
△ Less
Submitted 21 May, 2022;
originally announced May 2022.
-
Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings
Authors:
Sawsan Alqahtani,
Garima Lalwani,
Yi Zhang,
Salvatore Romeo,
Saab Mansour
Abstract:
Recent studies have proposed different methods to improve multilingual word representations in contextualized settings including techniques that align between source and target embedding spaces. For contextualized embeddings, alignment becomes more complex as we additionally take context into consideration. In this work, we propose using Optimal Transport (OT) as an alignment objective during fine…
▽ More
Recent studies have proposed different methods to improve multilingual word representations in contextualized settings including techniques that align between source and target embedding spaces. For contextualized embeddings, alignment becomes more complex as we additionally take context into consideration. In this work, we propose using Optimal Transport (OT) as an alignment objective during fine-tuning to further improve multilingual contextualized representations for downstream cross-lingual transfer. This approach does not require word-alignment pairs prior to fine-tuning that may lead to sub-optimal matching and instead learns the word alignments within context in an unsupervised manner. It also allows different types of mappings due to soft matching between source and target sentences. We benchmark our proposed method on two tasks (XNLI and XQuAD) and achieve improvements over baselines as well as competitive results compared to similar recent works.
△ Less
Submitted 6 October, 2021;
originally announced October 2021.
-
BigBFT: A Multileader Byzantine Fault Tolerance Protocol for High Throughput
Authors:
Salem Alqahtani,
Murat Demirbas
Abstract:
This paper describes BigBFT, a multi-leader Byzantine fault tolerance protocol that achieves high throughput and scalable consensus in blockchain systems. BigBFT achieves this by (1) enabling every node to be a leader that can propose and order the blocks in parallel, (2) piggybacking votes within rounds, (3) pipelining blocks across rounds, and (4) using only two communication steps to order bloc…
▽ More
This paper describes BigBFT, a multi-leader Byzantine fault tolerance protocol that achieves high throughput and scalable consensus in blockchain systems. BigBFT achieves this by (1) enabling every node to be a leader that can propose and order the blocks in parallel, (2) piggybacking votes within rounds, (3) pipelining blocks across rounds, and (4) using only two communication steps to order blocks in the common case.
BigBFT has an amortized communication cost of $O(n)$ over $n$ requests. We evaluate BigBFT's performance both analytically, using back-of-the-envelope load formulas to construct a cost analysis, and also empirically by implementing it in our PaxiBFT framework. Our evaluation compares BigBFT with PBFT, Tendermint, Streamlet, and Hotstuff under various workloads using deployments of 4 to 20 nodes. Our results show that BigBFT outperforms PBFT, Tendermint, Streamlet, and Hotstuff protocols either in terms of latency (by up to $70\%$) or in terms of throughput (by up to $190\%$).
△ Less
Submitted 12 October, 2021; v1 submitted 26 September, 2021;
originally announced September 2021.
-
Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments
Authors:
Ashley Peake,
Joe McCalmon,
Yixin Zhang,
Daniel Myers,
Sarra Alqahtani,
Paul Pauca
Abstract:
Performing autonomous exploration is essential for unmanned aerial vehicles (UAVs) operating in unknown environments. Often, these missions start with building a map for the environment via pure exploration and subsequently using (i.e. exploiting) the generated map for downstream navigation tasks. Accomplishing these navigation tasks in two separate steps is not always possible or even disadvantag…
▽ More
Performing autonomous exploration is essential for unmanned aerial vehicles (UAVs) operating in unknown environments. Often, these missions start with building a map for the environment via pure exploration and subsequently using (i.e. exploiting) the generated map for downstream navigation tasks. Accomplishing these navigation tasks in two separate steps is not always possible or even disadvantageous for UAVs deployed in outdoor and dynamically changing environments. Current exploration approaches either use a priori human-generated maps or use heuristics such as frontier-based exploration. Other approaches use learning but focus only on learning policies for specific tasks by either using sample inefficient random exploration or by making impractical assumptions about full map availability. In this paper, we develop an adaptive exploration approach to trade off between exploration and exploitation in one single step for UAVs searching for areas of interest (AoIs) in unknown environments using Deep Reinforcement Learning (DRL). The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps. Then, a simple information gain function is repeatedly computed to determine the best target region to search during each iteration of the process. DDQN and A2C algorithms are extended with a stack of LSTM layers and trained to generate optimal policies for the exploration and exploitation, respectively. We tested our approach in 3 different tasks against 4 baselines. The results demonstrate that our proposed approach is capable of navigating through randomly generated environments and covering more AoI in less time steps compared to the baselines.
△ Less
Submitted 4 May, 2021;
originally announced May 2021.
-
Bottlenecks in Blockchain Consensus Protocols
Authors:
Salem Alqahtani,
Murat Demirbas
Abstract:
Most of the Blockchain permissioned systems employ Byzantine fault-tolerance (BFT) consensus protocols to ensure that honest validators agree on the order for appending entries to their ledgers. In this paper, we study the performance and the scalability of prominent consensus protocols, namely PBFT, Tendermint, HotStuff, and Streamlet, both analytically via load formulas and practically via imple…
▽ More
Most of the Blockchain permissioned systems employ Byzantine fault-tolerance (BFT) consensus protocols to ensure that honest validators agree on the order for appending entries to their ledgers. In this paper, we study the performance and the scalability of prominent consensus protocols, namely PBFT, Tendermint, HotStuff, and Streamlet, both analytically via load formulas and practically via implementation and evaluation. Under identical conditions, we identify the bottlenecks of these consensus protocols and show that these protocols do not scale well as the number of validators increases. Our investigation points to the communication complexity as the culprit. Even when there is enough network bandwidth, the CPU cost of serialization and deserialization of the messages limits the throughput and increases the latency of the protocols. To alleviate the bottlenecks, the most useful techniques include reducing the communication complexity, rotating the hotspot of communications, and pipelining across consensus instances.
△ Less
Submitted 11 October, 2021; v1 submitted 6 March, 2021;
originally announced March 2021.
-
A Multitask Learning Approach for Diacritic Restoration
Authors:
Sawsan Alqahtani,
Ajay Mishra,
Mona Diab
Abstract:
In many languages like Arabic, diacritics are used to specify pronunciations as well as meanings. Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word. This results in a more ambiguous text making computational processing on such text more difficult. Diacritic restoration is the task of restoring missing diacritics in the writt…
▽ More
In many languages like Arabic, diacritics are used to specify pronunciations as well as meanings. Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word. This results in a more ambiguous text making computational processing on such text more difficult. Diacritic restoration is the task of restoring missing diacritics in the written text. Most state-of-the-art diacritic restoration models are built on character level information which helps generalize the model to unseen data, but presumably lose useful information at the word level. Thus, to compensate for this loss, we investigate the use of multi-task learning to jointly optimize diacritic restoration with related NLP problems namely word segmentation, part-of-speech tagging, and syntactic diacritization. We use Arabic as a case study since it has sufficient data resources for tasks that we consider in our joint modeling. Our joint models significantly outperform the baselines and are comparable to the state-of-the-art models that are more complex relying on morphological analyzers and/or a lot more data (e.g. dialectal data).
△ Less
Submitted 6 June, 2020;
originally announced June 2020.
-
Efficient Convolutional Neural Networks for Diacritic Restoration
Authors:
Sawsan Alqahtani,
Ajay Mishra,
Mona Diab
Abstract:
Diacritic restoration has gained importance with the growing need for machines to understand written texts. The task is typically modeled as a sequence labeling problem and currently Bidirectional Long Short Term Memory (BiLSTM) models provide state-of-the-art results. Recently, Bai et al. (2018) show the advantages of Temporal Convolutional Neural Networks (TCN) over Recurrent Neural Networks (RN…
▽ More
Diacritic restoration has gained importance with the growing need for machines to understand written texts. The task is typically modeled as a sequence labeling problem and currently Bidirectional Long Short Term Memory (BiLSTM) models provide state-of-the-art results. Recently, Bai et al. (2018) show the advantages of Temporal Convolutional Neural Networks (TCN) over Recurrent Neural Networks (RNN) for sequence modeling in terms of performance and computational resources. As diacritic restoration benefits from both previous as well as subsequent timesteps, we further apply and evaluate a variant of TCN, Acausal TCN (A-TCN), which incorporates context from both directions (previous and future) rather than strictly incorporating previous context as in the case of TCN. A-TCN yields significant improvement over TCN for diacritization in three different languages: Arabic, Yoruba, and Vietnamese. Furthermore, A-TCN and BiLSTM have comparable performance, making A-TCN an efficient alternative over BiLSTM since convolutions can be trained in parallel. A-TCN is significantly faster than BiLSTM at inference time (270%-334% improvement in the amount of text diacritized per minute).
△ Less
Submitted 14 December, 2019;
originally announced December 2019.
-
Homograph Disambiguation Through Selective Diacritic Restoration
Authors:
Sawsan Alqahtani,
Hanan Aldarmaki,
Mona Diab
Abstract:
Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic. Omitting diacritics leads to an increase in the number of homographs: different words with the same spelling. Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overal…
▽ More
Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic. Omitting diacritics leads to an increase in the number of homographs: different words with the same spelling. Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications. In this paper, we propose approaches for automatically marking a subset of words for diacritic restoration, which leads to selective homograph disambiguation. Compared to full or no diacritic restoration, these approaches yield selectively-diacritized datasets that balance sparsity and lexical disambiguation. We evaluate the various selection strategies extrinsically on several downstream applications: neural machine translation, part-of-speech tagging, and semantic textual similarity. Our experiments on Arabic show promising results, where our devised strategies on selective diacritization lead to a more balanced and consistent performance in downstream applications.
△ Less
Submitted 9 December, 2019;
originally announced December 2019.
-
Robustness-Driven Exploration with Probabilistic Metric Temporal Logic
Authors:
Xiaotian Liu,
Pengyi Shi,
Sarra Alqahtani,
Victor Paúl Pauca,
Miles Silman
Abstract:
The ability to perform autonomous exploration is essential for unmanned aerial vehicles (UAV) operating in unstructured or unknown environments where it is hard or even impossible to describe the environment beforehand. However, algorithms for autonomous exploration often focus on optimizing time and coverage in a greedy fashion. That type of exploration can collect irrelevant data and wastes time…
▽ More
The ability to perform autonomous exploration is essential for unmanned aerial vehicles (UAV) operating in unstructured or unknown environments where it is hard or even impossible to describe the environment beforehand. However, algorithms for autonomous exploration often focus on optimizing time and coverage in a greedy fashion. That type of exploration can collect irrelevant data and wastes time navigating areas with no important information. In this paper, we propose a method for exploiting the discovered knowledge about the environment while exploring it by relying on a theory of robustness based on Probabilistic Metric Temporal Logic (P-MTL) as applied to offline verification and online control of hybrid systems. By maximizing the satisfaction of the predefined P-MTL specifications of the exploration problem, the robustness values guide the UAV towards areas with more interesting information to gain. We use Markov Chain Monte Carlo to solve the P-MTL constraints. We demonstrate the effectiveness of the proposed approach by simulating autonomous exploration over Amazonian rainforest where our approach is used to detect areas occupied by illegal Artisanal Small-scale Gold Mining (ASGM) activities. The results show that our approach outperform a greedy exploration approach (Autonomous Exploration Planner) by 38% in terms of ASGM coverage.
△ Less
Submitted 3 December, 2019;
originally announced December 2019.
-
Performance Analysis and Comparison of Distributed Machine Learning Systems
Authors:
Salem Alqahtani,
Murat Demirbas
Abstract:
Deep learning has permeated through many aspects of computing/processing systems in recent years. While distributed training architectures/frameworks are adopted for training large deep learning models quickly, there has not been a systematic study of the communication bottlenecks of these architectures and their effects on the computation cycle time and scalability. In order to analyze this probl…
▽ More
Deep learning has permeated through many aspects of computing/processing systems in recent years. While distributed training architectures/frameworks are adopted for training large deep learning models quickly, there has not been a systematic study of the communication bottlenecks of these architectures and their effects on the computation cycle time and scalability. In order to analyze this problem for synchronous Stochastic Gradient Descent (SGD) training of deep learning models, we developed a performance model of computation time and communication latency under three different system architectures: Parameter Server (PS), peer-to-peer (P2P), and Ring allreduce (RA). To complement and corroborate our analytical models with quantitative results, we evaluated the computation and communication performance of these system architectures of the systems via experiments performed with Tensorflow and Horovod frameworks. We found that the system architecture has a very significant effect on the performance of training. RA-based systems achieve scalable performance as they successfully decouple network usage from the number of workers in the system. In contrast, 1PS systems suffer from low performance due to network congestion at the parameter server side. While P2P systems fare better than 1PS systems, they still suffer from significant network bottleneck. Finally, RA systems also excel by virtue of overlapping computation time and communication time, which PS and P2P architectures fail to achieve.
△ Less
Submitted 4 September, 2019;
originally announced September 2019.
-
Social Credibility Incorporating Semantic Analysis and Machine Learning: A Survey of the State-of-the-Art and Future Research Directions
Authors:
Bilal Abu-Salih,
Bushra Bremie,
Pornpit Wongthongtham,
Kevin Duan,
Tomayess Issa,
Kit Yan Chan,
Mohammad Alhabashneh,
Teshreen Albtoush,
Sulaiman Alqahtani,
Abdullah Alqahtani,
Muteeb Alahmari,
Naser Alshareef,
Abdulaziz Albahlal
Abstract:
The wealth of Social Big Data (SBD) represents a unique opportunity for organisations to obtain the excessive use of such data abundance to increase their revenues. Hence, there is an imperative need to capture, load, store, process, analyse, transform, interpret, and visualise such manifold social datasets to develop meaningful insights that are specific to an application domain. This paper lays…
▽ More
The wealth of Social Big Data (SBD) represents a unique opportunity for organisations to obtain the excessive use of such data abundance to increase their revenues. Hence, there is an imperative need to capture, load, store, process, analyse, transform, interpret, and visualise such manifold social datasets to develop meaningful insights that are specific to an application domain. This paper lays the theoretical background by introducing the state-of-the-art literature review of the research topic. This is associated with a critical evaluation of the current approaches, and fortified with certain recommendations indicated to bridge the research gap.
△ Less
Submitted 27 February, 2019;
originally announced February 2019.