Modeling Human Error Factors with Security Incidents in Industrial Control Systems: A Bayesian Belief Network Approach

Pushparaj Bhosale, Institute of Computer Engineering, TU Wien, Austria, [email protected]
Wolfgang Kastner, Institute of Computer Engineering, TU Wien, Austria, [email protected]
Thilo Sauter, Institute of Computer Technology, TU Wien, Austria, [email protected]

Industrial Control Systems (ICSs) are critical in automating and controlling industrial processes. Human errors within ICSs can significantly impact the system's underlying processes and users' safety. Thus, it is essential to understand the factors contributing to human errors and implement targeted interventions. Various factors that influence and mitigate human errors must be explored, including organizational, supervisory, personal, and technical factors. In parallel, the impact of a security incident also needs consideration. The paper presents a Bayesian Belief Network (BBN) model developed to capture these factors comprehensively and demonstrate their impact, especially in the context of security incidents. Probability distributions are employed with practical assumptions to overcome data limitations, emphasizing the model's utility in risk assessment. The model's complexity is addressed using multiple interconnected sub-models, enhancing accuracy and avoiding unnecessary intricacies. Despite the challenge of identifying all relevant factors, a sincere effort is made to incorporate diverse research findings. This paper highlights the essential role of BBN models in understanding and mitigating human errors, contributing to the resilience of ICS processes. The use of BBNs and probabilistic distributions enables a quantitative and probabilistic analysis of the impact of human errors, aiding in the development of more robust risk management strategies to improve system resilience in ICSs.

CCS Concepts: • Security and privacy → Human and societal aspects of security and privacy; • Mathematics of computing → Probability and statistics; • Computing methodologies → Modeling and simulation;

Keywords: human error, Industrial Control Systems, Bayesian Belief Networks, safety, security

ACM Reference Format:
Pushparaj Bhosale, Wolfgang Kastner, and Thilo Sauter. 2024. Modeling Human Error Factors with Security Incidents in Industrial Control Systems: A Bayesian Belief Network Approach. In The 19th International Conference on Availability, Reliability and Security (ARES 2024), July 30--August 02, 2024, Vienna, Austria. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3664476.3670875

1 INTRODUCTION

Industrial Control Systems (ICSs) have been around for several decades and have played a crucial role in automating and controlling industrial processes. As technology evolves, ICSs have become more sophisticated and robust, enabling even greater efficiency and control over complex industrial operations in various industries, including manufacturing, energy, transportation, and others [21].

In an ICS environment, humans fulfill critical roles and responsibilities to ensure efficient and safe operation of industrial processes [11]. For instance, operators are responsible for monitoring and controlling the processes, operating control panels, and making necessary adjustments, while engineers design, install, and maintain the control systems, and maintenance personnel conduct inspections, preventive maintenance, and repairs.

Human actions are essential for the proper functioning of ICSs; however, performing these actions can lead to errors. Such errors can significantly impact the safety of the ICS and the proper functioning of the system, leading to accidents, equipment failures, or endangered human lives, and resulting in downtime, production losses, and financial impacts [19]. Moreover, errors can have long-lasting consequences for a company's reputation and legal standing [24].

Human errors can occur due to factors like lack of training, inadequate knowledge, fatigue, distractions, or miscommunication [11]. These factors are categorized as organizational factors (OF), supervisory factors (SF), individual factors (IF), and technical factors (TF). Sometimes, a security attack can also lead to a human error. For instance, attackers can manipulate data by altering sensor readings (i.e., a TF). If operators rely on this manipulated data for decision-making, they may inadvertently introduce errors [23]. In this paper, we contribute a representation of the relationship between security incidents and human error. We consider the impact only on the technical factors, as they would be impacted first.

Bayesian Belief Networks (BBNs) provide a valuable framework for addressing human error by identifying influential factors, assessing probabilities and dependencies, analyzing causal relationships, supporting decision-making, and designing interventions [3, 13, 24]. BBNs enable the modeling of the influencing factors contributing to human error, allowing for a comprehensive understanding of their interdependencies. BBNs also support decision-making processes by providing a quantitative basis for evaluating interventions and exploring different scenarios.

Probabilistic distributions help in understanding the uncertainty in human performance and its influencing factors. These distributions provide a means to quantify the likelihood of different error scenarios. BBNs, enriched with probabilistic distributions, allow for modeling complex relationships between variables. This is particularly helpful in scenarios where little data is available [10].

In this paper, we model the factors that lead to human error. We introduce technical factors that contribute to human error and model the impact of security incidents on these technical factors. Probability distributions are used with certain practical assumptions to overcome the lack of data. The fundamentals of the model implementation are shown, and their importance is highlighted. A use case of a risk propagation BBN model, consisting of component failures and human error consequences, serves as an example of the model's use in risk assessment.

The remainder of the paper is structured as follows: Section 2 provides an overview of related research. The identified factors, their definitions and importance, and their relations are highlighted in Section 3. Section 4 discusses the use and choice of a probability distribution, along with the reasons behind it. In Section 5, a BBN model is presented, demonstrating the interconnections among factors that contribute to the possibility of human errors. The benefits of combining human error assessment with risk assessment are demonstrated using a use case and some scenarios in Section 5.4. Finally, Section 6 concludes the paper and discusses potential avenues for future research, highlighting opportunities to enhance the study further.

2 RELATED RESEARCH

In Industrial Control Systems (ICSs), human factors have been identified as significant contributors to accidents [19]. Various types of human errors are mentioned in the literature, including action errors, checking errors, retrieval errors, transmission errors, diagnostic errors, and decision errors [11]. To address these issues, organizations should focus on identifying the underlying causes of errors. They must take steps to prevent them by implementing new procedures, providing additional training, or redesigning systems or equipment to reduce the risk of errors [23]. However, this focus on listing error types lacks a structured representation in the form of a model.

Human reliability analysis (HRA) has long been a topic of interest. One significant part of the research focuses on the impact of human error and its consequences on operations, such as quality, productivity, safety, and overall loss, while another part focuses on the factors influencing the error [20, 25]. Factors that influence human performance and errors are provided along with a qualitative framework in the literature. The names of the performance factors are derived from methods like the Technique for Human Error-Rate Prediction (THERP), the Human Error Assessment and Reduction Technique (HEART), A Technique for Human Event Analysis (ATHEANA), and the Standardized Plant Analysis Risk Human Reliability Analysis (SPAR-H) [14]. The qualitative approach and broad identification of performance-shaping factors lack quantitative rigor and specific guidelines for practical application in various industries.

BBNs are quantitative probabilistic methods that can handle uncertainty. In ICSs, they have been used for fault detection and diagnosis as well as accident analysis [4, 18, 22]. In human error assessment, they have been used for modeling organizational elements, exploring connections between factors affecting failures, extending established HRA methodologies with BBNs, evaluating dependencies among human error events, and assessing situational awareness [15].

Factors such as the industry, the level of automation, training, and safety culture are claimed to impact human error [24]. A comprehensive methodology is presented for utilizing BBNs to enhance the development of risk models for complex socio-technical systems by including Human and Organizational Factors (HOF) in the analysis. Furthermore, a Success Likelihood Index Model (SLIM) is enhanced using BBNs to estimate the human error probability [13]. However, these BBN-based models have not incorporated security-related incidents for a comprehensive HRA in an interconnected scenario.

In [3], human error probability is again estimated for safety analysis in a nuclear power plant; here, the focus was on enhancing human cognitive reliability with BBNs. In [17], a machine learning-based approach is used to determine the factors leading to human error. A classification system that emphasizes the connection between human errors and the various factors that influence them, such as cognitive functions, organizational aspects, and technological factors, is developed using data sets. Most HRA methods depend on expert judgment, as data availability is limited [20].

To overcome the problem of data availability, [16] proposed using the existing data provided by the Multi-Attribute Technological Accidents Dataset (MATA-D). The factors causing human errors and the results of the errors are estimated using the dataset, and a BBN is used to represent the relations between the factors. The model represents the possible human error (observation, interpretation, planning, or execution). A proposal for using a similar dataset with artificial intelligence (AI) tools and methodology is presented in [12].

An approach that uses simulated data in the form of probabilistic distributions for human reliability analysis with SPAR-H is shown in [10]. The study considered two discrete distributions for each of the eight SPAR-H performance shaping factors (PSFs): one assuming equal likelihood (a uniform discrete distribution) and the other based on frequencies derived from a subjective assessment of the Human Event Repository and Analysis (HERA) database.

From this section, it is clear that HRA has been a topic of interest for many years. However, the impact of security incidents on human error has not been investigated. In this paper, we focus on identifying the factors that influence human error and create a model using BBNs. We also take into consideration the security impact on one or more factors and its influence on human error. To demonstrate the model, we use continuous distributions. The importance and choice of the probabilistic distribution are discussed in Section 4.

3 FACTORS INFLUENCING HUMAN ERROR PROBABILITY

Determining responsibility for human error is complex and contingent on specific circumstances. Generally, responsibility is shared among the individual making the mistake, their supervisor, and the organization as a whole. As outlined in [11], organizational, supervisory, and individual factors influence human error. However, that article omits the technical factors present in an ICS, which are necessary because these factors directly impact human error. A technical factor can be measured for a given time frame based on the operator's or administrator's observations and procedures. Establishing a connection between security impact and human error requires the inclusion of an influential technical factor.

Organizational factors (OF), such as workload, staffing, and safety culture, directly impact individual and supervisory performance. Effective communication and coordination within the organization are crucial for preventing errors caused by miscommunication. A strong safety culture prioritizes security and encourages reporting, minimizing human errors, and enhancing incident response [11, 16]. Support to supervisors, highlighted in Fig. 1 by the OF → SF relation, is crucial, particularly during a security incident.

Supervisory factors (SF), encompassing training, monitoring, and decision-making support, bridge the gap between organizational and individual levels. Adequate supervision mitigates human error, while inadequate supervision can contribute to unnoticed errors. Insufficient training may result in decision-making and communication errors, affecting response effectiveness [11, 19, 24]. Supervisors are critical in monitoring deviations from established procedures during a security incident.

Individual factors (IF), such as knowledge, skills, motivation, and work habits, directly impact individual performance and can be influenced by organizational, supervisory, and technical factors. Insufficient experience or training may contribute to errors and delays in the response. Cognitive factors, including attention and decision-making abilities, are crucial during a security incident. High fatigue and stress levels can impair mental functioning, leading to errors. The motivation and engagement of operators are essential for their diligent performance and adherence to security protocols during the incident response process.

Technical factors (TF) in the context of human error encompass various elements related to the design, functionality, and performance of tools, systems, or technologies. These factors significantly influence the likelihood of human errors and contribute to the overall reliability of a system. They encompass equipment failures, software faults, and access problems.

In a real-time scenario, a security breach, if undetected, can significantly impact these technical factors, leading to compromised equipment reliability, software vulnerabilities, and unauthorized access. Such breaches may introduce errors and disruptions, influencing the overall effectiveness of incident response efforts. Addressing technical factors is therefore essential for mitigating the potential impact of security breaches on human error in complex systems.

After a security incident is investigated, specific changes are made that address all the factors. These include updating security policies and procedures, implementing enhanced security awareness training, upgrading security infrastructure and controls, refining incident response processes, and increasing monitoring and oversight [2, 21]. These changes aim to prevent future incidents, improve security measures, and foster a culture of security awareness and accountability within the organization. For the current model, however, only the impact of security incidents on the technical factors is considered.

Table 1: Influencing factors of human error used in the model
ID | Name | Definition
HE | Human Error | Probability of the occurrence of human error.
OF | Organisational Factors | Refers to specific aspects of the organisation that can increase or decrease HE; an effect of the organisation's leadership and management.
SF | Supervisory Factors | Refers to how supervisors, managers, and team leaders oversee and interact with their subordinates; can have a significant impact on employee performance, job satisfaction, and overall workplace outcomes.
IF | Individual Factors | Refers to the personal characteristics, traits, and attributes of an individual that can influence their actions, decisions, and performance in various situations.
TF | Technical Factors | Refers to the fundamental design, tools, and technologies for human interaction.
SI2 | Security incident impact | Refers to an impact on the system due to a security breach, such as spoofing or equipment failure.
OF_01 | Communication and Coordination | Refers to harmonising the exchange of information; involves strategic planning. Effective communication supports better coordination, thus reducing errors.
OF_02 | Hiring people with competency | Refers to hiring or assigning the right people for the task. Inadequate experience or insufficient education can result in employees not having the necessary skills and knowledge to perform their tasks accurately.
OF_03 | Policy and Procedure | Refers to well-defined, up-to-date, and easy-to-follow policies and procedures set by management to avoid confusion and errors.
OF_04 | Set up safety requirements | Involves specific rules, guidelines, standards, and measures within an organisation or a particular setting to ensure the safety and well-being of people, processes, and the environment.
OF_05 | Resource allocation | Inadequate resources, including tools, equipment, or personnel, can strain employees and make them more prone to errors.
SF_01 | Control and monitoring | Supervisors are responsible for controlling and monitoring individuals and processes and for avoiding conflicts.
SF_02 | Decision-Making Support | Refers to the assistance provided by the supervisor to individuals or organisations to help them make informed and effective decisions.
SF_03 | Workload Management | Properly distributing and managing workloads among employees, addressing workload concerns, and preventing overload are crucial tasks for supervisors.
SF_04 | Procedural Compliance | Refers to the practice of adhering to established procedures, protocols, and guidelines within an organisation or system; supervisors ensure that employees follow the safety procedures and practices set up by the organisation.
SF_05 | Performance Appraisals and Motivation | Conducting fair and constructive performance evaluations while motivating and engaging employees is crucial for productivity.
SF_06 | Training and development | A structured and systematic process of imparting specific knowledge, skills, and information to individuals.
IF_01 | Fatigue and Stress | Refers to the inefficiency of humans performing a task due to mental or physical factors.
IF_02 | Experience, knowledge and skill | Previous experiences can shape an individual's approach to new challenges, and skills gained through training can help in making decisions.
IF_03 | Motivation and Engagement | Refers to the attitude and dedication of an individual in performing the task.
IF_04 | Work Habits and Behavior | Refers to the daily routine of the individual and their approach towards the job.
IF_05 | Task Complexity | Excessive workload, tight deadlines, or complex tasks can lead to errors.
IF_06 | Fulfillment of safety procedures | Refers to the risk-free completion of a particular task.
IF_07 | Cognitive factors | Cognitive factors, including problem-solving skills, decision-making abilities, memory, attention, and reasoning, play a role in an individual's performance.
TF_01 | Equipment failure | Refers to the inability of the equipment to perform its intended function.
TF_02 | Human interface unresponsive | Refers to the ineffectiveness of the human-machine interface in executing commands due to software freezes, system lag, etc.
TF_03 | Access blocked | Refers to restricted access to the system, even though everything else is functioning correctly.
TF_04 | Unavailable personnel | Refers to individuals who are not currently present for a particular purpose or task.
TF_05 | Software fault | Refers to a problem in program code that leads to unexpected failures and vulnerabilities.

Human errors can be classified and organized chronologically based on the cognitive processes involved in human behavior, which can contribute to or mitigate errors depending on how they are executed. There are four categories: perception error (the process of recognizing and interpreting sensory stimuli from the environment), interpretation error (understanding the context of the information), decision-making error (the process of selecting a course of action), and execution error (the implementation of decisions) [19]. For instance, consider an operator sitting in the control room of an automated system. The operator looks at the displayed alert but fails to fully notice or perceive the information sent by the computerized system (perception error). The operator notices the alert and has perceived the information but fails to properly understand the status of the process (interpretation error). The operator understands the status and decides to act, but the decision is wrong (decision-making error). Finally, the operator takes the right decision but executes it wrongly (execution error). Each of these errors results in the necessary action not being taken.

4 PROBABILISTIC DISTRIBUTION

Probabilistic distributions are mathematical representations of the likelihood or probability of different outcomes or values occurring in a given scenario. They are fundamental tools in statistics and probability theory, used to model uncertainty and variability in data and to make predictions or decisions under uncertainty [9]. Probabilistic distributions can be used for decision-making, uncertainty modeling, Bayesian inference, risk assessment, sensitivity analysis, and Monte Carlo simulations [1].

The elements of a probability distribution include the random variable (represents the uncertain outcomes), the sample space (the values the random variable can take), the probability function (provides the probability of the random variable taking a given value), the cumulative distribution function (provides the probability that the random variable is less than or equal to a specific value x), and the parameters (determine the shape and properties of the distribution). A probability value is read from the distribution for a given value of the random variable.

There are various probabilistic distributions based on the behavior of the random variables, namely: continuous (consists of continuous random variables and is described by probability density functions (PDFs)), discrete (consists of discrete random variables and is described by probability mass functions (PMFs)), empirical (derived from observed data and represented using cumulative distribution functions (CDFs)), and customized (user-specific and made up of a combination of distribution types or custom mathematical modeling) [1]. To demonstrate the model, we have chosen continuous distributions, as they capture the variability and uncertainty inherent in the factors.

Some well-known continuous probability distributions are: normal (used for modeling continuous data with a bell-shaped curve), uniform (represents a constant probability over a specified interval), exponential (models the time between events in a Poisson process, often used in reliability analysis), log-normal (results from taking the natural logarithm of a normally distributed variable, used for positively skewed data), and beta (often used for modeling proportions, probabilities, and success rates) [1]. The assignment of a distribution relies on data or expert opinion, but one can assume a probabilistic distribution under certain conditions [1, 8, 10, 26].

Out of the distributions presented, the beta distribution provides a wide range of shapes to represent probabilities. The beta distribution^1 has a bounded support (0, 1), suitable for modeling probabilities, and offers flexibility: its shape can be adjusted from uniform to highly skewed via the shape parameters α and β and the normalization factor B(α, β). It can also be bimodal and serves as a prior distribution in Bayesian statistics, and it can be used for success or failure modeling where the output is binary [1]. The probability density function of the beta distribution is:

\begin{equation} \begin{aligned} f(x|\alpha,\beta) &= \frac{1}{B(\alpha,\beta)} x^{\alpha -1} (1-x)^{\beta -1}, \\ \text{where } B(\alpha,\beta) &= \frac{\Gamma (\alpha)\Gamma (\beta)}{\Gamma (\alpha +\beta)} \end{aligned} \end{equation}
(1)

In our model, the impact of organizational, supervisory, technical, and individual factors on human error can be mitigated when the performance of these factors is favorable. The model also represents the underlying factors that have an indirect impact on human error. If organizational support, supervisory effectiveness, technical reliability, and individual competence are high (reflected by larger values of α relative to β in the beta distribution), the probability of human error tends to decrease. To model the relationship between security incidents and human errors, choosing α > β skews the distribution towards lower human error probabilities. This represents the assumption that the organization emphasizes low human error rates, which aligns with the organizational intention to optimize performance by fostering a supportive culture, effective supervision, reliable technology, and well-trained individuals. Moreover, our model underscores the practicality of emphasizing these factors to enhance overall operational reliability and reduce the likelihood of human errors.
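As a brief numerical illustration (assuming the Beta(6, 2) parameters that appear later in Fig. 3, and the split at 0.5 introduced in Section 5.2), the probability mass below 0.5, interpreted there as the error probability, can be computed in closed form for integer shape parameters:

\begin{equation*} P(X \le 0.5) = I_{0.5}(6, 2) = \sum _{j=6}^{7}\binom{7}{j}\left(\frac{1}{2}\right)^{7} = \frac{7+1}{128} = 0.0625, \end{equation*}

where I_x(α, β) denotes the regularized incomplete beta function. With α > β, most of the probability mass thus lies above 0.5, i.e., on the reliable side.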

5 BAYESIAN NETWORK FOR HUMAN ERROR ESTIMATION

A BBN is a graphical model that utilizes probabilistic reasoning. It comprises a Directed Acyclic Graph (DAG) and Conditional Probabilities (CPs). The DAG represents the causal relationships between nodes using directed edges (DEs). Each node is a random variable with unique identification and name attributes. The DEs show the relations between the variables (parent → child node), and the CPs quantify these causal relationships. By considering the CPs of all nodes in the network, the joint probability distribution of the entire system can be calculated.

The CPs connecting the nodes representing the organizational, supervisory, and individual factors require definition by an expert or can be derived through data analysis. Mathematically, the joint probability distribution is expressed as a product of the conditional probabilities of each node given its parents, as shown in Equation 2. The equation calculates the probability of all variables, denoted as X1, X2, …, Xn, occurring together.

\begin{equation} P(X_1, X_2,..., X_n)=\prod _{i=1}^nP(X_i | parents(X_i)) \end{equation}
(2)
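As a small worked example (a simplified sketch that assumes only the top-level dependency chain used later in Section 5.3, i.e., OF → SF → IF → TF with SI2 → TF, and HE depending on all four factors), Equation 2 factorizes as:

\begin{equation*} P(OF, SF, IF, TF, SI2, HE) = P(OF)\, P(SF \mid OF)\, P(IF \mid SF)\, P(SI2)\, P(TF \mid IF, SI2)\, P(HE \mid OF, SF, IF, TF). \end{equation*}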

5.1 BBN Model

Figure 1
Figure 1: BBN of factors affecting [A] Human error, [B] Organisational errors, [C] Supervisory errors, [D] Individual errors, and [E] Technical errors

In Fig. 1, the complete BBN for human error is divided into five parts ([A] to [E]). It represents the network of nodes that lead to human error and shows the interdependencies between the factors. Part [A] shows the primary factors identified as causing or mitigating human error, along with the relationships between organizational, supervisory, individual, and technical factors. In part [B], the organizational error is influenced by five independent factors. The errors in these factors originate from the organization in the form of processes, communication, or a lack of resources. Organizational errors can have a widespread impact on various aspects of the system, including supervisory and management errors. This is depicted in part [C], where the OF node relates to the SF node (OF → SF). The independent factors shown in part [C] also affect supervisory errors, which could involve mistakes supervisors or managers make in overseeing operations, making decisions, or communicating instructions. Supervisory errors can, in turn, influence the performance of individuals or teams under their supervision.

While human errors [A] encompass errors made by individuals, part [D] lists the factors leading to individual-level mistakes and their causes. These include factors such as fatigue, distraction, lack of training, or personal factors affecting an individual's performance. Technical errors are related to the technical or mechanical aspects of a system. They could result from faults in equipment, software bugs, or design flaws, and can lead to problems during the completion of the process. However, as individuals are part of the process, the availability of individuals is also primarily a technical factor, and this availability is mainly affected by individual aspects. Hence, a DE between IF and TF_04 is shown. The effect of a security incident on the technical factors is also shown in part [E].

5.2 Fundamentals of model use

The model primarily requires the nodes, the probabilities of the parent nodes, and the relations between parent and child nodes (denoted by CPs). Table 1 defines the nodes, while the beta distribution provides the probabilities of the parent nodes, as shown in Fig. 2 (which represents only OF's parent nodes). Each distribution is then divided into two probabilities at 0.5: the probability mass from 0 to 0.5 is the error probability of the factor, while the other half is the reliability value of that factor. A probability distribution can provide a range of values for more resolution.

Figure 2
Figure 2: Beta distribution of Organisational Factors’ parent nodes
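As a minimal sketch of this split (assuming the Beta(6, 2) parameters used in Section 5.3; the parameters per node are a modeling choice), the Err and Rel values of a single parent node can be read from the distribution's CDF at 0.5:

# Minimal sketch: derive Err/Rel for one parent node (e.g., OF_01)
# from a beta distribution split at 0.5. Beta(6, 2) is an assumed
# parameter choice, matching the distribution referenced in Fig. 3.
from scipy.stats import beta

a, b = 6, 2
err = beta.cdf(0.5, a, b)  # mass in [0, 0.5]: error probability
rel = 1 - err              # mass in (0.5, 1]: reliability

print(f"Err = {err:.4f}, Rel = {rel:.4f}")  # Err = 0.0625, Rel = 0.9375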

Algorithm 1 is designed to generate a CP distribution based on conditions involving two factors, referred to as factor1 and factor2. These factors are binary, taking the values 0 (error) or 1 (no error), and the algorithm computes the probability of error (Err) for the different combinations of these factors.

The outer loop iterates over the possible values of factor1 (0 and 1), while the inner loop iterates over the possible values of factor2 (0 and 1) for each value of factor1. Within the loops, a series of conditional statements determines the values of the variables Rel (reliability of the factor) and Err (error probability of the factor) based on the specific combination of factor1 and factor2, as represented in Algorithm 1. The algorithm can be further generalized to generate CPs for more factors.

Algorithm 1
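As the listing itself is not reproduced here, the following is a hedged Python sketch of the idea behind Algorithm 1; the combination rule (joint reliability as the product of the parents' effective reliabilities) is our illustrative assumption, not necessarily the exact conditional rules of the algorithm:

# Hedged sketch of the CP-generation loop described above; the product
# rule for the joint reliability is an assumption for illustration.
def generate_cp(rel1: float, rel2: float) -> dict:
    """Build a conditional probability table for a child node with two
    binary parents; rel1/rel2 are the parents' reliability values
    (e.g., obtained from the beta split at 0.5)."""
    cpt = {}
    for factor1 in (0, 1):        # outer loop: 0 = error, 1 = no error
        for factor2 in (0, 1):    # inner loop over the second factor
            # Effective reliability contributed by each parent state:
            r1 = rel1 if factor1 == 1 else 1 - rel1
            r2 = rel2 if factor2 == 1 else 1 - rel2
            rel = r1 * r2         # assumed joint reliability (Rel)
            err = 1 - rel         # error probability of the child (Err)
            cpt[(factor1, factor2)] = {"Err": err, "Rel": rel}
    return cpt

# Example with two parents whose reliability comes from Beta(6, 2):
print(generate_cp(0.9375, 0.9375))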

The algorithm's output is the Err and Rel values of the factor required by the next child node. The algorithm systematically defines the probability of error based on the interaction of the factors (of which there can be many). The output represents the probability of error for different scenarios, considering the reliability values associated with the individual factors and their joint reliability. To ensure the accuracy and relevance of our output values, we have chosen to construct distinct BBN models for each primary factor illustrated in Fig. 1. These separate models help to reduce complexity and encompass the OF, SF, IF, and TF factors, as well as the ultimate Human Error (HE). The rationale behind this approach is to enhance the precision of our predictions by individually addressing the unique characteristics and dependencies within each factor. Importantly, this modular design allows the output of one model, such as OF, to serve as input for another model, like SF. This interconnected structure enables a more nuanced understanding of how different factors influence each other and contribute to human errors in the system. By adopting this strategy, we aim to capture the intricate relationships between factors and enhance the overall robustness and accuracy of our Bayesian network models.

5.3 Result

To implement the idea of human error estimation, a common platform is necessary to demonstrate the probability distribution and the BBN. Python provides powerful tools for working with probability distributions, including the beta distribution. The scipy.stats module offers a comprehensive implementation of the beta distribution through the beta class, whose parameters alpha and beta define the shape of the distribution. Fig. 3 shows the beta function (Beta(6, 2)) in the left corner of each distribution. For the BBN, the pgmpy library simplifies the implementation, enabling easy network structure definition, CPD fitting, and efficient inference. A Bayesian network is created with defined edges, and CPs are fitted to sample data using the ParameterEstimator. The VariableElimination method allows for efficient probabilistic inference.

The implementation of the Python program is done using the following steps:

  1. Import necessary libraries.
  2. Define nodes and edges of the Bayesian network.
  3. Create beta distribution for the specified alpha and beta parameters.
  4. Calculate the probability of error and reliability values from the distribution.
  5. Create separate Bayesian network models for factors (OF, IF, SF, TF, and HE).
  6. Define CPs for the network. It might be necessary to define the output of one model as an input for another model.
  7. Add CPs to the model, including the output of another model.
  8. Perform Variable Elimination inference on the Bayesian Network to calculate the probability.
  9. Visualize the graph and results if needed.
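The following is a minimal, hedged sketch of these steps for the OF sub-model, reduced to two of its five parent nodes for brevity. The CP values for OF given its parents are illustrative placeholders (in the full model they come from Algorithm 1), and the class name BayesianNetwork may be BayesianModel in older pgmpy releases:

from scipy.stats import beta
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Steps 3-4: priors from the beta split at 0.5 (state 0 = error, 1 = no error)
err = beta.cdf(0.5, 6, 2)   # 0.0625
rel = 1 - err

# Steps 2 and 5: structure of a reduced OF sub-model (two parents only)
model = BayesianNetwork([("OF_01", "OF"), ("OF_02", "OF")])

# Steps 6-7: CPs; the table for OF uses illustrative placeholder values
cpd_of01 = TabularCPD("OF_01", 2, [[err], [rel]])
cpd_of02 = TabularCPD("OF_02", 2, [[err], [rel]])
cpd_of = TabularCPD(
    "OF", 2,
    # columns: (OF_01, OF_02) = (0,0), (0,1), (1,0), (1,1)
    [[0.99, 0.60, 0.60, 0.05],   # P(OF = error | parents)
     [0.01, 0.40, 0.40, 0.95]],  # P(OF = no error | parents)
    evidence=["OF_01", "OF_02"], evidence_card=[2, 2],
)
model.add_cpds(cpd_of01, cpd_of02, cpd_of)
assert model.check_model()

# Step 8: variable elimination yields the organisational error probability
oep = VariableElimination(model).query(["OF"]).values[0]
print(f"OEP = {oep:.4f}")  # this value would then feed the SF sub-model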

Fig. 3 provides the visual result of the probability of error (Error = True) and reliability (Error = False) for organizational, supervisory, individual, and technical factors, and for human error. The results in Fig. 3 are obtained based on the beta distributions of the independent factors (e.g., OF_01 to OF_05, SF_01 to SF_06, IF_01 to IF_07, or SI2) and the obtained results of the dependent factors (e.g., OF, SF, IF, TF, or HE).

The probability of organizational error (OEP) is calculated first, as its parent nodes are independent and rely only on the beta distributions of the nodes. Based on the selected beta distributions of OF_01 to OF_05 and the conditional probabilities from Algorithm 1, the OEP value for error is 0.0916. The calculated OEP is one of the inputs for calculating the probability of supervisory error (SEP). The SEP is 0.0662 for the beta distributions of SF_01 to SF_06 and the OEP. Similarly, the probability of individual error (IEP) is 0.2046 for the beta distributions of IF_01 to IF_07 and the SEP. The probability of technical error (TEP) is 0.19 for the beta distributions of SI2 and TF_01 to TF_05, and the IEP. Finally, the probability of human error (HEP) is calculated using the values of OEP, SEP, IEP, and TEP.

Figure 3
Figure 3: Resulting probability of Organisational, Supervisory, Individual, Technical errors, and Human errors

5.4 Importance and Limitation of Model

The BBN model, designed to determine human error, holds substantial importance across diverse industries, particularly in safety, reliability, and risk assessment. Human behavior is inherently uncertain and influenced by a multitude of factors. The BBN model helps address this problem by representing the various factors as nodes and relations and by providing a propagation flow for the causes of human error. One of the critical advantages of BBNs is their dynamic updating capability. As new data becomes available or the understanding of the system evolves, BBNs can be refined to enhance accuracy over time. A BBN for HRA helps handle incomplete data by integrating expert judgment, considers dependencies among factors using joint probabilities, and provides easy-to-understand acyclic graphs for interdisciplinary communication [16]. The ability to update probabilities with new information, identify influential factors in human errors, and conduct "what if" scenario analyses makes Bayesian networks a versatile and powerful tool in HRA [3]. BBNs contribute significantly to risk assessment and mitigation strategies [5, 6]. By quantifying the probability of human error in a system, these models assist in identifying factors with higher probabilities of contributing to errors.

To demonstrate the use of the human error estimator model, we show an example of a production system's risk assessment performed using BBN, published in [6]. Here, we show the risk propagation of a part called the stacking module, together with some safety incidents caused by human error, as shown in Fig. 4. The stacking module is responsible for ejecting the workpiece onto the conveyor belt to begin the process. It utilizes solenoid-controlled cylinders and proximity sensors to push out the workpieces, while a control valve regulates the cylinder's speed. The workpieces are stored in a vertical magazine, with an IO-Link sensor measuring the stack's height. Light barriers and level sensors indicate when the magazine is empty. The stacking module transfers the workpiece onto a conveyor belt, controlled by a DC motor and optical sensors for position detection. Automation is handled by a programmable logic controller (PLC) (Simatic S7), with an HMI (Simatic TP 700) displaying the module statuses.

Figure 4
Figure 4: BBN based risk propagation of Stacking module in distribution system

In part A of Fig. 4, the relation between the hazards, the components, and the users (operator or maintenance personnel) is highlighted. Part B deals with the user's work, which concerns their safety. A human error in part A can lead to process-related issues, while one in part B can lead to injury or death of the user. The risk propagation can be divided into the normal automation function of the system, as programmed in the PLC, and the human function of assisting in completing the task. The probability of failure of automation components (PLC, sensors, and actuators) depends on factors like the Mean Time To Failure (MTTF) or the failure rate. The module's operation also depends on the users (operator and maintenance personnel); failure to perform the necessary actions can lead to an incomplete process or danger to the users. The rate at which a user causes an error can be determined by the model presented in Fig. 1. For instance, if the operators lack adequate training in operating the system and understanding security protocols, the chances of errors and security vulnerabilities increase. In a safety incident, the operator can only respond effectively if there is proper training in safety procedures or suitable communication protocols.

When employees perceive a lack of fair treatment, limited growth opportunities, or feel undervalued (bad management), it can create discontent and resentment. This hostile work environment may motivate individuals to engage in malicious activities, including insider attacks [7]. It is important to note that a security incident can increase the likelihood of human error within an organization. It can also impact the IF. For instance, when employees face the aftermath of a security breach or incident, they may experience heightened stress, anxiety, and distraction. However, this aspect is not covered in the current model. The model only addresses security breaches leading to technical errors, focusing on how breaches directly cause technical failures or malfunctions; it does not account for the psychological impact on individuals. The model is designed to deal with the real-time impact of security breaches on technical errors rather than the broader organizational and human factors that can be indirectly influenced by security attacks. The broader impact of security attacks on IF will be covered in future enhancements.

6 CONCLUSION AND FUTURE WORK

Human error and the factors leading to it have long been a concern, as they are among the most dominant issues for safe and secure operation in ICS settings [19, 24]. As these systems have grown in sophistication, human involvement in ICS environments has become increasingly crucial for ensuring the safe and efficient operation of industrial processes. Addressing human error can be complex and challenging. This paper explores the various factors contributing to errors, including lack of training, inadequate knowledge, fatigue, distractions, and procedural deviations. Security attacks compound the risk by potentially manipulating data and introducing errors into decision-making processes.

A model is developed using BBNs, and the paper demonstrates the application of BBNs to model the factors leading to human errors, highlighting the impact of security incidents. The model is demonstrated using probability distributions with practical assumptions to overcome the lack of data. The paper underscores the fundamental importance of implementing such models to gain insights into risk propagation, allowing for a comprehensive understanding of component failures and human error consequences in risk assessment. The model for human error assessment can be used jointly with safety, security, or an integrated risk assessment (as pointed out in the examples in Section 5.4), enhancing risk assessment practices and ultimately contributing to the robustness and resilience of industrial processes.

When using BBNs for modeling human error, one needs to consider that BBNs are inherently complex, and their complexity increases with the number of nodes. To limit complexity and enhance accuracy, we have used multiple models, where the output of one can be used as an input for another. The challenge and limitation of comprehensively identifying all pertinent factors can potentially result in the omission of some contributors. However, a sincere effort has been made to incorporate as many relevant factors from different relevant research as possible in our BBN models. Assumptions about probability distributions are fundamental, but the challenge lies in accurately specifying these distributions, particularly in data-limited scenarios. Another potential limitation of the model is its reliance on expert knowledge for the relationships between the factors. Future steps involve continuous improvement of the BBN model to address any irregularities in the influencing factors and relations by leveraging advanced data analysis techniques and combining risk assessment with human error assessment. The work can be further enhanced by introducing the types of human error, such as decision, perception, and execution errors, which would offer more granularity regarding the error that occurred.

ACKNOWLEDGMENTS

This paper is supported by '#SafeSecLab', a joint venture of TU Wien and TÜV AUSTRIA for research in Safety and Security in Industry.

REFERENCES

FOOTNOTE

1 https://www.omnicalculator.com/statistics/beta-distribution

This work is licensed under a Creative Commons Attribution International 4.0 License.

ARES 2024, July 30–August 02, 2024, Vienna, Austria

© 2024 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-1718-5/24/07.
DOI: https://doi.org/10.1145/3664476.3670875