
MACHINE LEARNING FOR DISEASE PREDICTION BY USING NEURAL NETWORKS

2019, INTERNATIONAL JOURNAL OF RESEARCH AND ANALYTICAL REVIEWS (IJRAR)


P. Sunanda
Asst. Professor, Department of Computer Science & Engineering, G. Pulla Reddy Engineering College (Autonomous), GPREC, Kurnool, India

Abstract: With the rapid growth of data in the biomedical and healthcare communities, accurate analysis of medical data benefits early disease detection, patient care, and community services. However, analysis accuracy is reduced when the medical data are incomplete. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. Machine learning algorithms are used for the effective prediction of chronic diseases such as cardiac arrhythmia, a group of conditions in which the electrical activity of the heart is irregular, faster than normal, or slower than normal. Heart disease is the leading cause of death for both men and women worldwide. For optimization, the Stochastic Gradient Descent algorithm is used.

Index Terms - Machine Learning, Neural Networks, Stochastic Gradient Descent.

I. INTRODUCTION

1.1 Machine Learning

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations of data, such as examples, direct experience, or instruction, in order to look for patterns in the data and make better decisions in the future based on the examples provided. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.

Fig 1.1 Machine learning classification

Machine learning algorithms can be divided into two broad categories:
1. Supervised Learning
2. Unsupervised Learning

1. Supervised learning is useful in cases where a property (label) is available for a certain dataset (the training set) but is missing and must be predicted for other instances. Supervised learning algorithms include decision trees, the Naive Bayes classifier, the nearest-neighbor algorithm, and Support Vector Machines.
1) Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
2) Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
3) The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
- In k-NN classification, the output is a class membership.
- In k-NN regression, the output is the property value for the object.

2. Unsupervised learning is useful in cases where the challenge is to discover implicit relationships in a given unlabeled dataset. Unsupervised learning algorithms include k-means, hierarchical clustering, and hidden Markov models.
1) The k-means clustering algorithm is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
2) In data mining and statistics, hierarchical clustering, also called hierarchical cluster analysis (HCA), is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
- Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
- Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
3) A Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e., hidden) states.
A short code sketch contrasting one supervised and one unsupervised algorithm from the lists above follows.
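The sketch below pairs k-NN (supervised: labels are given) with k-means (unsupervised: no labels). scikit-learn is an assumed tooling choice here; the paper names no library for this section, and the data points are illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.1], [1.2, 0.9], [8.0, 8.2], [7.9, 8.1]])

# Supervised: k-NN predicts a label from the k closest labeled examples.
y = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[8.0, 8.0]]))   # -> [1], the majority label of its 3 neighbors

# Unsupervised: k-means partitions the unlabeled points around k cluster means.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                   # two clusters of two points each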
1.1.1 Why Do Machine Learning On Different Data?

Traditional analytics tools are not well suited to capturing the full value of data. The volume of data is too large for comprehensive analysis, and the range of potential correlations and relationships between disparate data sources, from back-end customer databases to live web-based click streams, is too great for any analyst to test every hypothesis and derive all the value buried in the data. Basic analytical methods used in business intelligence and enterprise reporting tools reduce to reporting sums, counts, and simple averages, and to running SQL queries. Online analytical processing is merely a systematized extension of these basic analytics that still relies on a human to direct activities and specify what should be calculated. Unlike traditional analysis, machine learning thrives on growing datasets: the more data fed into a machine learning system, the more it can learn, and the higher the quality of the insights it produces.

1.2 Neural Networks

Neural networks are an example of machine learning in which the output of the program can change as it learns. A neural network can be trained and improves with each example, but the larger the neural network, the more examples it needs to perform well, often millions or billions of examples in the case of deep learning. A network starts with an input, somewhat like a sensory organ. Information then flows through layers of neurons, where each neuron is connected to many other neurons. If a particular neuron receives enough stimuli, it sends a message through its axon to the other neurons it is connected to. Similarly, an artificial neural network has an input layer of data, one or more hidden layers of classifiers, and an output layer. Each node in a hidden layer is connected to nodes in the next layer. When a node receives information, it passes some amount of it along to the nodes it is connected to; the amount is determined by a mathematical function called an activation function, such as the sigmoid or tanh. A neural network thus takes several inputs, processes them through multiple neurons in multiple hidden layers, and returns a result through the output layer. This result-estimation process is technically known as Forward Propagation. Next, the result is compared with the actual output; the task is to make the network's output as close as possible to the actual (desired) output. Each neuron contributes some error to the final output. To reduce the error, the weights of the neurons that contribute most to the error are adjusted; this happens while traveling back through the network to find where the error lies, a process known as Backward Propagation. To reduce the number of iterations needed to minimize the error, neural networks use a common algorithm known as Gradient Descent, which optimizes the task quickly and efficiently.

1.2.1 Components of Neural Networks

- Weighting Factors: A neuron usually receives many simultaneous inputs. Each input has its own relative weight, which gives the input the impact it needs on the processing element's summation function. These weights perform the same type of function as the varying synaptic strengths of biological neurons. Weights are adaptive coefficients within the network that determine the intensity of the input signal as registered by the artificial neuron; they are a measure of an input's connection strength. These strengths can be modified in response to various training sets, according to a network's specific topology, or through its learning rules.
- Summation Function: The first step in a processing element's operation is to compute the weighted sum of all of its inputs. Mathematically, the inputs and the corresponding weights are vectors, which can be represented as (i1, i2, ..., in) and (w1, w2, ..., wn). The total input signal is the dot (inner) product of these two vectors: each component of the i vector is multiplied by the corresponding component of the w vector, and the products are summed, i.e., input1 = i1 * w1, input2 = i2 * w2, and so on, added together as input1 + input2 + ... + inputn. The result is a single number, not a multi-element vector. Geometrically, the inner product of two vectors can be considered a measure of their similarity: if the vectors point in the same direction, the inner product is maximal; if they point in opposite directions (180 degrees out of phase), it is minimal.
- Transfer Function: The result of the summation function, almost always the weighted sum, is transformed into a working output through an algorithmic process known as the transfer function. In the transfer function, the summation total can be compared with some threshold to determine the neural output. If the sum is greater than the threshold value, the processing element generates a signal; if it is less than the threshold, no signal (or an inhibitory signal) is generated. Both types of response are significant.
- Output Function (Competition): Each processing element is allowed one output signal, which it may send to hundreds of other neurons. This is just like the biological neuron, which has many inputs and only one output action. Normally, the output is directly equivalent to the transfer function's result.
- Error Function and Back-Propagated Value: In most learning networks, the difference between the current output and the desired output is calculated. This raw error is then transformed by the error function to match the particular network architecture. The most basic architectures use this error directly; some square the error while retaining its sign, some cube the error, and other paradigms modify the raw error to fit their specific purposes. The artificial neuron's error is then typically propagated into the learning function of another processing element. This error term is sometimes called the current error, and it is typically propagated backwards to a previous layer. The back-propagated value can be the current error, the current error scaled in some manner (often by the derivative of the transfer function), or some other desired output, depending on the network type. Normally, this back-propagated value, after being scaled by the learning function, is multiplied against each of the incoming connection weights to modify them before the next learning cycle.
- Learning Function: The purpose of the learning function is to modify the variable connection weights on the inputs of each processing element according to some neural-based algorithm. This process of changing the weights of the input connections to achieve some desired result is also called the adaption function, or the learning mode.
A single-neuron sketch illustrating the summation and transfer functions follows this list.
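To make the components above concrete, here is a minimal single-neuron sketch in Python: the weighting factors and summation function appear as a dot product, and a hard threshold serves as the transfer function. The input and weight values are illustrative, not from the paper.

import numpy as np

def neuron(inputs, weights, threshold=0.0):
    total = np.dot(inputs, weights)        # summation function: inner product
    return 1 if total > threshold else 0   # transfer function: hard threshold

i = np.array([0.5, -1.0, 0.25])   # simultaneous inputs
w = np.array([0.8, 0.2, 1.0])     # adaptive weights (connection strengths)
print(neuron(i, w))               # 0.5*0.8 - 1.0*0.2 + 0.25*1.0 = 0.45 -> 1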
1.2.2 Multi-Layer Perceptron and Its Basics

Perceptron
Just as atoms are the basic building blocks of any material on earth, the basic building unit of a neural network is the perceptron. A perceptron can be understood as anything that takes multiple inputs and produces one output. There are three ways of creating input-output relationships:
1. Directly combining the inputs and computing the output based on a threshold value. For example, take x1 = 0, x2 = 1, x3 = 1 and set a threshold of 0. If x1 + x2 + x3 > 0, the output is 1; otherwise it is 0. In this case, the perceptron computes an output of 1.
2. Adding weights to the inputs. Weights give importance to an input. For example, assign w1 = 2, w2 = 3, and w3 = 4 to x1, x2, and x3 respectively. To compute the output, multiply each input by its respective weight and compare the sum with the threshold value: w1*x1 + w2*x2 + w3*x3 > threshold. These weights assign more importance to x3 than to x1 and x2.
3. Adding a bias. Each perceptron also has a bias, which can be thought of as a measure of how flexible the perceptron is. It is analogous to the constant b of a linear function y = ax + b, which allows the line to move up and down to fit the data better; without b, the line always passes through the origin (0, 0), and the fit may be poorer. For example, a perceptron with two inputs requires three weights: one for each input and one for the bias. The linear representation of the input then looks like w1*x1 + w2*x2 + 1*b.
All of this is still linear, which is what perceptrons used to be. So the perceptron was evolved into what is now called the artificial neuron, which applies a non-linear transformation (an activation function) to the weighted inputs and bias.

What is an activation function? An activation function takes the weighted sum of the inputs (w1*x1 + w2*x2 + w3*x3 + 1*b) as its argument and returns the output of the neuron. Writing the bias b as w0 with a constant input x0 = 1, the neuron's output is

a = f( Σ_{i=0}^{n} w_i * x_i )

The activation function is mostly used to apply a non-linear transformation, which makes it possible to fit non-linear hypotheses and to estimate complex functions. There are multiple activation functions, such as the sigmoid and tanh. A small numeric sketch of these computations follows.
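The perceptron example above can be checked numerically. The sketch below uses the values from the text (x1 = 0, x2 = 1, x3 = 1 with w1 = 2, w2 = 3, w3 = 4); the bias value b = -5 is a hypothetical addition to show the sigmoid producing a graded, non-binary output.

import numpy as np

x = np.array([0, 1, 1])
w = np.array([2, 3, 4])
threshold = 0.0
print(int(w @ x > threshold))      # 0*2 + 1*3 + 1*4 = 7 > 0 -> 1

b = -5.0                           # hypothetical bias, folded in as w0 with x0 = 1
z = np.dot(w, x) + 1 * b           # weighted sum plus bias: 7 - 5 = 2
a = 1.0 / (1.0 + np.exp(-z))       # sigmoid activation
print(round(a, 3))                 # sigmoid(2.0) ≈ 0.881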
Multi-layer perceptron
Generally, a single-layer network consists of an input layer with three input nodes, x1, x2, and x3, and an output layer with a single neuron. For practical purposes, a single-layer network can do only so much. An MLP consists of multiple layers, called hidden layers, stacked between the input layer and the output layer, as shown below.

Fig 1.2 Multilayer perceptron

The figure shows a single hidden layer, but in practice an MLP can contain multiple hidden layers. Another point to remember about an MLP is that all the layers are fully connected: every node in a layer (except those in the input and output layers) is connected to every node in the previous layer and in the following layer. The Gradient Descent algorithm is used to minimize the error during training.

Full Batch and Stochastic Gradient Descent
Both variants of Gradient Descent update the weights of the MLP with the same update rule; the difference lies in the number of training samples used per update of the weights and biases. Full Batch Gradient Descent, as the name implies, uses all the training data points to update each of the weights once, whereas Stochastic Gradient Descent uses one or more samples, but never the entire training set, per update. Consider a simple dataset of 10 data points and two weights, w1 and w2. Full batch uses all 10 data points (the entire training set) to calculate the changes in w1 (Δw1) and w2 (Δw2) and then updates w1 and w2 once. SGD instead uses the first data point to calculate Δw1 and Δw2 and updates w1 and w2; when the second data point is used, it works on the already-updated weights. A minimal sketch of the two update schemes follows.
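The following NumPy sketch contrasts the two schemes on a hypothetical 10-point dataset with two weights; the data, targets, and learning rate are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))             # 10 data points, one column per weight
y = X @ np.array([2.0, -3.0])            # hypothetical linear targets
lr = 0.1

# Full batch: one update per pass, computed from all 10 points at once.
w = np.zeros(2)
grad = -2 * X.T @ (y - X @ w) / len(X)   # gradient of the mean squared error
w -= lr * grad                            # w1 and w2 updated once

# Stochastic: one update per data point; each point sees the latest weights.
w_sgd = np.zeros(2)
for xi, yi in zip(X, y):
    grad_i = -2 * xi * (yi - xi @ w_sgd) # gradient at a single point
    w_sgd -= lr * grad_i                 # the next point works on updated weights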
1.2.3 Steps Involved in the Neural Network Methodology

The step-by-step methodology for building a neural network (an MLP with one hidden layer, similar to the architecture shown above) is as follows. At the output layer, a single neuron is used to solve a binary classification problem (predict 0 or 1). We take X as the input matrix and y as the output matrix.

1) Initialize weights and biases with random values (this is a one-time initialization; subsequent iterations use the updated weights and biases). Define:
- wh as the weight matrix of the hidden layer
- bh as the bias matrix of the hidden layer
- wout as the weight matrix of the output layer
- bout as the bias matrix of the output layer
2) Take the matrix dot product of the input and the weights assigned to the edges between the input and hidden layers, then add the biases of the hidden-layer neurons to the respective inputs. This is known as a linear transformation:
hidden_layer_input = matrix_dot_product(X, wh) + bh
3) Perform a non-linear transformation using an activation function (sigmoid). The sigmoid returns 1/(1 + exp(-x)):
hiddenlayer_activations = sigmoid(hidden_layer_input)
4) Perform a linear transformation on the hidden-layer activations (take the matrix dot product with the output weights and add the bias of the output-layer neuron), then apply an activation function (sigmoid again here, but any other activation function can be used, depending on the task) to predict the output:
output_layer_input = matrix_dot_product(hiddenlayer_activations, wout) + bout
output = sigmoid(output_layer_input)
The steps up to this point are known as "Forward Propagation".
5) Compare the prediction with the actual output and calculate the error. The error is the mean squared loss, ((y - output)^2)/2, whose (negative) gradient with respect to the output is Actual - Predicted:
E = y - output
6) Compute the slope (gradient) of the hidden- and output-layer neurons by taking the derivative of the non-linear activation at each layer for each neuron. The gradient of the sigmoid can be written in terms of its output x as x * (1 - x):
slope_output_layer = derivatives_sigmoid(output)
slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)
7) Compute the change factor (delta) at the output layer, which depends on the error multiplied by the slope of the output-layer activation:
d_output = E * slope_output_layer
8) At this step, the error propagates back into the network, giving the error at the hidden layer. For this, take the dot product of the output-layer delta with the weights on the edges between the hidden and output layers (wout.T):
Error_at_hidden_layer = matrix_dot_product(d_output, wout.Transpose)
9) Compute the change factor (delta) at the hidden layer by multiplying the error at the hidden layer with the slope of the hidden-layer activation:
d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer
10) Update the weights at the output and hidden layers using the errors calculated for the training example(s):
wout = wout + matrix_dot_product(hiddenlayer_activations.Transpose, d_output) * learning_rate
wh = wh + matrix_dot_product(X.Transpose, d_hiddenlayer) * learning_rate
(learning_rate: a configuration parameter that controls the amount by which the weights are updated.)
11) Update the biases at the output and hidden layers using the deltas of that layer, summed row-wise and scaled by the learning rate:
bh = bh + sum(d_hiddenlayer, axis=0) * learning_rate
bout = bout + sum(d_output, axis=0) * learning_rate
Steps 5 to 11 are known as "Backward Propagation". A consolidated NumPy sketch of these steps appears below.
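The eleven steps above can be consolidated into a short, runnable NumPy sketch. The variable names follow the steps; the toy data, hidden-layer size, learning rate, and epoch count are illustrative assumptions rather than values from the paper.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def derivatives_sigmoid(x):
    return x * (1 - x)   # sigmoid gradient, expressed in terms of its output x

# Toy data: 3 samples, 4 features, binary labels
X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
y = np.array([[1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
hidden_units, learning_rate = 3, 0.1
wh, bh = rng.uniform(size=(4, hidden_units)), rng.uniform(size=(1, hidden_units))   # step 1
wout, bout = rng.uniform(size=(hidden_units, 1)), rng.uniform(size=(1, 1))

for epoch in range(5000):
    # Steps 2-4: forward propagation
    hiddenlayer_activations = sigmoid(X @ wh + bh)
    output = sigmoid(hiddenlayer_activations @ wout + bout)

    # Steps 5-9: backward propagation
    E = y - output
    d_output = E * derivatives_sigmoid(output)
    Error_at_hidden_layer = d_output @ wout.T
    d_hiddenlayer = Error_at_hidden_layer * derivatives_sigmoid(hiddenlayer_activations)

    # Steps 10-11: update weights and biases
    wout += hiddenlayer_activations.T @ d_output * learning_rate
    bout += d_output.sum(axis=0, keepdims=True) * learning_rate
    wh += X.T @ d_hiddenlayer * learning_rate
    bh += d_hiddenlayer.sum(axis=0, keepdims=True) * learning_rate

print(np.round(output, 3))   # approaches [[1], [1], [0]] as training proceeds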
II. EXISTING SYSTEM

The limitations of the existing system are:
- Updating the model so frequently is more computationally expensive than other configurations of gradient descent, and it takes significantly longer to train models on large datasets.
- The frequent updates can result in a noisy gradient signal, which may cause the model parameters, and in turn the model error, to jump around (to have a higher variance over training epochs).
- The existing system implements supervised algorithms, which are less efficient than neural networks.

III. PROPOSED SYSTEM

In the existing work, a supervised machine learning algorithm is used. In the proposed system, neural networks are implemented. For optimization, the stochastic gradient descent algorithm is used, which is computationally efficient. For classification of the data, a Multi-Layer Perceptron (MLP) classifier is used.

3.1 Design and Implementation

Design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements.

3.1.1 Design
The System Design Document describes the system requirements, operating environment, system and subsystem architecture, files and database design, input formats, output layouts, human-machine interfaces, detailed design, processing logic, and external interfaces.

3.1.2 Data Flow Diagram
Fig 3.1 Data flow diagram

3.1.3 Activity Diagram
Fig 3.2 Activity diagram

3.2 Implementation

3.2.1 Implementation Models
- Optimization
- Classification of data using the MLP classifier
- Prediction

Optimization: Stochastic Gradient Descent Algorithm
Stochastic gradient descent is a variation of the gradient descent algorithm that calculates the error and updates the model for each example in the training dataset. Because the model is updated for each training example, stochastic gradient descent is often called an online machine learning algorithm.

Fig 3.3 Stochastic Gradient Descent Algorithm

A sketch of the classification module using an MLP classifier follows.
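As a sketch of the proposed classification module, the snippet below uses scikit-learn's MLPClassifier with the 'sgd' solver. This is an assumed implementation route: the paper does not show its code, and the features and labels here are random placeholders rather than the actual arrhythmia dataset.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X = np.random.rand(500, 12)            # hypothetical patient features
y = np.random.randint(0, 2, 500)       # hypothetical labels: 1 = arrhythmia

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train) # standardize features before training

clf = MLPClassifier(hidden_layer_sizes=(32,), solver="sgd",
                    learning_rate_init=0.01, max_iter=500, random_state=0)
clf.fit(scaler.transform(X_train), y_train)
print("accuracy:", clf.score(scaler.transform(X_test), y_test))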
3.2.2 Installation of Python on Ubuntu

Ubuntu 16.04 ships with both Python 3 and Python 2 pre-installed. To make sure the versions are up to date, update and upgrade the system with apt-get:
$ sudo apt-get update
$ sudo apt-get -y upgrade
The -y flag confirms agreement for all items to be installed, but depending on the version of Linux, additional prompts may need to be confirmed as the system updates and upgrades. Once the process is complete, check the version of Python 3 installed on the system by typing:
$ python3 -V
The terminal window will display the version number. The version number may vary, but it will look similar to this:
Python 3.5.2
To manage software packages for Python, install pip:
$ sudo apt-get install -y python3-pip
pip installs and manages programming packages that may be needed in development projects. Python packages can be installed by typing:
$ pip3 install package_name
Here, package_name can refer to any Python package or library, such as Django for web development or NumPy for scientific computing. For example, NumPy can be installed with pip3 install numpy. A few more packages and development tools ensure a robust set-up for the programming environment:
$ sudo apt-get install build-essential libssl-dev libffi-dev python-dev

Keras
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Keras allows for easy and fast prototyping (through user friendliness, modularity, and extensibility). It supports both feed-forward and recurrent networks, as well as combinations of the two, and it runs seamlessly on CPU and GPU. Keras is compatible with Python 2.7-3.6.

Guiding principles
- User friendliness: Keras is an API designed for human beings, not machines. It puts user experience front and center. Keras follows best practices for reducing cognitive load: it offers consistent and simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear and actionable feedback upon user error.
- Modularity: A model is understood as a sequence or a graph of standalone, fully configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that can be combined to create new models.
- Easy extensibility: New modules are simple to add (as new classes and functions), and existing modules provide ample examples. Being able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- Work with Python: There are no separate model configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and easier to extend.

Installation
Before installing Keras, install one of its backend engines: TensorFlow, Theano, or CNTK. The TensorFlow backend is recommended. Optional dependencies to consider:
- cuDNN (recommended if Keras will run on a GPU)
- HDF5 and h5py (required for saving Keras models to disk)
- graphviz and pydot (used by the visualization utilities to plot model graphs)
Then install Keras itself. There are two ways to install Keras:
1. Install Keras from PyPI (recommended):
sudo pip install keras
2. If you are using a virtualenv, you may want to avoid using sudo:
pip install keras

Using a different backend than TensorFlow
By default, Keras uses TensorFlow as its tensor manipulation library. Follow the Keras backend instructions to configure another backend.

TensorFlow installation through native pip:
$ sudo apt-get install python-pip python-dev   # for Python 2.7
$ pip install tensorflow                       # Python 2.7; CPU support (no GPU support)
$ pip install numpy
$ pip install scipy
$ pip install matplotlib
A minimal Keras model definition for the proposed approach is sketched below.
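For completeness, a minimal Keras model compiled with the SGD optimizer is sketched below, using the Keras 2.x API current at the time of the paper. The layer sizes and input dimension are illustrative assumptions, not values from the paper.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(32, activation="sigmoid", input_dim=12))  # hidden layer
model.add(Dense(1, activation="sigmoid"))                 # single output neuron (0/1)

model.compile(optimizer=SGD(lr=0.01), loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=100, batch_size=1)   # batch_size=1 gives per-sample (stochastic) updates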
IV. RESULT ANALYSIS

Test Results
All the test cases passed successfully; no defects were encountered.

V. CONCLUSION & FUTURE ENHANCEMENT

A chronic disease, cardiac arrhythmia, is predicted by machine learning using neural networks (NN), with the Stochastic Gradient Descent algorithm used for optimization. As a future enhancement, different optimization algorithms can be tried for better accuracy.