Artificial intelligence (AI) is one of the most important and long-lived research areas in computing. It’s a broad area that intersects with philosophical questions about the nature of mind and consciousness. On the practical side, today’s AI is very much the field of machine learning (ML). Machine learning is concerned with software systems capable of changing in response to training data. A prominent style of architecture is known as the neural network, a form of so-called deep learning. This article is an introduction to neural networks and how they work.
Neural networks and the human brain
Neural networks are inspired by the structure of the human brain. The basic idea is that a group of objects called neurons are combined into a network. Each neuron receives one or more inputs and produces a single output based on an internal calculation. Neural networks are therefore a specialized type of directed graph.
Many neural networks distinguish between three layers of nodes: input, hidden, and output. The input layer has neurons that accept the raw input; the hidden layers modify that input; and the output layer produces the final result. The process of moving data forward through the network is called feedforward.
The network “learns” by consuming inputs, passing them through the layers of neurons, and then comparing its final output to the known results, which are then fed back through the system to alter how the nodes perform their computations. This feedback process is known as backpropagation, and it’s a core feature of machine learning in general.
An enormous amount of variety exists within the basic structure of a neural network. Every aspect of these systems is open to refinement within specific problem domains. Backpropagation algorithms, likewise, have any number of implementations. A common approach is to use partial-derivative calculus (also known as gradient backpropagation) to determine the effect of specific weights on overall network performance. Neurons can have different numbers of inputs (from one to many) and different ways of connecting to form a network. Two inputs per neuron is common.
Figure 1 shows the general idea: a small network of nodes, each with two inputs.
Figure 1. High-level neural network structure
Let’s take a closer look at the anatomy of a neuron in such a network, shown in Figure 2.
Figure 2. A neuron with two inputs
Figure 2 analyzes the details of a two-input neuron. Neurons always have a single output, but they can have any number of inputs, two being the most common. As the input arrives, it is multiplied by a weight property that is specific to that input. Then all the weighted inputs are summed with a single value called the bias. The result of those calculations is then fed into a function known as the activation function, which gives the neuron’s final output for the given input.
Input weights are the main adjustable dials of a neuron. These are the values that change to give the neuron different behavior, the ability to learn or adapt to improve its performance. The bias is sometimes a constant, immutable property and sometimes a variable that also changes with learning.
The activation function is used to bring the output within an expected range. This is usually some kind of compressing, proportional function. The sigmoid function is common.
What an activation function like the sigmoid does is drive the output value to between 0 and 1, with large positive inputs approaching but never reaching 1 and large negative inputs approaching but never reaching 0. This serves to give the output the form of a probability, with 1 being the highest probability and 0 being the lowest. So this kind of activation function says that the neuron gives a degree of probability toward a yes-or-no result.
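To make the idea concrete, here is a minimal sketch in Java of a two-input neuron along the lines of Figure 2. The class name, field names, and the sample weights are illustrative assumptions, not taken from any particular library; a real network would set the weights and bias through training rather than hard-coding them.

```java
// A minimal two-input neuron: weighted sum plus bias, passed through a sigmoid.
// Class name, field names, and sample values are hypothetical.
public class SigmoidNeuron {
    private double weight1;
    private double weight2;
    private double bias;

    public SigmoidNeuron(double weight1, double weight2, double bias) {
        this.weight1 = weight1;
        this.weight2 = weight2;
        this.bias = bias;
    }

    // The sigmoid activation function: squashes any real number into the range (0, 1).
    public static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Feed two inputs forward through the neuron and return its single output.
    public double feedForward(double input1, double input2) {
        double weightedSum = (input1 * weight1) + (input2 * weight2) + bias;
        return sigmoid(weightedSum);
    }

    public static void main(String[] args) {
        SigmoidNeuron neuron = new SigmoidNeuron(0.5, -0.6, 0.1); // arbitrary sample weights and bias
        System.out.println(neuron.feedForward(1.0, 2.0)); // prints a value between 0 and 1
    }
}
```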
You can see the output of a sigmoid function in the graph in Figure 3. For a given x, the further it is from 0, the more damped the change in the y output.
Figure 3. Output of a sigmoid function
So, the forward stage of neural network processing is to feed the external data to the input neurons, which apply their weights, bias, and activation function, producing output that is passed to the hidden-layer neurons. The hidden layers repeat the same process, and the result eventually reaches the output neurons, which perform the same calculation to produce the final output.
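As a rough illustration of that forward pass, the sketch below wires three of the neurons from the previous listing into a tiny network like the one in Figure 1: two hidden neurons feeding one output neuron. The structure and weights are again placeholder assumptions.

```java
// A minimal forward pass: two inputs -> two hidden neurons -> one output neuron.
// Reuses the SigmoidNeuron class from the previous listing; all weights are placeholders.
public class TinyNetwork {
    private final SigmoidNeuron hidden1 = new SigmoidNeuron(0.8, 0.2, 0.0);
    private final SigmoidNeuron hidden2 = new SigmoidNeuron(-0.4, 0.9, 0.0);
    private final SigmoidNeuron output  = new SigmoidNeuron(0.6, 0.6, 0.0);

    // Pass the raw inputs through the hidden layer, then feed those outputs to the output neuron.
    public double feedForward(double input1, double input2) {
        double h1 = hidden1.feedForward(input1, input2);
        double h2 = hidden2.feedForward(input1, input2);
        return output.feedForward(h1, h2);
    }

    public static void main(String[] args) {
        TinyNetwork network = new TinyNetwork();
        System.out.println(network.feedForward(1.0, 0.5)); // final prediction between 0 and 1
    }
}
```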
Machine learning with backpropagation
What makes the neural network powerful is its ability to learn based on input. This happens by using a training data set with known results, comparing the predictions against it, and then using that comparison to adjust the weights and biases on the neurons.
The loss function
To do this, the network needs a function that compares its predictions with known good answers. This function is known as the error or loss function. A common loss function is the root mean square error function.
The root mean square error function assumes that you are comparing two sets of numbers of equal length. The first set is the known true answers (the correct output), represented by Y. The second set, represented by y′, holds the network’s guesses (the proposed output).
The root mean square error function says: for each element Yᵢ, subtract the corresponding guess from the correct answer, square the difference, take the mean of those squares across the data set, and then take the square root of that mean. This gives us a way to see how well the network is working and to check the effect of making changes to the neurons’ weights and biases.
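A minimal sketch of that loss calculation in Java might look like the following; the class name, method name, and sample arrays are made up for illustration.

```java
// Root mean square error: average the squared differences between the known answers
// and the network's guesses, then take the square root of that mean.
public class Loss {
    public static double rootMeanSquareError(double[] correct, double[] guesses) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < correct.length; i++) {
            double difference = correct[i] - guesses[i];
            sumOfSquares += difference * difference;
        }
        return Math.sqrt(sumOfSquares / correct.length);
    }

    public static void main(String[] args) {
        double[] truth = {1.0, 0.0, 1.0, 1.0};        // known correct outputs (Y)
        double[] predictions = {0.9, 0.2, 0.8, 0.6};  // the network's guesses (y')
        System.out.println(rootMeanSquareError(truth, predictions)); // prints 0.25
    }
}
```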
Gradient descent
Taking this performance metric and pushing it back through the network is the backpropagation phase of the learning cycle, and it is the most complex part of the process. A common approach is gradient descent, in which the effect of each weight in the network is isolated via a partial derivative. For a given weight, the equation is expanded via the chain rule, and fine adjustments are made to each weight to reduce the overall loss of the network. Each neuron and its weights are considered as one part of the equation, working from the last neuron backwards (hence the name backpropagation).
You can think of gradient descent this way: the error function is the graph of the network’s output, which we are trying to fit so that its overall shape (its slope) matches the data points as closely as possible. During backpropagation, you visit the function of each neuron (one contributor to the overall slope) and modify it slightly to move the entire graph a little closer to the ideal solution.
The idea here is that you consider the entire neural network and its loss function as a multivariate (multidimensional) equation that depends on the weights and biases. You start at the output neurons and determine their partial derivatives with respect to the loss, then use the calculus to evaluate the same for the neurons in the preceding layer. Continuing the process backwards, you determine the role each weight and bias plays in the final loss, and you can adjust each slightly to improve the results.
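As a rough sketch of what one gradient-descent update looks like, the code below repeatedly adjusts a single two-input sigmoid neuron on one training example. It uses the chain rule to get the partial derivative of a squared-error loss with respect to each weight and the bias, then nudges each value slightly downhill. The starting values, target, and learning rate are all assumptions for the sake of the example, not something prescribed by the algorithm.

```java
// Gradient-descent updates for a single two-input sigmoid neuron on one training example.
// Chain rule: dLoss/dWeight = dLoss/dOutput * dOutput/dSum * dSum/dWeight.
public class GradientDescentStep {
    public static void main(String[] args) {
        // Placeholder starting values and a single training example (inputs and target).
        double w1 = 0.5, w2 = -0.3, bias = 0.1;
        double x1 = 1.0, x2 = 2.0, target = 1.0;
        double learningRate = 0.1; // how far to move against the gradient on each step

        for (int step = 0; step < 100; step++) {
            // Forward pass: weighted sum plus bias, then sigmoid.
            double sum = w1 * x1 + w2 * x2 + bias;
            double output = 1.0 / (1.0 + Math.exp(-sum));
            double loss = (output - target) * (output - target); // squared error for this example

            // Backward pass: the chain rule gives each partial derivative.
            double dLossDOutput = 2.0 * (output - target);
            double dOutputDSum = output * (1.0 - output); // derivative of the sigmoid
            double dSumDW1 = x1, dSumDW2 = x2, dSumDBias = 1.0;

            // Nudge each weight and the bias slightly downhill to reduce the loss.
            w1 -= learningRate * dLossDOutput * dOutputDSum * dSumDW1;
            w2 -= learningRate * dLossDOutput * dOutputDSum * dSumDW2;
            bias -= learningRate * dLossDOutput * dOutputDSum * dSumDBias;

            if (step % 25 == 0) {
                System.out.printf("step %d: output=%.4f loss=%.6f%n", step, output, loss);
            }
        }
    }
}
```

Running the loop shows the loss shrinking as the weights drift toward values that push the neuron’s output closer to the target, which is the whole point of the backpropagation cycle.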
See Machine Learning for Beginners: An Introduction to Neural Networks for a good walkthrough of the math involved in gradient descent.
Backpropagation is not limited to derivatives of functions. Any algorithm that effectively takes the loss function and applies gradual, positive changes across the network is valid.
Conclusion
This article has been a quick dive into the general structure and function of an artificial neural network, one of the most important styles of machine learning. Look for future articles covering neural networks in Java and a closer look at the backpropagation algorithm.
Copyright © 2023 IDG Communications, Inc.