Jhalyl Mason

Architecture of Neural Networks

Introduction to Neural Networks

What is a Neural Network?

Neural networks are the fundamental machine learning algorithm responsible for spawning the field of deep learning. According to the International Business Machines Corporation (IBM), “A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work.” Sometimes referred to as artificial neural networks (ANNs) to differentiate them from their biological inspiration, neural networks have become extremely popular in machine learning due to their versatility and their ability to handle large and especially complex tasks.

While other algorithms are very useful for simple tasks, such as linear regression for price prediction and support vector machines for binary classification, ANNs have paved the way for some of the largest and most impressive accomplishments in machine learning and AI as a whole. These include image classification for Google Images, speech recognition for Apple’s Siri, and video recommendation on YouTube. The creation and widespread adoption of neural networks has truly changed the field, and the world as a whole, and has helped shape what we deem computationally feasible.

Biological Neurons

As their name suggests, artificial neural networks are modeled after the neurons in the brains of animals, such as humans. Neurons are nerve cells that, according to the National Institute of Neurological Disorders and Stroke (NINDS), “allow you to do everything from breathing to talking, eating, walking, and thinking.” Each neuron has a long extension called an axon, which branches off into tips that carry what are known as synaptic terminals, or synapses.

These synapses are what connect a neuron to other neurons and allow them to exchange information. Neurons produce electrical impulses that travel down their axons to the synapses, causing them to release chemicals called neurotransmitters onto the other neurons. When a neuron receives enough neurotransmitters within a short span of time, it will either fire its own impulses or stop firing, depending on the type of neurotransmitter. This small action is the essential basis of brain activity and the process that artificial neural networks aim to mimic.

From Biological to Artificial

The Artificial Neuron

The idea behind ANNs has been around for decades. They were first introduced by neurophysiologist Warren McCulloch and mathematician Walter Pitts in their landmark paper “A Logical Calculus of the Ideas Immanent in Nervous Activity,” published in 1943. In the paper, they introduced a simple computational model that could mimic the function of neurons using propositional logic (true or false). Their model of the neuron was composed of one or more binary (on/off) inputs and one binary output. The paper was instrumental in demonstrating that, even with these relatively simple neurons, it is possible to create a network capable of computing any logical proposition.

The TLU

Building on the early artificial neuron, the threshold logic unit, or TLU, was the next big step for ANNs. The TLU differs from McCulloch and Pitts’ original model in that its inputs and output are numbers rather than just binary on/off signals. The model associates a value, known as a weight, with each of its inputs. It then computes a linear function of its inputs and their weights, along with a bias term, and applies what’s known as a step function to the result, outputting one value (say, 1) if the result is above a threshold and another (say, 0) if it is below. A single TLU can perform simple binary classification tasks, but TLUs become more useful when stacked together.
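To make this concrete, here is a minimal sketch of a TLU in Python; the function names and numbers are illustrative, not from any particular library:

```python
import numpy as np

def step(z, threshold=0.0):
    """Step function: 1 if z clears the threshold, 0 otherwise."""
    return 1 if z >= threshold else 0

def tlu(inputs, weights, bias):
    """A threshold logic unit: weighted sum of inputs plus bias, then step."""
    z = np.dot(inputs, weights) + bias
    return step(z)

# A TLU with two inputs acting as a simple binary classifier
print(tlu(np.array([1.0, 2.0]), weights=np.array([0.5, -0.2]), bias=0.1))  # -> 1
```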

The Perceptron

Created by psychologist Frank Rosenblatt in 1957, the perceptron is composed of one or more TLUs stacked in a layer, with each input connected to each unit. Such layers are known as fully connected (or dense) layers, with the layer of inputs taking the name input layer. A perceptron with just two inputs and three units can simultaneously classify instances of data into three different binary classes, making it useful for multilabel classification. It is also useful for multiclass classification for the same reason.

Another benefit of the perceptron was the ability to adjust the weights, that is, to train the model. To train it, the perceptron is fed multiple training samples, with each output being recorded. After each sample, the weights are adjusted to reduce the error between the actual output and the desired output. This allows the model to get better, or learn, from each instance it is trained on.
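Here is a minimal sketch of that training loop in Python, using Rosenblatt’s weight-update rule on a single TLU; the dataset (the logical AND function), learning rate, and names are illustrative assumptions:

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Nudge the weights toward the desired output after each sample."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = step(xi @ w + b)
            # The update is proportional to the error (desired - actual)
            w += lr * (yi - y_hat) * xi
            b += lr * (yi - y_hat)
    return w, b

# Learn the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(step(X @ w + b))  # -> [0 0 0 1]
```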

The Multilayer Perceptron

One step up from the perceptron is the multilayer perceptron, or MLP. An MLP is composed of an input layer, multiple layers of TLUs in the middle (called hidden layers), and one final layer of units called the output layer. Neural networks with two or more hidden layers are known as deep neural networks, and the study of deep neural networks became known as deep learning. MLPs were found to do increasingly well at complex tasks. They could still handle things such as binary classification and regression, but they also showed promise in more difficult jobs such as image classification. Over time, researchers were able to modify and adapt these deep neural networks for a plethora of different functions, including speech recognition, sentiment analysis, and image recognition.
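The layer structure is easiest to see in code. Here is a minimal sketch of an MLP with two hidden layers using the Keras API; the layer sizes and input shape (e.g., flattened 28x28 images) are illustrative assumptions:

```python
import tensorflow as tf

# Input layer -> two hidden layers -> output layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # input layer
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer (10 classes)
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```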

Common Types of Neural Networks

Feedforward Neural Networks

Feedforward neural networks are among the simplest types of ANNs. They get their name from the fact that the data input into the model moves in only one direction: forward. That is to say, the data comes in through the input layer, is transferred through the hidden layers, and is then fed through the output layer. Every neuron in one layer is connected to every neuron in the next, and no neurons are connected to others within the same layer. These networks are the foundation for more complex and specialized networks.
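That one-way flow can be sketched in a few lines of NumPy; the weights here are random placeholders and the layer sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0, z)

# Random placeholder weights for a 4 -> 5 -> 3 network
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)

def forward(x):
    """Data flows strictly forward: input -> hidden -> output."""
    h = relu(x @ W1 + b1)  # hidden layer
    return h @ W2 + b2     # output layer

print(forward(rng.normal(size=4)))
```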

Convolutional Neural Networks

Modeled after the visual cortex region of the brain, convolutional neural networks, or CNNs, are networks specialized for image and audio inputs. They work by using a layer, known as the convolutional layer, to detect important features within image or audio files. The data is then fed through a pooling layer, which reduces the dimensions of the data, helping reduce complexity and increase efficiency. Finally, the data is pushed through a fully connected layer, similar to a normal feedforward network. Convolutional neural networks are the backbone of computer vision, the field of AI dedicated to enabling computers to derive information from digital images and videos. Computer vision is used in many industries: in radiology, allowing doctors to identify cancerous tumors more quickly and accurately; in security, allowing cameras to identify and flag possible threats; and in the automotive industry, powering systems such as lane detection and even self-driving capabilities.
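The convolution -> pooling -> fully connected pipeline described above can be sketched in Keras as follows; the filter count, kernel size, and input shape are illustrative assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                             # e.g., grayscale images
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),  # convolutional layer: feature detection
    tf.keras.layers.MaxPooling2D(pool_size=2),                     # pooling layer: reduce dimensions
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),               # fully connected layer
])
model.summary()
```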

Recurrent Neural Networks

Recurrent neural networks, or RNNs, are networks that work with sequential or time-series data. They are most popular for their use in speech recognition and natural language processing (NLP). They differ from other neural networks in that they have “memory”: they use information from prior inputs to influence the next output. This is essential for tasks like natural language processing, where the position of each word in a sentence matters for determining the purpose or sentiment of the sentence. Some of the most popular uses of RNNs are things like Siri on the iPhone, voice search, and Google Translate.
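That “memory” is just a hidden state carried from one time step to the next, as this minimal NumPy sketch shows; the sizes and weights are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights: input -> hidden, hidden -> hidden (the "memory"), hidden -> output
W_xh = rng.normal(scale=0.1, size=(3, 8))
W_hh = rng.normal(scale=0.1, size=(8, 8))
W_hy = rng.normal(scale=0.1, size=(8, 2))

def rnn_forward(sequence):
    h = np.zeros(8)  # hidden state, carried across time steps
    outputs = []
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh)  # new state depends on the input AND the prior state
        outputs.append(h @ W_hy)
    return outputs

sequence = rng.normal(size=(5, 3))  # 5 time steps, 3 features each
print(rnn_forward(sequence)[-1])    # output at the final step reflects the whole sequence
```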

Further Reading

  • Hands-On Machine Learning with Scikit-Learn, Keras, & TensorFlow — Aurelien Geron

  • The Hundred-Page Machine Learning Book — Andriy Burkov

  • Deep Learning — Ian Goodfellow

  • Machine Learning: A Probabilistic Perspective — Kevin P. Murphy

References

IBM, neural networks: https://www.ibm.com/topics/neural-networks

IBM, recurrent neural networks: https://www.ibm.com/topics/recurrent-neural-networks

IBM, convolutional neural networks: https://www.ibm.com/topics/convolutional-neural-networks

NINDS, Brain Basics: The Life and Death of a Neuron: https://www.ninds.nih.gov/health-information/public-education/brain-basics/brain-basics-life-and-death-neuron

McCulloch & Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity”: https://www.cs.cmu.edu/~./epxing/Class/10715/reading/McCulloch.and.Pitts.pdf
