What are neural networks and how do they work? Classification of artificial neural networks

Neural networks are one of the areas of artificial intelligence whose goal is to simulate the analytical processes carried out by the human brain. The typical tasks a neural network solves are classification, prediction, and recognition. Neural networks are capable of learning and developing on their own, building up experience from the mistakes they make.

Neural networks are a sequence of neurons connected by synapses. The structure of a neural network came to the world of programming straight from biology. Thanks to this structure, the machine gains the ability to analyze and even remember various information. Also, neural networks are capable of not only analyzing incoming information, but also reproducing it from their memory.

In other words, a neural network is a machine interpretation of the human brain, which contains millions of neurons transmitting information in the form of electrical impulses.

History of the creation of neural networks

What is the history of the development of neural networks in science and technology? It begins with the advent of the first computers, or "electronic computing machines" as they were called in those days. Back in the late 1940s, Donald Hebb developed a neural network learning mechanism that laid down the rules for teaching these proto-computers.

The further chronology of events was as follows:

  • In 1954, the first practical use of neural networks in computer operation took place.
  • In 1958, Frank Rosenblatt developed a pattern recognition algorithm and a mathematical annotation to it.
  • In the 1960s, interest in neural networks faded somewhat because of the limited computing power of the time.
  • And it was revived again in the 1980s; it was during this period that a system with a feedback mechanism appeared and self-learning algorithms were developed.
  • By 2000, computing power had grown enough to make the wildest dreams of the earlier scientists come true. It was then that voice recognition software, computer vision, and much more appeared.


How do neural networks work?

An artificial neural network is a collection of neurons that interact with each other. They are capable of receiving, processing, and creating data. It is as hard to picture as the functioning of the human brain itself. A neural network in your brain is working right now so that you can read this: your neurons recognize letters and assemble them into words.

A neural network includes several layers of neurons, each of which is responsible for recognizing a specific criterion: shape, color, size, texture, sound, volume, etc.

Year after year, as a result of millions of experiments and mountains of calculations, more and more layers of neurons were added to the simplest network. They work in turn: for example, the first determines whether a figure is a square, the second determines whether the square is red, the third calculates the size of the square, and so on. Figures that are not squares, are not red, or are of an unsuitable size end up in new groups of neurons and are studied there.

What are neural networks for?

Neural networks are used to solve complex problems that require analytical calculations similar to what the human brain does. The most common applications of neural networks are:

  • Classification
    - distribution of data according to parameters. For example, you are given a set of people as input and you need to decide which of them to give credit to and which not. This work can be done by a neural network, analyzing information such as age, solvency, credit history, etc.
  • Prediction
    - the ability to predict the next step. For example, the rise or fall of shares based on the situation in the stock market.
  • Recognition
    - currently the most widespread application of neural networks. It is used by Google when you search by image, and by phone cameras when they detect your face and highlight it, among much else.

The scope of artificial neural networks is expanding every year; today they are used in such areas as:

  • Machine learning, which is a type of artificial intelligence. It is based on training AI using the example of millions of similar tasks. Nowadays, machine learning is actively implemented by the search engines Google, Yandex, Bing, and Baidu. So, based on the millions of search queries that we all enter into Google every day, their algorithms learn to show us the most relevant results so that we can find exactly what we are looking for.
  • In robotics, neural networks are used to develop numerous algorithms for the iron “brains” of robots.
  • Computer system architects use neural networks to solve the problem of parallel computing.
  • With the help of neural networks, mathematicians can solve various complex mathematical problems.

Now, to understand how neural networks work, let's take a look at their components and parameters.

What is a neuron?

A neuron is a computational unit that receives information, performs simple calculations on it, and passes it further. Neurons are divided into three main types: input, hidden, and output (in diagrams they are commonly drawn blue, red, and green respectively).

There are also bias neurons and context neurons. When a neural network consists of a large number of neurons, the term layer is introduced. Accordingly, there is an input layer that receives information, n hidden layers (usually no more than 3) that process it, and an output layer that outputs the result.

Each neuron has 2 main parameters:

  • input data
  • output data.

In the case of an input neuron, input = output. In the others, the input field receives the combined information from all neurons of the previous layer; it is then normalized by the activation function (for now, just think of it as f(x)) and lands in the output field.

It is important to remember that neurons operate on numbers in the range [0,1] or [-1,1]. But how, you may ask, do we handle numbers that fall outside this range? At this stage, the simplest answer is to divide 1 by the number. This process is called normalization, and it is used very often in neural networks. More on this a little later.
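
As a toy illustration, here is what that trick might look like in code (a minimal sketch of the rule just stated, not a production recipe; in practice min-max or z-score scaling is more common):

```python
# A naive normalization sketch: map a value greater than 1 into [0, 1]
# by taking its reciprocal, exactly as described above.
def normalize(x: float) -> float:
    return 1 / x if x > 1 else x

print(normalize(250))   # 0.004, now inside [0, 1]
print(normalize(0.37))  # 0.37, already in range, unchanged
```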

What is a synapse?

A synapse is a connection between two neurons. A synapse has a single parameter: its weight. Thanks to the weight, the input information changes as it passes from one neuron to another. Say three neurons transmit information to the next one; then we have three weights, one for each of those neurons, and the information coming through the synapse with the largest weight will dominate in the receiving neuron (think of color mixing).

In fact, the set of weights of a neural network, its weight matrix, is a kind of brain of the entire system. It is thanks to these weights that the input information is processed and turned into a result.

It is important to remember that when a neural network is initialized, its weights are set to random values.
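
Putting the pieces together, here is a hedged sketch of a single neuron with three randomly weighted inputs (the input values and the choice of sigmoid as the activation function f(x) are illustrative assumptions):

```python
import math
import random

# One neuron receiving signals from three neurons of the previous layer.
def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

inputs = [0.2, 0.9, 0.5]                           # outputs of the previous layer
weights = [random.uniform(-1, 1) for _ in inputs]  # weights start out random

total = sum(i * w for i, w in zip(inputs, weights))  # the neuron's input field
output = sigmoid(total)                              # normalized output field
print(output)  # a number in (0, 1), passed on to the next layer
```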

Biological basis of neural connections

There are neurons in our brain. There are about 86 billion of them. A neuron is a cell connected to other such cells. The cells are connected to each other by processes. All this together resembles a kind of network. Here's a neural network for you. Each cell receives signals from other cells. Then it processes them and sends a signal to other cells.

Simply put, a neuron receives a signal (information), processes it (decides something, thinks) and sends its response further. The arrows represent branch connections through which information is transmitted:

This is how the neural network passes signals to each other and comes to a decision. And we thought that we alone decide everything! No, our solution is the result of the collective work of a billion neurons.

In my picture, the arrows indicate the connections between neurons. The connections differ. For example, the arrow between neurons 2 and 5 is long, which means the signal from neuron 2 to neuron 5 will take longer to travel than, say, a signal from neuron 3, where the arrow is half as long. Besides, a signal can fade along the way and arrive weakened. There are many interesting things like this in biology.

But computer science did not try to capture all of this: how the neuron thinks, whether the signal fades, when it arrives and when it does not. Why bother? Instead, a simplified model was built.

There are two main components in this model:

  • Algorithm. In biology, a neuron thinks; in programming, "thinking" is replaced by an algorithm, that is, a set of commands. For example: if the input received 1, send 0. That is all the "brains" our neuron has.
  • Decision weight. All the connections, attenuation, and so on were replaced with "weight". Weight is like the power of a decision, its importance; in practice it is simply a number. Our neuron receives a decision with a certain weight, i.e. it receives a number, and if that number is greater than another incoming number, then it is more important. This is just an example.

Total: there is an algorithm and there is the weight of the decision. This is all you need to build a simple neural network.

How does MNS work?

Any change made in the page interface is saved immediately, without reloading, and the result of the change is displayed at once.

Although you can show your model to someone else simply by sharing its name for download, the other person will not be able to modify it; they must save the model as their own, under a different name, after which its full functionality becomes available to them.

The first is the “Sensor Matrix” table, which is the same for all layers. For simplicity, the mechanism of lateral inhibition, which provides a change in the sensitivity of the sensors and the contrast of the activity profile, is not implemented here.

In the sensor matrix, you can click combinations of active (green) sensors and memorize the resulting image (the number of images is unlimited). The meaning of any given sensor can be assigned arbitrarily: for example, the top few represent a visual image, the bottom ones an auditory image, and so on.

Next comes the table of connections. Using the SET tab with its drop-down list, you can select a ready-made configuration of connections, or build your own by entering in each cell the session number, then the number of the neuron in the layer, and immediately setting the efficiency (weight) of the connection.

The matrix of the neuron layer is initially all gray, indicating immature neurons that are not yet ready for specialization. You can click the ones intended to become mature (neurons in a layer do not all mature at the same time, which plays a certain role), or click the ALL tab to make the entire matrix ready.

If you specify the connection weights right away, the results are obtained immediately, as can be seen in the graphical interpretation of the system below. In this way you can create ready-made pattern recognition configurations and build a control circuit layer by layer.

Unfortunately, for reasons of compactness, the total number of possible connections is limited to 60.

Alternatively, instead of setting the connections, you can start the training mode by clicking the "Start training" button, after which the training timer starts counting and the ongoing changes are displayed. Training can be interrupted at any time with the same button, or it stops by itself once all neurons have specialized. Each specialized neuron stops learning after slightly exceeding its response threshold, which means that over the training period its connections have become effective enough, in the proportion of their weights, to ensure recognition of the given activity profile at the neuron's input.

Active neurons in the network mutually inhibit their neighbors, and the degree of this influence can be tuned by setting the "Mutual inhibition of neighbors" value. This is a rather little-studied but very significant factor.

You can also set the overall firing threshold for the neurons in a layer, which in nature depends on a number of factors and is very important for the accurate operation of the neural network (which is why the constancy of the brain's internal environment is maintained especially carefully).

If you now change the activity on the sensor matrix, you will see which recognition functions have been formed.

The results that are closest to reality can be obtained if you define a layer of overlapping connections, define several images of the sensory matrix, and during training begin to switch them, simulating the real appearance of certain images.

Unlike perceptron models of artificial neural networks, negative (inhibitory) connections cannot be specified here, which corresponds to how a natural neural network forms a recognition profile in the primary zones. Why exactly this happens was explained in detail during the classes (the principle of a sieve: a sieve can only pass what is finer than the diameter of its holes, i.e. it has only a positive selection parameter, but a set of different sieves makes it possible to select the desired fraction).

Artificial neural network

A neural network is an attempt to reproduce the functioning of the human brain with mathematical models in order to create machines with artificial intelligence.

An artificial neural network is usually trained with a teacher, i.e. by supervised learning. This means there is a training set (dataset) containing examples with true values: tags, classes, indicators.

For example, if you want to create a neural network to assess the sentiment of a text, the dataset will be a list of sentences with emotional ratings corresponding to each. The tone of the text is determined by features (words, phrases, sentence structure) that give a negative or positive connotation. The weights of the features in the final assessment of the sentiment of the text (positive, negative, neutral) depend on a mathematical function that is calculated during training of the neural network.

Previously, people engineered features manually. The more features there are and the more accurately the weights are chosen, the more accurate the answer. The neural network automated this process. Below is a sketch of what the manual, hand-weighted approach looks like:
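
(The word list and weights here are invented purely for illustration; a trained network learns such weights automatically.)

```python
# A hand-crafted sentiment scorer: each word (feature) carries a manually
# chosen weight, and the sign of the total decides the sentiment.
feature_weights = {"great": 1.0, "love": 0.8, "boring": -0.9, "awful": -1.2}

def sentiment(text: str) -> str:
    score = sum(feature_weights.get(word, 0.0) for word in text.lower().split())
    if score > 0.1:
        return "positive"
    if score < -0.1:
        return "negative"
    return "neutral"

print(sentiment("I love this great movie"))   # positive
print(sentiment("what a boring awful film"))  # negative
```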

An artificial neural network consists of three components:

  • Input layer;
  • Hidden (computational) layers;
  • Output layer.

Training of such neural networks occurs in two stages:

  • Forward propagation;
  • Error backpropagation.

During forward propagation, a prediction of the response is made. During backpropagation, the error between the actual response and the predicted one is minimized. A minimal sketch of both stages follows.
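
Below is a hedged, minimal sketch of both stages for a single sigmoid neuron learning the logical OR function; the data, learning rate, and squared-error loss are illustrative assumptions, not a full multi-layer backpropagation implementation:

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

# Tiny training set: learn OR from two binary inputs.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = [random.uniform(-1, 1) for _ in range(2)]
b = random.uniform(-1, 1)
lr = 0.5  # learning rate

for epoch in range(2000):
    for x, target in data:
        # Forward propagation: make a prediction.
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Backpropagation: gradient of the squared error through the sigmoid.
        delta = (out - target) * out * (1 - out)
        w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
        b -= lr * delta

print([round(sigmoid(w[0] * x[0] + w[1] * x[1] + b), 2) for x, _ in data])
# After training the outputs approach [0, 1, 1, 1].
```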

For a deeper study, we recommend watching 2 videos from TED Talks: Video 1, Video 2 (videos in English).

Error

Error is a percentage that reflects the difference between the expected and received responses.

The error is computed every epoch and must decline. If it doesn't, you are doing something wrong. The error can be calculated in different ways, but we will consider only three main methods: Mean Squared Error (hereinafter MSE), Root MSE, and Arctan. There is no restriction on use, as there is with activation functions, so you are free to choose whichever method gives you the best results. You just have to keep in mind that each method counts errors differently. With Arctan, the error will almost always be larger, since it works on the principle: the greater the difference, the greater the error. Root MSE will have the smallest error, so it is most common to use MSE, which keeps a balance in error calculation. With n training sets, the ideal answer iₖ and the obtained answer aₖ:

MSE = (1/n) · Σ (iₖ − aₖ)²

Root MSE = √( (1/n) · Σ (iₖ − aₖ)² )

Arctan = (1/n) · Σ arctan²(iₖ − aₖ)

The principle of calculating the error is the same in all cases: for each set, we subtract the obtained result from the ideal answer, then either square the difference or take the squared arctangent of it, and finally divide the resulting sum by the number of sets.
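
A hedged sketch of all three measures on made-up numbers (the ideal and actual values are illustrative):

```python
import math

ideal  = [1.0, 0.0, 1.0]   # correct answers from the training set
actual = [0.9, 0.2, 0.6]   # the network's answers
n = len(ideal)

mse      = sum((i - a) ** 2 for i, a in zip(ideal, actual)) / n
root_mse = math.sqrt(mse)
arctan_e = sum(math.atan(i - a) ** 2 for i, a in zip(ideal, actual)) / n

print(mse, root_mse, arctan_e)
```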

Types and classification of neural networks

Over the course of their development, neural networks have split into many types, which intertwine across different tasks. At the moment it is difficult to classify any network by a single criterion: it can be done by application principle, by type of input information, by the nature of training, by the nature of connections, or by field of application.

| Neural network | Application principle | Training: supervised (+), unsupervised (−), mixed (±) | Scope of application |
|---|---|---|---|
| Rosenblatt perceptron | Pattern recognition, decision making, forecasting, approximation, data analysis | + | Almost any area of application except information optimization |
| Hopfield | Data compression and associative memory | − | Architecture of computer systems |
| Kohonen | Clustering, data compression, data analysis, optimization | − | Finance, databases |
| Radial basis functions (RBF network) | Decision making and control, approximation, forecasting | ± | Control structures, neurocontrol |
| Convolutional | Pattern recognition | + | Graphics data processing |
| Spiking (pulse) | Decision making, pattern recognition, data analysis | ± | Prosthetics, robotics, telecommunications, computer vision |

What learning with a teacher (supervised learning) means is described in a later section. Each network has its own characteristics that apply in one case or another. Let us look more closely at two types of network that are, in effect, the primary sources of many derived types of neural networks.

Convolutional

One of the most popular types of network, often used to recognize objects in photos and videos, for language processing, and in recommendation systems.

Main characteristics:

  • Excellent scalability: images of any resolution, however large, can be recognized.
  • Use of volumetric 3D neurons: within a layer, neurons are connected over a small region called the receptive field.
  • A spatial localization mechanism: adjacent layers of neurons are connected by this mechanism, which implements nonlinear filters that cover an ever larger number of pixels of the image.

The idea behind this type of neural network arose from a careful study of the visual cortex, the part of the cerebral hemispheres responsible for processing the visual component. The main argument for choosing the convolutional type is that it belongs to deep learning technology. It is similar to the perceptron, but instead of a fully connected network it uses a limited weight matrix that is shifted across the layer being processed.

Recurrent

This is a type of neural network in which the connections between elements can process a series of events in time or work with sequential chains in space. The type is often used where something whole is broken into pieces, for example in speech or handwriting recognition. Many network types derive from it, including the Hopfield, Elman, and Jordan networks.

The number of neural connections in the brain improves a person's quality of life

For many years, scientists thought that the adult brain remained unchanged. However, science now knows for sure: throughout our lives, more and more synapses are formed in the brain, the contacts between neurons or other cell types that receive their signals. Together, neurons and synapses form a neural network whose individual elements are constantly in contact with each other and exchange information.


It is neural connections that help different areas of the brain transmit data to each other, thereby ensuring vital processes for us: memory formation, production and understanding of speech, control of the movements of our own body. When neural connections are disrupted, which can happen as a result of diseases such as Alzheimer's disease or physical trauma, certain areas of the brain lose the ability to communicate with each other. As a result, it becomes impossible to perform any action, both mental (memorizing new information or planning one’s actions) and physical.

A team of researchers led by Stephen Smith from the Center for Functional Magnetic Resonance Imaging of the Brain at the University of Oxford decided to find out whether the total number of neural connections in the brain could somehow influence its overall functioning. In the study, the scientists used data obtained as part of the Human Connectome Project, a project launched in 2009. Its goal is to compile a kind of "map" of the brain that would make it possible to understand which area of the brain is responsible for a particular process or disease, as well as how different areas of the brain interact with each other.

What was unique about the work of Stephen Smith's research group was that the scientists did not focus on connections between specific areas of the brain or on specific functions, but studied the processes as a whole. You can read more about the findings in the journal Nature Neuroscience.

The study used the results of magnetic resonance imaging of 461 people. For each of them, a “map” was created that showed the total number of neural connections between all areas of the brain. In addition, each study participant filled out a questionnaire about their education, lifestyle, health, marital status and emotional state. In total, the questions touched on 280 aspects of human life.


As a result of the work, it was established that the greater the number of neural connections in a person's brain, the more "positive" that person's life is.

People whose brains were rich in connections between neurons tended to have higher education, had no problems with the law, strived to lead a healthy lifestyle, were in good psychological health and generally showed high levels of life satisfaction.

According to the authors of the study, the relationship between the number of neural connections and the quality of human life was so clear and strong that the scientists themselves were amazed by it.

The science department was able to contact the lead author of the work, Stephen Smith, and talk to him about the details of the work.

— Is it possible to give an accurate explanation of why the number of neural connections in the brain has a direct impact on a person’s quality of life: for example, to say that the number of connections somehow affects brain activity?

— No, it’s too early to talk about such cause-and-effect relationships, since all this is the subject of complex and multivariate correlation analysis. Therefore, we cannot yet say that a brain with many neural connections makes a person study several years longer (or vice versa - that many years of study increases the number of neural connections).

Incidentally, at the moment the cause-and-effect relationship can indeed be read in both directions, which could be called a "vicious circle".

- In that case, how are you going to break this “vicious circle”?

“The work we have done now—scanning the brain using magnetic resonance imaging—can only show how closely certain areas of the brain are connected to each other. It also reflects many other biological factors of lesser importance - for example, showing the exact number of neurons connecting these areas. But understanding how these connections affect a person’s behavior, mental abilities, and lifestyle is the main question facing the staff of the Human Connectome Project.


— Stephen, is there a correlation between the number of neural connections in the brains of parents and children?

- But here I can answer unequivocally - yes. There is a lot of evidence that the number of neural connections, so to speak, is inherited. As part of our project, we are going to study this phenomenon in more depth. Although, of course, there are other important factors that affect the functioning of the brain and the formation of neural connections.

— Is it possible, at least theoretically, to somehow influence the number of neural connections and thus change the quality of a person’s life?

“It’s very difficult to talk about this in general terms. However, there are many examples where interventions in the functioning of the brain changed a person’s behavior or improved some individual indicators of its work. You can read about one such experiment, for example, in the journal Current Biology: the article reports that scientists, using micropolarization (a method that allows one to change the state of various parts of the central nervous system by applying direct current. - Gazeta.Ru), managed to improve the mathematical abilities of their subjects.

Another, simpler and more ordinary example can be given: we all know that training and practice in any type of activity help improve the performance of this very activity.

But learning, by definition, changes the neural connections of the brain, even if sometimes we are not able to detect it.

Regarding your question, the problem of global change in human behavior or abilities remains a large-scale and extremely interesting object of study.

Neural network training

One of the main and most important properties of a neural network is its ability to be trained. In general, a neural network is a collection of neurons through which a signal passes: feed a signal to the input and, after it passes through thousands of neurons, the output will be anyone's guess. To get something useful, the network's parameters must be changed so that the desired results appear at the output.

The input signal cannot be changed, and the adder performs a fixed summation: changing anything in it, or removing it from the system, would mean it ceases to be a neural network. Only one option remains: apply coefficients to the weights of the connections. This gives us a definition of neural network training: it is the search for a set of weight coefficients such that, after passing through the adder, the desired signal is obtained at the output.

Our brain uses the same idea. Instead of weights, it uses synapses to amplify or attenuate the input signal. A person learns through changes in the synapses as electrochemical impulses pass through the brain's neural network.

But there is one caveat. If you set the weight coefficients manually, the neural network merely memorizes the correct output signal. The output then appears instantly, and it may seem that the network learned very quickly; but as soon as you change the input signal slightly, incorrect, illogical answers appear at the output.

Therefore, instead of specifying concrete coefficients for a single input signal, generalized parameters are built from a training sample.

Using such a sample, you can train the network so that it produces correct results. At this point, we can divide neural network training into supervised and unsupervised training.

Supervised training (learning with a teacher)

Training in this mode follows a simple scheme: you feed a sample of input signals to the neural network, take the outputs, and compare them with the known correct answers.

How to prepare such samples:

  • To identify faces, create a sample of 5000-10000 photographs (input) and independently indicate which ones contain people’s faces (output, correct signal).
  • To predict the rise or fall of stocks, sampling is done by analyzing data from past decades. Input signals can be either the state of the market as a whole or specific days.

The teacher is not necessarily a person. The network needs to be trained for hundreds and thousands of hours, so in 99% of cases the training is done by a computer program.

Unsupervised learning

The concept is this: a sample of input signals is available, but the correct output responses are not known.

How does learning happen then? In theory and in practice, the neural network performs clustering: it determines the classes of the input signals supplied, and then produces signals of different types corresponding to the input objects. A minimal sketch of the clustering idea follows.
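
The network itself is not shown here; as a hedged illustration of grouping unlabeled data, this sketch uses k-means, a classic non-neural algorithm (inside a neural network, a Kohonen map would play the same role). The points and the number of clusters are assumptions:

```python
import random

points = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]  # unlabeled input signals
centers = random.sample(points, 2)          # two initial cluster centers

for _ in range(10):
    clusters = {0: [], 1: []}
    for p in points:
        nearest = min((0, 1), key=lambda c: abs(p - centers[c]))
        clusters[nearest].append(p)
    # Move each center to the mean of its cluster (keep it if the cluster is empty).
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters.values())]

print(centers)  # two classes discovered without any correct answers given
```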

Neural network 2021

Neural networks in everyday life

The concept of a “neural network” was coined quite a long time ago and has become synonymous with machine learning algorithms.

The first version of the formal neuron, the cell of a neural network, was proposed by Warren McCulloch and Walter Pitts in 1943. In 1958, Frank Rosenblatt developed the first neural network. But this first attempt fell flat, and for some time other machine learning algorithms pushed neural networks into the background. Neural networks began to gain popularity after 2010. To understand why this happened, we need to look at how a neural network is structured and what its peculiarities are.

Neural networks use hardware and software to mimic the web of neurons in the human brain. Since we are talking about machine learning, the result of all the calculations is zero (no) or one (yes): that is, the machine gives us an answer close to "no" or close to "yes". Accordingly, to train a machine to recognize digits, we need a neural network made of sectors, each responsible for its own digit from 0 to 9. The more possible answers there are, the larger our neural network has to be.

Read the full text of the article “Neural networks in everyday life” >>>

The neural network will predict your date of death

Artificial intelligence in Pennsylvania was taught to predict the increased risk of death of patients... based on ECG results. The system tells you who will live and who will die over the next year.

The system developers claim that this forecasting method works more accurately than other existing ones.

Full text “The neural network will predict your date of death” >>>

Solutions for industrial enterprises in the fight against COVID-19

The COVID-19 epidemic has forced companies to reconsider how they manage employees' work in order to minimize the risk of spreading coronavirus infection among staff. It was especially important to find solutions for continuous-cycle enterprises whose employees cannot be moved to remote work. Specialists at Deletron LLC developed and deployed a whole range of solutions, thanks to which the number of infection cases among employees at these enterprises decreased significantly.

In March 2021, we were faced with the task of preventing employees with elevated temperatures from entering the territory of a large industrial enterprise. Considering that more than 30 thousand people pass through the checkpoint in the morning, the main goal was to maintain the necessary throughput. At the same time, it was necessary to ensure a quick temperature measurement using a non-contact method. That is why technical specialists chose flow-type thermal imagers to implement a thermal imaging control system with specified parameters.

When setting up the thermal imaging monitoring system, Deletron's specialists ran into several problems. Almost all flow-type thermal imagers have relay outputs, and in theory it should not have been difficult to include them in the turnstile's operating logic: if an elevated temperature is detected, passage through the turnstile must be blocked. However, the number of relay outputs and the internal logic of the thermal imager did not allow it to be used at more than one passage point; otherwise, all the turnstiles in the camera's field of view (up to three in a frame) would be blocked, which is unacceptable with such a flow of people.

Full text of the article “Solutions for industrial enterprises in the fight against COVID-19” >>>


Convolutional Neural Network

A convolutional neural network (CNN) is a special artificial neural network architecture proposed by Yann LeCun and aimed at effective pattern recognition. It recognizes objects in images much more accurately than a multilayer perceptron because it takes the two-dimensional topology of the image into account. At the same time, convolutional networks are robust to small shifts, changes of scale, and rotations of objects in the input images. This is largely why architectures based on convolutional networks still dominate image recognition competitions such as ImageNet.

The convolutional neural network is the main tool for classifying and recognizing objects and faces in photographs, as well as for speech recognition. There are many CNN variants, such as the Deep Convolutional Neural Network (DCNN), Region-CNN (R-CNN), Fully Convolutional Neural Networks (FCNN), Mask R-CNN, and others.

CNN today is the “workhorse” in the field of neural networks. It is used primarily for solving computer vision problems, although it can also be used to work with audio and any data that can be represented in the form of matrices.

Features of convolutional networks

We know that neural networks are good at image recognition, and decent accuracy is achieved even by conventional feed-forward networks. However, when it comes to processing images with a large number of pixels, the number of parameters of the network multiplies many times over, so much so that the time spent training them becomes unimaginably long.

So, if you need to work with color images of size 64×64, then each neuron of the first layer of a fully connected network needs 64 × 64 × 3 = 12,288 parameters, and if the network must recognize 1000×1000 images, each neuron already needs 3 million input parameters! And beyond the input layer there are other layers, often with more neurons than the input layer, so 3 million easily turns into trillions. Such a number of parameters simply cannot be computed quickly with the computing power available.
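
The arithmetic from the paragraph above, spelled out:

```python
# Parameters per neuron of the first fully connected layer,
# for a colour image with 3 channels (RGB).
def params_per_neuron(width: int, height: int, channels: int = 3) -> int:
    return width * height * channels

print(params_per_neuron(64, 64))      # 12288
print(params_per_neuron(1000, 1000))  # 3000000 inputs for every single neuron
```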

The main feature of convolutional networks is that they work specifically with images, and therefore it is possible to identify features specific to them. Multilayer perceptrons work with vectors, and therefore it makes no difference to them whether some points are nearby or at opposite ends, since all points are equivalent and are calculated in exactly the same way. Images have local connectivity. For example, if we are talking about images of human faces, then it is quite logical to expect that the points of the main parts of the face will be nearby, and not scattered across the image. Therefore, it was necessary to find more efficient algorithms for working with images, and they turned out to be convolutional networks.

Unlike feedforward networks, which work with data in the form of vectors, convolutional networks work with images in the form of tensors. Tensors are 3D arrays of numbers, or, more simply, arrays of matrices of numbers.

Images in a computer are represented as pixels, and each pixel stores the intensity values of the corresponding channels. The intensity of each channel is described by an integer from 0 to 255.

Most often, color images are used, consisting of RGB pixels: pixels containing brightness in three channels, red, green, and blue. Various combinations of these colors can produce any color of the spectrum. That is why it is natural to use tensors to represent images: each matrix of the tensor is responsible for the intensity of its own channel, and together the matrices describe the entire image.
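
A small sketch of this representation (the image size and the channels-first NumPy layout are assumptions for illustration; channels-last layouts are equally common):

```python
import numpy as np

# A random 100x100 RGB image as a tensor: three matrices of intensities in [0, 255].
image = np.random.randint(0, 256, size=(3, 100, 100), dtype=np.uint8)

red_channel = image[0]    # one matrix per channel
pixel = image[:, 42, 17]  # the (R, G, B) intensities of a single pixel
print(image.shape, pixel)
```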

What are convolutional networks made of?

Convolutional neural networks are made up of basic blocks, so they can be assembled like a construction kit, adding layer after layer to create increasingly powerful architectures. The main blocks of convolutional neural networks are convolutional layers, subsampling (pooling) layers, activation layers and fully connected layers.

For example, LeNet-5, one of the first convolutional networks, consisted of 7 layers: a convolution layer, a pooling layer, another convolution layer, another pooling layer, and a three-layer fully connected neural network.

Convolutional layer

A convolutional layer applies the convolution operation to the outputs of the previous layer, where the weights of the convolution kernel are the trainable parameters. One more trainable weight serves as a constant bias. There are several important details:

  • There can be several convolutions in one convolutional layer, in which case each convolution produces its own output image. For example, if the input had dimension w×h and the layer had n convolutions with a kernel of dimension kx×ky, the output will have dimension n×(w−kx+1)×(h−ky+1).
  • Convolution kernels can be three-dimensional. A three-dimensional input is convolved with a three-dimensional kernel in a similar way, except that the scalar product is also taken over all layers of the image. For example, to average the color information of the original image, a convolution of dimension 3×w×h can be applied on the first layer; the output of such a layer will be one image instead of three.
  • Note that applying the convolution operation shrinks the image, and pixels on the border of the image take part in fewer convolutions than interior ones. For this reason, convolutional layers use padding: the outputs of the previous layer are padded with pixels so that the image size is preserved after convolution. Such convolutions are called same convolutions, and convolutions without padding are called valid convolutions. New pixels can be filled in several ways:
    - zero padding: 00[ABC]00;
    - border extension: AA[ABC]CC;
    - mirror padding: BA[ABC]CB;
    - cyclic shift: BC[ABC]AB.
  • Another parameter of the convolutional layer is the stride. Although the convolution is usually applied to every pixel, sometimes a stride s other than one is used: the dot product is computed not at all possible kernel positions, but only at positions that are multiples of s. Then, if the input had dimension w×h and the kernel dimension kx×ky with stride s, the output will have dimension ⌊(w−kx)/s+1⌋×⌊(h−ky)/s+1⌋. A sketch of a valid convolution follows this list.
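
A minimal NumPy sketch of a single valid (unpadded) convolution, demonstrating the output-size formula (w−kx+1)×(h−ky+1); the image and kernel values are made up:

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    h, w = image.shape
    ky, kx = kernel.shape
    out = np.zeros((h - ky + 1, w - kx + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Dot product of the kernel with one image patch.
            out[y, x] = np.sum(image[y:y + ky, x:x + kx] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0            # a simple averaging kernel
print(conv2d_valid(image, kernel).shape)  # (4, 4) = (6 - 3 + 1, 6 - 3 + 1)
```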
Pooling layer

The pooling (subsampling) layer reduces the dimensionality of the image. The original image is divided into blocks of size w×h, and for each block a certain function is calculated; the most commonly used functions are max pooling and weighted average pooling. This layer has no trainable parameters. A sketch follows the list below.

The main purposes of the pooling layer are:

  • reducing the image so that subsequent convolutions operate on a larger area of the original image;
  • increasing the invariance of the network's output with respect to a small shift of the input;
  • speeding up computation.
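
A minimal sketch of 2×2 max pooling in NumPy (the block size and input are illustrative):

```python
import numpy as np

def max_pool(image: np.ndarray, size: int = 2) -> np.ndarray:
    h, w = image.shape
    img = image[:h - h % size, :w - w % size]    # crop to a multiple of size
    img = img.reshape(h // size, size, w // size, size)
    return img.max(axis=(1, 3))                  # max of each size x size block

image = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(image))  # a 2x2 result: the maximum of each 2x2 block
```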

Inception module

The Inception module is a special neural network layer proposed in [2], the paper that introduced the GoogLeNet network. The motivation behind it is as follows. The authors assumed that each element of the previous layer corresponds to a specific area of the original image. Each successive convolution over such elements increases that area, until the elements of the last layers correspond to the entire image. However, if at some point all the convolutions became 1×1 in size, there would be no features covering the whole original image, so it would be impossible to find large features in the image.

To solve this problem, the authors proposed the so-called inception module: a concatenation of the outputs of convolutions of size 1×1, 3×3, and 5×5, as well as a max pooling operation with a 3×3 kernel.

Unfortunately, this naive approach (the naive inception module) sharply increases the number of image layers, which makes it impossible to build a deep neural network out of such modules. To fix this, the authors proposed a modified inception module with additional dimensionality reduction: in front of each filter they added a 1×1 convolution layer, which collapses all the image layers into one. This keeps the number of layers small while preserving useful information about the image.
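
A hedged sketch of the dimensionality-reduced inception module in Keras notation; the filter counts are illustrative assumptions, not the published GoogLeNet values:

```python
from tensorflow.keras import layers

def inception_module(x):
    # Branch 1: a plain 1x1 convolution.
    b1 = layers.Conv2D(64, (1, 1), padding='same', activation='relu')(x)

    # Branches 2 and 3: a 1x1 convolution reduces the channel count
    # before the more expensive 3x3 and 5x5 convolutions.
    b2 = layers.Conv2D(32, (1, 1), padding='same', activation='relu')(x)
    b2 = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(b2)

    b3 = layers.Conv2D(8, (1, 1), padding='same', activation='relu')(x)
    b3 = layers.Conv2D(16, (5, 5), padding='same', activation='relu')(b3)

    # Branch 4: 3x3 max pooling followed by a 1x1 convolution.
    b4 = layers.MaxPooling2D((3, 3), strides=1, padding='same')(x)
    b4 = layers.Conv2D(16, (1, 1), padding='same', activation='relu')(b4)

    # Concatenate all branches along the channel axis.
    return layers.Concatenate()([b1, b2, b3, b4])
```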

Residual block

Two major problems in training deep neural networks are the vanishing gradient and the exploding gradient. They arise because, when differentiating by the chain rule, only a very small gradient value reaches the deep layers of the network (due to repeated multiplication by small values in the preceding layers). To combat this problem, the so-called residual block was proposed.

The idea is to take a couple of layers (for example, convolutional ones) and add an extra connection that passes around them. Let z(k) be the output of the k-th layer before the activation function is applied, and a(k) the output after it. Then the residual block performs the following transformation: a(k+2) = g(z(k+2) + a(k)), where g is the activation function.

In effect, such a neural network is trained to predict the function F(x) − x instead of the function F(x) that originally had to be predicted. To compensate for this difference, the shortcut connection is introduced, adding the missing x to the function.

The assumption of the authors who proposed the residual block was that such a difference function is easier to train than the original one. In the extreme case where F(x) = x, such a network can always be trained to zero, in contrast to training a stack of nonlinear layers to represent a linear transformation.
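
A hedged Keras sketch of the residual block; the filter count and kernel size are assumptions, and the input is assumed to already have `filters` channels so that the addition is well defined:

```python
from tensorflow.keras import layers

def residual_block(x, filters: int = 64):
    shortcut = x                                               # a(k)
    y = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)      # z(k+2)
    y = layers.Add()([y, shortcut])                            # z(k+2) + a(k)
    return layers.Activation('relu')(y)                       # g(...)
```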

System of Fundamental Principles

A short list of the basic circuit principles that interact in a neural network system:

  • The threshold-exceeding principle, which is found everywhere.
  • Comparison, or the principle of using the result of a comparison, the most important principle for organizing control sequences, tied to the threshold principle.
  • Feedback, positive and negative.
  • The principles of self-excitation and damping follow directly from feedback. A direct practical use: if you see self-oscillation of any kind, in any implementation, there is positive feedback behind it.
  • Ohm's law, strange as it may seem, applies to any system where there is a driving potential, something that is moved by it, and a resistance (or its inverse, conductance) to that movement.
  • The principle of decomposing influencing factors into their constituent elements (the more components a perceived parameter is divided into, the more accurate and correct the perception) also gives a practical rule: the most important thing is to pick out the significant features of the environment, so as not to flail about blindly.
  • The principle of maximally universal recognition is likewise indispensable for effective control systems; it shows how one should learn so as not to miss important links of understanding, without which one can only trust ready-made recipes or act at random.

Those without a background in circuit design may take these lines for nonsense, even obvious absurdity, but how these principles arise and how they ensure causality is shown very clearly and consistently in the classes. Most importantly, using these principles makes it possible to generalize a huge body of research data, holistically and correctly, into a fairly specific model of what happens in neural networks.

For the neural network, the following picture emerges.

1. The neuron as a specialized cell was formed on the basis of pre-existing receptor cells, which can produce an electrical signal in response to stimulation, and of effector cells, which perform a certain action in response to an electrical signal (muscle contraction or the production of hormones). The neuron became an intermediate control element built on the patterns of direct influence of receptors on effectors.

2. Like any cell, a neuron goes through a stage of development before maturing, at which point it becomes capable of its required functionality: under the influence of an electrical signal (a potential on its membrane that triggers a regenerative transformation), it produces an electrical signal of its own, and in this it resembles electrical control devices. And, just as with sensitive elements in electrical engineering, it turns out to be in an unstable state if its inputs are not connected to anything, like the beam of a sensitive balance with no weights on its ends. Such a neuron exhibits spontaneous electrical activity, which is confirmed by research data. This is a very important property.

3. When a neuron forms, it sends out processes towards receptors that show electrical activity and thereby characteristically change the chemical composition around them. These processes, the dendrites, touch the receptors' axons directly, and synapses (potential electrical connectors) appear at the points of contact. This is the first stage of adaptation to environmental conditions, which manifests itself in the activity of the receptors. The resulting potential connections overlap for neighboring neurons of the layer. This is a very important point: as a result, not single detector neurons arise, but a number of detectors at once, partially or even completely complementing each other. Unlike classical control circuits, where each link is unique and its damage is fatal, a neural network has multiple redundancy. In the presented model, this is implemented by the specified profiles of the SET tab, where the connections of neurons overlap with those of their neighbors.

4. Synapses gradually increase their electrical efficiency when there is electrical activity on both sides of their cleft. This is the simplest condition for the formation of a connection. At the stage of pacemaker activity, a neuron spontaneously generates signals, and if this coincides with the activity of the receptors connected to it, the efficiency of the connection increases (in nature, through an increase in the number of neurotransmitter vesicles, though the specific mechanism is not fundamental here). What matters is that the process is not instantaneous: it takes time, and during this time the connection strengthens.

5. A neuron plays a dual role: it acts as an effector for the preceding layer of receptors and, at the same time, as a receptor for the layers that follow.

6. In all layers of receptors and effectors, it proved evolutionarily effective to have additional neurons between them that provide mutual inhibition of active neighboring neurons (lateral inhibition). This gives qualitative advantages for specialization.

7. The next layer matures only after the previous one has specialized. This is fundamentally important. In a natural neural network, each subsequent layer has a critical development period longer than the previous one's, because the frequency of stimuli at that level is significantly lower (among other reasons).

It is additionally important that all output signals of any receptors are identical (both in the natural neural network and in the MNS) and do not themselves encode anything, differing only in the duration of their activity and in their timeliness. Their meaning lies in where they came from and where they go, just like signals in digital devices.

In this concept, specialization is self-generating.

All of this is modeled in the MNS.

Where can I get education on neural networks?

GeekUniversity, together with Mail.ru Group, opened the first Artificial Intelligence faculty in Russia teaching neural networks.

School-level knowledge is enough to study. You will have all the necessary resources and tools, plus a full program in higher mathematics: not abstract, as in ordinary universities, but grounded in practice. The training will introduce you to machine learning technologies and neural networks and teach you to solve real business problems.

After studying, you will be able to work in the following areas:

  • Artificial intelligence,
  • Machine learning,
  • Neural networks,
  • Big data analysis.

Features of studying at GeekUniversity

After a year and a half of practical training, you will master modern Data Science technologies and acquire the competencies needed to work in a large IT company, and you will receive a professional retraining diploma and certificate.

Training is conducted under state license No. 040485. Graduates who successfully complete the program receive a diploma of professional retraining and an electronic certificate on the GeekBrains and Mail.ru Group portals.

Project-based learning

Training is hands-on; the programs are developed jointly with specialists from market-leading companies. You will solve four data science project problems and apply the skills you learn in practice. A year and a half of training at GeekUniversity equals a year and a half of real big data experience for your resume.

Mentor

Throughout the training you will have a personal curator-assistant. With their help you can quickly sort out problems that would otherwise take weeks. Working with a mentor doubles the speed and quality of learning.

Thorough mathematical training

Professionalism in Data Science is 50% the ability to build mathematical models and another 50% the ability to work with data. GeekUniversity will strengthen your knowledge of mathematical analysis, which will certainly be tested at an interview with any serious company.
