
The emergence of machine learning

1. From McCulloch and Pitts to Deep Learning

There is a remarkable continuity from the early model of McCulloch and Pitts [74] to the latest developments of machine learning in the area of deep learning. This section retraces the main steps in this evolution, emphasizing the historical ups and downs of a single underlying concept.

The perceptron’s creation

McCulloch and Pitts’ model changed the way we look at neural networks. Their approach was key to establishing the study of artificial neural networks as a branch of computer science rather than of biology. The first significant improvement came from Rosenblatt in 1958 [100], who introduced a learning technique called the perceptron (Figure 30). It corresponds to a new type of neuron model that differs from the original in a few ways. First of all, perceptron neurons are linked by updatable weights, whereas McCulloch and Pitts’ neurons only had excitatory or inhibitory inputs. In doing so, Rosenblatt removed the need to define inhibitory or excitatory neurons a priori, a step that moved the field further away from physiological neural networks. The perceptron learns by adapting its input weights in order to minimise the error observed at the output, whereas McCulloch and Pitts’ model was purely static and unable to adapt. Rosenblatt’s perceptron can thus be seen as the predecessor of the learning process of artificial neural networks, whereas McCulloch and Pitts are responsible for their structure.

Fig. 30 Rosenblatt’s neuron [100]. This model represents the behaviour of a perceptron neuron
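To make the weight-update idea concrete, here is a minimal sketch of a perceptron-style learning rule in Python (an illustrative reconstruction, not the exact formulation of [100]); the learning rate, epoch count and the choice of the AND function as training data are our own assumptions.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Learn weights w and bias b so that step(w.x + b) reproduces the 0/1 labels y."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # thresholded output
            error = target - pred               # -1, 0 or +1
            w += lr * error * xi                # weights move only when the output is wrong
            b += lr * error
    return w, b

# The logical AND function is linearly separable, so the rule converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if x @ w + b > 0 else 0 for x in X])   # -> [0, 0, 0, 1]
```

The essential point is that the weights are only adjusted when the thresholded output disagrees with the target, which is the error-driven behaviour described above.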

Rule-based systems over neural networks

Rosenblatt’s perceptron still had many limitations, one of them being its incapacity to learn non-linear functions, and thus its inability to solve the “XOR” problem. This problem can be seen as assigning one of two classes to the four vertices of a square, where the vertices on the same diagonal belong to the same class. It is therefore impossible to find a single line that perfectly separates the two classes; a non-linear function is needed to solve the problem. After Rosenblatt’s perceptron model, progress was very slow and very few enhancements were made. People’s expectations were disappointed, and this resulted in the first “AI winter”. In the 1970s, funds for machine learning were very limited and the field itself was held in low regard.
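A quick way to see the limitation is to apply the same single-neuron update rule to the XOR labelling. The sketch below (illustrative Python; the learning rate and epoch count are arbitrary choices) never reaches perfect accuracy, because no single line separates the two diagonals.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                       # XOR labels

w, b = np.zeros(2), 0.0
for _ in range(1000):                            # far more passes than AND needs
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += 0.1 * (target - pred) * xi
        b += 0.1 * (target - pred)

print([1 if x @ w + b > 0 else 0 for x in X])    # never equals [0, 1, 1, 0]
```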

Nevertheless, the 1980s saw the rise of new methods based on decision rules (hereafter referred to as decision trees). The first of these techniques was proposed by Quinlan in 1986 [92] and updated in 1993 [93]. Overall, these methods were quite successful and revived interest in machine learning. Due to the lack of computational power and data, neural networks were surpassed by rule-based systems in the 1990s and remained overlooked for a while, even though machine learning as a field continued to evolve.

Multilayer perceptron and backpropagation

Despite the lack of interest in neural networks in the 1980s, the key technique of backpropagation was discovered during that decade. It was introduced by Rumelhart, Hinton and Williams in 1986 [103] and allowed the training of multilayer perceptrons. Another important simplification had to be introduced: the neural network must be feedforward for the algorithm to work. This was another step away from biological networks, since cycles are not allowed in feedforward architectures whereas many cycles appear in physiological neural networks. It was nonetheless a huge breakthrough for artificial neural networks. A milestone paper came from Yann LeCun in 1989 [67], who used backpropagation with a convolutional neural network to carry out digit recognition (Figure 31). Convolution is another feature that can be added to neural networks in order to introduce linear filters at the network’s input, and it has proven especially effective on signal-processing problems (image recognition, speech recognition, time-series prediction, ...). However, even though LeCun had great success with neural networks on his digit-recognition problem, neural networks were not regarded at the same level as decision-rule systems: they suffered from stabilisation and overfitting problems, and learning was far more difficult with neural networks than with decision-tree methods, which often produced better results.

Fig. 31 LeCun’s neural architecture for digit recognition. This architecture uses convolutional layers (which introduce structure into the network) and is trained with backpropagation. Taken from [67].
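As a minimal illustration of backpropagation (a toy sketch, not LeCun’s convolutional architecture from [67]), the Python snippet below trains a small feedforward network with one hidden layer on the XOR problem that defeated the single perceptron; the layer sizes, learning rate and iteration count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)            # input  -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)            # hidden -> output

for _ in range(10_000):
    # forward pass through the (feedforward) network
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: push the output error back through each layer (chain rule)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))    # typically converges close to [0, 1, 1, 0]
```

The backward pass is simply the chain rule applied layer by layer, which is also why the architecture has to be feedforward for the algorithm to work.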

Kernel-based algorithms: support vector machines

A third type of algorithm appeared in the 1990s and was immediately adopted by the machine learning community, further diminishing the interest in neural networks. Kernel-based algorithms were in fact an enhancement of support vector machines (SVMs), which had been introduced in 1963. In 1992, Bernhard E. Boser, Isabelle M. Guyon and Vladimir N. Vapnik suggested a way to create non-linear classifiers by applying the kernel trick to maximum-margin hyperplanes [14]. This technique can be seen as bending a plane in order to modify distances between points, which allows non-linear features to be learned. It was well received because it was based on a solid mathematical model that could be analysed and understood with mathematical and statistical tools. From then on, decision trees, random forests and SVMs reigned over the machine learning community, whereas neural networks only played a backstage role. This situation lasted until very recently.
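The following sketch illustrates the effect of the kernel trick using scikit-learn (the library, dataset and kernel parameter are our own illustrative choices, not part of [14]): a linear maximum-margin classifier fails on two concentric rings, while the same classifier with an RBF kernel separates them almost perfectly.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)       # flat separating hyperplane
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # implicit non-linear mapping

print("linear kernel accuracy:", linear_svm.score(X, y))   # roughly 0.5
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))      # close to 1.0
```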

Deep learning: artificial neural networks’ new life

There are two main factors that contributed to the comeback of artificial neural networks, and surprisingly they are not linked to the learning model itself. The recent popularity surge of neural networks is primarily due to the huge increase in computational power and data availability. Thanks to the ability to use GPUs to great effect, training networks with a large number of hidden layers is now possible, which gave rise to the name “deep learning”. Impressive results were achieved with these models; one of them came at the 2012 ImageNet competition, where a neural network achieved a score twice as good as that of the next best method. This was the latest turning point in machine learning. Since 2012, an enormous amount of attention has been given to artificial neural networks, and the interest in other machine learning techniques is at an all-time low.


Enhancements and current evolution

The deep learning model is in fact very close to the multilayer perceptron model, but a few changes have been made recently. One of them is the use of rectified linear units (ReLU) as activation functions instead of sigmoids; another is the dropout method [111], which is now widely used. As we can see, neural networks have had their ups and downs and are currently held in higher esteem than they ever were. This encourages scientists to carry out research in the field and has led to a few recent enhancements, even though nothing major had happened since the 1990s. Machine learning is now discussed by the wider public thanks to major companies investing in the field.
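The two ingredients mentioned above can be sketched in a few lines of Python (an illustrative toy, not the formulation of [111]): ReLU simply clips negative pre-activations to zero, and (inverted) dropout randomly silences hidden units during training while rescaling the survivors; the dropout probability and layer size here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)                 # max(0, z): cheap, does not saturate

def dropout(h, p=0.5, training=True):
    """Zero each unit with probability p during training,
    rescaling the survivors so the expected activation is unchanged."""
    if not training:
        return h                              # dropout is disabled at test time
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

h = relu(rng.normal(size=(1, 8)))             # a hidden layer's activations
print(dropout(h, p=0.5))                      # about half the units set to zero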


Will deep learning keep its promises?

It remains to be seen whether deep learning will keep its promises or disappoint public opinion; if it does disappoint, a new winter for artificial neural networks may be witnessed. However, funding in AI and machine learning has never been so high, and we are now able to use machine learning algorithms to capture knowledge that humans are unable to explain to a computer. Such algorithms have been used to great effect in a multitude of fields, some of them concerned with imitating humans. The study and implementation of the nerve-net model described by McCulloch and Pitts in their seminal paper “A logical calculus of the ideas immanent in nervous activity” [74] have had, over the years, a domino effect on many fields. It has been, for instance, a solid base for the first approach to ANNs, which are considered among the most popular algorithms in machine learning.


        As explained here, ANNs have come a long way, and several implementations have been tried in order to reach their current form. Thanks to the increase in computational power and data availability, it has been possible to apply ANN algorithms to many applications in fields such as aerospace, biomedicine and business. Now the question is: is the generic nature of machine learning algorithms an advantage or a disadvantage?

2. Successes and limitations of machine learning

ANNs are considered to be very efficient computing models which have shown their power in solving hard problems in artificial intelligence. However, one of their major flaws is their black-box nature.

        Indeed, a black box can be seen as a system whose inputs and outputs are known but whose internal workings are not. An input S has an effect on the box and therefore on the reaction R which emerges as an output [18]. The structure and constitution of the box are considered irrelevant; in other words, only the input-output relations of the system are accounted for.

        Due to the difficulty of understanding their underlying process, ANNs are often classified as black-box models, and numerous researchers therefore refuse to use them; this is a significant weakness. Without the ability to produce comprehensible decisions, it is hard to trust the reliability of networks addressing real-world problems [57]. The problem of extracting the knowledge learned by a network, and representing it in a comprehensible form, has received a great deal of attention in the literature [6] [99] [26]. To fill this gap, fuzzy rule-based systems (FRBSs) have been developed over the last few years using fuzzy logic. These algorithms have proven their strength in challenges such as the control of complex systems, producing fuzzy control. Fuzzy set theory also provides an excellent way of modelling knowledge [57]. In particular, FRBSs enable the use and manipulation of expert knowledge stated in natural language; the knowledge is thus easy to understand, verify and, if necessary, refine. Recently, a great deal of research has been devoted to the design of hybrid intelligent systems that fuse sub-symbolic and symbolic techniques for information processing [121]. Overall, this creates a synergy between ANNs and FRBSs which combines the robustness and learning capabilities of ANNs with the white-box (1) character of FRBSs. Currently, other machine learning algorithms, such as random forests, are also used to extract rules from data. However, these algorithms often do not offer the same performance that neural networks provide and are thus less prone to being used in the wrong situations.
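One common, pragmatic workaround can be sketched with scikit-learn (the tools, dataset and depth limit are our own illustrative choices, not a method from the cited papers): a shallow decision tree is trained to mimic a neural network's predictions, and its branches can then be read as approximate, human-readable rules.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                          random_state=0).fit(X, y)

# Surrogate: the tree is fit on the network's outputs, not on the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

print(export_text(surrogate, feature_names=load_iris().feature_names))
print("fidelity to the network:", surrogate.score(X, black_box.predict(X)))
```

The printed "fidelity" score measures how closely the surrogate tree reproduces the network's behaviour, not how accurate either model is on the true labels.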

        During the last decades, machine learning (ML) has been an active research area, achieving many successes in different applications. It can help, for instance, alleviate problems typically suffered by researchers in the field of medicine, such as saving time for practitioners and providing unbiased results [32]. Moreover, the large amount of data in medicine, combined with the commonly small sample size of pathological cases, makes sophisticated machine learning techniques indispensable for clinical interpretation and analysis. ML is a branch of artificial intelligence that emerged from the evolution of pattern recognition, probability theory, optimisation and statistics. Its main goal is to allow computer programs to learn from data, building models that recognise common patterns and are able to make smart and accurate predictions [117]. However, learning from data is not an easy task. Data sets are quite often characterised by incompleteness, i.e. many values are missing. Most of the time there is random or even systematic noise in the data, which causes incorrectness. It can also happen that the available parameters are inadequate or incomplete for certain tasks [117]. All things considered, even though ML in a medical environment could be very useful and could reach targets more easily and faster, a huge amount of data is needed for the algorithms to work. Regretfully, in the clinical field it is quite hard and expensive to obtain data from humans, and it is therefore tough to make ML algorithms work for such applications.

        Another big field in which ML is heavily involved is robotics. In the last decades, robots have performed sophisticated manipulation tasks such as grasping convoluted objects, tying knots and carrying objects around complex obstacles. These actions need control algorithms that most of the time employ search algorithms to find satisfactory solutions, such as a certain trajectory to a goal state or a sequence of contact points. Most of those demonstrations rely on an environment that has already been set up to permit the motions [24]. However, current algorithms are not able to adapt to different situations, so a robot is likely to make mistakes when handling a situation whose setting it has never seen during learning. For instance, if a robot is trained to pick up a pen lying at given coordinates, it will surely manage to learn how to grab it; however, if the pen is moved from the initial point, the robot will completely fail at its task. One of the main reasons for this is the highly noisy space in which the robot operates. Robotics challenges can inspire and motivate new machine learning research, as well as being an interesting field of application for standard ML techniques. Researchers think that one way to solve the fundamental problems of robots is to make them able to use perceptual stimuli, such as vision, proprioceptive and tactile feedback, and translate these into motor commands [58]. Of course, several important issues stand along the way before this becomes realisable, for example perceptuo-action coupling, imitation learning, movement decomposition, probabilistic planning problems, motor-primitive learning, reinforcement learning, model learning and motor control [58].


It has been demonstrated that one area where humans can rely on ANNs is prediction. For example, the paper “Research on the Application of Artificial Neural Networks in Tender Offer for Construction Project” by Q. Shanshan and Z. Minli treats the implementation of an ANN in Matlab to predict the trend of offers, in order to address the problem of tender offers in the construction industry from various perspectives. The principle starts with the identification of the different data that drive the tender offer, which then constitute the input nodes of the network used for the iterative computations. Another great example of ML obtaining good results is the ImageNet competition, which is well known in the domain of AI.

In conclusion, anyone who wants to go beyond the current status of ML first needs to understand very well what ML is not. Second, if an ML algorithm is to be used, one must remember that the data have to be as clean as possible; indeed, ML algorithms are considered “data hungry” when it comes to achieving good results [128]. Consider then the large number of successes that ML has had, and still has, in numerous sciences: in astronomy, algorithms sift through millions of images from telescope surveys to classify galaxies and find supernovas; in biomedicine, it is possible to predict protein structure and function from genome sequences and to discern optimal diets from patients’ clinical and microbiome profiles. The same methods will open up vast new possibilities in medicine. A striking example: algorithms can read cortical activity directly from the brain, transmitting signals from a paralysed human’s motor cortex to hand muscles and restoring motor control [15].


This progress would have been unimaginable without ML to process real-time, high-resolution physiological data. However, a high level of caution is required: even though ML algorithms have been very successful in different applications, their limitations (e.g. missing data, insufficient knowledge of the problem, the lack of meaningful variables to predict the output) need to be considered and evaluated.

footnote

1. A subsystem whose internal workings are known but which, usually, cannot be modified.
