
The imitation game

Nearly 70 years after Turing's paper, the imitation game remains a fundamental benchmark of artificial intelligence. However, at a time when chatbots such as the well-known CleverBot (1) come close to passing the Turing test, the limitations of this famous test become apparent: no one considers such a bot to be intelligent. One may also wonder whether it is even desirable to produce intelligent machines, and what limitations they still bear today.

        The following section attempts to address several questions: we first illustrate situations in which machines that humans cannot distinguish from other humans are readily available, and study them to see whether their behavior matches that of humans and whether the mere fact that they cannot be spotted is desirable.

        A review of the limitations of Turing's imitation game follows, to underline the current challenges against which we would rather assess our technology. As of today, is the Turing test becoming obsolete?


1. The memex machine

There are several tasks in which machines already outperform humans. As envisaged in [20] with the memex concept, machines finely tuned for specific tasks can rapidly exceed human performance. The original draft of the memex described a machine able to infer hyperlinks solely from the way it is used, and it is interesting to notice that such a device is still not available today. One could argue that Google and other search engines are trying to build such a topology, but we are not yet at the level described, in which the whole management, including link addition and removal, is done by the machine.
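As a thought experiment, the usage-driven linking the memex calls for can be sketched in a few lines: create a link between two documents once they are consulted together often enough. The sessions, document names, and threshold below are invented for illustration; this is a toy sketch, not a description of any existing system.

```python
from collections import Counter
from itertools import combinations

# Toy sketch: infer memex-style "trails" purely from usage, by linking
# documents that are frequently consulted within the same session.
# Sessions, names, and the threshold are invented for illustration.

sessions = [
    ["turing1950", "loebner", "botprize"],
    ["turing1950", "botprize"],
    ["captcha", "turing1950", "botprize"],
]

co_visits = Counter()
for session in sessions:
    co_visits.update(combinations(sorted(session), 2))

LINK_THRESHOLD = 2  # create a hyperlink once a pair co-occurs this often
links = [pair for pair, count in co_visits.items() if count >= LINK_THRESHOLD]
print(links)  # [('botprize', 'turing1950')]
```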


Stock exchange bots

Other areas of interest include the stock exchange, where fast micro-transactions made by robots now drive much of the world economy [38]. In such a situation, one can question the benefit of human performance and mankind's ability to take over a machine-driven system. These bots, called "trading bots", do not think either: the first generation used an empirical model of how humans usually act on the stock exchange, but different models (even self-learning ones [42]) have since been developed to keep resembling the way humans place orders, or at least to go unnoticed most of the time, without imitating them to excess. These robots would surely not pass any generalized Turing test, but they are arguably better than us at more dedicated tasks.
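To make the idea of "resembling how humans place orders" concrete, here is a minimal sketch. The heuristics (round lot sizes, irregular submission delays) and all names are assumptions made for illustration, not any real firm's trading model.

```python
import random

# A hedged sketch of "humanizing" order flow: round lot sizes and irregular
# submission delays. These heuristics are assumptions for illustration only.

def humanized_order(signal_qty):
    qty = int(round(signal_qty, -2)) or 100     # humans favor round lots
    delay_s = random.lognormvariate(0.0, 0.8)   # irregular, human-like timing
    return {"qty": qty, "submit_after_s": delay_s}

print(humanized_order(signal_qty=1337))  # e.g. {'qty': 1300, 'submit_after_s': 0.9...}
```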


Playing bots: human enough?

 

A modified Turing test developed by Hingston [86], whose main research topics include video game AI, is deemed more promising. The goal of his work is to discriminate bots from human players in video games, although he still criticizes the Turing test itself, arguing that it does not prove whether a machine is intelligent.

Fig. 32 High-frequency trading now constitutes the majority of the volume exchanged on the stock market, and the amount exchanged this way has only been rising. Source: Nanex, U.S. Global Investors



The test, called 2K BotPrize (2), is as follows: "Could you tell, solely from playing a game, whether your opponent is controlled by a human or by a bot? Passing the test is a bot that goes undetected." A competition has been held every year since 2008 to try to develop such bots, and as outlined in [86], the test was passed in 2012. Obviously, the winning bot did not fool all humans, especially regular players of the underlying game, but a non-expert jury was not able to identify the bots. The whole contest strongly resembles Turing's imitation game, because the process is built in the same way: a human must distinguish between a human and a machine.

        This contest builds on a historical background: the Loebner Prize [91]. The goal there was to assess whether a chatbot could pass the real (or near-real) Turing test, and as of 2008, no chatbot had ever succeeded. However, given the widely accepted view that the bots closest to passing the test are not intelligent, but merely behave like humans under special conditions, the question of a Turing test for video game bots becomes all the more legitimate.

        In order to imitate human players, several techniques have been developed over the years: behavior-based techniques from robotics [94], hidden semi-Markov models [51], and other "semiotics" models [112]. Learning techniques have also been applied, such as reinforcement learning in [9], to try to emulate human-like behavior. Early results showed that playing against more human-like bots was more fun for real players, thus motivating the industry.

In the case of playing bots, the first question, whether machines should act as humans or not, can be answered positively: developing AI that goes unnoticed in video games is something the community itself is asking for.

        It is interesting to further analyze the different approaches taken by some participants in the 2K BotPrize. Whereas for the Loebner Prize a chatbot receives the same input as the human does (text), the representation of the game is not the same for a human and a robot: we have a visual representation of the world, whereas the robot only uses information about volumes and positions. To keep the challenge meaningful, game bots have to rely on dedicated messages for communication between themselves and the game character they control, using them, as far as possible, the way a human would play. They are also notified through code callbacks of events such as the game starting, the player being hurt, etc. Obviously, with information such as "a bullet is coming towards you", a bot could almost always dodge; but using this information perfectly can also lead to being detected.
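The trade-off can be sketched as below, assuming a hypothetical event callback named on_projectile_incoming and a bot object exposing a strafe method; neither is the actual 2K BotPrize API. The point is that reacting instantly and perfectly to such an event is itself a giveaway.

```python
import random
import time

# Hypothetical sketch of a BotPrize-style event callback. The callback name,
# the bot object, and its strafe() method are assumptions for illustration,
# not the real competition API.

REACTION_MEAN_S = 0.25   # roughly human visuomotor reaction time
DODGE_SUCCESS_P = 0.7    # deliberately imperfect, to avoid looking machine-like

def on_projectile_incoming(bot, projectile):
    # Wait a human-like, slightly variable delay before reacting.
    delay = max(0.05, random.gauss(REACTION_MEAN_S, 0.08))
    time.sleep(delay)
    # Sometimes fail to dodge at all, as a human would.
    if random.random() < DODGE_SUCCESS_P:
        bot.strafe(direction=random.choice(["left", "right"]))
```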


      The competitors in the first session used several different techniques to try to convince the judges:

  • U Texas: a bot that used mathematical models to minimize the life lost while dodging missiles, and to maximize the chances of a kill by selecting the most efficient weapon based on the distance to the enemy and on each weapon's precision and power.

  • AMIS: a modification of a "death match unbeatable bot" (that is, a bot which uses all available knowledge to always hit accurately) that learns how weapons are used by playing. The information gathered while playing contains imprecisions, and the bot can exaggerate these mistakes.

  • Underdog: a bot that tried to copy frequently observed human behaviors, such as waiting for random delays or keeping after a player even once eye contact was lost.

  • ICE: a bot based on a state machine with discovery and fighting states, using tricks to decrease its accuracy (a minimal sketch of this kind of design follows the list).

  • ISC: a bot that uses reinforcement learning to select weapons and to decide when to transition between four internal states describing the possible activities in the game.
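As a hedged illustration of the ICE-style design mentioned above, the sketch below implements a two-state machine (discovery versus fighting) with a deliberate aim-noise trick. All class names, parameters, and the noise model are assumptions for illustration, not the actual ICE implementation.

```python
import random
from enum import Enum, auto

# A minimal sketch of an ICE-style design: a two-state machine (discovery vs.
# fighting) plus a deliberate aim-noise trick to decrease accuracy.

class State(Enum):
    DISCOVERY = auto()
    FIGHTING = auto()

class StateMachineBot:
    def __init__(self, aim_noise_deg=6.0):
        self.state = State.DISCOVERY
        self.aim_noise_deg = aim_noise_deg  # deliberate imprecision

    def step(self, enemy_visible, enemy_bearing_deg=None):
        # Transition rules: fight when an enemy is seen, explore otherwise.
        self.state = State.FIGHTING if enemy_visible else State.DISCOVERY

        if self.state is State.FIGHTING:
            # Perturb the perfect bearing so shots can miss, as a human's do.
            noisy = enemy_bearing_deg + random.gauss(0.0, self.aim_noise_deg)
            return ("shoot", noisy)
        return ("explore", random.uniform(0.0, 360.0))

bot = StateMachineBot()
print(bot.step(enemy_visible=True, enemy_bearing_deg=90.0))  # ('shoot', ~90)
print(bot.step(enemy_visible=False))                         # ('explore', ...)
```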


The sources underline "the fatal mistake" all bots made: each behaved, at least once, in a way a human would be very unlikely to play, and this was enough to settle the judges' opinion. The judges' recommendations for improving the quality of the bots are to "eliminate obvious repetitive behaviors", "appear to adapt their play", and "exhibit sound usage". These relate back to the original Turing test, in the sense that lacking such behaviors is a clear sign of non-intelligence. Achieving these improvements fully might require learning capabilities, which reflects well on Turing's paper, whose conclusions remain largely unchallenged.


CAPTCHAs

In the imitation game, machines can also take the place of the judge: this is the idea behind the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) of [70]. Proving that a computer will never be able to pass a CAPTCHA-like test is not possible according to most authors; indeed, it would mean that there exist tasks that a human can do but computers will never be able to. The authors strongly believe that one day computers will be more or less as good as humans in almost every respect. Instead of proving such a statement, they limit themselves to a stricter requirement: showing that, given the current state of technology, it would be very hard to write a program that passes the test.

     Indeed, with the rise of convolutional neural networks, some scientists consider that mankind has finally solved Polanyi's 1966 paradox: we as humans have always been able to do more than we can tell. So can computers, which learn processes we can neither understand nor explain, yet produce models that actually work when applied.

 

        CAPTCHAs are not inviolable per se; there exist techniques to get around them (for instance OCR [44]). One of them is a method called stealing human cycles. Imagine that a company owning a website wants to create thousands of free email accounts. It will develop a bot that registers those accounts automatically with an email service provider. Suppose further that the account creation process is protected by a CAPTCHA. The bot, unable to solve the CAPTCHA, will send the test to the company's website. Once a user of that website has solved the test himself, the result is sent back to the bot, which can then proceed with the account creation. This issue (also known as a human farm) is still an open question today. As one can see, the work of Alan Turing has had many impacts, many of which, like the CAPTCHA, are still widely used nowadays.
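The relay can be simulated end to end in a few lines. Everything below is a hypothetical stand-in (the provider, the challenge, and the visitor), meant only to make the flow of Fig. 33 concrete; no real service or API is implied.

```python
import queue

# Self-contained simulation of the "stealing human cycles" relay. The email
# provider, the company website's queue, and the visitor are all stand-ins.

class EmailProvider:
    def fetch_captcha(self):
        return {"id": 1, "image": "<distorted text: 'x7k2q'>", "answer": "x7k2q"}

    def submit(self, challenge, answer):
        return answer == challenge["answer"]  # account created if correct

def visitor_solves(image):
    # On the company's site, an unwitting human types what they read.
    return "x7k2q"

relay = queue.Queue()
provider = EmailProvider()

challenge = provider.fetch_captcha()        # 1. the bot hits the CAPTCHA wall
relay.put(challenge["image"])               # 2. it forwards the image to the site
answer = visitor_solves(relay.get())        # 3. a visitor solves it for free
print(provider.submit(challenge, answer))   # 4. the bot finishes signup -> True
```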


      Turing's famous imitation game can therefore also be used for the very purpose of discriminating people from machines. It is often not desirable for machines to behave as humans, but they are definitely able to do so for a growing range of actions.

Fig. 33 How humans can be used to solve CAPTCHAs. This is called human farming.

2. Learning and behaving: the machine and the animal

The imitation game has shaped many great scientific achievements that could not have been reached without a global abstraction of the notion of intelligence. Many other fields have used this notion (one may think of medicine, with disease detection in [62], etc.). However great its impact might have been, it is increasingly believed today that the test does not discriminate between humans and machines accurately enough, and this limitation definitely motivates further efforts.

        In [75], researchers try to develop an "emotional Turing test" by creating a chatbot that drives a human to interact with it at an emotional level. The virtual intelligence should then use social media models to sustain an at least bearable conversation for some time.

        The study shows that, for as long as people think they are still interacting with a real human, they show greater emotional arousal, which drives the authors to believe that intelligence could embed the notion of a social network, and require several brains to be fully achieved.


        Finally, the very question of designing a test that assesses intelligence cannot be answered. Why is that so? Mainly because intelligence is not yet well defined by the scientific community. Turing defined intelligence in his own way, which allowed him to design a test and to open the door to the spiral of applications studied nowadays. But to go further, we will have to step back not one step but two, and define intelligence so as to include emotion and perhaps even some relational, social factors. Only then will we be able to build outstanding applications, until the next difficulty, which we cannot even grasp yet, eventually kicks in.

        Integrating intelligence into machines is a major challenge in artificial-intelligence-related fields. More specifically, researchers aim at incorporating the intelligence of animals (especially that of human beings) into machines, so that machines could benefit from the core advantages of their way of thinking and learning, and assist people even better in their lives. In order to achieve this very ambitious goal, researchers and engineers first need to grasp how animals reason and learn, and what features make their intelligence so unique and superior to the way current machines work.

        The recent work by Lake et al. [16] illustrates the continuing interest in comparing human and machine intelligence. In this paper, the authors perform an in-depth comparison between the learning process employed by humans and the one used by machines, and explain the key aspects we have to take into account when building human-like machines. To this end, Lake et al. draw on advances in cognitive science. The most important points underlined by the authors are that humans can learn from far fewer examples than a machine, and that the human range of thinking and learning skills is broader as well as more flexible; machines are still limited to the task they have been trained for.

Challenges for building more human-like machines

Lake et al. consider two challenges, namely the Characters Challenge and the Frostbite Challenge, to underline the differences between humans and machines when solving problems and learning. The Characters Challenge is the classic problem of handwritten character recognition, which touches on almost all of the most important aspects of artificial intelligence. Even though the best results obtained with deep convolutional networks come very close to human performance (an error rate of 0.2%), there are at least two striking differences between the way humans and machines face this challenge: (i) humans only need a handful of examples in order to learn how to discriminate between the different characters [17, 37], whereas neural networks rely on large datasets for training, and (ii) people do not simply carry out pattern recognition but learn a kind of concept, which even allows them to generate new examples. In the end, people are still better learners for this kind of task than the best character recognition algorithms.
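The contrast in (i) can be made concrete with a toy example: a nearest-neighbor learner that classifies a new glyph from a single stored example per class, where a network would need thousands of training images. The 8x8 "glyphs" and the distance choice below are invented for illustration.

```python
import numpy as np

# Toy one-shot classifier: ONE stored example per class, nearest neighbor
# under Euclidean distance. Data and distance are invented for illustration.

def one_shot_classify(query, exemplars):
    # exemplars maps each label to a single example image.
    return min(exemplars, key=lambda label: np.linalg.norm(query - exemplars[label]))

rng = np.random.default_rng(0)
exemplars = {"A": rng.random((8, 8)), "B": rng.random((8, 8))}
query = exemplars["A"] + 0.05 * rng.random((8, 8))  # a noisy new drawing of "A"
print(one_shot_classify(query, exemplars))          # -> "A"
```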

        The Frostbite Challenge consists in playing the Atari game Frostbite. In a nutshell, one has to build an igloo piece by piece within a time limit, by jumping on ice floes in the water. Again, the authors compare humans and machines by analyzing how they learn to play Frostbite. They point out that a deep Q-learning network (DQN) achieved performance similar to that of a professional gamer [129]. A striking fact, however, is that the human player only trained for about two hours, while 924 hours were necessary for the DQN. The performance with respect to training time for the different DQNs is depicted in Fig. 34. Another interesting observation is that the DQN heavily relies on the incremental rewards one can earn while building the igloo in order to progress (a typical characteristic of reinforcement learning), whereas this component is not vital for human players.
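That dependence on incremental rewards is visible in the core DQN update, sketched below in PyTorch. The fully connected network and the batch format are simplifying assumptions (the DQN of [129] uses convolutional layers over raw Atari frames); the point is that the learning target is built directly from the per-step reward.

```python
import torch
import torch.nn as nn

# Sketch of the core DQN temporal-difference update, under simplifying
# assumptions about the network and the (s, a, r, s', done) batch format.

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def td_update(q_net, target_net, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) for the actions actually taken.
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: the incremental reward plus the discounted
        # best future value. With sparse intermediate rewards, this signal
        # is almost always zero and learning stalls.
        best_next = target_net(next_states).max(1).values
        target = rewards + gamma * best_next * (1.0 - dones)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```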


Fig. 34 Frostbite Challenge. While humans only need two hours of training to reach top scores, the different versions of the DQN need a significant number of hours to reach even mediocre performance. Only the DQN++ reaches similar performance, after 924 hours of training. (Courtesy of Lake et al. [16].)

How to build machines that can think and learn like people?

First of all, the authors argue that machines should rely on model building rather than simply on pattern recognition, so as to give them the possibility to understand and explain their environment. The main difference between model building and pattern recognition lies in how the task at hand is considered: solving a problem with a model implies using explanation and understanding of the world (as humans do [80, 33]), while solving it with pattern recognition means seeing the task as a classification or regression problem without trying to understand why the solution fits. Second, machines should incorporate intuitive physics as well as intuitive psychology, similarly to infants. Infants have primitive object concepts and understand very early that people have goals and beliefs, which strongly shapes their learning process [36, 98, 8, 110]. The third and last ingredient considered in [16] is learning-to-learn.

        Human-like machines need to learn rich and informative models of the world so that they can learn to perform new tasks much faster and generalize their knowledge. Once again, the authors present the example of children, who are driven by the desire to uncover the causes of rare events and to use this knowledge to go beyond the paucity of the data. Lake et al. suggest compositionality, i.e. the idea that new representations can be constructed by combining primitive elements, to enhance learning-to-learn. According to the authors, compositionality is an essential asset for productivity, as an infinite number of elements can be built from a finite set of primitives, just as the mind can think an infinite number of thoughts or understand an infinite number of sentences. Deep neural networks only integrate a limited notion of compositionality.
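The productivity argument can be shown with a deliberately tiny sketch: a handful of invented stroke primitives already yields hundreds of distinct "characters", and the count grows exponentially with the number of slots. The primitives and the composition rule below are made up for illustration and are unrelated to the actual generative model of [16], in which the primitives are learned strokes tied together by spatial relations rather than fixed symbols.

```python
import random

# Toy illustration of compositionality's productivity: a small, finite set
# of stroke primitives recombines into exponentially many "characters".

PRIMITIVES = ["|", "-", "/", "\\", "o", "c", "~"]

def new_character(n_strokes=3):
    # A "character" is an ordered combination of primitive strokes.
    return "".join(random.choice(PRIMITIVES) for _ in range(n_strokes))

# 7 primitives and 3 slots already give 7**3 = 343 distinct compositions;
# the count grows exponentially with the number of slots.
print([new_character() for _ in range(5)])
```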
