Precup grew up in Romania, at the time a communist state. She explained that, in her youth, sci-fi movies sparked her desire to understand and create intelligent machines, an ambition that drew her to computers from a young age. During her Computer Science undergraduate studies in Romania, she was tasked with building a chess-playing program based on how she herself played chess. Being something of an amateur at the game, she instead tried to make the program learn for itself which moves were good. Precup later moved to the University of Massachusetts for her M.S. and Ph.D., and was then hired as an assistant professor at McGill. She specializes in Reinforcement Learning, which boils down to teaching a computer a behavior by reinforcing it with good or bad payoffs. For example, a chess program might play millions of unassisted matches against itself and receive +1 for a win and -1 for a loss. Over the long run, these programs outperform almost all other Machine Learning techniques, especially at strategy games.
During the talk, Precup outlined the basics of machine learning today. The two main pillars of Machine Learning are Supervised and Unsupervised Learning. The former consists of feeding the program large sets of labeled data (inputs) together with the desired outcomes (outputs). For example, the task might be to decide whether a given image shows an animal, a human or a car; the goal is then to predict the category a new image belongs to. Such a task is too difficult for a computer to tackle directly, so Deep Learning breaks the high-level problem down into many layers of lower-level ones that the computer can handle. In the image example, this means the program would first pick out color patterns, contours and edges, then combine them to recognize the shapes and colors of objects. After this series of steps, the program would be able to determine the new picture’s category. The technique requires much human involvement in labeling data, which makes it harder to scale and reuse, but it is at the core of the technology behind Google’s self-driving cars.
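To make the supervised idea concrete, here is a minimal sketch of the labeled-input, desired-output setup described above. The toy data, labels and nearest-centroid method are invented for illustration; real image classifiers use deep networks rather than anything this simple.

```python
# Minimal supervised-learning sketch: a nearest-centroid classifier.
# Each training point comes with a label (the "desired output").

def train(points, labels):
    """Compute the mean (centroid) of the training points for each label."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Assign a new point to the label whose centroid is closest."""
    x, y = point
    return min(centroids,
               key=lambda lab: (centroids[lab][0] - x) ** 2 +
                               (centroids[lab][1] - y) ** 2)

# Labeled training data: input features paired with the desired output.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
labels = ["cat", "cat", "car", "car"]
model = train(points, labels)
print(predict(model, (2, 1)))  # a new, unseen point near the "cat" cluster
```

The key feature of the supervised setting is visible in the last lines: the human supplies the labels, and the program generalizes from them to new inputs.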
The other approach to machine learning is Unsupervised Learning, in which the program is fed raw, usually unlabeled data with no precise desired output. For example, the data might be accelerometer readings from a phone, which are highly noisy. The computer’s implicit task is simply to find interesting patterns. This can also be used alongside a supervised algorithm as a check on its results. It is frequently used for chatbots and facial recognition, or for any other task requiring the computer to work more independently.
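A pattern-finding task of this kind can be sketched with a tiny clustering algorithm. The sensor readings below are invented for illustration, and k-means is just one simple example of an unsupervised method; note that no labels or desired outputs appear anywhere.

```python
# Minimal unsupervised-learning sketch: k-means clustering of unlabeled
# 1-D "sensor" readings into k groups.

def kmeans(data, k, steps=10):
    """Group unlabeled numbers into k clusters around moving centers."""
    centers = data[:k]                       # naive initialization
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for x in data:                       # assign each point to its nearest center
            i = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Raw readings with no labels and no specified desired output.
readings = [0.9, 1.1, 1.0, 5.2, 4.8, 5.0]
centers, clusters = kmeans(readings, k=2)
print(sorted(round(c, 1) for c in centers))  # the two cluster centers found
```

The program was never told that there are "low" and "high" readings; it discovers that structure on its own, which is exactly the "find interesting patterns" task described above.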
Reinforcement Learning (RL) strikes a balance between the two by offering a reward system that guides behavior, in a sense extending Pavlov’s Classical Conditioning to machines. The machine trains itself by trial and error over many attempts. AlphaGo was a famous pioneering program that used RL to beat an extremely skilled human player at the board game Go, a game that is notoriously hard for computers to comprehend and master. AlphaGo bootstrapped from built-in knowledge of thousands of human games to develop its proficiency.
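The trial-and-error, reward-driven loop can be sketched with tabular Q-learning on a toy "corridor" game: reaching the right end pays +1, the left end pays -1, echoing the win/loss payoffs mentioned earlier. The environment and numbers here are invented for illustration; AlphaGo’s actual training combined deep networks with tree search and is far more elaborate.

```python
import random

# Minimal reinforcement-learning sketch: Q-learning on a corridor of
# states 0..4. Reaching state 4 is rewarded +1, reaching state 0 is -1.
random.seed(0)
N, ALPHA, GAMMA, EPSILON = 5, 0.5, 0.9, 0.2
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}  # actions: step left/right

for episode in range(500):                   # learn by trial and error
    s = N // 2                               # start in the middle
    while 0 < s < N - 1:
        # Mostly exploit the best-known move, occasionally explore.
        if random.random() < EPSILON:
            a = random.choice((-1, +1))
        else:
            a = max((-1, +1), key=lambda a: Q[(s, a)])
        s2 = s + a
        r = 1 if s2 == N - 1 else (-1 if s2 == 0 else 0)      # the payoff
        best_next = 0.0 if s2 in (0, N - 1) else max(Q[(s2, -1)], Q[(s2, +1)])
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy steps right toward the +1 reward.
policy = {s: max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(1, N - 1)}
print(policy)
```

No correct moves are ever shown to the agent; the reward signal alone shapes its behavior, which is the essential difference from the supervised setting.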
This October, DeepMind published a paper detailing a new algorithm, AlphaZero, that simply plays against itself and improves over time from rewards alone. Without built-in human knowledge, training time was cut significantly and performance increased: the new program beat the old AlphaGo 100 games out of 100, and produced lethal moves its predecessor had never seen. DeepMind, a London-based company that made its name building programs that excel at games, was acquired by Google in 2014 and has since opened a new research lab at McGill led by Professor Precup. Her expertise goes hand in hand with DeepMind’s, since Reinforcement Learning is one of the company’s specialties.
But RL is not used only to beat humans at games; it also applies to real-world optimization problems. Precup and her team were recently tasked with optimizing power production for Hydro-Québec’s new hydroelectric dams. Under three constraints on turbine speed and the amount of water in the reservoir, their algorithm outperformed a Dynamic Programming approach: mean production was higher and the constraints were very rarely violated. RL algorithms thus apply well to many fields and have the potential to empower decision-making in a wide variety of applications.
Many observers have recently questioned whether these algorithms can behave ethically. In one notable test, Tay, a Machine Learning-based chatbot developed by Microsoft, was put on Twitter to interact with users and learn from those interactions. The experiment quickly turned sour: Tay began spewing racist and misogynistic tweets that reflected her exchanges with humans. The question remains: if humans are violent and flawed, how can we create intelligent life that will not display those dangerous features? And if we cannot suppress these features, how can we constrain AI so that it won’t endanger humans?
If there is one takeaway from this event, it’s that the future of artificial intelligence and machine learning is bright, and that women, such as Prof. Precup, will play a critical role in the formulation and integration of this powerful innovation.
The opinions expressed in this article are the writer’s own and do not necessarily represent those of The Bull & Bear.