Since July, a computer in Pittsburgh has been doing nothing but looking at millions of pictures, 24/7. Each minute, it flips through another thousand images of mundane, everyday things like cars and airplanes, turtles and geese. And little by little, this machine is learning about what it sees.
The Never Ending Image Learner (NEIL) at Carnegie Mellon University is part of a rapidly expanding new model of computing in which relying on human input is so 1990s. Computers of this new era, scientists say, can think and recognize the world for themselves. In about four months of running through images, NEIL has already identified 1,500 types of objects and gleaned 2,500 concepts related to the things it sees. With no input from programmers, NEIL can tell you that trading floors are crowded, the Airbus A330 is an airplane, and babies have eyes.
“The more it goes on, the more it will learn and come close to a child [in intelligence],” Abhinav Gupta, an assistant research professor in Carnegie Mellon’s Robotics Institute, tells Newsweek. Gupta and the two doctoral students working with him call the machine “never ending” for a reason; they’re curious to see how long it can keep building on the information it learns, which they compare to the “common sense” knowledge humans acquire all the time.
Gupta’s work is part of a field known to robo-researchers as computer vision. It began in the 1970s, when military scientists became interested in whether computers could recognize airplanes or tanks on the battlefield. The first attempts, says Pietro Perona of the California Institute of Technology, focused on edges. Because a digital image is just a grid of numbers, it would be a big deal if a machine could find “the distinguishing patterns of lines,” Perona tells Newsweek.
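For readers curious what “finding the distinguishing patterns of lines” looks like in practice, here is a minimal sketch of a textbook edge detector in Python with NumPy. The Sobel kernels and the threshold below are standard classroom choices, not a reconstruction of the military systems Perona describes:

    import numpy as np

    def sobel_edges(image, threshold=100.0):
        """Return a binary edge map for a 2-D grayscale image (illustrative sketch)."""
        # Horizontal and vertical Sobel kernels: each responds to a sharp
        # change in brightness along one direction.
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
        ky = kx.T
        h, w = image.shape
        gx = np.zeros((h - 2, w - 2))
        gy = np.zeros((h - 2, w - 2))
        # Slide the 3x3 kernels across the image (a plain double loop for clarity).
        for i in range(h - 2):
            for j in range(w - 2):
                patch = image[i:i + 3, j:j + 3]
                gx[i, j] = np.sum(kx * patch)
                gy[i, j] = np.sum(ky * patch)
        magnitude = np.hypot(gx, gy)   # strength of the local brightness change
        return magnitude > threshold   # True where a "line" stands out from its surroundings

The output is just another grid of numbers, but one in which the outline of a tank or an airplane, if it is there, becomes something a program can count and compare.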
By 1995, machines were laboring to process a single image, says Lisa Brown, a computer vision researcher at IBM, who earned her Ph.D. at that time. “Is the road at an angle? We were just trying to answer that,” Brown recalls.
As the technology became more advanced, computers could be programmed to understand categories of objects – lamps or coffee mugs, for example – based on shapes and textures. Humans typically recognize about 30,000 categories of objects, out of a growing total that may exceed 1 million, Perona says.
Today, computer vision is becoming a bigger part of our lives. Facebook and Google Glass, for example, have the power to recognize faces. In the future, scientists predict computer vision will help doctors spot tumors, enable military bombs to find their targets, and allow retailers to find out just what kind of handbags you prefer to post on Pinterest.
But there’s one problem: Nobody wants to sit around all day labeling “face,” “tumor,” “Prada” for each new image variation. That would make computer vision too expensive (and exhausting) to be practical. That’s where artificial intelligence comes into the picture. In the past 10 years, the race has been to find better, faster ways to help computers learn visual recognition on their own.
“Initially, we were not sure whether this would happen,” Gupta says. But in the end all it took was a small trick: using Google image search, where pictures arrive already labeled, by the millions. After Gupta’s group plugged in the initial algorithms and told NEIL what to learn about, the machines did the rest. NEIL and its 200 central processing units have combed through millions of pictures, analyzing scenes, objects, attributes and the relationships between them.
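NEIL’s actual pipeline is far more elaborate than this article can cover, but the basic bootstrapping idea – start from a pile of web-search images that come with labels, train a model, then let the model pull in and label new examples on its own – can be sketched as a generic self-training loop. The function below is an illustration of that idea using scikit-learn on precomputed feature vectors, not NEIL’s code; the parameter names and the choice of classifier are placeholders:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def never_ending_loop(seed_features, seed_labels, unlabeled_features,
                          rounds=5, confidence=0.95):
        """Self-training sketch: begin with search-engine-labeled seed examples,
        then repeatedly promote the unlabeled examples the model is most sure about."""
        X = np.asarray(seed_features, dtype=float)
        y = np.asarray(seed_labels)
        pool = np.asarray(unlabeled_features, dtype=float)
        model = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            if len(pool) == 0:
                break
            model.fit(X, y)
            probs = model.predict_proba(pool)      # model's belief for each class
            keep = probs.max(axis=1) >= confidence # only trust confident guesses
            if not keep.any():
                break                              # nothing left it is sure about
            X = np.vstack([X, pool[keep]])
            y = np.concatenate([y, model.classes_[probs[keep].argmax(axis=1)]])
            pool = pool[~keep]                     # shrink the unlabeled pool
        return model

Only high-confidence guesses are promoted to labels on each pass; a loop that trusted every one of its own predictions would quickly amplify its early mistakes instead of accumulating knowledge.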
The only time NEIL needs help, they say, is when it runs up against tricky homonyms. Pink is a color and a pop singer; apple is a fruit and a company. NEIL can’t tell the difference until a human being intervenes. But “this is the first step in common-sense knowledge,” Gupta explains.
A world of humanoid robots that can understand the environment and interact with us may not be far off. Voice recognition is in your smartphone. Machines can mimic the properties of taste. At MIT, a whole department called Brain and Cognitive Sciences is devoted to understanding the neurological programming that lets humans learn, and the faculty is full of computer scientists.
Meanwhile, at IBM, Brown says one of her colleagues is building a robot that can learn by sight and sound and is intended to help elderly people. It could understand, for example, “This is the medicine I need to take in the morning,” she says.
“What we’re trying to figure out,” Brown says, “is how to learn how to learn.”