Computers can now know what we're doing

Scientists have developed a program called NeuralTalk, capable of analyzing images and describing them with sentences. The program and the accompanying study are the work of the Stanford Artificial Intelligence Laboratory. The software can look at pictures of complex scenes and identify what’s happening.
 
A picture of a man in a black shirt playing guitar, for example, is picked out as "man in black shirt is playing guitar," while pictures of a black-and-white dog jumping over a bar, a man in a blue wetsuit surfing a wave, and a little girl eating cake are also correctly described with a single sentence. In several cases, it’s unnervingly accurate.
 
Like Google’s Deep Dream, the software uses a neural network to work out what’s going on in each picture, comparing parts of the image to those it’s already seen and describing them as humans would. Neural networks are loosely modeled on the human brain, and they learn a little like children.
 
Once they’ve been taught the basics of our world (that’s what a window usually looks like, that’s what a table usually looks like, that’s what a cat trying to eat a cheeseburger looks like), they can apply that understanding to other pictures and video.
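For a rough sense of how a system like this is put together, here is a minimal sketch of an image-captioning network in the same spirit: an encoder that turns a picture into a feature vector, and a recurrent decoder that emits the description one word at a time. This is a hypothetical, untrained illustration written in PyTorch, not the Stanford team’s actual code; the layer sizes and names are all assumptions.

```python
# Minimal CNN-encoder / RNN-decoder captioner, in the spirit of NeuralTalk.
# Hypothetical toy sketch with random weights, not the published model.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Encoder: project a precomputed image feature vector (e.g. from a
        # pretrained CNN) into the embedding space of the language model.
        self.encode = nn.Linear(2048, embed_dim)
        # Decoder: an LSTM that emits one word of the caption per step.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_features, caption_tokens):
        # Prepend the image "word" to the caption embeddings, so the LSTM
        # conditions every generated word on what it saw in the picture.
        img = self.encode(image_features).unsqueeze(1)   # (B, 1, E)
        words = self.embed(caption_tokens)               # (B, T, E)
        states, _ = self.lstm(torch.cat([img, words], dim=1))
        return self.to_vocab(states)                     # per-step word scores

# Toy usage with random data: one image feature vector, a 5-word caption.
model = CaptionModel(vocab_size=1000)
feats = torch.randn(1, 2048)
tokens = torch.randint(0, 1000, (1, 5))
print(model(feats, tokens).shape)  # torch.Size([1, 6, 1000])
```

Training such a model on many captioned photos is what teaches it "the basics of our world"; generating a caption then amounts to feeding each predicted word back in and picking the next most likely one.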
 
It’s still not perfect. A fully grown woman gingerly holding a huge donut is tagged as "a little girl holding a blow dryer next to her head," while an inquisitive giraffe is mislabeled as a dog looking out of a window. A cheerful couple in a garden with a birthday cake appears under the heading "a man in a green shirt is standing next to an elephant," with a bush starring as the elephant and, weirdly, the cake standing in for the man.
 
But in most cases, these descriptions are secondary guesses: alongside the elephant suggestion, the program also correctly identifies the cake couple as "a woman standing outside holding a coconut cake with a man looking on."
 
The incredible amount of visual information on the internet has, until recently, had to be manually labeled to be searchable. When Google first built Google Maps, it relied on a team of employees to dig through and check every single entry by hand, with humans looking at every house number captured around the world to make sure it denoted a real address.
 
When they were done, and sick of the tiresome job, Google built Google Brain. Where the task had previously taken a team weeks of work, Google Brain could transcribe all of the Street View data from France in under an hour.
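To give a sense of what that kind of transcription involves, here is a minimal, hypothetical sketch (again in PyTorch, and in no way Google’s actual system) of a small convolutional network that classifies a single cropped house-number digit:

```python
# Tiny convnet that classifies a 32x32 crop of a house-number digit (0-9).
# Untrained toy example; the architecture and sizes are assumptions.
import torch
import torch.nn as nn

digit_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # 32x32 RGB crop in
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample to 16x16
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample to 8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),                   # scores for digits 0-9
)

# One fake crop; the predicted digit is the highest-scoring class.
crop = torch.randn(1, 3, 32, 32)
print(digit_net(crop).argmax(dim=1))  # random weights, so a random digit
```

Run over millions of cropped street numbers, a trained version of this kind of model replaces the weeks of manual checking the article describes.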
 