Teaching computers to recognize objects better

Object-recognition software (which tries to identify objects in digital images) is still fairly limited. So, in an attempt to improve it, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory have created a system that, in effect, allows humans to see the world the way an object-recognition system does.
 
The system takes an ordinary image, translates it into the mathematical representation used by an object-recognition system and then, using inventive new algorithms, translates it back into a conventional image.
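
That round trip can be sketched in a few lines of Python. The sketch below uses scikit-image's hog function for the forward translation; the backward translation is represented by a hypothetical invert_hog callable, since the researchers' inversion algorithms are their own contribution and not part of any standard library:

```python
from skimage.feature import hog  # standard HOG feature extractor

def round_trip(image, invert_hog):
    """Translate a grayscale image into HOG features and back.

    `invert_hog` is a hypothetical stand-in for the researchers'
    inversion step, which reconstructs an approximate image from
    the feature representation.
    """
    # Forward translation: image -> feature representation.
    # feature_vector=False preserves the spatial grid of cells.
    features = hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), feature_vector=False)
    # Backward translation: features -> approximate image.
    return invert_hog(features)
```

Showing volunteers the reconstructed image rather than the original is what lets the researchers compare human and machine errors on the same information.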
 
The researchers report that, when presented with images that have been translated into the feature representation and back again, human volunteers make classification errors very similar to those made by computers.
 
That suggests that the learning algorithms are not the bottleneck, and that throwing more data at the problem won’t help: if humans make the same mistakes when given only the information the features preserve, then it’s the feature selection that’s the culprit.
 
The researchers are hopeful that, in addition to identifying the problem, their system will also help solve it, by letting their colleagues reason more intuitively about the consequences of particular feature decisions.
 
Today, the feature set most widely used in computer-vision research is called the histogram of oriented gradients, or HOG (hence the name of the MIT researchers’ system: HOGgles). HOG first breaks an image into square chunks, usually eight pixels by eight pixels. Then, for each square, it identifies a “gradient,” or change in color or shade from one region to another. It characterizes the gradient according to 32 distinct variables, such as its orientation — vertical, horizontal or diagonal, for example — and the sharpness of the transition — whether it changes color suddenly or gradually.
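
To make that concrete, here is a minimal sketch of the per-cell computation in Python with NumPy. It bins each pixel's gradient orientation into a magnitude-weighted histogram; nine orientation bins per cell is the common parameterization, and the full 32-variable descriptor also folds in neighborhood normalization, which this sketch omits:

```python
import numpy as np

def hog_cells(image, cell=8, n_orientations=9):
    """Per-cell histograms of gradient orientation, weighted by
    gradient magnitude. Illustrative sketch only: the descriptor
    described above adds normalization terms omitted here.
    `image` is a 2-D grayscale array."""
    # Gradients along the y and x axes via finite differences.
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Gradient orientation, folded into [0, pi) so that opposite
    # transitions (dark-to-light vs. light-to-dark) coincide.
    orientation = np.arctan2(gy, gx) % np.pi

    h, w = image.shape
    bin_width = np.pi / n_orientations
    hist = np.zeros((h // cell, w // cell, n_orientations))
    for cy in range(h // cell):
        for cx in range(w // cell):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            bins = (orientation[sl] / bin_width).astype(int) % n_orientations
            # Accumulate each pixel's magnitude into its orientation bin.
            np.add.at(hist[cy, cx], bins.ravel(), magnitude[sl].ravel())
    return hist
```

The orientation bin captures whether a transition is vertical, horizontal, or diagonal; the accumulated magnitude captures how sharp it is, the two properties the paragraph above describes.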