Kaggle's algorithms show machines are getting too good at judging humans

Kaggle, a San Francisco-based startup that hosts data science competitions, has uncovered some disconcerting insights about human behavior in its two-year run. At times, its founders have been surprised by the accuracy of an algorithm, and the competitions continue to provoke controversy.
In short, data can be dangerous. I caught up with the company’s founder and CEO, Anthony Goldbloom, to find out more about recent data-driven discoveries that have rocked the boat.
1) “The Essay Scoring Competition”

Sponsor: Hewlett Foundation / Prize: $100,000
Goal: To get the computer to give an essay the same score a human grader would.
The idea was that by analyzing spelling and punctuation, as well as sentence structure, an algorithm could give an essay a reliable score, perhaps even more consistent than a human grader.
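To make that idea concrete, here is a minimal Python sketch of the kind of surface signals such an algorithm might look at, plus a simple way to check how often two sets of scores match. It is an illustration only, not the method any competitor used: the function names, the choice of features, and the `known_words` vocabulary are all assumptions made for this example.

```python
import re


def essay_features(text, known_words):
    """Compute a few surface features of an essay (illustrative only).

    `known_words` is a hypothetical vocabulary set standing in for a real
    spelling dictionary.
    """
    # Split into sentences on runs of . ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Words are runs of letters and apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    # Any word missing from the vocabulary is counted as a possible misspelling.
    unknown = sum(1 for w in words if w not in known_words)
    punctuation = len(re.findall(r"[,;:.!?]", text))
    return {
        "word_count": n_words,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "misspelling_rate": unknown / n_words if n_words else 0.0,
        "punctuation_per_word": punctuation / n_words if n_words else 0.0,
    }


def exact_agreement(scores_a, scores_b):
    """Fraction of essays on which two graders gave the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)
```

In a real system, features like these would feed a model trained on essays that humans have already graded, and `exact_agreement` could compare the model's scores with a human grader's, or one human grader's scores with another's.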
Martin O’Leary, a glacier scientist at the University of Michigan, was one of hundreds of competitors from around the world. He told Reuters he found that human graders rarely agree with one another. They are swayed by irrelevant, aesthetic factors like how neatly a student writes. Unlike an algorithm, they award scores that seem random.
“The reality is, humans are not very good at doing this,” Steve Graham, a Vanderbilt University professor who has researched essay grading techniques, said in an interview with Reuters.
A controversial study, indeed. Reflecting on the competition, Goldbloom said he was not initially convinced it could be done. “I remember thinking: Are we going to be falling flat on our face? It’s really hard to take an essay and give it a grade,” he recalled. The biggest obstacle was finding a team with the requisite machine learning expertise and the ability to deal with unstructured data, including text.