AI system solves SAT geometry questions as well as average American 11th-grade student

Researchers have developed an AI system that can solve SAT geometry questions as well as the average American 11th-grade student. The system, called GeoS, combines computer vision to interpret diagrams, natural language processing to read and understand text, and a geometric solver.

Extrapolated to the entire Math SAT, these results would correspond to a score of roughly 500 (out of 800), the average test score for 2015. The results, presented at the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) in Lisbon, Portugal, were achieved on unaltered SAT questions that GeoS had never seen before and that required an understanding of implicit relationships, ambiguous references, and the relationships between diagrams and natural-language text.

The best-known current test of an AI’s intelligence is the Turing test, which involves fooling a human in a blind conversation. “Unlike the Turing Test, standardized tests such as the SAT provide us today with a way to measure a machine’s ability to reason and to compare its abilities with that of a human,” said Oren Etzioni, CEO of AI2. “Much of what we understand from text and graphics is not explicitly stated, and requires far more knowledge than we appreciate.”

GeoS is the first end-to-end system that solves SAT plane geometry problems. It first interprets a geometry question, using the diagram and text in concert to generate the best possible logical expressions of the problem, and sends those expressions to a geometric solver. It then compares the solver's answer with the multiple-choice answers for that question.
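The three stages described above (interpret, solve, match against the answer choices) can be illustrated with a toy sketch. This is not the GeoS implementation: the function names, the dictionary-of-facts representation, and the single-rule "solver" are all assumptions made for illustration only.

```python
import math

def interpret(text_facts, diagram_facts):
    """Merge facts read from the text with facts read from the diagram.

    Hypothetical stand-in for the interpretation step; a real system
    would have to score and select among competing readings.
    """
    merged = dict(diagram_facts)
    merged.update(text_facts)  # text wins on conflict (arbitrary choice for this sketch)
    return merged

def solve(facts, query):
    """Toy 'geometric solver' that only knows the Pythagorean theorem."""
    if query == "hypotenuse" and facts.get("right_angle"):
        return math.hypot(facts["leg_a"], facts["leg_b"])
    raise ValueError("query not supported by this toy solver")

def pick_choice(value, choices, tol=1e-6):
    """Return the label of the choice matching the solved value, or None to abstain."""
    for label, candidate in choices.items():
        if math.isclose(value, candidate, abs_tol=tol):
            return label
    return None

# Made-up question: a right triangle with legs 3 and 4; find the hypotenuse.
text_facts = {"leg_a": 3.0, "leg_b": 4.0}   # stated in the question text
diagram_facts = {"right_angle": True}        # only visible in the diagram
choices = {"A": 4.0, "B": 5.0, "C": 6.0, "D": 7.0}

facts = interpret(text_facts, diagram_facts)
answer = pick_choice(solve(facts, "hypotenuse"), choices)
print(answer)
```

Returning None when no choice matches mirrors, in a very loose way, the idea of a system that only answers when it is confident.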

This process is complicated by the fact that SAT questions contain many unstated assumptions. For example, one sample SAT problem relies on several unstated assumptions, such as the fact that lines BD and AC intersect at E. On the questions it was confident enough to answer, GeoS had a 96 percent accuracy rate. AI2 researchers said they are working toward solving the full set of SAT math questions in the next three years.
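To make the unstated-assumption example concrete, the sketch below checks from diagram coordinates that lines AC and BD meet at a point E. The coordinates are invented for illustration, and the helper name is hypothetical; the article does not describe how GeoS recovers such facts.

```python
import math

def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite line p1p2 with the line p3p4, or None if parallel."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None  # parallel or coincident lines
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / denom
    y = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (x, y)

# Made-up diagram coordinates (assumption): a square ABCD with its centre labelled E.
A, B, C, D = (0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)
E = (2.0, 2.0)

hit = line_intersection(A, C, B, D)
# The text never says that AC and BD meet at E; the diagram implies it.
meets_at_E = hit is not None and all(math.isclose(u, v) for u, v in zip(hit, E))
print(hit, meets_at_E)
```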

An open-access paper outlining the research, “Solving Geometry Problems: Combining Text and Diagram Interpretation,” and a demonstration of the system’s problem-solving are available. All data sets and software are also available for other researchers to use.

The researchers say they are also building systems that can tackle science tests, which require a knowledge base that includes elements of the unstated, common-sense knowledge that humans acquire over their lives.