I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.

Ecclesiastes 9:11

Abstract art grandmasters score like class D amateurs

Recently, Hawley-Dolan and Winner [1] reported an experiment in which they showed 30 pairs of paintings to 32 art students. In each pair, one painting was by a renowned abstract expressionist and the other by a child or an animal (monkey, gorilla, chimpanzee, or elephant). They asked the subjects which painting was better. When the paintings were unlabeled, the art students judged the painting by the renowned artist to be better in 67% of the cases. Hawley-Dolan and Winner concluded that their findings “challenge the common claim that abstract expressionist art is indistinguishable from (and no better than) art made by children.” Do they really?

Four years ago, I published [2] the results of my “True art, or fake?” quiz [3]. It consists of a dozen pictures. Some of them are masterpieces of abstract art, created by famous artists. The rest are fakes, produced by myself. The takers are to tell which is which. The average score of the fifty-six thousand quiz-takers is 66% correct. Though, on average, masterpieces won over fakes, one fake was selected as a masterpiece more often than one real masterpiece. Hawley-Dolan and Winner do not report the statistics for separate pairs of paintings, only the average. Since their average is close to mine, I suspect that in the Hawley-Dolan and Winner experiment some monkeys won over some artists. In such a case, their argument is very unconvincing. If there is a gap between apes and artists, there is a quantitative difference, and one can argue that it is big enough to be qualitative. However, when artists and apes overlap, it is obvious that there is no qualitative difference.

Now let us turn to the magnitude of the quantitative difference. What does it say about the difference in quality? In [2] I answered the question using a comparison with a classic study [4]. A hundred years ago, the psychologist F.M. Urban performed an experiment on just perceptible differences. He asked the subjects to compare a hundred-gram weight with a set of different weights. When the weights were close, the judgment was poor. However, statistically, the lighter weight appeared to be heavier in less than half of the cases. For example, a 100-gram weight appeared to be heavier than 96-gram and 108-gram weights in 72% and 9% of trials respectively.

An abstractionist is judged better than a child/animal in 67% of the trials, while a 100-gram weight is judged heavier than a 96-gram weight in 72% of the trials. This means that there is less perceptible difference between an abstractionist and a child/animal than between 100 and 96 grams. Thus, if we assign to the artistic heavyweights used in [1] the weight of 100 artistic grams, the weight of child/animal paintings is more than 96 artistic grams. The difference in their weights is less than 4%.
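
The interpolation above can be made slightly more concrete. The sketch below is only an illustration. It assumes (this is not in Urban's data or in the article) that the probability of judging the 100-gram weight heavier follows a cumulative normal curve in the weight difference. It calibrates the single spread parameter of that curve on the 96-gram point (72%) and then asks what weight difference would produce 67%. The function name `equivalent_difference` and the normal-curve model are hypothetical choices made for this example.

```python
from statistics import NormalDist

# Standard normal distribution, used here as the ASSUMED psychometric curve.
STD_NORMAL = NormalDist()

def equivalent_difference(p_target, p_ref=0.72, diff_ref=4.0):
    """Weight difference (grams) that yields p_target, assuming
    P(100 g judged heavier) = Phi(difference / spread),
    with the spread calibrated on the reference point (4 g -> 72%)."""
    spread = diff_ref / STD_NORMAL.inv_cdf(p_ref)
    return spread * STD_NORMAL.inv_cdf(p_target)

diff = equivalent_difference(0.67)
print(f"67% corresponds to a difference of about {diff:.1f} artistic grams")
```

Under this assumed model the difference comes out at roughly 3 artistic grams, which agrees with the bound of less than 4 stated above. A different curve would give a somewhat different number, but any monotone curve through the 72% point keeps it below 4.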

Boxers are divided into seventeen weight classes [5]. The heavyweights are those weighing above 190 pounds. The category just below, the cruiserweights, includes people weighing between 175 and 190 pounds. The difference between the upper and lower bounds of the cruiserweight class is about 8%. Since the difference between abstract art heavyweights and monkeys is at most 4%, the monkeys must either be artistic heavyweights as well or belong to the weight category just below heavyweights.
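
The arithmetic behind this comparison is short enough to check directly. The sketch below uses the class bounds quoted in the paragraph. Measuring the class width relative to its upper bound is an assumption of this example; dividing by the lower bound instead gives a slightly larger figure and does not change the conclusion.

```python
HEAVYWEIGHT_MIN_LB = 190    # lower bound of heavyweights, as quoted in the text
CRUISERWEIGHT_MIN_LB = 175  # lower bound of cruiserweights, as quoted in the text

def relative_span(upper, lower):
    """Width of a weight class as a fraction of its upper bound (assumed convention)."""
    return (upper - lower) / upper

span = relative_span(HEAVYWEIGHT_MIN_LB, CRUISERWEIGHT_MIN_LB)
artistic_gap = 4 / 100  # 100 versus 96 "artistic grams"

print(f"cruiserweight span: {span:.1%}, artistic gap: {artistic_gap:.0%}")
print("gap fits within one class width:", artistic_gap < span)
```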

A simpler and more instructive comparison is with chess. In the 1960s, Arpad Elo developed a rating system for chessplayers, which is now in common use [6]. The Elo rating is a number computed from a player’s performance, which can be used to compare players’ relative strength. When the difference in rating between two players is 200 points, the probability that the higher-rated player will win the game is 77%. When the difference is 400 or 600 points, the corresponding probabilities are 93% and 98%. The players are divided into nine categories (see Table 1), from world championship contenders to novices, with adjacent categories 200 points apart.

Table 1. Elo rating of chessplayers.

Elo rating     Player category
2600 and up    World championship contenders
2400 - 2600    Grandmasters
2200 - 2400    National masters
2000 - 2200    Candidate masters
1800 - 2000    Amateurs – Class A
1600 - 1800    Amateurs – Class B
1400 - 1600    Amateurs – Class C
1200 - 1400    Amateurs – Class D
Below 1200     Novices
Since abstract art grandmasters win over a monkey in only 67% of cases, the difference in their artistic Elo ratings is less than 200. This means that apes are either grandmasters or national masters of abstract art. However, being serious, one cannot class a gorilla above a novice. A novice is rated below 1200, and a player less than 200 points above a novice falls no higher than class D. I conclude that abstract art grandmasters are at best class D amateurs.
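
The size of the rating gap can be estimated with a short calculation. The sketch below assumes the widely used logistic form of the Elo expected-score formula, treats "judged better" as a win, and ignores draws. The percentages quoted in the article come from Elo's book [6] and may rest on a slightly different curve, so the probabilities this sketch produces need not match them exactly. The function names are hypothetical.

```python
import math

def expected_score(rating_diff):
    """Probability that the higher-rated player wins (logistic Elo form, assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def implied_rating_diff(win_probability):
    """Inverse of expected_score: the rating gap that yields the given probability."""
    return 400.0 * math.log10(win_probability / (1.0 - win_probability))

gap = implied_rating_diff(0.67)
print(f"67% corresponds to a gap of about {gap:.0f} Elo points")
print("less than one 200-point category:", gap < 200)
```

With this formula, 67% corresponds to a gap of roughly 120 points, comfortably inside a single 200-point category.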

Mikhail Simkin
June 26, 2011

This article appeared in Significance, a magazine of The Royal Statistical Society.


  1. Hawley-Dolan, A. and Winner, E. (2011) “Seeing the Mind Behind the Art: People Can Distinguish Abstract Expressionist Paintings From Highly Similar Paintings by Children, Chimps, Monkeys, and Elephants”, Psychological Science, published online 3 March 2011, DOI: 10.1177/0956797611400915; http://pss.sagepub.com/content/early/2011/03/01/0956797611400915
  2. Simkin, M. (2007) “My statistician could have painted that!”, Significance, vol. 4, pp. 94-96. http://ecclesiastes911.net/properly_prescribed.html
  3. Simkin, M. (2003) “True art, or fake?” http://reverent.org/true_art_or_fake_art.html
  4. Urban, F.M. (1908) “The application of statistical method to problems of Psychophysics”, The Psychological Clinic Press, Philadelphia.
  5. Britannica online encyclopedia, “Weight divisions” http://www.britannica.com/EBchecked/topic/76377/boxing/229625/Weight-divisions
  6. Elo, A.E. (1978) “The rating of chessplayers, past and present”, Arco publishing, New York.


Response to criticism

Most writers describe my research correctly. However, several of them either misunderstand or misrepresent my arguments.

Alasdair Wilkins writes in io9:

Simkin looked at one fairly well-known experiments in which people are asked to pick up and compare different weights. As the weights get closer together, people have more difficulty determining which is heavier, and the subjects could only distinguish a 100 kg weight from a 96 kg weight 72% of the time.

That 72% clip is, of course, better than any of the success rates for abstract art. Simkin points out that, if you were to translate the results from one study to another, you could say professional abstract art is the 100 kg weight, and stuff done by kids is the 96 kg weight. And that is the scientific proof that abstract art is 4% better than what a kid or an animal could do.

It is 100 and 96 g, not kg. And I never claimed to give a "scientific proof that abstract art is 4% better than what a kid or an animal could do." I suggested a new way to understand the meaning of the experimental result that abstract art was judged better than animal art in two thirds of the cases. This is not a strict mathematical proof, but a new metaphysical method of argument, which combines science, art, and sport.

Alasdair Wilkins continues:

Simkin's paper also offers another comparison point. Using that same 4% difference, he compares the difference between professional abstract art and children's random drawings to that between a chess novice and the next-lowest ranking, a D-class amateur.

No, "that same 4% difference" was not used in, was not needed for, and could not be used in the comparison with chess players.

An article in Популярная механика says:

…подход ученого вызывает удивление – как сравнение прямого угла с точкой кипения воды на том лишь основании, что и тот, и другая меряются в градусах.

This translates as

…the scientist's approach is surprising: it is like comparing a right angle with the boiling point of water merely on the grounds that both are measured in degrees.

Apparently, the author of the remark is used to measuring temperature in Celsius, where water boils at 100 degrees, only ten more than the number of degrees in a right angle. In my article I compared percentages of "better" judgements with percentages of "heavier" judgements. They are not merely designated by the same word, but are essentially the same thing. It is equivalent to a comparison of two angles (for example, in a square and in a triangle) or of two temperatures (for example, of the author of the Популярная механика article and of an ape).

Robert Czepel writes in ORF.at:

Nur: Fälschungen gibt es, daran darf auch einmal erinnert werden, auch in der Wissenschaft. Der koreanische Stammzellenforscher Hwang Woo Suk hat etwa jahrelang Fake-Studien im renommierten Fachblatt "Science" publiziert, ohne dass den Fachkollegen etwas aufgefallen wäre. Hwang war bis zu seinem Fall ein wissenschaftlicher Weltstar und die fachliche Plausibilität seiner Arbeiten wurde nicht nur per Online-Quiz geprüft.

Soweit der Stand in der Stammzellenforschung, die nicht seriöser oder unseriöser als andere Wissenschaftsdisziplinen sein dürfte. Was folgt daraus? Beziehungsweise: Folgt überhaupt etwas daraus? Simkin könnte die Antwort kennen. Vielleicht hat sie etwas mit Gorillas oder Gewichthebern zu tun.

This translates as

But fakes exist, it is worth recalling, in science as well. The Korean stem cell researcher Hwang Woo Suk, for example, published fake studies for years in the prestigious journal "Science" without his peers noticing anything. Until his fall, Hwang was a scientific world star, and the scientific plausibility of his work was checked by more than just an online quiz.

So much for the state of stem cell research, which is probably no more or less reputable than other scientific disciplines. What follows from this? Or rather: does anything follow from it at all? Simkin might know the answer. Maybe it has something to do with gorillas or weightlifters.

The comparison is wrong, since Dr. Hwang's papers only described his experiments but were not the experiments themselves. The scientists merely took him at his word. The equivalent art test would be to judge paintings without actually seeing them, going only by the descriptions given by the artists. So the Hwang case is far less devastating than the scandalous results of the quizzes.

Mikhail Simkin
July 12, 2011
