I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.

Ecclesiastes 9:11

The music that is better than itself

In the 1930s Pitirim A. Sorokin and J.W. Boldyreff conducted a most interesting experiment [1]. They told the participants that they were going to hear two variations of the same theme by two prominent composers. They declared that music critics considered the first variation a masterpiece and the second an exaggerated imitation of it, totally deficient in self-subsistence and beauty. They said that the aim of the experiment was to see whether laymen agree with the experts. Afterward the experimenters played the same record twice [2]. The participants completed a questionnaire which had three possible answers: prefer first record, prefer second record, and no preference. There was also a “why” field where people could explain their choice. Out of 863 experimental subjects, 501, or 58.1%, agreed with the opinion of the experts, and 66, or 7.6%, disagreed [3]. Another 296, or 34.3%, answered that they did not prefer either record, and 80 of those (only 9.3% of all test subjects) wrote in the “why” field that the two records were identical. The people who preferred the record to itself also made comments in the “why” field. For example, one of them wrote that the first record meant “philosophical contemplation of life” and the second record was a “jerky composition that spoils mood.”
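
The percentages above follow directly from the pooled counts. A minimal sketch of that arithmetic is given here; the variable names are illustrative and do not come from the original study.

```python
# Pooled response counts as reported in the paragraph above.
total_subjects = 863
counts = {
    "agreed with experts": 501,
    "disagreed with experts": 66,
    "no preference": 296,
}
# Subset of the "no preference" group who wrote that the records were identical.
noticed_identical = 80

# The three answer categories should account for every subject.
assert sum(counts.values()) == total_subjects

# Share of each answer among all subjects, in percent, to one decimal place.
percentages = {
    answer: round(100 * n / total_subjects, 1) for answer, n in counts.items()
}
noticed_pct = round(100 * noticed_identical / total_subjects, 1)

for answer, pct in percentages.items():
    print(f"{answer}: {pct}%")
print(f"noticed the records were identical: {noticed_pct}%")
```

Note that the last figure is a share of all 863 subjects, not of the 296 who indicated no preference.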

Sorokin and Boldyreff worked at Harvard University, and 17 of their test subjects were Harvard students. Out of those, 8, or 47%, agreed with the opinion of the experts, 5, or 29%, disagreed, and 4, or 24%, indicated no preference. Nobody noticed that the records were identical. These percentages differ from those of the general crowd, but some of the differences are not statistically significant, since the Harvard sample is small. For example, when only 9.3% notice that the records are identical, the probability that none of 17 people will notice is 21%. However, there is one statistically significant difference, and for some reason Sorokin and Boldyreff did not pay attention to it. The percentage of Harvard students who disagreed with the opinion of the experts is almost four times higher than the corresponding percentage among the general population. If the rate of disagreement is 7.6%, the probability that 5 or more people out of 17 will disagree is only 1%. In fact, the probability that the Harvard disagreement rate is at least twice the general disagreement rate is 95%. It is revealing that Harvard showed not a higher percentage of people who noticed that the records were identical, but a higher percentage of people who disagreed with the experts on the question of which of the two identical records is better.
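
The two probabilities about Harvard disagreement can be explored with a short sketch. The first function is an ordinary binomial tail. The second is only one possible reading of the 95% figure: the text does not say how it was obtained, so the uniform prior on the Harvard rate, the treatment of the general rate as a fixed number, and all function names here are assumptions made for illustration.

```python
from math import comb

def binom_tail(k_min, n, p):
    """Probability of at least k_min successes in n independent trials."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )

def posterior_prob_above(threshold, k, n, steps=100_000):
    """P(rate > threshold) after seeing k successes in n trials.

    Assumes a uniform prior on the rate (an assumption, not the article's
    stated method) and integrates the posterior with the midpoint rule.
    """
    width = 1.0 / steps
    total = 0.0
    above = 0.0
    for i in range(steps):
        p = (i + 0.5) * width
        density = p**k * (1 - p) ** (n - k)  # unnormalized posterior
        total += density
        if p > threshold:
            above += density
    return above / total

general_rate = 66 / 863          # pooled disagreement rate, about 7.6%
harvard_n, harvard_disagreed = 17, 5

# Chance of 5 or more dissenters among 17 if Harvard matched the general rate.
tail = binom_tail(harvard_disagreed, harvard_n, general_rate)

# Chance that the Harvard rate is at least double the general rate.
double_rate = posterior_prob_above(2 * general_rate, harvard_disagreed, harvard_n)

print(f"P(at least 5 of 17 disagree) = {tail:.4f}")
print(f"P(Harvard rate > 2 x general rate) = {double_rate:.3f}")
```

Under these assumptions the first probability comes out a little under 1% and the second close to 95%.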

Sorokin and Boldyreff wrote that their research helps us understand how people can come to see a great book in a rotten best-seller or a masterpiece in rotten stuff. I suggest a slightly different perspective. As we all know, rotten best-sellers are read by uneducated masses, while great books are read by people educated at Harvard. The choice of books made by people educated at Harvard disagrees with the choice made by uneducated masses. I think that the results of Sorokin and Boldyreff suggest that these great books could be of the same quality as the rotten stuff. Indeed, in a recent experiment [4] I discovered that Ivy League and Oxbridge people can’t tell the prose of Charles Dickens from the writings of Edward Bulwer-Lytton, a ridiculed novelist mostly known for a wretched writing contest named after him. In another experiment, Ivy League and Oxbridge folks could not tell immortal masterpieces of modern art from pictures produced by a bored scientist [5].

Sorokin and Boldyreff speculated that a large fraction of the opinions and judgments of scholars and scientists are of the same nature as the judgments of the people in their experiment. I described a couple of such cases in a recent article about scientists who managed to “understand” nonsensical hoax lectures.

Mikhail Simkin
October 28, 2013

This article appeared in Significance, a magazine of The Royal Statistical Society.


  1. Sorokin, P. A. & Boldyreff, J. W. (1932). An Experimental Study of the Influence of Suggestion on the Discrimination and the Valuation of People. The American Journal of Sociology. 37, 720-737.
  2. It was an excerpt from Brahms, Op. 68, Symphony No. 1 in C minor, performed by the Philadelphia Symphony Orchestra conducted by Leopold Stokowski.
  3. A diligent reader may notice the difference between my percentages and those given by Sorokin and Boldyreff. This is because they calculated percentages within each experimental group and afterward calculated the unweighted average of those percentages, even though the groups varied in size from 4 to 299. I simply combined all groups into one, which corresponds to the weighted average. I discarded the data that Sorokin and Boldyreff marked as unreliable, just as they did.
  4. Simkin, M. (2013). Scientific Evaluation of Charles Dickens. Journal of Quantitative Linguistics. 20, 68-73.
  5. Simkin, M. (2007). My statistician could have painted that! A statistical inquiry into modern art. Significance. 4, 93-96.
