I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.
Ecclesiastes 9:11
Sounds like Faulkner
The results of the Machine translation or Faulkner quiz
By Mikhail Simkin
William Faulkner’s book “The sound and the fury” is number six on the Random House list of the 100 best novels of the twentieth century[1].
Faulkner got a Nobel Prize "for his powerful and artistically unique contribution to the modern American novel"[2]. His artistic uniqueness lies in his trademark stream-of-consciousness narration style. However, at one point it appeared to me that machine translation produces text indistinguishable from this stream. To check whether this is just an oddity of my comprehension or a reflection of intrinsic properties of Faulkner’s prose, I wrote and put online the “Machine translation or Faulkner” quiz[3].
It consists of a dozen literary passages, some of which I took from Faulkner’s “The sound and the fury” and the others from machine-translated German texts. The takers are to tell the prose that won a Nobel Prize from the prose that needs editing.
This turned out to be a real challenge. “I scored 33% and I like Faulkner!” wrote one of the respondents. “Despite "The sound and the fury" being my favorite Faulkner novel, I only scored a 67%,” wrote another. Figure 1 shows the statistics of the results obtained by over twenty thousand quiz-takers. The average score is 4.68 out of 12 or 39% correct. This means that our quiz-takers would have been better off guessing at random. Table 1 shows, for each individual passage, the percentage of quiz-takers who selected it as Faulkner.
Figure 1. The histogram of the scores received by 20,114 quiz-takers. The average score is 4.68 out of 12 or 39.0% correct. The standard error is 0.1%.
Table 1. Statistics of selection as Faulkner for individual passages.
| Passage number | Selected as Faulkner | Author | Book | Translated by |
| 6 | 72.5% | Heinrich Federer | Vater und Sohn im Examen | Promt |
| 11 | 72.1% | Johann Wolfgang von Goethe | Die | Promt |
| 3 | 69.7% | Johann Wolfgang von Goethe | Die | Systran |
| 1 | 63.9% | William Faulkner | The sound and the fury | |
| 2 | 54.5% | William Faulkner | The sound and the fury | |
| 5 | 54.0% | Helmut Wördemann | Der unzufriedene Schatten | Promt |
| 7 | 51.4% | Helmut Wördemann | Der unzufriedene Schatten | Promt |
| 10 | 51.0% | Thomas Mann | Bruder Hitler | Systran |
| 8 | 37.6% | William Faulkner | The sound and the fury | |
| 12 | 37.4% | William Faulkner | The sound and the fury | |
| 9 | 24.9% | William Faulkner | The sound and the fury | |
| 4 | 20.8% | William Faulkner | The sound and the fury | |
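The per-passage percentages in Table 1 can be cross-checked against the overall average reported with Figure 1. The following is only a sketch of that check, not the author's own calculation; it assumes that every quiz-taker answered all twelve passages, and the names `passages` and `percent_correct` are introduced here for illustration.

```python
# Share of quiz-takers who labelled each passage "Faulkner" (Table 1),
# paired with whether the passage really is by Faulkner.
passages = [
    (72.5, False), (72.1, False), (69.7, False), (63.9, True),
    (54.5, True), (54.0, False), (51.4, False), (51.0, False),
    (37.6, True), (37.4, True), (24.9, True), (20.8, True),
]


def percent_correct(selected_as_faulkner, is_faulkner):
    """Percentage of takers who classified one passage correctly."""
    if is_faulkner:
        return selected_as_faulkner
    return 100.0 - selected_as_faulkner


per_passage = [percent_correct(share, truth) for share, truth in passages]

# Assumption: everyone answered every passage, so the overall percent
# correct is the plain mean of the per-passage percentages.
mean_percent = sum(per_passage) / len(per_passage)
mean_score = mean_percent / 100.0 * len(per_passage)

print(f"average percent correct: {mean_percent:.1f}%")
print(f"average score: {mean_score:.2f} out of {len(per_passage)}")
```

Under that assumption the table gives 39.0% and 4.68 out of 12, the same figures as the overall average.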
However, I also got the following message from one of the quiz-takers: “It all sounds like machine translations to a Finnish ear.” Thus, the poor results of our quiz-takers could be a result of poor knowledge of the English language. To check whether this is indeed the culprit, I separated the scores received by people who accessed the quiz from English-speaking universities. I identified over fifteen hundred such quiz-takers using IP addresses. You can see their results in Figure 2. The average score is 41.8% correct. This is almost three percentage points above the general crowd, but still below random guessing.
Figure 2. The histogram of the scores received by 1,530 quiz-takers from American,
British, Australian and
But maybe Faulkner’s prose is too advanced for an average university student? Maybe only the best minds can fully comprehend its wonderful qualities? To check for this, I selected the subset of the scores earned by people from Oxbridge and the Ivy League. There were over two hundred such people. You can see the histogram of their scores in Figure 3 and their distribution by elite universities in Table 2. The average score is 44.8% correct. This is another increase of three percentage points, but still a poor result.
Figure 3. The histogram of the scores received by 227 quiz-takers from Ivy League and Oxbridge. The average score is 5.37 out of 12 or 44.8% correct. The standard error is 1.3%. The distribution of the quiz-takers by elite universities is in Table 2.
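To get a feel for how far these averages sit below chance, one can divide the gap to 50% by the standard errors reported in the captions of Figures 1 and 3. This is a rough sketch rather than a formal test: it assumes that a random guesser picks each label with equal probability, so that chance is 50%, and it uses the rounded figures from the captions. The names `groups` and `errors_below_chance` are made up for this example.

```python
# (average percent correct, reported standard error in percent)
groups = {
    "all quiz-takers (Figure 1)": (39.0, 0.1),
    "Ivy League and Oxbridge (Figure 3)": (44.8, 1.3),
}

# Assumption: random guessing gives 50% correct on a two-way choice.
CHANCE = 50.0


def errors_below_chance(mean, stderr):
    """How many standard errors the average lies below the chance level."""
    return (CHANCE - mean) / stderr


for name, (mean, stderr) in groups.items():
    distance = errors_below_chance(mean, stderr)
    print(f"{name}: about {distance:.0f} standard errors below chance")
```

With the caption figures this gives roughly 110 standard errors for the general crowd and roughly 4 for the elite sample, so under these assumptions neither shortfall looks like a sampling accident.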
Table 2. The distribution of the elite quiz-takers by elite universities.
| University | Number of test takers | Minimum score | Maximum score | Average score |
| Brown | 10 | 2 | 9 | 4.80 |
| | 22 | 1 | 10 | 4.95 |
| | 31 | 1 | 10 | 5.65 |
| Cornell | 24 | 2 | 9 | 5.00 |
| | 4 | 4 | 7 | 5.00 |
| Harvard | 41 | 2 | 11 | 5.95 |
| | 22 | 1 | 10 | 5.45 |
| Penn | 25 | 1 | 10 | 4.92 |
| | 1 | 4 | 4 | 4.00 |
| Yale | 47 | 2 | 9 | 5.45 |
| Total | 227 | 1 | 11 | 5.37 |
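The Total row of Table 2 can be reproduced from the individual rows by weighting each average by its number of test takers. The sketch below only re-does that arithmetic with the numbers printed in the table; `rows` is a name chosen for this example, and the per-row averages are rounded to two decimals, so the result is approximate.

```python
# (number of test takers, average score) for each row of Table 2,
# in the order the rows appear; the Total row is left out.
rows = [
    (10, 4.80), (22, 4.95), (31, 5.65), (24, 5.00), (4, 5.00),
    (41, 5.95), (22, 5.45), (25, 4.92), (1, 4.00), (47, 5.45),
]

QUESTIONS = 12  # passages in the quiz

total_takers = sum(count for count, _ in rows)

# Weighted mean: each row's average counts once per test taker.
weighted_average = sum(count * avg for count, avg in rows) / total_takers
percent_correct = weighted_average / QUESTIONS * 100.0

print(f"test takers: {total_takers}")
print(f"average score: {weighted_average:.2f} out of {QUESTIONS}")
print(f"percent correct: {percent_correct:.1f}%")
```

The rows add up to 227 test takers with an average of 5.37 out of 12, which is 44.8% correct.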
Yes, many of our quiz-takers are from universities, some from the elite. But could it be that they are nerds from CS and Physics departments? I looked closely at the elite sample to separate English majors. Unfortunately, in almost all of the cases I could not extract anything beyond the university from domain names. However, I could identify two people from the University of Pennsylvania Center for Programs in Contemporary Writing (50% and 41.7%), one person from the English Department of the same university (41.7%) and one person, again from Penn, from the Department of Linguistics (83.3%). In addition, I got the following confession in my discussion forum: “I'm a freshman at Stanford, and I won a national-level writing award in high school. I scored 50% - i.e., I might as well have guessed randomly.”
The average score is low, but 31 out of 20,114 people got 100%. This is about six times more than random guessing would give. How did they manage to do that? One of the high scorers wrote to me: “If you select all passages absent of commas as Faulkner, and all passages containing commas as machine, you will score a 92% ... Faulkner hated commas.” Note that this criterion does not use any literary quality.
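The comparison with random guessing, and the comma criterion itself, can be sketched in a few lines. This is an illustration under stated assumptions, not the author's computation: it assumes that a random guesser gets each of the 12 passages right with probability 1/2, independently of the others. The function `comma_rule` is a hypothetical rendering of the respondent's rule, and the two test sentences were written for this sketch and are not quiz passages.

```python
QUESTIONS = 12          # passages in the quiz
TAKERS = 20_114         # quiz-takers counted in Figure 1
PERFECT_OBSERVED = 31   # people who scored 100%

# Assumption: each passage is guessed right with probability 1/2,
# independently, so a perfect score has probability (1/2) ** 12.
p_perfect = 0.5 ** QUESTIONS
expected_perfect = TAKERS * p_perfect
ratio = PERFECT_OBSERVED / expected_perfect

print(f"expected perfect scores by chance: {expected_perfect:.1f}")
print(f"observed / expected: {ratio:.1f}")


def comma_rule(passage: str) -> str:
    """The high scorer's rule: no commas means Faulkner, commas mean machine."""
    return "machine" if "," in passage else "Faulkner"


# Hypothetical sentences written for this sketch, not quiz passages.
print(comma_rule("He walked on and the road went with him"))
print(comma_rule("He walked on, and the road went with him"))

# The reported 92% corresponds to 11 correct answers out of 12.
print(f"{11 / QUESTIONS:.0%}")
```

Under these assumptions chance alone would produce about 4.9 perfect scores, so the 31 observed are about 6.3 times the expected number.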
People call machine translation “babble”[4]. However, the results of the quiz show that it produces Nobel Prize quality prose. To be fair, literary critics should stop loathing machine translation and nominate machine translation engines for the Nobel Prize in Literature. And if they are not willing to do so, they should admit that Faulkner got a Nobel Prize for babble.
I have already reported similar results for two other quizzes. In one of them, the takers had to tell the masterpieces of abstract art from my doodles[5]. In the other, they had to tell Charles Dickens from the worst writer in the history of letters[6].
April 22, 2013
[1] Random House, Modern Library, 100 Best Novels http://www.modernlibrary.com/top-100/100-best-novels/
[3] Simkin, M. (2004). “Quiz: Machine translation or Faulkner?” http://reverent.org/sounds_like_faulkner.html
[4] C.S.W. (2012) “
[5] Simkin, M. (2007). My statistician could have painted that! A statistical inquiry into modern art. Significance, 4, 93-96; http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2007.00239.x/abstract ; Also available at http://ecclesiastes911.net/properly_prescribed.html
[6] Simkin, M. (2013). Scientific Evaluation of Charles Dickens. Journal of Quantitative Linguistics, 20, 68-73; http://www.tandfonline.com/doi/abs/10.1080/09296174.2012.754602 ; Also available at http://ecclesiastes911.net/they_have_more_readers.html .