I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.

Ecclesiastes 9:11




Opium for Scholars


Balashev found Davout seated on a barrel in the shed of a peasant's hut, writing – he was auditing accounts. Better quarters could have been found him, but Marshal Davout was one of those men who purposely put themselves in most depressing conditions to have a justification for being gloomy. For the same reason they are always hard at work and in a hurry. "How can I think of the bright side of life when, as you see, I am sitting on a barrel and working in a dirty shed?" the expression of his face seemed to say. The chief pleasure and necessity of such men, when they encounter anyone who shows animation, is to flaunt their own dreary, persistent activity. Davout allowed himself that pleasure when Balashev was brought in. He became still more absorbed in his task when the Russian general entered, and after glancing over his spectacles at Balashev's face, … he did not rise or even stir, but scowled still more and sneered malevolently.

 Lev Tolstoy, War and Peace

Tolstoy's description of Marshal Davout surely reminded the reader of a typical college professor. What are these folks so busy with? They are writing scientific papers. A major parameter used to measure the productivity of a scientist is the annual number of papers he publishes. To survive in the scientific jungle he has to keep this number high. Some of them even get addicted, developing insanabile scribendi cacoethes (a mad itch to write) [1].

Are there any barriers limiting the number of published scientific papers, any quality control?

In 1996 New York University physicist Alan Sokal wrote a deliberately foolish paper and got it published by a premier cultural studies journal, Social Text. Sokal advertised his hoax in a number of newspaper articles and in the book Fashionable Nonsense [2]. He argued that his parody paper was accepted by the editors because it was indistinguishable from the serious papers in the field of social studies.

During 2001-2002 the popular French TV presenters Igor and Grichka Bogdanoff published five papers, consisting of an incoherent stream of the buzzwords of modern physics, in a number of respectable journals, including Classical and Quantum Gravity and Annals of Physics [3]. Unlike Sokal, the Bogdanoffs do not admit to a hoax, but claim that their papers are genuine science. They even used the research reported in those papers to obtain doctoral degrees in theoretical physics from the University of Bourgogne in France.

In 2005 Jeremy Stribling, Max Krohn, and Dan Aguayo of MIT wrote a computer program which uses a context-free grammar (a formalism introduced by Noam Chomsky) to generate random research papers in the field of computer science. One of the papers produced by that program was accepted to a scientific conference [4].
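For the curious reader, here is a minimal sketch of how a context-free grammar can churn out grammatical-looking nonsense. The grammar and vocabulary below are invented purely for illustration; SCIgen's real grammar is far larger and more elaborate.

```python
import random

# A toy context-free grammar in the spirit of SCIgen. The rules and the
# vocabulary are invented for illustration only; they are not SCIgen's.
GRAMMAR = {
    "SENTENCE": [["We present", "NOUN_PHRASE", "which", "VERB_PHRASE"]],
    "NOUN_PHRASE": [["a", "ADJECTIVE", "framework for", "NOUN"],
                    ["a", "ADJECTIVE", "methodology for", "NOUN"]],
    "VERB_PHRASE": [["refutes", "NOUN"], ["synthesizes", "NOUN"]],
    "ADJECTIVE": [["scalable"], ["probabilistic"], ["decentralized"]],
    "NOUN": [["Byzantine fault tolerance"], ["the Turing machine"],
             ["superpages"]],
}

def expand(symbol):
    """Recursively expand a grammar symbol into a list of words."""
    if symbol not in GRAMMAR:                    # terminal symbol: a literal word
        return [symbol]
    production = random.choice(GRAMMAR[symbol])  # pick one production at random
    words = []
    for part in production:
        words.extend(expand(part))
    return words

print(" ".join(expand("SENTENCE")) + ".")
# e.g. "We present a probabilistic framework for superpages which refutes
# the Turing machine."
```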

These examples demonstrate that essentially any written paper can get published. In the absence of natural predators, and with the birth rate limited only by the speed of writing, scientific papers breed like rabbits once did in Australia.

Does anyone read them?

As a rule, scientists mention in the bibliography the articles by their colleagues which they have read and which have influenced their thinking. Such mentions are called references or citations. Fellow of the Royal Society John M. Ziman wrote [5]:

"... citations not only vouch for the authority and relevance of the statements they are called upon to support; they embed the whole work in context of previous achievements and current aspirations. It is very rare to find a reputable paper that contains no reference to other research. Indeed, one relies on the citations to show its place in the whole scientific structure just as one relies on a man's kinship affiliations to show his place in his tribe."

In theory we can estimate the number of people who read a paper from the number of people who cited it. In practice this is not so easy.

In school the reader certainly wrote essays. Did it happen that your teacher would say, “include at least three sources in your paper,” and you would use just one source and copy the references to the other two from it? When a scientific paper is evaluated for publication, one of the criteria used is that it has a sufficient number of citations (typically between 20 and 40, depending on the journal). A large number of references in a paper demonstrates that its author has studied the relevant literature in the field. To pile up their bibliography lists, scientists do the same thing you did in high school. And they can do this and get away with it until one day they copy a citation which carries in it the DNA of someone else’s misprint. In such a case they can be identified and brought to justice (similar to how biological DNA evidence helps to convict criminals who have committed more serious offences).

Recently Simkin and Roychowdhury [6] performed a comprehensive study of misprints in citations to one high-profile scientific paper. Among 4,300 citations to the paper in question, 196 contained misprints, of which only 45 were distinct. The most popular misprint in the page number appeared 78 times. One can estimate the ratio of the number of readers to the number of citers as the ratio of the number of distinct misprints, D, to the total number of misprints, T. Indeed, we know that among the T citers, T - D copied, because they repeated someone else’s misprint. For the other D, with the information at hand, we have no evidence that they copied, so according to the presumption of innocence we assume that they read the paper. The fraction of citations which were read by the citing authors is thus R = D / T = 45 / 196 ≈ 0.23. A more careful analysis, using stochastic modeling, gave a very similar number [6], [7].
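To make the arithmetic explicit, here is a small sketch of how the readers-to-citers ratio is obtained from a list of misprinted citations. The sample misprints below are made up for illustration; the actual data for the 196 misprints are analysed in [6].

```python
# A made-up list of misprinted page numbers found in citations (for
# illustration only; the real list of 196 misprints is analysed in [6]).
misprints = [383, 383, 389, 383, 398, 833, 383, 389]

T = len(misprints)          # total number of misprinted citations
D = len(set(misprints))     # number of distinct misprints
R = D / T                   # estimated fraction of citers who actually read
print(f"D = {D}, T = {T}, readers-to-citers ratio = {R:.2f}")

# With the counts quoted in the text, D = 45 and T = 196, the same formula
# gives R = 45 / 196, i.e. about 0.23.
print(f"{45 / 196:.2f}")
```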

During the “Manhattan Project” (the making of the nuclear bomb), physicist Enrico Fermi asked Gen. Leslie Groves, the head of the project, what the definition of a “great” general was. Groves replied that any general who had won five battles in a row might safely be called great. Fermi then asked how many generals were great. Groves said about three out of every hundred. Fermi conjectured that the chance of winning one battle is 1/2 and the chance of winning five battles in a row is (1/2)^5 = 1/32. “So you are right, General, about three out of every hundred. Mathematical probability, not genius.”

The existence of military genius was also questioned by Tolstoy in War and Peace, this time on basic philosophical grounds:

“Prince Andrew, listening to this polyglot talk and to these surmises, plans, refutations, and shouts, felt nothing but amazement at what they were saying. A thought that had long since and often occurred to him during his military activities – the idea that there is not and cannot be any science of war, and that therefore there can be no such thing as a military genius – now appeared to him an obvious truth. … And why do they all speak of a 'military genius'? … It is only because military men are invested with pomp and power and crowds of sycophants flatter power, attributing to it qualities of genius it does not possess. The best generals I have known were, on the contrary, stupid or absent-minded men.”

Great scientists feverishly leaf through the articles of their colleagues with trembling impatient fingers. Soon, soon they will be … ah … here they are … references. Now they stop leafing and start searching for their names among the cited authors. Earlier we mentioned that one of the parameters used to measure scientific productivity is the number of papers published by a scientist. However, the success of a scientific animal depends to a far greater degree on another parameter: the number of citations to his papers. In fact, this number is the commonly accepted measure of “greatness” for a scientist. The reasoning behind this is that scientists cite papers which they found useful. Thus, if a paper was never cited it is useless, and if it was cited by a lot of people then surely it is very useful.

For example, SPIRES [8], the high-energy physics literature database maintained by the Stanford Linear Accelerator Center, divides papers into six categories according to the number of citations they receive. The top category, “renowned papers”, comprises those with 500 or more citations. Let us have a look at the citations to the roughly 24 thousand papers published in Physical Review D in 1975-1994. As of 1997 there were about 350 thousand of those: fifteen per published paper on average. However, forty-four papers were cited five hundred times or more. Could this happen if all papers were created equal? If they indeed were, then the chance of winning any particular citation is one in 24,000. What is the chance of winning 500 cites out of 350,000? The calculation is slightly more complex than in the militaristic case, but the answer is one in 10^500, or, in other words, it is zero. One is tempted to conclude that those forty-four papers, which achieved the impossible, are great.
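To get a feel for how small this number is, here is a rough check of the “all papers are created equal” scenario using a Poisson approximation. The approximation is an assumption of this sketch, not necessarily the exact calculation behind the figure quoted above; the precise value depends on the details of the counting, but any reasonable variant comes out astronomically small.

```python
import math

# Null model: 350,000 citations scattered at random over 24,000 papers, so
# each paper expects about 15 citations. How likely is a given paper to
# collect 500? A Poisson approximation is used here (an assumption of this
# sketch); the exact figure depends on the details of the calculation.
citations_total = 350_000
papers_total = 24_000
lam = citations_total / papers_total   # expected citations per paper (~14.6)
k = 500                                # the "renowned" threshold

# log10 of P(X = k) for X ~ Poisson(lam); for k much larger than lam this
# single term already dominates the whole tail P(X >= k).
log10_p = (-lam + k * math.log(lam) - math.lgamma(k + 1)) / math.log(10)
print(f"log10 P(X >= {k}) is roughly {log10_p:.0f}")
# Prints a number in the minus-hundreds: zero for all practical purposes.
```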

Not exactly, when most citations are copied from the reference lists of other papers. This way a paper that has already been cited is likely to be cited again, and after it is cited again it is even more likely to be cited in the future. In other words, “unto every one that hath shall be given, and he shall have abundance” [9]. We are now in a perfect position to launch a frontal attack on the whole institution of scientific publishing. The model of random-citing scientists [10] is as follows: when a scientist is writing a manuscript he picks up three random articles, cites them, and also copies a quarter of their references. The model accounts quantitatively for the empirically observed citation distribution (see the figure below). Now, what is the probability for an arbitrary paper to become “renowned”, i.e. to receive more than five hundred citations? Calculation shows that this probability is one in 600. This means that about 40 out of 24,000 papers should become renowned by the ordinary laws of chance. Simple mathematical probability, not genius, can explain why some papers are cited a lot more than others.

Figure 1. The outcome of the model of random-citing scientists compared to the actual citation data. 
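Readers who want to play with the model themselves can do so in a few lines of code. Below is a minimal simulation following the verbal description above (three random articles cited, roughly a quarter of their references copied); the bookkeeping details are a simplification of [10], and the exact counts will vary from run to run.

```python
import random

# Minimal simulation of the model of random-citing scientists: each new
# paper cites three randomly chosen earlier papers and copies roughly a
# quarter of their reference lists. Details are a simplification of [10];
# results vary from run to run.
random.seed(0)

n_papers = 24_000
references = []              # references[i] = list of papers cited by paper i
citations = [0] * n_papers   # citations[i] = number of times paper i is cited

for paper in range(n_papers):
    refs = set()
    if paper >= 3:
        for target in random.sample(range(paper), 3):  # three random articles
            refs.add(target)                           # cite them ...
            for copied in references[target]:          # ... and copy ~1/4 of
                if random.random() < 0.25:             #     their references
                    refs.add(copied)
    references.append(list(refs))
    for cited in refs:
        citations[cited] += 1

renowned = sum(1 for c in citations if c >= 500)
print(f"average references per paper: {sum(map(len, references)) / n_papers:.1f}")
print(f"papers with 500 or more citations: {renowned} out of {n_papers}")
```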

Some see derision in applying random models to scientific citing. In reality, the random-citing model was used not to ridicule scientists, but because it can be exactly solved using available mathematical methods. This is similar to the random-phase approximation in the theory of the electron gas. Of course, the latter did not face such a public outcry as the model of random-citing scientists, but this is only because the electron does not have a voice. What is an electron? Just a green trace on the screen of an oscilloscope. Meanwhile, within itself the electron is very complex and as inexhaustible as the universe. When an electron is annihilated in a lepton collider, the whole universe dies with it. And as for the random-phase approximation: of course, it accounts for the experimental facts, but so does the model of random-citing scientists.

Similar to the military genius, the scientific genius merely appears to exist. In his book Social sciences as sorcery [11] sociologist Stanislav Andreski even described a great scientist in a way which reminds us of Tolstoy’s description of a great general.

“A renowned author would have to have a most extraordinary character … to be able to write prolifically in the full knowledge that his works are worthless and that he is a charlatan whose fame is entirely undeserved and based solely on the stupidity and gullibility of his admirers. Even if he had some doubts about the correctness of his approach at some stage of his career, success and adulation would soon persuade him of his own genius and the epoch-making value of his concoctions. When, in consequence of acquiring a controlling position in the distribution of funds, appointments and promotions, he becomes surrounded by sycophants courting his favours, he is most unlikely to see through their motivation; and, like wealthy and powerful people in other walks of life, will tend to take flattery at its face value, accepting it as a sincere appreciation (and therefore confirmation).”

The book of Ecclesiastes says: “Is there any thing whereof it may be said, See, this is new? It hath been already of old time, which was before us.” The discoveries described in this article are no exception. Thirty years ago historian Derek de Solla Price [12], who studied scientific citations, proposed the cumulative advantage model, in which the rate of citing a particular paper is proportional to the number of citations the paper has already received. Using this model he explained the observed distribution of citations. However, Price did not explain why the rate of citing is proportional to the cumulative number of citations. This was accomplished only recently with the formulation and solution of the model of random-citing scientists [6], [7], [10].
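For comparison, Price's cumulative advantage process can be sketched just as briefly. In the toy version below the probability of a new citation going to paper i is taken proportional to one plus the citations it already has (the offset keeps uncited papers in play; exact formulations of the model differ in such details), and the paper and citation counts are chosen arbitrarily for illustration.

```python
import random

# Toy version of Price's cumulative advantage process: each new citation goes
# to paper i with probability proportional to (citations_i + 1). The "+1"
# offset keeps never-cited papers in the running; formulations of the model
# differ in such details. Paper and citation counts here are arbitrary.
random.seed(0)

n_papers = 2_000
n_citations = 30_000
citations = [0] * n_papers

# Pool trick: keep each paper once, plus one extra entry per citation it has
# received. Drawing uniformly from the pool is then equivalent to drawing
# with probability proportional to (citations + 1).
pool = list(range(n_papers))

for _ in range(n_citations):
    paper = random.choice(pool)
    citations[paper] += 1
    pool.append(paper)

citations.sort(reverse=True)
print("most cited papers:", citations[:5])
print("median paper:", citations[n_papers // 2])
```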

Now we quote Tolstoy’s War and Peace for the last time.

“A locomotive is moving. Someone asks: "What moves it?" A peasant says the devil moves it. Another man says the locomotive moves because its wheels go round. A third asserts that the cause of its movement lies in the smoke which the wind carries away.

            The peasant is irrefutable. He has devised a complete explanation. To refute him someone would have to prove to him that there is no devil, or another peasant would have to explain to him that it is not the devil but a German, who moves the locomotive. Only then, as a result of the contradiction, will they see that they are both wrong. But the man who says that the movement of the wheels is the cause refutes himself, for having once begun to analyze he ought to go on and explain further why the wheels go round; and till he has reached the ultimate cause of the movement of the locomotive in the pressure of steam in the boiler, he has no right to stop in his search for the cause. The man who explains the movement of the locomotive by the smoke that is carried back has noticed that the wheels do not supply an explanation and has taken the first sign that occurs to him and in his turn has offered that as an explanation."

And, returning to our citations: seeing a genius behind a highly cited paper is akin to seeing the devil in the locomotive. Price’s cumulative advantage process is akin to those wheels going round. And citation copying is that ultimate pressure of steam in the boiler which pushes forward the locomotive of the scientific publishing establishment.

Mikhail Simkin
July 15, 2006

Opium for Scholars is the fourth essay in the Ecclesiastes 9:11 series. To be notified of our future releases, subscribe to our newsletter.


Citations

[1] As diagnosed in R. K. Merton, “The Matthew Effect in Science”, Science, 159, 56 (1968).

[2] Alan Sokal and Jean Bricmont, Fashionable Nonsense: Postmodern Intellectuals' Abuse of Science (Picador, New York, 1998).

[3] Andrew Orlowski, “Physics hoaxers discover Quantum Bogosity”, The Register, 1 November 2002, http://www.theregister.co.uk/2002/11/01/physics_hoaxers_discover_quantum_bogosity/

John Baez, “The Bogdanoff Affair”, http://math.ucr.edu/home/baez/bogdanoff/

[4] SCIgen - An Automatic CS Paper Generator: http://pdos.csail.mit.edu/scigen/

[5] J. M. Ziman, “Information, communication, knowledge”, Nature, 224, 318 (1969).

[6] M.V. Simkin and V.P. Roychowdhury, “Read before you cite!”, Complex Systems, 14,  269 (2003), http://arxiv.org/abs/cond-mat/0212043

[7] M.V. Simkin and V.P. Roychowdhury, “Stochastic modeling of citation slips”, Scientometrics, 62, No.3, pp. 367-384 (2005),  http://arxiv.org/abs/cond-mat/0401529

[8] SPIRES: High-Energy Physics Literature Database http://www.slac.stanford.edu/spires/hep/

[9] Matthew 25:29. Sociologist Robert Merton [1] christened this phenomenon the “Matthew Effect”. However, similar sayings appear in two other gospels: “For he that hath, to him shall be given…” [Mark 4:25] and “…unto every one which hath shall be given…” [Luke 19:26], and all of them are attributed to Jesus. Nonetheless, the name “Matthew Effect” was repeated by thousands of scholars who did not read the Bible.

[10] M.V. Simkin and V.P. Roychowdhury, “Copied citations create renowned papers?”, Annals of Improbable Res. vol. 11 No. 1, pp. 24-27, 2005;  http://arxiv.org/abs/cond-mat/0305150

[11] Stanislav Andreski, Social sciences as sorcery (Andre Deutsch, London, 1972). See the foreword.

[12] D. de Solla Price, “A general theory of bibliometric and other cumulative advantage processes”, Journal of the American Society for Information Science, 27, 292 (1976).


