Posted on February 11, 2008 by Peter Turney
The Seven Secrets of Highly Cited Scientists
A couple of years ago, I discussed with some colleagues the topic of maximizing citations for academic research papers. Here is a summary of the discussion.
Why should we want our papers to be highly cited? I assume here that we want our work to influence other researchers, and that citation count is a reasonable estimate of influence.
Survey/review papers and methodology papers are often highly cited, and there is certainly merit in them, but the focus of this discussion is on papers that present original work. (By the way, there is evidence that citation counting is biased towards survey/review papers.)
It seems to me that these are the main factors that characterize more highly cited papers:
1. Reusability: The core idea should be relatively simple, so that other researchers can easily understand it and especially so that they can easily use it in their own research. This factor might also be called simplicity, elegance, or fertility, but I think reusability best captures what I mean. I will cite your paper if I can reuse your ideas in my own research.
2. Originality: The core idea should be novel; it should teach the reader something new. I will cite your paper if it gives me an idea that I could not have found elsewhere.
3. Effectiveness: There should be some experimental evidence that the core idea works better than past ideas or better than reasonable baselines. Reviewers care deeply about this. I will want to use your idea if you can show me that it works on tasks that I care about.
4. Venue: If all else is equal, a paper in a respected conference or journal will be cited more than a paper in a less respected venue. I prefer to cite respected conferences and journals, hoping that the respect for my citations will increase the respect for my own paper.
5. Accessibility: If all else is equal, online papers will be more cited. I will cite your paper if I can read it without walking to the library.
6. Timeliness: Turbo codes, a class of error correction codes, were invented in 1993. These codes approach the theoretical maximum performance (the Shannon limit). It turns out that they are similar to a class of codes called LDPC codes, invented in 1963, but ignored until the invention of Turbo codes. The LDPC codes were ignored because the hardware of the 1960s was not good enough to make LDPC codes practical. This illustrates the importance of timeliness for maximizing citation counts. I will cite your paper if I can use your ideas now.
7. Positivity: Negative results are not as popular as positive results, although there have recently been some efforts to correct this. I will cite your paper if you show me what I can do, instead of telling me what cannot be done.
It seems that venue is not as important as the other factors. When I look at the citation counts for my own papers, they are not highly correlated with the average citation counts of the venues. When I look at my favourite highly cited papers, it seems that reusability and originality are the most important factors. These are what we should strive for in our research. For increased accessibility, putting papers online is easy and makes sense. I use both arXiv and Cogprints.
Regarding negative results, when an algorithm succeeds at a task, a large number of factors have to be right. We usually don’t even know what all of the factors are, at least until years later, if ever. A researcher can publish a positive result, listing a few of the factors that were involved, and other researchers can try to replicate the result, knowing some of these factors, and knowing that a positive result is possible (i.e., we have an existence proof). When an algorithm fails at a task, any one of these factors may be responsible. Locating the exact factor may be very difficult. A negative result may scare researchers away from a whole approach, even though (for all they know) only one factor was wrong.
This is known as the Credit Assignment Problem. When a long chain of steps leads to success, we can simply distribute the reward evenly over all of the steps in the chain. But what are we to do when a long chain of steps leads to failure? How can we discover the step that caused the failure (the weakest link in the chain)? A negative result can lead to the rejection of the whole chain, due to one bad link. Instead of distributing a penalty evenly over all of the steps in the chain, it might be better to just forget about the negative result.
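The asymmetry above can be made concrete with a toy sketch (the pipeline steps and function are hypothetical, purely for illustration): a successful chain lets us spread the reward evenly over its steps, but a failed chain tells us nothing about which single step was at fault.

```python
# Toy illustration of the credit assignment problem.
# A "chain" is a sequence of named steps. Success rewards each step equally;
# failure gives no information about which link was the weak one.

def assign_credit(steps, succeeded, reward=1.0):
    """Distribute reward evenly over a successful chain of steps.

    For a failed chain, every step gets None: any one of them
    could be the faulty link, and we cannot tell which.
    """
    if succeeded:
        share = reward / len(steps)
        return {step: share for step in steps}
    return {step: None for step in steps}

pipeline = ["tokenize", "extract features", "train model", "evaluate"]
print(assign_credit(pipeline, succeeded=True))   # each step earns 0.25
print(assign_credit(pipeline, succeeded=False))  # every step is a suspect
```

A negative result is like the second call: rejecting the whole chain penalizes every step, including the ones that were fine.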
There is an interesting discussion of journals versus conferences in Academic Careers for Experimental Computer Scientists and Engineers (Appendix B). The big problem with journals is the delay: you can often expect two years from submission to publication. The advantages of journals are greater prestige, more space to explain your work, and feedback from reviewers that results in a much better final paper.
For me, the decision of conference versus journal is based on how much I have to say. I think most people these days (including myself) would rather read an eight-page conference paper than a thirty-page journal paper. When I read a paper, I'm looking for good ideas that I can use in my own work (i.e., reusability). Most good ideas can be expressed in eight pages or less. For me, a journal paper is a last resort, to be used only when I have so much to say that it's impossible to fit it into eight pages.
Thanks to Joel Martin, David Nadeau, Daniel Lemire, Roland Kuhn, and Pierre Isabelle for their comments and contributions.