
Sense and nonsense about the impact factor

Tobias Opthof
DOI: http://dx.doi.org/10.1016/S0008-6363(96)00215-5 · pp. 1-7 · First published online: 1 January 1997

Abstract

The impact factor is based on citations of the papers published by a scientific journal. It is derived from the citation data that the Institute for Scientific Information has compiled since 1961. The impact factor may be regarded as an estimate of the citation rate of a journal's papers: the higher its value, the higher the scientific esteem of the journal. Although the impact factor was originally meant for the comparison of journals, it is also used to assess the quality of individual papers, scientists and departments. For the latter purposes a scientific basis is lacking, as we demonstrate in this contribution.

Keywords
  • Impact factor
  • Quality assessment
  • Citation analysis

1. Introduction

The impact factor is a bibliometric parameter based on the number of times that papers in a particular journal are cited by all journals. It is considered a parameter of the scientific quality of a journal. We will define the impact factor more precisely and explain the details of its calculation in a later section. The Institute for Scientific Information (ISI) in Philadelphia (USA) has published the Science Citation Index (SCI) since 1961. The SCI covers all journals in the clinical and life sciences. It counts citations of individual papers based on the reference lists of all papers in journals indexed by the SCI itself, by the Social Sciences Citation Index (SSCI) and by the Arts and Humanities Citation Index (A&HCI). The SCI used to be published yearly in roughly 20 'telephone book' volumes, but with the advent of more powerful computers it is now released monthly, with data accumulating during the year.

The SCI was intended primarily as a bibliographic research tool: it allowed scientists who worked in relative isolation to retrieve overlapping research and to contact colleagues with comparable interests. Later it also developed into a research tool for the social sciences, and more recently administrators appear to have discovered the impact factor as a parameter for the quality of the work of (groups of) scientists.

2. Does citation reflect quality?

The assumption behind the use of the impact factor is that citation reflects quality. Efforts have been made to correlate ‘peer esteem’ with the ‘citation rate’ of selected individual authors [1]. The ‘peer esteem’ was scored by means of questionnaires. The correlation coefficient between the two parameters varied from 0.53 for the field of physics to 0.70 for the field of biochemistry, with chemistry, psychology and sociology in between [1]. The problem with these correlations is that the two parameters (‘peer esteem’ and ‘number of citations’) are probably not independent.

Another problem with the interpretation of the number of citations is that simple counting does not take the context of the citation into account. Citations such as "we confirmed previous data of Opthof et al. …", "by misinterpretation of their own data Opthof et al. erroneously suggest that …" and "the fraudulent work of Opthof has retarded the field of autonomic influences on heart rate for decades" obviously constitute very different qualifications, yet each is scored as one citation.

A considerable amount of published scientific work is never cited. De Jong and Schaper [2] analysed 137,019 papers on clinical cardiovascular science published between 1981 and 1992 by authors from the G7 countries (Canada, France, Germany, Italy, Japan, United Kingdom, USA) and from 7 smaller European countries (Belgium, Denmark, Finland, Netherlands, Norway, Sweden, Switzerland). Although these papers had, on average, a 6-year period in which to be cited, 46% of them were never cited, with the best score for Norway (31% not cited) and the worst for Japan (69% not cited). It is not possible to discriminate between two possible explanations: the field may contain many redundant or low-quality publications, or papers may be read and used without ever being cited.

Cole [1] reported an average of 5.5 citations for all cited authors in the SCI of 1961 (Fig. GR1; left bar). This does not reflect the citation rate of all authors, because, as we saw in the previous paragraph, even over much longer periods than 1 year (1 to 12 years!) about half of the papers (and thus also many authors) are never cited [2]. Still, taking the citation rate of 5.5 for all cited authors in 1961 as a reference value, it is of interest that Nobel Prize winners in physics who were awarded the prize between 1955 and 1965 obtained on average 58 citations in 1961 (Fig. GR1; middle bar). The subgroup of laureates who were awarded the Nobel Prize between 1962 and 1965, that is, after the count of their citations, obtained 62 citations in that same year (Fig. GR1; right bar). The latter excludes the possibility that their higher-than-average citation rate was caused by their having been awarded the Nobel Prize. These data corroborate the view that citation indeed reflects scientific quality.

Fig. GR1

Number of citations obtained in 1961 (data from the SCI 1961). Left bar: all cited authors (this is not the same as all authors; see text for explanation). Middle bar: citations obtained by Nobel Prize winners in physics awarded between 1955 and 1965. Right bar: citations obtained by Nobel Prize winners in physics awarded between 1962 and 1965 (i.e., after the scoring of the citations in 1961). Data taken from [1].

3. Definition and calculation of the impact factor

The impact factor of journal X in year Y equals the average number of citations, scored in year Y in all indexed journals, of the papers that journal X published in the years Y-1 and Y-2. Table 1 shows the calculation of the impact factor of Journal A in 1994. The first column lists the first authors (1, 2, 3, 4, …, 557, 558, 559, 560) of papers published between January 1992 and December 1993. The second column lists the months of publication. The third column gives the number of citations of the individual papers in all SCI, SSCI and A&HCI journals. For brevity, the months of February 1992 to November 1993 have been summed. In total, these 560 authors obtained 3493 citations (Table 1). The impact factor is then 3493/560 = 6.24 (Table 1). As a matter of fact, the ISI does not score citations to individual papers, but to journals. This yields 411 extra citations, increasing the total number of citations to 3904 and the (official ISI) impact factor of Journal A to 6.97. This discrepancy results from inaccuracies by citing authors. For example, if the name of the first author of the first paper in the Table is 'Breithardt' and a citing author by mistake refers to the paper as 'Breithart', the citation still goes to Journal A, although to the non-existent author 'Breithart'. Interestingly, this type of mistake occurs in about 10% (411 of 3904) of the citations.
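In symbols (the notation is ours, for clarity; the ISI publishes only the resulting number): let C be the citations received in year Y, in all indexed journals, by the papers that journal X published in years Y-1 and Y-2, and let N be the number of those papers. Then

    IF(X, Y) = C / N

For Journal A in 1994 this gives 3493/560 = 6.24 with the author count, or 3904/560 = 6.97 with the ISI journal count.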

Table 1

Analysis of the citation in 1994 of papers published by Journal A from the cardiovascular category of the SCI

Author              Issue               Cit. 1994   Impact 1994
1                   Jan-92              12
2                   Jan-92              4
3                   Jan-92              3
4                   Jan-92              0
…                   Jan-92              …
Total               Jan-92              159
Total               Feb-92 to Nov-93    3277
…                   Dec-93              …
557                 Dec-93              1
558                 Dec-93              2
559                 Dec-93              0
560                 Dec-93              6
Total               Dec-93              57
Sum                 560 papers          3493        6.24
Errors                                  411
Sum (incl. errors)  560 papers          3904        6.97
  • Author = listing of all papers by the name of the first author. Issue = all issues between January 1992 and December 1993. Cit. 1994 = citations obtained in 1994, shown for the first 4 and last 4 papers and as totals per period. Impact 1994 = impact factor obtained by dividing the total number of citations by the total number of papers (560). 'Errors' are citations that were correctly scored for Journal A but could not be matched to a first author.
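For concreteness, a minimal sketch of this bookkeeping in Python. Only the totals (560 papers, 3493 author-scored citations, 411 'error' citations) come from Table 1; everything else is our illustration.

    # Impact factor of Journal A in 1994, from the totals in Table 1.
    papers = 560            # papers published in 1992 and 1993
    author_scored = 3493    # 1994 citations matched to a listed first author
    errors = 411            # citations credited to the journal only, e.g. via
                            # a misspelled first author ('Breithart')

    author_count_if = author_scored / papers               # 6.24
    journal_count_if = (author_scored + errors) / papers   # 6.97, official ISI value

    print(f"author count:  {author_count_if:.2f}")
    print(f"journal count: {journal_count_if:.2f}")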

4. Use and abuse of the impact factor

It is tempting to use the impact factor as a tool for quality assessment not only of journals, but also of individual papers and of (groups of) scientists. One should be aware that time must elapse after publication of a paper before a meaningful citation analysis can be made. This delay is a major drawback that makes citation analysis (which is by definition an a posteriori measurement) less attractive. It is mainly for this reason that a priori systems for quality assessment were suggested. By assigning a 'quality label' to papers in the form of the impact factor of the journal at the time of publication, a much faster quality assessment could in theory be made. It must be emphasized, however, that the underlying assumption is that an article published by a journal adequately represents the quality of the journal as a whole [1, 3, 4]. For individual parties such as (groups of) scientists, the outcome of quality assessment may have severe consequences. Therefore, analysis based on a priori determination of quality will provoke discussion about which (types of) papers qualify for the a priori assessment. For example, how is a paper resulting from collaboration between groups graded? Do Editorials or (invited) Letters to the Editor qualify as scientific output? The following sections discuss the suitability of a priori determination of quality for individual papers and (groups of) scientists. First, we focus on the significance of the impact factor for the assessment of the quality of scientific journals.

4.1. Does the impact factor permit assessment of the quality of journals?

Calculation of the impact factor of a journal is performed by the ISI by counting the citations to the journal and not to the individual authors. In the example of Journal A (Table 1) we have seen that such a 'journal count' produced an impact factor of 6.97: the total number of citations (3904) was obtained as a lump sum and divided by the number of papers (560). The 'author count' produced a lower impact factor (6.24). In the latter case the individual scores of the papers are known, and it is therefore possible to calculate not only the average but also the standard deviation and the standard error of the mean (s.e.m.), which provides an estimate of the accuracy of the mean. For Journal A these figures were 6.24 ± 0.32 (s.e.m.). In Fig. GR2 the same has been done for a second journal, Journal B; the result was 2.69 ± 0.19. The difference between the impact factors of Journals A and B was highly significant (see legend, Fig. GR2). Therefore, it may be concluded that the impact factor indeed permits assessment of the quality of journals.
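As an illustration of this kind of comparison, a sketch in Python with synthetic per-paper counts; only the means (6.24 and 2.69) and the numbers of papers (560 and 484) come from the article, and the Poisson stand-in is less skewed than real citation data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for the 1994 per-paper citation counts.
    journal_a = rng.poisson(6.24, size=560)
    journal_b = rng.poisson(2.69, size=484)

    for name, x in (("A", journal_a), ("B", journal_b)):
        print(f"Journal {name}: {x.mean():.2f} +/- {stats.sem(x):.2f} (s.e.m.)")

    # Citation counts are not normally distributed, so a non-parametric test
    # is appropriate (the legend of Fig. GR2 reports a Wilcoxon test).
    u, p = stats.mannwhitneyu(journal_a, journal_b, alternative="two-sided")
    print(f"Mann-Whitney U = {u:.0f}, P = {p:.2g}")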

Fig. GR2

Impact factors in 1994 for Journals A and B, both from the cardiovascular category of the SCI. Data are given as mean ± one s.e.m. S.e.m. values could be calculated because citations were scored per individual first author for Journals A and B. The impact factors were 6.24 for Journal A and 2.69 for Journal B. The official impact factors were 6.97 and 2.89, respectively (including the 'errors' caused by misspellings of the names of the first authors). The distribution of citations over papers was not normal (see also Fig. GR3). For the sake of simplicity the data are given as means ± one s.e.m.; statistical analysis was, however, performed with a non-parametric test (Wilcoxon: Q = 129622; u = 12.65; P < 0.0005).

Fig. GR3

Citation of papers from Journals A and B in 1994. Abscissa: number of citations, 0 to 9, plus a bin with all papers cited 10 times or more. Ordinate: number of papers per bin divided by the total number of papers (560 for Journal A; 484 for Journal B). Impact factors: Journal A, 6.97; Journal B, 2.89. Despite the significant difference between the impact factors of Journals A and B (see legend, Fig. GR2), 35% of the papers in Journal A were less frequently cited than indicated by the impact factor of Journal B.

4.2. Does the impact factor permit assessment of the quality of individual papers?

Fig. GR3 shows a comparison of the number of citations of individual articles in Journals A and B (the same journals as in Fig. GR2). The abscissa shows the number of citations obtained in 1994, from 0 to 9; papers with 10 or more citations were grouped into one bin. The ordinate shows the fraction of papers for each number of citations. Although the papers in Journal A were cited more frequently than those in Journal B (see also Fig. GR2), note that 35% of the papers in Journal A (the sum of papers cited 0, 1 and 2 times) were actually cited less frequently than indicated by the impact factor of Journal B. Also, both journals published very successful papers that were cited 10 times or more, albeit in different proportions (Fig. GR3). Thus, the impact factor does not permit quality assessment of an individual paper.
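The binning behind Fig. GR3, and the '35%' figure, can be expressed compactly. A sketch in Python, with per-paper citation counts as input (these would again have to be stand-ins, since only the summary figures are published):

    import numpy as np

    def citation_bins(counts):
        # Fractions of papers cited 0..9 times plus one bin for >= 10
        # citations, as on the abscissa of Fig. GR3.
        counts = np.asarray(counts)
        fractions = [np.mean(counts == k) for k in range(10)]
        fractions.append(np.mean(counts >= 10))
        return fractions

    def fraction_below(counts, reference_impact_factor):
        # Fraction of papers cited less often than a reference impact factor,
        # e.g. Journal A papers below Journal B's 2.69 (about 35% in Fig. GR3).
        return float(np.mean(np.asarray(counts) < reference_impact_factor))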

4.3. Does the impact factor permit assessment of the quality of individual scientists?

Although single papers by different authors cannot be compared on the basis of an a priori quality label in the form of the impact factor of the journal in which the work was published, one might argue that this does not necessarily hold for comparisons involving many papers by one or more authors. Fig. GR4 shows an analysis of the work of a single author over a 17-year period [4]. The abscissa gives the impact of the journals in which this author published, expressed as citations per article of the journal per year; the ordinate gives the actual citation rate of the author's papers, expressed as citations per article of the author per year. On average, this author published in journals with a journal impact of 3.1, whereas his article impact was 7.0: his papers were cited more often than the average paper in the journals in which he published. Thus, one cannot use the impact factor for the assessment of the quality of the work of an individual author. Fig. GR4 also shows that there was no relation between the impact of the journals and the eventual citation rate of this author's papers: the correlation coefficient was virtually zero.
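The analysis of Fig. GR4 amounts to correlating two series: the impact of the journal at publication and the realised citation rate of each paper. A sketch with hypothetical numbers (the article reports only the means, 3.1 and 7.0, and a correlation coefficient near zero):

    import numpy as np

    # Hypothetical (journal impact, article impact) pairs for one author's
    # papers; the pattern, not the numbers, mirrors Fig. GR4.
    journal_impact = np.array([2.0, 3.5, 1.8, 4.1, 3.0, 4.2])
    article_impact = np.array([7.9, 8.9, 6.4, 7.8, 5.2, 5.8])

    print(f"mean journal impact: {journal_impact.mean():.1f}")   # 3.1
    print(f"mean article impact: {article_impact.mean():.1f}")   # 7.0
    r = np.corrcoef(journal_impact, article_impact)[0, 1]
    print(f"correlation coefficient r = {r:.2f}")  # virtually zero, as in Fig. GR4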

Fig. GR4

Comparison of the impact of the journals in which one author published over 17 years and the impact of the individual papers. On average the impact factor of the journal was 3.1, but the averaged article impact was 7.0. Also, there was no relation between the journal impact and the article impact (see the correlation coefficient). Reproduced with permission from the author and Elsevier Science Publishers from [4].

4.4. Does the impact factor permit assessment of the quality of groups of scientists (departments, institutes, universities)?

To study the influence of the number of papers on a priori quality assessment, an experiment was performed. A listing of the contents of Journals A and B (Table 1, Fig. GR2 and Fig. GR3) was made. In 1992 and 1993 Journal A published 560 papers, whereas Journal B published 484 papers. By throwing dice (twice, 4 times and 6 times) particular papers were selected from the contents lists, and the associated numbers of citations in 1994 were scored. In addition, every 50th paper was scored in a separate subset. These 'samples' produced the 4 data points on the left in Fig. GR5 for Journals A (upper set) and B (lower set). As could be predicted from basic statistical rules, a sample size of about 15% (more than 50 papers) was needed before the 'impact factor of the samples' equalled the impact factor of the journals. The implication is that, even with random samples, more than 50 papers published in the two previous years are required before impact factors can be assigned a priori to groups of papers for quality assessment. Obviously, the papers of a group of scientists are far from a random selection. Therefore, the number of papers needed for the assessment of the quality of groups of scientists is probably much larger than the number they would normally be able to produce. More research is needed to corroborate this point of view, but we propose that, for groups of scientists too, citation analysis is to be preferred to a priori labeling of papers. Possibly the a priori technique can provide a reliable estimate of the scientific quality of very large groups (such as universities).
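The dice experiment can be mimicked by repeated random sampling. A sketch in Python under the assumption of a synthetic count distribution with Journal A's mean (the real per-paper counts are not published and are more skewed):

    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic stand-in for Journal A's 560 per-paper citation counts
    # (author-count impact factor 6.24).
    journal_a = rng.poisson(6.24, size=560)

    def sample_impact(counts, n, trials=1000):
        # Mean and spread of the 'sample impact factor' over repeated random
        # samples of n papers, mimicking the dice selection of Fig. GR5.
        means = [rng.choice(counts, size=n, replace=False).mean()
                 for _ in range(trials)]
        return np.mean(means), np.std(means)

    # Small samples scatter widely around the journal value; the scatter only
    # settles once samples reach the order of 50-100 papers.
    for n in (5, 10, 50, 100):
        m, s = sample_impact(journal_a, n)
        print(f"n = {n:3d}: sample IF {m:.2f} +/- {s:.2f} (s.d.)")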

Fig. GR5

The relation between sample size and the impact factors of Journals A and B (see also Table 1, Fig. GR2 and Fig. GR3). The impact factors of Journals A and B were 6.24 and 2.69, based on scoring of the citations to first authors (see Table 1). A list of contents was made for both journals according to the example in Table 1. By throwing dice (twice, 4 times and 6 times) particular papers were selected from the contents lists of both journals and the numbers of citations were scored. The same was done for every 50th paper. This procedure yielded the 4 left data points for both journals. Note that when the random sample is smaller than 15% (about 75 papers), serious discrepancies occur between the impact factor of the sample and the impact factor of the journal. Thus, even with random samples, as many as 100 papers are needed before impact factors assigned as quality labels to individual papers can be substituted without serious error for the results of citation analysis.

5. Citation bias: papers simultaneously published in more than one journal

The question of whether the impact factor reflects quality can be answered by studying cases in which the same papers are, simultaneously and on purpose, published in more than one journal. This applies to the reports of (combined) Working Groups of the European Society of Cardiology and of the American College of Cardiology or the American Heart Association [5–7]. These papers were published in the European Heart Journal and in either Circulation or the Journal of the American College of Cardiology, or in all three journals. In the years following publication they provide a small but unique data set for studying citation bias [5–7]. If scientific quality were the sole determinant of citation, one would expect the number of citations to be equal in all journals, yielding a citation ratio of 1 (Fig. GR6; left bar). If quality were unimportant and citation simply followed the impact factors of the journals, the citation ratio would equal the ratio of the impact factors of the journals at the time of publication (Fig. GR6; right bar). The observed citation ratio was about 1.8 (Fig. GR6; middle bar). Therefore, the quality of the paper is more important than the impact factor of the journal, but on the other hand the visibility of a journal may increase the citation rate of a paper by as much as 80% (compare the left and middle bars in Fig. GR6; the difference was significant).

Fig. GR6

Comparison of the citation ratio of [5–7] over 1991–1995. The middle bar shows the number of citations in the Journal of the American College of Cardiology and/or Circulation divided by the number of citations of the same papers in the European Heart Journal. The left bar is the expected ratio of 1.0 if quality were the sole determinant of citation. The right bar indicates the ratio of the impact factors of the journals at the time of publication.

6. Citation and grants

It has previously been shown that, for research groups in the fields of chemistry and biology, there is a poor relation between 'immediate past performance' as measured by citation analysis and the peer judgement of two Dutch 'National Survey Committees' [8]. Needless to say, the peer judgements of such committees have important consequences for grant applications. The lack of agreement between the two parameters in itself permits no preference between the two possible explanations. One explanation is that citation analysis simply does not reflect quality differences in the way peer judgement is supposed to do. The other is that peer judgement insufficiently takes into account the past performance and the international position of research groups. On the basis of the previous sections we are inclined to have more confidence in the latter explanation.

7. International aspects of citation

De Jong and Schaper [2] analysed the citation of papers published in the clinical cardiology category of the SCI (see also Section 2). The 137,019 papers were published between 1981 and 1992 by the G7 countries and 7 smaller European countries (see Section 2 for details). This type of analysis yields useful information on the success of clinical cardiology research in those countries. Although the average number of citations of these papers over the 12 years of analysis was just below 6, there are major differences between the countries, with the USA at the top with 7.5 citations per paper and Japan at the bottom with only 2.0 (Fig. GR7). De Jong and Schaper [2] also correlated these results with economic data on research investments, which may further differentiate the data of Fig. GR7.

Fig. GR7

Citation of 137,019 clinical cardiological papers published between 1981 and 1992. Data are taken from [2]. The citations were scored over the same years (1981 to 1992), so the actual numbers cannot be compared with impact factors, but a geographical comparison can be made. The average for all countries together was just below 6. The USA scored 7.5 citations per paper, Japan 2.0. All countries had large percentages of papers that were never cited, ranging from 31% (Norway) to 69% (Japan), with 46% for all countries together. USA = United States of America; CAN = Canada; NOR = Norway; NED = Netherlands; SWI = Switzerland; SWE = Sweden; UK = United Kingdom; DEN = Denmark; BEL = Belgium; FIN = Finland; ITA = Italy; GER = Germany; FRA = France; JAP = Japan.

Even at the national level one encounters difficulties when comparing the peer judgement of national committees with the results of citation analysis, as we saw in the previous section [8]. Applying citation analysis to even smaller entities, such as research groups or individual scientists, should therefore be done with care, and it certainly cannot be replaced by a priori quality labeling of individual papers by means of the impact factors of scientific journals.

8. Conclusions

1. The impact factor is a valid tool for the quality assessment of scientific journals.

2. The impact factor is not valid for the assessment of the quality of individual papers.

3. The impact factor is not valid for the assessment of the quality of individual scientists.

4. The impact factor is not valid for the assessment of the quality of groups of scientists if they produce fewer than 100 papers in 2 years.

5. For quality assessment of individual papers, individual scientists and groups of scientists, citation analysis should be preferred to a priori assumptions on the quality of papers.

6. Citation analysis does not necessarily agree with peer judgement.

7. Citation analysis may render useful a posteriori information on the success of governmental and university research policy.

Footnotes

  • 1 Tel. +31 20 5663265; Fax +31 20 6975458; E-mail: t.opthof@amc.uva.nl

References
