Monday, June 29, 2009

A Principal Component Analysis of 39 Scientific Impact Measures

Bollen J, Van de Sompel H, Hagberg A, Chute R, 2009 A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE 4(6): e6022. doi:10.1371/journal.pone.0006022


The impact of scientific publications has traditionally been expressed in terms of citation counts. However, scientific activity has moved online over the past decade. To better capture scientific impact in the digital era, a variety of new impact measures has been proposed on the basis of social network analysis and usage log data. Here we investigate how these new measures relate to each other, and how accurately and completely they express scientific impact.


We performed a principal component analysis of the rankings produced by 39 existing and proposed measures of scholarly impact that were calculated on the basis of both citation and usage log data.


Our results indicate that the notion of scientific impact is a multi-dimensional construct that cannot be adequately measured by any single indicator, although some measures are more suitable than others. The commonly used citation Impact Factor is not positioned at the core of this construct, but at its periphery, and should thus be used with caution.

Received: May 14, 2009; Accepted: May 26, 2009; Published: June 29, 2009



A variety of impact measures can be derived from raw citation data. It is however highly common to assess scientific impact in terms of average journal citation rates. In particular, the Thomson Scientific Journal Impact Factor (JIF) [1] which is published yearly as part of the Journal Citation Reports (JCR) is based on this very principle; ... .
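As a rough illustration of what an "average journal citation rate" looks like in practice, here is a minimal Python sketch. It assumes the commonly cited two-year form of the JIF (citations received in a given year to items published in the two preceding years, divided by the number of citable items published in those two years). That definition is not spelled out in the excerpt above, and the function name and the numbers are hypothetical.

```python
def journal_impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Average citation rate over a two-year window (assumed JIF form).

    citations_to_prev_two_years: citations received this year to items
        the journal published in the two preceding years.
    citable_items_prev_two_years: number of citable items the journal
        published in those two years.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("a journal needs at least one citable item")
    return citations_to_prev_two_years / citable_items_prev_two_years


# Hypothetical journal: 300 citations to 120 citable items.
jif = journal_impact_factor(300, 120)
print(jif)  # 2.5
```

Because the result is a plain mean, a handful of highly cited articles can dominate it, which is one reason the alternatives discussed below were proposed.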

The JIF has achieved a dominant position among measures of scientific impact for two reasons. First, it is published as part of a well-known, commonly available citation database (Thomson Scientific's JCR). Second, it has a simple and intuitive definition. The JIF is now commonly used to measure the impact of journals and by extension the impact of the articles they have published, and by even further extension the authors of these articles, their departments, their universities and even entire countries. However, the JIF has a number of undesirable properties which have been extensively discussed in the literature [2], [3], [4], [5], [6]. This has led to a situation in which most experts agree that the JIF is a far from perfect measure of scientific impact, but it is still generally used because of the lack of accepted alternatives.

The shortcomings of the JIF as a simple citation statistic have led to the introduction of other measures of scientific impact. Modifications of the JIF have been proposed to cover longer periods of time [7] and shorter periods of time (JCR's Citation Immediacy Index). Different distribution statistics have been proposed, e.g. Rousseau (2005) [8] and the JCR Citation Half-life (s/citationanalysis/citationrates/ ). The H-index [9] was originally proposed to rank authors according to their rank-ordered citation distributions, but was extended to journals by Braun (2005) [10]. Randar (2007) [11] and Egghe (2006) [12] propose the g-index as a modification of the H-index.
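To make the two rank-ordered statistics concrete, here is a minimal Python sketch using the usual textbook definitions: the H-index is the largest h such that h items each have at least h citations, and the g-index is the largest g such that the top g items together have at least g squared citations. The function names and the citation counts are hypothetical, and this variant of the g-index is capped at the number of items.

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g items total at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for five articles.
counts = [10, 8, 5, 4, 3]
print(h_index(counts))  # 4
print(g_index(counts))  # 5
```

The g-index is never smaller than the H-index, because it gives credit for the surplus citations of the most-cited items.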


Since scientific literature is now mostly published and accessed online, a number of initiatives have attempted to measure scientific impact from usage log data. The web portals of scientific publishers, aggregator services and institutional library services now consistently record usage at a scale that exceeds the total number of citations in existence. In fact, Elsevier announced 1 billion fulltext downloads in 2006, compared to approximately 600 million citations in the entire Web of Science database. The resulting usage data allows scientific activity to be observed immediately upon publication, rather than waiting for citations to emerge in the published literature and to be included in citation databases such as the JCR, a process that with average publication delays can easily take several years. Shepherd (2007) [19] and Bollen (2008) [20] propose a Usage Impact Factor which consists of average usage rates for the articles published in a journal, similar to the citation-based JIF. Several authors have proposed similar measures based on usage statistics [21]. Parallel to the development of social network measures applied to citation networks, Bollen (2005, 2008) [22], [23] demonstrate the feasibility of a variety of social network measures calculated on the basis of usage networks extracted from the clickstream information contained in usage log data.

These developments have led to a plethora of new measures of scientific impact that can be derived from citation or usage log data, and/or rely on distribution statistics or more sophisticated social network analysis. However, which of these measures is most suitable for the measurement of scientific impact?

This question is difficult to answer for two reasons. First, impact measures can be calculated for various citation and usage data sets, and it is thus difficult to distinguish the true characteristics of a measure from the peculiarities of the data set from which it was calculated. Second, we do not have a universally accepted gold standard of impact against which to calibrate any new measures. In fact, we do not even have a workable definition of the notion of “scientific impact” itself, unless we revert to the tautology of defining it as the number of citations received by a publication. Like most abstract concepts, “scientific impact” may be understood and measured in many different ways. The issue thus becomes which impact measures best express its various aspects and interpretations.

Here we report on a Principal Component Analysis (PCA) [24] of the rankings produced by a total of 39 different, yet plausible measures of scholarly impact. 19 measures were calculated from the 2007 JCR citation data and 16 from the MESUR project's log usage data collection. We included 4 measures of impact published by the Scimago group that were calculated from Scopus citation data. The resulting PCA shows the major dimensions along which the abstract notion of scientific impact can be understood and how clusters of measures correspond to similar aspects of scientific impact.
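For readers who want a feel for what a PCA of rankings involves, here is a minimal NumPy sketch. It is not the authors' pipeline: it assumes that each measure's scores are converted to ranks, that the rank correlation matrix between measures is eigendecomposed, and that measures loading on the same component are read as a cluster. The data are synthetic (four hypothetical measures driven by two hidden factors), and all names are illustrative.

```python
import numpy as np


def rank_columns(scores):
    """Ordinal rank of each journal within each measure (0 = lowest score).

    Ties are broken by position; the synthetic data below has no ties.
    """
    return np.argsort(np.argsort(scores, axis=0), axis=0).astype(float)


def pca_of_rankings(scores):
    """PCA on the rank correlation matrix of the measures (columns)."""
    ranks = rank_columns(scores)
    # Pearson correlation of ranks equals Spearman correlation when untied.
    corr = np.corrcoef(ranks, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest component first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # share of variance per component
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return explained, loadings


# Synthetic example: 200 journals, 4 measures, 2 hidden "kinds" of impact.
rng = np.random.default_rng(0)
n_journals = 200
factor_a = rng.normal(size=n_journals)
factor_b = rng.normal(size=n_journals)
scores = np.column_stack([
    factor_a + 0.1 * rng.normal(size=n_journals),   # measures 0 and 1 track factor A
    factor_a + 0.1 * rng.normal(size=n_journals),
    factor_b + 0.1 * rng.normal(size=n_journals),   # measures 2 and 3 track factor B
    factor_b + 0.1 * rng.normal(size=n_journals),
])

explained, loadings = pca_of_rankings(scores)
print(np.round(explained, 3))
print(np.round(loadings[:, :2], 2))
```

In this toy setup the first two components account for nearly all of the variance, and the measures built from the same hidden factor load together, which is the kind of clustering the paper reads off its 39 measures.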


PDF Available at


See Also

MESUR For Measure: MEtrics from Scholarly Usage of Resources

1 comment:

Anonymous said...

Thank you: very helpful indeed!

- B. Van der Veer Martens