A user-friendly method for generating overlay maps

The Integrated Impact Indicator I3

One is inclined to conceptualize impact in terms of citations per publication, and thus as an average. The so-called Impact Factor of journals, for example, is also an average. However, citation distributions are skewed, and the average has the disadvantage that the number of publications is used in the denominator. A principal investigator can therefore have a higher average citation rate than the same investigator together with his or her junior team, although the impact of the group is larger than that of the individual. In other words, size matters for impact.

Leydesdorff & Bornmann (2011) therefore replaced averaging with integration of the citation curve, but only after qualifying the underlying publications in terms of their respective percentiles: a top-1% publication obtains 100 points whereas an average publication gets only 50 points. This rescaling from zero to one hundred makes it possible to compare different sets and different citation distributions in terms of their impact. The results of the measurement can be used as input to non-parametric statistics which are, for example, available in SPSS.
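The difference between averaging and integrating can be illustrated with a minimal Python sketch. The percentile values and the names i3 and average are hypothetical and chosen only for this example; the sketch assumes that integration amounts to summing the percentile values of the papers in a set.

```python
def i3(percentiles):
    """Integrated impact: the sum of the percentile values in the set."""
    return sum(percentiles)


def average(percentiles):
    """Mean percentile value; the number of papers is the denominator."""
    return sum(percentiles) / len(percentiles)


# Hypothetical percentile values (0-100), one per paper.
pi_only = [90, 80]             # papers of the principal investigator
team = pi_only + [40, 30]      # the same papers plus those of junior staff

# Averaging penalizes the larger set: 85.0 for the PI versus 60.0 for the team.
print(average(pi_only), average(team))

# Integrating rewards it: 170 for the PI versus 240 for the team.
print(i3(pi_only), i3(team))
```

In this toy case the team scores lower on the average but higher on the integrated value, which is the sense in which size matters for impact.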

This website provides routines to compute I3 for a set of papers downloaded from the Web-of-Science (v5). First, this set can be organized in a relational database using ISI.exe. As input, ISI.exe uses the download in the tagged format of the WoS, which has to be available in the same folder under the name “data.txt”. The output is a set of databases (.dbf) which can be read using Excel or SPSS. For example, authors are organized into au.dbf and email addresses into em.dbf. (The various files are related in terms of the field “nr”; MS Access can be used for relational database management.)

The resulting files can be used by isi2i3.exe as input. This program will transform core.dbf into i3core.dbf, au.dbf into i3au.dbf, and cs.dbf into i3cs.dbf. The program may take a while; in the case of large files, one can leave it running overnight. When the program is started from the command prompt, the routine either finishes (after a while) or provides an informative error message.

The resulting files (e.g., i3core.dbf) differ from the input files only in a number of additional fields: the field i3f provides the value of I3 normalized as percentiles in relation to the set under study (“the field”), and i3j is normalized at the level of each journal. Analogously, r6f and r6j provide these values for the six percentile ranks used by the NSF: top-1%, top-5%, top-10%, top-25%, top-50%, and bottom-50%. The transformations are performed for the document types Articles, Reviews, Letters, and Proceedings Papers.
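As an illustration of the six percentile rank classes, the following Python sketch assigns a class label to a percentile value between 0 and 100. The function name percentile_rank_class and the treatment of the boundaries (a value exactly on a boundary is placed in the higher class) are assumptions made for this example; they are not taken from isi2i3.exe.

```python
# Lower bounds of the classes, checked from the highest class downwards.
# Assumption: a percentile exactly on a bound belongs to the higher class.
RANK_CLASSES = [
    (99, "top-1%"),
    (95, "top-5%"),
    (90, "top-10%"),
    (75, "top-25%"),
    (50, "top-50%"),
]


def percentile_rank_class(percentile):
    """Return the label of the percentile rank class for a percentile value."""
    for lower_bound, label in RANK_CLASSES:
        if percentile >= lower_bound:
            return label
    return "bottom-50%"


for value in (99.5, 96, 90, 80, 50, 12):
    print(value, percentile_rank_class(value))
```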

The attributions to the percentile classes are based on quantile values; the quantile value of a paper is equal to the number of papers with “times cited” in the reference set smaller than the “times cited” of the paper under study divided by the total number in the set (times 100).
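The quantile definition above can be written out as a short Python sketch. The function name quantile and the citation counts are hypothetical; the sketch implements only the pure counting rule (papers with strictly fewer citations, divided by the size of the set, times 100).

```python
def quantile(times_cited, reference_set):
    """Percentage of papers in reference_set cited less often than times_cited."""
    lower = sum(1 for c in reference_set if c < times_cited)
    return 100.0 * lower / len(reference_set)


# Hypothetical "times cited" values for a reference set of ten papers.
citations = [0, 1, 1, 3, 5, 8, 12, 20, 33, 70]

for c in citations:
    print(c, quantile(c, citations))
```

Note that tied papers (here the two papers cited once) receive the same quantile value under this rule, and that the most highly cited of ten papers reaches only 90.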

(In order to prevent distortions for sets smaller than 100, this value can be augmented with 0.9 if the quantile value is larger than zero; Leydesdorff & Bornmann (2011) used this correction. For example, if a journal publishes only ten reviews in a single year, the most highly cited one can have at most nine values lower than its own and would thus be rated in the 90th percentile without this correction; with the correction it is rated in the 99th percentile, and so on, mutatis mutandis. For sets larger than 100, the difference of 0.9 disappears in the rounding. Rousseau (2011, 2012) proposed another correction, to which we responded in Leydesdorff & Bornmann (in press; at http://arxiv.org/abs/1112.6281). In order to prevent confusion, option 1 provides the pure quantiles as defined in the previous paragraph; option 2 uses Rousseau’s counting rule.

The third option—which is now (April, 2012) the default—solves the issue by using “fractional attribution” of the times cited to the percentile ranks (Schreiber, 2012; Leydesdorff & Bornmann, in preparation). Schreiber (2012) also proposed a way to correct for tied ranks that is implemented in this version in option 3. However, the option is computationally intensive and would make the program slow. The program therefore only uses this counting rule for the whole set (the “field”) and not for subsets (“journals”). For the latter, option 1 was not changed. When one is not interested in this multi-level problem, one is strongly advised to use option 3.)
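The small-set correction mentioned in the note above can be made concrete with a Python sketch of the ten-reviews example. This is one possible reading and not the code of isi2i3.exe: it assumes that the 0.9 is added to the count of lower-cited papers before dividing by the size of the set, since that reading reproduces the step from the 90th to the 99th percentile. The function names and citation counts are hypothetical, and Rousseau's rule and the fractional attribution of option 3 are not implemented here.

```python
def pure_quantile(times_cited, reference_set):
    """Option 1: percentage of papers cited less often than times_cited."""
    lower = sum(1 for c in reference_set if c < times_cited)
    return 100.0 * lower / len(reference_set)


def corrected_quantile(times_cited, reference_set):
    """Pure quantile with 0.9 added to a non-zero count (assumed reading)."""
    lower = sum(1 for c in reference_set if c < times_cited)
    if lower > 0:
        lower += 0.9  # assumption: the count, not the percentage, is augmented
    return 100.0 * lower / len(reference_set)


# Hypothetical citation counts for ten reviews published in one year.
reviews = [2, 4, 5, 7, 9, 11, 15, 18, 25, 60]
top = max(reviews)

print(pure_quantile(top, reviews))                 # nine of ten papers are lower
print(round(corrected_quantile(top, reviews), 1))  # close to the 99th percentile
```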

Isi2i3.exe furthermore generates a number of summary tables that one can use: i3so.dbf summarizes the data after aggregation at the journal level (“so” for source); i3cntry.dbf after aggregation at the country level; i3inst at the institutional level; and i3au at the level of authors. These aggregations can also be made by using pivot tables in Excel or “Aggregate cases” in SPSS. Note that the results for authors and addresses are “integer counted”: each record is counted as one, whereas fractional counting would imply attributing credit proportionally in the case of multi-authored papers.
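The difference between integer and fractional counting can be sketched in Python as follows. The data, the function name aggregate_by_author, and the in-memory record layout are hypothetical; the sketch does not read the .dbf files and only illustrates the two counting rules for a score attached to each paper.

```python
from collections import defaultdict


def aggregate_by_author(papers, fractional=False):
    """Sum paper scores per author.

    Integer counting gives every author the full score of a paper;
    fractional counting divides the score equally among the authors.
    """
    totals = defaultdict(float)
    for authors, score in papers:
        share = score / len(authors) if fractional else score
        for author in authors:
            totals[author] += share
    return dict(totals)


# Hypothetical records: (list of authors, percentile score of the paper).
papers = [
    (["A", "B"], 80.0),
    (["A"], 50.0),
    (["B", "C", "D"], 90.0),
]

print(aggregate_by_author(papers))                   # integer counting
print(aggregate_by_author(papers, fractional=True))  # fractional counting
```

Under integer counting the author totals add up to more than the sum of the paper scores, because multi-authored papers are counted once for each author.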

I3cs.dbf can be used as input for the generation of overlays to Google Maps, strictly analogous to the procedures used by Leydesdorff & Persson (2010) (at http://www.leydesdorff.net/maps) and Bornmann & Leydesdorff (2011) (at http://www.leydesdorff.net/topcity). Instead of cities1.exe and cities2.exe, one uses i3cit1.exe and i3cit2.exe. Instead of inst1.exe and inst2.exe, one uses i3inst1.exe and i3inst2.exe. I3cit2.exe and i3inst2.exe directly produce the various output files, including ztest.txt. A third step is not needed; between the first and the second step, cities.txt or inst.txt has to be geocoded. An example is provided at http://www.leydesdorff.net/nano2011/nano2011.htm which shows a Google Map with the performance of cities worldwide in the field of 15 core journals of nanotechnology (Leydesdorff, in preparation).

References

The following papers explain the concept of integrated impact indicators and the counting rules:

Loet Leydesdorff & Lutz Bornmann (2011), Integrated Impact Indicators (I3) compared with Impact Factors (IFs): An alternative design with policy implications. Journal of the American Society for Information Science and Technology, 62(11), 2133-2146.
Loet Leydesdorff & Lutz Bornmann (in press), Percentile Ranks and the Integrated Impact Indicator (I3). Journal of the American Society for Information Science and Technology; preprint available at http://arxiv.org/abs/1112.6281.
Loet Leydesdorff & Lutz Bornmann (in preparation), Accounting for the Uncertainty in the Evaluation of Percentile Ranks.
Ronald Rousseau (2011), Percentile rank scores are congruous indicators of relative performance, or aren’t they?; available at http://arxiv.org/pdf/1108.1860.
Ronald Rousseau (2012), Basic properties of both percentile rank scores and the I3 indicator. Journal of the American Society for Information Science and Technology, 63(2), 416-420; DOI: 10.1002/asi.21684.
M. Schreiber (in press), Inconsistencies of Recently Proposed Citation Impact Indicators and how to Avoid Them. Journal of the American Society for Information Science and Technology; preprint available at http://www.arxiv.org/abs/1202.3861.

For the use of Google maps:

Lutz Bornmann & Loet Leydesdorff (in press), Which cities produce worldwide excellent papers more than expected? A new mapping approach—using Google Maps—based on statistical significance testing. Journal of the American Society for Information Science and Technology; software and manual at http://www.leydesdorff.net/topcity.
Loet Leydesdorff & Olle Persson (2010), Mapping the Geography of Science: Distribution Patterns and Networks of Relations among Cities and Institutes. Journal of the American Society for Information Science and Technology, 61(8), 1622-1634; software and manual at http://www.leydesdorff.net/maps.

A study entitled “An Evaluation of Impacts in ‘Nanoscience and Nanotechnology:’ Steps towards standards and statistics for citation analysis” integrates the various tools and is forthcoming in Scientometrics.

Amsterdam, April 7, 2012 (revised).