

Referenced Publication Years Spectroscopy (RPYS)

    Werner Marx, Lutz Bornmann, Andreas Barth, and Loet Leydesdorff, Detecting the historical roots of research fields by reference publication year spectroscopy (RPYS). Journal of the Association for Information Science and Technology 65(4) (2014) 751-764.


Loet Leydesdorff, Lutz Bornmann, Werner Marx, and Staša Milojević, Referenced Publication Years Spectroscopy applied to iMetrics: Scientometrics, Journal of Informetrics, and a relevant subset of JASIST, Journal of Informetrics 8(1) (2014) 162-174.


The program rpys.exe enables the user to generate a spectrogram of cited references in a document set harvested from the citation indices at the Web-of-Science (WoS). Using yearcr.exe, one can zoom in on specific peaks representing publication years. The results can be disambiguated using RefMatchCluster.jar. The three programs are explained below.




The input data is downloaded from Web-of-Science in the format “Other File Formats” (that is, with tags such as “AU” for authors and “TI” for titles). The program expects this input in a single file named “data.txt”. Note that WoS saves each download under a default file name such as “savedrecs.txt”. Please combine multiple downloads into one file and rename the result to “data.txt”.
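Combining several downloads can be sketched in Python as follows. This is only an illustration, not part of rpys.exe. It assumes that each WoS export begins with “FN” and “VR” header lines and ends with an “EF” line, and that the downloads are stored as savedrecs*.txt in the current folder; the function name merge_wos_exports is hypothetical. Whether rpys.exe requires a specific text encoding is not stated here, so check the result before use.

```python
from pathlib import Path


def merge_wos_exports(texts):
    """Merge several WoS tagged-format exports into one text.

    Assumption: every export starts with "FN"/"VR" header lines and
    ends with a single "EF" line.
    """
    merged = []
    for index, text in enumerate(texts):
        for line in text.splitlines():
            if line.strip() == "EF":
                continue  # the end-of-file marker is re-added once below
            if index > 0 and line.startswith(("FN ", "VR ")):
                continue  # keep only the header of the first export
            merged.append(line)
    merged.append("EF")
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    # Hypothetical file pattern: savedrecs.txt, savedrecs(1).txt, ...
    paths = sorted(Path(".").glob("savedrecs*.txt"))
    if paths:
        texts = [p.read_text(encoding="utf-8-sig") for p in paths]
        Path("data.txt").write_text(merge_wos_exports(texts), encoding="utf-8")
```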


The output organizes all information in files that allow for relational database management (e.g., MS Access) or separate analysis as spreadsheets. These files are in the dBASE format. See http://www.leydesdorff.net/software/isi for more details about the file organization. All files need to be in, and are written to, a single folder (e.g., C:\temp\ …). I advise running the program from the C-prompt in this folder because one then obtains error messages if something goes wrong.


Specifically, this routine generates RPYS.DBF if the cited references were also downloaded. When opened in Excel, RPYS.DBF can be used for “Referenced Publication Years Spectroscopy” (Marx et al., 2014) by drawing a scatterplot.


In Excel:

   1. File > Open > <rpys.dbf>;

   2. Ctrl-A;

   3. Insert > Scatter > Scatter with Smooth Lines; (no markers!)

   4. Adjust the x-axis to an appropriate time scale.


Figure 1: 733 references from before 1970 in my publications.


All cited references from which the file rpys.dbf is derived can be found in cr.dbf (in the same folder).
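The relation between the cited references and the spectrogram can be illustrated with a short Python sketch that counts cited references per referenced publication year. It is not the code of rpys.exe. It assumes that cited references appear in the “CR” field of the tagged download, one reference per line, with the year as the second comma-separated element; the function names are hypothetical.

```python
import re
from collections import Counter

YEAR = re.compile(r"[0-9]{4}")


def cited_references(wos_text):
    """Yield each cited-reference string from the CR fields of a tagged export."""
    in_cr = False
    for line in wos_text.splitlines():
        if line.startswith("CR "):
            in_cr = True
            yield line[3:].strip()
        elif in_cr and line.startswith("   "):
            yield line.strip()  # continuation line: one reference per line
        else:
            in_cr = False


def rpys_counts(references):
    """Count cited references per referenced publication year."""
    counts = Counter()
    for ref in references:
        fields = [f.strip() for f in ref.split(",")]
        # Assumption: the second element is the publication year.
        if len(fields) > 1 and YEAR.fullmatch(fields[1]):
            counts[int(fields[1])] += 1
    return dict(sorted(counts.items()))
```

Plotting the resulting years against their counts gives the same kind of curve as the scatterplot drawn from rpys.dbf.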


For advanced users: A similar file, median.dbf, is also generated. In addition to the field “rpys”, it contains a field “median” that de-trends and normalizes the value for each year y by subtracting the median of the five years y-2, y-1, y, y+1, and y+2. One can use this file for drawing a picture in Excel analogously. Another field, “quantile”, transforms the values in the column rpys into quantile values that can be used for generating a heat map in Excel (Bornmann, Thor, Marx, & Leydesdorff, in preparation; Comins & Hussey, 2015). To generate a heat map in Excel, use Home > Conditional Formatting > Color Scales > etc. See also: http://www.excel-university.com/heat-maps-in-excel/
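The deviation from the five-year median can be sketched in Python as follows. This is an illustration under stated assumptions, not the routine's own code: years without references inside the covered range are counted as zero, and at the edges of the range the window is shortened to the years that exist. How rpys.exe treats these cases is not described here. The function name median_deviation is hypothetical.

```python
from statistics import median


def median_deviation(counts):
    """Subtract the median of the years y-2 .. y+2 from the count for year y.

    `counts` maps a referenced publication year to its number of
    cited references. Missing years inside the range count as 0
    (an assumption); windows are truncated at the first and last year.
    """
    first, last = min(counts), max(counts)
    result = {}
    for year in range(first, last + 1):
        window = [
            counts.get(y, 0)
            for y in range(year - 2, year + 3)
            if first <= y <= last
        ]
        result[year] = counts.get(year, 0) - median(window)
    return result
```

A year that stands out against its neighbours receives a large positive value, while years that follow the general trend stay close to zero.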


The routine rpys.exe overwrites files from previous runs. Save results elsewhere! 

Without sufficient disk space, an error is generated and the routine stops.



The current version is a beta version. Please provide feedback for further improvements if bugs are encountered. Carefully check the output for errors! [The source code (written for Flagship v7/Clipper 87) is available from here. It can also be compiled for Unix or OS X using Harbour at http://harbour.github.io/ ]


I acknowledge Lutz Bornmann, Werner Marx, and Staša Milojević for their collaboration during the development of this routine.



Inspection of the spectrogram or heat map leads to a focus on specific years. Using the files produced by rpys.exe, one can run yearcr.exe in the same (sub)folder. This routine generates a ranked frequency listing of cited references in the document set(s) under study for a specific year or a range of years. The user is prompted to specify an initial year (e.g., 1965) and a last year to be used in the analysis if so wished.


Default values are "1965" for both the initial year and the last year, but one can change this interactively. Both the initial year and the last year are included in the analysis. The result is a file yearcr.dbf that contains the cited references for the years of interest.
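The ranked frequency listing can be illustrated with the following Python sketch. It is not the code of yearcr.exe. It assumes that each cited reference is a string whose second comma-separated element is the year; the function name ranked_references is hypothetical. As in the routine, both boundary years are included.

```python
import re
from collections import Counter

YEAR = re.compile(r"[0-9]{4}")


def ranked_references(references, first_year=1965, last_year=None):
    """Rank cited references published in first_year..last_year (inclusive)."""
    if last_year is None:
        last_year = first_year  # a single year, as in the default setting
    counts = Counter()
    for ref in references:
        fields = [f.strip() for f in ref.split(",")]
        if len(fields) > 1 and YEAR.fullmatch(fields[1]):
            if first_year <= int(fields[1]) <= last_year:
                counts[ref] += 1
    # Most frequently cited first; ties keep their order of first appearance.
    return counts.most_common()
```

Because the references are compared as exact strings, spelling variants of the same reference are counted separately at this stage.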




The file yearcr.dbf may contain different variants of the same reference. The Java program RefMatchCluster.jar matches references and clusters them if they are sufficiently similar. A manual is available at http://www.leydesdorff.net/software/rpys/refmatchcluster.txt . The following command from the C-prompt will do the job for most practical cases:


java -jar RefMatchCluster.jar -input=yearcr.dbf -matcher=journal_short,Levenshtein,0.75 -matcher=lastname,Levenshtein,0.75 -match=yearcr_match.csv -cluster=yearcr_cluster.dbf -aggregate=cleaned.csv


The output files can be specified as either csv (comma-separated values) or dbf files. The input file is yearcr.dbf, generated by yearcr.exe as specified above. The file cleaned.csv (or cleaned.dbf) contains the same data as yearcr.dbf, but similar records are merged and a column is added with cluster numbers that correspond to the numbers in yearcr_cluster.dbf (or yearcr_cluster.csv). The Levenshtein algorithm for string matching is used with a threshold of 0.75. One can change this value, or use the trigram algorithm instead of Levenshtein, if desired.
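To give an idea of what a Levenshtein threshold of 0.75 means, here is a minimal Python sketch. It is not the implementation used in RefMatchCluster.jar. It assumes that similarity is defined as 1 minus the edit distance divided by the length of the longer string, and it uses a simple greedy clustering on a single field; both choices, and all function names, are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Edit distance: minimal number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cluster(values, threshold=0.75):
    """Greedy sketch: a value joins the first cluster whose first member
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []
    for value in values:
        for members in clusters:
            if similarity(value.lower(), members[0].lower()) >= threshold:
                members.append(value)
                break
        else:
            clusters.append([value])
    return clusters
```

Under these assumptions, “J INFORMETR” and “J INFORMETRICS” differ by three insertions over 14 characters, giving a similarity of about 0.79, so the two variants would be clustered at the threshold of 0.75.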


If problems arise with the decimal point in (e.g., German) versions of Excel, one can add appropriate parameters as in the batch file refmatchcluster.bat. Make sure that the program java.exe can be found, for example by first typing “path C:\Program Files\java\jre7\bin” at the C-prompt (if necessary).




Barth, A., Marx, W., Bornmann, L., & Mutz, R. (2014). On the origins and the historical roots of the Higgs boson research from a bibliometric perspective. The European Physical Journal – Plus, 129(111).

Bornmann, L., Thor, A., Marx, W., & Leydesdorff, L. (in preparation). Identifying seminal works most important for research fields: Software for the Reference Publication Year Spectroscopy (RPYS).

Comins, J. A., & Hussey, T. W. (2015). Compressing multiple scales of impact detection by Reference Publication Year Spectroscopy. Journal of Informetrics, 9(3), 449-454.

Leydesdorff, L., Bornmann, L., Marx, W., & Milojević, S. (2014). Referenced Publication Years Spectroscopy applied to iMetrics: Scientometrics, Journal of Informetrics, and a relevant subset of JASIST. Journal of Informetrics, 8(1), 162-174.

Marx, W., & Bornmann, L. (2014). Tracing the origin of a scientific legend by reference publication year spectroscopy (RPYS): the legend of the Darwin finches. Scientometrics, 99(3), 839-844.

Marx, W., Bornmann, L., Barth, A., & Leydesdorff, L. (2014). Detecting the historical roots of research fields by reference publication year spectroscopy (RPYS). Journal of the Association for Information Science and Technology, 65(4), 751-764.

Wray, K. B., & Bornmann, L. (2015). Philosophy of science viewed through the lense of “References Publication Years spectroscopy” (RPYS). Scientometrics, 102(3), 1987-1996.