A user-friendly method for generating overlay maps

(last updated: November 5, 2015)

Scopus.exe organizes a file exported from Elsevier’s Scopus database in the format “RIS format (Reference Manager, ProCite, EndNote)”. The program assumes an input file named “scopus.ris”. (If you wish to combine more than one search, cut and paste the different outputs together and name the result “scopus.ris”.)
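For those who prefer not to cut and paste by hand, the merge can be sketched in a few lines of Python. This is only an illustration: the function name merge_ris and the part file names in the comment are hypothetical, and the sketch simply copies the bytes of each export in order, adding a line break between parts when one is missing.

```python
from pathlib import Path

def merge_ris(parts, target="scopus.ris"):
    """Concatenate several RIS exports, in the given order, into one file.

    The files are copied byte for byte, so the encoding of the exports
    is left exactly as Scopus delivered it.
    """
    with open(target, "wb") as out:
        for part in parts:
            data = Path(part).read_bytes()
            out.write(data)
            # Make sure the next export starts on a fresh line.
            if not data.endswith(b"\n"):
                out.write(b"\n")
    return target

# Hypothetical usage, assuming two exports saved as part1.ris and part2.ris:
#     merge_ris(["part1.ris", "part2.ris"])
```

Overlapping searches are not deduplicated by this sketch; a record that occurs in two exports will occur twice in the merged file.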

The output organizes this information in files which allow for relational database management (e.g., MS Access) or separate analysis as spreadsheets. These files (AU.dbf; CS.dbf; CR.dbf; KW.dbf; TI.dbf) are in the dBase format. Additionally, a file with the abstracts (abstract.txt) and relational numbers is provided in ASCII format. The output is similar to that of ISI.exe for the Science Citation Index at the Web of Science (WoS). A number of other programs for social and semantic network analysis, available at http://www.leydesdorff.net/indicators, can use these files for further processing (see below). Scop2WOS.exe reads the databases produced by Scopus.exe and can generate a file “wos.txt” in a format similar to that of WoS. This file can be read by HistCite™, but making a historiogram probably requires additional formatting.

A first major difference between the Scopus and ISI output concerns journal names: WoS uses abbreviated journal names in the cited references, whereas Scopus uses the full journal names. A program such as BibJourn.Exe is affected by this difference. (HistCite™ may have the same problem.) The file CR.DBF provides the cited references in the Scopus format. A field “referenced publication year” (RPY) is added for so-called “referenced publication years spectroscopy” (RPYS). For Scopus data, one can use the routine rpysscop.exe (a version of rpys.exe, which was written for WoS data, adapted for Scopus data).
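To illustrate what can be done with the RPY field, here is a minimal sketch in Python of a spectrum as it is commonly described for RPYS: the number of cited references per referenced publication year, and the deviation of that number from the median of a five-year window around the year. Whether rpysscop.exe computes exactly these quantities is not stated here, so this should be read as an assumption. The function name rpys and its input (a plain list of years, assumed to have been read from CR.dbf beforehand) are hypothetical.

```python
from collections import Counter
from statistics import median

def rpys(years, window=2):
    """Return (year, count, deviation) tuples for every year in the range.

    count     = number of cited references with that publication year
    deviation = count minus the median count over the years
                year-window .. year+window (five years when window=2)
    """
    counts = Counter(years)
    first, last = min(counts), max(counts)
    spectrum = []
    for year in range(first, last + 1):
        # Years without any cited reference count as zero.
        neighbourhood = [counts.get(y, 0)
                         for y in range(year - window, year + window + 1)]
        n = counts.get(year, 0)
        spectrum.append((year, n, n - median(neighbourhood)))
    return spectrum

# Illustrative input only, not real data:
for row in rpys([1990, 1990, 1990, 1991, 1993]):
    print(row)
```

Years with a large positive deviation are the peaks one would look for in the spectrum.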

Secondly, the address information is not (yet) fully standardized. A user may wish to parse this further. The first two fields (university, department) are consistent, but the number of commas separating street addresses, cities, zip codes, and countries is not. The country name is consistently the last subfield and is therefore reliable in the file CS.dbf.
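A user who wants to parse the addresses further could start from the two regularities just mentioned. The following Python sketch is only an illustration under that assumption: it takes one address string (assumed to have been read from CS.dbf beforehand), treats the last comma-separated subfield as the country and the first two as university and department, and leaves everything in between unparsed. The function name split_address and the sample address are made up.

```python
def split_address(address):
    """Split one address string on commas.

    Only the country (last subfield) and the first two subfields are
    interpreted; the middle part is returned as an unparsed list because
    the number of commas in it varies.
    """
    parts = [p.strip() for p in address.split(",") if p.strip()]
    country = parts[-1] if parts else ""
    head = parts[:-1]
    return {
        "university": head[0] if len(head) > 0 else "",
        "department": head[1] if len(head) > 1 else "",
        "middle": head[2:],
        "country": country,
    }

# Made-up address for illustration:
print(split_address(
    "Example University, Department of Examples, 1 Main Street, Exampletown, 1234 AB, Netherlands"
))
```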

All fields that are unique for a single document are gathered in TI.dbf. Abstracts are stored as memo fields, and are additionally exported into the file abstract.txt.

The current version is in the development phase. Please feel free to send suggestions for improvements. I intend to develop further interfaces between these Scopus data and other routines. Source codes (in Clipper) are available here for Scopus.prg and Scop2WOS.prg; they can be compiled with Harbour (freeware) for any platform.

Loet Leydesdorff,

Amsterdam, 19 August 2013

(revised and updated on November 5, 2015)

How to export from Scopus

1. Run your Scopus search request;

2. Select the document entries you would like to export (e.g., All);

3. Hit “Export” and select “RIS format (Reference Manager, ProCite, EndNote)” as the export format. Scopus’ upper limit for exports is 2000 documents. Thus, you may have to split your search request into several smaller parts. If you are searching for names, this can be done easily. Otherwise, e.g. when you are searching for keywords, it may help to narrow down your results by date ranges.

4. If the ASCII export is output in your browser, save it as a text document. Usually, however, this is done automatically.

5. If you had to split your search request you may now put your export files back together into a single file.

6. Download the tool Scopus.exe (and Scop2WoS.exe if so wished) into the same folder (e.g., C:\temp\) where the export file from Scopus is located.

7. Scopus.exe does not take any arguments, but it expects a file named “scopus.ris” as input. Thus, you may now rename your (merged) export file accordingly. Run Scopus.exe. (If you run it from the command prompt, you will see error messages if something goes wrong.)

8. The output files are:

a. Ti.dbf contains all information which is unique for a document, including the abstract and the times cited (TC). Sequence numbers are generated and added to all output files for relational database management;

b. Abstract.txt contains the abstracts as a text file with the numbering used in Ti.dbf for relational purposes;

c. AU.dbf contains all authors (full names, last names, initials) in the order of appearance in the file plus a sequence number for the relational database management;

d. CS.dbf contains all address information analogously. NOTE: the correspondence address is included in Ti.dbf since it is unique;

e. CR.dbf contains all cited references with sequence numbering. (The file is not yet parsed further, except for the publication years; see above.)

f. KW.dbf contains the keywords with sequence numbering so that this information can be related to the other files;

g. WOS.txt (output of Scop2WOS.exe) can be used for a number of routines available from my website that assume WOS-input.

9. ADVANTAGES:

a. Using au.dbf, for example, one can easily generate a co-authorship network;

b. Using cs.dbf a network of international collaborations can be constructed;

c. The field TI in Ti.dbf can be used for semantic mapping by using, for example, ti.exe.
See: Pajek Manual: How to analyze frames using semantic maps of a collection of messages?

d. Using abstract.txt one can draw the semantic maps for the abstract words using fulltext.exe;

e. Citation analysis for evaluative purposes (e.g., in terms of percentiles) can be based on the field TC in Ti.dbf;

f. Unlike in WoS, the cited references in CR.dbf contain the titles and all coauthors (and thus the file is suited for semantic and coauthor mapping);

g. As noted, the field “rpy” is added to CR.dbf for the purpose of “referenced publication years spectroscopy” (RPYS).
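As an illustration of advantages (a) and (b), the following Python sketch counts how often two authors appear on the same document. It assumes that the rows of AU.dbf have already been read into pairs of (sequence number, author name); the function name coauthor_edges and the sample rows are hypothetical. Feeding it pairs of (sequence number, country) taken from CS.dbf would give the international collaboration network in the same way.

```python
from collections import Counter, defaultdict
from itertools import combinations

def coauthor_edges(rows):
    """Count co-occurrences of names that share a document number.

    rows: iterable of (document_number, name) pairs.
    Returns a Counter mapping (name_a, name_b), with name_a < name_b,
    to the number of documents the two names share.
    """
    names_by_doc = defaultdict(set)
    for doc, name in rows:
        names_by_doc[doc].add(name)

    edges = Counter()
    for names in names_by_doc.values():
        # Every unordered pair on the same document is one co-occurrence.
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return edges

# Made-up rows for illustration:
sample = [(1, "Author A"), (1, "Author B"),
          (2, "Author A"), (2, "Author B"), (2, "Author C"),
          (3, "Author C")]
for pair, weight in sorted(coauthor_edges(sample).items()):
    print(pair, weight)
```

The resulting weighted edge list can be written out in whatever format the network program of choice expects.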

(adjusted from:)

Benjamin Schwalb,

22 August 2011
