A user-friendly method for generating overlay maps

The two routines CitNetw.exe and MHNetw.exe generate complete matrices at the article level, in the Pajek and SPSS formats, for the analysis of citations and Medical Subject Headings (MeSH), respectively. The two matrices can also be combined.

CitNetw.EXE generates the citation matrix, with the citing papers in the rows and the cited references in the columns, as (i) mtrx.net in the Pajek format and (ii) mtrx.sps + mtrx.txt for SPSS. If desired, one can transpose this matrix in Pajek or SPSS. The matrix is binary, asymmetrical, 2-mode, and directed. The file “mtrx.net” can be processed further in Pajek, UCInet, Gephi, etc. The file lcs.net contains the bounded network of citations among the documents under study; it can be used, for example, for main path analysis (see below).
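As a rough illustration of what a binary, 2-mode citation matrix can look like in Pajek's text format, the following Python sketch writes a small network with citing papers as the first mode (rows) and cited references as the second mode (columns). The function name write_two_mode_net, the sample labels, and the arc-list layout are assumptions made for this example; the file mtrx.net written by CitNetw.EXE may be organized differently.

```python
import io


def write_two_mode_net(citations, out):
    """Write a 2-mode Pajek network: citing papers (rows) -> cited references (columns).

    `citations` maps each citing-paper label to a list of cited-reference labels.
    """
    rows = list(citations)
    # Collect the cited references in order of first appearance, without duplicates.
    cols, seen = [], set()
    for refs in citations.values():
        for ref in refs:
            if ref not in seen:
                seen.add(ref)
                cols.append(ref)
    # In a 2-mode header the first number is the total vertex count and the
    # second the number of vertices in the first mode (here: the rows).
    out.write(f"*Vertices {len(rows) + len(cols)} {len(rows)}\n")
    for i, label in enumerate(rows + cols, start=1):
        out.write(f'{i} "{label}"\n')
    out.write("*Arcs\n")
    col_index = {ref: len(rows) + j for j, ref in enumerate(cols, start=1)}
    for i, paper in enumerate(rows, start=1):
        for ref in sorted(set(citations[paper]), key=col_index.get):
            out.write(f"{i} {col_index[ref]} 1\n")  # binary cell value


# Hypothetical sample data: two citing papers, three cited references.
sample = {"Paper A": ["Ref X", "Ref Y"], "Paper B": ["Ref Y", "Ref Z"]}
buf = io.StringIO()
write_two_mode_net(sample, buf)
net_text = buf.getvalue()
print(net_text)
```

Writing to a file object rather than a path keeps the sketch easy to try out; passing an opened file instead of the StringIO buffer would store the result on disk.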

Input is a file “data.txt” based on downloads from WoS in the “plain text” (tagged) format. This file is first processed into a format for relational database management by calling the routine isi.exe. One is prompted to skip this step if it was already done in a previous round.

If one wishes to combine this with medical subject headings, the files mtrx.* should first be saved and stored elsewhere; MHNetw.exe overwrites these files.

The objective of MHNetw.EXE is to combine Medical Subject Headings (MeSH) and citation information at the article level. The MeSH are first retrieved from the PubMed database and can be organized into relational data using the routine pubmed.exe at http://www.leydesdorff.net/pubmed . Note that the file <pubmed.dbf> also needs to be present in the same folder as the data and pubmed.exe.

Alternatively, one can retrieve the data from Medline in WoS. (The advantage of retrieval from PubMed over retrieval from WoS is that there is no limit of 500 records at a time.) The data from either source first have to be organized in the same folder using PubMed.Exe. The program prompts with a question about the source. The input data have to be renamed “data.txt”.

Output of MHNetw.exe is:

· mtrx.net (Pajek) and mtrx.sps (for SPSS) containing the citing papers as rows and the MeSH as variables in the columns (analogous to CitNetw.exe).

· A file called “string.wos” which contains the search string for obtaining citation information at Web of Science (advanced search).

· The citation scores are written into the file with article descriptors ti.dbf in a field “tc”; citation scores are summed for MeSH into mh1.dbf.

· The file “string.wos” can be used to generate the corresponding file in the Science Citation indices of WoS; the file “string.pubmed” contains analogously the search string if one has worked from the WoS interface.

· The file cr_mh.net contains the citation information (cited references, CR) in the rows and the medical subject headings (MH) in the columns. The cell values provide the number of documents in which cited references and MeSH co-occur.

· The file jcr_mh.net contains the abbreviated journal names in the cited references (CR) in the rows and the medical subject headings (MH) in the columns. The cell values provide the number of documents in which the cited journals and MeSH co-occur.

· The file jcr_mh_a.net contains the same information (abbreviated journal names and MeSH categories), but differently organized: both are attributed as variables to the documents under study as the cases. Within Pajek, one can convert this matrix into an affiliations matrix (using Network > 2-Mode Network > 2-Mode to 1-Mode > Columns). One can also export this file to SPSS for cosine-normalization of the matrix.
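The co-occurrence counts in cr_mh.net and jcr_mh.net, and the cosine normalization mentioned for jcr_mh_a.net, can be sketched in Python as follows. Documents are the cases, and each document carries lists of attributes (here cited journals and MeSH terms). All function names, keys, and sample values are hypothetical, and this is an illustration of the idea rather than the code used by MHNetw.exe.

```python
from itertools import product
from math import sqrt


def cooccurrence(docs, row_key, col_key):
    """Count the documents in which a row item and a column item co-occur."""
    counts = {}
    for doc in docs:
        for r, c in product(set(doc[row_key]), set(doc[col_key])):
            counts[(r, c)] = counts.get((r, c), 0) + 1
    return counts


def cosine(docs, key, a, b):
    """Cosine between the binary document vectors of two attributes a and b."""
    in_a = [a in doc[key] for doc in docs]
    in_b = [b in doc[key] for doc in docs]
    both = sum(x and y for x, y in zip(in_a, in_b))
    denom = sqrt(sum(in_a) * sum(in_b))
    return both / denom if denom else 0.0


# Hypothetical sample: three documents with cited journals and MeSH terms.
docs = [
    {"jcr": ["LANCET", "NATURE"], "mh": ["Neoplasms", "Humans"]},
    {"jcr": ["LANCET"], "mh": ["Humans"]},
    {"jcr": ["NATURE"], "mh": ["Neoplasms"]},
]
jcr_mh = cooccurrence(docs, "jcr", "mh")
print(jcr_mh[("LANCET", "Humans")])               # documents citing LANCET and indexed with Humans
print(cosine(docs, "mh", "Neoplasms", "Humans"))  # similarity of two MeSH terms over the documents
```

The cosine is computed here between two columns of the documents-by-attributes matrix, which is the sense in which the matrix with documents as cases can be normalized after export.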

Note that the asterisks in MeSH are discarded in this (beta) version. All routines operate only on files present in the same (temporary) folder. Note also that mtrx.net, mtrx.txt, and mtrx.sps are overwritten in each run of MHNetw.exe or CitNetw.exe; one is therefore advised to save all files mtrx.* elsewhere or to rename them.

Suggested order of the routines (when working with PubMed data):

1. Download data at PubMed from the user interface at http://www.ncbi.nlm.nih.gov/pubmed/advanced . At the results page thereafter, select under “Send to” the format option MEDLINE and download to a file which has to be (re)named “data.txt”;

2. Run pubmed.exe (with data.txt as input) in the presence of pubmed.dbf;

3. Use the resulting string “search.wos” at the advanced user interface of WoS; save the retrieval via “Marked list” in portions of 500 records. Combine the data into a file data.txt.

4. Run CitNetw.EXE; save the citation matrices in the files mtrx.* elsewhere;

5. Run MHNetw.EXE; save the matrices that one wishes to use for further analysis. This analysis may take a long time.
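Combining the portions of 500 records (step 3) into a single data.txt can be done by hand in a text editor; a minimal Python sketch is given below. It assumes that each WoS plain-text export begins with a file header (lines tagged FN and VR) and ends with the marker EF, and it keeps only one header and one end marker. Whether isi.exe requires this clean-up, or also accepts a plain concatenation, is not stated here; the function name and the sample records are hypothetical.

```python
def merge_wos_exports(parts):
    """Merge the texts of several WoS plain-text exports into one text.

    Keeps the file header (FN/VR lines) of the first part only and writes a
    single end-of-file marker (EF) at the very end.
    """
    merged = []
    for n, text in enumerate(parts):
        for line in text.lstrip("\ufeff").splitlines():
            if line.strip() == "EF":
                continue  # end-of-file marker; written once below
            if n > 0 and line.startswith(("FN ", "VR ")):
                continue  # repeated file header of a later portion
            merged.append(line)
    merged.append("EF")
    return "\n".join(merged) + "\n"


# Two tiny, made-up portions standing in for downloads of 500 records each.
part1 = "FN Example Export\nVR 1.0\nPT J\nTI First title\nER\n\nEF\n"
part2 = "FN Example Export\nVR 1.0\nPT J\nTI Second title\nER\n\nEF\n"
merged = merge_wos_exports([part1, part2])
print(merged)

# Hypothetical usage with files (file names are placeholders):
# from pathlib import Path
# texts = [Path(p).read_text(encoding="utf-8-sig") for p in ["part1.txt", "part2.txt"]]
# Path("data.txt").write_text(merge_wos_exports(texts), encoding="utf-8")
```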

Main Path analysis

Alongside other files, CitNetw.EXE generates a file lcs.net containing the citations within the bounded domain of the document set(s) under study. (This domain corresponds to the so-called local citation scores (lcs) in HistCite™.) The cited references are not disambiguated, however, but are used as provided by WoS. The user may wish to disambiguate the references before running this routine (for example, by using CRExplorer.EXE). The cited references are matched against a string composed from the citing document, using the WoS format of cited references (“Name Initial, publication year, abbreviated journal title, volume number, and page number”), for example: “Zhang CL, 2002, CLIN CANCER RES, V8, P1234”.
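The matching can be illustrated with a short Python sketch: a key in the cited-reference format is composed for each document in the set, and every cited reference that equals such a key is counted as a local citation. The function names, the dictionary layout, and the use of exact string comparison are assumptions for illustration; the matching rules of CitNetw.EXE may differ in detail.

```python
def cited_reference_key(first_author, year, journal_abbrev, volume, first_page):
    """Compose a WoS-style cited-reference string for a document."""
    # Author names listed as "Zhang, CL" lose the comma in the cited-reference form.
    name = first_author.replace(",", "").strip()
    return f"{name}, {year}, {journal_abbrev.upper()}, V{volume}, P{first_page}"


def local_citations(docs):
    """Return (citing_id, cited_id) pairs for references that match a document in the set."""
    key_to_id = {cited_reference_key(*d["meta"]): d["id"] for d in docs}
    return [
        (d["id"], key_to_id[ref])
        for d in docs
        for ref in d["refs"]
        if ref in key_to_id
    ]


key = cited_reference_key("Zhang, CL", 2002, "Clin Cancer Res", 8, 1234)
print(key)

# Hypothetical document set: doc2 cites doc1 and one reference outside the set.
docs = [
    {"id": "doc1", "meta": ("Zhang, CL", 2002, "Clin Cancer Res", 8, 1234), "refs": []},
    {"id": "doc2", "meta": ("Smith, J", 2005, "Example J", 12, 77),
     "refs": ["Zhang CL, 2002, CLIN CANCER RES, V8, P1234", "Jones A, 1999, OTHER J, V3, P10"]},
]
print(local_citations(docs))
```

Because the comparison is an exact string match, variants of the same reference (a different spelling of the author name or a missing page number, for example) are not matched, which is why disambiguating the references beforehand can change the resulting network.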

The output file lcs.net contains a matrix with the citing documents in the rows and the cited ones in the columns. The matrix may differ somewhat from the one obtained with HistCite™ because of different matching and disambiguation procedures.

In order to proceed with main-path analysis in Pajek, the network first has to be made acyclic (de Nooy et al., 2011, pp. 244f.). One can do so within Pajek using the following steps in this order:

1. Extract the largest component from the network:

a. Network > Create partition > Component > Weak

b. Operations > Network + Partition > Extract subnetwork > Choose cluster 1;

2. Remove strong components from the largest component:

a. Network > Create partition > Component > Strong

b. Operations > Network + Partition > Shrink network > [use default values]

3. Remove loops:

a. Network > Create new network > Transform > Remove > Loops

4. Create main path (or critical path):

a. Network > Acyclic network > Create weighted > Traversal > SPC

b. Network > Acyclic network > Create (Sub)Network > Main Paths

Choosing, for example, “Global Search > Standard” among the Main Paths options then extracts the subnetwork containing the main path; this subnetwork becomes the active network. The main path can then be drawn and/or analyzed further.
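For readers who want to see what step 4 computes, the following Python sketch derives search path count (SPC) weights and a single global main path for a small network. It assumes that the network has already been made acyclic (steps 1 to 3), that each arc occurs only once, and that the main path is the source-to-sink path with the largest sum of SPC weights. The function names and the example arcs are hypothetical; this is not Pajek's implementation, and where several paths tie, the sketch returns only one of them.

```python
def _prepare(arcs):
    """Build successor/predecessor lists and a topological order of the nodes."""
    nodes = sorted({n for arc in arcs for n in arc})
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    indeg = {n: len(pred[n]) for n in nodes}
    order = [n for n in nodes if indeg[n] == 0]
    for u in order:  # the list grows while it is being traversed
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    if len(order) != len(nodes):
        raise ValueError("the network contains a cycle")
    return succ, pred, order


def spc_weights(arcs):
    """SPC weight of an arc: the number of source-to-sink paths passing through it."""
    succ, pred, order = _prepare(arcs)
    paths_in, paths_out = {}, {}
    for n in order:  # number of paths from any source to n
        paths_in[n] = sum(paths_in[p] for p in pred[n]) if pred[n] else 1
    for n in reversed(order):  # number of paths from n to any sink
        paths_out[n] = sum(paths_out[s] for s in succ[n]) if succ[n] else 1
    return {(u, v): paths_in[u] * paths_out[v] for u, v in arcs}


def global_main_path(arcs):
    """Return one source-to-sink path with the largest total SPC weight."""
    succ, pred, order = _prepare(arcs)
    weights = spc_weights(arcs)
    best = {}  # node -> (weight of the heaviest path from node to a sink, next node)
    for n in reversed(order):
        options = [(weights[(n, s)] + best[s][0], s) for s in succ[n]]
        best[n] = max(options) if options else (0, None)
    sources = [n for n in order if not pred[n]]
    node = max(sources, key=lambda n: best[n][0])
    path = [node]
    while best[node][1] is not None:
        node = best[node][1]
        path.append(node)
    return path


# Hypothetical acyclic network with one source (A) and one sink (F).
arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
        ("C", "E"), ("D", "F"), ("E", "F")]
weights = spc_weights(arcs)
main_path = global_main_path(arcs)
print(weights)
print(main_path)
```

In this example three paths lead from A to F; the arcs A-C and D-F each lie on two of them, so the path through C and D accumulates the largest total weight.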