mapping city networks among author addresses

CityColl.exe for Geographical Analysis (including Google Map) of Coauthorship Relations among Cities

This program generates a representation of the coauthorship relations in a document set in terms of the city addresses of the participating units. The input is a document set downloaded from ISI’s Web of Science and saved as data.txt.

Like ISI.EXE, the program CityColl.EXE produces four databases containing the information in the original input set in relational format: au.dbf with the authors; cs.dbf with the addresses (“corporate sources”); core.dbf with information which is unique for each record (e.g., the title); and cr.dbf containing the cited references. The files are linked through the record numbers in core.dbf. If one needs only these files, one is advised to use ISI.EXE, since the computation of the cosine is computationally intensive and therefore time-consuming.

Additionally, this program generates the following files:

1. cosine.dat provides an input file for Pajek for a visual representation of the collaboration network within this set at the city level. The matrix is normalized over the columns using the cosine. (The documents are the cases and the addresses are the variables; the number of documents is unlimited; the maximum number of cities is 1024, but one can set a threshold.)

2. cities.txt contains the city and country names. This file can be used as input into http://www.gpsvisualizer.com/geocoder/ in order to generate a Google Map.

3. coocc.dat and matrix.dbf are the files which underlie cosine.dat. coocc.dat is the co-occurrence file before normalization, and matrix.dbf is the asymmetrical data matrix. The latter file can be used for statistical analysis in SPSS, the former for graph-analytical analysis using UCINet or Pajek.
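The relation between the three files can be illustrated with a minimal sketch in Python. It is not the code of CityColl.exe: the function names, the sample documents, and the way the threshold is applied (to the total number of occurrences of a city in the set) are assumptions made for illustration only.

```python
import math
from collections import Counter

def build_matrix(docs, threshold=1):
    """Documents are the cases (rows), cities the variables (columns).

    docs: list of lists of city labels, one list per document.
    threshold: assumed here to be a minimum number of occurrences in the set.
    """
    counts = Counter(city for doc in docs for city in doc)
    cities = sorted(c for c, n in counts.items() if n >= threshold)
    matrix = [[doc.count(c) for c in cities] for doc in docs]
    return cities, matrix

def cooccurrence(matrix):
    """Symmetrical co-occurrence matrix with the main diagonal set to zero."""
    n = len(matrix[0]) if matrix else 0
    co = [[0] * n for _ in range(n)]
    for row in matrix:
        for i in range(n):
            for j in range(n):
                if i != j:
                    co[i][j] += row[i] * row[j]
    return co

def cosine(matrix):
    """Cosine between the columns (cities) of the document-by-city matrix."""
    n = len(matrix[0]) if matrix else 0
    cols = [[row[j] for row in matrix] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    cos = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                cos[i][j] = dot / (norms[i] * norms[j])
    return cos

# Hypothetical document set: each inner list holds the cities of one paper.
DOCS = [
    ["AMSTERDAM, NETHERLANDS", "BRIGHTON, UK"],
    ["AMSTERDAM, NETHERLANDS", "BRIGHTON, UK"],
    ["AMSTERDAM, NETHERLANDS", "SEOUL, SOUTH KOREA"],
    ["AMSTERDAM, NETHERLANDS"],
]

if __name__ == "__main__":
    cities, matrix = build_matrix(DOCS)
    print(cities)
    print(cooccurrence(matrix))
    print(cosine(matrix))
```

In this sketch, build_matrix corresponds to the role of matrix.dbf, cooccurrence to coocc.dat, and cosine to cosine.dat. Raising the threshold removes rarely occurring cities before the matrices are computed.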

Using coocc.dat, one can generate a Google Map with the network links, but this requires additional steps. For generating a map in Google Earth, the user is advised to use Chaomei Chen’s CiteSpace.

The following steps generate a networked map:

1. Submit cities.txt at http://www.gpsvisualizer.com/geocoder/ . Choose Google for the geocoding (because it is sometimes more precise than Yahoo!).

Copy and paste the output into an ASCII editor (e.g., Notepad) and save it as “geo.txt” in the folder where cities.txt was generated.

2. Run geo2kml.exe . You will be prompted for a file name: enter the name of the file saved in the previous step (for example, “geo.txt”). The program produces a file citycoll.kml which contains the necessary information for a map overlay in Google Maps. This file can be read and adapted in an ASCII editor.

3. Google Maps reads this file once it has been uploaded to the internet. Provide the URL of the uploaded file within Google Maps.

At http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fwww.leydesdorff.net%2Fgmaps%2Fcitycoll.kml&sll=37.0625,-95.677068&sspn=50.644639,79.013672&ie=UTF8&t=h&z=2 one finds, for example, my co-authorship network (1975-2009).
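As an illustration of what a KML overlay of this kind contains, the following minimal Python sketch writes placemarks for cities and line segments for links. It does not reproduce geo2kml.exe or the layout of geo.txt: the function name make_kml, the in-memory inputs, and the approximate coordinates are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def make_kml(coords, links):
    """Return a KML document as a string.

    coords: {city label: (latitude, longitude)}   (hypothetical input)
    links:  list of (city_a, city_b) pairs        (hypothetical input)
    """
    root = ET.Element("kml", {"xmlns": "http://www.opengis.net/kml/2.2"})
    doc = ET.SubElement(root, "Document")

    # One point placemark per city; KML expects longitude before latitude.
    for city, (lat, lon) in sorted(coords.items()):
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = city
        point = ET.SubElement(pm, "Point")
        ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"

    # One line placemark per link between two geocoded cities.
    for a, b in links:
        if a not in coords or b not in coords:
            continue  # skip links to cities that could not be geocoded
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = f"{a} - {b}"
        line = ET.SubElement(pm, "LineString")
        (lat_a, lon_a), (lat_b, lon_b) = coords[a], coords[b]
        ET.SubElement(line, "coordinates").text = (
            f"{lon_a},{lat_a},0 {lon_b},{lat_b},0"
        )

    return ET.tostring(root, encoding="unicode")

# Approximate, illustrative coordinates (latitude, longitude).
COORDS = {
    "AMSTERDAM, NETHERLANDS": (52.37, 4.9),
    "BRIGHTON, UK": (50.82, -0.14),
    "SEOUL, SOUTH KOREA": (37.57, 126.98),
}
LINKS = [
    ("AMSTERDAM, NETHERLANDS", "BRIGHTON, UK"),
    ("AMSTERDAM, NETHERLANDS", "SEOUL, SOUTH KOREA"),
    ("AMSTERDAM, NETHERLANDS", "UNKNOWN CITY"),
]

if __name__ == "__main__":
    with open("example.kml", "w", encoding="utf-8") as handle:
        handle.write(make_kml(COORDS, LINKS))
```

The links could be taken from the non-zero cells of a co-occurrence matrix such as the one in coocc.dat. The sketch skips any link for which one of the cities has no coordinates.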

The program is based on DOS-legacy software. It runs in an MS-DOS Command Box under Windows. The programs and the input files have to be contained in the same folder. The output files are written into this same folder. Please note that existing files from a previous run are overwritten by the program. Users who wish to continue working with earlier output are advised to save it elsewhere first.

input files

The input file has to be saved as a so-called marked list in the tagged format from the Science Citation Index (Social Science Citation Index, Arts & Humanities Citation Index) at the Web of Science. Do not use the default filename “savedrecs.txt”; save the file as “data.txt” instead.
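For readers unfamiliar with the tagged format, the following minimal Python sketch shows how such a file can be split into relational tables linked by record number. It is not the code of CityColl.exe or ISI.EXE. It assumes the usual layout of the tagged export (a two-letter field tag at the start of a line, indented continuation lines, “ER” closing a record, and “EF” closing the file); the function name parse_tagged, the table layouts, and the sample records are hypothetical.

```python
def parse_tagged(text):
    """Split a tagged export into three tables linked by record number.

    Returns (core, au, cs): lists of dicts that loosely mirror the roles
    of core.dbf, au.dbf, and cs.dbf. Field names are illustrative only.
    """
    core, au, cs = [], [], []
    record, tag = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        head, value = line[:2], line[3:].strip()
        if head == "EF":            # end of file
            break
        if head == "ER":            # end of record: flush into the tables
            nr = len(core) + 1
            core.append({"nr": nr, "title": " ".join(record.get("TI", []))})
            au.extend({"nr": nr, "author": a} for a in record.get("AU", []))
            cs.extend({"nr": nr, "address": c} for c in record.get("C1", []))
            record, tag = {}, None
        elif head.strip():          # a new field tag starts
            tag = head
            record.setdefault(tag, []).append(value)
        elif tag is not None:       # continuation line of the current field
            record[tag].append(value)
    return core, au, cs

# Hypothetical sample in the tagged layout assumed above.
SAMPLE = """FN ISI Export Format
VR 1.0
PT J
AU Author, A
   Author, B
TI A hypothetical title that
   continues on a second line
C1 Univ A, Dept X, Amsterdam, Netherlands.
   Univ B, Dept Y, Brighton, England.
ER

PT J
AU Author, C
TI Second hypothetical record
C1 Univ C, Seoul, South Korea.
ER

EF
"""

if __name__ == "__main__":
    core, au, cs = parse_tagged(SAMPLE)
    for row in cs:
        print(row["nr"], row["address"])
```

Each author and each address carries the number of the record it came from, which is what allows the tables to be joined again in a relational database.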

output files

The program produces four output files in dBase IV format. These files can be read into Excel and/or SPSS for further processing. They can also be used in MS Access for relational database management. The same four files can also be produced using the simpler ISI.EXE (which is much less computationally intensive).

Click here to download ISI.EXE

Like CoAuth, BibCoupl, BibJourn, and IntColl, the program additionally produces two files with the extension “.dat” (cosine.dat and coocc.dat). These files are in DL-format (ASCII) and can be read directly into Pajek for the visualization (Pajek is freely available at http://vlado.fmf.uni-lj.si/pub/networks/pajek/ ). The names in these files can be edited using an ASCII editor (e.g., Notepad). A number of additional databases are coproduced:

a. matrix.dbf contains the matrix of the documents as the cases and the journal names in the references in the set as the variables. This file can be imported into SPSS for further analysis.

b. coocc.dbf contains a co-occurrence matrix of the journal names from this same data. This matrix is symmetrical and it contains the journal names both as variables and as labels in the first field. The main diagonal is set to zero. The number of co-occurrences is equal to the product of the occurrences in each of the texts. (The procedure is similar to using the file matrix.dbf as input to the routine “affiliations” in UCINet, except that the main diagonal is set to zero here.) The file coocc.dat contains this information in the DL-format.

c. cosine.dbf contains a normalized co-occurrence matrix of the journal names from the same data. Normalization is based on the cosine between the variables conceptualized as vectors (Salton & McGill, 1983). (The procedure is similar to using the file matrix.dbf as input to the corresponding routine in SPSS.) The file cosine.dat contains this information in the Pajek-format. The size of the nodes is equal to the logarithm of the occurrences of the respective author; this feature can be turned on in Pajek.
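To give an impression of what a matrix in DL-format looks like in an ASCII editor, the following minimal Python sketch writes a labeled square matrix as text. It assumes the full-matrix variant of the DL convention known from UCINet (a header line, a “labels:” section, and a “data:” section); the exact layout of the files written by the program may differ, and the function name to_dl is hypothetical.

```python
def to_dl(labels, matrix):
    """Format a square matrix as a full-matrix DL text (assumed layout).

    Labels are quoted because city labels may contain spaces and commas.
    """
    if len(matrix) != len(labels) or any(len(r) != len(labels) for r in matrix):
        raise ValueError("matrix must be square and match the labels")
    lines = [f"dl n={len(labels)} format=fullmatrix", "labels:"]
    lines.extend(f'"{label}"' for label in labels)
    lines.append("data:")
    lines.extend(" ".join(str(v) for v in row) for row in matrix)
    return "\n".join(lines) + "\n"

# Hypothetical labels and co-occurrence values.
LABELS = ["AMSTERDAM, NETHERLANDS", "BRIGHTON, UK", "SEOUL, SOUTH KOREA"]
COOCC = [
    [0, 2, 1],
    [2, 0, 0],
    [1, 0, 0],
]

if __name__ == "__main__":
    print(to_dl(LABELS, COOCC), end="")
```

Because the labels are stored as plain text above the data, they can be corrected in an ASCII editor without touching the matrix values.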

Click here to download Coauth.EXE

Click here to download IntColl.Exe
Click here to download InstColl.Exe
Click here to download BibCoupl.EXE

Click here for similar programs for Full Text and Co-Word Analysis
