
Using Patent Classifications for the Mapping: Portfolio and Statistical Analysis, and the Comparison of Strengths and Weaknesses

Loet Leydesdorff,*[a] Dieter Franz Kogler,[b] & Bowen Yan[c]


Figure 9b: Comparison between 276 patents granted to Novartis and 350 patents granted to Merck Sharp & Dohme in 2016.

This map can be web-started at  http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/cpc_cos/portfolio/fig9b.txt&label_size_variation=0.25&scale=1.0 or http://j.tinyurl.com/hwz6275

 1.      Preparing input files

a.      Download the following files from http://www.leydesdorff.net/cpc_cos/portfolio into a single folder on your hard disk:


          cpc.exe (the main routine, which is run in step b);

          cpc.dbf (with basic information about the classes);

          uspto1.exe (needed for the downloading of USPTO patents);

          cos_cpc.dbf (needed for the computation of distances on the map);

b.      Run cpc.exe.  


2.  Options within cpc.exe

a.      The program asks for a short name (≤ 10 characters) in each run. This name will be used as the variable label in later parts of the routine;

b.      The first option is to download the patents from USPTO at http://patft.uspto.gov/netahtml/PTO/search-adv.htm; detailed instructions for the downloading can be found at http://www.leydesdorff.net/ipcmaps;

c.      USPTO returns a maximum of 1,000 records at a time, but one can continue with follow-up batches; after the download is completed, save the files in another folder or as a zip file;
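
The batching in step c can be planned in advance. The following is a minimal sketch in Python, assuming that batches are addressed by record position; the function name batch_ranges and the example total of 2,350 hits are hypothetical and are not part of cpc.exe or uspto1.exe.

```python
def batch_ranges(total_hits, batch_size=1000):
    """Split a result set into consecutive (first, last) record ranges.

    batch_size=1000 reflects the per-download maximum mentioned above;
    the position-based numbering itself is an assumption.
    """
    return [
        (start, min(start + batch_size - 1, total_hits))
        for start in range(1, total_hits + 1, batch_size)
    ]


# Hypothetical search with 2,350 hits: three follow-up batches are needed.
ranges = batch_ranges(2350)
for first, last in ranges:
    print(f"batch covering records {first}-{last}")
```
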


3. The incremental construction of the files matrix.dbf and rao.dbf

a.      After each run, a column variable is added to the (local) file matrix.dbf containing the distribution of the 654 CPC classes in the document set under study. If the file matrix.dbf is absent, it is generated de novo and the current run is considered as generating the first variable; matrix.dbf can be read by Excel, SPSS, etc., for further (statistical) analysis;

b.      Similarly, a row is added after each run to the file rao.dbf, containing diversity measures (explained in the article) as variables. This file is also generated de novo if previously absent. Distances are based on [1 − cos(x,y)] for each two distributions x and y of aggregated citations at the level of CPC-4 classes;

c.      The routine cpc2cos.exe reads the file matrix.dbf and produces cosine.net and coocc.dat as (normalized) co-occurrence matrices that can be used in network analysis and visualization programs such as Pajek or UCINET.  
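
The distance measure in step b can be illustrated with a small worked example. This is a sketch in Python, not the code of cpc.exe: it assumes that Rao-Stirling diversity (the sum of p_i * p_j * d_ij over all pairs i ≠ j) is among the measures written to rao.dbf, and it uses three toy classes with invented citation profiles instead of the 654 CPC classes.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen here for empty profiles
    return dot / (norm_x * norm_y)


def rao_stirling(counts, distance):
    """Sum of p_i * p_j * d_ij over all ordered pairs with i != j."""
    total = sum(counts)
    if total == 0:
        return 0.0
    p = [c / total for c in counts]
    n = len(p)
    return sum(
        p[i] * p[j] * distance[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Toy citation profiles for three classes (invented numbers).
profiles = [[1, 0], [1, 1], [0, 1]]

# Distance between classes i and j: 1 - cos(profile_i, profile_j).
distance = [[1 - cosine(pi, pj) for pj in profiles] for pi in profiles]

# Toy portfolio: two patents in class 0, none in class 1, two in class 2.
diversity = rao_stirling([2, 0, 2], distance)
print(round(diversity, 4))
```

In this toy case the two occupied classes have orthogonal profiles, so their distance is 1 and the portfolio is as diverse as two equally weighted classes can be.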


4. Output files in each run

a.   The file vos.txt can be read by VOSviewer for mapping the portfolio under study at the four-digit level of CPC; the distances and colors (corresponding to clusters) in the maps are based on the base-map provided in Figure 2 of the paper;

b.     The files cpc.vec and cpc.cls can be used as vector and cluster files in the Pajek file provided at http://www.leydesdorff.net/cpc_cos/ . This allows for layouts other than those of VOSviewer and for more detailed network analysis and statistics. The file cpc.cls is a so-called cluster file, which can be used in Pajek, among other things, for the extraction of partitions.

c.     The various fields in the USPTO records are organized in a series of databases that can be related (e.g., in MS Access) using the field nr.
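
Relating the databases through the field nr (step c) amounts to a join on that key. The sketch below uses Python with an in-memory SQLite database as a stand-in for MS Access; only the key field nr comes from the description above, while the table names (patents, classes), the other column names, and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two hypothetical tables that share the key field nr.
con.executescript(
    """
    CREATE TABLE patents (nr INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE classes (nr INTEGER, cpc4 TEXT);
    """
)
con.executemany(
    "INSERT INTO patents VALUES (?, ?)",
    [(1, "Patent A"), (2, "Patent B")],
)
con.executemany(
    "INSERT INTO classes VALUES (?, ?)",
    [(1, "A61K"), (1, "C07D"), (2, "A61K")],
)

# Relate the tables via nr and count the patents per four-digit class.
rows = con.execute(
    """
    SELECT c.cpc4, COUNT(*)
    FROM patents AS p
    JOIN classes AS c ON c.nr = p.nr
    GROUP BY c.cpc4
    ORDER BY c.cpc4
    """
).fetchall()
print(rows)
con.close()
```
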


5. Visual comparison among portfolios (using cpc2.exe)

One can compare two portfolios (as in Figure 9 above) using cpc2.exe (available at http://www.leydesdorff.net/cpc_cos/portfolio/cpc2.exe ).

a.      One first runs cpc.exe for the first set (e.g., city1 or industry1);

b.     Replace the downloaded patents (p1.htm, p2.htm, etc.) with the set for the second unit (e.g., city2 or industry2) and run cpc2.exe;

c.      The file vos2.txt generated is an input file to VOSviewer. The red-colored nodes indicate the CPC-4 classes in which the first unit is stronger than the second; the green-colored nodes indicate relative strength of the second set;

d.     The files cpc2.vec and cpc2.cls provide the corresponding input files for Pajek.
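
The colouring logic of step c can be sketched as follows in Python. This is an illustration under assumptions, not the code of cpc2.exe: it assumes that "stronger" means a larger share of the unit's own portfolio (the normalization actually used by cpc2.exe is not specified here), and the function name compare_portfolios and the class counts are invented.

```python
def compare_portfolios(first, second):
    """Compare two portfolios given as {CPC-4 class: patent count}.

    Returns {class: (share difference, colour)}; red marks classes in
    which the first unit has the larger share, green the second unit.
    """
    total_first = sum(first.values())
    total_second = sum(second.values())
    result = {}
    for cpc4 in sorted(set(first) | set(second)):
        share_first = first.get(cpc4, 0) / total_first if total_first else 0.0
        share_second = second.get(cpc4, 0) / total_second if total_second else 0.0
        diff = share_first - share_second
        if diff > 0:
            colour = "red"
        elif diff < 0:
            colour = "green"
        else:
            colour = "none"  # equal shares: neither unit is stronger
        result[cpc4] = (diff, colour)
    return result


# Invented counts for two units.
unit1 = {"A61K": 6, "C07D": 4}
unit2 = {"A61K": 3, "C07D": 3, "C12N": 4}

comparison = compare_portfolios(unit1, unit2)
for cpc4, (diff, colour) in comparison.items():
    print(f"{cpc4}\t{diff:+.2f}\t{colour}")
```

Comparing shares rather than raw counts keeps the colours meaningful when the two sets differ in size, as with the 276 and 350 patents in Figure 9b.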

[a] * corresponding author; University of Amsterdam, Amsterdam School of Communication Research (ASCoR), PO Box 15793, 1001 NG Amsterdam, The Netherlands; email: loet@leydesdorff.net

[b] School of Architecture, Planning & Environmental Policy and School of Geography, University College Dublin, Belfield, Dublin 4, Ireland; email: dieter.kogler@ucd.ie

[c] SUTD-MIT International Design Centre, Singapore University of Technology and Design, Singapore 487372; email: bowen_yan@sutd.edu.sg