Animations of the intellectual and geographical dynamics of PatStat patents in terms of addresses and IPC classes
The routines that are discussed here, are derived from a similar set for USPTO patents at http://www.leydesdorff.net/ipcmaps/dynamic, but modified for patents in PatStat (2013). Within the environment of the latter database, one can use the following routine (SQL) to export the data in the format that can be used as input:
-- This script uses the table tls206x_person table generated by the script
-- provided by GianLuca Tarasconi, see http://rawpatentdata.blogspot.com
-- for more info
-- File 3 This gives all patents with CPC_CLASS_SYMBOL
SELECT t1.appln_id, CONCAT(t1.appln_auth, t1.appln_nr), t1.appln_filing_date FROM tls201_appln AS t1
INNER JOIN tls224_appln_cpc AS t2
ON t1.appln_id = t2.appln_id
AND cpc_class_symbol LIKE "%Y02E%10/541%"
INTO outfile 'applyear.csv';
-- File 1 with inventors incuded in tls206x_person
-- this script assumes an extra column with full country name associated with ctry_code
SELECT distinct t1.appln_id, CONCAT(t1.appln_auth, t1.appln_nr), city, ctry_code, country_name, invt_seq_nr
(tls201_appln AS t1 INNER JOIN tls224_appln_cpc AS t3
ON t3.cpc_class_symbol LIKE "%Y02E%10/541%"
AND t1.appln_id = t3.appln_id )
LEFT JOIN (tls207_pers_appln AS t4 INNER JOIN tls206x_person AS t2
ON t4.person_id = t2.person_id AND applt_seq_nr = 0 )
ON t1.appln_id = t4.appln_id
ORDER BY t1.appln_id, invt_seq_nr
INTO OUTFILE 'address.csv';
-- File 2
SELECT t1.appln_id, t1.appln_auth, t1.appln_nr, t2.ipc_class_symbol
FROM ( tls201_appln AS t1
INNER JOIN tls224_appln_cpc AS t3 on t1.appln_id = t3.appln_id
AND t3.cpc_class_symbol LIKE "%Y02E%10/541%")
LEFT JOIN tls209_appln_ipc AS t2
ON t1.appln_id = t2.appln_id
ORDER BY t1.appln_id ASC
INTO OUTFILE 'applnipc.csv';
Floortje Alkemade, September 12, 2013.
This routine provides the unique patent numbers in all three files (for relational database management), and respectively the attribution of inventor addresses (File 1: to be named “address.csv”) and IPC classes (File 2: to be named “applnipc.csv”) to these patents (File 3: to be named “applyear.csv”). A modified version with addresses of applicants (instead of inventors) can be found from <here>. The files have to be stored with their original names in a folder with the various files and routines that are discussed below.
At the date of our research we thus retrieved 3,428 priority patents attributed with Cooperative Patent Class Y02E10/541, covering CuInSe2 as material in photovoltaic cells. We used this set for further developing the methodology.
The three files in .csv-format can be read by patstat1.exe (to be downloaded into the same folder). This program generates input files for follow-up routines:
1. In order to obtain
output for IPC-mapping (see http://www.leydesdorff.net/ipcmaps
for USPTO data) ipc.dbf containing the baseline information is
required. (Right-click for saving if necessary.) Without this
data, the routine will ask for it! I advise to download also cos_ipc3.dbf
When these files are present, Rao-Stirling diversity among the IPC classes will
be computed (for three and four digits, respectively, and using [1 – cosine]
as the measure for technological distance); results are stored into the file
The program generates two main output files: vos3.txt and vos4.txt which can be read into VOSViewer for the mapping at the respective level (that is, 3 or 4 digits of the IPC-8). (The map is based on aggregated citations among IPC classes in USPTO data 1975-2012; see http://www.leydesdorff.net/ipcmaps for more details.)
One can continue with ps_ipcyr.exe for the dynamic animation. The program uses the application dates of the patents. (The geo-information is not needed for this analysis.) The user is prompted for a window in terms of numbers of years upon the consecutive years of application. The default choice of 1 leads to separate files for each year, but with low numbers of patents (and IPC categories) one may wish to extend this moving aggregate. Three years, for example, will add to “1984” also the data for “1985” and “1986”, etc.
The routine generates two sets of files that can be input into VOSViewer for drawing maps: (1) VOS3year.txt and (2) VOS4year.txt – where “year” is replaced with “1984,” “1985”, etc., for each (first) year present in the overall set (as a filing date). An example of animating the set of 3,428 patents is provided at http://www.leydesdorff.net/photovoltaic/cuinse2.patstat/cuinse2.patstat.pptx.
The routine ps_ipcyr.exe also writes a file rao.dbf containing the Rao-Stirling diversity for each year: Diversity Δ = Σi≠j pi pj Dij (Rao, 1992; Stirling, 2007). The technological distance D is calculated as (1 – cosine) in the vector space if the files cos_ipc3.dbf and cos_ipc4.dbf are available in the same folder. (All routines assume that all files and programs operate in the same folder!) For the above set of 3,428 patents (CuInSe2; Y02E10/541), the result is then as follows:
2. Patstat1.exe generates in the initial round aksi a file “cit_inv.txt” containing all addresses used in the set. These addresses can be geo-coded for the mapping. One can geocode this set—for example, at http://www.gpsvisualizer.com/geocoder—by cutting and pasting the content of this file into the upper window. (One can obtain an API key for Bing Maps as instructed.) After geocoding, the content of the lower window is copied into a file geo.txt that is saved in the same folder as the other files. (One may have to cut and paste the results into a single file in the case of more than 1,000 addresses.)
These files are read and recombined by patstgeo.exe into a file patents.txt that can be input into http://www.gpsvisualizer.com/map_input?form=data This generates in this case the global map at http://www.leydesdorff.net/photovoltaic/cuinse2.patstat/patstat.htm. The generation of the network of co-inventors (or co-applicants) may be time-consuming in the case of many co-invented patents.
One can thereafter use ps_geoyr.exe for generating the dynamic animation from files for the various years. This routine extracts subsets from the larger set of geo-codes (etc.) using specific application years. These files are written as “pat1976.txt”, “pat1977.txt”, etc. from the first occurring year until the last year. Inventor addresses or assignee addresses are used depending on which ones were used at the aggregated level. Networks of co-invention or co-assignment are correspondingly added. In the case of large files, the generation of network data can be time-consuming. The network files are written as affiliations matrices in the format of Pajek with the names “paj1976.txt”, “paj1977.txt”, etc. into the same folder.
The files can be used as inputs at http://www.gpsvisualizer.com/map_input?form=data, precisely as described for the aggregated set (see also at http://www.leydesdorff.net/software/patentmaps). An example can be found at http://www.leydesdorff.net/photovoltaic/cuinse2.patstat/index.htm for the set of patents downloaded for “Y02E10/541”. Y02E10/541 indicates among the “technologies of applications for mitigation and adaptation against climate change (Y02),” the subclass of patents classified as “CuInSe2 material PV cells” in the new (2013) Cooperative Patent Classification scheme between USPTO and EPO.
The html files for the different years need some editing:
· in the header one may wish to change the title;
· a Google API-key has to be inserted at line 13; (the API-key is freely available from Google;)
· one may wish to harmonize options in lines 51-53 across the files such as centering, initial zooming levels, and types of maps;
· (I also changed the legend in the infobox at lines 32-36).
The html formats (e.g., frame) can be copied from my webpage if so wished.
3. The two files can also be combined as at http://www.leydesdorff.net/photovoltaic/cuinse2/dualmix.html , but this project is still under development.
There is an add-in for PowerPoint at http://skp.mvps.org/liveweb.htm (freeware) that allows embedding Google Maps interactively into presentations. Unfortunately, this add-in does not allow for export as a movie.
September 16, 2013.