Hands-on class, power, influence, factors, 25 April 2007

Hands-on class on Science & Technology Indicators: Centrality and Latency.

25 April 2007

In this class we turn to elements from social network analysis and multivariate analysis that can enrich and validate some of the results generated more pragmatically in the previous classes. Hopefully, you have read the chapter on centrality measures by Hanneman & Riddle (2005) at http://www.faculty.ucr.edu/~hanneman/nettext/C10_Centrality.html.

Let’s first use Pajek to explore the three centrality measures that are most common in the literature: degree centrality, closeness centrality, and betweenness centrality. Like most measures common in social network analysis, these are available in Pajek under “Net”. We will again use the citation matrix for the journal Research Policy, available at http://www.leydesdorff.net/jcr05; load this file into Pajek. Now run: Net > Partition > Degree > All. (Since the file contains cosine values and is one-mode, it makes no sense to distinguish between indegree and outdegree in this case.) If you draw the partition, you obtain a color scheme different from the one we obtained last week. Can you understand the colors in terms of the number of links going into or out of each node? Which journal has the highest degree?

Scientometrics, for example, has three links in this picture. If we edit the partition (File > Partition > Edit), however, its partition number is six. This is because Pajek counts both indegrees and outdegrees, so that each link in this one-mode file is counted twice. You can also click on the partition information itself; this opens a new window containing the partition information. This information can be saved and then read into Excel or SPSS for further processing. Let’s save it as part1.rep. (The extension .rep is provided by Pajek.)
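The doubling described above can be reproduced in a few lines of Python. This is only a minimal sketch on an invented five-node network (the node names and links are assumptions, not taken from the Research Policy file), and it is not Pajek's own routine: it simply counts non-zero cells per row (outdegree) and per column (indegree) and adds them up.

```python
# Toy symmetric (one-mode) network as an adjacency matrix.
# Node names and links are invented for illustration only.
names = ["A", "B", "C", "D", "E"]
adj = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

def degrees(matrix):
    """Return (indegree, outdegree, all-degree) lists, counting non-zero cells."""
    n = len(matrix)
    outdeg = [sum(1 for j in range(n) if matrix[i][j]) for i in range(n)]
    indeg = [sum(1 for i in range(n) if matrix[i][j]) for j in range(n)]
    alldeg = [indeg[k] + outdeg[k] for k in range(n)]
    return indeg, outdeg, alldeg

indeg, outdeg, alldeg = degrees(adj)
for name, i, o, a in zip(names, indeg, outdeg, alldeg):
    print(f"{name}: in={i} out={o} all={a}")
```

Node A has three neighbours, but its "all" degree is six, because in a symmetric matrix every link appears once as an outgoing and once as an incoming tie.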

In addition to drawing the partition, one can now draw the vector of the partition information: Draw > Draw vector. In this case, the nodes have different sizes. Because the vector is normalized, the nodes are not very large, but you can enlarge them. The picture shows immediately how central the nodes are. Under Options > Mark Vertices with > Vector Values, you can also display these values. They differ from the ones we saw above as partition information because the vector values are normalized between 0 and 1. Research Policy, for example, scores 0.59 on this vector. (“Vector” is another word for a variable.) You can also obtain this vector information by returning to the main menu and clicking on the bar labeled “Vectors”, which shows “Normalized All Degree Partition.” Save this file as part2.rep.

Open Excel and read part2.rep. Follow Excel’s suggestions for opening the file. You find the values in the B-column and the names in the D-column. In many cases, we transform the values in the B-column into percentages by entering “= B1 * 100” into the E-column and then dragging this formula down the column. You can similarly open the file in SPSS and compute a new variable (v2 * 100) under Transform > Compute.

The advantage of Excel is the ease of sorting and visualizing the data. Select the whole spreadsheet and go to Data > Sort. Which field should be sorted in which order (ascending or descending)? Once you have sorted the field, select columns D and E, and generate a picture by clicking on the chart icon or Insert > Chart. Select the Column option and generate the picture. Embellish it with axis labels, a legend, and so on. What does it mean?
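For those who prefer a script to a spreadsheet, the same steps (multiply by 100, sort, draw columns) can be sketched in Python. The pairs below stand in for the B- and D-columns of part2.rep; only the 0.59 for Research Policy comes from the text above, while the other two journals and their values are invented placeholders.

```python
# (value, label) pairs standing in for the B- and D-columns of part2.rep.
# Only ResPolicy = 0.59 is from the text; JournalX and JournalY are hypothetical.
rows = [
    (0.27, "JournalX"),
    (0.59, "ResPolicy"),
    (0.45, "JournalY"),
]

# Equivalent of "= B1 * 100" dragged down the E-column.
percentages = [(label, round(value * 100, 1)) for value, label in rows]

# Equivalent of Data > Sort on the percentage column, largest first.
ranked = sorted(percentages, key=lambda pair: pair[1], reverse=True)

# A crude text version of a column chart: one '#' per five percentage points.
for label, pct in ranked:
    print(f"{label:<12}{pct:>6.1f}  " + "#" * int(pct // 5))
```

Sorting in descending order is a choice made here so that the most central journal comes first; the chart reads equally well the other way round.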

Let us return to Pajek and run Net > Vector > (1) Closeness and (2) Betweenness. Inspect the visualizations and generate similar files with the extension .rep for Closeness and Betweenness. Bring the three files into one Excel sheet and make the three measures visible using the Column option of the charts. Handing in this picture is part of the mid-term exam.
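To see what closeness and betweenness measure, it may help to compute them by hand on a very small network. The sketch below uses an invented five-node tree and the textbook definitions: closeness as (n - 1) divided by the sum of shortest-path distances, and betweenness as the share of shortest paths between other pairs that pass through a node (computed with Brandes' accumulation scheme). It assumes an undirected, connected network; Pajek's normalization may differ in detail, so treat the numbers as illustrative.

```python
from collections import deque

# Invented undirected toy network (a small tree).
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def bfs(graph, source):
    """Breadth-first search: distances, shortest-path counts, predecessors, visit order."""
    dist = {source: 0}
    sigma = {node: 0 for node in graph}
    sigma[source] = 1
    preds = {node: [] for node in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order

def closeness(graph):
    """(n - 1) / sum of distances to all other nodes; assumes a connected network."""
    result = {}
    for node in graph:
        dist, _, _, _ = bfs(graph, node)
        result[node] = (len(graph) - 1) / sum(dist.values())
    return result

def betweenness(graph):
    """Normalized betweenness for an undirected network (Brandes' accumulation)."""
    n = len(graph)
    score = {node: 0.0 for node in graph}
    for s in graph:
        _, sigma, preds, order = bfs(graph, s)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    pairs = (n - 1) * (n - 2) / 2  # number of pairs of other nodes
    # Each undirected pair was counted from both endpoints, hence the division by 2.
    return {node: value / 2 / pairs for node, value in score.items()}

cl = closeness(graph)
bt = betweenness(graph)
for node in sorted(graph):
    print(f"{node}: closeness={cl[node]:.3f} betweenness={bt[node]:.3f}")
```

In this toy network node B is the most central on both measures, while D scores well on betweenness only because it is the sole bridge to E.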

At http://www.fhk.eur.nl/personal/denooy/ one can find the Pajek data for the Top-200 in the Dutch elite 2006. Right-click on the file, save it, and open it in Pajek. The file contains 200 elite members in the rows and 395 governing bodies in the columns. By using Net > Transform > 2-Mode to 1-Mode > Rows, you can generate the network among the 200 elite members and using the various measures for centrality you can easily determine the most powerful man or woman of the Netherlands in terms of degree centrality.
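What the transformation from 2-mode to 1-mode does can be sketched as follows. The example assumes a tiny invented affiliation matrix (four persons, three bodies; all names are hypothetical) and multiplies it with its own transpose, so that each cell counts the bodies two persons share. This is the general idea only, not necessarily how Pajek implements it.

```python
# Hypothetical affiliation data: rows are persons, columns are governing bodies.
persons = ["P1", "P2", "P3", "P4"]
bodies = ["BoardX", "BoardY", "BoardZ"]
membership = [
    [1, 1, 0],  # P1 sits on BoardX and BoardY
    [1, 0, 0],  # P2 sits on BoardX
    [0, 1, 1],  # P3 sits on BoardY and BoardZ
    [0, 0, 1],  # P4 sits on BoardZ
]

def project_rows(matrix):
    """Row projection: cell (i, j) is the number of columns shared by rows i and j."""
    n = len(matrix)
    return [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]

co_membership = project_rows(membership)

# Degree centrality in the one-mode network: number of other persons
# with whom someone shares at least one body (the diagonal is ignored).
degree = {
    persons[i]: sum(
        1 for j in range(len(persons)) if j != i and co_membership[i][j] > 0
    )
    for i in range(len(persons))
}
print(degree)
```

The diagonal of the projected matrix holds each person's own number of memberships, which is why it is left out when counting links.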

Factor analysis in SPSS

The following file in the DL-format shows the 23 × 23 citation matrix of the citation environment of Research Policy. The cosine-normalized file which we used above can be generated from this matrix (e.g., by using SPSS). Take a look at the file. The column with the high values represents the citations to Research Policy itself. Note the large number of zeros in such a matrix. Journals cite one another in clusters, and between the clusters there are mainly empty cells.

NR=23, NC=23

FORMAT = FULLMATRIX DIAGONAL PRESENT

ROW LABELS:

IntJTechnolManage

JProdInnovatManag

Scientometrics

SocStudSci

AnnRegionalSci

CalifManageRev

CambridgeJEcon

EntrepRegionDev

EnvironPlannC

EurPlanStud

HighEduc

IndCorpChange

IntJIndOrgan

JBusVenturing

R&dManage

RegStud

ResEvaluat

ResPolicy

ServIndJ

StrategicManageJ

TechnolAnalStrateg

TechnolForecastSoc

ZSoziol

COLUMN LABELS:

IntJTechnolManage

JProdInnovatManag

Scientometrics

SocStudSci

AnnRegionalSci

CalifManageRev

CambridgeJEcon

EntrepRegionDev

EnvironPlannC

EurPlanStud

HighEduc

IndCorpChange

IntJIndOrgan

JBusVenturing

R&dManage

RegStud

ResEvaluat

ResPolicy

ServIndJ

StrategicManageJ

TechnolAnalStrateg

TechnolForecastSoc

ZSoziol

DATA:

47 12 0 0 0 22 3 4 0 0 0 9 2 19 32 8 0 85 2 102 4 12 0

4 173 0 0 0 9 0 0 0 0 0 0 2 4 11 0 0 31 0 70 0 0 0

5 0 520 25 0 0 0 0 0 0 6 2 2 0 5 4 33 74 0 0 2 4 0

0 0 21 100 0 3 0 0 0 0 0 0 0 0 0 0 0 30 0 0 0 0 0

0 0 8 0 33 2 2 2 2 0 0 0 4 0 0 29 0 23 0 2 0 0 0

2 2 0 0 0 39 0 0 0 0 0 6 0 4 0 0 0 21 0 41 0 0 0

2 0 0 0 0 2 86 0 0 0 0 21 4 0 0 16 0 24 0 7 0 0 0

3 0 0 0 0 3 11 82 2 21 0 4 0 17 7 40 0 24 0 9 0 0 0

4 0 0 0 2 0 10 12 51 10 0 6 0 0 2 50 0 22 0 11 0 2 0

3 0 2 0 7 0 16 3 5 75 0 15 0 3 4 71 0 61 0 4 5 5 0

2 0 2 5 0 0 0 0 0 0 91 0 0 0 0 0 0 23 0 0 0 0 0

7 0 2 0 0 6 10 0 0 0 0 88 16 0 2 3 0 93 0 119 3 0 0

0 0 0 0 0 0 0 0 0 0 0 2 62 0 0 0 0 33 0 5 0 0 0

5 5 0 0 0 14 0 3 0 0 0 3 5 243 10 2 0 33 0 83 2 0 0

15 26 0 0 0 16 0 2 0 0 2 9 0 22 48 8 3 70 0 62 7 4 0

6 0 0 0 19 5 23 13 9 24 0 22 2 2 0 231 0 45 2 9 0 3 0

7 0 77 2 0 0 0 0 0 0 3 0 0 0 0 0 25 26 0 0 6 0 0

18 12 35 7 0 10 15 0 5 21 4 50 27 45 35 44 3 432 3 129 12 8 0

0 15 0 0 0 4 0 8 0 2 0 2 2 6 6 31 0 27 50 29 2 0 0

0 6 0 0 0 30 2 0 0 0 0 18 12 33 2 4 0 33 0 659 0 0 0

5 3 0 4 0 7 3 0 0 0 0 7 0 4 22 4 0 68 0 31 27 2 0

2 16 7 2 0 4 6 0 0 0 0 0 0 3 5 9 0 55 0 16 6 169 0

0 0 2 3 0 0 0 0 0 0 0 3 2 0 2 13 0 24 0 0 0 0 21
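The cosine normalization mentioned above can be illustrated on a small invented matrix. The sketch assumes that the cosine is taken between columns, that is, between "cited" patterns; whether the file used earlier was based on columns or on rows is not stated here, so this choice is an assumption.

```python
import math

# Invented 3 x 3 citation matrix: rows are citing, columns are cited.
matrix = [
    [10, 0, 2],
    [0, 5, 0],
    [20, 0, 4],
]

def cosine_between_columns(m):
    """Return the matrix of cosines between all pairs of columns of m."""
    columns = list(zip(*m))
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    size = len(columns)
    return [
        [
            sum(a * b for a, b in zip(columns[i], columns[j])) / (norms[i] * norms[j])
            for j in range(size)
        ]
        for i in range(size)
    ]

cos = cosine_between_columns(matrix)
for row in cos:
    print(" ".join(f"{value:.2f}" for value in row))
```

Columns 1 and 3 have a cosine of one because they are cited in the same proportions by the same journals, even though their absolute numbers differ by a factor of five; columns that share no citing journal have a cosine of zero.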

Let’s load the file into Pajek. It is a 2-Mode file because it is asymmetrical. The columns are “cited” and the rows “citing”. Generate the visualizations for the cited patterns and for the citing patterns (using Transform > 2-Mode to 1-Mode). How are the two visualizations different? Why? What does it tell you about Research Policy?
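The difference between the citing and the cited side can also be inspected outside Pajek. The following sketch reads a file laid out like the one above (header, row labels, column labels, data) and compares row totals with column totals. The three-journal text is an invented stand-in, and the parser only handles this simple full-matrix layout; it is not a general DL reader.

```python
import re

# Invented miniature file in the same layout as the 23 x 23 matrix above.
dl_text = """NR=3, NC=3
FORMAT = FULLMATRIX DIAGONAL PRESENT
ROW LABELS:
J1
J2
J3
COLUMN LABELS:
J1
J2
J3
DATA:
5 2 0
1 7 3
0 4 9
"""

def parse_dl(text):
    """Parse a full-matrix DL text; blank lines between entries are ignored."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    nr = int(re.search(r"NR\s*=\s*(\d+)", text).group(1))
    nc = int(re.search(r"NC\s*=\s*(\d+)", text).group(1))
    r = lines.index("ROW LABELS:")
    c = lines.index("COLUMN LABELS:")
    d = lines.index("DATA:")
    row_labels = lines[r + 1 : r + 1 + nr]
    col_labels = lines[c + 1 : c + 1 + nc]
    data = [[int(x) for x in line.split()] for line in lines[d + 1 : d + 1 + nr]]
    return row_labels, col_labels, data

row_labels, col_labels, data = parse_dl(dl_text)

# Rows are "citing": a row total is the number of references given.
citing_totals = [sum(row) for row in data]
# Columns are "cited": a column total is the number of citations received.
cited_totals = [sum(col) for col in zip(*data)]
# Transposing swaps the two perspectives.
transposed = [list(col) for col in zip(*data)]

for label, given, received in zip(row_labels, citing_totals, cited_totals):
    print(f"{label}: citing={given} cited={received}")
```

Because the matrix is asymmetrical, a journal's row and column totals usually differ, which is one reason why the two visualizations do not look the same.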

Export the file to SPSS using Tools > SPSS > Current Network. Double click on PajekSPSS at the place indicated in the report window or open this file in SPSS as a syntax file and run the file: Run > All. Now you should have the matrix in SPSS and we can perform factor analysis. Factor analysis informs you about the latent dimensions (eigenvectors) of the matrix. Can we describe this aggregated journal-journal citation matrix in terms of a number of specialties? Thus, it is a form of “data reduction”: instead of 23 journals, we are searching for a more limited number of dimensions.

Click Analyze > Data Reduction > Factor. Bring the 23 journals (not the first ID) to the right side for inclusion in the analysis. Click on Extraction and set Scree plot on. Click on Rotation and choose Varimax and Loading plots. Under Options choose Sorted by size and Suppress values less than 0.1. This latter option makes the factor matrix easier to read. Varimax searches for orthogonal dimensions; the other options may come in handy later. Run the analysis by clicking OK.

SPSS is sometimes called a generator of bulk output. The important results are those in the Rotated Component Matrix. The values in this matrix are called factor loadings. They are the correlation coefficients with the latent dimension indicated by the factor. The factor designation has to be done by the analyst.

Rotated Component Matrix(a)

	Component
	1	2	3	4	5	6	7	8	9	10
R&dManage	.908		-.107	.142					-.107
IntJTechnolManage	.753			.187			.128	-.172	-.168
TechnolAnalStrateg	.673			-.137		-.152	-.340		.170
ResPolicy	.669				.551				.110
Scientometrics		.937
ResEvaluat		.931		-.103
SocStudSci	-.139	.302	-.160	-.139	-.107	-.141			.281
AnnRegionalSci	-.182		.853		-.119	-.205
RegStud			.812			.292
EurPlanStud	.140		.534	-.168	.148	.391			.156
CalifManageRev	.186	-.157	-.117	.826		-.102			-.105
StrategicManageJ				.758	.248	-.126
JBusVenturing	.161	-.116	-.115	.480	-.130				.458
IndCorpChange	.195			.117	.790			-.127
IntJIndOrgan					.653	-.269	.138	.191	.134
CambridgeJEcon	-.174			-.154	.460	.438	-.216	-.285	-.180	.106
EntrepRegionDev					-.174	.762		.166	.142
EnvironPlannC			.288	-.125		.396		-.178
TechnolForecastSoc				-.101	-.128	-.118	-.819
ServIndJ								.864	-.110
JProdInnovatManag	.101		-.137		-.167	-.105		.128	-.760
ZSoziol		-.131	-.115	-.218		-.108	.164	-.137		-.854
HighEduc		-.178	-.215	-.400	-.170	-.244	.437	-.239	.134	.534

Extraction Method: Principal Component Analysis.

Rotation Method: Varimax with Kaiser Normalization.

a Rotation converged in 10 iterations.

The rotated component matrix tells us that SPSS (using default values) has extracted 10 independent dimensions from these data. Research Policy loads on the first dimension with 0.669, but it is not the most outspoken representative of this dimension. That is R&D Management. Is this a meaningful grouping? How would you designate it, and how would you designate the third factor? Note that a journal like Industrial and Corporate Change has a low factor loading on the first dimension, but loads highest on the fifth component. Technological Forecasting and Social Change has a deviant pattern of being cited by the other journals.
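How factor loadings come about can be sketched for the first, unrotated principal component. The example assumes an invented correlation matrix of three variables and finds the dominant eigenvector by simple power iteration; the loadings are this eigenvector multiplied by the square root of its eigenvalue. SPSS additionally rotates the solution (Varimax), which this sketch does not attempt.

```python
import math

# Invented correlation matrix: variables 1 and 2 correlate strongly,
# variable 3 is unrelated to both. Not taken from the SPSS output above.
corr = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

def first_component(matrix, iterations=200):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    # Rayleigh quotient v' M v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, vec

eigenvalue, vec = first_component(corr)
# Unrotated loadings: correlations of each variable with the component.
loadings = [math.sqrt(eigenvalue) * v for v in vec]

print(f"eigenvalue = {eigenvalue:.3f}")
print("loadings   =", [round(x, 3) for x in loadings])
```

The first two variables load highly on the component and the third does not load at all, which is the kind of pattern one reads off the rotated component matrix when grouping journals.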

By default, SPSS sets the number of dimensions equal to the number of eigenvectors with an eigenvalue higher than unity, because with an eigenvalue of one the explanatory power of a factor would be equal to that of an average variable. Thus, a factor with an eigenvalue lower than one would explain less than an average variable, and this would run counter to the objective of “data reduction”. If you go back to the output panel, you can inspect this under the tab “Total Variance Explained”. The scree plot shows how the eigenvalues decline from one factor to the next. Inspection of the scree plot teaches us that the scree begins after five eigenvectors. Let’s run the analysis again with five dimensions. You can choose the number of factors to be extracted under “Extraction” in the factor menu. The result should be as follows:

Rotated Component Matrix(a)

	Component
	1	2	3	4	5
R&dManage	.878	-.106		.214
TechnolAnalStrateg	.721	-.124		-.178
IntJTechnolManage	.707			.295
ResPolicy	.692				.529
RegStud		.833			.104
EurPlanStud	.152	.683			.149
AnnRegionalSci	-.214	.568
EnvironPlannC		.482
EntrepRegionDev		.473			-.257
Scientometrics			.931
ResEvaluat			.925
SocStudSci	-.134	-.196	.346	-.182
CalifManageRev	.130	-.234	-.196	.802
StrategicManageJ		-.222	-.123	.727	.249
JBusVenturing				.471
TechnolForecastSoc	.187	-.148	-.173	-.312	-.155
ZSoziol	-.177	-.187	-.150	-.283
HighEduc	-.124	-.196		-.253
IndCorpChange	.237	.101	-.107	.157	.764
IntJIndOrgan		-.232			.661
CambridgeJEcon		.331	-.144	-.153	.387
JProdInnovatManag	.130	-.201	-.198		-.324
ServIndJ			-.129		-.165

Extraction Method: Principal Component Analysis.

Rotation Method: Varimax with Kaiser Normalization.

a Rotation converged in 7 iterations.
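The default rule of retaining eigenvalues higher than unity can be made concrete with a small calculation. The eigenvalues below are invented for six standardized variables (they are not the values from “Total Variance Explained”); since each standardized variable contributes a variance of one, the eigenvalues sum to the number of variables and the average eigenvalue is one.

```python
# Hypothetical eigenvalues for six standardized variables (they sum to 6).
eigenvalues = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]
n_variables = len(eigenvalues)  # total variance of standardized variables

# Default rule: keep only components that explain more than an average variable.
retained = [ev for ev in eigenvalues if ev > 1.0]

cumulative = 0.0
for rank, ev in enumerate(eigenvalues, start=1):
    share = 100.0 * ev / n_variables
    cumulative += share
    flag = "keep" if ev > 1.0 else "drop"
    print(f"{rank}  {ev:4.1f}  {share:5.1f}%  {cumulative:5.1f}%  {flag}")

explained = 100.0 * sum(retained) / n_variables
print(f"{len(retained)} components retained, explaining {explained:.1f}% of the variance")
```

The scree plot is a picture of the same list of eigenvalues; choosing five factors instead of ten, as done above, means overruling this default on the basis of where the curve flattens.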

Let us go to the citing dimension by transposing the matrix. Data > Transpose. Use the variable “caselab” as the naming variable and use all the other variables for the transposition. Run the factor analysis again. How many dimensions would you choose in this case? Click on the component plot of factors 1, 2 and 3 and try to embellish it. Paste the resulting picture into your mid-term exam.

SPSS has a wealth of analyses under all the tabs. We just chose factor analysis because it is a traditional way to analyze data and you may find it most frequently in social science articles. Another possibility is multi-dimensional scaling which you find under Analyze > Scale > Multidimensional scaling (PROXSCAL). It provides you with maps. Try to make one. How are the maps different from the visualizations produced by Pajek?

Other resources

As is already obvious from using the elite data for the Netherlands above, many resources (both data and programs) are available on the Internet. We already noted the patent data in lesson zero. I wish to mention a few others. Richard Rogers in the humanities faculty of our university runs a web crawler. His platform is different from ours, but you may find these data interesting for Internet research. You find the information at http://www.govcom.org/scenarios_use.html . One can auto-request an account from the site.

More advanced network tools can be found at https://nwb.slis.indiana.edu/community/?n=Main.NWBTool . These programs are academic. Others may be commercial, but it is increasingly good practice to provide a trial period of 30 days. In the domain of social network analysis, UCINet is an important alternative to Pajek, but the results are sometimes a bit different. Another commercial alternative is NetMiner, which includes a few more algorithms at the moment. Alternatives to SPSS are SAS and Stata. A few licenses for Stata are available at ASCoR. For combining network analysis with qualitative analysis, Atlas.ti is increasingly mentioned as currently the best bet. I have never worked with it.

Loet Leydesdorff

24 April 2007


Questions for the mid-term exam (2007)