
“Betweenness Centrality” as an Indicator of the “Interdisciplinarity” of Scientific Journals

Journal of the American Society for Information Science and Technology (forthcoming)


 

Loet Leydesdorff

Amsterdam School of Communications Research (ASCoR), University of Amsterdam

Kloveniersburgwal 48, 1012 CX Amsterdam, The Netherlands

loet@leydesdorff.net; http://www.leydesdorff.net

 

Abstract

In addition to science citation indicators of journals like impact and immediacy, social network analysis provides a set of centrality measures like degree, betweenness, and closeness centrality. These measures are first analyzed for the entire set of 7,379 journals included in the Journal Citation Reports of the Science Citation Index and the Social Sciences Citation Index 2004, and then also in relation to local citation environments which can be considered as proxies of specialties and disciplines. Betweenness centrality is shown to be an indicator of the interdisciplinarity of journals, but only in local citation environments and after normalization because otherwise the influence of degree centrality (size) overshadows the betweenness-centrality measure. The indicator is applied to a variety of citation environments, including policy-relevant ones like biotechnology and nanotechnology. The values of the indicator remain sensitive to the delineations of the set because of the indicator’s local character. Maps showing interdisciplinarity of journals in terms of betweenness centrality can be drawn using information about journal citation environments which is available online.

 

Keywords: centrality, betweenness, interdisciplinarity, journal, citation, indicator

 

1. Introduction

 

Ever since Garfield (1972; Garfield & Sher, 1963) proposed impact factors as indicators for the quality of journals in evaluation practices, this measure has been heavily debated. Impact factors were designed with the purpose of making evaluation possible (e.g., Linton, 2006). Other indicators (e.g., Price’s [1970] immediacy index) were also incorporated into the Journal Citation Reports of the Science Citation Index, but were coupled less directly to library policies and science policy evaluations (Moed, 2005; Monastersky, 2005; Bensman, forthcoming).

 

Soon after the introduction of the Science Citation Index, it became clear that publication and citation practices are field-dependent (Price, 1970; Carpenter & Narin, 1973; Gilbert, 1977; Narin, 1976). Hirst (1978), therefore, suggested constructing discipline-specific impact factors, but their operationalization in terms of discipline-specific journal sets has remained a problem. Should such sets be defined with reference to the groups of researchers under evaluation (Moed et al., 1985) or rather in terms of the aggregated citation patterns among journals (Pinski & Narin, 1976; Garfield, 1998)? How can one disentangle the notion of hierarchy among journals and the juxtaposition of groups of journals in the various disciplines (Leydesdorff, 2006)?

 

Furthermore, as Price (1965) noted, different types of journal publications within similar fields can be expected to vary also in terms of their citation patterns. Within each field, some journals follow developments at the research front (e.g., in the form of letters), while other journals (e.g., review journals) have a longer-term scope. Thirdly, journals differ in terms of their “interdisciplinarity,” with Nature and Science as the prime examples (Narin et al., 1972), while others include sections of both general interest and disciplinary affiliations (e.g., PNAS and the Lancet). In addition to the “multidisciplinarity” or “interdisciplinarity” of journals at a general level, “interdisciplinarity” can also occur at the very specialized interface between established fields of science, as in the case of biotechnology and nanotechnology.

 

Three indicators of journals were codified in the ISI databases: impact factors, immediacy indices, and the so-called subject categories. These indicators are based on the Journal Citation Reports, which offer aggregated citation data among journals. However, the subject categorization of the ISI has remained the least objective among these indicators because the indicator is not citation-based. The ISI-staff assigns journals to subjects on the basis of a number of criteria, among which are the journal’s title, its citation patterns, etc. (McVeigh, personal communication, 9 March 2006).

 

An unambiguous categorization of the journal set in terms of subject matters seems impossible because of the fuzziness of the subsets (Bensman, 2001). In addition to intellectual categories, journals belong to nations, publishing houses, and often to more than a single discipline (Leydesdorff & Bensman, 2006). The potential “interdisciplinarity” of journals makes it difficult to compare journals as units of analysis within a specific reference group of “disciplinary” journals.

 

“Interdisciplinarity” is often a policy objective, while new developments may take place at the borders of disciplines (Caswill, 2006; Zitt, 2005). New developments may lead to new journal sets or be accommodated within existing ones (Leydesdorff et al., 1994). For example, recent developments in nanotechnology have evolved at interfaces among applied physics, chemistry, and the materials sciences. The delineation of a journal set in nanotechnology is therefore not a sinecure, while in the meantime a much more discrete set of journals in biotechnology has evolved. Existing classifications may have to be revised and innovated with hindsight (Leydesdorff, 2002). The U.S. Patent and Trademark Office, for example, has launched a project to reclassify its existing database using “nanotechnology” as a new category at the level of individual patents.

 

Reclassification at the level of individual articles would mean changing the (controlled) keywords with hindsight (Lewison & Cunningham, 1988). However, this is unnecessary since scientific articles are organized into journals by a strong selection process of submission and peer review. The recursive selection processes lead to very strong structures and correspondingly skewed distributions. Garfield (1972, p. 476) argued that a multidisciplinary core for all of science comprises no more than a thousand journals.

 

The citation structures among journals are updated each year because of changes in citation practices. However, in the case of “interdisciplinary” developments the classification may be more ambiguous because different traditions and standards are interfaced. Cross-links (e.g., citations) provide inroads for change in an otherwise (nearly) decomposable system (Simon, 1973).

 

The development of a measure of interdisciplinarity at the level of journals derived from this destabilizing effect on citation structures could be extremely useful as an early-warning indicator of new developments. In a previous attempt to develop such indicators, Leydesdorff et al. (1994) were able to show that new developments can be traced in terms of deviant being-cited patterns in various groups of neighboring journals. However, the opposite effect, namely that this deviant pattern also indicates new developments, could not be shown (Leydesdorff, 1994; Van den Besselaar & Heimeriks, 2001). Cross-links may have other functions as well. Like most research in the bibliometric field, these analyses of interdisciplinarity were based on the assumption that journals can be grouped either using the ISI subject categories (e.g., Leeuwen & Tijssen, 2000; Morillo et al., 2003) or on the basis of clustering citation matrices (Doreian & Fararo, 1985; Leydesdorff, 1986; Tijssen et al., 1987).

 

Before one can delineate groups of journals in “interdisciplinary” fields, one would need an indicator of “interdisciplinarity” at the level of individual journals. To what extent do articles in a specific journal feed into or draw upon different intellectual traditions? The focus on the position of individual agents in networks—in this case journals—has been developed in social network analysis more than in scientometrics (Otte & Rousseau, 2002).

 

2. Centrality Measures in Social Network Analysis

 

Social network analysis has developed as a specialty in parallel with scientometrics since the late 1970s. In a foundational paper, Freeman (1977) developed a set of measures of centrality based on betweenness. Freeman stated that “betweenness” as a structural property of communication was elaborated in the literature as the first measure of centrality (Bavelas, 1948; Shimbel, 1953). In a follow-up paper, Freeman (1978) elaborated four concepts of centrality in a social network, which have since been further developed (Hanneman & Riddle, 2005; De Nooy et al., 2005):

 

  1. centrality in terms of “degrees”: in- and outgoing information flows from each node as a center;
  2. centrality in terms of “closeness,” that is, the distance of an actor from all other actors in a network. This measure operationalizes the expected reach of a communication;
  3. centrality in terms of “betweenness,” that is, the extent that the actor is positioned on the shortest path (“geodesic”) between other pairs of actors in the network; and
  4. centrality in terms of the projection on the first “eigenvector” of the matrix.

 

These measures and their further elaboration into relevant statistics were conveniently combined in the software package UCINet that Freeman and his collaborators have developed since the 1980s (Bonacich, 1987; Borgatti et al., 2002; Otte & Rousseau, 2002). A number of visualization programs for networks like Pajek and Mage interface with UCINet. The visualization and the statistics have become increasingly integrated.

 

Centrality in terms of degree is easiest to grasp because it is the number of relations a given node maintains. Degree can further be differentiated in terms of “indegree” and “outdegree,” that is, incoming or outgoing relations. In the case of a citation matrix, the total number of references provided by a textual unit of analysis (e.g., an article or a journal) can then be considered as its outdegree, and instances of its being cited as the indegree. Degree centrality is often normalized as a percentage of the degrees in a network.
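The degree measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the four journals and their citation counts are invented, a relation is counted once per journal pair whenever at least one citation exists, within-journal citations on the diagonal are ignored, and normalization divides by the maximum possible number of partners (n − 1). None of these choices is claimed to reproduce the Pajek or UCINet routines.

```python
# Toy citation matrix (invented numbers): rows are citing journals,
# columns are cited journals, so citations[i][j] is how often i cites j.
journals = ["A", "B", "C", "D"]
citations = [
    [10, 3, 0, 1],
    [2, 5, 4, 0],
    [1, 1, 8, 0],
    [6, 0, 0, 2],
]


def degrees(matrix):
    """Return (indegree, outdegree) per journal, counting relations, not citations.

    Assumption: the diagonal (within-journal citation) is left out.
    """
    n = len(matrix)
    outdeg = [sum(1 for j in range(n) if j != i and matrix[i][j] > 0) for i in range(n)]
    indeg = [sum(1 for i in range(n) if i != j and matrix[i][j] > 0) for j in range(n)]
    return indeg, outdeg


def normalized(deg):
    """Express each degree as a share of the n - 1 possible partners."""
    n = len(deg)
    return [d / (n - 1) for d in deg]


indeg, outdeg = degrees(citations)
for name, i, o in zip(journals, indeg, outdeg):
    print(f"{name}: indegree={i} outdegree={o}")
print("normalized indegree:", normalized(indeg))
```

In this toy set, journal A is cited by all three other journals, so its normalized indegree reaches the maximum of 1.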

 

“Betweenness” is a measure of how often a node (vertex) is located on the shortest path (geodesic) between other nodes in the network. It thus measures the degree to which the node under study can function as a point of control in the communication. If a node with a high level of betweenness were to be deleted from a network, the network would fall apart into otherwise coherent clusters. Unlike degree, which is a count, betweenness is normalized by definition as the proportion of all geodesics that include the vertex under study. If $g_{ij}$ is defined as the number of geodesic paths between $i$ and $j$, and $g_{ikj}$ is the number of these geodesics that pass through $k$, $k$’s betweenness centrality is defined as (Farrall, 2005):

$$ b_k = \sum_{i} \sum_{j} \frac{g_{ikj}}{g_{ij}}, \qquad i \neq j \neq k $$
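The definition of betweenness in terms of geodesic counts can be illustrated with a small brute-force sketch. It assumes an undirected, unweighted graph given as an adjacency dictionary, counts each unordered pair (i, j) once, and divides by the number of such pairs, (n − 1)(n − 2)/2, so that the values fall between 0 and 1. The toy graph (two triangles joined through a bridging vertex "x") and the function names are hypothetical; this is not the algorithm used in Pajek.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return shortest-path distances and geodesic counts from source."""
    dist = {source: 0}
    sigma = {source: 1}  # sigma[v] = number of geodesics from source to v
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness(adj):
    """Sum g_ikj / g_ij over unordered pairs (i, j), scaled by the number of pairs."""
    nodes = list(adj)
    n = len(nodes)
    info = {v: bfs_counts(adj, v) for v in nodes}
    pairs = (n - 1) * (n - 2) / 2
    result = {}
    for k in nodes:
        dist_k, sigma_k = info[k]
        total = 0.0
        for a in range(n):
            for b in range(a + 1, n):
                i, j = nodes[a], nodes[b]
                if k in (i, j):
                    continue
                dist_i, sigma_i = info[i]
                if j not in dist_i or k not in dist_i or j not in dist_k:
                    continue  # i and j (or k) are not connected
                if dist_i[k] + dist_k[j] == dist_i[j]:
                    # geodesics through k = (paths i..k) * (paths k..j)
                    total += sigma_i[k] * sigma_k[j] / sigma_i[j]
        result[k] = total / pairs if pairs else 0.0
    return result


# Two dense groups (triangles) that are related only through vertex "x".
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(adj)
for node in adj:
    print(node, round(scores[node], 3))
```

The bridging vertex has only two relations, yet every geodesic between the two triangles passes through it, which is the situation the measure is meant to capture.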

 

“Closeness centrality” is also defined as a proportion. First, the distances from a vertex to all other vertices in the network are summed. Normalization is achieved by defining closeness centrality as the number of other vertices divided by this sum (De Nooy et al., 2005, p. 127). Because of this normalization, closeness centrality provides a global measure about the position of a vertex in the network, while betweenness centrality is defined with reference to the local position of a vertex.
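As a minimal sketch of this definition, the function below sums the shortest-path distances from one vertex and divides the number of other vertices by that sum. It assumes an undirected, unweighted graph and, as a simplifying choice of this sketch, returns `None` when some vertex cannot be reached, since the sum of distances is then undefined. The path graph is an invented example.

```python
from collections import deque


def closeness(adj, v):
    """Number of other vertices divided by the summed distances from v.

    Returns None if the graph is not connected (an assumption of this sketch).
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) < len(adj):
        return None  # at least one vertex is unreachable
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else None


# A simple path a - b - c - d - e: the middle vertex is closest to all others.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print("c:", closeness(path, "c"))
print("a:", closeness(path, "a"))

# Adding an isolate makes the measure undefined for every vertex.
with_isolate = dict(path, z=[])
print("c with isolate:", closeness(with_isolate, "c"))
```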

 

Eigenvector analysis brings us back to approaches that are familiar from multivariate analysis. Principal component and factor analysis decompose a matrix in terms of the latent eigenvectors which determine the positions of nodes in a network, while graph analysis begins with the vectors of observable relations among nodes (Burt, 1982). How can these be grouped bottom-up using algorithms? For example, core-periphery relations can be made visible using graph-analytical techniques, but not by using factor-analytical ones (Wagner & Leydesdorff, 2005).

 

Betweenness is a relational measure. One can expect that a journal which is “between” will load on different factors because it does not belong to one of the dense groups, but relates them. The factor loadings of such journals may depend heavily on the factor-analytic model (e.g., the number of factors to be extracted by the analyst). For example, one might expect inter-factorial complexity among the factor loadings in the case of inter- or multidisciplinary journals (Van den Besselaar & Heimeriks, 2001; Leydesdorff, 2004). Closeness is less dependent on relations between individual vertices because a vertex can be close to two (or more) densely connected clusters. Closeness can thus be expected to provide us with a measure of “multidisciplinarity” within a set while betweenness may provide us with a measure of specific “interdisciplinarity” at interfaces.

 

3. Size, impact, and centrality

 

While the impact factor and the immediacy index are corrected for size (because the number of publications in the previous two years and the current year, respectively, is used in the denominator; cf. Bensman, forthcoming), centrality measures are sensitive to size. A further complication, therefore, is the possibility of spurious correlations between different centrality measures. Large journals (e.g., Nature) which one would expect to be “multidisciplinary” rather than “interdisciplinary,” might generate a high betweenness centrality because of their high degree centrality.

 

Normalization of the matrix for the size of patterns of citation can suppress this effect (Bonacich, personal communication, 22 May 2006). Fortunately, there is increasing consensus that normalization in terms of the cosine and using the vector-space model provides the best option in the case of sparse citation matrices (Ahlgren et al., 2003; Chen, 2006; Salton & McGill, 1983). Using the cosine for the visualization, a threshold has to be set because the cosine between citation patterns of locally related journals will almost never be equal to zero. However, the algorithms for computing centrality first dichotomize this matrix.



Figure 1: Betweenness centrality of 54 journals in the vector space of the citation impact environment of Social Networks (cosine ≥ 0.2).

 

While working with visualizations of cosine-based journal maps (Leydesdorff, forthcoming-a, forthcoming-b), I noticed that the interdisciplinarity of journals corresponds with their visible position in the vector space. Figure 1, for example, shows the citation impact environment of Social Networks. Among the 54 journals citing Social Networks more than once in 2004,[1] this journal is on the shortest path between vertices in 15% of the possible cases, followed by the Journal of Mathematical Sociology with a betweenness centrality of 11%. The other journals have considerably lower values. The visual pattern of connecting different subgroups also follows the intuitive expectation of “interdisciplinarity” among these journals.


Figure 2: Betweenness centrality of Social Networks in its citation environment before normalization with the cosine.

 

Figure 2 contrasts this finding with the betweenness centrality in the unnormalized networks. Social Networks is still the journal with the largest betweenness value (0.07), but the Journal of Mathematical Sociology now has a score of 0.01. This is even lower than the corresponding value for the American Sociological Review (0.03). The latter is a much larger journal with a distinct disciplinary affiliation (that is, sociology). In sum, the visualization using unnormalized citation data can be expected to show neither the cluster structure in the data nor betweenness centrality among groups of nodes. One needs a normalization in terms of similarity patterns (using a similarity coefficient like the Pearson correlation or the cosine) to observe the latent structures in this data.

 

This paper addresses the phenomenon of betweenness centrality in the vector space systematically. I will first study the different centrality measures in the non-normalized matrix, then in the cosine-normalized one, and finally in a few applications, including some with obvious policy relevance (nanotechnology and biotechnology).

 

4. Methods and Materials

 

The data was harvested from CD-ROM versions of the Journal Citation Reports of the Science Citation Index and the Social Sciences Citation Index 2004. These two databases cover 5,968 and 1,712 journals, respectively. Since 301 journals are covered by both databases, a citation matrix can be constructed among (5,968 + 1,712 – 301) = 7,379 journals. Seven journals are not processed by the ISI in the “citing” dimension, but we shall focus below on the “cited” dimension of this matrix. This focus enables us to compare the centrality measures directly with well-established science citation indicators like impact factors, immediacy, etc.

 

Among the 7,379 vectors of the matrix representing the cited “patterns,” similarities were calculated using the cosine. Salton’s cosine is defined as the cosine of the angle enclosed between two vectors x and y as follows (Salton & McGill, 1983):

 

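$$ \text{Cosine}(x,y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} $$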

 

The cosine is very similar to the Pearson correlation coefficient, except that the latter measure normalizes the values of the variables with reference to the arithmetic mean (Jones & Furnas, 1987). The cosine normalizes with reference to the geometrical mean. Unlike the Pearson correlation coefficient, the cosine is non-metric and does not presume normality of the distribution (Ahlgren et al., 2003). An additional advantage of this measure is its further elaboration into the so-called vector-space model for the visualization (Chen, 2006).
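The relation between the two coefficients can be made concrete with a short sketch. It implements Salton’s cosine as the normalized inner product of two vectors and obtains the Pearson correlation by applying the same function to mean-centered vectors. The example vectors are invented citation patterns.

```python
import math


def cosine(x, y):
    """Salton's cosine: inner product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Same citation pattern at very different sizes: the cosine ignores size.
small = [10, 0, 5, 0]
large = [100, 0, 50, 0]
print("cosine(small, large) =", cosine(small, large))

# Two patterns without any overlap, but with shared zeros.
p = [3, 0, 0, 0]
q = [0, 4, 0, 0]
print("cosine(p, q)  =", cosine(p, q))
print("pearson(p, q) =", pearson(p, q))
```

For the two non-overlapping patterns the cosine is zero, while the Pearson correlation is negative because centering on the mean turns the shared zeros into non-zero values.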

 

Note that the two matrices (that is, the matrix of citation data and the matrix of cosine values) are very different: the cosine matrix is a symmetrical matrix with unity on the main diagonal, while citation matrices are asymmetrical transaction matrices, usually with outliers (within-journal “self”-citations) on the main diagonal (Price, 1981). The topography of the vector space spanned by the cosine values is accordingly different from the topography of the multi-dimensional space spanned by the vectors of citation values.
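The difference between the two matrices can be sketched as follows. The code derives a cosine matrix from the columns (the “cited” patterns) of an invented asymmetrical citation matrix and then dichotomizes it at the threshold of 0.2 used in the figure captions. Keeping the within-journal citations in the vectors and dropping the diagonal from the binary network are assumptions of this sketch, not a description of the procedure used for the maps.

```python
import math

# Invented asymmetrical citation matrix: rows cite, columns are cited.
citations = [
    [20, 5, 0, 0],
    [4, 30, 0, 1],
    [0, 0, 15, 6],
    [0, 1, 7, 25],
]


def cosine_matrix_cited(matrix):
    """Cosine similarities among the being-cited patterns (columns)."""
    n = len(matrix)
    cols = [[matrix[i][j] for i in range(n)] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if norms[a] and norms[b]:
                dot = sum(p * q for p, q in zip(cols[a], cols[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim


def dichotomize(sim, threshold=0.2):
    """Binary adjacency matrix: 1 where the cosine reaches the threshold."""
    n = len(sim)
    return [[1 if i != j and sim[i][j] >= threshold else 0 for j in range(n)]
            for i in range(n)]


sim = cosine_matrix_cited(citations)
adjacency = dichotomize(sim)
for row in sim:
    print([round(v, 3) for v in row])
for row in adjacency:
    print(row)
```

The resulting cosine matrix is symmetrical with values of one on the diagonal, and the thresholded network separates the toy set into two pairs of journals.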

 

Subsets can be extracted from the database in order to measure the relations among journals that are citing a specific journal. I shall call these subsets the local citation impact environments of the journal under study. Betweenness centrality and other centrality measures will be different within these local citation environments from their values in the global set because any two journals within a local set can also be related through the mediation of journals outside the subset.
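Extracting such a local citation impact environment amounts to selecting rows and columns from the full matrix. The sketch below keeps the journals that cite a focal journal at least `min_citations` times (two, following the “more than once” criterion used for Figure 1) and returns the corresponding submatrix. The function name, the toy data, and the decision always to keep the focal journal itself are illustrative assumptions.

```python
def local_environment(matrix, names, focal, min_citations=2):
    """Return (member names, submatrix) for the journals citing the focal journal.

    matrix[i][j] is the number of times journal i cites journal j.
    Assumption: the focal journal is always kept in its own environment.
    """
    f = names.index(focal)
    members = [i for i in range(len(names))
               if i == f or matrix[i][f] >= min_citations]
    sub = [[matrix[i][j] for j in members] for i in members]
    return [names[i] for i in members], sub


# Invented example: journal C cites A only once and therefore drops out.
names = ["A", "B", "C", "D"]
citations = [
    [10, 3, 0, 1],
    [2, 5, 4, 0],
    [1, 1, 8, 0],
    [6, 0, 0, 2],
]
members, sub = local_environment(citations, names, "A")
print(members)
for row in sub:
    print(row)
```

Relations that run through an excluded journal (here C) are lost in the submatrix, which is why centrality values in a local environment differ from those in the global set.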

 

For the computation of centrality measures I use exclusively the methods available within the Pajek environment. This allows for a one-to-one correspondence between the visualizations and the algorithmic results. (The normalizations are sometimes slightly different between UCINet and Pajek.) Although UCINet is faster and richer in providing various computational options, Pajek is currently able to analyze centrality in asymmetrical matrices in both directions. Given our interest in asymmetrical citation matrices, this can be an advantage. The analysis focuses on degree centrality, betweenness centrality, and closeness centrality because eigenvector analysis is used in Pajek only as a means for the visualization. When displaying the citation impact environments (Leydesdorff, forthcoming-a and forthcoming-b), I shall use the vertical size for the relative citation contributions of journals in a specific environment, and the horizontal size for the same measure, but after correction for within-journal citations.

 

5. Centrality at the level of the Journal Citation Reports

 

5.1 The asymmetrical citation matrix

 

The asymmetrical citation matrix contains two structures, one in the “cited” and another in the “citing” dimension of the matrix. Pajek provides options to compute the three centrality measures (degree, betweenness, and closeness) in both directions. Thus, six indicators can be measured across the file. The values on these six indicators can be compared with more traditional science citation indicators like “impact,” “immediacy,” and “total citations.” (The values of the six [two times three] centrality measures for the 7,379 journals are available online at http://www.leydesdorff.net/jcr04/centrality/index.htm .)

 

Rotated Component Matrix(a)

| | Component 1 | Component 2 | Component 3 |
|---|---|---|---|
| Number of issues | .924 | | .185 |
| Total number of references (citing) | .909 | .210 | .237 |
| Within journal “self”-citations | .815 | .152 | |
| Betweenness (citing) | .740 | | .103 |
| Total number of citations (cited) | .672 | .639 | |
| Immediacy | | .806 | .267 |
| Impact | | .802 | .295 |
| Indegree (cited) | .405 | .713 | .381 |
| Betweenness (cited) | .261 | .691 | -.240 |
| Closeness (cited) | | | .776 |
| Closeness (citing) | .190 | .413 | .663 |
| Outdegree (citing) | .498 | .356 | .633 |

Extraction Method: Principal Component Analysis.  Rotation Method: Varimax with Kaiser Normalization.

a  Rotation converged in 5 iterations.

 

Table 1: Three-factor solution of the matrix of 7,379 journals versus six centrality measures and a number of science (citation) indicators.

 

Table 1 shows the rotated three-factor solution for the matrix of 7,379 journals versus the various science indicators and centrality measures as variables. Three factors explain 73.5% of the variance. Factor One (46.9%) can be designated as indicating the size of journals, Factor Two (16.4%) registers the effects of citations (“impact,” etc.), and Factor Three (10.3%) seems to indicate the reach of a communication through citation. The strong relation between immediacy and impact has previously been noted by Yue et al. (2004). The further elaboration of the relation between centrality measures and science citation indicators would lead me beyond the scope of this study.

 

In Table 1, the three indicators on which we will now focus our attention are shown in boldface. First, one can note the difference in sign for “betweenness centrality” and “closeness centrality” on the third factor, but as expected, this negative correlation is overshadowed by the commonality between “betweenness centrality” and “indegree” on the first two factors.

 

Correlations

| | | Indegree | Betweenness cited | Closeness cited |
|---|---|---|---|---|
| Indegree | Pearson Correlation | 1 | .509(**) | .651(**) |
| | Sig. (2-tailed) | | .000 | .000 |
| | N | 7379 | 7379 | 7379 |
| Betweenness cited | Pearson Correlation | .509(**) | 1 | .210(**) |
| | Sig. (2-tailed) | .000 | | .000 |
| | N | 7379 | 7379 | 7379 |
| Closeness cited | Pearson Correlation | .651(**) | .210(**) | 1 |
| | Sig. (2-tailed) | .000 | .000 | |
| | N | 7379 | 7379 | 7379 |

**  Correlation is significant at the 0.01 level (2-tailed).

 

Table 2: Correlations among the centrality measures in the cited dimension (N = 7,379).

 

Table 2 provides the correlation coefficients among the three centrality measures. Because of the large N (= 7,379) all correlations are significant. However, the correlation between closeness and betweenness is considerably lower (r = 0.21; p < 0.01) than the other correlations (r > 0.5; p < 0.01).

 

 

| Journal | Indegree | Journal | Betweenness | Journal | Closeness |
|---|---|---|---|---|---|
| Science | 4904 | Science | 0.098921 | Science | 0.538172 |
| Nature | 4555 | Nature | 0.067541 | Nature | 0.522138 |
| P Natl Acad Sci USA | 3776 | P Natl Acad Sci USA | 0.039714 | P Natl Acad Sci USA | 0.490666 |
| Lancet | 2834 | Lancet | 0.013324 | Lancet | 0.456274 |
| New Engl J Med | 2780 | JAMA-J Am Med Assoc | 0.011943 | New Engl J Med | 0.453366 |
| J Biol Chem | 2674 | New Engl J Med | 0.011665 | JAMA-J Am Med Assoc | 0.442401 |
| JAMA-J Am Med Assoc | 2510 | Brit Med J | 0.009516 | Ann NY Acad Sci | 0.441714 |
| Ann NY Acad Sci | 2375 | J Am Stat Assoc | 0.009486 | J Biol Chem | 0.440729 |
| Brit Med J | 2228 | Ann NY Acad Sci | 0.008139 | Brit Med J | 0.433717 |
| Biochem Bioph Res Co | 2075 | J Biol Chem | 0.007159 | Biochem Bioph Res Co | 0.420714 |

Table 3: Top-10 journals on three network indicators of centrality in the being-cited direction.

 

Table 3 shows the ten journals with highest values on these three indicators. The set for the “indegree” overlaps completely with “closeness,” and these two sets differ only by a single journal from the list for “betweenness”: the Journal of the American Statistical Association is included in the latter set, while Biochemical and Biophysical Research Communications is not included in this list. In other words, the three measures may indicate different dimensions, but they do not discriminate sufficiently among one another to provide us with a measure of “interdisciplinarity” or “multidisciplinarity” at the level of the file.
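The overlap among the three top-10 lists can be checked with a few set operations. The journal names below are copied from Table 3; the variable names are arbitrary and the snippet is only a convenience for verifying the comparison.

```python
# Top-10 journals per indicator, as listed in Table 3 (being-cited direction).
indegree = {
    "Science", "Nature", "P Natl Acad Sci USA", "Lancet", "New Engl J Med",
    "J Biol Chem", "JAMA-J Am Med Assoc", "Ann NY Acad Sci", "Brit Med J",
    "Biochem Bioph Res Co",
}
betweenness = {
    "Science", "Nature", "P Natl Acad Sci USA", "Lancet", "JAMA-J Am Med Assoc",
    "New Engl J Med", "Brit Med J", "J Am Stat Assoc", "Ann NY Acad Sci",
    "J Biol Chem",
}
closeness = {
    "Science", "Nature", "P Natl Acad Sci USA", "Lancet", "New Engl J Med",
    "JAMA-J Am Med Assoc", "Ann NY Acad Sci", "J Biol Chem", "Brit Med J",
    "Biochem Bioph Res Co",
}

print("indegree and closeness identical:", indegree == closeness)
print("only in betweenness:", betweenness - indegree)
print("missing from betweenness:", indegree - betweenness)
```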

 

5.2 The centrality measures in the vector space

 

Let us turn now to the vector space of these 7,379 vectors, while continuing to focus on the cited dimension. Closeness centrality cannot be computed in the vector space since the network is not fully connected. Betweenness centrality and degree correlate at r = 0.69 (p < 0.01). Table 4 provides the top ten journals on these two indicators.

 

 

| Journal | Degree | Journal | Betweenness |
|---|---|---|---|
| Science | 0.979534 | Science | 0.2860 |
| Nature | 0.958254 | Nature | 0.2106 |
| Sci Am | 0.950935 | Sci Am | 0.1946 |
| J Am Stat Assoc | 0.942667 | J Am Stat Assoc | 0.1785 |
| Ann NY Acad Sci | 0.935484 | Brit Med J | 0.1471 |
| P Natl Acad Sci USA | 0.928707 | Lancet | 0.1469 |
| Lancet | 0.925047 | Ann NY Acad Sci | 0.1409 |
| Biometrika | 0.921523 | Am Econ Rev | 0.1366 |
| New Engl J Med | 0.910952 | P Natl Acad Sci USA | 0.1363 |
| JAMA-J Am Med Assoc | 0.898075 | Biometrika | 0.1350 |

 

Table 4: Top-10 journals in the vector space (being-cited direction).

 

Eight of the ten journals occur on both lists, and the order of the top four is the same. There are important differences from the top-10 lists provided in Table 3. However, it is no longer clear what we are measuring. Both measures correlate, for example, at the level of r = 0.47 (p < 0.01) with the impact factor, but in themselves they do not have a clear interpretation other than the fact that Science and Nature have the highest centrality at the global level, no matter how one measures the indicator.

 

6. The local citation impact environments

 

6.1.      Social Networks as an example

 

Let us return to our example of the journal Social Networks for a more precise understanding of what centrality measures may mean in local citation environments. Social Networks is included in the Social Sciences Citation Index, but it relates also to journals which are included in the Science Citation Index. In the combined set, Social Networks is cited by 54 journals (as against 40 in the Social Sciences Citation Index). Figure 3 provides the visualization of these journals with the cosine as the similarity measure. The vertical and horizontal axes of the vertices are proportional to the citation impact in this environment with and without within-journal citations, respectively.

 

Eleven journals are grouped in the bottom right corner because they are isolates in this context. Social Networks, and to a lesser extent the Journal of Mathematical Sociology, are central in relating major clusters such as two groups of social-science journals (sociology and management science), a physics group, and a group of computer-science and statistics journals. However, the contribution of the two centrally positioned journals to the citation impact in this network is extremely small: only 0.41% for Social Networks and 1.07% for the Journal of Mathematical Sociology.

 


Figure 3: Citation impacts of fifty-four journals which cited Social Networks more than once in 2004 (N = 7,379; cosine ≥ 0.2).

 

Visual inspection of Figure 3 suggests that these two journals (Social Networks and the Journal of Mathematical Sociology) are central in relating the various clusters. Using betweenness as a measure, Pajek enables us to draw the vectors for the various measures of centrality and to display the vertices in terms of the values of these vectors. In Figure 1 above, “betweenness centrality” was thus used as the indicator in this same environment.

 

                                                 Correlations

 

 

Degree

Between-ness

Closeness

Local impact

Degree

Pearson Correlation

1

.724(**)

.877(**)

-.009

 

Sig. (2-tailed)

 

.000

.000

.949

 

N

54

54

54

54

Betweenness

Pearson Correlation

.724(**)

1

.542(**)

-.035

 

Sig. (2-tailed)

.000

 

.000

.801

 

N

54

54

54

54

Closeness

Pearson Correlation

.877(**)

.542(**)

1

-.001

 

Sig. (2-tailed)

.000

.000

 

.991

 

N

54

54

54

54

Local impact

Pearson Correlation

-.009