In Figs. 1–4 below we use the 1999–2004 data to show how the papers comprising the clusters are distributed across disciplines in our dataset (as a percentage of the total). The stability of most disciplines over time is notable, with the exception of a slight apparent upward trend in the biological sciences, such as molecular biology, biology, and microbiology (Table 2).

One type of cluster found with this methodology might be considered an artifact of the publishing process. It results from a set of papers that artificially cite each other: for example, a ‘‘single issue cluster’’ (Rousseau and Small 2006; Small 2005), formed when an editor creates a special issue of a journal and arranges for each article to cite some or all of the other articles in the same issue, creating a citation clique. Normally, the cited and citing document populations are somewhat distinct, but in this case potentially every citing item is also a cited item. Because clusters are defined over a multi-year period, e.g., 1999–2004, it is possible for a highly cited paper to also legitimately function as a citing paper for the group. This would happen, for example, if a paper citing one of the founding papers of the front itself became highly cited before the end of the period. The degree to which citing papers are also highly cited papers would then, conceptually, measure the extent to which the papers build on one another. To capture this we create a metric called endogeneity, discussed in detail later, which is the percentage of citing papers that are also cited papers. The average endogeneity for the file is quite low: 2.3%. Only about 1.5% of clusters have an endogeneity of 20% or higher. Most of these clusters are almost certainly artifacts of an editorial policy and do not reflect the emergence of a new research area. Therefore, for our quantitative analysis, we have excluded any cluster with an endogeneity of 20% or higher.
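As a concrete illustration, the endogeneity metric defined above can be sketched as a small function (the function name and the paper identifiers are ours, for illustration only):

```python
def endogeneity(cited_papers, citing_papers):
    """Endogeneity of a cluster: the percentage of its citing
    papers that also appear among its highly cited papers."""
    cited = set(cited_papers)
    citing = set(citing_papers)
    if not citing:
        return 0.0
    return 100.0 * len(citing & cited) / len(citing)

# Hypothetical cluster: 2 of the 5 citing papers are themselves cited.
cited = {"p1", "p2", "p3", "p4"}
citing = {"p3", "p4", "p5", "p6", "p7"}
print(endogeneity(cited, citing))  # 40.0
```

A cluster scoring 20% or higher on this measure would be excluded from the quantitative analysis as a likely editorial artifact.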
Naturally occurring levels of moderate endogeneity below 20% may indeed be a healthy sign for a research area, indicating that highly cited papers are present in the current citing paper population and that the front is building efficiently on its own findings (Pfeffer and Salancik 1978). To sum up, research fronts consist of clusters of highly cited papers, with some upper bound on cluster size. The papers are linked by strong co-citation relationships at or above a defined normalized clustering threshold unique to each cluster. For each front there is a corresponding set of citing papers. The cited and citing paper sets can overlap, and the same authors can appear in both sets.
Growing, shrinking, stable, emerging, and exiting fronts
To explore trends in nascent research we construct several variables to assist in our analysis. Following previous work on emerging fronts (Small 2003), we measure the growth rates of fronts from the 1998–2003 dataset to the 1999–2004 dataset and categorize them as growing, stable, or shrinking. Growing fronts are those that have more papers in our 1999–2004 period than the sum of all of their contributing fronts in the 1998–2003 analysis (a ‘‘contributing front’’ is one with at least one paper appearing in the later front). Similarly, shrinking fronts are those that are smaller than the sum of all their contributing fronts in the previous time period, and stable fronts are those for which the sum of all contributing fronts yields the same number of papers. Emerging fronts are fronts in the 1999–2004 dataset that contain no papers from the 1998–2003 dataset. Exiting fronts are fronts that existed in the 1998–2003 analysis but have no papers in any front in the 1999–2004 analysis. Some basic statistics about fronts are given in Table 3.
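The categorization rule can be summarized in a short sketch (a minimal rendering of the definitions above; the function name and sizes are illustrative):

```python
def classify_front(later_size, contributing_sizes):
    """Classify a 1999-2004 front against the combined size of its
    contributing fronts from the 1998-2003 analysis.  A contributing
    front shares at least one paper with the later front; a front
    with no contributing fronts is emerging."""
    if not contributing_sizes:
        return "emerging"
    total = sum(contributing_sizes)
    if later_size > total:
        return "growing"
    if later_size < total:
        return "shrinking"
    return "stable"

# A 12-paper front fed by earlier fronts of 5 and 4 papers is growing.
print(classify_front(12, [5, 4]))  # growing
print(classify_front(6, []))       # emerging
```

Exiting fronts are identified in the opposite direction: a 1998–2003 front none of whose papers appear in any 1999–2004 front.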
As noted above, the extent to which the cited paper set overlaps with the citing paper set for a front may be an important aspect of its potential growth, provided that the overlap is not the result of an artifact such as a single journal issue. A front with a high cited–citing overlap is said to have high endogeneity, reflecting the compression of the cited and citing generations. Scientists in such a front may have a better chance of building on each other’s work quickly and creating a ‘‘cohesive paradigm’’. As seen in the last row of Table 3, the level of endogeneity is generally higher among emerging fronts than among existing fronts. Additionally, growing fronts clearly display higher average levels of endogeneity.
We construct a variable for cluster multidisciplinarity by computing a Herfindahl index over the distribution of disciplines of the papers comprising the front. We do this by summing the squared share of the front’s papers in each discipline. This measures the extent to which a front is composed of one main discipline or split among many disciplines. The closer a front’s concentration score is to 1.0, the closer it is to being composed of one discipline only, and the closer it is to zero, the more it is fragmented among many disciplines.
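The index computation can be sketched as follows (discipline names and counts are illustrative, not from the paper):

```python
def herfindahl(discipline_counts):
    """Herfindahl concentration index for a front: the sum of
    squared discipline shares.  1.0 means a single discipline;
    values approaching zero indicate fragmentation across many."""
    total = sum(discipline_counts.values())
    return sum((n / total) ** 2 for n in discipline_counts.values())

print(herfindahl({"molecular biology": 10}))       # 1.0
print(herfindahl({"physics": 5, "chemistry": 5}))  # 0.5
```

Note that with N disciplines the minimum possible value is 1/N (an even split), so the score approaches zero only as the number of disciplines grows large.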
For all authors we coded whether their affiliation was ‘‘academic’’ or ‘‘non-academic,’’ i.e., academic versus government or industry. We found that academic institutions almost always had ‘‘univ,’’ ‘‘school,’’ ‘‘coll,’’ ‘‘insti,’’ or ‘‘ecol’’ in their names. Some academic institutions were exceptions, so we added a number of institution-specific filters such as ‘‘Berkeley,’’ ‘‘MIT,’’ ‘‘Harvard,’’ ‘‘polytechnic,’’ ‘‘politecnico,’’ and ‘‘polytechnique,’’ among others. We did not differentiate between government and industry in this variable. We found that the percentages of academic (64%) and non-academic (36%) affiliations matched those in our four case studies, which we coded manually.
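A minimal sketch of this keyword-based coding follows. The generic substrings and the listed exceptions come from the description above; the function name and the case-insensitive substring matching are our assumptions about the implementation, and a production version would need word-boundary handling to avoid false matches:

```python
# Generic substrings that mark academic affiliations.
ACADEMIC_SUBSTRINGS = ["univ", "school", "coll", "insti", "ecol"]
# Institution-specific filters for academic names lacking the
# generic keywords (the paper lists these "among others").
ACADEMIC_EXCEPTIONS = ["berkeley", "mit", "harvard",
                       "polytechnic", "politecnico", "polytechnique"]

def is_academic(affiliation):
    """Return True if an affiliation string looks academic,
    using case-insensitive substring matching."""
    name = affiliation.lower()
    return (any(s in name for s in ACADEMIC_SUBSTRINGS)
            or any(s in name for s in ACADEMIC_EXCEPTIONS))

print(is_academic("Univ Calif Berkeley"))  # True
print(is_academic("IBM Res Ctr"))          # False
```

Government and industry affiliations are simply those that match none of the filters, since the two are not distinguished in this variable.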