SYNTHESIZING MULTISOURCE FIELD PLOT DATA FOR STUDIES OF VEGETATION ALLIANCES
Sources and Characteristics of Data Used
The 39,131 vegetation field plots used for this study were collected from (1) the Interior Columbia Basin Ecosystem Management Project, (2) the Department of Energy's Hanford Site, (3) the Department of the Army's Yakima Training Center, and (4) the Department of the Army's Orchard Training Area. Field plots from the Interior Columbia Basin Ecosystem Management Project had been previously collated from other sources but had not been standardized. The original sources for and numbers of all field plots used in this work are shown in Table 1, and their locations are shown in Figure 2.
These data sets are heterogeneous in the types of information they contain, their units of measurement, and the precision of the measurements. For example, some data sets provide information on recent disturbances, such as ungulate browsing; others carry information on soil conditions. Some data sets report measurements in the English system, others in metric. Spatial precision of plot locations was reported in increments ranging from within a latitude and longitude minute to within 10 m. Importantly, all the data sets initially collected contain information on plot location (coordinates and projection), year of collection, plot size, plant species composition, elevation, ground slope gradient, and ground slope aspect.
Climate data are from the Daymet model series developed by P.E. Thornton (web resource: www.daymet.org). These data have a 1 km ground resolution, cover the coterminous U.S. and extend over an 18-year period from 1980 to 1997. They are based on daily readings from about 500 weather stations. The types of climate variables available from the Daymet series are shown in Table 2.
The Daymet method interpolates weather station data to a tessellated grid using a truncated Gaussian weighting filter spatial transformation (Thornton et al. 1997). Thornton et al. (1997) report error rates of the data ranging from 0.7 to 1.2 °C for average annual maximum and minimum temperatures, and up to 19% for average annual precipitation. Daily precipitation event frequencies were shown to be accurate for dry days (< 2 cm precipitation) as well as the wettest days (> 10 cm precipitation). Frequencies of precipitation events ranging from 2-8 cm/day were underpredicted, and frequencies of events ranging from 8-10 cm/day were overpredicted, though these differences were small (Thornton et al. 1997).
Net primary productivity
Annual net primary productivity (ANPP) in g C/m2 for each of the field plot sites was estimated from a Biome-BGC model (Numerical Terradynamic Simulation Group web resource: http://www.forestry.umt.edu/ntsg/; Running and Hunt 1993) of the study area for the year 1988 (Intermountain Fire Science Lab 1996). The model predicts continuous values for ANPP at a 2 km2 ground resolution. An assessment of the accuracy of the predicted ANPP values was not provided; however, the Biome-BGC model performed at a 20% overall confidence interval when tested by White et al. (2000). Running (1994) reports an r2 of 0.82 for correlations between field measurements and modeled values of ANPP from study sites that are within the INW.
Morphological traits of plant species
The database of plant species morphological traits was developed from a commercial computer program for identifying plant species (Flora ID Northwest; Barnes 2000). This work was performed under an exclusive agreement with XIDServices, Inc. and Flora ID Northwest, owners of the software and content. No other digital source of detailed morphological traits for the flora of a large region was found. The Barnes (2000) information was developed from descriptions of each species taken from herbaria and regional floras. For example, Table 3 shows the trait classes describing leaf margins. Table 4 shows those traits as attributes of a select set of species. The traits of each plant species in Barnes (2000) are grouped by Coniferophyta, Poaceae, Angiospermae, and pteridophytes.
Data Standardization and Integration
The process of testing the values in a particular data field against certain criteria to determine the validity of a record for this study is termed filtering. A record with an invalid value in any field was removed in its entirety.
First, all plot coordinates were converted to a Geographic decimal degree coordinate system. Those records having coordinates outside the expected range of the set were removed. For example, no plots occurred in Canada, yet some coordinates show plots occurring there (Figure 2). For some data sets the range of plot distribution was provided in formal metadata; for others the expected range was ascertained from narrative reports about the nature of the data. The spatial precision of plot locations was converted to the maximum possible value in meters. For example, the locations of some plots were reported as accurate to the nearest section (a referenced mile2 [2.59 km2] in the Township and Range land survey system), where the field plot in question could be anywhere within a given square mile. In absolute linear terms, the plot's true location could then be as much as the length of the diagonal of a square mile (2.27 km) from the given coordinates. Other plots are located to the quarter section or the quarter-quarter section (maximum linear distances of 914 and 229 m, respectively). Most plots, however, are reported to be within 10 m of the given coordinate positions, and many are within 100 m. Plots whose maximum possible positional error exceeded 1 km were removed. A total of 18,276 plot records (65% of those with valid coordinates) had positional accuracies of 1 km or less.
Each record's date of collection was filtered for logical consistency with four-digit years between 1940 and 2000. Elevation measurements were converted to meters and filtered for values between 0 and 4,000. Slope aspect values were converted to degrees and filtered for values between 0 and 360. Slope gradient values were converted to degrees and filtered for values between 0 and 90.
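The range filters described above can be illustrated with a minimal sketch. The record layout, the field names, and the unit labels ("m", "ft") are assumptions made for illustration only and are not taken from the original databases; the numeric limits are those stated in the text.

```python
FEET_TO_M = 0.3048

def to_meters(value, unit):
    """Convert an elevation to meters; unit labels 'm' and 'ft' are hypothetical."""
    return value * FEET_TO_M if unit == "ft" else value

def is_valid(record):
    """Return True only if every filtered field passes its range test."""
    elevation_m = to_meters(record["elevation"], record["elevation_unit"])
    return (
        1940 <= record["year"] <= 2000          # four-digit year of collection
        and 0 <= elevation_m <= 4000            # elevation in meters
        and 0 <= record["aspect_deg"] <= 360    # slope aspect in degrees
        and 0 <= record["slope_deg"] <= 90      # slope gradient in degrees
    )

def filter_records(records):
    """A failure in any one field removes the whole record."""
    return [r for r in records if is_valid(r)]

# Hypothetical plot records
plots = [
    {"id": "A", "year": 1987, "elevation": 1200, "elevation_unit": "m",
     "aspect_deg": 180, "slope_deg": 12},
    {"id": "B", "year": 87, "elevation": 800, "elevation_unit": "m",
     "aspect_deg": 90, "slope_deg": 5},      # two-digit year fails
    {"id": "C", "year": 1995, "elevation": 5000, "elevation_unit": "ft",
     "aspect_deg": 45, "slope_deg": 30},     # 5000 ft = 1524 m, passes
    {"id": "D", "year": 1990, "elevation": 900, "elevation_unit": "m",
     "aspect_deg": 400, "slope_deg": 10},    # aspect out of range fails
]

kept_ids = [r["id"] for r in filter_records(plots)]
print(kept_ids)  # ['A', 'C']
```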
Plot sizes were filtered for those equivalent to about 20x20 m. Most plot size values were reported to be 0.1 acre in size, which converts to 404.6 m2. These were assumed to be square, and were considered as 20x20 m plots.
Information from all initial data sets indicated that plant species importance values were recorded as relative canopy cover. Therefore, each plant species' canopy cover value was filtered for values between 1 and 100%.
The single most demanding task was standardizing plant names among the different field plot data sources. There are two types of problems that must be dealt with: the level of taxonomic resolution (i.e., family, genus, species, etc.) at which an organism entity is labeled, and synonymies for the names of taxa.
While all the entities in a set of field plots usually conform to a single taxonomic standard, it is common to find entities resolved at various levels, whether genus, species, subspecies, or variety. Some of the causes for multiple levels of taxonomic resolution in a field plot data set are: (a) the observer was unable to determine a finer taxonomic level of some of the organisms observed in the field, commonly resulting in the notation (genus) spp.; (b) a group of species intergrade, have significant morphological variability, or are not well described or understood; and (c) a species is well described with a number of varieties and subspecies that are recognizable and well known, resulting in some entities in the data set being resolved at a subspecific taxonomic level.
The taxa synonymy problem has two sources. First, the concepts for taxonomic entities (i.e., the Fabaceae family, Astragalus genus, or Astragalus filipes species) are developed independently by different researchers and published ad hoc in a variety of different journals. Which taxonomic authority a subsequent worker follows with respect to any entity is a matter of preference. Second, species concepts change through time, and the application of different names to those concepts changes with time as well.
These problems of taxonomic resolution and synonymies cannot be solved separately when integrating multisource field data since, for example, trinomial entities (those resolved with genus, species, and subspecies or variety names) such as Elymus lanceolatus ssp. lanceolatus (Gould) can be synonyms for binomial entities, as Agropyron dasystachyum (Hooker) is for E. lanceolatus ssp. lanceolatus (Gould). That is, an entity considered to be a subspecies by one authority may be considered to be a full species by another authority. When standardizing the species lists from several different data sources, it is necessary to address both of these issues at the same time.
The USDA PLANTS (1999) database was used as the source for both identifying synonyms and resolving subspecific entities to the species level. It offers the most comprehensive treatment of plant name synonyms in the United States, is widely used, and is readily available in electronic form. This database, however, does not by itself provide a standardized set of names; it provides a cross-referenced list of names and their synonymies at the species, subspecies, and variety levels. Many of the species concepts in this list have more than one name. A person integrating multisource data may use plant names in the PLANTS list and still have the same taxonomic entity in their data set represented by more than one name. What is more, the PLANTS database itself is continually revised and previous versions are not available, which emphasizes the need to develop a single standardized species list for the work reported here. To avoid counting the same actual species more than once because of taxonomic ambiguity, it is essential that each plant name be checked for synonymies and, if necessary, changed to a standard name.
Each taxonomic entity at the species level or lower in the data set was standardized to a single genus-species name. First, those plant names identified by a genus-species binomial were separated from those identified by a genus-species-subspecies or -variety trinomial (Figure 3, processes 1 and 2). Binomial entities having no synonyms in the plot data were further separated from those that had synonyms. Those binomial entities having synonyms were standardized to a single genus-species name and taxonomic authority listed in USDA PLANTS (1999).
Each trinomial entity in the data set, identified as a subspecies or variety, was separated into those that either were or were not synonymous with a unique binomial found from process 1 (Figure 3). Those that were synonymous with a binomial from process 1 were relabeled accordingly. Those that had no synonym with a binomial from process 1 were relabeled with a unique binomial. Most of these entities were easily translated to the genus-species binomial already contained in that entity's name. All plant names at the species level or lower provided in original data sets were resolved to one of 1,905 binomials.
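The two-step relabeling described above can be sketched as a lookup followed by a fallback. The synonym table here is a hypothetical two-entry stand-in for the cross-referenced USDA PLANTS list, seeded with the Elymus lanceolatus example given earlier; the rank markers recognized ("ssp.", "subsp.", "var.") and the Artemisia test name are likewise assumptions for illustration.

```python
# Hypothetical synonym table: name as recorded -> standard binomial.
SYNONYMS = {
    "Agropyron dasystachyum": "Elymus lanceolatus",
    "Elymus lanceolatus ssp. lanceolatus": "Elymus lanceolatus",
}

RANK_MARKERS = ("ssp.", "subsp.", "var.")

def standardize(name):
    """Resolve a recorded plant name to a single genus-species binomial."""
    name = " ".join(name.split())          # collapse stray whitespace
    if name in SYNONYMS:                   # synonym of a known binomial
        return SYNONYMS[name]
    parts = name.split()
    # Trinomial with no listed synonym: use the binomial already in its name.
    if len(parts) >= 4 and parts[2] in RANK_MARKERS:
        return " ".join(parts[:2])
    return name                            # already a standard binomial

names = [
    "Agropyron dasystachyum",
    "Elymus lanceolatus ssp. lanceolatus",
    "Artemisia tridentata ssp. wyomingensis",
    "Pinus contorta",
]
standard = sorted({standardize(n) for n in names})
print(standard)
```

Collecting the results in a set shows the purpose of the step: four recorded names collapse to three standard binomials, so the same species is not counted twice.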
Entities that are resolved only to genus present a different problem. These entities may or may not be too coarse to be usefully included for analysis. Beyond some threshold in the number of such entities, their relative canopy cover values, or both, their occurrence may diminish the value of the information in a plot record for species-based investigation. At the same time, they do signify the occurrence and abundance of an organism, and this information may be important. To gauge the significance of these entities in the data set, the number of genus-level entities, the average relative cover of each, the standard deviation of their cover values, the total number of plots affected, and the number of plots affected by each entity were calculated.
There were 33 such entities, making up 1.7% of all entities but occurring in almost half of all plots. These entities did not have significant canopy cover values in any single plot. The following threshold rule was followed to filter plots for inadequate taxonomic resolution: any plot having genus-level entities with a combined total cover of > 20%, or any plot having > 10% of its entities resolved only to the genus level, was removed. No plots met these criteria.
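The threshold rule can be expressed as a short predicate. This is a sketch only: the tuple layout (name, cover, genus-only flag) and the example plots are hypothetical, while the 20% and 10% limits are those stated above.

```python
def fails_resolution_rule(entities, max_cover=20.0, max_fraction=0.10):
    """Return True if a plot should be removed for inadequate taxonomic resolution.

    entities: list of (name, cover_percent, is_genus_only) tuples for one plot.
    """
    if not entities:
        return False
    genus_only = [e for e in entities if e[2]]
    genus_cover = sum(e[1] for e in genus_only)
    genus_fraction = len(genus_only) / len(entities)
    return genus_cover > max_cover or genus_fraction > max_fraction

# Hypothetical plot: 11 species-level entities plus one genus-level entity
# with low cover (1 of 12 entities = 8.3%, cover 3%).
plot_ok = [("species %d" % i, 5.0, False) for i in range(11)]
plot_ok.append(("Carex spp.", 3.0, True))

# Hypothetical plot: one genus-level entity with 25% cover among 11 entities.
plot_high_cover = [("species %d" % i, 5.0, False) for i in range(10)]
plot_high_cover.append(("Poa spp.", 25.0, True))

print(fails_resolution_rule(plot_ok))          # False
print(fails_resolution_rule(plot_high_cover))  # True
```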
Two types of procedures were applied to the climate data sets. First, mean annual and monthly data of all variables were segmented from the coterminous U.S. grid to cover the study area only. Second, data for each variable at the location of each field plot were extracted into a discrete database. To segment the climate data to the study area's extent, each file containing data for a climate theme was imported into ArcGIS GRID software, where the data were clipped to the study area extent and saved as a GRID file. Climate data for each field plot were extracted using geographic information system software by intersecting a point coverage of plot locations with the Daymet grid. The value of each corresponding grid cell was exported to an ASCII file. Each of the ASCII files was then imported into a database and related to field plot records.
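Intersecting plot points with a raster grid amounts to converting each coordinate to a row and column index. The following is a minimal sketch of that lookup in plain Python, not a reproduction of the GIS procedure used; it assumes a north-up grid stored as a list of rows, projected coordinates in meters, and a hypothetical 3 x 3 grid of 1 km cells.

```python
def cell_value(grid, x_min, y_max, cell_size, x, y):
    """Return the grid value under point (x, y), or None if outside the grid.

    grid: list of rows, row 0 being the northernmost.
    x_min, y_max: coordinates of the upper-left corner of the grid.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Hypothetical 3 x 3 grid of 1 km cells with its upper-left corner at (0, 3000).
climate_grid = [
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
]

# Hypothetical plot coordinates in the same projected units (meters).
plots = {"P1": (1500, 2500), "P2": (2999, 1), "P3": (-5, 100)}
extracted = {
    pid: cell_value(climate_grid, 0, 3000, 1000, x, y)
    for pid, (x, y) in plots.items()
}
print(extracted)  # {'P1': 11, 'P2': 32, 'P3': None}
```

A plot that falls outside the grid returns None rather than a neighboring value, which keeps such records easy to identify when the table is related back to the plot records.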
Net primary productivity
Annual net primary productivity values were attributed to each field plot by intersecting the Biome-BGC model grid with the field plot coordinates using geographic information system software. Tables of these values for each plot were imported into a database and related to the plot records. The ANPP values, as attributes of the field plot locations, were tested for spatial autocorrelation in a sample of 34 alliance data sets based on Moran's I statistic (Cliff and Ord 1981, Kaluzeny et al. 1998). Of the 17 (50%) having p values < 0.05, six (35%) had correlation values < 0.5 and 11 (65%) had correlations > 0.5, showing the occurrence of spatial structure in these data. It is unknown, however, whether the spatial structure of these data reflects the actual condition of the landscape or is the consequence of data development.
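For reference, Moran's I for values x_i at n locations with spatial weights w_ij is I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, where m is the mean and W is the sum of all weights. The sketch below computes the statistic only, without a significance test. The weighting scheme is not stated in the text, so binary weights within a fixed neighbor distance are assumed here, and the coordinates and values are hypothetical.

```python
import math

def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if 0 < distance <= max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0       # sum of w_ij * dev_i * dev_j
    w_total = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if math.dist(coords[i], coords[j]) <= max_dist:
                num += dev[i] * dev[j]
                w_total += 1.0
    denom = sum(d * d for d in dev)
    return (n / w_total) * (num / denom)

# Hypothetical plots along a transect with steadily increasing ANPP.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
anpp = [1.0, 2.0, 3.0, 4.0]
i_stat = morans_i(coords, anpp, max_dist=1.0)
print(round(i_stat, 4))  # 0.3333
```

Positive values indicate that neighboring plots tend to have similar ANPP; values near zero indicate little spatial structure.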
Morphological traits of plant species
Data for Coniferophyta, Poaceae, Angiospermae, and pteridophytes were each exported from Barnes (2000) to a set of three ASCII tables containing (a) species names and species codes, (b) morphological traits and morphological trait codes, and (c) species codes related to morphological trait codes. These tables were imported to a database where, first, all species names were standardized to the field plot species list.
Identifying and Testing Plots for Membership in Alliances
A yet unexplored problem in applying multisource data to plant community ecology research is how to attribute a field plot as a member of a given association or alliance. An optimal solution would be to add the unassigned plot record or records to a set of typal plot records for a given association or alliance, and implement an ordination of all records. The plot or set of plots may then be classified to a vegetation type by comparing their distances from the ordination center of the typal plots with some threshold distance (which could be a mean or weighted mean distance of the typal plots from their ordination center). Although providing the typal plots for a recognized association or alliance of the USNVC has been identified as a standard requirement in the classification and description of floristic units (Jennings et al. 2003), typal (or classification) plot records are not yet available for vegetation types in the INW.
In the absence of quantitative data with which to classify field plots, the descriptions of each alliance recorded from the study area were used as a basis for classifying field plots as members of an alliance. The sets of plots classified by alliance were then compared with each other by nonmetric multidimensional scaling ordination (Kruskal 1964). First, however, the species names used in the existing USNVC descriptions were standardized to the species list developed for this study.
Attributing Field Plots to Alliances
Field plot records that survived the univariate outlier filtering processes described above were classified to existing alliance concepts described in the International Classification of Ecological Communities (NatureServe 2000) that are listed as occurring in the INW. The parameters taken from the narrative descriptions of each alliance to classify the field plots include dominant species identity and relative canopy cover, associated species and their relative canopy covers, geographic range, elevation range, and in some cases ground slope gradient and aspect.
Descriptive parameters of each alliance were developed as a set of structured query language (SQL) statements. The SQL statements were applied one at a time to the database of field records to extract a subset of records that could initially be attributed to a given alliance. Once identified as an initial member of an alliance, a field plot record was excluded from queries for other alliances. Each set of field plot records was then examined and summary statistics tabulated for both field plots (sample space) and species occurring in those plots (species space). The summaries included averages of species cover values, the skew in species cover value distributions, species richness, and the evenness of plot compositions.
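The query-and-exclude procedure can be sketched with an in-memory SQLite database. Everything specific in this example is hypothetical: the table and column names, the plot data, and the cover and elevation limits attached to each alliance are placeholders, not the parameters taken from the alliance descriptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plot (plot_id INTEGER PRIMARY KEY, elevation_m REAL);
CREATE TABLE cover (plot_id INTEGER, species TEXT, cover_pct REAL);
CREATE TABLE membership (plot_id INTEGER PRIMARY KEY, alliance TEXT);
""")
conn.executemany("INSERT INTO plot VALUES (?, ?)",
                 [(1, 900), (2, 1500), (3, 1600)])
conn.executemany("INSERT INTO cover VALUES (?, ?, ?)", [
    (1, "Pseudotsuga menziesii", 60), (1, "Abies grandis", 30),
    (2, "Abies grandis", 55),
    (3, "Pinus contorta", 70),
])

# (alliance, dominant species, minimum cover %, elevation range in m)
# All limits below are illustrative placeholders.
ALLIANCE_RULES = [
    ("Pseudotsuga Menziesii Woodland Alliance", "Pseudotsuga menziesii", 25, 300, 2000),
    ("Abies Grandis Alliance", "Abies grandis", 25, 500, 2000),
]

# Plots already attributed to an alliance are excluded from later queries.
QUERY = """
INSERT INTO membership (plot_id, alliance)
SELECT p.plot_id, ?
FROM plot p JOIN cover c ON c.plot_id = p.plot_id
WHERE c.species = ? AND c.cover_pct >= ?
  AND p.elevation_m BETWEEN ? AND ?
  AND p.plot_id NOT IN (SELECT plot_id FROM membership)
"""

# Statements are applied one at a time, in order.
for alliance, species, min_cover, elev_lo, elev_hi in ALLIANCE_RULES:
    conn.execute(QUERY, (alliance, species, min_cover, elev_lo, elev_hi))

assigned = dict(conn.execute("SELECT plot_id, alliance FROM membership"))
print(assigned)
```

Plot 1 satisfies both rules but is claimed by the first, so the order in which the statements are applied affects the result; plot 3 matches no rule and remains unattributed.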
In cases where closely related alliance concepts could not be distinguished in the data, they were treated as a single alliance. Some forest alliance types in the USNVC, for instance, are provisionally described as seasonally flooded but otherwise not distinguished from corresponding upland types. In one such case the Pinus Contorta Seasonally Flooded Forest Alliance is not well distinguished from the Pinus Contorta Forest Alliance, and distinct sets of corresponding field plots were not found. Therefore, in this and similar cases closely related and poorly distinguished USNVC alliance types were considered as one.
One or more field plots were attributed to each of 76 alliance types. Of these, the Pseudotsuga Menziesii Woodland Alliance had the most field plots (1,488) and the Pascopyrum Smithii Alliance the fewest (1). The alliance with the next most field plot records (Abies Grandis Alliance, 824) had only slightly more than half as many plots as the Pseudotsuga Menziesii Woodland Alliance. It may be that the Pseudotsuga Menziesii Woodland Alliance data set reflects a sampling bias toward places where trees were thinned as a silvicultural practice. The mean number of field plots per alliance was 121.
Of the sets of plot records initially attributed to alliances, 19 had fewer than 10 field plots and these were not considered further. The remaining 57 sets were subjected to a multivariate outlier analysis for species cover values using a frequency distribution of Euclidean distance measures (McCune and Mefford 1999). Within each set of records, field plots lying more than two standard deviations from the mean distance were removed. Two hundred eleven field plots were identified as outliers and removed from 36 (63%) of the 57 alliance data sets. This did not reduce any of the 57 alliance data sets to fewer than 10 records, so all 57 were retained.
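A minimal sketch of this kind of outlier screen follows. It assumes one common formulation, in which each plot's average Euclidean distance to all other plots in the set is compared with the mean and standard deviation of those averages; whether this matches the software's exact computation is not confirmed by the text. The cover vectors are hypothetical, and only plots unusually far from the others are flagged.

```python
import math
from statistics import mean, stdev

def average_distances(plots):
    """Average Euclidean distance from each plot to every other plot."""
    n = len(plots)
    return [
        mean(math.dist(plots[i], plots[j]) for j in range(n) if j != i)
        for i in range(n)
    ]

def find_outliers(plots, cutoff_sd=2.0):
    """Indices of plots whose average distance exceeds the mean by > cutoff_sd SDs."""
    avg = average_distances(plots)
    m, sd = mean(avg), stdev(avg)
    if sd == 0:
        return []
    return [i for i, d in enumerate(avg) if (d - m) / sd > cutoff_sd]

# Hypothetical cover vectors (three species): nine similar plots and one
# compositionally distinct plot at index 9.
alliance_plots = [[50 + i, 30 - i, 20] for i in range(9)]
alliance_plots.append([5, 5, 90])

outliers = find_outliers(alliance_plots)
print(outliers)  # [9]
```

With a sample standard deviation, a single plot in a set of n can deviate by at most (n - 1) / sqrt(n) standard deviations, so a two-standard-deviation cutoff cannot flag anything in sets of fewer than six plots.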
These data sets were then tested against a null hypothesis of having no more structure than a randomly selected set using a Mantel test (McCune and Mefford 1999), where a cover value matrix of field plots by species was assembled for (a) a given set of plot records that had been attributed to an alliance, and (b) an equal number of plot records drawn at random from all other plot records (Figure 4). The Mantel test evaluates the significance of the correlation between the two matrices (McCune et al. 2002). Given the phytosociological nature of these data, and that the work at hand is to produce a relatively large multisource data set for a variety of applications, a p value of 0.1 was used as a threshold of significance for alliance plot record sets. Furthermore, of the data sets having significant p values, those correlated with the random data sets at an r value of > 0.3 were considered too strongly correlated with a random data set and were rejected.
Of the 57 alliance data sets tested, 50 met the significance threshold of the Mantel test. Of these data sets, one, the Abies Lasiocarpa Krummholz Alliance, had an r value of 0.32 and was rejected, leaving 49 data sets of field plots attributed to an existing USNVC alliance concept. These alliances are shown in Table 5 along with the number of plots attributed to them, the number of species found in those plots, average plot evenness, and the skew in the distribution of species canopy cover values.
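For readers unfamiliar with the Mantel test, the following is a generic permutation sketch, not the implementation used in this study. It correlates the upper triangles of two square distance matrices of the same size and estimates a one-sided p value by jointly permuting the rows and columns of one matrix. The number of permutations, the random seed, and the example matrix are assumptions.

```python
import math
import random

def upper_triangle(matrix):
    """Flatten the above-diagonal entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): matrix correlation and one-sided permutation p value."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    hits = 0
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d2 together.
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical distance matrix built from six positions on a line;
# comparing it with itself gives r = 1 and a small p value.
positions = [0, 1, 3, 7, 15, 31]
dist = [[abs(a - b) for b in positions] for a in positions]
r, p = mantel(dist, dist)
print(round(r, 3), p < 0.05)
```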
Ordination of Attributed Plots
To examine the floristic relationship of the plots within and among alliances, the plot records were first grouped by alliance into similar types of biome vegetation: (a) forest and woodlands, (b) shrublands, and (c) herbaceous. Then a nonmetric multidimensional scaling ordination (NMDS; Kruskal 1964, McCune et al. 2002) was applied to a samples-by-species matrix of relative canopy cover values for each of the three data sets. The NMDS procedure was configured to use a Sørensen distance measure because of the theoretical advantage of a city-block versus a Euclidean application to data that are not normally distributed (McCune et al. 2002), as is common to plant community field data, and as indicated by the skewed cover value distributions in these data sets (Table 5). The NMDS was configured to produce three dimensions in order to visualize the results in the space defined by three axes. Process runs were based on 100 maximum iterations, a randomly chosen starting place in the matrix, a step length of 0.2, 20 runs with input data, 20 runs with randomized data, and a stability criterion of 0.005 (the mean standard deviation in the stress level over the previous 20 iterations).
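The quantitative Sørensen distance (also known as Bray-Curtis) between two plots is the sum of absolute differences in species cover divided by the sum of all cover values in both plots, which makes it a proportionate city-block measure. The short sketch below shows the distance computation only, not the NMDS procedure; the cover vectors are hypothetical.

```python
def sorensen_distance(a, b):
    """Quantitative Sorensen (Bray-Curtis) distance between two cover vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty plots are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def distance_matrix(plots):
    """Symmetric plots-by-plots matrix of Sorensen distances."""
    return [[sorensen_distance(p, q) for q in plots] for p in plots]

# Hypothetical relative cover values for three species in three plots.
plots = [
    [60, 30, 10],
    [30, 30, 40],
    [0, 0, 100],
]
matrix = distance_matrix(plots)
print(matrix[0][1])  # 0.3
```

The measure ranges from 0 for identical plots to 1 for plots sharing no species, and a matrix of this form is the input to an ordination such as NMDS.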