Data Set Citation:
When using this data, please cite the data package:
National Center for Ecological Analysis and Synthesis and Strasser C. 2013.
The fractured lab notebook: undergraduate and ecological data management training in the United States
knb.300.9 (
General Information:
Title:The fractured lab notebook: undergraduate and ecological data management training in the United States
Data presented here were collected from a survey of Ecology professors at 48 undergraduate institutions to assess the current state of data management education. The following files have been uploaded:

Scripts (2):
1. DataCleaning_20120105.R is an R script for cleaning the data prior to analysis. It removes spaces, substitutes text for codes, removes duplicate schools, and converts the survey questions and answers into simpler parameter names without numbers, spaces, or symbols. The script is heavily annotated to help users understand what is being done to the data files. It produces the file cleandata_[date].Rdata, which is read by DataTrimming_20120105.R.
2. DataTrimming_20120105.R is an R script for trimming extraneous variables not used in the final analyses. Some variables are combined as needed, and NAs (no answers) are removed. The file is heavily annotated. It produces trimdata_[date].Rdata, which was imported into Excel for summary statistics.

Data files (3):
3. AdvancedSpreadsheet_20110526.csv is the output file from the SurveyMonkey online survey tool used for this project. It is a .csv sheet with the complete set of survey data, although some data (e.g., open-ended responses, institution names) were removed to prevent schools and/or instructors from being identifiable. This file is read into DataCleaning_20120105.R for cleaning and editing.
4. VariableRenaming_20110711.csv is read by the DataCleaning_20120105.R script to convert the survey questions and answers into simple parameter names without numbers, spaces, or symbols.
5. ParamTable.csv is a list of the parameter names used for analysis and their value codes. It can be used to interpret the outputs of the scripts above (cleandata_[date].Rdata and trimdata_[date].Rdata).
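For readers who want a feel for the cleaning steps described above, the following is a minimal Python sketch of three of them: renaming survey questions to short parameter names, removing stray spaces, and dropping duplicate schools. It is not the DataCleaning_20120105.R script. The column names, the sample rows, the function clean_rows, and the choice to keep the first response per school are all assumptions made for illustration.

```python
import csv
import io


def clean_rows(rows, rename_map, school_key):
    """Rename columns, strip whitespace, and keep one row per school."""
    seen = set()
    cleaned = []
    for row in rows:
        # Map each survey question to its short parameter name and trim spaces.
        new_row = {rename_map.get(k, k): (v or "").strip() for k, v in row.items()}
        school = new_row[school_key]
        if school in seen:
            # Assumption: the first response for a school is the one kept.
            continue
        seen.add(school)
        cleaned.append(new_row)
    return cleaned


# Hypothetical stand-ins for the survey export and the renaming table.
survey_csv = "School ID,1. What is the class size?\n A1 , 50 \nA1,60\nB2, 120\n"
rename_map = {"School ID": "school", "1. What is the class size?": "classSize"}

rows = list(csv.DictReader(io.StringIO(survey_csv)))
cleaned = clean_rows(rows, rename_map, school_key="school")
print(cleaned)
```

Keeping the renaming table separate from the cleaning logic mirrors the role the record gives to VariableRenaming_20110711.csv: the mapping can be edited without touching the code.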
  • data
  • data management
  • ecology
  • education
  • environmental sciences
  • undergraduate
Publication Date:2013-01-22
Data Table, Image, and Other Data Details:
Metadata download: Ecological Metadata Language (EML) File
Data Table:Data Cleaning R script
Data Table:Data Trimming R script
Data Table:AdvancedSpreadsheet_20110526.csv
Data Table:VariableRenaming_20110711.csv
Data Table:ParamTable.csv

Involved Parties

Data Set Creators:
Organization:National Center for Ecological Analysis and Synthesis
Individual: Carly Strasser
Data Set Contacts:
Individual: Carly Strasser
Associated Parties:
Individual: Carly Strasser
Individual: Stephanie Hampton
Metadata Providers:
Individual: Xueying Han

Data Set Characteristics

Geographic Region:
Geographic Description:Institutions across the United States
Bounding Coordinates:
West:  -97.0000  degrees
East:  -97.0000  degrees
North:  38.0000  degrees
South:  38.0000  degrees
Time Period:

Sampling, Processing and Quality Control Methods

Step by Step Procedures
Step 1:  


Universities and colleges most likely to contribute to the teaching of future graduate students in ecology were selected using the following methods. First, the US News and World Report's "Best Graduate Schools" website was used to identify the top ten Ecology and Evolutionary Biology graduate schools in the United States in 2010. Second, the website was used to collate a list of graduate schools in ecology with high National Research Council (NRC) Quality Measures. The NRC Quality Measure was set to priority five out of a possible five, with all other priorities set to zero (not considered). Third, the same approach as in the second method was used, except that the priority was set to five for Research Quality of the institution. Finally, we obtained a list of the NSF Graduate Research Fellowship recipients for 2006 to 2010 for the life sciences. We removed the following areas of study: biochemistry, biophysics, cell biology, computational biology, developmental biology, genetics, immunology, molecular biology, neurosciences, and nutrition. The number of awards per institution was tallied, and the Carnegie Classification system was used to determine whether schools were Research Universities (RU) or Baccalaureate/Arts and Sciences (BAS) institutions. BAS institutions with more than 4 awards were used in the survey. Based on these methods, a list of 51 target institutions was generated for the survey.

Each institution's website was searched extensively to determine which course(s) were applicable for the survey. Fit was determined subjectively based on the course description, course requirements for ecology-related majors, and the academic department(s) housing the course. After the most appropriate course was identified, the department was contacted to verify that the selected course was appropriate for the survey. The person contacted was the department chair, the undergraduate course advisor, or a professor within the department.
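The fellowship-based selection step described above (drop the excluded areas of study, tally awards per institution, keep BAS institutions with more than 4 awards) can be sketched in Python as follows. This is an illustrative sketch only, not the procedure's original code: the function name bas_targets, the (institution, field) record layout, the institution names, and the Carnegie class labels in the sample data are all hypothetical.

```python
from collections import Counter

# Areas of study removed before tallying, as listed in the procedure.
EXCLUDED_FIELDS = {
    "biochemistry", "biophysics", "cell biology", "computational biology",
    "developmental biology", "genetics", "immunology", "molecular biology",
    "neurosciences", "nutrition",
}


def bas_targets(awards, carnegie_class, more_than=4):
    """Return BAS institutions with more than `more_than` eligible awards.

    awards: iterable of (institution, field_of_study) pairs.
    carnegie_class: dict mapping institution -> "RU" or "BAS".
    """
    tally = Counter(
        inst for inst, field in awards if field.lower() not in EXCLUDED_FIELDS
    )
    return sorted(
        inst
        for inst, count in tally.items()
        if carnegie_class.get(inst) == "BAS" and count > more_than
    )


# Hypothetical sample data.
awards = (
    [("College A", "Ecology")] * 5
    + [("College B", "Ecology")] * 4
    + [("College B", "Genetics")] * 2
    + [("University C", "Ecology")] * 10
)
carnegie_class = {"College A": "BAS", "College B": "BAS", "University C": "RU"}

targets = bas_targets(awards, carnegie_class)
print(targets)
```

In this sample, College B falls below the threshold because its two genetics awards are excluded before the tally, and University C is skipped by this step because it is classified as RU.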
The survey consisted of 33 questions that fell into four categories: (1) basic course characteristics, such as class size, laboratory components, reading materials, and prerequisites; (2) the extent to which data management is covered and the use of data in the course; (3) instructor opinion about the importance of data management education for undergraduates and perceived barriers to teaching topics related to data management; and (4) instructor characteristics, including year of PhD, percentage of time spent teaching versus conducting research, and data sharing practices. The survey was conducted online using SurveyMonkey. Emails to instructors were sent out on 29 March 2011, and the survey was closed on 25 May 2011.

Data Set Usage Rights

no restrictions
Access Control:
Auth System:knb
Allow: [all] uid=knbadmin,o=NCEAS,dc=ecoinformatics,dc=org
Allow: [all] cn=knb-prod,o=NCEAS,dc=ecoinformatics,dc=org
Allow: [all] cn=esa-moderators,dc=ecoinformatics,dc=org
Allow: [read] [write] uid=han,o=NCEAS,dc=ecoinformatics,dc=org
Allow: [read] public