About the KNB

The Knowledge Network for Biocomplexity (KNB) is an international repository intended to facilitate ecological and environmental research.

The KNB was launched in 1998 with a grant from the National Science Foundation (NSF) to serve as the long-term home for synthesis datasets and research products generated by National Center for Ecological Analysis and Synthesis (NCEAS) working groups. Since then, NCEAS has continued to operate the KNB not only as an archive for NCEAS working group products, but also as a repository for the broader ecology and environmental science community. The KNB accepts all environmental and ecological data and publishes datasets with Digital Object Identifiers (DOIs) to ensure long-term access to them. We strive to abide by the FAIR (findable, accessible, interoperable, reusable) principles of data sharing and preservation.

Hosted by the National Center for Ecological Analysis and Synthesis

Since 1995, NCEAS has fostered a global community of ecologists and multidisciplinary environmental scientists eager to solve some of the toughest environmental questions through collaborative synthesis research. NCEAS believes data transparency and reproducibility are essential to the utility of the environmental sciences, and works to advance scientific culture in the direction of open science. The KNB is a primary public data archive for NCEAS working group research products, and hosts data from NCEAS initiatives such as the State of Alaska's Salmon and People project and the Science for Nature and People Partnership.

Partnered with DataONE

Data Observation Network for Earth (DataONE) provides a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. DataONE is a federation of repositories based on a common infrastructure that allows for federated search and replication between repositories. The KNB is a Tier 4 DataONE member node, meaning that it allows the DataONE infrastructure to use available storage space on the KNB for storing copies of objects that originate on other member nodes in the DataONE network, based on the Node Replication Policy. This helps ensure dataset longevity by providing geographically distinct copies of data on multiple servers.

Powered by rich, detailed metadata.

For scientists, the KNB is an efficient way to share, discover, access, and interpret complex ecological data. Because KNB data come with rich contextual information, scientists are able to integrate and analyze data with less effort. The data originate from a highly distributed set of field stations, laboratories, research sites, and individual researchers. The foundation of the KNB is the rich, detailed metadata provided by the researchers who collect the data, which promotes both automated and manual integration of data into new projects.

Open-source software.

Learn about the KNB Developer Tools and API

As part of the KNB effort, data management software is developed in a free and open source manner, so other groups can build upon the tools. The KNB is powered by the Metacat data management system. It is optimized for handling datasets described using the Ecological Metadata Language (EML), but can store any XML-based metadata document. Learn more about the software behind KNB.

Data Submission Guidelines

Getting Started

Submitting your data to the KNB is easy. Upload your data on the KNB website using a simple online form. The KNB supports ORCID logins.

If your project involves large numbers of files (hundreds to thousands), you may want to automate the creation of your metadata. This is certainly possible, and has been done with tools such as MATLAB and R. Send a note to KNB Help, and we can point you to some examples of scripted metadata creation.
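As a rough idea of what scripted metadata creation can look like, the sketch below walks a folder and drafts a plain-text inventory of its files. It is only a starting point under our own assumptions: the function names and the fields recorded (name, format, size) are hypothetical, and the output is a draft for a human to complete, not a structured metadata document.

```python
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect a few basic descriptive fields for one data file."""
    return {
        "name": path.name,
        # File extension without the leading dot, e.g. "csv".
        "format": path.suffix.lstrip(".").lower(),
        "size_bytes": path.stat().st_size,
    }


def build_inventory(folder: Path) -> list[dict]:
    """Describe every regular file directly inside `folder`, sorted by name."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return [describe_file(p) for p in files]


def format_inventory(records: list[dict]) -> str:
    """Render the inventory as plain text, one short block per file."""
    lines = []
    for rec in records:
        lines.append(f"File: {rec['name']}")
        lines.append(f"  Format: {rec['format']}")
        lines.append(f"  Size (bytes): {rec['size_bytes']}")
        # Left blank on purpose: a person still has to describe the contents.
        lines.append("  Description:")
    return "\n".join(lines)
```

Calling `format_inventory(build_inventory(Path("my_project_data")))` on a hypothetical project folder would produce a text draft with an empty description slot for each file.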

If your dataset is large (>50 GB), please write to KNB Help prior to publishing.

Use open source file formats

While the KNB supports the upload of any data file format, sharing data can be greatly enhanced if you use ubiquitous, easy-to-read formats. For instance, while Microsoft Excel files are commonplace, it's better to export these spreadsheets to Comma Separated Values (CSV) text files, which can be read on any computer without having Microsoft products installed.
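To illustrate the CSV recommendation, here is a minimal sketch that writes a small table as CSV text using Python's standard `csv` module. The column names and values are made up for the example. Reading an existing Excel workbook would require a third-party library and is not shown here.

```python
import csv
import io


def rows_to_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize a list of row dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical observations, for illustration only.
observations = [
    {"site": "A", "year": 2020, "count": 5},
    {"site": "B", "year": 2020, "count": 12},
]
csv_text = rows_to_csv(observations, ["site", "year", "count"])
```

The resulting text can be saved to a `.csv` file and opened in any text editor or statistics package.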

For image files, use common formats like PNG, JPEG, TIFF, etc. Most browsers can handle these. If you use specialized software to create your data, try to save your data in well-known formats. For instance, GIS data can be exported to ESRI shapefiles, and data created in Matlab or other matrix-based programs can be exported as NetCDF (an open binary format).

Organize files

In order to optimally document and share a project’s output, quality data file management is necessary. Some resources for best practices for managing data files can be found at DataONE and in Borer et al. 2009. The following are a few guidelines that are encouraged for file organization for projects that plan to submit to the KNB. Following these guidelines should help ensure a project’s outputs are easy to access and understand.

All files should have short, descriptive names.
Only letters, numbers, hyphens (“-”), and underscores (“_”) should be used in file names. Always avoid using spaces and special characters when naming files.
All files should be stored in open, ubiquitous, and easy-to-read file formats (see File Format Guidelines). Tabular data should be submitted in a long (versus wide) format if possible. Long formats make documentation of attributes (variables), as well as access to the data, much easier.
For models/scripts, all files necessary to run the code should be included and organized in a manner that makes running the code as accessible as possible. If the code relies on outside dependencies (software, hardware, or otherwise) that cannot be submitted to the KNB, details of these dependencies should be made clear within the metadata description of the code files as well as within the methods metadata.
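Two of the guidelines above lend themselves to a quick script: checking file names and reshaping a wide table into long format. The sketch below is one possible way to do both with the Python standard library. The function names are hypothetical, and allowing dot-separated extensions in file names is our assumption, since the naming guideline does not mention the dot.

```python
import re

# Assumption: dot-separated extensions (e.g. ".csv", ".tar.gz") are allowed in
# addition to the letters, numbers, hyphens, and underscores named above.
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)*")


def is_valid_file_name(name: str) -> bool:
    """Return True if the whole name uses only the permitted characters."""
    return VALID_NAME.fullmatch(name) is not None


def wide_to_long(rows, id_column, variable_name="variable", value_name="value"):
    """Turn each non-id column of a wide row into its own long-format row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


# Hypothetical wide table: one column per year.
wide = [
    {"site": "A", "2019": 3, "2020": 5},
    {"site": "B", "2019": 7, "2020": 2},
]
long_table = wide_to_long(wide, "site", variable_name="year", value_name="count")
```

In the long table, every observation is one row with the same three attributes (site, year, count), so only three attributes need documenting no matter how many years are added.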

Write high quality metadata

Metadata are ultimately "data about data": the contextual information needed to interpret a set of raw data observations. They provide meaning to data, and are critical when it comes to sharing, integrating, and analyzing data. Too often people collect data for projects and leave them undocumented for years or decades. These data, while potentially of very high value, can become useless over time due to data entropy.

The KNB primarily stores metadata in structured, XML-based files. Transformation of plain text documentation into a structured metadata format for archiving is done automatically when submitting through the KNB website. Prior to submitting documentation to the KNB, we advise projects to create complete, plain text metadata records. Ideally, plans to create and store metadata records should be made during the initial stages of project development (i.e., within the data management plan of the project proposal).

The goal of metadata is to document a project's output so that a reasonable scientist will be able to understand and use all the components of the output without any outside consultation. The following is a non-exhaustive list of components typically expected within metadata records submitted to the KNB:

A descriptive title that includes the topic, geographic location, dates, and, if applicable, the scale of the data.
A descriptive data package abstract that provides a brief overview summarizing the specific contents and purpose of the data package.
Funding information, if applicable.
A list of all people or organizations associated with the data package with at least one person or organization acting as a creator and one acting as a contact (these can be the same).
Full records of field and laboratory sampling times and locations, including a geographic description interpretable by a general scientific audience.
Full records of taxonomic coverage within the data package (if applicable).
Full descriptions of field and laboratory sample collection methods.
Full descriptions of field and laboratory sample processing methods.
Full descriptions of any hardware and software used (including make, model, and version where applicable).
Full attribute/variable information for all data.
Quality control procedures.
Relevant explanations for why the particular components detailed above were chosen for the project.
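Before submitting, a project could run a simple self-check of its plain text metadata against the list above. The sketch below shows one way to do that. The field names are hypothetical labels chosen for this example, not element names from any metadata standard, and the components marked "if applicable" above are left out of the required set.

```python
# Hypothetical field names standing in for the components listed above.
REQUIRED_FIELDS = (
    "title",
    "abstract",
    "creators",
    "contacts",
    "sampling_times",
    "sampling_locations",
    "collection_methods",
    "processing_methods",
    "attributes",
    "quality_control",
)


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty collections as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def missing_components(record: dict) -> list[str]:
    """List the required fields that are absent or blank in a draft record."""
    return [field for field in REQUIRED_FIELDS if is_blank(record.get(field))]
```

A check like this only confirms that something was written for each component. Whether the description is adequate still needs a human reader.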

Checking metadata quality

The KNB uses MetaDig to evaluate metadata quality. To evaluate your metadata record for completeness, after submitting your dataset, click the "Quality report" button just below the dataset citation on the landing page for your dataset. This button shows the results of a series of checks that are automatically performed on your dataset by the MetaDig quality engine. We encourage users to take advantage of this automated feedback as a way to improve their dataset documentation.

Identification Guidelines

The KNB uses ORCID iDs to identify individuals associated with each data package. When submitting to the KNB, an ORCID iD is required for the submitter of each data package. ORCID iDs are not required for all associated parties (contacts, additional creators, etc.), but are strongly encouraged, especially for the primary creator, so that proper identification and attribution can be given. Additionally, access to edit each data package can only be granted to individuals with ORCID iDs. Therefore, we advise researchers to register and record ORCID iDs for each individual involved with the project during the initial stages of project development (i.e., within the data management plan of the project proposal).

Publish data with a DOI for long-term stable access.

DOI example: publish your data with a DOI and it will display on the metadata page for your dataset.

Assign your data set a Digital Object Identifier (DOI) and allow others to cite your data with the DOI to find the current location(s) for the data.

Because web addresses can change over time, it is important that your data set not be tied to a specific address on the internet. DOIs allow an identifier to be created that can be resolved to the multiple locations at which a data set might exist, and then client tools can decide which of those copies is the most efficient to access.
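In practice, a DOI is usually turned into a web link by appending it to the public `https://doi.org/` resolver. The sketch below shows that conversion. The DOI used in the example is a made-up placeholder, and the helper names are ours.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def normalize_doi(doi: str) -> str:
    """Strip surrounding whitespace and common prefixes such as 'doi:'."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def doi_to_url(doi: str) -> str:
    """Build a resolver URL, percent-encoding characters unsafe in a URL path."""
    return DOI_RESOLVER + quote(normalize_doi(doi), safe="/")


# Placeholder DOI for illustration; it does not identify a real dataset.
example_url = doi_to_url("doi:10.1234/EXAMPLE")
```

Citing the DOI rather than a repository web address means the link keeps working even if the data set moves.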

How to assign a DOI to your data

Assigning a DOI through the KNB requires that you have the proper permissions on the data set. Simply log in to the KNB web interface with your username and password, and then search for your data set by clicking on "My Packages" from the user menu. If you are the owner or have been granted management permissions, you will see a "Publish" button, which both makes the data set publicly accessible and assigns a DOI to that particular version of the data set. The DOI is registered with DataCite using the EZID service, and will be discoverable through multiple data citation networks, including DataONE and others.

Licensing and Data Distribution

All data and metadata will be released under either the CC-0 Public Domain Dedication or the Creative Commons Attribution 4.0 International License. In cases where legal (e.g., contractual) or ethical (e.g., human subjects) restrictions to data sharing exist, it is incumbent on the researcher to ensure compliance with all federal, university, and Institutional Review Board policies, so as to not publish sensitive data. As a repository dedicated to helping researchers increase collaboration and the pace of science, this repository needs certain rights to copy, store, and redistribute data and metadata. By uploading data, metadata, and any other content to the KNB, users warrant that they own any rights to the content and are authorized to do so under copyright or any other right that might pertain to the content. Data and facts themselves are not covered under copyright in the US and most countries, since facts in and of themselves are not eligible for copyright. That said, some associated metadata and some particular compilations of data could potentially be covered by copyright in some jurisdictions. **By uploading content, users grant the KNB repository and UCSB all rights needed to copy, store, redistribute, and share data, metadata, and any other content. By marking content as publicly available, users grant the KNB repository, UCSB, and any other users the right to copy the content and redistribute it to the public without restriction under the terms of the CC-0 Public Domain Dedication or the Creative Commons Attribution 4.0 International License, depending on which license users choose at the time of upload.**

Data Preservation

Data preservation is critically important to the KNB. We recognize that data preservation is difficult, both for technical and non-technical reasons. We have developed this data preservation plan to be explicit about how the KNB ensures the long-term preservation of the data entrusted to the repository. Key to this plan is our belief that no single organization can possibly provide sufficient institutional stability to guarantee multi-decadal preservation, and that partnerships among committed archives are necessary for successful data longevity. The guiding principles for our preservation plan follow.

Preserve the bits.
The primary mission of the KNB is data preservation and data access. High-quality data management is essential to data preservation. Data are managed following best practice for systems administration at UCSB’s North Hall Data Center (NHDC), which complies with a subset of Tier 1 ANSI/TIA Data Center Standards.
Open Science, Open Standards.
Wherever possible, we utilize and encourage the use of open standards for representation of data and metadata, and for provisioning of services. Metadata are managed in the open Ecological Metadata Language (EML), and we encourage researchers to provide data using open data formats such as ASCII CSV for tabular data and open formats for imagery. Open formats support accessibility of the data in the future even in the face of large software changes. In addition, the repository supports open access via the DataONE REST API for external groups to be able to access all components of the system.
Replicate data and metadata.
All metadata and data are replicated at geographically distinct locations including DataONE replication nodes. Replication is automated, and occurs any time that a change to any file in the system is made. Replication assures that data and metadata remain available even in the case of unplanned local system outages (such as a regional-scale fire or earthquake event), and provides for higher-performance access to data from multiple replica sites.
Strong Versioning.
Following the Force11 Data Citation guidelines, every version of every object in the system is assigned a unique identifier which is used to track that version of the object and relate it to earlier versions. For data packages, a DataCite DOI identifier is assigned and registered upon publication. All updates to objects are tracked, and old versions of data packages and data objects remain accessible even after an update, ensuring that any citations to the original versions of data can continue to be resolved to exactly the version of the data that was cited. Older versions of data packages are clearly marked, making it easy to navigate to the most recently updated versions, and search systems point users to the most recent version.
Frequent Auditing.
All data, metadata, and other objects in the system are provided with a checksum that can be used to validate that the contents of the object have not changed over time. The KNB participates in the DataONE federation, which audits all objects to ensure that the current copy of the object matches the original authoritative copy. In addition, DataONE checks all replica copies to ensure that they continue to persist and have matching checksums. This periodic auditing ensures that accidental content corruption due to disk, network, or human error is detected and remedied in a timely manner.
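The checksum idea behind auditing can be sketched in a few lines: compute a digest of a file's bytes and compare it with the value recorded earlier. The example below uses Python's standard `hashlib`. The function names and the choice of SHA-256 are assumptions made for illustration, not a description of how the repository or DataONE performs its audits.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Compute a hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path: Path, recorded: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's current digest equals the recorded one."""
    return file_checksum(path, algorithm) == recorded.strip().lower()
```

Researchers can use the same approach locally, for example to confirm that a downloaded copy of a data file matches the checksum recorded for it.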

Wind-down Plan

In addition to this preservation plan, we recognize that over long time periods spanning many decades, it is extremely difficult to predict and sustain funding for single institutions. Our replication policy ensures high availability during normal operations, but also provides security should investment in data archival wane. Should the main KNB fail to be sustained, the management of the KNB will work with our partnering institutions to ensure that the archival replicas that they hold continue to be preserved and available to the scientific community. This will likely mean that another DataONE member node would become the authoritative holder of the data until continued support can be obtained to re-establish operations.

UCSB North Hall Data Center

Primary systems are maintained at the North Hall Data Center, which complies with a subset of the Tier 1 ANSI/TIA Data Center Standards. Networking at 10GbE is via redundant connections to the public Internet and Internet2 through the CalREN2 and CENIC networks. Room UPS power backed by an emergency generator is available up to the 162kW capacity of the data center. Primary cooling capacity is derived from the campus chilled water loop. With the campus chilled water loop subject to regional power outages, secondary emergency cooling is from two locally installed chillers with a total 60 Tons of capacity. When NHDC is on emergency power, the emergency chilled water is used for the UPS room, AHU 5 (campus networking) and chilled water distribution to advanced rack cooling technologies. All racks are mounted on zone 4 ISO-Base platforms for seismic protection. The NHDC is subject to the environmental conditions of the campus and the region. Planned outages involving all equipment within NHDC will be uncommon, but occasionally necessary for certain types of maintenance activity. During such outages, data and metadata from the KNB will still be available via our replica holdings, but data submissions will be delayed until normal operations are restored.