About the KNB

The Knowledge Network for Biocomplexity (KNB) is an international repository intended to facilitate ecological and environmental research.

The KNB was launched in 1998 with a grant from the National Science Foundation (NSF) to serve as the long-term home for synthesis datasets and research products generated by National Center for Ecological Analysis and Synthesis (NCEAS) working groups. Since then, NCEAS has continued to operate the KNB as an archive not only for NCEAS working group products but also for the broader ecology and environmental science community. The KNB accepts all ecological and environmental data and publishes datasets with Digital Object Identifiers for the express purpose of ensuring long-term access to these datasets. We strive to abide by the FAIR (findable, accessible, interoperable, reusable) principles of data sharing and preservation.

NCEAS

Hosted by the National Center for Ecological Analysis and Synthesis

Since 1995, NCEAS has fostered a global community of ecologists and multidisciplinary environmental scientists eager to solve some of the toughest environmental questions through collaborative synthesis research. NCEAS believes data transparency and reproducibility are essential to the utility of the environmental sciences, and works to advance scientific culture in the direction of open science. The KNB is a primary public data archive for NCEAS working group research products, and hosts data from NCEAS initiatives such as the State of Alaska's Salmon and People project and the Science for Nature and People Partnership.

DataONE

Partnered with DataONE

Data Observation Network for Earth (DataONE) provides a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. DataONE is a federation of repositories based on a common infrastructure that allows for federated search and replication between repositories. The KNB is a Tier 4 DataONE member node, meaning that it allows the DataONE infrastructure to use available storage space on the KNB for storing copies of objects that originate on other member nodes in the DataONE network, based on the Node Replication Policy. This helps ensure dataset longevity by providing geographically distinct copies of data on multiple servers.

Data underlies metadata

Powered by rich, detailed metadata.

For scientists, the KNB is an efficient way to share, discover, access, and interpret complex ecological data. Because KNB data come with rich contextual information, scientists are able to integrate and analyze data with less effort. The data originate from a highly distributed set of field stations, laboratories, research sites, and individual researchers. The foundation of the KNB is the rich, detailed metadata provided by the researchers who collect the data, which promotes both automated and manual integration of data into new projects.

Open-source software.

Learn about the KNB Developer Tools and API

As part of the KNB effort, data management software is developed in a free and open source manner, so other groups can build upon the tools. The KNB is powered by the Metacat data management system, and is optimized for handling data sets described using the Ecological Metadata Language, but can store any XML-based metadata document. Learn more about the software behind KNB


Data Submission Guidelines

Getting Started

Submitting your data to the KNB is easy. Upload your data on the KNB website using a simple online form. The KNB supports ORCID logins.

Upload now

If your project involves large numbers of files (hundreds to thousands), you may want to automate the creation of your metadata. This has been done with tools such as MATLAB and R. Send a note to KNB Help, and we can point you to some examples of scripted metadata creation.
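As a rough illustration of what scripted metadata creation can look like, the Python sketch below walks a directory and writes one entry per file into an XML record. The function name `build_metadata` and its parameters are hypothetical, and the element names are only borrowed from the Ecological Metadata Language for illustration: the output is not a complete or schema-valid EML document, and it is not the KNB's own tooling.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_metadata(title, creator_surname, data_dir):
    """Return an XML string with one entry per file found under data_dir.

    Assumption: element names imitate EML for illustration only; the
    result is not a complete or schema-valid EML document.
    """
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = title

    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname

    # One entity per file, sorted so repeated runs give identical output.
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            entity = ET.SubElement(dataset, "otherEntity")
            relative = path.relative_to(data_dir).as_posix()
            ET.SubElement(entity, "entityName").text = relative
            # Use the file extension as a crude stand-in for the format.
            file_type = path.suffix.lstrip(".") or "unknown"
            ET.SubElement(entity, "entityType").text = file_type

    return ET.tostring(dataset, encoding="unicode")
```

A real record would also need descriptions, units, and methods for each file, which usually come from a lookup table maintained alongside the data rather than from the file system.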

If your dataset is large (>50 GB), please write to KNB Help before publishing.
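To find out whether a dataset crosses the 50 GB threshold before uploading, its total size can be measured locally. The sketch below is one way to do that in Python; the function names are hypothetical, and it assumes 1 GB means 10**9 bytes, which the guideline does not specify.

```python
from pathlib import Path

# Assumption: the 50 GB guideline is interpreted as 50 * 10**9 bytes.
SIZE_LIMIT_BYTES = 50 * 10**9


def dataset_size_bytes(data_dir):
    """Sum the sizes of all regular files under data_dir."""
    return sum(
        path.stat().st_size
        for path in Path(data_dir).rglob("*")
        if path.is_file()
    )


def needs_prior_contact(size_bytes, limit=SIZE_LIMIT_BYTES):
    """Return True when the dataset is larger than the limit."""
    return size_bytes > limit
```

For example, `needs_prior_contact(dataset_size_bytes("my_project"))` would report whether the hypothetical `my_project` directory is large enough to warrant a note to KNB Help.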


Use open source file formats

File formats
Credit: Blugraphic.com

While the KNB supports the upload of any data file format, sharing data can be greatly enhanced if you use ubiquitous, easy-to-read formats. For instance, while Microsoft Excel files are commonplace, it's better to export these spreadsheets to Comma Separated Values (CSV) text files, which can be read on any computer without having Microsoft products installed.
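As a small illustration of why CSV is so portable, the sketch below writes and reads a table using only Python's standard library, with no spreadsheet software involved. The helper names, column names, and values are made up for the example.

```python
import csv


def write_csv(path, header, rows):
    """Write a header row followed by data rows as UTF-8 CSV text."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file back into a list of rows (each a list of strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    # Hypothetical observations, used only to show the round trip.
    write_csv(
        "observations.csv",
        ["site", "date", "count"],
        [["A", "2020-06-01", 12], ["B", "2020-06-01", 7]],
    )
    print(read_csv("observations.csv"))
```

Note that CSV stores every value as text, so numeric types and units need to be described in the metadata rather than inferred from the file.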

For image files, use common formats like PNG, JPEG, and TIFF. Almost all browsers can handle these. If you use specialized software to create your data, try to save your data in well-known formats. For instance, GIS data can be exported to ESRI shapefiles, and data created in MATLAB or other matrix-based programs can be exported as NetCDF (an open binary format).


Organize files

In order to optimally document and share a project’s output, quality data file management is necessary. Some resources for best practices for managing data files can be found at DataONE and in Borer et al. 2009. The following are a few guidelines that are encouraged for file organization for projects that plan to submit to the KNB. Following these guidelines should help ensure a project’s outputs are easy to access and understand.


Write high quality metadata

Metadata are ultimately "data about data": the contextual information needed to interpret a set of raw data observations. They provide meaning to data, and are critical when it comes to sharing, integrating, and analyzing data. Too often people collect data for projects and leave them undocumented for years or decades. These data, while potentially of very high value, can become useless over time due to data entropy.

The KNB primarily stores metadata in structured, XML-based files. Plain text documentation is transformed into a structured metadata format for archiving automatically when you submit through the KNB website. Before submitting documentation to the KNB, we advise projects to create complete, plain text metadata records. Ideally, plans to create and store metadata records should be made during the initial stages of project development (i.e., within the data management plan of the project proposal).

The goal of metadata is to document a project's output so that a reasonable scientist will be able to understand and use every part of the output without any outside consultation. The following is a non-exhaustive list of components typically expected within metadata records submitted to the KNB:


Checking metadata quality

The KNB uses MetaDig to evaluate metadata quality. To evaluate your metadata record for completeness after submitting your dataset, click the "Quality report" button just below the dataset citation on the landing page for your dataset. This button shows the results of a series of checks that the MetaDig quality engine automatically performs on your dataset. We encourage users to take advantage of this automated feedback as a way to improve their dataset documentation.


Identification Guidelines

The KNB uses ORCID iDs to identify individuals associated with each data package. When submitting to the KNB, an ORCID iD is required for the submitter of each data package. ORCID iDs are not required for all associated parties (contacts, additional creators, etc.), but are strongly encouraged, especially for the primary creator, so that proper identification and attribution can be given. Additionally, access to edit each data package can only be granted to individuals who have ORCID iDs. Therefore, we advise researchers to register and record ORCID iDs for each individual involved with the project during the initial stages of project development (i.e., within the data management plan of the project proposal).
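When recording ORCID iDs for many project members, it can help to catch typing errors early. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, and the Python sketch below verifies it. The function names are hypothetical, and the check only confirms that an iD is well formed, not that it is registered or belongs to a particular person.

```python
import re

# Four groups of four characters; the final character may be a digit or X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the iD has the right shape and check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

For instance, `is_well_formed_orcid("0000-0002-1825-0097")`, the sample iD used in ORCID's own documentation, returns `True`, while changing its last digit makes the check fail.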


Publish data with a DOI for long-term stable access.

DOI Example
Publish your data with a DOI and it will display on the metadata page for your dataset.

Assign your data set a Digital Object Identifier (DOI) so that others can cite your data and use the DOI to find the current location(s) of the data.

Because web addresses can change over time, it is important that your data set not be tied to a specific address on the internet. A DOI is an identifier that can be resolved to the multiple locations where a data set might exist, and client tools can then decide which of those copies is the most efficient to access.
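In practice, a DOI is usually followed by turning it into a link to the public resolver at https://doi.org/, which redirects to the data set's current location. The Python sketch below performs that conversion. The function name `doi_to_url` is hypothetical, and the identifier in the usage note is a placeholder rather than a real data set.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Convert a DOI in common written forms into a doi.org resolver URL."""
    bare = doi.strip()
    # Strip the prefixes people commonly write in front of a DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(bare, safe="/")
```

Calling `doi_to_url("doi:10.5063/EXAMPLE")` with this placeholder identifier yields `https://doi.org/10.5063/EXAMPLE`. The citation keeps the same DOI even if the underlying storage location later changes.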

How to assign a DOI to your data

Assigning a DOI using the KNB requires that you have the proper permissions on the data set. Log in to the KNB web interface with your username and password, and then find your data set by clicking on "My Packages" in the user menu. If you are the owner or have been granted management permissions, you will see a "Publish" button, which both makes the data set publicly accessible and assigns a DOI to that particular version of the data set. The DOI is registered with DataCite using the EZID service, and will be discoverable through multiple data citation networks, including DataONE and others.

Learn more about DOIs


Licensing and Data Distribution

Creative Commons License CC0

All data and metadata will be released under either the CC-0 Public Domain Dedication or the Creative Commons Attribution 4.0 International License. In cases where legal (e.g., contractual) or ethical (e.g., human subjects) restrictions to data sharing exist, it is incumbent on the researcher to ensure compliance with all federal, university, and Institutional Review Board policies, so as to not publish sensitive data.

As a repository dedicated to helping researchers increase collaboration and the pace of science, this repository needs certain rights to copy, store, and redistribute data and metadata. By uploading data, metadata, and any other content to the KNB, users warrant that they own any rights to the content and are authorized to do so under copyright or any other right that might pertain to the content. Data and facts themselves are not covered under copyright in the US and most countries, since facts in and of themselves are not eligible for copyright. That said, some associated metadata and some particular compilations of data could potentially be covered by copyright in some jurisdictions.

**By uploading content, users grant the KNB repository and UCSB all rights needed to copy, store, redistribute, and share data, metadata, and any other content. By marking content as publicly available, users grant the KNB repository, UCSB, and any other users the right to copy the content and redistribute it to the public without restriction under the terms of the CC-0 Public Domain Dedication or the Creative Commons Attribution 4.0 International License, depending on which license users choose at the time of upload.**


Data Preservation

Data preservation is critically important to the KNB. We recognize that data preservation is difficult, both for technical and non-technical reasons. We have developed this data preservation plan to be explicit about how the KNB ensures the long-term preservation of the data entrusted to the repository. Key to this plan is our belief that no single organization can possibly provide sufficient institutional stability to guarantee multi-decadal preservation, and that partnerships among committed archives are necessary for successful data longevity. The guiding principles for our preservation plan follow.

Wind-down Plan

In addition to this preservation plan, we recognize that over long time periods spanning many decades, it is extremely difficult to predict and sustain funding for single institutions. Our replication policy ensures high availability during normal operations, but also provides security should investment in data archival wane. Should the main KNB fail to be sustained, the management of the KNB will work with our partnering institutions to ensure that the archival replicas they hold continue to be preserved and available to the scientific community. This will likely mean that another DataONE member node would become the authoritative holder of the data until continued support can be obtained to re-establish operations.

UCSB North Hall Data Center

Primary systems are maintained at the North Hall Data Center (NHDC), which complies with a subset of the Tier 1 ANSI/TIA Data Center Standards. Networking at 10GbE is via redundant connections to the public Internet and Internet2 through the CalREN2 and CENIC networks. Room UPS power backed by an emergency generator is available up to the 162kW capacity of the data center. Primary cooling capacity is derived from the campus chilled water loop. Because the campus chilled water loop is subject to regional power outages, secondary emergency cooling is provided by two locally installed chillers with a total capacity of 60 tons. When the NHDC is on emergency power, the emergency chilled water is used for the UPS room, AHU 5 (campus networking), and chilled water distribution to advanced rack cooling technologies. All racks are mounted on zone 4 ISO-Base platforms for seismic protection. The NHDC is subject to the environmental conditions of the campus and the region. Planned outages involving all equipment within the NHDC will be uncommon, but occasionally necessary for certain types of maintenance activity. During such outages, data and metadata from the KNB will still be available via our replica holdings, but data submissions will be delayed until normal operations are restored.