MORPHO - Data and Metadata

    MORPHO is a tool constructed to help ecologists create and manage ecological data, and to locate and use related data from other sources. Data can be stored in any of a large variety of forms, including simple text files, spreadsheets, custom binary files, or even handwritten notebooks. Usually, however, it is assumed that the data is stored in electronic form on a computer. That computer can be the one MORPHO is running on, or another computer connected to it over a local area network or the internet.

    Metadata is loosely defined as "data about data". It might be something as simple as the text at the top of a column in a spreadsheet that identifies what kind of information is stored in the column, or it might be a description of the overall history of a collection of hundreds of individual data tables along with references to related efforts. The distinction between data and metadata is not always well defined; in MORPHO, metadata is thought of as any information used to find datasets of interest to the ecologist. MORPHO is based on the assumption that there is some collection of metadata which can be searched (queried) to locate datasets of interest. This metadata collection can be either on the local machine or somewhere else on the network. The metadata should 'point to' the data, which is either 'on-line' or available through some reference in the metadata (e.g. some person or organization).

    Metadata can occur in a variety of forms and formats. It might be as simple as the name of a data file and a few keywords describing the data in that file. In this case, creating the metadata would require the user to fill in a few named fields (the term "field" refers to some value, usually a name/value pair, in the metadata collection); e.g.

data_file_name = xxxxx
keyword1 = yyyyy
keyword2 = zzzzz

    In this simple example, metadata is defined in terms of named parameters (e.g. "data_file_name"), each of which has some specific value (e.g. "xxxxx"). Thus metadata documents often have a predefined structure (which defines the set of values or fields that must be supplied), while specific instances of these documents have specific values filled in, with one instance for each dataset being described. The simplest metadata document can be thought of as a simple form or table. The number and names of the fields in the form (or columns in the table) define the structure, while the information actually entered into the text fields on the form (cells in the table) creates a specific instance of the metadata document. Other types of metadata may be more complicated; in particular, the various fields or items may be arranged not in a 'flat' list but in a structured hierarchy (an 'outline' type structure).
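    The distinction between a document's structure and a specific instance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names come from the example above, and REQUIRED_FIELDS and make_instance are hypothetical names, not part of MORPHO.

```python
# Hypothetical sketch: the "structure" is the list of field names,
# an "instance" is one set of values filled in for a single dataset.
REQUIRED_FIELDS = ("data_file_name", "keyword1", "keyword2")


def make_instance(**values):
    """Return one metadata instance after checking it against the structure."""
    missing = [name for name in REQUIRED_FIELDS if name not in values]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the fields the structure defines, in the defined order.
    return {name: values[name] for name in REQUIRED_FIELDS}


record = make_instance(data_file_name="xxxxx", keyword1="yyyyy", keyword2="zzzzz")
```

    Each additional dataset would get its own call to make_instance, while the structure itself stays fixed.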

    MORPHO contains modules for creating and editing metadata, searching metadata, and organizing/viewing certain kinds of data. MORPHO is designed to handle many different types of metadata documents (called document types, or 'doctypes'). The only requirement is that the metadata be placed in an XML format. XML can be used to describe any hierarchical data structure (i.e. an 'outline' or 'tree' organization); one can thus encode almost any type of metadata in XML. XML structure can be specified using DTDs (document type definitions) or XML schemas. Thus, specific types of metadata documents can be specified using these general purpose tools. (Similarly, general purpose XML tools can transform one document type into another - another reason for XML.)
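    As a minimal sketch of how hierarchical metadata might be encoded in XML, the following Python fragment builds a small tree with the standard library. The element names (dataset, keywords, keyword) are invented for illustration and are not taken from EML or any MORPHO doctype.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; a real doctype would be defined by a DTD or schema.
dataset = ET.Element("dataset")
ET.SubElement(dataset, "data_file_name").text = "xxxxx"

# Grouping the keywords under one parent turns the flat list into a hierarchy.
keywords = ET.SubElement(dataset, "keywords")
ET.SubElement(keywords, "keyword").text = "yyyyy"
ET.SubElement(keywords, "keyword").text = "zzzzz"

# Serialize the tree to a plain string (no XML declaration).
xml_text = ET.tostring(dataset, encoding="unicode")
```

    Here the two keywords become children of a single 'keywords' node, which is the kind of 'outline' organization that a flat name/value list cannot express.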

    MORPHO will work with any metadata in XML format. For example, the "Quick Search" function on the toolbar simply searches all metadata entries for the supplied text. In this case the metadata structure and field names are ignored. On the other hand, the structure of the metadata can be used explicitly in a search. For example, one can search for an author's name specifically in the "Authors" field or a keyword only in the "Keyword" field.
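    The difference between the two kinds of search can be illustrated with a short Python sketch. This is not how MORPHO implements searching; the sample document, its element names, and the functions quick_search and field_search are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; element names are illustrative only.
DOC = """<dataset>
  <Authors>Smith</Authors>
  <Keyword>lake</Keyword>
  <Abstract>Smith Lake temperature survey</Abstract>
</dataset>"""


def quick_search(root, text):
    """Match text anywhere in the document, ignoring structure and field names."""
    needle = text.lower()
    return any(needle in (el.text or "").lower() for el in root.iter())


def field_search(root, field, text):
    """Match text only inside elements whose name equals the given field."""
    needle = text.lower()
    return any(needle in (el.text or "").lower() for el in root.iter(field))


root = ET.fromstring(DOC)
found_anywhere = quick_search(root, "temperature")
found_in_keyword = field_search(root, "Keyword", "temperature")
found_in_authors = field_search(root, "Authors", "smith")
```

    In this sample, "temperature" is found by the unstructured search because it appears in the abstract, but not by a search restricted to the "Keyword" field.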

    Such searches require some knowledge of how the metadata is structured (e.g. the names of its fields). A partial screen display from MORPHO is shown below to illustrate how one can examine the structure of a metadata doctype. The document types that the system 'knows about' are shown in the list on the left. Selecting one of these document types displays the structure of that doctype in the 'outline' to its immediate right. (The structure shown is created at runtime, either from a locally stored DTD or schema document or from data in the central metadata server database.) The outline shows the names of the fields in this doctype and where each field is located in the data hierarchy. The user can thus select the field to be searched from the outline.
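    As a text-only analogue of such an outline (separate from the screen display itself), the following Python sketch lists the field names of an XML document with indentation showing their place in the hierarchy. Note the assumption: MORPHO builds its outline from a DTD, schema, or the server database, whereas this sketch simply walks a sample instance document, and the function name outline is hypothetical.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the element names as indented lines, one per node in the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical sample document; the names are illustrative only.
sample = ET.fromstring("<eml><dataset><title>t</title></dataset><access/></eml>")
outline_lines = outline(sample)
print("\n".join(outline_lines))
```

    Each level of indentation corresponds to one level in the data hierarchy, so a field's position in the listing shows which parent it belongs to.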

    EML (Ecological Metadata Language) is a set of metadata document types specifically designed as a comprehensive set of metadata structures for describing ecological data. EML is divided into several (currently 8) separate document types. EML is currently being revised to better agree with other metadata standards; the number, names, and structural details of individual modules are thus subject to change. Nevertheless, a current set of these document types is included with MORPHO for use in editing and querying. In addition, the user can add other document types.

    The Demo Editor can also be used to display ALL EML documents at one time by clicking the "Open" button at the lower left and then opening the catalog/emltemplate.xml file. The result is shown below. The root node in the outline is 'eml'. It has eight 'children', one for each of the EML DTDs. Note that this document shows one of each possible node. Some nodes are actually mutually exclusive, while others may be duplicated many times in actual metadata documents.