Element Definitions:
|
eml-physical
|
|
Tooltip:
|
Physical structure. |
|
Summary:
|
Physical structure of an entity or entities. |
|
Description:
|
Physical structure of an entity or entities. This is generally either a
detailed description of a text representation, showing how the columns and
rows of a table are laid out, or simply the name of a well-known binary or
proprietary format (e.g., Microsoft Excel 2000).
|
|
Example:
|
|
|
identifier
|
|
Tooltip:
|
Unique identifier
|
|
Summary:
|
The unique identifier of this metadata file or object.
|
|
Description:
|
The identifier field provides a unique identifier for this
metadata documentation. It will most likely be part of a
sequence of numbers or letters that are meaningful in a
larger context, such as a metadata catalog. That larger
system can be identified in the "system" attribute. Multiple
identifiers can be listed corresponding to different catalog
systems.
|
|
Example:
|
<identifier system="metacat">nceas.3.2</identifier> |
|
format
|
|
Tooltip:
|
File format
|
|
Summary:
|
Contains the name of the format for this file.
|
|
Description:
|
This element contains the name of the file's format.
The format is typically ASCII, Unicode, or some
well-known binary format (e.g., Microsoft Excel 2000).
It may also be given as a MIME type.
|
|
Example:
|
<format>ASCII</format> |
|
characterEncoding
|
|
Tooltip:
|
Character Encoding
|
|
Summary:
|
Contains the name of the character encoding used for the data.
|
|
Description:
|
This element contains the name of the character encoding.
This is typically ASCII or UTF-8, or one of the other common encodings.
|
|
Example:
|
<characterEncoding>UTF-8</characterEncoding> |
|
size
|
|
Tooltip:
|
Entity size
|
|
Summary:
|
Describes the physical size of the entity.
|
|
Description:
|
This element contains information about the physical size
of the entity, typically in bytes.
|
|
Example:
|
<entitySize unit="bytes">13</entitySize> |
|
authentication
|
|
Tooltip:
|
Authentication method
|
|
Summary:
|
A value, typically a checksum, used to authenticate that the bitstream
delivered to the user is identical to the original.
|
|
Description:
|
This element describes authentication procedures or
techniques, typically by giving a checksum method (e.g., MD5) and
checksum value for the bytestream.
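
As a rough illustration of how such a checksum might be used, the sketch below computes an MD5 digest for a byte stream and compares it with the value recorded in the metadata. The function names `md5_checksum` and `verify` are hypothetical and are not part of the schema.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte stream."""
    return hashlib.md5(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Compare delivered bytes against the checksum recorded in the metadata.

    The comparison ignores letter case in the recorded value (an assumption;
    the schema description does not say how the value is normalized).
    """
    return md5_checksum(data) == expected_hex.lower()


# Usage: the recorded checksum would come from the <authentication> element.
delivered = b"May,100aaaa,1.2,\n"
recorded = md5_checksum(delivered)  # stand-in for the value stored in metadata
print(verify(delivered, recorded))
```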
|
|
Example:
|
<authentication method="MD5">f5b2177ea03aea73de12da81f896fe40</authentication>
|
|
compressionMethod
|
|
Tooltip:
|
Entity's compression method
|
|
Summary:
|
Name of the entity's compression method
|
|
Description:
|
This element describes any compression methods used to
compress the entity, such as zip, compress, etc.
|
|
Example:
|
|
|
encodingMethod
|
|
Tooltip:
|
Encoding Method
|
|
Summary:
|
Method used for encoding the entity
|
|
Description:
|
This element describes the entity's encoding method, such as
MIME base64 encoding or binhex encoding.
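
For instance, if the encoding method were base64, a consumer would need to decode the entity before reading it. The following is a minimal sketch using Python's standard `base64` module; the function name `decode_base64_entity` is hypothetical.

```python
import base64


def decode_base64_entity(encoded: bytes) -> bytes:
    """Decode an entity that was stored with MIME base64 encoding."""
    return base64.b64decode(encoded)


# Usage: round-trip a small sample record.
original = b"May,100aaaa,1.2,\n"
encoded = base64.b64encode(original)
print(decode_base64_entity(encoded) == original)
```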
|
|
Example:
|
|
|
numHeaderLines
|
|
Tooltip:
|
Header lines
|
|
Summary:
|
Header lines in the entity
|
|
Description:
|
Number of header lines, or lines of other information, that precede the data.
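
A reader of the entity would typically use this value to skip past the header before parsing records. A minimal sketch, assuming the entity is a text stream that is read line by line (the helper name `data_lines` is hypothetical):

```python
import io
from itertools import islice


def data_lines(stream, num_header_lines):
    """Return an iterator over the lines that follow the header lines."""
    return islice(stream, num_header_lines, None)


# Usage: three header lines followed by two data records.
sample = io.StringIO(
    "site data\ncollected 2001\nmonth,count\nMay,100\nJune,300\n"
)
rows = [line.rstrip("\n") for line in data_lines(sample, 3)]
print(rows)
```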
|
|
Example:
|
<numHeaderLines>3</numHeaderLines> |
|
recordDelimiter
|
|
Tooltip:
|
Record delimiter character
|
|
Summary:
|
Character used to delimit records.
|
|
Description:
|
This element specifies the record delimiter character
when the format is text. The record delimiter is usually a
newline (\n) on UNIX, a carriage return (\r) on classic Mac OS, or
both (\r\n) on Windows/DOS. Multiline records are usually
delimited with two line-ending characters; on UNIX, for example,
this would be two newline characters (\n\n).
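
The sketch below shows one way a reader might split text into records using the declared delimiter. It assumes the delimiter from the metadata has already been converted into the actual control characters; `split_records` is a hypothetical helper name.

```python
from typing import List


def split_records(text: str, record_delimiter: str) -> List[str]:
    """Split text into records on the given record delimiter."""
    records = text.split(record_delimiter)
    # A delimiter after the final record leaves one empty trailing piece.
    if records and records[-1] == "":
        records.pop()
    return records


# Usage: Windows/DOS line endings, then multiline records on UNIX.
print(split_records("May,100\r\nJune,300\r\n", "\r\n"))
print(split_records("line1\nline2\n\nline3\nline4\n\n", "\n\n"))
```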
|
|
Example:
|
<recordDelimiter>\r\n</recordDelimiter> |
|
quoteCharacter
|
|
Tooltip:
|
Quote character
|
|
Summary:
|
Character used to quote values for delimiter escaping
|
|
Description:
|
This element specifies a character used in the entity
to quote values so that field delimiters can appear within
a value. In effect, this allows delimiter "escaping". The
quoteCharacter is typically a " or '.
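
To illustrate the effect of a quote character, the sketch below parses a record with Python's standard `csv` module, where the `quotechar` argument plays the role of quoteCharacter. The helper name `parse_quoted` and the sample records are assumptions for illustration.

```python
import csv


def parse_quoted(lines, field_delimiter=",", quote_character='"'):
    """Parse records in which quoted values may contain the field delimiter."""
    reader = csv.reader(
        lines, delimiter=field_delimiter, quotechar=quote_character
    )
    return list(reader)


# Usage: the comma inside the quoted value is not treated as a delimiter.
print(parse_quoted(['"Smith, J.",42']))
print(parse_quoted(["'a,b',c"], quote_character="'"))
```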
|
|
Example:
|
<quoteCharacter>"</quoteCharacter> |
|
literalCharacter
|
|
Tooltip:
|
Literal character
|
|
Summary:
|
Character used to escape other characters
|
|
Description:
|
This element specifies a character used to escape other
characters, so that the character following it is treated as its literal
value. This allows "escaping" of special characters such as quotes, commas,
and spaces when they are not intended as delimiters. The
literalCharacter is typically a \.
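
As a sketch of how a literal character behaves, the example below again uses Python's `csv` module, this time with `escapechar` standing in for literalCharacter and quoting disabled. The helper name `parse_escaped` and the sample record are assumptions for illustration.

```python
import csv


def parse_escaped(lines, field_delimiter=",", literal_character="\\"):
    """Parse records in which an escape character protects delimiters."""
    reader = csv.reader(
        lines,
        delimiter=field_delimiter,
        escapechar=literal_character,
        quoting=csv.QUOTE_NONE,  # no quote character; rely on escaping only
    )
    return list(reader)


# Usage: the record text is  Smith\, J.,42  (backslash before the comma).
print(parse_escaped(["Smith\\, J.,42"]))
```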
|
|
Example:
|
<literalCharacter>\</literalCharacter> |
|
fieldStartColumn
|
|
Tooltip:
|
Start column
|
|
Summary:
|
The starting column number for a fixed format attribute.
|
|
Description:
|
FixedWidth fields have a set length, thus
the end of the field can always be determined
by adding the fieldWidth to the starting
column number.
|
|
Example:
|
any positive integer; see the example in the "fieldDelimiter" description
|
|
fieldWidth
|
|
Tooltip:
|
Field width
|
|
Summary:
|
FieldWidth specification for fixed field length.
|
|
Description:
|
FixedWidth fields have a set length, thus
the end of the field can always be determined
by adding the fieldWidth to the starting
column number.
|
|
Example:
|
any positive integer; see the example in the "fieldDelimiter"
description
|
|
fieldDelimiter
|
|
Tooltip:
|
Attribute delimiter
|
|
Summary:
|
The end of the attribute (field) is delimited by a
special character called a field delimiter.
|
|
Description:
|
variableWidth fields (attributes) can vary in their field length, so the
end of each field is delimited by a special character called a field
delimiter (typically a comma, a space, or a tab character). fixedWidth
fields have a set length, so the end of the field can always be determined
by adding the fieldWidth to the starting column number.

Data sets are generally classified as fixedWidth format or variableWidth
format, but we have determined that this is actually a per-field
classification, because one may encounter fixedWidth fields mixed together
in the same data file with variableWidth fields.

In our encoding scheme, the start of each field is assumed to be the column
after the last column of the previous field (or the first column if this is
the first field in the data set), unless the starting column is explicitly
given using the "fieldStartColumn" element. The end column of each field is
determined either by a special delimiter character, indicated using the
"fieldDelimiter" element, or by a fixed field length, indicated using the
"fieldWidth" element. The delimiter for the last field in the data set can
be omitted.

Here is an example. Assume we have the following data in a data set:
May,100aaaa,1.2,
April,200aaaa,3.4,
June,300bbbb,4.6,
The metadata indicating the physical layout of the 4 fields would include the
following:
<delimiter>,</delimiter>
<fieldWidth>3</fieldWidth>
<fieldWidth>3</fieldWidth>
<delimiter>,</delimiter>
In a strictly fixed format file, the metadata would be slightly different:
May100aaaa1.2
Apr200aaaa3.4
Jun300bbbb4.6
<fieldWidth>3</fieldWidth>
<fieldWidth>3</fieldWidth>
<fieldWidth>4</fieldWidth>
<fieldWidth>3</fieldWidth>
or, one could explicitly describe the starting columns:
<fieldStartColumn>1</fieldStartColumn>
<fieldWidth>3</fieldWidth>
<fieldStartColumn>4</fieldStartColumn>
<fieldWidth>3</fieldWidth>
<fieldStartColumn>7</fieldStartColumn>
<fieldWidth>4</fieldWidth>
<fieldStartColumn>11</fieldStartColumn>
<fieldWidth>3</fieldWidth>
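
The layout rules above can be sketched in code. The following is a minimal illustration, not a reference implementation: `FieldSpec` and `split_record` are hypothetical names, each field is assumed to carry either a delimiter or a width, and the mixed record in the usage section is a made-up variant rather than the one shown above.

```python
from typing import List, NamedTuple, Optional


class FieldSpec(NamedTuple):
    """Physical layout of one field (hypothetical helper type)."""
    delimiter: Optional[str] = None     # corresponds to a field delimiter
    width: Optional[int] = None         # corresponds to fieldWidth
    start_column: Optional[int] = None  # fieldStartColumn, counted from 1


def split_record(record: str, specs: List[FieldSpec]) -> List[str]:
    """Split one record into field values according to the field specs."""
    values = []
    pos = 0  # 0-based index of the next unread character
    for spec in specs:
        if spec.start_column is not None:
            pos = spec.start_column - 1  # explicit start overrides position
        if spec.width is not None:
            end = pos + spec.width
            values.append(record[pos:end])
            pos = end
        elif spec.delimiter is not None:
            end = record.find(spec.delimiter, pos)
            if end == -1:
                # The delimiter of the last field may be omitted.
                values.append(record[pos:])
                pos = len(record)
            else:
                values.append(record[pos:end])
                pos = end + len(spec.delimiter)
        else:
            raise ValueError("each field needs a delimiter or a width")
    return values


# Usage: strictly fixed layout, with and without explicit start columns.
fixed = [FieldSpec(width=3), FieldSpec(width=3),
         FieldSpec(width=4), FieldSpec(width=3)]
explicit = [FieldSpec(width=3, start_column=1),
            FieldSpec(width=3, start_column=4),
            FieldSpec(width=4, start_column=7),
            FieldSpec(width=3, start_column=11)]
print(split_record("May100aaaa1.2", fixed))
print(split_record("Apr200aaaa3.4", explicit))

# Usage: a made-up record mixing delimited and fixed-width fields.
mixed = [FieldSpec(delimiter=","), FieldSpec(width=3),
         FieldSpec(width=4), FieldSpec(delimiter=",")]
print(split_record("May,100aaaa1.2,", mixed))
```

Tracking a single running position keeps the per-field rule simple: a field starts where the previous one ended unless an explicit start column says otherwise.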
|
|
Example:
|
comma, tab, white space, etc.
|
|
Attribute Definitions:
|
system
|
|
Tooltip:
|
Catalog system
|
|
Summary:
|
The catalog system in which this identifier is used.
|
|
Description:
|
This attribute gives the name of the catalog system in which
this identifier is used. It is useful for determining the
scope of the identifier and the semantics
of the various subparts of the identifier. Unresolved issue:
can or should this be a URI/URL pointing to the catalog
system, or just the name?
|
|
Example:
|
<identifier system="metacat">nceas.3.2</identifier> |
|
unit
|
|
Tooltip:
|
Unit of measurement
|
|
Summary:
|
Unit of measurement for the entity size, typically bytes
|
|
Description:
|
This attribute gives the unit of measurement for the
size of the entity, and is typically bytes.
|
|
Example:
|
<entitySize unit="bytes">13</entitySize> |
|
method
|
|
Tooltip:
|
Authentication method
|
|
Summary:
|
The method used to calculate an authentication checksum.
|
|
Description:
|
This attribute names the method used to calculate an
authentication checksum that can be used to validate a
bytestream. Typical checksum methods include MD5 and CRC.
|
|
Example:
|
<authentication method="MD5">f5b2177ea03aea73de12da81f896fe40</authentication>
|
|