Module Documentation: eml-physical
The eml-physical module defines the structural characteristics of data formats as delivered over the wire or as found in a file system. One physical object (a bytestream or an object in a file system) might contain multiple entities; for example, a Microsoft Access file typically contains several tables of data. However, the module is most often used to describe a file or stream in a text-based format such as ASCII or UTF-8, and it includes the information needed to parse the data stream and extract the entity and its attributes from it.

Element Definitions:

eml-physical
Content of this field: Description of this field:
Element | Required? | How many
A sequence of (
    identifier | Optional | Multiple Times
    format | Optional | Multiple Times
    characterEncoding | Optional | Multiple Times
    size | Optional | Multiple Times
    authentication | Optional | Multiple Times
    compressionMethod | Optional | Multiple Times
    encodingMethod | Optional | Multiple Times
    numHeaderLines | Optional | Multiple Times
    recordDelimiter | Optional | Multiple Times
    maxRecordLength | Optional | Multiple Times
    quoteCharacter | Optional | Multiple Times
    literalCharacter | Optional | Multiple Times
    A sequence of (
        fieldStartColumn | Optional | Multiple Times
        A choice of (
            fieldWidth | Optional | Multiple Times
            OR
            fieldDelimiter | Optional | Multiple Times
        )
    )
)
Attributes: Required?: Default Value:

Tooltip:
Physical structure.
Summary:
Physical structure of an entity or entities.
Description:
Physical structure of an entity or entities. This generally is a detailed description of a text representation that shows how the columns and rows of a table are represented, or simply the name of a well-known binary or proprietary format (e.g., Microsoft Excel 2000).
Example:

Lineage:
The eml-physical module was introduced into EML 1.4 as eml-file.
identifier
Content of this field: Description of this field:
Elements: Required?: How many:
Attributes: Required?: Default Value:

Tooltip:
Unique identifier
Summary:
The unique identifier of this metadata file or object.
Description:
The identifier field provides a unique identifier for this metadata documentation. It will most likely be part of a sequence of numbers or letters that are meaningful in a larger context, such as a metadata catalog. That larger system can be identified in the "system" attribute. Multiple identifiers can be listed corresponding to different catalog systems.
Example:
<identifier system="metacat">nceas.3.2</identifier>
Lineage:
The 'identifier' field is derived from the eml-dataset meta_file_id field in EML 1.4.
format
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
File format
Summary:
Contains the name of the format for this file.
Description:
This element contains the name of the file's format, which is typically ASCII, Unicode, or some well-known binary format (e.g., Microsoft Excel 2000). The value could also be a MIME type.
Example:
<format>ASCII</format>
Lineage:
The format element was introduced into EML 1.4.
characterEncoding
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Character Encoding
Summary:
Contains the name of the character encoding used for the data.
Description:
This element contains the name of the character encoding. This is typically ASCII or UTF-8, or one of the other common encodings.
Example:
<characterEncoding>UTF-8</characterEncoding>
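A consumer would normally apply the characterEncoding value before parsing any text. The sketch below assumes the value is an encoding name that Python's codecs registry recognizes (such as UTF-8 or ASCII); the function name decode_entity is hypothetical and not part of EML.

```python
import codecs


def decode_entity(raw: bytes, character_encoding: str) -> str:
    """Decode an entity's raw bytes using the name recorded in characterEncoding.

    codecs.lookup raises LookupError for an encoding name Python does not
    know, so an unsupported encoding is reported before any parsing starts.
    """
    codec_info = codecs.lookup(character_encoding.strip())
    # Use the codec's canonical name (e.g. "utf-8") for the actual decode.
    return raw.decode(codec_info.name)


raw_bytes = "Año,1.2\n".encode("utf-8")
print(decode_entity(raw_bytes, "UTF-8"))
```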
Lineage:
Introduced in EML 2.0
size
Content of this field: Description of this field:
Elements: Required?: How many:
Attributes: Required?: Default Value:

Tooltip:
Entity size
Summary:
Describes the physical size of the entity.
Description:
This element contains information about the physical size of the entity, typically in bytes.
Example:
<size unit="bytes">13</size>
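Because the size is recorded in bytes rather than characters, a recipient can check it directly against the file on disk. This is a minimal sketch assuming the unit is "byte" or "bytes"; the function size_matches is a hypothetical name, and other units are deliberately rejected rather than converted.

```python
import os
import tempfile


def size_matches(path: str, declared_size: int, unit: str = "bytes") -> bool:
    """Compare a file's on-disk size with the value recorded in the size element.

    Only byte units are handled in this sketch; other unit names would need
    a conversion table, which is not shown.
    """
    if unit.strip().lower() not in ("byte", "bytes"):
        raise ValueError(f"unsupported size unit: {unit!r}")
    return os.path.getsize(path) == declared_size


# 11 visible characters plus a two-byte \r\n record delimiter = 13 bytes.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"May,100,1.2\r\n")
    sample_path = handle.name
try:
    print(size_matches(sample_path, 13))  # True
finally:
    os.remove(sample_path)
```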
Lineage:
The entitySize was introduced into EML 1.4.
authentication
Content of this field: Description of this field:
Elements: Required?: How many:
Attributes: Required?: Default Value:

Tooltip:
Authentication method
Summary:
A value, typically a checksum, used to authenticate that the bitstream delivered to the user is identical to the original.
Description:
This element describes authentication procedures or techniques, typically by giving a checksum method (e.g., MD5) and checksum value for the bytestream.
Example:
<authentication method="MD5">f5b2177ea03aea73de12da81f896fe40</authentication>
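Verifying the authentication value amounts to recomputing the checksum over the delivered bytestream and comparing it with the recorded one. The sketch below is illustrative only: verify_checksum is a hypothetical name, the method attribute is assumed to be a name that Python's hashlib accepts (such as MD5 or SHA256), and CRC methods, which hashlib does not provide, are not covered.

```python
import hashlib
import io
from typing import BinaryIO


def verify_checksum(stream: BinaryIO, method: str, expected: str,
                    chunk_size: int = 65536) -> bool:
    """Recompute a checksum over a bytestream and compare it with the recorded value."""
    # hashlib.new raises ValueError for a method name it does not support.
    digest = hashlib.new(method.lower())
    # Read in chunks so large entities need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    # Compare case-insensitively, since metadata may record either case.
    return digest.hexdigest() == expected.strip().lower()


print(verify_checksum(io.BytesIO(b"abc"), "MD5",
                      "900150983cd24fb0d6963f7d28e17f72"))  # True
```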
Lineage:
The authentication element was introduced into EML 1.4.
compressionMethod
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Entity's compression method
Summary:
Name of the entity's compression method
Description:
This element describes any compression methods used to compress the entity, such as zip, compress, etc.
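A reader could dispatch on the compressionMethod value to choose a decompressor. In this sketch the DECOMPRESSORS table and its key spellings ("gzip", "bzip2", "xz") are assumptions, since this module does not fix a vocabulary; zip is an archive format that would need the zipfile module instead, and the Unix compress (.Z) format has no standard-library decoder, so neither is mapped here.

```python
import bz2
import gzip
import lzma

# Hypothetical mapping from compressionMethod values to stdlib decompressors.
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "xz": lzma.decompress,
}


def decompress_entity(data: bytes, compression_method: str) -> bytes:
    """Decompress an entity according to its (assumed) compressionMethod name."""
    try:
        decompress = DECOMPRESSORS[compression_method.strip().lower()]
    except KeyError:
        raise ValueError(
            f"no decompressor registered for {compression_method!r}") from None
    return decompress(data)


packed = gzip.compress(b"May,100,1.2\n")
print(decompress_entity(packed, "gzip"))
```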
Example:

Lineage:
The compressed element was introduced into EML 1.4.
encodingMethod
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Encoding Method
Summary:
Method used for encoding the entity
Description:
This element describes the method used to encode the entity, such as MIME base64 encoding or binhex encoding.
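For base64, undoing the encoding is a single standard-library call once line breaks are removed. This is a minimal sketch: decode_entity_payload is a hypothetical name, the accepted spellings "base64" and "MIME base64" are assumptions, and binhex is intentionally not handled.

```python
import base64


def decode_entity_payload(payload: str, encoding_method: str) -> bytes:
    """Undo a transfer encoding named in encodingMethod (base64 only in this sketch)."""
    method = encoding_method.strip().lower()
    if method not in ("base64", "mime base64"):
        # binhex and other methods are not handled here.
        raise ValueError(f"unsupported encodingMethod: {encoding_method!r}")
    # MIME base64 wraps long lines, so drop all whitespace before strict decoding.
    compact = "".join(payload.split())
    return base64.b64decode(compact, validate=True)


encoded = base64.b64encode(b"May,100,1.2\n").decode("ascii")
print(decode_entity_payload(encoded, "base64"))
```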
Example:

Lineage:
The encoded element was introduced into EML 1.4.
numHeaderLines
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Header lines
Summary:
Number of header lines in the entity
Description:
The number of lines of header information that precede the data in the entity.
Example:
<numHeaderLines>3</numHeaderLines>
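A parser can use numHeaderLines to skip the header block before reading records. The sketch assumes the header is counted in whole lines at the start of the stream; data_lines is a hypothetical helper name. Because the element is typed xs:string in this module, the value is converted explicitly.

```python
import io
from itertools import islice
from typing import Iterator, TextIO


def data_lines(stream: TextIO, num_header_lines: str) -> Iterator[str]:
    """Return an iterator over the lines that follow the header block."""
    # numHeaderLines is declared as xs:string here, so convert it explicitly.
    count = int(num_header_lines)
    if count < 0:
        raise ValueError("numHeaderLines must not be negative")
    # islice discards the first `count` lines and yields everything after them.
    return islice(stream, count, None)


sample = "site: A\nunits: mm\nmonth,value\nMay,1.2\nJune,4.6\n"
print(list(data_lines(io.StringIO(sample), "3")))  # ['May,1.2\n', 'June,4.6\n']
```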
Lineage:
The numHeaderLines element was introduced into EML 1.4.
recordDelimiter
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Record delimiter character
Summary:
Character used to delimit records.
Description:
This element specifies the record delimiter character when the format is text. The record delimiter is usually a newline (\n) on UNIX, a carriage return (\r) on classic Mac OS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line-ending characters; for example, on UNIX it would be two newline characters (\n\n).
Example:
<recordDelimiter>\r\n</recordDelimiter>
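Since the metadata spells control characters as backslash escapes, a reader first has to translate the declared value into real characters before splitting. This sketch assumes only the escapes \n, \r, \t and \\ appear; unescape_delimiter and split_records are hypothetical names.

```python
import re
from typing import List

# Backslash escapes assumed to appear in recordDelimiter values.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def unescape_delimiter(value: str) -> str:
    """Turn text such as \\r\\n, as written in the metadata, into real control characters."""
    return re.sub(r"\\([nrt\\])", lambda m: _ESCAPES[m.group(1)], value)


def split_records(text: str, record_delimiter: str) -> List[str]:
    """Split an entity's text into records using the declared delimiter."""
    delimiter = unescape_delimiter(record_delimiter)
    records = text.split(delimiter)
    # A delimiter after the final record leaves an empty trailing piece; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


print(split_records("May,1.2\r\nJune,4.6\r\n", r"\r\n"))  # ['May,1.2', 'June,4.6']
```

The same function handles the multiline case described above: declaring \n\n keeps single newlines inside a record.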
Lineage:
The recordDelimiter element was introduced into EML 1.4.
maxRecordLength
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:
quoteCharacter
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Quote character
Summary:
Character used to quote values so that delimiters can be escaped
Description:
This element specifies a character used in the entity to quote values, so that field delimiters can appear within a value. In effect, this allows delimiters to be "escaped". The quoteCharacter is typically " or '.
Example:
<quoteCharacter>"</quoteCharacter>
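Python's csv module implements this kind of quoting, so a quoteCharacter value can be passed straight to it as quotechar. This is a sketch assuming single-character delimiter and quote values; read_quoted is a hypothetical helper name.

```python
import csv
import io
from typing import List


def read_quoted(text: str, field_delimiter: str = ",",
                quote_character: str = '"') -> List[List[str]]:
    """Parse delimited text in which quote_character protects embedded delimiters."""
    reader = csv.reader(io.StringIO(text), delimiter=field_delimiter,
                        quotechar=quote_character)
    return [row for row in reader]


print(read_quoted('site,"Smith, J.",1.2\n'))  # [['site', 'Smith, J.', '1.2']]
```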
Lineage:
The quoteCharacter element was taken from the NBII standard.
literalCharacter
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Literal character
Summary:
Character used to escape other characters
Description:
This element specifies a character used to escape the character that follows it, so that the following character is treated as its literal value. This allows "escaping" of special characters like quotes, commas, and spaces when they aren't intended as delimiters. The literalCharacter is typically a \.
Example:
<literalCharacter>\</literalCharacter>
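In Python's csv module the equivalent setting is escapechar. The sketch below turns quoting off (csv.QUOTE_NONE) so that only the literal character has a special meaning; this is one possible reading, assuming the entity does not also use a quoteCharacter, and read_escaped is a hypothetical helper name.

```python
import csv
import io
from typing import List


def read_escaped(text: str, field_delimiter: str = ",",
                 literal_character: str = "\\") -> List[List[str]]:
    """Parse delimited text where literal_character escapes the next character."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=field_delimiter,
        escapechar=literal_character,
        # QUOTE_NONE: quote characters get no special meaning; only the escape does.
        quoting=csv.QUOTE_NONE,
    )
    return [row for row in reader]


print(read_escaped("Smith\\, J.,1.2\n"))  # [['Smith, J.', '1.2']]
```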
Lineage:
Introduced in EML 2.0.
fieldStartColumn
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Start column
Summary:
The starting column number for a fixed format attribute.
Description:
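The column, counting from 1, in which a field begins. If it is omitted, the field starts in the column after the last column of the previous field. fixedWidth fields have a set length, so a field occupies fieldWidth columns beginning at its starting column.
Example:
Any positive integer; see the example in the fieldDelimiter description.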
Lineage:
Introduced into EML 2.0.
fieldWidth
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Field width
Summary:
FieldWidth specification for fixed field length.
Description:
fixedWidth fields have a set length, so the end of the field can always be determined from the starting column number and the fieldWidth: the field occupies fieldWidth columns beginning at its starting column.
Example:
Any positive integer; see the example in the fieldDelimiter description.
Lineage:
The fieldWidth element was introduced into EML 1.4. Semantics changed to work identically to the NBII DTD.
fieldDelimiter
Content of this field: Description of this field:
Type: xs:string
Attributes: Required?: Default Value:

Tooltip:
Attribute delimiter
Summary:
The end of the attribute (field) is delimited by a special character called a field delimiter.
Description:
Variable-width fields (attributes) can vary in length, so the end of each such field is marked by a special character called a field delimiter, usually a comma, a space, or a tab. Data sets are generally classified as either fixedWidth or variableWidth format, but we have determined that this is actually a per-field classification, because fixedWidth fields may be mixed with variableWidth fields in the same data file. In our encoding scheme, each field is assumed to start in the column after the last column of the previous field (or in the first column, for the first field in the data set), unless its starting column is given explicitly in the "fieldStartColumn" element. The end of each field is determined either by a delimiter character, given in the fieldDelimiter element, or by a fixed field length, given in the "fieldWidth" element; a fixedWidth field therefore ends fieldWidth columns after it starts. The delimiter for the last field in the data set can be omitted.
Here is an example. Assume a data set contains the following rows:
    May,100aaaa,1.2,
    April,200aaaa,3.4,
    June,300bbbb,4.6,
The metadata describing the physical layout of the 4 fields would include the following:
    <fieldDelimiter>,</fieldDelimiter>
    <fieldWidth>3</fieldWidth>
    <fieldDelimiter>,</fieldDelimiter>
    <fieldDelimiter>,</fieldDelimiter>
In a strictly fixed-format file, the data and metadata would be slightly different:
    May100aaaa1.2
    Apr200aaaa3.4
    Jun300bbbb4.6
    <fieldWidth>3</fieldWidth>
    <fieldWidth>3</fieldWidth>
    <fieldWidth>4</fieldWidth>
    <fieldWidth>3</fieldWidth>
Alternatively, the starting columns can be described explicitly:
    <fieldStartColumn>1</fieldStartColumn>
    <fieldWidth>3</fieldWidth>
    <fieldStartColumn>4</fieldStartColumn>
    <fieldWidth>3</fieldWidth>
    <fieldStartColumn>7</fieldStartColumn>
    <fieldWidth>4</fieldWidth>
    <fieldStartColumn>11</fieldStartColumn>
    <fieldWidth>3</fieldWidth>
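The start and end rules for fields can be sketched as a small record parser. This is an illustration under stated assumptions, not part of EML: FieldLayout and parse_record are hypothetical names, column numbers are 1-based as in the fieldStartColumn example, and a delimited field is assumed to consume its delimiter so that the next field starts in the following column. Under that reading, a row such as May,100aaaa,1.2, is described by a delimiter, a width of 3, and two more delimiters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldLayout:
    """Hypothetical per-field description; set exactly one of width or delimiter."""
    start_column: Optional[int] = None  # 1-based, mirroring fieldStartColumn
    width: Optional[int] = None  # mirrors fieldWidth
    delimiter: Optional[str] = None  # mirrors fieldDelimiter


def parse_record(record: str, layout: List[FieldLayout]) -> List[str]:
    """Split one record into field values following the start/end rules."""
    values: List[str] = []
    pos = 0  # 0-based index of the first unread character
    for field in layout:
        if field.start_column is not None:
            # An explicit start column overrides the "next column" default.
            pos = field.start_column - 1
        if field.width is not None:
            end = pos + field.width
            values.append(record[pos:end])
            pos = end
        elif field.delimiter is not None:
            end = record.find(field.delimiter, pos)
            if end == -1:
                # The delimiter after the last field may be omitted.
                end = len(record)
            values.append(record[pos:end])
            # Assumption: the delimiter belongs to this field, so skip past it.
            pos = end + len(field.delimiter)
        else:
            raise ValueError("each field needs either a width or a delimiter")
    return values


mixed = [FieldLayout(delimiter=","), FieldLayout(width=3),
         FieldLayout(delimiter=","), FieldLayout(delimiter=",")]
print(parse_record("April,200aaaa,3.4,", mixed))  # ['April', '200', 'aaaa', '3.4']

fixed = [FieldLayout(start_column=s, width=w)
         for s, w in [(1, 3), (4, 3), (7, 4), (11, 3)]]
print(parse_record("Jun300bbbb4.6", fixed))  # ['Jun', '300', 'bbbb', '4.6']
```

Keeping the layout per field, rather than per file, matches the point above that fixedWidth and variableWidth fields can be mixed in one data file.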
Example:
comma, tab, white space, etc.
Lineage:
The delimiter element was introduced into EML 1.4. Semantics changed to work identically to the NBII DTD, and then modified to fit more cases.

Attribute Definitions:

system

Type: xs:string

Use: optional


Tooltip:
Catalog system
Summary:
The catalog system in which this identifier is used.
Description:
This attribute gives the name of the catalog system in which this identifier is used. It is useful for determining the scope of the identifier and the semantics of its various subparts. Unresolved issue: can or should this be a URI/URL pointing to the catalog system, or just the name?
Example:
<identifier system="metacat">nceas.3.2</identifier>
Lineage:
New to EML 2.0.
unit

Use: required


Tooltip:
Unit of measurement
Summary:
Unit of measurement for the entity size, typically bytes
Description:
This attribute gives the unit of measurement for the size of the entity, typically bytes.
Example:
<size unit="bytes">13</size>
Lineage:
The unit was introduced into EML 1.4.
method

Type: xs:string

Use: optional


Tooltip:
Authentication method
Summary:
The method used to calculate an authentication checksum.
Description:
This attribute names the method used to calculate an authentication checksum that can be used to validate a bytestream. Typical checksum methods include MD5 and CRC.
Example:
<authentication method="MD5">f5b2177ea03aea73de12da81f896fe40</authentication>
Lineage:
The authentication element was introduced into EML 1.4.

Complex Type Definitions:

Simple Type Definitions:

Web Contact: jones@nceas.ucsb.edu