Packages and Relationships


Metacat allows a user to create a virtual link between XML documents within the system. These links are called Relationships and are defined in a Package file. A relationship can be defined between two XML files in Metacat, between a file in Metacat and an external URL (e.g., a web page), or from a Metacat XML file to a data file that exists on the Metacat system. The following is an example of a package file.

    <?xml version="1.0"?>
    <!DOCTYPE package PUBLIC "-//NCEAS//package//EN" 
                                "http://dev.nceas.ucsb.edu/dtd/schemas/package.dtd">
    <package>
    
     <relation>
      <subject>Metacat://server.domain.com/Metacat?docid=nceas.17</subject>
      <relationship>isfilemetadatafor</relationship>
      <object>http://www.domain.com/data/somedatafile.html</object>
     </relation>
     <relation>
      <subject>Metacat://server.domain.com/Metacat?docid=nceas.18</subject>
      <relationship>isresourcemetadatafor</relationship>
      <object>http://www.domain.com/data/somedatafile.html</object>
     </relation>
     <relation>
      <subject>Metacat://server.domain.com/Metacat?docid=nceas.22</subject>
      <relationship>isvariablemetadatafor</relationship>
      <object>http://www.domain.com/data/somedatafile.html</object>
     </relation>
     
     <relation>
      <subject>Metacat://server.domain.com/Metacat?docid=nceas.52</subject>
      <relationship>iscometadatafor</relationship>
      <object>Metacat://server2.otherdomain.com/Metacat?docid=xyz.10</object>
     </relation>
     
     <relation>
      <subject>Metacat://server.domain.com/Metacat?docid=nceas.99</subject>
      <relationship>iscometadatafor</relationship>
      <object>Metacat://server.domain.com/Metacat?docid=nceas.101</object>
     </relation>
     
    </package>
  
Description of the Package File

Note that the doctype of this document is an unregistered, NCEAS-specific DTD (-//NCEAS//package//EN). The package doctype is an application property of Metacat. Setting this property (and others) is described in Setting Metacat Properties. The package file is broken up into n relations. Each relation has a subject, a relationship, and an object. This grouping can be read as follows: <subject> has <relationship> to <object>. Each relation group is a logical link between the subject and the object, with the relationship describing that link.
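The subject/relationship/object reading can be sketched in a few lines of Python. This is only an illustration that uses the standard library XML parser, not Metacat's own code: the function name read_relations is hypothetical, the sample is shortened to two relations, and the DOCTYPE line is left out so that the snippet stays self-contained.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample package file (DOCTYPE omitted for brevity).
PACKAGE_XML = """<?xml version="1.0"?>
<package>
 <relation>
  <subject>Metacat://server.domain.com/Metacat?docid=nceas.17</subject>
  <relationship>isfilemetadatafor</relationship>
  <object>http://www.domain.com/data/somedatafile.html</object>
 </relation>
 <relation>
  <subject>Metacat://server.domain.com/Metacat?docid=nceas.99</subject>
  <relationship>iscometadatafor</relationship>
  <object>Metacat://server.domain.com/Metacat?docid=nceas.101</object>
 </relation>
</package>
"""


def read_relations(package_xml):
    """Return each <relation> as a (subject, relationship, object) tuple.

    Hypothetical helper for illustration; not part of Metacat.
    """
    root = ET.fromstring(package_xml)
    triples = []
    for relation in root.findall("relation"):
        triples.append((
            relation.findtext("subject").strip(),
            relation.findtext("relationship").strip(),
            relation.findtext("object").strip(),
        ))
    return triples


relations = read_relations(PACKAGE_XML)
for subject, relationship, obj in relations:
    # Mirrors the reading "<subject> has <relationship> to <object>".
    print(f"{subject} has {relationship} to {obj}")
```

Each tuple corresponds to one relation group in the package file.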

Transitive Relations

The sample file lists five relations explicitly. In actuality, the package represents 11 different relations. The top three relations are grouped together because they all have the same object. Whenever two relations have the same object, the subjects of the two relations are related to each other in a transitive fashion. For example, the relations A -> B and C -> B contain a transitive relationship: because A and C have the same object, A and C are related. Metacat defines these transitive relations in both directions, thus A -> C and C -> A. Any group of n relations that share the same object therefore represents n + n(n - 1) relations: n explicit relations plus n(n - 1) transitive ones, one for each ordered pair of distinct subjects. For the three grouped relations this gives 3 + 6 = 9. The total is 11 because the bottom two relations do not share an object with any other relation and so produce no transitive relations.
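The counting can be checked with a small Python sketch. It is an assumption about how such relations could be derived, not Metacat's actual postprocessor: the function name transitive_relations is hypothetical, and the subjects and objects are abbreviated to short identifiers. The relationship name hastransitiverelationto is the one Metacat stores for derived relations.

```python
from collections import defaultdict
from itertools import permutations


def transitive_relations(relations):
    """Derive a relation in both directions for subjects that share an object.

    Hypothetical helper; `relations` is a list of
    (subject, relationship, object) tuples.
    """
    subjects_by_object = defaultdict(list)
    for subject, _relationship, obj in relations:
        if subject not in subjects_by_object[obj]:
            subjects_by_object[obj].append(subject)

    derived = []
    for subjects in subjects_by_object.values():
        # Every ordered pair of distinct subjects gives one derived relation,
        # so A -> C and C -> A are both produced.
        for a, c in permutations(subjects, 2):
            derived.append((a, "hastransitiverelationto", c))
    return derived


# The five explicit relations of the sample package, with abbreviated names.
explicit = [
    ("nceas.17", "isfilemetadatafor", "somedatafile.html"),
    ("nceas.18", "isresourcemetadatafor", "somedatafile.html"),
    ("nceas.22", "isvariablemetadatafor", "somedatafile.html"),
    ("nceas.52", "iscometadatafor", "xyz.10"),
    ("nceas.99", "iscometadatafor", "nceas.101"),
]

derived = transitive_relations(explicit)
total = len(explicit) + len(derived)
print(len(derived), total)
```

For the sample package, the three subjects that share somedatafile.html yield six derived relations, and the two remaining relations yield none.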

The Utility of Relations

Relations become useful because many XML data schemas are broken up into multiple DTDs. Thus, there may be many different XML files that are all related to each other yet are stored separately within the system. Also, since we at NCEAS are developing Metacat for use as a metadata repository for ecological data, we need some way of linking our metadata to the data files that they describe. Packages are the way we do this.

Post Processed Relations

A package file is inserted into Metacat as any other file is. Its doctype is checked against the packagedoctype property in the Metacat.properties file. If it is of that type, the file is sent to a postprocessor to be analyzed and inserted into the xml_relation table. It is at this time that the transitive relations are located and created internally. The xml_relation table looks like the following.

relationid docid subject subjectdoctype relationship object objectdoctype
1 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.17 -//NCEAS//eml-file//EN isfilemetadatafor http://www.domain.com/data/somedatafile.html null
2 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.18 -//NCEAS//resource//EN isresourcemetadatafor http://www.domain.com/data/somedatafile.html null
3 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.22 -//NCEAS//eml-variable//EN isvariablemetadatafor http://www.domain.com/data/somedatafile.html null
4 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.17 -//NCEAS//eml-file//EN hastransitiverelationto Metacat://server.domain.com/Metacat?docid=nceas.18 -//NCEAS//resource//EN
5 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.18 -//NCEAS//resource//EN hastransitiverelationto Metacat://server.domain.com/Metacat?docid=nceas.17 -//NCEAS//eml-file//EN
6 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.17 -//NCEAS//eml-file//EN hastransitiverelationto Metacat://server.domain.com/Metacat?docid=nceas.22 -//NCEAS//eml-variable//EN
7 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.22 -//NCEAS//eml-variable//EN hastransitiverelationto Metacat://server.domain.com/Metacat?docid=nceas.17 -//NCEAS//eml-file//EN
8 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.18 -//NCEAS//resource//EN hastransitiverelationto Metacat://server.domain.com/Metacat?docid=nceas.22 -//NCEAS//eml-variable//EN
9 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.22 -//NCEAS//eml-variable//EN hastransitiverelationto Metacat://server.domain.com/Metacat?docid=nceas.18 -//NCEAS//resource//EN
10 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.52 -//NCEAS//resource//EN iscometadatafor Metacat://server2.otherdomain.com/Metacat?docid=xyz.10 null
11 nceas.200 Metacat://server.domain.com/Metacat?docid=nceas.99 -//NCEAS//eml-variable//EN iscometadatafor Metacat://server.domain.com/Metacat?docid=nceas.101 -//NCEAS//resource//EN

Once the system has processed the package file and inserted the relations into the xml_relation table, a file's relations are always returned with it in the resultset of a query.

Package Views (formerly known as 'backtracking')

Package View is a feature that was intentionally left out of the Queries and Results section. A package view involves sending a doctype (called a returndoc) along with a query request. When there is a hit from that query, the system checks the doctype of the hit document against the returndoc doctype. If the doctypes do not match, the system checks the xml_relation table to see whether that document has a related document that matches the returndoc doctype. If such a relation exists, the related document is returned instead of the one that was originally hit. If no such relation exists, the document that was originally hit is returned. This allows a display system (such as a web browser) to try to display a certain type of document.

For example, take our package file from above. Say we do a query for "abalone" which returns the document nceas.22 of type -//NCEAS//eml-variable//EN. However, we have set returndoc equal to "-//NCEAS//resource//EN". When nceas.22 is hit, the system checks its related documents to see if there is a document of type -//NCEAS//resource//EN related to nceas.22. Since there is one (relationid 9), document nceas.18 is returned instead of nceas.22. Because nceas.22 is related to nceas.18, nceas.22 is still returned along with nceas.18, in one of its relation tags in the resultset.
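The lookup in this example can be sketched in Python as follows. This is a simplified model under stated assumptions rather than Metacat's implementation: the function name resolve_returndoc is hypothetical, only rows in which the hit document is the subject are examined, and the xml_relation rows are reduced to short docids and the three columns the lookup needs.

```python
def resolve_returndoc(hit_docid, hit_doctype, returndoc, relation_rows):
    """Return the docid a package view would hand back for one query hit.

    Hypothetical helper; `relation_rows` is a list of dicts with the keys
    "subject", "object" and "objectdoctype".
    """
    if hit_doctype == returndoc:
        # The hit already has the requested doctype.
        return hit_docid
    for row in relation_rows:
        if row["subject"] == hit_docid and row["objectdoctype"] == returndoc:
            # A related document of the requested doctype replaces the hit.
            return row["object"]
    # No related document matches, so the original hit is returned.
    return hit_docid


# Rows 7 and 9 of the xml_relation table, with abbreviated subjects/objects.
rows = [
    {"subject": "nceas.22", "object": "nceas.17",
     "objectdoctype": "-//NCEAS//eml-file//EN"},
    {"subject": "nceas.22", "object": "nceas.18",
     "objectdoctype": "-//NCEAS//resource//EN"},
]

returned = resolve_returndoc(
    "nceas.22", "-//NCEAS//eml-variable//EN", "-//NCEAS//resource//EN", rows
)
print(returned)
```

With these rows the hit on nceas.22 resolves to nceas.18, and a returndoc that matches no related document leaves the hit unchanged.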

From a client, the returndoc is passed as a servlet parameter. A URL with a returndoc would look something like:

http://server.domain.com/Metacat?action=query&anyfield=%&qformat=html&returndoc=-//NCEAS//resource//EN
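A client could assemble such a URL programmatically. The Python sketch below uses only the standard library; the function name build_query_url is hypothetical, and it assumes the servlet accepts standard percent-encoding, so the % wildcard and the slashes in the doctype appear in encoded form rather than literally as in the URL above.

```python
from urllib.parse import urlencode


def build_query_url(base_url, returndoc, anyfield="%", qformat="html"):
    """Build a query URL that carries a returndoc parameter.

    Hypothetical helper; the parameter names are those used in the
    example URL.
    """
    params = {
        "action": "query",
        "anyfield": anyfield,
        "qformat": qformat,
        "returndoc": returndoc,
    }
    # urlencode percent-encodes reserved characters such as "%" and "/".
    return base_url + "?" + urlencode(params)


url = build_query_url(
    "http://server.domain.com/Metacat", "-//NCEAS//resource//EN"
)
print(url)
```

Encoding the parameters avoids ambiguity with a bare % character, which is otherwise read as the start of an escape sequence in a URL.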

The system then inserts the returndoc parameter value into a pathquery document as illustrated in Queries and Results.

