Class Sitemap

java.lang.Object
java.util.TimerTask
edu.ucsb.nceas.metacat.Sitemap
All Implemented Interfaces:
Runnable

public class Sitemap extends TimerTask
A Sitemap represents a document that lists all of the content of the Metacat server for use by harvesting spiders that index the contents of the Metacat site. It generates an XML representation of all of the URLs of the site to facilitate indexing of the Metacat site by search engines.

Which objects are included?
- Only documents with public read permission are included
- Only documents with object_formats in the xml_catalog table are included
- All non-obsoleted metadata objects are included in the sitemap(s)

Other notes:
- The sitemaps this class generates are intended to be served by another application such as MetacatUI
- A sitemap index is generated regardless of the number of URLs present
- URLs for the location of the sitemaps and the entries themselves are controlled by the 'sitemap.location.base' and 'sitemap.entry.base' properties, which can be full URLs or absolute paths
  - sitemap.location.base controls the first part of the URLs in the sitemap index
  - sitemap.entry.base controls the first part of the URLs in the sitemap files themselves
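
The role of the two base properties can be illustrated with a minimal sketch. It is not taken from the Metacat source: the class name SitemapUrlSketch, the helper entryUrl, the example property values, and the identifier are all hypothetical, and percent-encoding the identifier with URLEncoder is an assumption about how an entry URL might be built.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class SitemapUrlSketch {

    // Hypothetical helper: joins a base (full URL or absolute path) and an
    // identifier into one entry URL. Encoding with URLEncoder is an assumption;
    // note that it encodes spaces as '+', which suits query strings better
    // than URL paths.
    static String entryUrl(String entryBase, String identifier) {
        String base = entryBase.endsWith("/")
                ? entryBase.substring(0, entryBase.length() - 1)
                : entryBase;
        return base + "/" + URLEncoder.encode(identifier, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example values only; a real deployment sets these in its own configuration.
        Properties props = new Properties();
        props.setProperty("sitemap.location.base", "https://example.org/sitemaps");
        props.setProperty("sitemap.entry.base", "https://example.org/view");

        // The identifier below is made up for illustration.
        String url = entryUrl(props.getProperty("sitemap.entry.base"), "doi:10.5063/F1ABC");
        System.out.println(url);
    }
}
```

Because the bases may also be absolute paths, the same helper accepts a value such as "/view" and returns a path-only entry.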
Author:
Matt Jones, Bryce Mecum
  • Constructor Summary

    Constructors
    Constructor
    Description
    Sitemap(File directory, String locationBase, String entryBase, String portalBase, List<String> portalFormats)
    Construct a new instance of the Sitemap class.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    generateSitemaps()
    Generate all of the sitemap files needed to list the URLs from this instance of Metacat, using the open sitemap format described here: http://www.sitemaps.org/protocol.html URLs are written to one or more files and a sitemap index file is always written.
    String
    getMetadataFormatsQueryString()
    Generate a comma-separated list of metadata format IDs so generateSitemaps can filter the available objects to just metadata objects.
    void
    run()
    Execute the timed task when called, in this case by generating the sitemap files needed for this Metacat instance.

    Methods inherited from class java.util.TimerTask

    cancel, scheduledExecutionTime

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Sitemap

      public Sitemap(File directory, String locationBase, String entryBase, String portalBase, List<String> portalFormats)
      Construct a new instance of the Sitemap class.
      Parameters:
      directory - The location to store sitemap files
      locationBase - The base URL for constructing sitemap location URLs
      entryBase - The base URL for constructing sitemap entry URLs for metadata records
      portalBase - The base URL for constructing sitemap entry URLs for portals
      portalFormats - List of format IDs used to determine whether a record is a portal
  • Method Details

    • run

      public void run()
      Execute the timed task when called, in this case by generating the sitemap files needed for this Metacat instance.
      Specified by:
      run in interface Runnable
      Specified by:
      run in class TimerTask
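
Because Sitemap extends TimerTask, it can be handed to a java.util.Timer. The sketch below shows only the scheduling mechanics. StandInTask is a hypothetical placeholder so the example compiles without Metacat on the classpath; in a real deployment the task would presumably be an instance created with the Sitemap constructor documented above, and the delay shown is an arbitrary example value.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SitemapScheduleSketch {

    // Hypothetical stand-in for Sitemap: any TimerTask is scheduled the same way.
    static class StandInTask extends TimerTask {
        private final CountDownLatch done;

        StandInTask(CountDownLatch done) {
            this.done = done;
        }

        @Override
        public void run() {
            // A real Sitemap would generate its files here.
            done.countDown();
        }
    }

    // Schedules the task once after the given delay and reports whether it ran.
    static boolean runOnce(long delayMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Timer timer = new Timer("sitemap-sketch", true); // daemon timer thread
        timer.schedule(new StandInTask(done), delayMillis);
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("task ran: " + runOnce(10));
    }
}
```

For periodic regeneration, Timer.schedule(task, delay, period) takes the same task with an interval in milliseconds.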
    • generateSitemaps

      public void generateSitemaps()
      Generate all of the sitemap files needed to list the URLs from this instance of Metacat, using the open sitemap format described here: http://www.sitemaps.org/protocol.html URLs are written to one or more files and a sitemap index file is always written. The number of sitemap files is determined by MAX_URLS_IN_FILE and how many metadata documents you have registered in Metacat.

      The sitemap index can be registered with search index providers such as Google, but beware that it needs to be accessible in a location above the mount point for the service URLs. By default the files are placed in {context}/sitemaps, but you will need to expose them at a location matching what's set in the sitemap.location.base and sitemap.entry.base properties in order to be trusted by Google. See the Sitemaps.org documentation for details.
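
The relationship between the URL count, the per-file limit, and the index can be sketched as follows. This is an illustration under stated assumptions, not the Metacat implementation: the limit of 3 is chosen only to keep the example small (the actual value of MAX_URLS_IN_FILE is not given here), the file names sitemap1.xml, sitemap2.xml, and so on are hypothetical, and XML escaping of the base URL is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class SitemapIndexSketch {

    // Assumed value for illustration only.
    static final int MAX_URLS_IN_FILE = 3;

    // Splits the URL list into consecutive chunks of at most 'size' entries,
    // one chunk per sitemap file.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return chunks;
    }

    // Builds a sitemap index that points at each sitemap file.
    // The file naming scheme is hypothetical.
    static String buildIndex(String locationBase, int fileCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (int i = 1; i <= fileCount; i++) {
            sb.append("  <sitemap><loc>")
              .append(locationBase).append("/sitemap").append(i).append(".xml")
              .append("</loc></sitemap>\n");
        }
        sb.append("</sitemapindex>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            urls.add("https://example.org/view/doc." + i);
        }
        // Seven URLs with a limit of three yield three sitemap files.
        List<List<String>> files = chunk(urls, MAX_URLS_IN_FILE);
        System.out.print(buildIndex("https://example.org/sitemaps", files.size()));
    }
}
```

The index entries use the location base, while the URLs inside each sitemap file would use the entry base, mirroring the split between sitemap.location.base and sitemap.entry.base.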

    • getMetadataFormatsQueryString

      public String getMetadataFormatsQueryString()
      Generate a comma-separated list of metadata format IDs so generateSitemaps can filter the available objects to just metadata objects.
      Returns:
      A list of metadata format IDs as a comma-separated string suitable for inclusion in an SQL query. Each value is wrapped in single quotes.
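
The shape of the returned string can be sketched like this. The helper toQuotedList and the class name are hypothetical, the format IDs are example values, and doubling embedded single quotes is an added precaution that the documentation above does not describe.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FormatListSketch {

    // Hypothetical helper: wraps each format ID in single quotes and joins
    // the results with commas, e.g. for use inside an SQL IN (...) clause.
    static String toQuotedList(List<String> formatIds) {
        return formatIds.stream()
                // Doubling single quotes is an assumption added for safety.
                .map(id -> "'" + id.replace("'", "''") + "'")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Example format IDs only.
        List<String> ids = List.of(
                "eml://ecoinformatics.org/eml-2.1.1",
                "http://www.isotc211.org/2005/gmd");
        System.out.println(toQuotedList(ids));
    }
}
```

Where the query API allows it, binding the values as parameters of a prepared statement avoids building SQL from strings at all.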