Data Sources
Data were mined from the US National Center for Biotechnology Information's GenBank. DNA sequences of Symbiodinium were used to organize the organisms into clades and types based on partial 5.8S, complete ITS2 and partial 28S rDNA, hereafter referred to as ITS2 types. For Symbiodinium ITS2 sequences, GenBank contained many redundant entries, often with incomplete descriptions of associated attributes and mismatched fields in the submitted data. Comparison of ITS2 sequences from 79 published studies identified redundancies between records with synonymous sequences but different ITS2 type nomenclature. Identical sequences (i.e. 100% residue similarity) with different accession numbers were identified as synonyms, with the first published record designated as the "parent" accession number. The source literature identified in each GenBank record was then searched to confirm or ascertain the following descriptive characteristics for each sequence: host taxa, location, collection year and laboratory methodology. Accurate mapping of the location of Symbiodinium occurrences required manual data mining of the primary literature sources identified in each GenBank accession, with a cross-check of each location in GEOnet Names (http://earth-info.nga.mil/gns/html) and Google Earth (http://www.google.com/earth).
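
The synonym rule described above (identical sequences grouped together, earliest published record taken as the parent) can be illustrated with a minimal Python sketch. This is not the procedure used in the study, only one way the rule could be expressed. The `Record` fields, the accession labels and the sequences are hypothetical. The sketch further assumes that sequences have already been trimmed to the same region, that publication year stands in for "first published", and that ties are broken by accession label.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    accession: str  # hypothetical accession label, not a real GenBank ID
    sequence: str   # sequence already trimmed to the same region (assumption)
    its2_type: str  # type name as given in the source study
    year: int       # publication year, used as a proxy for "first published"


def assign_parents(records):
    """Map each accession to the parent accession of its synonym group."""
    groups = defaultdict(list)
    for rec in records:
        # Normalise case and whitespace so only residue differences matter.
        key = "".join(rec.sequence.split()).upper()
        groups[key].append(rec)

    parent_of = {}
    for members in groups.values():
        # Earliest year wins; the accession label breaks ties (assumption).
        parent = min(members, key=lambda r: (r.year, r.accession))
        for rec in members:
            parent_of[rec.accession] = parent.accession
    return parent_of


# Made-up records: the first two share a sequence but differ in type name.
records = [
    Record("ACC0001", "ACGTTGCA", "C1", 2001),
    Record("ACC0002", "acgttgca", "C1a", 2005),
    Record("ACC0003", "ACGTTGCC", "D1", 2003),
]
parents = assign_parents(records)
```

Here `ACC0002` is mapped to `ACC0001` as its parent, while `ACC0003` differs by one residue and remains its own parent. Exact string matching only corresponds to 100% similarity when all sequences cover the same alignment region, so real records would need trimming first.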