Data Set Citation:
When using this data, please cite the data package:
Hochberg M, National Center for Ecological Analysis and Synthesis, Cornell H, Nettle D, NCEAS 6640: Hochberg: HumanSocialBehavior, Guégan J, and Choisy M.
Human cultural Diversity - A Cross-national data set
bowdish.246.10 (
General Information:
Title:Human cultural Diversity - A Cross-national data set
A cross-national data set of 21 variables was assembled for 212 countries from three sources (Barro and Lee 1994; Gordon 2005; CIA World Fact Book 2005). Our data set includes several proxy measures for national wealth, cultural diversity, social instability (at both national and international levels), and demography. Separate diversity measures were calculated for three cultural domains: language, religion and ethnic groups. In addition, wealth variables (per capita GDP, and GINI, the coefficient of income inequality) were assembled, along with indicators of societal functioning drawn from the literature (especially Barro and Lee 1994), including indices of political rights (PRIGHTSB), revolutions and coups d'état (REVCOUP), and political instability (PINSTAB). The following measures of international conflict were taken from the social science literature: the proportion of time between 1960 and 1985 during which the country was involved in an external war (WARTIME), the number of international disputes in which the country was involved (TOTINTDISP), and an index of total military expenditure (TOTMILITEXP). Possible confounding variables such as population size (POPSIZE) and the number of international borders (NBINTBORDERS) were also included.
  • Community Structure
  • Social Behavior
  • Language
  • Religion
  • Ethnic groups
Data Table, Image, and Other Data Details:
Metadata download: Ecological Metadata Language (EML) File
Data Table:Simpson_ D
Data Table:Shannon_H
Data Table:Jaccard_coefficient
Data Table:Sorenson_coefficient
Data Table:Dataset.txt

Involved Parties

Data Set Creators:
Individual: Michael E. Hochberg
Organization:University of California, Santa Barbara
Email Address:
Organization:National Center for Ecological Analysis and Synthesis
Individual: Howard Cornell
Organization:University of California, Davis
Email Address:
Individual: Daniel Nettle
Organization:University of Newcastle
Email Address:
Organization:NCEAS 6640: Hochberg: HumanSocialBehavior
Individual: Jean-François Guégan
Organization:IRD, Montpellier, France
Email Address:
Individual: Marc Choisy
Organization:Centre National de la Recherche Scientifique (CNRS)
Email Address:
Data Set Contacts:
Individual: Michael E. Hochberg
Organization:University of California, Santa Barbara
Email Address:

Data Set Characteristics

Taxonomic Range:
Rank Name:Genus
Rank Value:Homo
Rank Name:Species
Rank Value:Homo sapiens
Common Name:human

Sampling, Processing and Quality Control Methods

Step by Step Procedures
Step 1:  

Alpha diversity indices

For the purposes of this study, alpha diversity is the cultural diversity within a country. Past studies have measured cultural diversity either as the number of cultural groups, corrected or uncorrected for the area under consideration, or as the proportion of the population falling into the largest group. In ecological terms, the former is a richness measure and the latter a measure of dominance. While these measures have been revealing, several more sophisticated indices of diversity may be more appropriate. They can be found in most basic ecology textbooks, and they take into account not only the number of species in the assemblage but also their relative abundance. Two commonly used indices are Simpson's D and Shannon's H'. Simpson's D is calculated by first determining the proportion p_i of the total number of individuals in the assemblage represented by each species i. These values are then squared and summed over all S species to obtain D = Σ p_i² (please see attached jpg named Simpson's D).

The quantity D is simply the probability that two individuals chosen at random from the assemblage belong to the same species. As diversity increases, D decreases, so the index is usually presented as 1-D, which is the form we consider in our study. Shannon's H' is an index borrowed from information theory. Like Simpson's D, H' requires knowledge of p_i. Each p_i is multiplied by its natural logarithm, the products are summed over all species, and the sign is reversed so that the index is non-negative: H' = -Σ p_i ln(p_i) (please see attached jpg named Shannon's H').
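
The two alpha diversity indices can be illustrated with a short Python sketch. It assumes the standard textbook forms D = Σ p_i² and H' = -Σ p_i ln(p_i); the function names and the group counts are hypothetical and are not taken from the data set.

```python
import math

def simpson_d(counts):
    """Simpson's D: sum of squared group proportions p_i."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts if c > 0)

def shannon_h(counts):
    """Shannon's H': minus the sum of p_i * ln(p_i) over non-empty groups."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical country with three cultural groups (illustrative numbers only).
counts = [50, 30, 20]

# 1 - D is the form of the Simpson index used in the study.
one_minus_d = 1 - simpson_d(counts)  # 1 - (0.25 + 0.09 + 0.04) = 0.62
h_prime = shannon_h(counts)
```

Both indices are zero for a country with a single group and increase as groups become more numerous and more evenly sized.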

In our analyses, we calculated both Simpson's 1-D and Shannon's H' to examine the effect of the choice of measure on the conclusions reached. We did the calculations for ethnicity (variables ETHNSIMPS and ETHNSHAN) and religion (variables RELIGSIMPS and RELIGSHAN). Language diversity (LANGSIMPS) was calculated from a different source (Gordon 2005), and only for the Simpson index. Variables ETHNSIMPS and ETHNSHAN are highly correlated (r = 0.99, t = 76.66, df = 157, p < 0.001) and gave the same results in the analyses. This is also the case for RELIGSIMPS and RELIGSHAN (r = 0.99, t = 77.48, df = 174, p < 0.001), and we therefore present results for Simpson's indices only.
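
The reported r, t and df values can be related through the usual t statistic for a correlation coefficient, t = r * sqrt(df / (1 - r²)). The sketch below assumes these are Pearson correlations tested with df = n - 2, which the source does not state; the function names are illustrative.

```python
import math

def t_from_r(r, df):
    """t statistic for a correlation coefficient r with df degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))

def r_from_t(t, df):
    """Inverse relation: recover r from a reported t and df."""
    return t / math.sqrt(t * t + df)

# Reported values for ETHNSIMPS/ETHNSHAN and RELIGSIMPS/RELIGSHAN.
r_ethn = r_from_t(76.66, 157)
r_relig = r_from_t(77.48, 174)

# Both recovered coefficients round to the reported r = 0.99.
print(round(r_ethn, 2), round(r_relig, 2))
```

Under this assumption, the reported t values correspond to unrounded coefficients slightly below 0.99, which is consistent with r being reported to two decimal places.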

Step 2:  

Beta diversity indices

Ecology also has indices that measure the dissimilarity in species composition between two samples or assemblages (called β diversity). To our knowledge, these have never been used in social science, even though such measures may help social scientists evaluate regional cultural similarity and how it relates to the degree of external conflict among societies or nation-states. Of these indices, the two most commonly used are based on Jaccard's and Sorenson's coefficients of similarity. Jaccard's coefficient is calculated as PS = a/(a + b + c) x 100 (please see attached jpg titled Jaccard's coefficient), where PS is the percentage similarity between assemblages, a is the number of species shared by the assemblages, b is the number of species found only in the first assemblage and c is the number found only in the second assemblage. The index ranges from 0% when no species are shared to 100% when the compositions of the assemblages are identical. As beta diversity increases, PS decreases; in the analyses we therefore employ 1-PS.

Sorenson's coefficient is calculated as PS = 2a/(b + c) x 100 (please see attached jpg titled Sorenson's coefficient), where in this formula b and c are the total numbers of species in the first and second assemblages, shared species included; it also ranges from 0 to 100%. We calculated Jaccard's and Sorenson's 1-PS indices for language, ethnicity and religion. However, without additional assumptions, these β (beta) diversity indices could not be calculated for islands (which do not have contiguous neighbors), and the dataset was consequently reduced to 143 countries. As observed for the two alpha diversity indices, the two β (beta) diversity indices are highly correlated and gave the same results in the analyses: JACLANG/SORLANG: r = 0.99, t = 95.49, df = 146, p < 0.001; JACETHN/SORETHN: r = 0.99, t = 82.58, df = 141, p < 0.001; JACREL/SORREL: r = 0.99, t = 81.08, df = 142, p < 0.001. We therefore present results for Jaccard's index only.
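
Both similarity coefficients can be sketched in Python using sets of group names. This is an illustration under stated assumptions, not the procedure used to build the data set: the function names and the example groups are hypothetical, Jaccard's denominator is taken as the number of distinct groups present in either assemblage, Sorenson's denominator as the sum of the two assemblage sizes, and the dissimilarity is expressed on a 0 to 1 scale as 1 - PS/100.

```python
def jaccard_ps(first, second):
    """Jaccard percentage similarity: shared / (shared + unique to each) x 100."""
    first, second = set(first), set(second)
    a = len(first & second)   # groups shared by both assemblages
    b = len(first - second)   # groups found only in the first
    c = len(second - first)   # groups found only in the second
    return 100.0 * a / (a + b + c)

def sorenson_ps(first, second):
    """Sorenson percentage similarity: 2 x shared / (size of first + size of second) x 100."""
    first, second = set(first), set(second)
    a = len(first & second)
    return 100.0 * 2 * a / (len(first) + len(second))

# Hypothetical language sets for a country and a neighbor (illustrative only).
country = {"A", "B", "C"}
neighbor = {"B", "C", "D"}

jac = jaccard_ps(country, neighbor)    # 2 shared out of 4 distinct groups
sor = sorenson_ps(country, neighbor)   # 2 x 2 shared out of 3 + 3 groups

# Assumed proportion-scale dissimilarity corresponding to "1-PS".
jac_dissimilarity = 1 - jac / 100.0
```

Both functions raise a ZeroDivisionError when both assemblages are empty, so a real pipeline would need to decide how to treat that case.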

Data Set Usage Rights

Use of this data requires the express written permission of Michael E. Hochberg or Daniel Nettle.
Access Control:
Auth System:knb