Comparing Communities: Using β-diversity and similarity/dissimilarity indices to measure diversity across sites, communities, and landscapes
Introduction to Diversity
There are many different ways to measure biological diversity, and it can be measured at different spatial scales.
Biological diversity, within an ecological context, is the variety of species and their abundances at a given scale. However, many investigators are unable to gather information on species’ abundances and can obtain only presence/absence (incidence) data. The number of species, without knowing the abundances of those species, is usually referred to as species richness.
This webpage is about how to compare species diversity (using only species richness or species richness and abundance) among sites/habitats/communities. Some people call this beta diversity and others call it complementarity, turnover, similarity, dissimilarity, etc. Often, these terms are used interchangeably and at a variety of spatial scales. Methods for making these comparisons are used in other fields of science but we will only address ecology-based methods here.
Alpha, beta, and gamma diversity (α, β, and γ diversity)
Many researchers use these terms and they are part of the ecological vocabulary.
Alpha diversity is usually thought of as biological diversity at one site or sampling location. γ diversity is often thought of as regional/landscape diversity, or the entire diversity of the area in which one is sampling multiple α diversities. β diversity is generally thought of as the change in diversity among various α diversities. Historically, and often today, these terms only applied to measures of species richness. β diversity was originally introduced by Whittaker (1956) to describe changes in species composition and abundance across environmental continua such as gradients of elevation and moisture. According to Whittaker, each plant species exhibited an individualistic response to environmental conditions. β diversity was then thought of as the change in the number of species from one place to another along a gradient (1956, 1960), and later defined as “species turnover”, or changes in species composition from one community to another (1972). β diversity was incorporated into the ecological language to mean a change in species composition/abundance along a gradient and across different sites.
These categories of diversity are related to each other in this way:
In general, the larger the scale of the inventory/study, the less easy it is to measure species abundance and the more likely it is to only use species richness or higher taxon diversity. Investigators define their levels of diversity in different ways. Some treat α diversity as one sample whereas others treat α diversity as a 100m x 100m plot. Also, it’s difficult to transfer terrestrial terminology to marine systems.
Leaving behind estimates of α diversity
There are many ways to measure α diversity, and that topic needs another webpage. Some of the classic measures of α diversity include species richness and the Simpson index.
For every organism and every study, the scale can be different, both in terms of the distributions of the organism and the geographical range of the study. In addition, habitat complexity can affect your ability to compare diversity across sites/habitats/communities.
According to Magurran (2004), there are 3 general categories for measuring β diversity. Most of these are based on presence/absence data.
1. Methods that examine the extent of the difference between two or more areas of α diversity relative to γ diversity, where γ diversity is measured as total species richness. These measures were originally and explicitly proposed as the measures of β diversity.
Alpha diversity is averaged across all sites/habitats/communities and the average α is used to compute β diversity.
i.e. Whittaker (1960) and Lande (1996)
Whittaker suggested a variety of metrics to describe β diversity, but the one that became most commonly used was not related to his gradient-based definition of β diversity.
Whittaker: $\beta_w = S / \bar{\alpha}$
where S = the total number of species recorded in the system (i.e. γ diversity) and $\bar{\alpha}$ = the average sample diversity, where each sample is a standard size and diversity is measured as species richness. He arrived at this equation by reasoning that if you know the average diversity within a set of communities or samples (α diversity), you could find the total diversity represented by all samples by multiplying the average diversity by the number of communities or samples: γ = α × β. In this way, he recognized the number of communities as a measure of β diversity. However, multiplying by the actual number of samples overestimates γ diversity when communities or samples share species. Investigators using this equation often had estimates of α and γ diversity and simply rearranged the equation to solve for β diversity. Interestingly, Whittaker and other ecologists more often used different metrics to look at species turnover, but this equation stuck with the ecology discipline.
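As a minimal sketch of the calculation described above, the following Python snippet computes Whittaker's β from presence/absence data. The function name `whittaker_beta` and the three species lists are hypothetical and serve only as an illustration; each sample is assumed to be of a standard size.

```python
def whittaker_beta(samples):
    """Whittaker's beta: total richness S divided by mean sample richness."""
    if not samples:
        raise ValueError("need at least one sample")
    gamma = len(set().union(*samples))            # S: all species recorded
    mean_alpha = sum(len(s) for s in samples) / len(samples)
    return gamma / mean_alpha


# Hypothetical presence/absence data for three equal-sized samples.
samples = [
    {"sp1", "sp2", "sp3"},
    {"sp2", "sp3", "sp4"},
    {"sp4", "sp5", "sp6"},
]

beta_w = whittaker_beta(samples)
print(f"beta_w = {beta_w:.2f}")  # 6 species in total / mean richness of 3
```

When all samples contain exactly the same species, the function returns 1, the minimum possible value.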
A number of other corrections and modifications to Whittaker’s multiplicative measure of β-diversity were made. Some of these modifications incorporate species loss and gain along a transect. See Magurran (2004) and Jost (2007) for these modifications.
Around the same time that Whittaker’s definition of β diversity took hold, in the early 1960s, MacArthur and then Levins formulated an additive partitioning of diversity. But it wasn’t officially taken up by ecologists because they didn’t express their diversity in terms of α, β, and γ and their measures weren’t developed analytically. A few other researchers used the additive partitioning of diversity but it went relatively unnoticed.
Lande, in 1996, appears to be the first person to place the additive partitioning of species idea into the context of α, β, and γ diversity.
$D_\gamma = D_\alpha + D_\beta$
When species richness is used to measure α and γ, β diversity can be estimated as follows:
$D_\beta = S_T - \sum_j q_j S_j$
where $S_T$ = the richness of the landscape (γ diversity); $S_j$ = the richness of assemblage j; and $q_j$ = the proportional weight of assemblage j based on its sample size or importance.
This method can also be adapted for the Shannon and Simpson diversity measures. See Lande (1996).
Lande’s approach contrasts with that of Whittaker because α and β are added to produce γ diversity, as opposed to multiplied.
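The additive approach can be sketched in a few lines of Python for the species richness case, where β is the landscape richness minus the weighted mean richness of the assemblages. The function name `additive_partition`, the default of equal weights, and the data are assumptions made for illustration only.

```python
def additive_partition(samples, weights=None):
    """Return (gamma, mean alpha, beta) with beta = gamma - mean alpha."""
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / len(samples)] * len(samples)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    gamma = len(set().union(*samples))                 # landscape richness
    mean_alpha = sum(q * len(s) for q, s in zip(weights, samples))
    return gamma, mean_alpha, gamma - mean_alpha


# Hypothetical presence/absence data for three assemblages.
samples = [
    {"sp1", "sp2", "sp3"},
    {"sp2", "sp3", "sp4"},
    {"sp4", "sp5", "sp6"},
]

gamma, mean_alpha, beta = additive_partition(samples)
print(gamma, mean_alpha, beta)
```

Here β is expressed in the same units as α and γ (number of species), which is the practical difference from the multiplicative form, where β is a unitless ratio.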
Lande’s additive partitioning treats α diversity as the average within-sample diversity, regardless of how diversity is measured.
Many small sampling units will yield low values of α diversity and high values of β diversity, while fewer, larger samples will yield the opposite: higher α diversity and lower β diversity. β diversity will increase in heterogeneous landscapes where few species are shared among samples. β diversity will decrease in homogeneous landscapes where species composition in sampling units approaches complete identity.
Since Lande, other ecologists have been using various versions of his classic formula to calculate β diversity with additive partitioning. See Magurran (2004) and Jost (2007).
2. Methods that examine the differences in species composition and/or diversity among areas of α diversity. These methods are formulated as measures of similarity/dissimilarity and/or complementarity. These measures evaluate the distinctness of assemblages and are used in applied contexts. (We will refer to all of these as similarity measures or indices)
Pairwise comparisons of alpha diversities are made between all pairs of sites/habitats/communities in the study area.
Some people, including Magurran, say that complementarity (dissimilarity) is another way of saying β diversity: the more complementary two sites are, the higher their β diversity. Other investigators don’t like to apply the term “β diversity” to these other measures.
Most similarity measures combine 3 variables. For the purpose of examples, let’s use: a, the total number of species present in both samples; b, the number of species present only in sample 1; and c, the number of species present only in sample 2. (Of course, any symbols can be substituted into the equations.) One of the easiest and most intuitive ways to describe similarity between pairs of sites is to use a similarity/dissimilarity coefficient. A large number of measures exist and only the most common ones are shown here:
Jaccard (1908) similarity and the Marczewski-Steinhaus (MS) distance:
$C_J = \frac{a}{a + b + c}$
$D_{MS} = 1 - \frac{a}{a + b + c}$
The MS dissimilarity measure (1 − similarity) is known as a metric (as opposed to a non-metric) measure. This means that it satisfies certain geometric requirements and can be treated as a distance measure, which can be used in ordination.
Another popular measure is the Sorensen similarity measure, which is regarded as one of the most effective presence/absence similarity measures. It is identical to the Bray-Curtis presence/absence coefficient.
Sorensen’s similarity measure (1948): $C_S = \frac{2a}{2a + b + c}$
Bray-Curtis distance or dissimilarity (1957): $D_{BC} = 1 - C_S = \frac{b + c}{2a + b + c}$
Lennon et al. (2001) noted that if samples differ greatly in terms of their species richness, the Sorensen measure will always be large. Their measure works better when species richness varies widely among samples:
$\beta_{sim} = 1 - \frac{a}{a + \min(b, c)}$
which uses the smaller of b and c in the denominator to reduce the impact of imbalances in species richness.
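To make the presence/absence indices concrete, here is a small Python sketch that derives a, b, and c from two species lists and then applies the standard forms Jaccard = a/(a+b+c), Sorensen = 2a/(2a+b+c), and Lennon et al.'s βsim = 1 − a/(a + min(b, c)). The function names and the two sites are hypothetical, and each site is assumed to contain at least one species.

```python
def abc_counts(site1, site2):
    """a = shared species, b = only in site 1, c = only in site 2."""
    a = len(site1 & site2)
    b = len(site1 - site2)
    c = len(site2 - site1)
    return a, b, c


def jaccard(a, b, c):
    return a / (a + b + c)


def sorensen(a, b, c):
    return 2 * a / (2 * a + b + c)


def beta_sim(a, b, c):
    # Dissimilarity that uses only the smaller of b and c.
    return 1 - a / (a + min(b, c))


# Hypothetical example: a species-poor site nested within a rich site.
rich_site = {f"sp{i}" for i in range(1, 11)}      # 10 species
poor_site = {"sp1", "sp2", "sp3", "sp4"}          # 4 species, all shared

a, b, c = abc_counts(rich_site, poor_site)
print("a, b, c          =", (a, b, c))
print("Jaccard          =", round(jaccard(a, b, c), 3))
print("Sorensen         =", round(sorensen(a, b, c), 3))
print("1 - Sorensen     =", round(1 - sorensen(a, b, c), 3))
print("beta_sim         =", round(beta_sim(a, b, c), 3))
```

In this nested example the Sorensen dissimilarity is well above zero purely because of the richness difference, while βsim is zero because the poorer site contains no species that are absent from the richer one.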
The similarity measures mentioned above are attractive because they are simple. However, they don’t take into account the relative abundance of species. A species that dominates an assemblage doesn’t carry more weight in a presence/absence similarity measure than a species represented by only one individual. Because of this problem, people have been developing similarity measures that use quantitative diversity data.
A modified version of Sorensen’s measure (Bray and Curtis 1957), sometimes called Sorensen’s quantitative index or the Bray-Curtis index (Magurran 1988), uses abundance data:
$C_N = \frac{2jN}{N_a + N_b}$
where $N_a$ = the total number of individuals in site A; $N_b$ = the total number of individuals in site B; and $jN$ = the sum of the lower of the two abundances for species found in both sites.
If 12 individuals of a species were found in site A, and 29 individuals of the same species were found in site B, the value 12 would be included in the summation to produce jN. This index has been considered to be very satisfactory (Clarke and Warwick 2001a).
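The quantitative Sorensen (Bray-Curtis) calculation can be sketched as follows, taking the index as 2jN / (Na + Nb) with jN the sum of the lower abundance of each shared species. The function name and the abundance counts are hypothetical; the first species reuses the 12 versus 29 example from the text.

```python
def quantitative_sorensen(site_a, site_b):
    """2 * jN / (Na + Nb), with jN the summed lower abundance of shared species."""
    n_a = sum(site_a.values())                 # total individuals at site A
    n_b = sum(site_b.values())                 # total individuals at site B
    shared = site_a.keys() & site_b.keys()
    j_n = sum(min(site_a[sp], site_b[sp]) for sp in shared)
    return 2 * j_n / (n_a + n_b)


# Hypothetical abundance data (species -> number of individuals).
site_a = {"sp1": 12, "sp2": 5, "sp3": 3}
site_b = {"sp1": 29, "sp2": 1, "sp4": 10}

c_n = quantitative_sorensen(site_a, site_b)
print(round(c_n, 3))   # jN = 12 + 1 = 13, so 26 / 60
```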
Wolda (1981) investigated a range of quantitative similarity indices and found that only one, the Morisita-Horn index, was not strongly influenced by species richness and sample size. However, the M-H index is sensitive to the abundance of the most abundant species.
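$C_{MH} = \frac{2 \sum_{i=1}^{S} (an_i \cdot bn_i)}{(da + db) \cdot aN \cdot bN}$
where:
S = total number of species at both sites
aN = total number of individuals of all species collected at site A
bN = total number of individuals of all species collected at site B
an_i = number of individuals of the ith species collected at site A
bn_i = number of individuals of the ith species collected at site B
The denominator terms are:
$da = \frac{\sum_{i} an_i^2}{aN^2}, \quad db = \frac{\sum_{i} bn_i^2}{bN^2}$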
Wolda then modified the M-H measure to reduce the bias of the most abundant species.
where Na = the total number of individuals at site A; Nb = the total number of individuals at site B; ai = the number of individuals in the ith species in A; and bi = the number of individuals in the ith species in B.
The M-H measure is widely used.
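As an illustration, the Morisita-Horn index in its commonly cited form, 2Σ(a_i·b_i) / ((d_a + d_b)·N_a·N_b) with d_a = Σa_i²/N_a² and d_b = Σb_i²/N_b², can be sketched in Python as below. The function name and the data are hypothetical, and this sketch does not implement any of Wolda's modifications.

```python
def morisita_horn(site_a, site_b):
    """Morisita-Horn similarity from two species -> abundance mappings."""
    n_a = sum(site_a.values())
    n_b = sum(site_b.values())
    species = site_a.keys() | site_b.keys()
    # Cross-product term: only species present at both sites contribute.
    cross = sum(site_a.get(sp, 0) * site_b.get(sp, 0) for sp in species)
    d_a = sum(x * x for x in site_a.values()) / n_a ** 2
    d_b = sum(x * x for x in site_b.values()) / n_b ** 2
    return 2 * cross / ((d_a + d_b) * n_a * n_b)


# Hypothetical abundance data.
site_a = {"sp1": 12, "sp2": 5, "sp3": 3}
site_b = {"sp1": 29, "sp2": 1, "sp4": 10}

print(round(morisita_horn(site_a, site_b), 3))
```

Because the squared abundances dominate both the numerator and the d terms, the result is driven mostly by the most abundant species, which is the sensitivity noted above.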
Another simple measure is percentage similarity (Southwood & Henderson 2000; after Whittaker 1952).
$P = 100 - 0.5 \sum_{i=1}^{S} |P_{ai} - P_{bi}|$
where $P_{ai}$ and $P_{bi}$ = the percentage abundances of species i in samples a and b, respectively; and S = the total number of species.
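A short Python sketch of percentage similarity, computed as 100 minus half the summed absolute differences in percentage abundance, is given below. The function name and the counts are hypothetical.

```python
def percentage_similarity(sample_a, sample_b):
    """P = 100 - 0.5 * sum(|Pai - Pbi|) over all species in either sample."""
    n_a = sum(sample_a.values())
    n_b = sum(sample_b.values())
    species = sample_a.keys() | sample_b.keys()
    total_diff = sum(
        abs(100 * sample_a.get(sp, 0) / n_a - 100 * sample_b.get(sp, 0) / n_b)
        for sp in species
    )
    return 100 - 0.5 * total_diff


# Hypothetical counts: the same two species in different proportions.
sample_a = {"sp1": 50, "sp2": 50}
sample_b = {"sp1": 25, "sp2": 75}

p = percentage_similarity(sample_a, sample_b)
print(p)   # differences of 25 + 25 percentage points, so 100 - 25
```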
A large review and evaluation of both qualitative and quantitative similarity measures was carried out by Smith (1986). The best proved to be the Sorensen quantitative index, and all of the presence/absence (qualitative) measures proved unsatisfactory. However, Smith advised that the choice of index for any particular study depends on the goals of the investigator and the form of the data. She also concluded that Wolda’s version of the M-H measure was very good (Magurran 2004).
Many qualitative and quantitative similarity/dissimilarity measures exist and only the most commonly used measures are described here. Please see the bibliography for more details on these measures.
3. The third group of measures exploits the species-area relationship and measures turnover related to species accumulation with area.
The slope of the relationship between species richness and area can also be considered a measure of turnover in areas that are nested subsets. I don’t have as much information on this type of measure for β diversity because it is not commonly used in the current ecological literature and many people don’t have data that would allow them to use these types of analyses. See Harte et al. (1999b), Lennon et al. (2001), Ricotta et al. (2002), and Connor and McCoy (1979).
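One way such a slope might be estimated is sketched below. It assumes the power-law form of the species-area relationship, S = cA^z, which is an assumption of this example and not something specified above; under that form, z is the slope of an ordinary least-squares line fitted to log-transformed richness against log-transformed area. The function name and the nested-plot data are hypothetical.

```python
import math


def sar_slope(areas, richness):
    """Least-squares slope z of log10(S) against log10(A)."""
    xs = [math.log10(a) for a in areas]
    ys = [math.log10(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical nested plots: richness doubles with each tenfold area increase.
areas = [1, 10, 100, 1000]
richness = [10, 20, 40, 80]

z = sar_slope(areas, richness)
print(round(z, 3))
```

A steeper slope means that new species accumulate faster as area increases, which is why the slope can be read as a turnover measure for nested areas.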
Problems with above measures
The measures listed above make the assumption that the sites being compared have been completely inventoried, which is most often NOT the case.
Scale and habitat heterogeneity affect estimates of β diversity, complementarity, and similarity/dissimilarity
Most sites have not been thoroughly sampled, and similarity/dissimilarity, due to statistical properties (Colwell and Coddington 1994), is more likely to be overestimated between rich samples than between species-poor samples unless sampling effort is sufficiently large throughout or has been proportionally increased for species-rich sites (Magurran 2004).
Quantitative measures that do include species abundance data are often biased by species that are dominant in the samples or have disproportionately high abundances compared to that of other species in the samples.
Even though we are not discussing measurements of alpha diversity here, it’s very important to note that “coefficients of community similarity inherit the statistical sampling properties of the diversity measures on which they are based” (Lande 1996). Many of these alpha diversity measures are biased, thus directly affecting one’s ability to accurately compare diversity across sites/habitats/communities.
New measures to compare diversity among sites/communities/landscapes
Anne Chao and colleagues are developing new techniques to estimate the number of species that two communities have in common, for both presence/absence and abundance data. Her techniques make estimates based on the number of rare species to predict the number of unobserved shared species. The number of abundant species is then added to this. These similarity measures are new and include probabilistic methods, and will hopefully prove more accurate and useful in future studies. See Chao et al. (2005), Chao et al. (2000), Chao et al. (2006), Chao et al. (2008).
Lou Jost, who has also been working with Anne Chao on this topic, has recently published a paper, “Partitioning diversity into independent alpha and beta components” (2007), in which he presents a new definition of beta diversity.
Though all of the similarity/dissimilarity measures mentioned above (and others not mentioned here) have biases, they can be useful when attempting to compare species diversity (richness and/or abundance) among sites/communities. The selection of one particular similarity index over another depends on the question and the data. Below are some suggestions regarding which similarity indices to use and when, following Chao et al. (2005), Chao et al. (2006), and Magurran (2004). Please consult these references for more details.
For exhaustive sampling, with only presence/absence data, the classic Jaccard and Sorensen indices provide simple overlap measures to compare two species lists. Chao provides bias-corrected formulas for these measures when sample sizes are unequal or insufficient, but abundance data are necessary for Chao’s corrected measures.
For abundance data, the Bray-Curtis (or quantitative Sorensen) index and the Morisita or Morisita-Horn indices are seen frequently in the ecological literature. However, Chao et al. state that the Bray-Curtis measure only works well when sampling fractions are known to be equal (species assemblages must be assumed to have the same total number of individuals susceptible to sampling), and this is difficult to establish for field conditions (2006). The authors state that this index only works “satisfactorily” with equal and sufficient sampling. Chao et al. acknowledge that the Morisita-Horn index is not strongly sensitive to sample size and species richness, but it is highly sensitive to the most abundant species (2006). Chao et al. have created new probability-based indices that reduce undersampling bias by estimating and compensating for the effects of unseen, shared species. They show that their indices consistently reduce undersampling bias (2005, 2006).
An important thing to remember is that the classic abundance-based indices, for example the Bray-Curtis and Morisita-Horn indices, match abundances species by species. These two indices primarily measure similarity in the composition of dominant species, so they are affected by dominant species and may ignore the effect of rare species. The new indices created by Chao et al. assess the probability that individuals belong to shared vs. unshared species, without regard to which species they belong to. These indices are formulated by pooling shared abundances, but detailed species-by-species composition is not taken into account. However, Chao et al. suggest that researchers carefully consider the aim of their study and questions, the meaning of “similarity” or “overlap”, and the limitations of their data. The authors recommend their new similarity indices for any application in which species matching and similarity of relative abundance are in question (2005).
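The pooling idea can be illustrated with the observed (plug-in) form of the abundance-based Jaccard and Sorensen indices of Chao et al. (2005): U and V are the total relative abundances of individuals belonging to shared species in each assemblage, Jaccard = UV/(U + V − UV), and Sorensen = 2UV/(U + V). This sketch deliberately leaves out the correction for unseen shared species, which is the main contribution of those papers, so it should not be read as their estimator. The function name and the data are hypothetical, and all counts are assumed to be positive.

```python
def chao_overlap_observed(site_a, site_b):
    """Observed abundance-based Jaccard and Sorensen (no unseen-species correction)."""
    n_a = sum(site_a.values())
    n_b = sum(site_b.values())
    shared = site_a.keys() & site_b.keys()
    # U, V: proportion of individuals at each site that belong to shared species.
    u = sum(site_a[sp] for sp in shared) / n_a
    v = sum(site_b[sp] for sp in shared) / n_b
    if u + v == 0:
        return 0.0, 0.0                     # no shared species observed
    jaccard_abd = u * v / (u + v - u * v)
    sorensen_abd = 2 * u * v / (u + v)
    return jaccard_abd, sorensen_abd


# Hypothetical abundance data.
site_a = {"sp1": 12, "sp2": 5, "sp3": 3}
site_b = {"sp1": 29, "sp2": 1, "sp4": 10}

j_abd, s_abd = chao_overlap_observed(site_a, site_b)
print(round(j_abd, 3), round(s_abd, 3))
```

Note that the abundances of shared species are pooled before the index is computed, so two sites can score high even if their shared species differ in rank order.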
Let’s make the assumption that the correct number of shared species has been estimated and scaling issues and species richness differences among communities have been accounted for. How do you make comparisons between communities using β diversity and/or similarity/complementarity measures that were reviewed above?
A classic representation of a similarity matrix
(Azevedo-Ramos & Galatti 2002)
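A similarity matrix of this kind can be built by applying one index to every pair of sites. The sketch below uses the Jaccard index on presence/absence data purely as an example; the function names, site labels, and species lists are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(s1, s2):
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def similarity_matrix(sites):
    """Symmetric site-by-site matrix stored as nested dictionaries."""
    matrix = {name: {name: 1.0} for name in sites}
    for p, q in combinations(sites, 2):
        value = jaccard_similarity(sites[p], sites[q])
        matrix[p][q] = value
        matrix[q][p] = value
    return matrix


# Hypothetical presence/absence data for three sites.
sites = {
    "A": {"sp1", "sp2", "sp3"},
    "B": {"sp2", "sp3", "sp4"},
    "C": {"sp4", "sp5", "sp6"},
}

matrix = similarity_matrix(sites)
names = list(sites)
print("    " + "  ".join(f"{n:>5}" for n in names))
for p in names:
    print(f"{p:>3} " + "  ".join(f"{matrix[p][q]:5.2f}" for q in names))
```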
Correlations: The amount of similarity/dissimilarity in diversity among sites can be correlated with the distance among sites.
(Novotny et al. 2008)
Ordination: Ordination is often used to describe the relationship between a set of samples or localities based on their attributes (i.e. presence or abundance of species found at different sites). PCA and nonmetric multidimensional scaling are widely used ordination techniques.
(Cleary & Mooers 2004)
Cluster Analysis: Similarity or dissimilarity/distance measures are used to measure the distance (based on species composition) between all pairs of sites. Presence/absence or abundance data can be used. The two most similar sites are formed into one cluster and the analysis proceeds by successively clustering similar sites until a single dendrogram is formed. There are many techniques for deciding how sites should be joined into clusters and how clusters should be combined with each other. Depending on the method, the distance between nodes on the dendrogram may represent β diversity. Bootstrap values can be added to dendrograms to indicate the robustness of the analysis. Bootstrap values inform you of the percentage of times a tree reconstructed using a resampling algorithm would exhibit the same pattern.
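The successive joining of the most similar sites can be sketched as a small agglomerative routine. This example assumes Jaccard distance and average linkage (the distance between two clusters is the mean of all between-cluster site distances), which is only one of the many joining rules mentioned above; the function names and data are hypothetical.

```python
from itertools import combinations


def jaccard_distance(s1, s2):
    union = s1 | s2
    return 1 - len(s1 & s2) / len(union) if union else 0.0


def average_linkage(sites):
    """Return the merge history as (cluster_1, cluster_2, distance) tuples."""
    dist = {
        frozenset((p, q)): jaccard_distance(sites[p], sites[q])
        for p, q in combinations(sites, 2)
    }

    def cluster_distance(c1, c2):
        pairs = [dist[frozenset((p, q))] for p in c1 for q in c2]
        return sum(pairs) / len(pairs)

    clusters = [(name,) for name in sites]      # start with one site per cluster
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters, then repeat.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical presence/absence data for three sites.
sites = {
    "A": {"sp1", "sp2", "sp3"},
    "B": {"sp2", "sp3", "sp4"},
    "C": {"sp4", "sp5", "sp6"},
}

merges = average_linkage(sites)
for c1, c2, d in merges:
    print(c1, "+", c2, "at distance", round(d, 2))
```

The merge distances are what a dendrogram would plot as node heights.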
Analysis of Similarity, ANOSIM: This is a nonparametric test applied to the rank similarity matrix. It is somewhat analogous to a standard univariate ANOVA, and tests a priori-defined groups against random groups in ordinate space. The analysis uses a permutation procedure and tests the null hypothesis that there is no difference in community composition among sites. Significance levels are generated using a randomization approach. See Clarke & Gorley (2001).
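A minimal sketch of the ANOSIM idea follows. It ranks all pairwise dissimilarities (rank 1 = most similar pair, which is equivalent to ranking similarities in reverse), computes the R statistic as the difference between the mean between-group rank and the mean within-group rank divided by M/2, where M is the number of site pairs, and obtains a p-value by shuffling group labels. The function names, data, number of permutations, and random seed are assumptions for illustration; the sketch assumes at least two groups, each with at least two sites.

```python
import random
from itertools import combinations


def rank_values(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(dissim, groups):
    pairs = list(combinations(range(len(groups)), 2))
    ranks = rank_values([dissim[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dissim, groups, permutations=999, seed=0):
    rng = random.Random(seed)
    observed = anosim_r(dissim, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)                      # random regrouping of sites
        if anosim_r(dissim, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical dissimilarity matrix for four sites in two a priori groups.
dissim = [
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.70, 0.85],
    [0.80, 0.70, 0.00, 0.20],
    [0.90, 0.85, 0.20, 0.00],
]
groups = ["forest", "forest", "meadow", "meadow"]

r_stat, p_value = anosim(dissim, groups)
print("R =", r_stat, " p =", round(p_value, 3))
```

With only four sites there are very few distinct regroupings, so the p-value here cannot be small; real analyses need more replicates per group.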
Mantel Test: This is a multivariate measure that evaluates the null hypothesis of no relationship between two similarity matrices.
(Su et al. 2004)
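A simple permutation version of the Mantel test can be sketched as below. It assumes the test statistic is the Pearson correlation between the corresponding off-diagonal entries of the two matrices and that significance is assessed two-sided by jointly permuting the rows and columns of one matrix; these choices, the function names, the data, and the seed are assumptions of this example.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel(m1, m2, permutations=999, seed=0):
    n = len(m1)
    pairs = list(combinations(range(n), 2))
    x = [m1[i][j] for i, j in pairs]

    def statistic(order):
        # Relabel the sites of m2 according to `order`, then correlate.
        y = [m2[order[i]][order[j]] for i, j in pairs]
        return pearson(x, y)

    order = list(range(n))
    observed = statistic(order)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if abs(statistic(order)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: four sites along a transect (positions in km) whose
# compositional dissimilarity is exactly proportional to geographic distance.
positions = [0, 1, 3, 6]
geo = [[abs(p - q) for q in positions] for p in positions]
dissim = [[0.1 * d for d in row] for row in geo]

r, p_value = mantel(geo, dissim)
print("r =", round(r, 3), " p =", round(p_value, 3))
```

Permuting whole sites, instead of individual matrix cells, respects the fact that the pairwise values in a matrix are not independent of each other.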
Comparing distributions of pairwise similarity measures:
For example, investigators examined patterns of β diversity of pollution in freshwater fish assemblages in
(Jennings et al. 2008)
Turnover in time
Some investigators also use the α, β, and γ diversity definitions to measure species diversity through time.
Here is a limited list of software packages that perform different types of similarity/dissimilarity/complementarity analyses and allow you to compare communities with a variety of methods:
PAST: http://folk.uio.no/ohammer/past/ (free)
EstimateS: http://viceroy.eeb.uconn.edu/estimates (free)
SPADE: http://chao.stat.nthu.edu.tw/indexE.html (free)
EcoSim: http://www.garyentsminger.com/ecosim/index.htm (free)
R: http://www.r-project.org/ (free)
All metrics for determining β diversity and measuring similarity/dissimilarity/complementarity have biases and issues. In general, the classic equations of Whittaker and Lande for determining β diversity are used theoretically and rarely seen in applied contexts. The most common presence/absence and abundance-based similarity metrics are used in applied contexts and are often called β diversity metrics, though they are not β diversity in the “classic sense”. There are numerous similarity/dissimilarity measures and no measure is superior to all others. In addition, there are numerous applications of these indices when attempting to compare communities. The selection of particular similarity measures, and of the methods by which to compare these measures across sites, habitats, and communities, depends on the type of data, the type of questions, and the ability to access various software packages. A popular and recent use of similarity measures has been in the context of conservation, especially when attempting to choose habitats/communities/ecoregions as future preserves. However, it is somewhat unclear how all of these similarity metrics truly aid in the process of selecting protected areas, largely due to their inherent biases, the difficulty of interpreting the results in a biologically meaningful manner, and the fact that there are no “standard” measures that are commonly used across the board.
In addition to references cited throughout the body of this webpage, here are more references…
Classic papers and annotated bibliographies:
Legendre, P. & L. Legendre. 1998. Numerical Ecology, 2nd English Edition. Elsevier Scientific Publishing Company.
Magurran, Anne E. 2004. Measuring Biological Diversity. Blackwell Publishing.
Pielou, E.C. 1984. The Interpretation of Ecological Data: A Primer on Classification and Ordination. Wiley.
Selected theoretical studies on measures of β-diversity and similarity
Harte, J., McCarthy, S., Taylor, K., Kinzig, A., & M. Fischer. 1999. Estimating species-area relationships from plot to landscape scale using species spatial-turnover data. Oikos 86: 45-54
Chao, A., Hwang, W., Chen, Y. & C-Y Kuo. 2000. Estimating the number of shared species in two communities. Statistica Sinica 10: 227-246
Chao, A., Chazdon, R.L., Colwell, R.K., & T. Shen. 2005. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters 8: 148-159
Chao, A., Chazdon, R. L., Colwell, R.K., & T. Shen. 2006. Abundance-based similarity indices and their estimations when there are unseen species in samples. Biometrics 62: 361-371
Chao, A., Jost, L.,
Diserud, O., & F. Ødegaard. 2007. A multiple-site similarity measure. Biology Letters 3: 20-22
Jost, L. 2006. Entropy and diversity. Oikos 113: 363-375
Jost, L. 2007. Partitioning diversity into independent alpha and beta components. Ecology 88: 2427-2439
Lande, R. 1996. Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos 76: 5-13
Pelissier, R., & P. Couteron. 2007. An operational, additive framework for species diversity partitioning and beta-diversity analysis. Journal of Ecology 95: 294-300
Plotkin, J., & H.C. Muller-Landau. 2002. Sampling the species composition of a landscape. Ecology 83: 3344-3356
Wolda, H. 1981. Similarity indices, sample size, and diversity. Oecologia 50: 296-302.
Veech, J.A., Summerville, K.S., Crist, T.O., & J.C. Gering. 2002. The additive partitioning of species diversity: Recent revival of an old idea. Oikos 99: 3-9
Vellend, M. 2001. Do commonly used indices of β-diversity measure species turnover? Journal of Vegetation Science 12: 545-552
Selected empirical studies using measures of similarity and β-diversity
Benedick, S., Hill, J.K., Mustaffa, N., Chey, V.K., Maryati, M., Searle, J.B., Schilthuizen, M., & K.C. Hamer. Impacts of rain forest fragmentation on butterflies
Clarke, K.R. & Warwick, R.M. 2001a. Change in marine communities: an approach to statistical analysis and interpretation, 2nd edn.
Cleary, D.F., & A.Ø. Mooers. 2004. Butterfly species richness and community composition in forests affected
Clough, Y., Holzschuh, A., Gabriel, D., Purtauf, T., Kleijn, D., Kruess, A., Steffan-Dewenter,
Cornell, H.V., Karlson, R.H. & T.P. Hughes. 2007. Scale-dependent variation in coral community similarity across sites, islands, and island-groups. Ecology 88: 1707-1715
Correa, S.B., Crampton, W.G.R., Chapman, L.J. & J.S. Albert. 2008. A comparison of flooded forest and floating meadow fish assemblages in an upper Amazonian floodplain. Journal of Fish Biology 72: 629-644
Ghazoul, J. 2002. Impact of logging on the richness and diversity of forest butterflies in a tropical dry forest in
Grimbacher, P.S., Catterall, J.K., & H.C. Proctor. 2007. Response of ground-active beetle assemblages to different styles of reforestation on cleared rainforest land. Biodiversity and Conservation 16: 2167-2184
Hardy, O.J., & B. Senterre. 2007. Characterizing the phylogenetic structure of communities by an additive partitioning of phylogenetic diversity. Journal of Ecology 95: 493-506
Jobe, R.T. 2008. Estimating landscape-scale species richness: Reconciling frequency-and turnover-based approaches. Ecology 89: 174-182
La Sorte, F.A.,
Su, J.C., Debinski,
Wolda, H. 1983. Diversity, diversity indices and tropical cockroaches. Oecologia 58: 290-298.
Yanoviak, S.P., Fisher, B.L, & A. Alonso. 2007. Arboreal ant diversity (Hymenoptera: Formicidae) in a central African forest. African Journal of Ecology 46: 60-66