Why, and How, Should Geologists Use Compositional Data Analysis/Summary
Compositional data arise naturally in several branches of science, including geology. In geochemistry, for example, such constrained data typically occur when raw data are normalized or when values come from a constrained estimation procedure. They are reported as parts per one, percentages, ppm, ppb, molar concentrations, and so on.
Compositional data have proved difficult to handle statistically because of the awkward constraint that the components of each vector must sum to a constant, such as unity. This special property means that the variables involved in a study occupy a constrained space, the simplex, which is a restricted part of real space.
Pearson was the first to point out the dangers that may befall the analyst who attempts to interpret correlations between ratios whose numerators and denominators contain common parts. More recently, Aitchison, Pawlowsky-Glahn, S. Thió, and other statisticians have developed the concept of Compositional Data Analysis, pointing out the dangers of misinterpreting closed data when they are treated with “normal” statistical methods.
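The effect Pearson described can be reproduced with a few lines of Python. The following is only a minimal sketch under stated assumptions: the three variables are hypothetical, independent, uniformly distributed measurements (not real geochemical data), and the seed, sample size, and names such as pearson_r are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary fixed seed, for reproducibility
n = 10_000

# Three independent, hypothetical "raw" measurements (assumed values).
x, y, z = rng.uniform(10.0, 20.0, size=(3, n))

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# 1. The raw variables are independent, so their correlation is near zero.
r_raw = pearson_r(x, y)

# 2. Ratios that share a common denominator become positively correlated.
r_ratio = pearson_r(x / z, y / z)

# 3. Closing each row to a constant sum (here 1) induces negative correlation.
raw = np.column_stack([x, y, z])
closed = raw / raw.sum(axis=1, keepdims=True)
r_closed = pearson_r(closed[:, 0], closed[:, 1])

print(f"raw x vs y:        {r_raw:+.3f}")
print(f"x/z vs y/z:        {r_ratio:+.3f}")
print(f"closed x vs y:     {r_closed:+.3f}")
```

Although x and y are generated independently, both the common-denominator ratios and the closed parts show clear correlations that come from the arithmetic alone, not from any underlying relationship.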
It is important for geochemists, and geologists in general, to be aware that the usual multivariate statistical techniques are not applicable to constrained data. It is also important for us to have access to appropriate techniques as they become available. Raising that awareness and presenting those techniques is the principal aim of this book.
Starting from a hypothetical model of a copper mineralization associated with a felsic intrusive, with specific relationships between certain elements, I will show how “normal” correlation methods fail to identify some of these embedded relationships and how they can produce other, spurious correlations. From there, I will test the same model after transforming the data using the CLR, ALR, and ILR (centred, additive, and isometric log-ratio) transformations with the aid of the CoDaPack software.
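For readers who want to see what the centred, additive, and isometric log-ratio transformations (commonly abbreviated clr, alr, and ilr) do numerically, here is a minimal sketch in Python. It is not CoDaPack's implementation. The function names, the sample composition, and the choice of the last part as the alr denominator are assumptions made for illustration, and the ilr basis used here is only one of many valid orthonormal bases. Zero values are not handled.

```python
import numpy as np

def closure(x):
    """Rescale each composition so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def alr(x):
    """Additive log-ratio, using the last part as denominator (assumed)."""
    logx = np.log(closure(x))
    return logx[..., :-1] - logx[..., -1:]

def ilr(x):
    """Isometric log-ratio with one particular orthonormal basis (assumed):
    coordinate i compares the geometric mean of the first i parts
    with part i + 1."""
    logx = np.log(closure(x))
    n_parts = logx.shape[-1]
    coords = []
    for i in range(1, n_parts):
        log_gmean = logx[..., :i].mean(axis=-1)
        coords.append(np.sqrt(i / (i + 1)) * (log_gmean - logx[..., i]))
    return np.stack(coords, axis=-1)

# Hypothetical four-part composition in percent (illustrative values only).
sample = np.array([60.0, 25.0, 10.0, 5.0])
print("clr:", clr(sample))
print("alr:", alr(sample))
print("ilr:", ilr(sample))
```

Two properties are worth checking on any such sketch: the clr values of a composition sum to zero, and the results do not change if the composition is rescaled (for example from percentages to parts per one).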
Since this publication is addressed to geologists and geoscientists in general, I have kept the mathematical formulae to a minimum and have not included any theoretical demonstrations. The “mathematically curious geologist”, if such a category exists, can find all of these in the list of recommended sources in the reference section.
So let us start by introducing the model of mineralization that we will be testing.