Statistics/Different Types of Data
Data are assignments of values onto observations of events and objects. They can be classified by their coding properties and the characteristics of their domains and their ranges.
Identifying data type
When a data set is numerical, it is necessary to distinguish carefully the actual nature of the variable being quantified, because statistical tests are generally specific to the kind of data being handled.
Data on a nominal (or categorical) scale
Identifying the true nature of numerals applied to attributes that are not "measures" is usually straightforward and apparent. Examples in everyday use include road, car, house, book and telephone numbers. A simple test would be to ask if re-assigning the numbers among the set would alter the nature of the collection. If the plates on a car are changed, for example, it still remains the same car in reality.
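The re-assignment test can be sketched in a few lines of Python. The road numbers and the relabelling below are invented for illustration; the point is only that shuffling nominal labels leaves the frequency structure of the collection intact, while an arithmetic summary such as the mean changes arbitrarily and so carries no meaning.

```python
from collections import Counter

# Hypothetical road numbers noted during a journey (nominal labels).
roads = [12, 7, 12, 40, 7, 12]

# Re-assign the numbers among the roads: a one-to-one relabelling.
relabel = {12: 7, 7: 40, 40: 12}
renamed = [relabel[r] for r in roads]

def count_profile(labels):
    """Return the sorted category frequencies, ignoring the label names."""
    return sorted(Counter(labels).values())

def mean(values):
    """Plain arithmetic mean, used here only to show that it is not meaningful."""
    return sum(values) / len(values)

# The frequency structure survives the relabelling...
same_structure = count_profile(roads) == count_profile(renamed)

# ...but the mean of the labels does not.
mean_before = mean(roads)
mean_after = mean(renamed)

print(same_structure, mean_before, mean_after)
```

Counting how often each category occurs is therefore a sensible operation on nominal data, whereas averaging the labels is not.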
Data on an Ordinal Scale
An ordinal scale is a scale of ranks. The ranks are meaningful only in that they are ordered, which is what makes the scale ordinal. The distance between [rank n] and [rank n-1] is not guaranteed to equal the distance between [rank n-1] and [rank n-2], but [rank n] is always greater than [rank n-1], just as [rank n-1] is greater than [rank n-2], for all n where [rank n], [rank n-1], and [rank n-2] exist. The ranks of an ordinal scale may be represented by numbers or by names with an agreed order.
We can illustrate this with a common example: the Likert scale. Consider five possible responses to a statement, perhaps "Our president is a great man", with answers on this scale:
Response: | Strongly Disagree | Disagree | Neither Agree nor Disagree | Agree | Strongly Agree |
---|---|---|---|---|---|
Code: | 1 | 2 | 3 | 4 | 5 |
Here the answers form a ranked scale, reflected in the choice of numeric code. There is, however, no sense in which the distance between Strongly Agree and Agree is the same as that between Strongly Disagree and Disagree.
Numerical ranked data should be distinguished from measurement data.
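A small Python sketch may make the Likert example concrete. The two groups of respondents and the second coding (1, 2, 3, 4, 10) are assumptions made up for illustration; the second coding keeps the same order as the first but spaces the ranks differently. Under these assumptions the median response of each group is unchanged, while the comparison of group means reverses.

```python
from statistics import mean, median_low

LEVELS = [
    "Strongly Disagree",
    "Disagree",
    "Neither Agree nor Disagree",
    "Agree",
    "Strongly Agree",
]

# Two codings with the same order but different spacing (the second is hypothetical).
coding_a = dict(zip(LEVELS, [1, 2, 3, 4, 5]))
coding_b = dict(zip(LEVELS, [1, 2, 3, 4, 10]))

# Hypothetical responses from two small groups.
group_x = ["Agree", "Agree", "Agree"]
group_y = ["Neither Agree nor Disagree", "Neither Agree nor Disagree", "Strongly Agree"]

def summarise(responses, coding):
    """Return (mean of the codes, median response as a label)."""
    codes = [coding[r] for r in responses]
    decode = {code: level for level, code in coding.items()}
    return mean(codes), decode[median_low(codes)]

for name, coding in [("coding_a", coding_a), ("coding_b", coding_b)]:
    print(name, summarise(group_x, coding), summarise(group_y, coding))
```

Because only the order of the codes is meaningful, summaries that depend on order alone (such as the median) are stable under any order-preserving recoding, whereas summaries that depend on distances (such as the mean) are not.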
Measurement data
Numerical measurements exist in two forms, meristic and continuous, and may present themselves on three kinds of scale: interval, ratio and circular.
Meristic or discrete variables are generally counts and can take on only discrete values. Normally they are represented by natural numbers. The number of plants found in a botanist's quadrat would be an example. (Note that if the edge of the quadrat falls partially over one or more plants, the investigator may choose to count these as halves, but the data will still be meristic, as doubling the total removes any fraction.)
Continuous variables are those whose measurement precision is limited only by the investigator and his equipment. The length of a leaf measured by a botanist with a ruler will be less precise than the same measurement taken by micrometer. (Notionally, at least, the leaf could be measured even more precisely using a microscope with a graticule.)
Interval Scale: Variables measured on an interval scale have values whose differences are uniform and meaningful but whose ratios are not. An oft-quoted example is the Celsius scale of temperature. A difference between 5° and 10° is equivalent to a difference between 10° and 15°, but the ratio between 15° and 5° does not imply that the former is three times as warm as the latter.
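One way to see why ratios fail on an interval scale is to re-express the same temperatures in another interval scale. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the function and variable names are chosen here for illustration. Equal steps remain equal after conversion, but the ratio depends on where each scale happens to place its zero.

```python
import math

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

temps_c = [5.0, 10.0, 15.0]
temps_f = [c_to_f(t) for t in temps_c]

# Differences: equal steps in Celsius are equal steps in Fahrenheit.
equal_steps_c = math.isclose(temps_c[1] - temps_c[0], temps_c[2] - temps_c[1])
equal_steps_f = math.isclose(temps_f[1] - temps_f[0], temps_f[2] - temps_f[1])

# Ratios: the "three times" seen in Celsius disappears in Fahrenheit.
ratio_c = temps_c[2] / temps_c[0]
ratio_f = temps_f[2] / temps_f[0]

print(equal_steps_c, equal_steps_f, ratio_c, round(ratio_f, 3))
```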
Ratio Scale: Variables on a ratio scale have a meaningful zero point. In keeping with the above example, one might cite the Kelvin temperature scale. Because there is an absolute zero, it is true to say that 400 K is twice as warm as 200 K, though one should do so with tongue in cheek. A better day-to-day example would be to say that a 180 kg sumo wrestler is three times as heavy as his 60 kg wife.
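By contrast, a ratio on a ratio scale does not depend on the unit. The short sketch below re-expresses the two masses from the example in pounds, using an approximate conversion factor that is an assumption of this illustration; since both units share the same zero, the factor cancels and the ratio is unchanged.

```python
import math

KG_TO_LB = 2.20462  # approximate pounds per kilogram (assumed for illustration)

wrestler_kg, wife_kg = 180.0, 60.0

ratio_kg = wrestler_kg / wife_kg
ratio_lb = (wrestler_kg * KG_TO_LB) / (wife_kg * KG_TO_LB)

# The conversion factor cancels, so both ratios agree up to rounding.
print(ratio_kg, math.isclose(ratio_kg, ratio_lb))
```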
Circular Scale: When one measures annual dates, clock times and a few other forms of data, a circular scale is in use. Neither differences nor ratios of such variables are necessarily meaningful, and special methods have to be employed for such data.
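As an illustration of one such special method, the sketch below computes a circular mean of clock times by treating each time as an angle on a 24-hour circle and averaging the sines and cosines. The function name and the sample times are assumptions for illustration, and this is only one of several techniques used for circular data. For the times 23:00 and 01:00, the ordinary arithmetic mean gives noon, whereas the circular mean lands at midnight (up to floating-point rounding, which may show it as a value just below 24).

```python
import math

def circular_mean_hours(hours):
    """Mean direction of clock times on a 24-hour circle, in hours within [0, 24).

    The result is not meaningful when the times cancel out exactly,
    for example 06:00 and 18:00, because the mean direction is then undefined.
    """
    angles = [2 * math.pi * h / 24 for h in hours]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos)
    return (mean_angle * 24 / (2 * math.pi)) % 24

times = [23.0, 1.0]  # 23:00 and 01:00, expressed in hours

arithmetic = sum(times) / len(times)   # treats the scale as a straight line
circular = circular_mean_hours(times)  # respects the wrap-around at midnight

print(arithmetic, circular)
```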