Section 1.1: Data and Types of Statistical Variables
At the end of this section you should be able to answer the following questions:
- Using plain language, how would you define the concept of statistical data?
- What is a statistical variable?
- What are the main types of statistical variables?
So what are statistical data? The term “statistical data” refers to collections or sets of numerical information. The information is located within “cases”, or records, for separate individual entities, such as different people. Cases can also occur at multiple levels: within data sets, you can have cases for people, groups, or larger entities like organisations or regions. For example, if researchers are analysing population health information in Queensland, Australia, they could have data sets in which each case is an individual person, data sets in which each case is a medical emergency team, and data sets in which each case is a hospital where medical emergency teams are located. They could even combine all of these data for Queensland and compare them with data for other Australian states such as Tasmania or Victoria.
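The idea of cases at more than one level can be sketched in a few lines of Python. This is only an illustration: the hospital names, ages, and field names (person_id, hospital, age) are invented, and real health data sets would be far larger and richer.

```python
from collections import defaultdict

# Hypothetical person-level data set: each dict is one case (one person).
# All names and values are invented for illustration.
people = [
    {"person_id": 1, "hospital": "Hospital A", "age": 34},
    {"person_id": 2, "hospital": "Hospital A", "age": 58},
    {"person_id": 3, "hospital": "Hospital B", "age": 41},
]

# Group the person-level ages by hospital.
ages_by_hospital = defaultdict(list)
for person in people:
    ages_by_hospital[person["hospital"]].append(person["age"])

# Hospital-level data set: each dict is now one case (one hospital).
hospital_cases = [
    {"hospital": name, "n_people": len(ages), "mean_age": sum(ages) / len(ages)}
    for name, ages in sorted(ages_by_hospital.items())
]

for case in hospital_cases:
    print(case)
```

The same underlying information appears twice here, once with a person as the case and once with a hospital as the case.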
Another important idea is the notion of a statistical variable. A statistical variable is a special type of mathematical variable.
Like all mathematical variables, a statistical variable stands for a concept whose value is not fixed in advance. The concept may be abstract, like a personality trait, or physical, such as height or weight. The fundamental properties of statistical variables are: 1) they hold the measurement of a particular value for an individual case, and 2) across all cases in a data set, a variable can take on more than one value. If measurement for a “variable” is limited to only one value, then it does not vary or change, and it is not a variable. In this instance, you would have a constant rather than a variable.
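The distinction between a variable and a constant can be checked directly by counting the distinct values recorded across the cases. The sketch below assumes a small invented data set; the helper names distinct_values and is_constant are hypothetical and exist only for this illustration.

```python
# Hypothetical data set: each dict is one case (one person).
# The values are invented for illustration.
cases = [
    {"person": "A", "height_cm": 162.0, "country": "Australia"},
    {"person": "B", "height_cm": 175.5, "country": "Australia"},
    {"person": "C", "height_cm": 181.2, "country": "Australia"},
]


def distinct_values(cases, name):
    """Return the set of values recorded for one column across all cases."""
    return {case[name] for case in cases}


def is_constant(cases, name):
    """A column that takes only one value in this data set is a constant."""
    return len(distinct_values(cases, name)) <= 1


print(is_constant(cases, "height_cm"))  # height differs between cases
print(is_constant(cases, "country"))    # every case has the same country
```

In this invented data set, height_cm behaves as a variable, while country is a constant because every case holds the same value.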
Variables can be used to organise observations about many different concepts related to persons, objects, or groups. For example, variables can measure basic demographic information like gender, through to more complex and abstract information like attitudes or mental states. In short, statistical data can represent a very wide range of things.
There are a number of different types of variables.
Categorical variables are variables for which each possible value represents a distinct category. For example, gender is categorical in most analyses, with people choosing male, female, or other. Another example of categorical data is type of driver’s license. A person can be on a provisional license, an open license, or have no license. Australian state of residence is another categorical variable and could be recorded as Queensland, Tasmania, or Victoria. The values recorded for a categorical variable across the cases in a data set are referred to as categorical data.
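A common first step with a categorical variable is to count how many cases fall into each category. The following is a minimal sketch using the driver’s license example; the list of responses is invented for illustration.

```python
from collections import Counter

# Hypothetical responses for one categorical variable, one value per case.
license_types = ["provisional", "open", "none", "open", "open", "provisional"]

# Count how many cases fall into each category.
counts = Counter(license_types)

# The distinct categories observed in this data set.
categories = sorted(counts)

for category in categories:
    print(category, counts[category])
```

Note that the categories have no numerical meaning. Counting cases per category makes sense, but averaging the category labels does not.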
In contrast to categorical variables, there are also continuous variables. For continuous variables, possible responses fall along a spectrum. For example, age and height are continuous variables. Another example of a continuous variable is reaction time to stimuli.
In addition, although we are discussing categorical and continuous variables as separate types, some variables have categorical responses that are best thought of as continuous. The most obvious of these are responses to a Likert scale. For a Likert scale, a person can respond to a question like ‘how much do you enjoy statistics?’ The responses can range from ‘greatly enjoy’ to ‘greatly dislike’. In this case, each response has its own category, but in practice it is most common to assign a number to each response and treat the measurements as a continuous scale.
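Treating Likert responses as continuous usually means assigning a number to each response category and then doing arithmetic on those numbers. The sketch below assumes a five-point coding from 1 to 5; the text names only the two end points, so the three middle labels, the numeric codes, and the sample responses are all assumptions made for illustration.

```python
from statistics import mean

# Assumed five-point coding. Only the two end points come from the text;
# the middle labels and the numeric codes are illustrative assumptions.
LIKERT_CODES = {
    "greatly dislike": 1,
    "dislike": 2,
    "neutral": 3,
    "enjoy": 4,
    "greatly enjoy": 5,
}

# Hypothetical responses to 'how much do you enjoy statistics?'
responses = ["greatly enjoy", "enjoy", "neutral", "enjoy", "greatly dislike"]

# Convert each categorical response to its numeric code.
scores = [LIKERT_CODES[response] for response in responses]

# Treat the codes as a continuous scale and summarise them with a mean.
average_score = mean(scores)

print(scores)
print(average_score)
```

Taking a mean like this assumes that the gaps between neighbouring response categories are roughly equal, which is the assumption being made whenever Likert responses are analysed as a continuous scale.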