# Learning to make a statistical claim distinction

When working with data, there are two primary ways to make claims. One is to use data to assert an objective fact; the other is to use statistical methods to make an observation about patterns.

For example, we might state that “there are 36 states in Nigeria”. This is a fact which can be directly verified and backed up by data.

On the other hand, if we say “over 60% of Nigerians have access to electricity”, we are making a claim that cannot be checked by simply counting. It has to be supported or rejected through statistical analysis and hypothesis testing.

# What is a statistical claim?

A statistical claim can be:

- A claim about the percentages of people
- A claim of, relating to, based on, or employing the principles of statistics and statistical analysis

In other words, it is a conclusion based on large quantities of numerical data that have been collected and analysed. So when we make a claim (a statistical statement), it has to be about a group or groups, and it has to talk about something that varies within the group. It also has to use language that recognises that variability.

In statistics, a **hypothesis** is a claim or statement about a property of a population.

# How do we make statistical claims?

As we saw in Lesson 2, the gold standard for data gathering is to collect data from a sample of a population. If the sample is large enough, and randomly collected so that it is not disproportionately biased towards any particular part of the population, we can use that sample to make claims about the population as a whole.

In statistics, a **population** is the complete collection of whatever we are studying. This could be people, but it could also be a population of economic transactions, the molecules in a particular substance, or all the stars in the sky. When a population is too big to count or measure easily, we use statistical sampling and methods to make inferences about the whole population.
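To make the idea of sampling concrete, here is a minimal sketch in Python. It builds a synthetic population in which the true share of units with some attribute is known, then draws a simple random sample and estimates that share from the sample alone. The population size, the 60% share, the sample size, and the function name `estimate_proportion` are all illustrative assumptions, not figures or names from this lesson's data.

```python
import random


def estimate_proportion(population, sample_size, rng):
    """Draw a simple random sample (without replacement) and
    return the share of sampled units that have the attribute."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Synthetic population (made-up numbers): 100,000 units,
# of which 60,000 have the attribute (True) and 40,000 do not (False).
population = [True] * 60_000 + [False] * 40_000
true_share = sum(population) / len(population)

# A seeded generator so the sketch is reproducible.
rng = random.Random(42)
estimate = estimate_proportion(population, 1_000, rng)

print(f"True share: {true_share:.3f}")
print(f"Estimate from a sample of 1,000: {estimate:.3f}")
```

The estimate will usually land close to, but not exactly on, the true share. That gap is the sampling variability that a statistical claim has to acknowledge.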

Anyone can make a claim or hypothesis about a population, but backing the claim up is critical. Otherwise, it is just a belief that may or may not be true.

# Important statistical terms

When we describe the distribution of a particular variable within a population, we use statistical quantities like the **minimum**, the **maximum**, or measures of center such as the **mean** or **median**. We also use **proportions** or **percentages**, and words about proportions, such as **most**, which means “more than 50%”.

The chart above shows the distribution of the population of Nigeria by age, as visualised by PopulationPyramid.net. We can see that Nigeria has a young population, with more people in the younger age brackets.

Given a statement such as “most Nigerians are younger than 19”, or “three-quarters of Nigerians are under the age of 34”, we can use the data above to support the claim.
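The quantities named in this section can be computed in a few lines. The sketch below uses Python's standard `statistics` module on a small, entirely hypothetical list of ages (it is not the PopulationPyramid.net data) and checks whether a claim phrased with "most" holds for that list. The function name `describe` and the cutoff of 19 are illustrative choices.

```python
import statistics


def describe(ages, cutoff):
    """Summarise a list of ages and report the share below a cutoff."""
    return {
        "minimum": min(ages),
        "maximum": max(ages),
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        "share_under_cutoff": sum(age < cutoff for age in ages) / len(ages),
    }


# Hypothetical ages, invented purely for illustration.
ages = [2, 5, 9, 12, 15, 17, 18, 24, 31, 47]

summary = describe(ages, cutoff=19)

# "Most" means more than 50% of the group.
is_most = summary["share_under_cutoff"] > 0.5

for name, value in summary.items():
    print(f"{name}: {value}")
print(f"Most are younger than 19: {is_most}")
```

With real data, the same summary would be computed from the full dataset or from a sample, and the result would be reported together with its limitations.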

# Recognising variability

The key difference between an observable fact about a population, like “there are 36 states in Nigeria”, and a statistical claim is that a statistical claim recognises variability. This may be through a **confidence interval** (see Lesson 2), or by reporting the limitations of the data. When we talk about the distribution of age groups in Nigeria, we have to acknowledge the limits of the data. Not all births and deaths are recorded, for example, so these are estimates based on datasets that may be sampled, or that are only as complete as the circumstances allow.
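As an illustration of how a confidence interval expresses variability, the sketch below computes an approximate 95% interval for a proportion using the normal approximation (often called the Wald interval). This is one common textbook method and may differ from the approach used in Lesson 2. The survey counts (620 of 1,000 respondents) are hypothetical, and the function name `proportion_confidence_interval` is introduced here only for illustration.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence.
    The result is clipped to the valid range [0, 1].
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical survey: 620 of 1,000 randomly sampled respondents
# report having access to electricity.
low, high = proportion_confidence_interval(620, 1000)
print(f"Estimated share: 62.0%, 95% interval: {low:.1%} to {high:.1%}")
```

Reporting the interval rather than the single figure is what turns "62% have access" into a statistical claim that recognises variability. The approximation is less reliable for small samples or for proportions near 0 or 1.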

A **hypothesis** is a claim whose validity we can test with data. For example, if we claim that 75% of the Nigerian population is under the age of 34, we could test our hypothesis by drawing a random sample of the population and checking whether the distribution of ages in the sample is consistent with the claim.

The process of making observations and claims about a population based on a sample is called **statistical inference**.
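The hypothesis test described above can be sketched as a one-sample z-test for a proportion. This is one standard way to run such a test, not necessarily the method this course uses. The sample figures (280 of 400 people under 34) are hypothetical, and the function name `one_sample_proportion_z_test` is an illustrative assumption.

```python
import math


def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided z-test of the hypothesis that the population
    proportion equals p0, given `successes` out of `n` sampled units."""
    p_hat = successes / n
    # Standard error computed under the hypothesised proportion p0.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical random sample: 280 of 400 people are under 34.
# Hypothesis under test: 75% of the population is under 34.
z, p_value = one_sample_proportion_z_test(280, 400, 0.75)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests the sample would be unusual if the hypothesis were true. It does not prove the hypothesis false, and the conclusion still depends on the sample being random and large enough.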