Interval and confidence level

Definitions, formulas and limitations: what you need to know

Written by Maxime LE MOIGNIC
Updated over a week ago

When they do not have a census of the points of sale they cover, OpenHealth and its European partners let their customers track modeled data at the national level, that is, data extrapolated from a sample of points of sale. While these extrapolated data give our users a very solid basis for their market analyses, they nevertheless carry a margin of statistical uncertainty, the magnitude of which depends on several factors detailed below.

Definitions

Confidence interval:

A confidence interval brackets the true value that we seek to estimate from measurements obtained through a random process. It defines the margin of statistical uncertainty around the estimate.

Confidence level:

A confidence level expresses, as a percentage, how certain we can be that the true value lies within the confidence interval. A 95% confidence level is the most commonly used in statistical studies.

Factors impacting the size of the interval for a given confidence level

Four factors determine the size of the confidence interval for a given confidence level:

  • Sample size

  • The percentage

  • The size of the population

  • The time period

The size of the sample

The larger the sample, the more faithfully the results reflect the population. For a given confidence level, a larger sample therefore yields a narrower confidence interval. The relationship is not linear, however: doubling the sample size does not halve the confidence interval. The interval shrinks roughly with the square root of the sample size, so halving it requires about four times the sample.
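The non-linear relationship can be checked numerically. The sketch below assumes the usual normal-approximation margin of error for a proportion, Z × sqrt(p × (1 − p) / n); the function name `margin_of_error` and the sample sizes are illustrative and are not taken from OpenHealth's models.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion
    (normal approximation, no finite-population correction)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, chosen only to show the scaling.
base = margin_of_error(1000)
doubled = margin_of_error(2000)
quadrupled = margin_of_error(4000)

print(f"n=1000: +/- {base:.4f}")
print(f"n=2000: +/- {doubled:.4f} (ratio {doubled / base:.3f})")
print(f"n=4000: +/- {quadrupled:.4f} (ratio {quadrupled / base:.3f})")
```

Doubling the sample multiplies the margin by about 0.707 rather than 0.5; only the fourfold sample halves it.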

The percentage

Precision also depends on the percentage of the sample that gives a particular answer. If 99% of the sample answered "Yes" and 1% answered "No", the statistical uncertainty is low, regardless of the sample size. If the percentages are 51% and 49%, however, the uncertainty is much greater. It is easier to be sure of extreme responses than of intermediate ones.
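As a rough illustration of this effect, the sketch below compares the margin of error for a 99/1 split and a 51/49 split at the same sample size. It assumes the normal-approximation formula for a proportion; the sample size of 500 is a made-up value.

```python
import math

def margin_of_error(n, p, z=1.96):
    """Half-width of the confidence interval for a proportion p
    observed in a sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 500  # hypothetical sample size

lopsided = margin_of_error(n, 0.99)  # 99% "Yes" / 1% "No"
split = margin_of_error(n, 0.51)     # 51% / 49%

print(f"99/1 split:  +/- {lopsided:.4f}")
print(f"51/49 split: +/- {split:.4f}")
print(f"ratio: {split / lopsided:.1f}x")
```

The term p × (1 − p) peaks at p = 0.5, which is why 0.5 is the conservative choice when computing a required sample size.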

The size of the population

Population size is only likely to be a factor when working with a relatively small population.

The time period

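The selling numeric distribution (DNV, from the French "distribution numérique vendante") depends on the time period studied. A DNV measured over a single day is lower than one measured over a longer period, and the uncertainty is therefore greater.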

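Sample size formula

$$ss = \frac{Z^2 \, p \, (1 - p)}{c^2}$$

where:

  • ss = required sample size

  • Z = Z value (e.g., 1.96 for a 95% confidence level)

  • p = percentage picking a choice, expressed as a decimal (0.5 is used to compute the sample size needed)

  • c = confidence interval, expressed as a decimal (e.g., 0.04 = ±4 percentage points)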
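The symbols above correspond to the usual sample size formula for a proportion, ss = Z² × p × (1 − p) / c². A minimal sketch of that calculation follows; the function name `sample_size` is illustrative, and the default values are the examples given in the list.

```python
import math

def sample_size(z=1.96, p=0.5, c=0.04):
    """Required sample size for an unlimited population.

    z: Z value for the chosen confidence level (1.96 for 95%)
    p: expected proportion as a decimal (0.5 is the most conservative)
    c: desired confidence interval half-width as a decimal
    """
    return (z ** 2) * p * (1 - p) / (c ** 2)

needed = sample_size()
# Round up: a fractional respondent or point of sale is not possible.
print(math.ceil(needed))  # 601
```

Because c is squared in the denominator, tightening the interval from ±4 to ±2 points multiplies the required sample by four.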

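Correction formula for the finite population

$$ss_{\text{new}} = \frac{ss}{1 + \frac{ss - 1}{pop}}$$

where ss is the sample size given by the sample size formula and pop is the size of the population.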
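The standard finite population correction divides the uncorrected sample size ss by 1 + (ss − 1) / pop. The sketch below applies it to a small and a large population to show why population size only matters when the population is relatively small. The function name and both population sizes are made-up values for illustration.

```python
def finite_population_correction(ss, pop):
    """Adjust an uncorrected sample size ss for a population of size pop."""
    return ss / (1 + (ss - 1) / pop)

# Uncorrected size for Z = 1.96, p = 0.5, c = 0.04.
ss = 600.25

small = finite_population_correction(ss, 2_000)      # hypothetical small population
large = finite_population_correction(ss, 2_000_000)  # hypothetical large population

print(f"population 2,000:     {small:.1f}")
print(f"population 2,000,000: {large:.1f}")
```

With 2,000 units in the population the required sample drops noticeably; with two million it is almost unchanged.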

Limitations

Confidence interval calculations assume that you have a truly random sample of the relevant population.

If your sample is not truly random, you cannot rely on the intervals.

Illustrations

For mainland France (excluding Corsica):

  • If my product has a DNV of 100% and extrapolated sales of 100 units, a confidence interval of ±0.68% means that there is a 95% chance that my actual sales are between 99.32 units and 100.68 units. The uncertainty is low.

  • If my product has a DNV of 1% and extrapolated sales of 100 units, a confidence interval of ±9.05% means that there is a 95% chance that my actual sales are between 90.95 units and 109.05 units. The uncertainty is greater.
