Some preliminary definitions
- Seroprevalence: share of the target population whose blood serum contains antibodies against a given virus
- Serological test: blood test detecting the presence of antibodies, which reflect the patient's immunity against a new infection
- Sampling: stage in which the individuals to be surveyed are selected from the target population
Principles of stratified random sampling
Among the available sampling methods, stratified random sampling proceeds in two steps: first, the target population is divided into relatively homogeneous strata; second, a sample is drawn within each stratum.
The stratification of a population is built from variables of interest chosen according to the study, which can be of several types:
- Demographic variables (age, sex, etc.)
- Geographical variables (regions, departments, municipalities, etc.)
- Sociological variables (marital status, family status, etc.)
- Socio-economic variables (professional activity, income, etc.)
These variables must be known for the target population so that they can be taken into account when the strata are constructed.
The strata must be mutually exclusive, i.e. an individual can belong to only one stratum.
When the strata are relatively homogeneous, the sample can be allocated proportionally: each stratum makes up the same share of the sample as it does of the target population.
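As a rough illustration of proportional allocation, the sketch below splits a fixed sample size across strata according to their population shares. The stratum sizes are hypothetical placeholders (not INSEE figures), the function name is invented for this example, and the rounding rule (largest remainders) is one reasonable choice among several.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to their population."""
    total = sum(strata_sizes.values())
    # Exact (fractional) quota of each stratum.
    quotas = {s: sample_size * n / total for s, n in strata_sizes.items()}
    # Start from the integer part of each quota.
    allocation = {s: int(q) for s, q in quotas.items()}
    # Give the units lost to rounding to the largest fractional parts.
    remaining = sample_size - sum(allocation.values())
    by_remainder = sorted(
        quotas, key=lambda s: quotas[s] - allocation[s], reverse=True
    )
    for s in by_remainder[:remaining]:
        allocation[s] += 1
    return allocation


# Hypothetical stratum sizes, for illustration only (assumption).
population = {
    ("F", "0-4"): 18_000,
    ("M", "0-4"): 19_000,
    ("F", "5-9"): 21_000,
    ("M", "5-9"): 22_000,
}
sample = proportional_allocation(population, 1000)
for stratum, n in sample.items():
    print(stratum, n)
```

The allocation always sums to the requested sample size, and each stratum ends up within one individual of its exact proportional quota.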
One of the objectives, known as statistical inference, is to draw conclusions about the whole target population from the results of the sample.
Example of a stratified random sampling plan applied to the department of Morbihan
Let us take as the target population all the inhabitants of Morbihan whose main residence is in the department on 1 January 2020 (INSEE census, whose data are available as open data at the following address: https://www.insee.fr/fr/statistiques/1893198 ).
The variables of interest retained are the sex and the age group (17 five-year age groups) of the inhabitants of Morbihan.
We then have 34 strata (2 sexes x 17 age groups) in the target population distributed as follows:
The population within each stratum is relatively homogeneous, which makes the method suitable.
Using the population sizes, we can calculate each stratum's share of the total population.
By applying the same shares to a sample of 1,000 individuals, we then obtain the following stratification:
To quantify the desired precision, a confidence level, generally set at 95%, is used to calculate a confidence interval. The confidence interval defines a margin of error, calculated according to the rules described in the article below:
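The referenced article is not reproduced here, so as an illustration only, the sketch below uses the usual normal approximation for a proportion, which is an assumption about the rules it describes. It treats the sample as a simple random sample and ignores any finite-population or stratification correction; the function name is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval
    for a proportion p estimated on n individuals.
    z = 1.96 corresponds to a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)


# Worst case (p = 0.5) for the 1,000-person sample of this example.
moe = margin_of_error(0.5, 1000)
low, high = 0.5 - moe, 0.5 + moe
print(f"margin of error: {moe:.3f}, interval: [{low:.3f}, {high:.3f}]")
```

With these inputs the margin of error is about 3.1 percentage points. It shrinks as the observed proportion moves away from 50% or as the sample grows.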
This allocation gives the number of individuals to be surveyed in each stratum so that the sample is representative of the target population; establishing it constitutes the first phase of the sampling.
The second phase of the sampling consists in choosing the individuals to be interviewed in each of the strata.
The selection of individuals within each stratum can be treated as random sampling without replacement: an individual who meets the inclusion criteria is drawn at random and, once drawn, cannot be selected again.
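This second phase can be sketched with Python's standard library, where random.sample draws without replacement. The sampling frame (lists of eligible identifiers per stratum) and the target counts below are hypothetical, as is the helper name.

```python
import random


def draw_stratum_sample(eligible_ids, n, seed=None):
    """Draw n distinct individuals at random from one stratum."""
    rng = random.Random(seed)
    # random.sample never returns the same element twice.
    return rng.sample(eligible_ids, n)


# Hypothetical sampling frame: eligible identifiers per stratum (assumption).
frame = {
    ("F", "20-24"): list(range(1000, 1200)),
    ("M", "20-24"): list(range(2000, 2250)),
}
# Hypothetical number of individuals to survey in each stratum.
targets = {("F", "20-24"): 12, ("M", "20-24"): 15}

selected = {
    stratum: draw_stratum_sample(frame[stratum], targets[stratum], seed=42)
    for stratum in frame
}
```

Fixing the seed makes the draw reproducible, which can help when the selection has to be audited; in a real survey the seed itself should be chosen and recorded carefully.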
During the analysis phase, the individuals surveyed can be given a questionnaire that enriches the data with descriptive variables that may be used later. This makes it possible to refine the analyses on sub-populations.
Application to serological tests for the SARS-CoV-2 virus
To conduct a seroprevalence survey of the SARS-CoV-2 virus in the Morbihan department, we can use the sampling method above, which is representative of the department's population, in a study that includes asymptomatic patients.
The quality of a test, particularly in epidemiology, is measured by 2 factors:
- sensitivity (ability of a test to give a positive result when the individual actually has the condition being tested for)
- specificity (ability of a test to give a negative result when the individual does not have it).
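As a small worked example, both factors can be computed from the counts of a validation study that compares the test with a reference diagnosis. The counts below are hypothetical and only serve to illustrate the two ratios.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly positive individuals that the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly negative individuals that the test flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation counts (assumption, not real test data).
tp, fn = 90, 10    # 100 individuals known to have antibodies
tn, fp = 190, 10   # 200 individuals known not to have them

se = sensitivity(tp, fn)
sp = specificity(tn, fp)
print(f"sensitivity = {se:.2f}, specificity = {sp:.2f}")
```

Here the test detects 90% of the true positives and correctly clears 95% of the true negatives.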
You can find a detailed description of these 2 concepts in the following article:
The individuals to be surveyed will have to be chosen at random until the target number for each stratum is reached.
The study participants must, in strict anonymity:
- Take a serological test, which will be analyzed to detect the presence of immunoglobulin antibodies of type M (IgM), type G (IgG), or even type A (IgA).
- Answer a questionnaire allowing data enrichment, such as:
  - vulnerability criteria: over 65 years old, diabetes, cardiovascular or respiratory disease, immune deficiency due to disease or therapy, active cancer, or obesity
  - place of residence: collective or individual
  - professional activity: profession and whether it can be carried out during lockdown
All of this information would make it possible:
- to estimate immunity in a large population,
- to follow the evolution of the epidemic,
- to identify risk factors in seropositive populations.