Cluster Sampling, Idea of two stage sampling and Multi stage sampling

2.2 Cluster Sampling

It is one of the basic assumptions in any sampling procedure that the population can be divided into a finite number of distinct and identifiable units, called sampling units. The smallest units into which the population can be divided are called elements of the population. The groups of such elements are called clusters.

In many practical situations and many types of populations, a list of elements is not available and so the use of an element as a sampling unit is not feasible. The method of cluster sampling or area sampling can be used in such situations.

In cluster sampling,

- Divide the whole population into clusters according to some well-defined rule.

- Treat the clusters as sampling units.

- Choose a sample of clusters according to some procedure.

- Carry out a complete enumeration of the selected clusters, i.e. collect information on all the elements in the selected clusters.
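
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name cluster_sample, the dictionary of houses and the choice of simple random sampling without replacement as the selection procedure are hypothetical and do not come from these notes.

```python
import random

def cluster_sample(clusters, n, seed=None):
    """Select n whole clusters and return every element in each of them.

    clusters: dict mapping a cluster label to the list of its elements
              (hypothetical layout, chosen for illustration).
    Selection here is simple random sampling without replacement,
    which is only one possible procedure.
    """
    rng = random.Random(seed)
    # Steps 2 and 3: the clusters are the sampling units, so we draw labels.
    chosen_ids = rng.sample(sorted(clusters), n)
    # Step 4: complete enumeration of each selected cluster.
    return {cid: list(clusters[cid]) for cid in chosen_ids}

# Step 1: a made-up population already divided into clusters (houses).
houses = {
    "H1": ["p1", "p2", "p3"],
    "H2": ["p4", "p5"],
    "H3": ["p6", "p7", "p8", "p9"],
    "H4": ["p10"],
}

sample = cluster_sample(houses, n=2, seed=0)
for house, persons in sample.items():
    print(house, persons)
```

Only the list of houses is needed to draw the sample; the persons are listed only for the houses that were selected.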

Area sampling

When the entire area containing the population is subdivided into smaller area segments and each element of the population is associated with one and only one such area segment, the procedure is called area sampling.

Examples:

1. In a city, a list of all the individual persons living in the houses may be difficult to obtain or may not exist at all, but a list of all the houses in the city may be available. So each person is treated as an element and each house as a cluster.

2. A list of all the agricultural farms in a village or a district may not be easily available, but a list of villages or districts generally is. In this case, each farm is an element and each village or district is a cluster.

It is easier, faster, cheaper and more convenient to collect information on clusters than on individual elements.

In the above examples, draw a sample of clusters (houses or villages) and then collect observations on all the elements in the selected clusters.

Real life situations where Cluster Sampling is used

1. Geographical areas: When the population is spread over a large geographical region and it is difficult to reach every individual, as in household surveys conducted across cities.

2. Educational studies: Sampling schools (clusters) rather than individual students for educational assessments.

3. Healthcare: For health surveys, clusters can be formed from hospitals or regions, and then surveys are conducted within selected clusters.

4. Market research: When targeting a specific demographic spread over various localities, a company might sample entire communities.

Conditions under which the cluster sampling is used:

Cluster sampling is preferred when

i) No reliable listing of elements is available, and it is expensive to prepare it.

ii) Even if a list of elements is available, locating or identifying the units may be difficult.

iii) A necessary condition for the validity of this procedure is that every element of the population under study must belong to one and only one cluster, so that the clusters in the frame together cover all the elements of the population without any omission or duplication. When this condition is not satisfied, bias is introduced.
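
The validity condition in (iii) can be checked mechanically before sampling. The sketch below is a minimal illustration; the function name check_frame and the toy data are hypothetical, and it assumes that both a list of population elements and the cluster membership lists are available for checking.

```python
from collections import Counter

def check_frame(population_elements, clusters):
    """Return (omitted, duplicated) elements for a proposed cluster frame.

    A valid frame has both lists empty: every element belongs to
    one and only one cluster.
    """
    # Count how many clusters each element appears in.
    counts = Counter(e for members in clusters.values() for e in members)
    omitted = sorted(e for e in population_elements if counts[e] == 0)
    duplicated = sorted(e for e, c in counts.items() if c > 1)
    return omitted, duplicated

# Made-up example: p2 is listed in two houses and p5 in none.
population = ["p1", "p2", "p3", "p4", "p5"]
frame = {"H1": ["p1", "p2"], "H2": ["p2", "p3"], "H3": ["p4"]}

omitted, duplicated = check_frame(population, frame)
print("omitted:", omitted)
print("duplicated:", duplicated)
```

An omitted element can never enter the sample, and a duplicated element has more than one chance of entering it; either case is a source of the bias mentioned above.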

Construction of clusters:

The clusters are constructed such that the sampling units are heterogeneous within the clusters and homogeneous among the clusters.

This is the opposite of how strata are constructed in stratified sampling. Clusters can be constructed in two ways: of equal size or of unequal size. We discuss the estimation of the population mean and its variance in the equal-size case.

Key Differences Between Stratified and Cluster Sampling:

Stratified Sampling:

In this method, the population is divided into strata (groups) based on shared characteristics, and these strata are internally homogeneous but externally heterogeneous. This means that the members within a stratum are similar, but different strata differ from each other. The idea is to improve precision by ensuring each stratum represents its subgroup of the population.

Cluster Sampling:

In cluster sampling, the population is divided into clusters, which are internally heterogeneous but externally homogeneous. That is, the elements within a cluster are diverse (representing the population), but clusters as a whole are similar to each other. The aim is to sample a few clusters that act as mini-populations representing the entire group.

Stratified sampling focuses on reducing variance and ensuring representativeness of each subgroup. Cluster sampling focuses on reducing costs and making data collection more practical.
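
A small numerical illustration of why clusters should be internally heterogeneous and similar to one another: the same twelve values are grouped into three clusters in two different ways and the spread of the cluster means is compared. The values and both groupings are made up for this sketch.

```python
from statistics import mean, pvariance

# Grouping A: each cluster spans the whole range of values
# (heterogeneous within clusters, clusters alike).
grouping_a = [[1, 6, 7, 12], [2, 5, 8, 11], [3, 4, 9, 10]]

# Grouping B: each cluster holds neighbouring values
# (homogeneous within clusters, clusters very different).
grouping_b = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

means_a = [mean(c) for c in grouping_a]
means_b = [mean(c) for c in grouping_b]

# Variance of the cluster means: how much the answer depends on
# which clusters happen to be selected.
var_a = pvariance(means_a)
var_b = pvariance(means_b)

print("cluster means A:", means_a, "variance:", var_a)
print("cluster means B:", means_b, "variance:", var_b)
```

Under grouping A every cluster has the same mean as the population, so any selected cluster represents it well. Under grouping B the cluster means differ widely, so a sample of a few clusters can be far from the population mean.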

Notation in Cluster Sampling (equal cluster size)

N: Total number of clusters in the population.

n: Number of clusters selected for the sample.

M: Number of elements within each cluster (assuming equal cluster sizes).

Yij: j-th element in the i-th cluster in the population (j = 1, 2, ..., M; i = 1, 2, ..., N)

yij: j-th element in the i-th cluster in the sample (j = 1, 2, ..., M; i = 1, 2, ..., n)

Population size = NM, Sample size = nM
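
The notation can be made concrete with a small sketch. The numbers N = 5, M = 3, n = 2 and the randomly generated values are assumptions made for illustration only. The last lines compute the mean of the sampled cluster means, which is assumed here as the usual estimator of the population mean for equal-sized clusters selected by simple random sampling.

```python
import random
from statistics import mean

N, M, n = 5, 3, 2            # clusters in population, elements per cluster, clusters sampled
rng = random.Random(1)

# Y[i][j] plays the role of Yij: j-th element of the i-th population cluster.
Y = [[rng.randint(10, 50) for _ in range(M)] for _ in range(N)]

# Draw n cluster indices without replacement; y[i][j] plays the role of yij.
selected = rng.sample(range(N), n)
y = [Y[i] for i in selected]

population_size = sum(len(cluster) for cluster in Y)   # equals N * M
sample_size = sum(len(cluster) for cluster in y)       # equals n * M

# Mean of the sampled cluster means (assumed estimator, see lead-in).
cluster_means = [mean(cluster) for cluster in y]
estimate = mean(cluster_means)

print("population size:", population_size, "sample size:", sample_size)
print("estimated population mean:", estimate)
```

Because all clusters have the same size M, the mean of the cluster means is the same as the mean of all nM sampled elements.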