Two broad categories:
Probability: each population element has a known, non-zero chance of being included in the sample
Non-probability: the probability of a population element being included in the sample cannot be estimated mathematically. The main problem with non-probability samples is that there is no clear, specific sampling frame that can reliably represent the population.
Statistician’s opinion: all non-probability samples are worthless because you cannot estimate the degree to which your results are generalizable.
A. Non-probability Samples
Can be used to disprove a hypothesis rather than to prove a hypothesis.
Ex. It is stated that all Republicans/Islamists are pro-death penalty. If a non-probability sample turns up members who oppose the death penalty, it casts doubt on the generalizability of the hypothesis.
1- Convenience Samples
“Accidental samples” -- those in the sample are simply whoever happens to be where the data is being collected
One major form in marketing: “Mall Intercept”
What do statisticians think? “Rarely do samples selected on a convenience sample basis, regardless of size, prove representative, and are not recommended for descriptive or causal research.”
I agree, but….
Minimizing drawbacks of convenience samples:
1- compare sample characteristics and findings to those collected on a census/random sample basis
2- speculate intelligently about bias, and how it is likely to have affected results
3- When possible, collect the sample where your population is likely to be (retailers collecting in-store surveys).
4- Cultivate diversity in the sample (e.g. mall intercept using multiple locations)
May be better at understanding relationships between variables than at making descriptive estimates
2- Purposive or Judgment Samples
· Sample elements are hand-picked because the researcher judges them to be representative of some population of interest
· Typically a small sample (maybe as small as 10) in which the researcher tries to represent all groups or segments from the population
· Usually useful with elites or people who share a specific experience (for example, soldiers who came back injured from Iraq).
3- Snowball/network design:
· A special form of purposive sample
· Appropriate for small specialized populations
· Each respondent is asked to identify one or more other population members
· Judgment Samples
· Those with more ties to sample members are more likely to be selected
· Similar people are more likely to be named
4- Quota Sampling
Attempt to be representative by selecting sample elements in proportion to their known incidence in the population
Example: Surveying undergraduate students about campus food services
Step 1: Identify the attributes the researcher believes are important, e.g. sex and class level
Step 2: Look at incidence of sex and class level in population
Don’t be fooled – It relies on personal, subjective selection of quota attributes.
The sample can still be non-representative with respect to some other characteristic (e.g. in this example, perhaps race)
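The two steps above can be sketched as a quota-allocation calculation. This is only an illustration: the population shares below are made-up numbers, not real campus data, and `quota_targets` is a hypothetical helper name.

```python
# Hypothetical population shares for sex x class level (illustrative only).
population_shares = {
    ("female", "lower"): 0.30,
    ("female", "upper"): 0.25,
    ("male", "lower"): 0.25,
    ("male", "upper"): 0.20,
}

def quota_targets(shares, sample_size):
    """Return how many respondents to recruit in each quota cell."""
    return {cell: round(share * sample_size) for cell, share in shares.items()}

targets = quota_targets(population_shares, sample_size=200)
for cell, n in targets.items():
    print(cell, n)
```

The code only sets the cell sizes. Who actually fills each cell is still left to the interviewer, which is why the result remains a non-probability sample. With other shares, rounding can make the targets sum to slightly more or less than the intended sample size.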
To sum up:
Non-probability methods are all sampling procedures in which the units that make up the sample are collected with no specific probability structure in mind. This might include, for example, the following:
It is clear that such methods depend on unreliable and unquantifiable factors, such as the researcher's experience, or even on luck. They are correctly regarded as 'inferior' to probability methods because they provide no statistical basis upon which the 'success' of the sampling method (that is, whether the sample was representative of the population and so could provide accurate estimates) can be evaluated.
On the other hand, in situations where the sample cannot be generated by probability methods, such sampling techniques may be unavoidable, but they should really be regarded as a 'last resort' when designing a sample scheme.
B. Probability Sampling
The basis of probability sampling is that the chance of each unit in the sample frame being included in the sample is defined in advance. If we have 100 units in the frame, and we decide on a sample size of 10, the probability of each unit being selected is 10/100, i.e. one in ten, or 0.1 (assuming each unit has the same chance).
Probability sampling does not guarantee representativeness, but does allow for the assessment of sampling error.
Sampling error: error that occurs because a sample rather than a census is used
1- Simple Random Sampling (SRS)
Each population element has a known, non-zero, equal chance of being selected
Example: Lottery numbers
Or, put everyone’s name in a hat
Major polling firms use random digit dialing to approximate random samples
Or, use a random number table or a random number generator such as: http://www.randomizer.org/form.htm
We select the units at random from the frame by assigning each unit a number and then using random number tables or a computer program to generate random numbers.
94407382
94409687 <========
93535459 <========
93781078
94552345
94768091 <========
93732085
94556321
94562119
93763450 <========
94127845
94675420
94562119 <========
93763450 <========
94127845
94675420
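A minimal sketch of the same procedure in code, assuming the 16 units in the frame are simply numbered 1 to 16 (some ID values in the list repeat, so positions are used instead of IDs). The seed is arbitrary and only makes the draw repeatable.

```python
import random

# Frame of 16 units, identified by their position in the list (an assumption).
frame = list(range(1, 17))

rng = random.Random(42)          # arbitrary seed, for a repeatable draw
srs = rng.sample(frame, 6)       # 6 units drawn without replacement

print(sorted(srs))
```

`random.sample` draws without replacement, so every unit has the same chance (6/16) of ending up in the sample and no unit can be picked twice.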
2- Systematic Sampling
Systematically spreads sample through a list of population members
Example: If a population contains 10,000 people and we need a sample of 1,000, select every 10th name on the list.
In nearly all practical examples, the procedure results in a sample equivalent to SRS.
Only exception: when there are “regularities” in the list, such as names ordered according to a specific characteristic. For example, if every even-numbered name is male and the sampling interval is even, the whole sample could be male.
We select the starting point (the value of r), let us say 2. We then take every third unit from there on (2, 5, 8, 11, 14). Depending on the size of the sample frame this may (as it does here) produce a sample that is too small or too large by a single unit.
93535459
93781078 <========
93732085
93763450
93763450 <========
94407382
94409687
94552345 <========
94768091
94556321
94562119 <========
94127845
94675420
94562119 <========
94127846
94675420
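The selection rule can be sketched as follows, again assuming the 16 units are numbered by position. `systematic_sample` is a hypothetical helper; when no start is given it picks a random start between 1 and the interval k.

```python
import random

def systematic_sample(frame, k, start=None, rng=random):
    """Select every k-th unit, beginning at 1-based position `start`."""
    if start is None:
        start = rng.randint(1, k)   # random start r in 1..k
    return frame[start - 1::k]

frame = list(range(1, 17))          # 16 units numbered by position
picked = systematic_sample(frame, k=3, start=2)
print(picked)                       # [2, 5, 8, 11, 14]
```

With a start of 1 instead of 2, the same interval yields six units (1, 4, 7, 10, 13, 16) rather than five, which is the one-unit variation in sample size noted above.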
3- Stratified Sampling
Information about subgroups in the sample frame is used to improve the efficiency of the sample plan
Three major reasons to use it:
· Some subgroups are more homogeneous than others, so fewer cases are needed from those groups to obtain the same level of precision
· Group comparison is the purpose of the study (disproportionate stratified sampling)
· Some elements are more important in determining outcome of research interest than are others
How is this different from quota sampling?
Within strata, selection of sample elements is random, not first available.
Note: Poststratification is OK. It is done after sampling to correct for MINOR differences between sample and population produced by non-cooperation
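One common way to carry out such a correction is to weight each respondent by the ratio of the group's population share to its sample share. The sketch below assumes that approach; the shares and counts are invented for illustration and `poststratification_weights` is a hypothetical helper.

```python
def poststratification_weights(population_shares, sample_counts):
    """Weight per group = population share / share of the group in the sample."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }

# Invented numbers: the population is 50/50, but the achieved sample is 45/55.
population_shares = {"men": 0.5, "women": 0.5}
sample_counts = {"men": 45, "women": 55}

weights = poststratification_weights(population_shares, sample_counts)
for group, w in weights.items():
    print(group, round(w, 3))
```

Under-represented groups get weights above 1 and over-represented groups get weights below 1. Large weights signal a major mismatch, which is outside what the note above treats as acceptable.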
Example of disproportionate stratified sampling:
Here we first split the population into sub-populations (two in this example, presumably meaningful in the context of the study) and then sample within each of them. In the example the first sub-population (men) has eleven members and the second (women) has five. We select four units from the first group (each unit has a sampling probability within its own sub-population of 4/11, about 0.36) and two from the second (each unit has a sampling probability of 2/5 = 0.4).
Men:
93535459
93781078
93732085 <========
93763450
93763450 <========
94407382
93427890
94409687 <========
94552345
94768091 <========
94556321
-----------------------------
Women:
94562119 <========
94127845
94675420
94562119 <========
94127846
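A sketch of the same design in code, assuming the eleven men are numbered 1 to 11 and the five women 12 to 16 (positions stand in for the IDs). It draws a simple random sample inside each stratum and computes each stratum's inclusion probability as sample size divided by stratum size.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of sizes[name] units within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Units identified by position (an assumption): 11 men, 5 women.
strata = {"men": list(range(1, 12)), "women": list(range(12, 17))}
sizes = {"men": 4, "women": 2}

sample = stratified_sample(strata, sizes, seed=1)
inclusion_prob = {name: sizes[name] / len(units) for name, units in strata.items()}

print(sample)
print({name: round(p, 3) for name, p in inclusion_prob.items()})
```

Because the sampling fractions differ between strata, estimates for the whole population would normally weight each unit by the inverse of its inclusion probability.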
4- Area (or Cluster) Sampling
· Elements are geographically grouped into relatively homogeneous clusters (e.g. a city is divided into 40 areas), much as a population is divided into strata in stratified sampling.
· From these areas, 10 are randomly selected
· Within each selected area, blocks are randomly selected
· Within each block, attempt to survey each household
· Especially useful for door-to-door personal surveys (significantly reduces costs)
However, clustering increases sampling errors (people who live close together tend to be more similar)
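The stages listed above can be sketched as follows. The city layout (40 areas, 5 blocks per area, 8 households per block), the choice of 2 blocks per area, and the helper name `area_sample` are all assumptions made for the example.

```python
import random

def area_sample(areas, n_areas, n_blocks, seed=None):
    """areas maps area name -> {block name -> list of households}."""
    rng = random.Random(seed)
    chosen_areas = rng.sample(sorted(areas), n_areas)        # stage 1: areas
    households = []
    for area in chosen_areas:
        blocks = areas[area]
        picked = rng.sample(sorted(blocks), min(n_blocks, len(blocks)))  # stage 2: blocks
        for block in picked:
            households.extend(blocks[block])                 # survey every household
    return chosen_areas, households

# Hypothetical city: 40 areas, 5 blocks each, 8 households per block.
city = {
    f"area{a}": {
        f"a{a}-block{b}": [f"a{a}-b{b}-hh{h}" for h in range(8)]
        for b in range(5)
    }
    for a in range(40)
}

chosen, households = area_sample(city, n_areas=10, n_blocks=2, seed=0)
print(len(chosen), len(households))   # 10 areas, 10 * 2 * 8 = 160 households
```

Interviewers only need to visit 20 blocks instead of households scattered across all 40 areas, which is where the cost saving comes from.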
An order of preference:
Note on the sample size:
For non-probability samples, it is highly recommended to increase the sample size and to diversify the sample along the major cleavages in the population. Alternatively, a researcher might find it more appropriate to split the sample along these cleavages, that is, to draw one sample for men, another for women, and so on.
For probability samples, a sample size of about 1,000 is usually enough to give a margin of error of roughly ±3 percentage points at the 95% confidence level (assuming simple random sampling).
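The 3% figure can be checked with the usual margin-of-error formula for a proportion, z * sqrt(p(1 - p) / n). The sketch assumes simple random sampling from a large population, a 95% confidence level (z = 1.96), and the worst case p = 0.5; `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1000)
print(round(moe, 3))   # 0.031, i.e. about 3 percentage points
```

Because the margin shrinks with the square root of n, halving it requires roughly four times as many respondents. Clustered designs would need a larger sample for the same precision.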