Sample Types

Two broad categories:

Probability:  each population element has a known, non-zero chance of being included in the sample

Non-probability:  cannot mathematically estimate the probability of a population element being included in the sample. The main problem in non-probability samples is that there is no clear/specific sampling frame that can reliably represent the population. 

 

Non-Probability                  Probability
Convenience                      Simple Random Sampling (SRS)
Purposive or Judgment Samples    Systematic Sampling
Snowball/network design          Stratified Sampling
Quota Sampling                   Area (or Cluster) Sampling

 

 Core differences:

Key terms                  Non-probability samples       Probability samples
Sampling frame             Does not exist or inaccurate  Accurate and up-to-date
Sampling error             Cannot be calculated          Can be calculated
Sample size                Matter of convenience         Determined by sampling theory
Level of generalizability  Illustrative                  Representative

 

Statistician’s opinion:  all non-probability samples are worthless because you cannot estimate the degree to which your results are generalizable.

 

 

A. Non-probability Samples

Can be used to disprove a hypothesis rather than to prove a hypothesis.

Ex. It is stated that all Republicans/Islamists are pro-death penalty. If a non-probability sample turns up members who oppose the death penalty, then we cast some doubt on the generalizability of the hypothesis.

1- Convenience:

 “Accidental samples”: those in the sample simply happen to be where the data are being collected

One major form in marketing:  “Mall Intercept”

 

What do statisticians think?  “Rarely do samples selected on a convenience sample basis, regardless of size, prove representative, and are not recommended for descriptive or causal research.”

I agree, but….

Minimizing drawbacks of convenience samples:

1- compare sample characteristics and findings to those collected on a census/random sample basis

2- speculate intelligently about bias, and how it is likely to have affected results

3- When possible, collect the sample where your population is likely to be (retailers collecting in-store surveys).

4- Cultivate diversity in the sample (e.g. mall intercept using multiple locations)

May be better at understanding relationships between variables than at making descriptive estimates

 

2- Purposive or Judgment Samples

·        Sample elements are hand-picked because they are known to be representative of some population of interest

·        Typically a small sample (maybe as small as 10) in which the researcher tries to represent all groups or segments from the population

·        Usually useful with elites or people who have a specific experience (for example, soldiers who came back injured from Iraq).

 

3- Snowball/network design:

·        A special form of purposive sample

·        Appropriate for small specialized populations

·        Each respondent is asked to identify one or more other population members

Drawbacks?

·        Those with more ties to sample members are selected

·        Similar people are more likely to be named

 

4- Quota Sampling

Attempt to be representative by selecting sample elements in proportion to their known incidence in the population

Example:  Surveying undergraduate students about campus food services

Step 1:  Identify the attributes the researcher believes are important, e.g. sex and class level

Step 2:  Look at incidence of sex and class level in population

Class Level

Freshmen      3200

Sophomores  2600

Juniors          2200

Seniors          2000

Sex

Males    4500

Females 5500
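
The step that follows from these counts, turning population incidence into quotas, can be sketched as below. This is only an illustration: the target sample size of 200 and the helper name `quotas` are assumptions, not part of the notes. Each attribute is treated separately, as in the counts above.

```python
# Population incidence taken from the campus food services example.
class_level = {"Freshmen": 3200, "Sophomores": 2600, "Juniors": 2200, "Seniors": 2000}
sex = {"Males": 4500, "Females": 5500}


def quotas(incidence, sample_size):
    """Allocate sample_size across groups in proportion to population incidence."""
    total = sum(incidence.values())
    # Rounding is fine here; with awkward totals the rounded quotas
    # may not add up exactly to sample_size and need a manual adjustment.
    return {group: round(sample_size * count / total)
            for group, count in incidence.items()}


SAMPLE_SIZE = 200  # hypothetical target, not from the notes

class_quotas = quotas(class_level, SAMPLE_SIZE)
sex_quotas = quotas(sex, SAMPLE_SIZE)

print(class_quotas)
print(sex_quotas)
```

With these numbers the interviewer would fill 64 freshmen, 52 sophomores, 44 juniors and 40 seniors, and 90 males and 110 females. How the quota cells are filled is still left to the interviewer, which is why the method remains non-probability.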

Drawbacks?

 

Don’t be fooled – It relies on personal, subjective selection of quota attributes.

The sample can still be non-representative with respect to some other characteristic (e.g. in this example, perhaps race)

To sum up:

Non-probability methods are all sampling procedures in which the units that make up the sample are collected with no specific probability structure in mind. This might include, for example, the following:

  • the units are self-selected; that is, the sample is made up of 'volunteers'
  • the units are the most easily accessible (in geographical terms)
  • the units are selected on economic grounds (payment for participation, for example)
  • the units are considered by the researcher as in some way `typical' of the target population
  • the units are chosen with no obvious design ("the first fifty who come in this morning")

It is clear that such methods depend on unreliable and unquantifiable factors, such as the researcher's experience, or even on luck. They are correctly regarded as 'inferior' to probability methods because they provide no statistical basis upon which the 'success' of the sampling method (that is, whether the sample was representative of the population and so could provide accurate estimates) can be evaluated.

On the other hand, in situations where the sample cannot be generated by probability methods, such sampling techniques may be unavoidable, but they should really be regarded as a 'last resort' when designing a sample scheme.

B. Probability Sampling

The basis of probability sampling is that the units making up the sample are selected with a defined chance of inclusion for each unit in the sampling frame. If we have 100 units in the frame and decide on a sample size of 10, the probability of each unit being selected is one in ten, or 0.1 (assuming each unit has the same chance).

 

Probability sampling does not guarantee representativeness, but does allow for the assessment of sampling error.

Sampling error:  error that occurs because a sample rather than a census is used

 

1- Simple Random Sampling (SRS)

Each population element has a known, non-zero, equal chance of being selected

Example:  Lottery numbers

Or, put everyone’s name in a hat

Major polling firms use random digit dialing to approximate random samples

Or, use a random number table or generator such as: http://www.randomizer.org/form.htm

 

 Example:

We select the units by random sampling from the frame: assign each unit a number, then use random number tables or a computer program to generate random numbers and pick the matching units.

   94407382
   94409687     <========
   93535459     <========
   93781078
   94552345
   94768091     <========
   93732085
   94556321
   94562119
   93763450     <========
   94127845
   94675420
   94562119     <========
   93763450     <========
   94127845
   94675420
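
The "computer program" route can be sketched as follows. This is a minimal illustration, assuming a frame of 16 numbered units and a sample of 6, as in the list above; the function name `simple_random_sample` and the seed value are hypothetical choices made only so the draw can be repeated.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from frame, each with the same chance n / len(frame)."""
    rng = random.Random(seed)    # a private generator, so the seed does not leak elsewhere
    return rng.sample(frame, n)  # sampling without replacement


frame = list(range(1, 17))       # units numbered 1..16
sample = simple_random_sample(frame, 6, seed=42)

inclusion_probability = 6 / len(frame)  # 6 out of 16 for every unit
print(sorted(sample), inclusion_probability)
```

Sampling is done without replacement, so no unit can be picked twice.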

2- Systematic Sampling

Systematically spreads sample through a list of population members

Example: If a population contains 10,000 people and a sample of 1,000 is needed, select every 10th name on the list, starting from a randomly chosen name among the first ten.

In nearly all practical examples, the procedure results in a sample equivalent to SRS.

Only exception:  when there are “regularities” in the list, i.e. the names are ordered according to a specific characteristic. For example, if every even-numbered name is male, an even starting point combined with an even interval produces an all-male sample.
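
The effect of such a regularity can be seen in a short sketch. The list below is hypothetical: 10,000 entries in which every even-numbered position is male, sampled at the interval of 10 from the example above. The starting position of 4 is an arbitrary assumption; any even start behaves the same way.

```python
# Hypothetical ordered list: even-numbered positions are male, odd-numbered are female.
population = ["M" if position % 2 == 0 else "F" for position in range(1, 10001)]

interval = 10  # 10,000 / 1,000
start = 4      # hypothetical 1-based starting position

# Positions start, start + interval, ... (shifted to 0-based indexing).
sample = population[start - 1::interval]

print(len(sample), set(sample))
```

The sample has the intended 1,000 members, but every one of them is male. An odd starting position would give an all-female sample instead.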

 

 

 

 Example:

 We select the first point (the value of r), let us say 2. We then take every third unit after this (2, 5, 8, 11, 14). Depending on the size of the sample frame, this may (as it does here) produce a sample that is too small or too large by a single unit.

    93535459
    93781078     <========
    93732085
    93763450
    93763450     <========
    94407382
    94409687
    94552345     <========
    94768091
    94556321
    94562119     <========
    94127845
    94675420
    94562119     <========
    94127846
    94675420
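
The selection above can be reproduced with a few lines of code. This is a sketch only; the function name `systematic_sample` is an assumption, and the units are represented by their positions 1..16 rather than by the identifiers in the list.

```python
def systematic_sample(frame, start, interval):
    """Return the units at 1-based positions start, start + interval, ..."""
    return [frame[i] for i in range(start - 1, len(frame), interval)]


frame = list(range(1, 17))  # 16 units, identified by position
positions = systematic_sample(frame, start=2, interval=3)

print(positions)
```

With r = 2 and an interval of 3, positions 2, 5, 8, 11 and 14 are selected: five units from a frame of sixteen.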

3- Stratified Sampling

Information about subgroups in the sample frame is used to improve the efficiency of the sample plan

Three major reasons to use it:

·        Some subgroups are more homogeneous than others, so fewer units are needed from those groups to obtain the same level of precision

·        Group comparison is the purpose of the study (disproportionate stratified sampling)

·        Some elements are more important in determining outcome of research interest than are others

How is this different from quota sampling?

Within strata, selection of sample elements is random, not first available.

 

Note: Poststratification is OK. It is done after sampling to correct for MINOR differences between sample and population produced by non-cooperation
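
One common way to carry out such a correction is to weight each respondent by the ratio of the group's population share to its sample share. The sketch below is illustrative only: the population shares reuse the 4500/5500 sex split from the quota example, while the achieved sample counts and the function name are hypothetical.

```python
# Population shares (from the 4500 males / 5500 females in the quota example).
population_share = {"Males": 0.45, "Females": 0.55}

# Hypothetical achieved sample after non-cooperation.
sample_counts = {"Males": 400, "Females": 600}


def poststratification_weights(population_share, sample_counts):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {group: population_share[group] / (count / n)
            for group, count in sample_counts.items()}


weights = poststratification_weights(population_share, sample_counts)
print(weights)
```

Under-represented groups get a weight above 1 and over-represented groups a weight below 1. Large weights are a sign that the differences are no longer minor.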

 

Example of disproportionate stratified sampling:

Here we first split the population into sub-populations (two in this example, presumably meaningful in the context of the study) and then sample from within each. In the example the first sub-population (men) has eleven members and the second (women) has five. We select four units from the first group (each unit has a sampling probability within its own sub-population of 4/11, about 0.36) and two from the second (each unit has a sampling probability of 2/5 = 0.40). Because the two probabilities differ, the design is disproportionate.

Men:
   93535459
   93781078
   93732085     <========
   93763450
   93763450     <========
   94407382
   93427890
   94409687     <========
   94552345
   94768091     <========
   94556321
   -----------------------------
Women:
   94562119     <========
   94127845
   94675420
   94562119     <========
   94127846
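
A stratified draw of this kind can be sketched as below. The unit labels (M1..M11, W1..W5), the function name and the seed are assumptions for illustration; only the stratum sizes (eleven and five) and the allocation (four and two) come from the example.

```python
import random


def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, allocation[name])
            for name, units in strata.items()}


strata = {
    "men": [f"M{i}" for i in range(1, 12)],   # eleven members
    "women": [f"W{i}" for i in range(1, 6)],  # five members
}
allocation = {"men": 4, "women": 2}

sample = stratified_sample(strata, allocation, seed=1)

# Within-stratum selection probability = allocated units / stratum size.
probabilities = {name: allocation[name] / len(units)
                 for name, units in strata.items()}

print(sample)
print(probabilities)
```

The within-stratum probabilities work out to 4/11 (about 0.36) for men and 2/5 (0.40) for women. Selection inside each stratum is random, which is what separates this from quota sampling.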

 

4- Area (or Cluster) Sampling

·        Elements are geographically grouped into relatively homogeneous clusters (e.g. a city is divided into 40 areas), much as a population is divided into strata for a stratified sample; unlike strata, only some of the clusters are then sampled.

·        From these areas, 10 are randomly selected

·        From these larger areas, blocks within areas will be randomly selected

·        Within each block, attempt to survey each household

·        Especially useful for door-to-door personal surveys (significantly reduces costs)

However, clustering increases sampling errors  (people who live close together tend to be more similar)
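
The stages above can be sketched as follows. Everything about the city is hypothetical: 40 areas as in the example, but the 5 blocks per area, 8 households per block, the choice of 2 blocks per selected area, and all names are assumptions made for illustration.

```python
import random


def cluster_sample(areas, n_areas, n_blocks, seed=None):
    """Randomly pick areas, then blocks within them; keep every household in a chosen block."""
    rng = random.Random(seed)
    chosen_areas = rng.sample(sorted(areas), n_areas)
    selected = {}
    for area in chosen_areas:
        blocks = areas[area]
        chosen_blocks = rng.sample(sorted(blocks), min(n_blocks, len(blocks)))
        selected[area] = {block: blocks[block] for block in chosen_blocks}
    return selected


# Hypothetical city: 40 areas, each with 5 blocks of 8 households.
areas = {
    f"area{a}": {
        f"block{b}": [f"hh-{a}-{b}-{h}" for h in range(8)]
        for b in range(5)
    }
    for a in range(40)
}

selected = cluster_sample(areas, n_areas=10, n_blocks=2, seed=0)
households = [hh
              for blocks in selected.values()
              for block_households in blocks.values()
              for hh in block_households]

print(len(selected), len(households))
```

Interviewers only have to visit 20 blocks in 10 areas, which is where the cost saving comes from.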

 

 

An order of preference:

1- Simple Random is preferred to

2- Systematic is preferred to

3- Stratified is preferred to

4- Cluster is preferred to

5- Quota is preferred to

6- Purposive is preferred to

7- Network is preferred to

8- Convenience

Probability samples (1 to 4) come first because we have a well-articulated sampling frame to compare the sample to. Non-probability samples (5 to 8) come last because we do not.

Note on the sample size:

For non-probability samples, it is highly recommended to increase the sample size and to diversify it along the major cleavages in the population. Alternatively, a researcher might find it more appropriate to split the sample along these cleavages, that is, to draw one sample for men, another for women, and so on.

For probability samples, a sample size of 1,000 is roughly good enough to get results within a margin of error of about 3 percentage points (at the conventional 95% confidence level).
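
The 3% figure can be checked with the usual margin-of-error formula for a proportion. The sketch assumes a simple random sample, a 95% confidence level (z = 1.96), the worst case p = 0.5, and a population much larger than the sample; the function name is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(1000)
print(f"{moe:.3f}")  # about 0.031, i.e. roughly 3 percentage points
```

Because the margin shrinks with the square root of n, halving it requires roughly four times as many respondents.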