Sample Types
Two broad categories:
· Probability: each population element has a known, nonzero chance of being included in the sample.
· Nonprobability: the probability that a given population element is included in the sample cannot be estimated mathematically. The main problem with nonprobability samples is that there is no clear, specific sampling frame that can reliably represent the population.
Core differences:
Statistician’s opinion: all nonprobability (NP) samples are worthless, because you cannot estimate the degree to which your results are generalizable.
A. Nonprobability Samples
Can be used to disprove a hypothesis rather than to prove one. Ex.: It is stated that all Republicans/Islamists are pro-death penalty. If a nonprobability sample turns up members who are not, then we cast some doubt on the generalizability of the hypothesis.
1 Convenience:
“Accidental samples”: those in the sample are simply where the data is being collected.
One major form in marketing: the “Mall Intercept”
What do statisticians think? “Rarely do samples selected on a convenience sample basis, regardless of size, prove representative, and are not recommended for descriptive or causal research.” I agree, but…
Minimizing drawbacks of convenience samples:
1 Compare sample characteristics and findings to those collected on a census/random sample basis.
2 Speculate intelligently about bias, and how it is likely to have affected results.
3 When possible, collect the sample where your population is likely to be (e.g. retailers collecting in-store surveys).
4 Cultivate diversity in the sample (e.g. mall intercept using multiple locations).
Convenience samples may be better at understanding relationships between variables than at making descriptive estimates.
2 Purposive or Judgment Samples
· Sample elements are handpicked because the researcher judges them to be representative of some population of interest
· Typically a small sample (maybe as small as 10) in which the researcher tries to represent all groups or segments of the population
· Usually useful with elites or people who have a specific experience (for example, soldiers who came back injured from Iraq)
3 Snowball/network design:
· A special form of purposive sample
· Appropriate for small specialized populations
· Each respondent is asked to identify one or more other population members
Judgment Samples Drawbacks?
· Those with more ties to sample members are more likely to be selected
· Similar people are more likely to be named
4 Quota Sampling
Attempt to be representative by selecting sample elements in proportion to their known incidence in the population.
Example: Surveying undergraduate students about campus food services
Step 1: Identify the attributes the researcher believes are important, e.g. sex and class level
Step 2: Look at the incidence of sex and class level in the population

Class Level
Freshmen      3200
Sophomores    2600
Juniors       2200
Seniors       2000

Sex
Males         4500
Females       5500

Drawbacks?
Don’t be fooled: it relies on personal, subjective selection of quota attributes. The sample can still be nonrepresentative with respect to some other characteristic (e.g. in this example, perhaps race).
To sum up: Nonprobability methods are all sampling procedures in which the units that make up the sample are collected with no specific probability structure in mind. This might include, for example, the following:
It is clear that such methods depend on unreliable and unquantifiable factors, such as the researcher's experience, or even on luck. They are correctly regarded as 'inferior' to probability methods because they provide no statistical basis upon which the 'success' of the sampling method (that is, whether the sample was representative of the population and so could provide accurate estimates) can be evaluated. On the other hand, in situations where the sample cannot be generated by probability methods, such sampling techniques may be unavoidable, but they should really be regarded as a 'last resort' when designing a sample scheme.
B. Probability Sampling
The basis of probability sampling is the selection of sampling units to make up the sample based on defining the chance that each unit in the sample frame will be included. If we have 100 units in the frame, and we decide that we should have a sample size of 10, we can define the probability of each unit being selected as one in ten, or 0.1 (assuming each unit has the same chance).
Probability sampling does not guarantee representativeness, but does allow for the assessment of sampling error.
Sampling error: error that occurs because a sample rather than a census is used
1 Simple Random Sampling (SRS)
Each population element has a known, nonzero, equal chance of being selected.
Example: Lottery numbers
Or, put everyone’s name in a hat
Major polling firms use random digit dialing to approximate random samples
Or, use a random numbers table or generator such as: http://www.randomizer.org/form.htm
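A simple random draw of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame is a hypothetical list of 100 numbered units, the sample size is 10 (the same figures used in the introduction to probability sampling), and the fixed seed is there only to make the run repeatable.

```python
import random

# Hypothetical sampling frame: 100 units numbered 1..100 (an assumption for illustration).
frame = list(range(1, 101))
sample_size = 10

# A seeded generator makes the draw repeatable; omit the seed for a fresh draw each time.
rng = random.Random(2024)

# random.sample draws without replacement, so every unit has the same chance of inclusion.
srs_sample = rng.sample(frame, sample_size)

# Inclusion probability of any single unit under SRS: n / N.
inclusion_probability = sample_size / len(frame)

print(sorted(srs_sample))
print(inclusion_probability)
```

Drawing without replacement matters here: a unit can appear in the sample at most once, just as a name pulled from a hat is not put back.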
Example: We select the units by random sampling from the frame: assign each unit a number, then use random number tables or a computer program to generate random numbers. Selected units are marked with an arrow.
94407382
94409687 <========
93535459 <========
93781078
94552345
94768091 <========
93732085
94556321
94562119
93763450 <========
94127845
94675420
94562119 <========
93763450 <========
94127845
94675420
2 Systematic Sampling
Systematically spreads the sample through a list of population members.
Example: If a population contains 10,000 people and a sample of 1,000 is needed, select every 10th name on the list.
In nearly all practical cases, the procedure results in a sample equivalent to SRS. The only exception is when there are “regularities” in the list, such as names ordered according to a specific characteristic. For example, if every even-numbered name is male, selecting every 10th name from an even starting position yields a sample made up entirely of males.
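The every-10th-name rule can be sketched as follows. This is a minimal illustration, assuming a hypothetical list of 10,000 numbered names and a random starting position within the first interval; the function name `systematic_sample` is made up for this example.

```python
import random

def systematic_sample(frame, n, rng):
    """Take every k-th unit from the list, starting at a random position in the first interval."""
    k = len(frame) // n          # sampling interval, e.g. 10,000 // 1,000 = 10
    start = rng.randrange(k)     # random start in 0..k-1 (0-based list position)
    return frame[start::k][:n]   # cut to n in case the slice yields an extra unit

# Hypothetical population list: names numbered 1..10,000.
population = list(range(1, 10_001))
rng = random.Random(7)  # fixed seed only so the run is repeatable
sys_sample = systematic_sample(population, 1_000, rng)

print(len(sys_sample), sys_sample[:5])
```

Only the starting point is random; every later selection is fixed by the interval, which is why a list with a periodic ordering can bias the result.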
Example: We select the first point (the value of r), let us say 2. We then take every third unit after this (2, 5, 8, 11, 14). Depending on the size of the sample frame this may (as it does here) produce a sample that is too small or too large by a single unit.
93535459
93781078 <========
93732085
93763450
93763450 <========
94407382
94409687
94552345 <========
94768091
94556321
94562119 <========
94127845
94675420
94562119 <========
94127846
94675420
3 Stratified Sampling
Information about subgroups in the sample frame is used to improve the efficiency of the sample plan.
Three major reasons to use it:
· Some subgroups are more homogeneous than others, so fewer elements are needed from those groups to obtain the same level of precision
· Group comparison is the purpose of the study (disproportionate stratified sampling)
· Some elements are more important in determining the outcome of research interest than are others
How is this different from quota sampling? Within strata, selection of sample elements is random, not first available.
Note: Poststratification is OK. It is done after sampling to correct for MINOR differences between the sample and the population produced by noncooperation.
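One common way to make such a correction is to weight each respondent by the ratio of the group's population share to its sample share. The sketch below illustrates that idea only; it reuses the male/female population counts from the campus quota example (4,500 and 5,500), while the respondent counts (80 men, 120 women) are invented for illustration.

```python
# Population counts by sex, taken from the campus quota example.
population_counts = {"male": 4500, "female": 5500}

# Hypothetical respondent counts after noncooperation (made up for illustration).
sample_counts = {"male": 80, "female": 120}

pop_total = sum(population_counts.values())
sample_total = sum(sample_counts.values())

# Weight = population share / sample share; underrepresented groups get weights above 1.
weights = {
    group: (population_counts[group] / pop_total) / (sample_counts[group] / sample_total)
    for group in population_counts
}

for group, w in weights.items():
    print(group, round(w, 3))
```

With these weights the weighted sample matches the population split by sex while keeping the same total number of respondents.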
Example of disproportionate stratified sampling: Here we first need to split the population into subpopulations (two in this example, presumably meaningful in the context of the study) and then sample from within those subpopulations. In the example the first subpopulation (men) has eleven members and the second (women) has five; so we select four units from the first group (each unit has a sampling probability within its own subpopulation of 4/11 ≈ 0.36) and two from the second (each unit has a sampling probability of 2/5 = 0.4).
Men:
93535459
93781078
93732085 <========
93763450
93763450 <========
94407382
93427890
94409687 <========
94552345
94768091 <========
94556321
Women:
94562119 <========
94127845
94675420
94562119 <========
94127846
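The men/women example can be reproduced with a short Python sketch. It assumes the two strata are held as plain lists; the stratum sizes (11 and 5) and the allocation (4 and 2) come from the example, while the unit labels are placeholders rather than the ID numbers listed above. The inclusion probability within each stratum is computed as the number selected divided by the stratum size.

```python
import random

# Stratum sizes follow the example: 11 men and 5 women. The unit labels are placeholders.
strata = {
    "men": [f"M{i}" for i in range(1, 12)],
    "women": [f"W{i}" for i in range(1, 6)],
}

# Disproportionate allocation from the example: 4 men and 2 women.
allocation = {"men": 4, "women": 2}

rng = random.Random(11)  # fixed seed only so the run is repeatable

stratified_sample = {}
inclusion_prob = {}
for name, units in strata.items():
    # Simple random sampling within each stratum (this is what separates it from quota sampling).
    stratified_sample[name] = rng.sample(units, allocation[name])
    inclusion_prob[name] = allocation[name] / len(units)

print(stratified_sample)
print({name: round(p, 3) for name, p in inclusion_prob.items()})
```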
4 Area (or Cluster) Sampling
· Elements are geographically grouped into relatively homogeneous clusters (e.g. a city is divided into 40 areas), in the same way that groups are formed for a stratified sample
· From these areas, 10 are randomly selected
· Within the selected areas, blocks are randomly selected
· Within each selected block, attempt to survey every household
· Especially useful for door-to-door personal surveys (significantly reduces costs)
However, clustering increases sampling error (people who live close together tend to be more similar).
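The multi-stage procedure above can be sketched as follows. The 40 areas and the 10 selected areas come from the example; the number of blocks per area (20), the number of blocks drawn per selected area (5), and all labels are assumptions made for illustration.

```python
import random

rng = random.Random(3)  # fixed seed only so the run is repeatable

# A city divided into 40 areas; each area is assumed to contain 20 blocks.
areas = {f"area_{a}": [f"area_{a}_block_{b}" for b in range(1, 21)] for a in range(1, 41)}

# Stage 1: randomly select 10 of the 40 areas.
selected_areas = rng.sample(sorted(areas), 10)

# Stage 2: randomly select blocks within each selected area (5 per area is an assumption).
selected_blocks = []
for area in selected_areas:
    selected_blocks.extend(rng.sample(areas[area], 5))

# Stage 3 (fieldwork): attempt to survey every household in each selected block.
print(len(selected_areas), len(selected_blocks))
```

Interviewers only travel to the selected blocks, which is where the cost saving comes from; the price is that households in the same block tend to resemble each other.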
An order of preference:
Note on the sample size: For nonprobability samples, it is highly recommended to increase the sample size and to diversify it along the major cleavages in the population. Alternatively, a researcher might find it more appropriate to split the sample along these cleavages, that is, to draw one sample for men, another for women, and so on. For probability samples, a sample size of about 1,000 is generally enough to give results within a margin of error of roughly 3 percentage points (at the conventional 95% confidence level).
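The 3% figure can be checked with the usual margin-of-error formula for a proportion under simple random sampling, z * sqrt(p(1 - p) / n). The sketch assumes a 95% confidence level (z ≈ 1.96) and the worst case p = 0.5; the function name `margin_of_error` is made up for this example.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Compare a few sample sizes; the margin shrinks with the square root of n.
for n in (100, 500, 1000, 2000):
    print(n, round(margin_of_error(n), 3))
```

Because the margin shrinks only with the square root of the sample size, halving it requires roughly four times as many respondents.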
