SAMPLING
Sampling is the procedure a researcher uses to gather people, places, or things to study. Research conclusions and generalizations are only as good as the sample they are based on. Samples are always subsets or small parts of the total number that could be studied. If you were to study everybody and everything in the population, that would be called a census rather than a sample. Most research, however, relies on samples. For example, if you were interested in state prison systems, you might sample 15 or so state prison systems. There are formulas for determining sample size, but the main thing is to be practical. As a rule of thumb, for a small population of interest, you would most likely need to sample about 10-30% of that population; for a large population of interest (over 150,000), you could get by with a sample as low as 1%.
Before gathering your sample, it's important to find out as much as possible about your population. Population refers to the larger group from which the sample is taken. You should at least know some of the overall demographics (age, sex, class, etc.) of your population. This information will be needed later when you get to the data analysis part of your research, but it's also important in helping you decide sample size. The greater the diversity and differences that exist in your population, the larger your sample size should be. A larger sample captures more of the variability in your population, and since many statistical tests operate on the principles of variation, you'll be making sure the statistics used later can do their powerful stuff.
After you've learned all the theoretically important things about your population, you then have to obtain a list or contact information on those who are accessible or can be contacted. This list of all the accessible members of your population is called the sampling frame. If you were planning on doing a phone survey, for example, the phone book would be your sampling frame. Make sure your sampling frame is appropriate for the population you want to study. In this case, the Census Bureau says that 93% of us have a phone, so that's not too bad, but you have to decide if any of the unique characteristics of people you're interested in studying are lost by selecting a restrictive sampling frame. Strictly speaking, the term refers to the list itself rather than the procedure used to compile it. It's important for researchers to discuss their sampling frame because a well-chosen frame is what helps keep systematic error, or bias, from entering your study.
Then, you are ready to draw your sample. There are two basic approaches to sampling: probabilistic and nonprobabilistic. If the purpose of your research is to draw conclusions or make predictions affecting the population as a whole (as is the case in most research), then you must use a probabilistic sampling approach. On the other hand, if you're only interested in seeing how a small group, perhaps even a representative group, is doing for purposes of illustration or explanation, then you can use a nonprobabilistic sampling approach.
The key component behind all probabilistic sampling approaches is randomization, or random selection. Don't confuse random selection with random assignment. Random selection is how you draw the sample. Random assignment is how you assign people in your sample to different groups for experimental or control group purposes. People, places, or things are randomly selected when each unit in the population has an equal chance of being selected. Various methods have been established to accomplish probabilistic sampling:
Simple random sampling -- All you need is a relatively small, self-contained, or clearly defined population to use this method. The population of the U.S. might be too big, but a city of say 60,000 or so would be appropriate. You simply obtain a list of all residents, and then using a sequence of numbers from a random numbers table (or draws from a hat, flips of a coin), select, say 10%, 20%, or some portion of names on that list, making sure you aren't drawing from any letter of the alphabet more heavily than others.
Stratified random sampling -- This method is appropriate when you're interested in correcting for gender, race, or age disparities in your population. Say you're planning to study the impact of police training on mid-level career cynicism, and you know that gender is going to be an important factor because female police officers rarely take this kind of training and/or quit before making it to their mid-level career stage. You therefore need to stratify your sample by gender, drawing a separate random sample within each stratum so that females are guaranteed to be represented. For example, if the department has 1000 employees consisting of 900 males and 100 females, and you intend on sampling 10% of the total, then you proceed randomly as usual within each stratum, drawing 90 males at random and 10 females at random. (If 10 females would be too few to analyze, you can oversample females, drawing a larger fraction of them than of males, and weight the results later.) If you had sampled from the employee list of names, regardless of gender, you might not have obtained 10 females at random because there are so few of them.
Systematic random sampling -- Suppose you had a huge list of people, places, or things to select from, like 100,000 people or more. The appropriate method to use is to pick a random starting point and then select every 10th, 20th, or 30th person from that list. Your decision to use every 10th, 20th, or 30th person is called your sampling interval, and as long as you start at random, apply the interval systematically, use the entire list, and the list has no repeating pattern that lines up with the interval, you're accomplishing much the same thing as random sampling.
Cluster (area) random sampling -- Suppose you have a population that is dispersed across a wide geographic region. This method allows you to divide this population into clusters (usually counties, census tracts, or other boundaries), randomly select some of those clusters, and then study everyone in the selected clusters. For example, you could randomly select 5 of North Carolina's 100 counties, but you would have to make sure that almost every person in those 5 counties participated in your study. As an alternative, you could systematically sample within your clusters, and this is called multi-stage sampling, which refers generally to any mixing of sampling methods.
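The four probabilistic methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the function names, the made-up department of 900 male and 100 female officers, and the made-up region of 100 counties with 50 residents each are all hypothetical, and the stratified example uses the proportionate 10% allocation from the text.

```python
import random

def simple_random_sample(population, fraction, rng):
    """Draw a fraction of the list without replacement; every unit has an equal chance."""
    k = round(len(population) * fraction)
    return rng.sample(population, k)

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum (proportionate allocation)."""
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}

def systematic_sample(population, interval, rng):
    """Take every interval-th unit, starting at a random point within the first interval."""
    start = rng.randrange(interval)
    return population[start::interval]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical department: 900 male and 100 female officers.
males = [f"M{i}" for i in range(900)]
females = [f"F{i}" for i in range(100)]
department = males + females

srs = simple_random_sample(department, 0.10, rng)
strat = stratified_sample({"male": males, "female": females}, 0.10, rng)
syst = systematic_sample(department, 10, rng)

# Hypothetical region: 100 counties with 50 residents each.
counties = {f"county_{c}": [f"c{c}_r{r}" for r in range(50)] for c in range(100)}
clust = cluster_sample(counties, 5, rng)

print(len(srs), len(strat["male"]), len(strat["female"]), len(syst), len(clust))
# prints: 100 90 10 100 5
```

Note that the simple random sample of 100 may contain more or fewer than 10 females from one run to the next, while the stratified draw always returns exactly 10, which is the point of stratifying.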
Various methods have also been established to accomplish nonprobabilistic sampling:
Quota sampling -- The researcher decides in advance how many people are needed in each category (say, so many males and so many females) and then fills each quota with whoever is available. The problem with it is that bias intrudes on the selection. Once the researcher identifies the kinds of people to be studied, they have to resort to haphazard or accidental sampling because no effort is usually made to contact people who are difficult to reach in the quota.
Convenience sampling -- Also called haphazard or accidental, this method is based on using people who are a captive audience, just happen to be walking by, or show a special interest in your research. The use of volunteers is an example of convenience sampling.
Purposive sampling -- This is where the researcher targets a group of people believed to be typical or average, or a group of people specially picked for some unique purpose. The researcher never knows if the sample is representative of the population, and this method is largely limited to exploratory research.
Snowball sampling -- Also called network, chain, or reputational, this method begins with a few people or cases and then gradually increases the sample size as new contacts are mentioned by the people you started out with.
THE SAMPLING DISTRIBUTION
The sampling distribution is a hypothetical device that figuratively represents the distribution of a statistic (some number you've obtained from your sample) across an infinite number of samples. You have to remember that your sample is just one of a potentially infinite number of samples that could have been drawn. While it's very likely that any statistics you generate from your sample would be near the center of the sampling distribution, just by luck of the draw, the researcher normally wants to find out exactly where the center of this sampling distribution is. That's because the center of the sampling distribution represents the best estimate of the population average, and the population is what you want to make inferences to. The average of the sampling distribution is the population parameter, and inference is all about making generalizations from statistics (sample) to parameters (population).
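A small simulation can make this idea concrete. The sketch below invents a population of 10,000 scores on a 10-point scale, draws 2,000 random samples of 50 from it, and compares the average of the sample means with the population mean. The population, the sample size, and the number of samples are all assumptions chosen for illustration; in real research you only ever draw one sample.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores on a 10-point scale.
population = [rng.uniform(0, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw many samples of 50 and record the mean of each one.
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(2_000)]

# The center of this simulated sampling distribution.
center = statistics.mean(sample_means)

print("population mean:", round(population_mean, 2))
print("center of sampling distribution:", round(center, 2))
```

Any single sample mean in the list can miss the population mean by a fair amount, but the center of the whole collection should land very close to it.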
You can use some of the information you've collected thus far to calculate the sampling distribution, or more accurately, the sampling error. In statistics, any standard deviation of a sampling distribution is referred to as the standard error (to keep it separate in our minds from standard deviation). In sampling, the standard error is referred to as sampling error. Definitions are as follows:
Standard deviation -- the spread of scores around the average in a single sample
Standard error -- the spread of averages around the average of averages in a hypothetical sampling distribution
You never actually see the sampling distribution. All you have to work with is the standard deviation of your sample. The greater your standard deviation, the greater the standard error (and your sampling error). Standard error is also related to sample size. The larger your sample, the smaller the standard error. You're not reducing bias or anything by increasing sample size, only coming closer to the total number in the population. Validity and sampling error are somewhat similar. However, you can estimate population parameters from even small samples.
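The relationship among standard deviation, sample size, and standard error can be shown with the usual textbook estimate, which divides the sample standard deviation by the square root of the sample size. This formula is not given in the text above, so treat the sketch as an assumption that holds for simple random samples from a large population; the function name and the numbers are hypothetical.

```python
import math

def standard_error(sd, n):
    """Estimate the standard error of the mean as sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Same standard deviation, growing sample size: the standard error shrinks.
for n in (25, 100, 400):
    print(n, standard_error(2.0, n))
# prints:
# 25 0.4
# 100 0.2
# 400 0.1
```

Notice that quadrupling the sample size only cuts the standard error in half, which is one reason practical limits on sample size matter.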
The best way to estimate population parameters is to use a confidence interval approach. Take the mean score on some variable in your sample and calculate the standard deviation for it. Then, assuming a bell-shaped curve (or normal distribution, which is usually OK to assume), add your standard deviation to the mean (going one direction on the x-axis under the curve), and then subtract your standard deviation from the mean (going the other direction). The standard rule is that about 68% of cases in real life (the population) will be between these extremes. If you add and subtract two standard deviations from the mean, another rule states that approximately 95% of scores in real life will fall between these two extremes. If you go out three standard deviations, you include about 99.7% of the cases. With the 68, 95, and 99.7 percent rules, you are actually predicting population characteristics, and all this from just your sample. You've made the first application of your research study to the wider population of interest. All you need to know is how to calculate a standard deviation, and the formula appears below:
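The original formula is not reproduced here, so what follows is a sketch under an assumption: it uses the common sample standard deviation, the square root of the sum of squared deviations from the mean divided by n - 1. Some texts divide by n instead. The function names and the scores are hypothetical and are not taken from the quiz.

```python
import math

def sample_sd(scores):
    """Sample standard deviation with an n - 1 denominator (an assumption)."""
    mean = sum(scores) / len(scores)
    squared_deviations = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(squared_deviations / (len(scores) - 1))

def sd_bands(scores):
    """Return mean +/- 1, 2, and 3 standard deviations as (low, high) pairs."""
    mean = sum(scores) / len(scores)
    sd = sample_sd(scores)
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Hypothetical scores on a 10-point scale.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

bands = sd_bands(scores)
for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = bands[k]
    print(f"{label}: {low:.2f} to {high:.2f}")
```

On a bounded scale the wider bands can run past the ends of the scale (below 0 or above 10), which is a reminder that the normal curve is only an approximation.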

SOURCE: Trochim, William M. The Research Methods Knowledge Base, 2nd Edition. Internet WWW page, at URL: <http://trochim.human.cornell.edu/kb/index.htm> (version current as of 06/29/00).
REVIEW QUESTIONS:
1. How do researchers decide how large a sample to use?
2. What is a sampling frame and why is it important?
3. When should a researcher use nonprobabilistic sampling?
4. How is the logic of validity and sampling error related?
PRACTICUM:
For each of the following, determine whether the sampling method used is SR
(Simple random), ST (Stratified random), SY (Systematic random), or NP
(nonprobabilistic):
A. Drawing names out of a hat
B. Picking out typical criminals from a prison lineup
C. Going into a room and asking for volunteers
D. Randomly selecting within variable subgroups
E. Selecting every name on the first page of a phone book
F. Putting a survey on the Internet for people to respond by computer
G. Dialing random telephone numbers
H. Flipping a coin while going down a list one-by-one
ASSIGNMENT: Read the following Internet Resources until you understand sampling, then do the quiz.
INTERNET RESOURCES
PowerPoint Slide Show on the Principles of Sampling
Prof. Trochim's Notes on Sampling
QUIZ: Calculate the standard deviation for each group of
scores on a 10 point scale, then use the 68, 95, and 99.7 percent rules to
calculate confidence intervals.
1. 3.97; 4.43; 5.65; 5.05; 6.90
2. 5.55; 5.00; 4.95; 6.00
3. 7.00; 3.50; 6.25; 4.75; 5.00
PRINTED RESOURCES
Babbie, E. (1992). The Practice of Social Research. Belmont, CA:
Wadsworth.
Griffin, J. (1958). Statistics Essential for Police Efficiency.
Springfield, IL: Thomas.
Kish, L. (1965). Survey Sampling. NY: Wiley.
Kraemer, H. & S. Thieman. (1987). How Many Subjects: Statistical Power
Analysis in Research. CA: Sage.
Lasley, J. (1999). Essentials of Criminal Justice and Criminological Research.
NJ: Prentice Hall.
Neuman, L. & B. Wiegand. (2000). Criminal Justice Research Methods.
Boston: Allyn & Bacon.
Last updated: Oct 09, 2006
Not an official webpage of APSU, copyright restrictions apply, see
Megalinks in Criminal Justice
O'Connor, T. (Date of Last Update at bottom of page). In Part of web cited
(Windows name for file at top of browser), MegaLinks in Criminal Justice.
Retrieved from http://www.apsu.edu/oconnort/rest of URL accessed on
today's date.