SAMPLING

    Sampling is the procedure a researcher uses to gather people, places, or things to study. Research conclusions and generalizations are only as good as the sample they are based on. Samples are always subsets or small parts of the total number that could be studied. If you were to study everybody and everything, that would be called a census, not a sample. Most research, however, involves samples. For example, if you were interested in state prison systems, you might sample 15 or so state prison systems. There are formulas for determining sample size, but the main thing is to be practical. For a small population of interest, you would most likely need to sample about 10-30% of that population; for a large population of interest (over 150,000), you could get by with a sample as low as 1%.

    Before gathering your sample, it's important to find out as much as possible about your population. Population refers to the larger group from which the sample is taken. You should at least know some of the overall demographics (age, sex, class, etc.) of your population. This information will be needed later when you get to the data analysis part of your research, but it's also important in helping you decide sample size. The greater the diversity and differences that exist in your population, the larger your sample size should be. Capturing the variability in your population allows for more variation in your sample, and since many statistical tests operate on the principles of variation, you'll be making sure the statistics used later can do their powerful stuff.

    After you've learned all the theoretically important things about your population, you then have to obtain a list or contact information on those who are accessible or can be contacted. This procedure for listing all the accessible members of your population is called the sampling frame; the term refers to the procedure rather than the list. If you were planning on doing a phone survey, for example, the phone book would be your sampling frame. Make sure your sampling frame is appropriate for the population you want to study. In this case, the Census Bureau says that 93% of us have a phone, so that's not too bad, but you have to decide if any of the unique characteristics of the people you're interested in studying are lost by selecting a restrictive sampling frame. It's important for researchers to discuss their sampling frame because that's what shows whether systematic error, or bias, has entered into the study.

    Then, you are ready to draw your sample. There are two basic approaches to sampling: probabilistic and nonprobabilistic. If the purpose of your research is to draw conclusions or make predictions affecting the population as a whole (as most research usually is), then you must use a probabilistic sampling approach. On the other hand, if you're only interested in seeing how a small group, perhaps even a representative group, is doing for purposes of illustration or explanation, then you can use a nonprobabilistic sampling approach.

    The key component behind all probabilistic sampling approaches is randomization, or random selection. Don't confuse random selection with random assignment. Random selection is how you draw the sample. Random assignment is how you assign people in your sample to different groups for experimental or control group purposes. People, places, or things are randomly selected when each unit in the population has an equal chance of being selected. Various methods have been established to accomplish probabilistic sampling:
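    As a rough illustration, the difference between random selection and random assignment, along with the simple, stratified, and systematic random methods named in the practicum, might be sketched in Python as follows. The function names, the inmate ID numbers, and the sample sizes are all hypothetical, and the sketch assumes the requested sample is no larger than the sampling frame.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n units so that every unit has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th unit going down the list."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval (assumes n <= len(population))
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction within each subgroup."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, size))
    return chosen

def random_assignment(sample, seed=None):
    """Split an already-drawn sample into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical sampling frame: 200 inmate ID numbers
frame = list(range(1, 201))

# Random selection: how the sample is drawn from the frame
sample = simple_random_sample(frame, 20, seed=1)

# Random assignment: how the drawn sample is divided into groups
experimental, control = random_assignment(sample, seed=2)

print("sample:", sorted(sample))
print("experimental:", sorted(experimental))
print("control:", sorted(control))
```

    Selection happens once, against the sampling frame; assignment happens afterward and only touches units already in the sample.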

Various methods have also been established to accomplish nonprobabilistic sampling:

THE SAMPLING DISTRIBUTION

    The sampling distribution is a hypothetical device that figuratively represents the distribution of a statistic (some number you've obtained from your sample) across an infinite number of samples. You have to remember that your sample is just one of a potentially infinite number of samples that could have been drawn. While it's very likely that any statistics you generate from your sample would be near the center of the sampling distribution, just by luck of the draw, the researcher normally wants to find out exactly where the center of this sampling distribution is. That's because the center of the sampling distribution represents the best estimate of the population average, and the population is what you want to make inferences to. The average of the sampling distribution is the population parameter, and inference is all about making generalizations from statistics (sample) to parameters (population).
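    Because the sampling distribution is hypothetical, one way to get a feel for it is to simulate it. The sketch below invents a population of scores (an assumption purely for illustration), draws many samples from it, and compares the center of the resulting sample means with the population mean. A finite number of repeated samples only approximates the infinite number the definition imagines.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 scores on a 10-point scale
population = [rng.uniform(0, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

def sample_mean(n):
    """Draw one simple random sample of size n and return its mean."""
    return statistics.mean(rng.sample(population, n))

# Approximate the sampling distribution with 2,000 repeated samples of size 50
sample_means = [sample_mean(50) for _ in range(2_000)]
center = statistics.mean(sample_means)

print(f"population mean (the parameter): {population_mean:.3f}")
print(f"center of the sample means:      {center:.3f}")
print(f"mean of one single sample:       {sample_means[0]:.3f}")
```

    Any single sample mean can land somewhat away from the parameter, but the center of the many sample means sits very close to it.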

    You can use some of the information you've collected thus far to calculate the sampling distribution, or more accurately, the sampling error. In statistics, any standard deviation of a sampling distribution is referred to as the standard error (to keep it separate in our minds from standard deviation). In sampling, the standard error is referred to as sampling error. Definitions are as follows:
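    In conventional textbook notation (an assumption here, since these notes do not fix any symbols), the sample standard deviation s of n scores with mean x-bar, and the standard error of the mean, are usually written as:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
\qquad
SE_{\bar{x}} = \frac{s}{\sqrt{n}}
```

    The standard deviation describes the spread of scores within one sample; the standard error describes the spread of the sample mean across the sampling distribution.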

    You never actually see the sampling distribution. All you have to work with is the standard deviation of your sample. The greater your standard deviation, the greater the standard error (and your sampling error). Standard error is also related to sample size. The larger your sample, the smaller the standard error.  You're not reducing bias or anything by increasing sample size, only coming closer to the total number in the population. Validity and sampling error are somewhat similar. However, you can estimate population parameters from even small samples.
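    The two relationships just described (more spread means more standard error, more cases means less) can be checked with a few lines of Python. This is a minimal sketch that assumes the usual formula for the standard error of the mean, the standard deviation divided by the square root of the sample size; the numbers fed in are hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean from a sample standard deviation and size."""
    return sd / math.sqrt(n)

# Hypothetical values: the same spread (sd = 1.5) at three sample sizes
for n in (25, 100, 400):
    print(f"sd=1.5, n={n:>3}: standard error = {standard_error(1.5, n):.3f}")

# Hypothetical values: the same sample size (n = 100) at three spreads
for sd in (0.5, 1.5, 3.0):
    print(f"sd={sd}, n=100: standard error = {standard_error(sd, 100):.3f}")
```

    Under this formula, quadrupling the sample size only halves the standard error, which is one reason very large samples bring diminishing returns.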

    The best way to estimate population parameters is to use a confidence interval approach. Take the mean score on some variable in your sample and calculate the standard deviation for it. Then, assuming a bell-shaped curve (a normal distribution, which is OK to assume), add your standard deviation to the mean (going one direction on the x-axis under the curve), and then subtract your standard deviation from the mean (going the other direction). The standard rule is that about 68% of cases in real life (the population) will be between these extremes. If you add and subtract two standard deviations from the mean, another rule states that approximately 95% of scores in real life will fall between these two extremes. If you go out three standard deviations, you include about 99.7% of the cases. With the 68, 95, and 99.7 percent rules, you are actually predicting population characteristics, and all this from just your sample. You've made the first application of your research study to the wider population of interest. All you need to know is how to calculate a standard deviation, and the formula appears below:
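    The calculation can be sketched in Python as follows. The sketch assumes the sample form of the standard deviation (dividing by n - 1, which is what Python's statistics.stdev does), and the scores are hypothetical, not taken from the quiz. The percentages attached to each range hold only to the extent that the scores are roughly normally distributed.

```python
import statistics

def spread_ranges(scores):
    """Return the mean, sample standard deviation, and mean +/- 1, 2, 3 SD."""
    mean = statistics.mean(scores)
    # Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )
    sd = statistics.stdev(scores)
    ranges = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
    return mean, sd, ranges

# Hypothetical scores on a 10-point scale
scores = [4.0, 5.0, 6.0, 5.0, 7.0, 3.0]
mean, sd, ranges = spread_ranges(scores)

print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
for k, (low, high) in ranges.items():
    print(f"mean +/- {k} SD: {low:.2f} to {high:.2f}")
```

    For these six scores the squared deviations from the mean of 5 add up to 10, so the standard deviation is the square root of 10 / 5, or about 1.41.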

SOURCE: Trochim, William M. The Research Methods Knowledge Base, 2nd Edition. Internet WWW page, at URL: <http://trochim.human.cornell.edu/kb/index.htm> (version current as of 06/29/00).

REVIEW QUESTIONS:
1. How do researchers decide how large a sample to use?
2. What is a sampling frame and why is it important?
3. When should a researcher use nonprobabilistic sampling?
4. How is the logic of validity and sampling error related?

PRACTICUM:
For each of the following, determine whether the sampling method used is SR (Simple random), ST (Stratified random), SY (Systematic random), or NP (nonprobabilistic):
A. Drawing names out of a hat
B. Picking out typical criminals from a prison lineup
C. Going into a room and asking for volunteers
D. Randomly selecting within variable subgroups
E. Selecting every name on the first page of a phone book
F. Putting a survey on the Internet for people to respond by computer
G. Dialing random telephone numbers
H. Flipping a coin while going down a list one-by-one

ASSIGNMENT: Read the following Internet Resources until you understand Sampling, then do the quiz.
INTERNET RESOURCES
PowerPoint Slide Show on the Principles of Sampling
Prof. Trochim's Notes on Sampling

QUIZ: Calculate the standard deviation for each group of scores on a 10 point scale, then use the 68, 95, and 99.7 percent rules to calculate confidence intervals.
1. 3.97; 4.43; 5.65; 5.05; 6.90
2. 5.55; 5.00; 4.95; 6.00
3. 7.00; 3.50; 6.25; 4.75; 5.00

PRINTED RESOURCES
Babbie, E. (1992). The Practice of Social Research. Belmont, CA: Wadsworth.
Griffin, J. (1958). Statistics Essential for Police Efficiency. Springfield, IL: Thomas.
Kish, L. (1965). Survey Sampling. NY: Wiley.
Kraemer, H. & S. Thieman. (1987). How Many Subjects: Statistical Power Analysis in Research. CA: Sage.
Lasley, J. (1999). Essentials of Criminal Justice and Criminological Research. NJ: Prentice Hall.
Neuman, L. & B. Wiegand. (2000). Criminal Justice Research Methods. Boston: Allyn & Bacon.

Last updated: Oct 09, 2006
Not an official webpage of APSU, copyright restrictions apply, see Megalinks in Criminal Justice
O'Connor, T.  (Date of Last Update at bottom of page). In Part of web cited (Windows name for file at top of browser), MegaLinks in Criminal Justice. Retrieved from http://www.apsu.edu/oconnort/rest of URL accessed on today's date.