# Statistics Questions

Sample Means Distribution

We are interested in exploring the nature of sampling. We often want to know about characteristics of a population: the proportion who vote, or the average height or age, for example. Since it is usually impossible to collect data from the entire population, we use samples to estimate the qualities of interest. You might wonder how good a job a sample statistic does of estimating a population parameter; answering that is the goal of this exploration.

In this activity you will study the ages of a large population (over 5000). We will compute the population mean age and then generate a large number of samples, calculate the means of those samples, and see how close those means come to the real thing. More importantly, we shall see that the distribution of those sample means forms a familiar shape.

In addition, we will repeat the sampling procedure for a larger sample size and see how this affects the distribution of sample means.
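The procedure described above can be sketched in code. This is only a minimal illustration and not part of the StatCrunch activity: the `population` list is synthetic (a hypothetical right-skewed set of ages, not the Pittsburgh data), and `sample_means` is a helper name introduced here. Samples are drawn without replacement, which is an assumption about how the sampling is configured.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the AGE column: 5000 right-skewed ages (18 and up).
# These are synthetic values, NOT the Pittsburgh arrest data.
population = [18 + random.expovariate(1 / 15) for _ in range(5000)]


def sample_means(values, sample_size, num_samples):
    """Draw num_samples random samples (without replacement) and return each sample's mean."""
    return [
        statistics.mean(random.sample(values, sample_size))
        for _ in range(num_samples)
    ]


# 500 samples of size 50, mirroring the settings used in the activity.
means_50 = sample_means(population, sample_size=50, num_samples=500)

print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(means_50), 2))
print("population SD:       ", round(statistics.pstdev(population), 2))
print("SD of sample means:  ", round(statistics.stdev(means_50), 2))
```

With a different seed the printed numbers shift slightly, but the mean of the sample means should stay close to the population mean while its spread is far smaller than the population's.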

Use this data set (ages of people arrested in Pittsburgh, PA) in StatCrunch to answer the following questions.

1. Make a histogram of the Ages. Take a screenshot (insert it below) and comment on the shape of the distribution.

Comment about the shape:

2. Take random samples of the ages by going to the Data menu and selecting Sample. In the Sample window, make the sample size 50 and the number of samples 500. Compute the samples; notice you will now have 500 new columns in the table representing the 500 samples you’ve taken.

3. Now go to the Stat menu and select Summary Stats -> Columns. Select all 500 new sample columns (click once at the top of the list where it says Sample1(AGE), hold the SHIFT key, scroll down, and click the last sample, sample500(AGE)), and in the Statistics window select only Mean. Then select the Store in Data Table checkbox at the bottom of the window and calculate.

4. Create a Histogram (Graph menu) of the Means column (take a screenshot and include it below) and note the shape of the distribution of means. Compare this with the shape of the original distribution of ages (you may want to paste that in above it for comparison purposes).

Compare the shapes of the original population distribution and the distribution of sample means:

5. Now find the summary statistics of the Means column and the Age column (leave all the statistics selected). Take a screenshot of your table and paste it in below. Compare the means and standard deviations of the two distributions.

   a. Note (and comment on the fact) that the means are about the same and that the standard deviation of the sample means is much smaller than the standard deviation of the population.

   b. Divide the population standard deviation by √50 and note that this is about the size of the standard deviation of the sample means.

Compare the means and standard deviations of the original population distribution and the distribution of sample means:
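The check in part (b) rests on the standard result for the spread of sample means. A short statement of it follows, where σ (population standard deviation) and n (sample size) are symbols introduced here for illustration:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},
\qquad
n = 50:\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{50}} \approx \frac{\sigma}{7.07}
```

So for samples of size 50, the standard deviation of the sample means should come out at roughly one seventh of the population standard deviation.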

6. We are going to start over, but first select and copy the column titled Means: hover over the down arrow at the top of the column and click (it should select the whole column), then copy (Command-C on Mac; CTRL-C on PC). Then click the refresh button on your browser. This should erase all of the new columns; if it does not, close the window and come back to this page to follow the link in #2 again.

On the new window, paste the Means column from your previous window and re-title it Means 50. (If you forgot to copy the Means column, forget about it and keep going.)

7. Repeat step #2, but this time set the sample size to 200: go to the Data menu and select Sample. In the Sample window, make the sample size 200 and the number of samples 500. Compute the samples; notice you will now have 500 new columns in the table.

8. Now go to the Stat menu and select Summary Stats -> Columns. Select all 500 new sample columns (click once at the top of the list where it says Sample1(AGE), hold the SHIFT key, scroll down, and click the last sample, sample500(AGE)), and in the Statistics window select only Mean. Then select the Store in Data Table checkbox at the bottom of the window and calculate.

9. Create a Histogram (Graph menu) of the Age column, the new Means column, and the Means 50 column. Set Columns per Page to 3 and compute. Take a screenshot (include it below) and note the shape of the distribution of means. Compare this with the shape of the original distribution of ages and the Means 50 distribution. Pay attention to spread.

Compare the shapes of the original population Ages distribution, this distribution of sample means, and the Means 50 distribution.

10. Now find the summary statistics of the new Means column (sample size 200) and the Age column (leave all the statistics selected). Take a screenshot of your table (include it below) and compare the means and standard deviations of the two distributions.

    a. Note (and comment on the fact) that the means are about the same and that the standard deviation of the sample means is much smaller than the standard deviation of the population.

    b. Divide the population standard deviation by √200 and note that this is about the size of the standard deviation of the sample means.

Compare the means and standard deviations of the original population distribution and the distribution of sample means:

11. Finally, compare the results of your two sample distributions. Make dotplots of the two sample mean distributions (select one and then hold the Command key (Mac) or CTRL key (PC) to select the other), take a screenshot, and include it below. Think about shape, center, and spread. How are they similar? How are they different? What does this suggest to you about the effect of sample size in estimating the population mean?
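The comparison between the two sample sizes can also be sketched in code. As a hedged illustration only: the `population` below is synthetic (hypothetical right-skewed ages, not the Pittsburgh data), the `results` dictionary and its keys are names introduced here, and samples are drawn without replacement as an assumption.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical synthetic ages; NOT the actual data set from the activity.
population = [18 + random.expovariate(1 / 15) for _ in range(5000)]
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (50, 200):
    # 500 samples of size n, keeping only each sample's mean.
    means = [statistics.mean(random.sample(population, n)) for _ in range(500)]
    results[n] = {
        "center": statistics.mean(means),          # mean of the sample means
        "spread": statistics.stdev(means),         # SD of the sample means
        "predicted_spread": sigma / math.sqrt(n),  # population SD / sqrt(n)
    }

for n, r in results.items():
    print(
        f"n={n:3d}  center={r['center']:.2f}  "
        f"spread={r['spread']:.2f}  sigma/sqrt(n)={r['predicted_spread']:.2f}"
    )
```

Both sets of means should center near the population mean, while the spread for samples of size 200 should be roughly half the spread for samples of size 50, since √200 is twice √50.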

Reese’s Pieces

What does it mean to be 95% confident? What does it mean to say the confidence interval method is valid? We will turn to an applet called Simulating Confidence Intervals to illustrate this.

Imagine using a random sample of Reese’s Pieces candies to estimate the proportion of all such candies that are orange. The applet will simulate taking a large number of random samples and generating a confidence interval based on each sample.

Begin by setting the applet to the correct values:

- Statistic → Proportions
- Distribution → Binomial
- Method → Wald
- π → 0.45 (Hershey’s tells us this is the population parameter)
- Sample Size → 75
- Confidence Level → 95%
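With these settings, each interval the applet draws corresponds to one simulated sample. The sketch below shows how a single Wald interval could be computed, assuming the usual textbook form p̂ ± z·√(p̂(1 − p̂)/n); the applet's internals may differ in detail. The names `wald_interval`, `pi_true`, and `orange` are introduced here for illustration.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Wald interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Critical value from the standard normal (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


random.seed(3)   # fixed seed so the sketch is repeatable
pi_true = 0.45   # population proportion from the applet settings
n = 75           # sample size from the applet settings

# Simulate one sample: each candy is orange with probability pi_true.
orange = sum(random.random() < pi_true for _ in range(n))
low, high = wald_interval(orange, n)

print(f"{orange} orange out of {n}: interval ({low:.3f}, {high:.3f})")
print("captures 0.45:", low <= pi_true <= high)
```

Changing the seed plays the role of pressing the button again: a new sample gives a new p̂ and therefore a new interval, while 0.45 stays fixed.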

Warm-up:

Begin by pressing the button and observe that the applet takes a sample of 75 Reese’s Pieces and creates a confidence interval from it.

Also notice the vertical line at 0.45 representing the true population parameter.

Keep pressing the button and observe how the different samples produce different confidence intervals.

Keep pressing until one of the intervals does not overlap the vertical line at 0.45. What do you notice about this interval? Remember that.

a) As we take new samples, what do you notice about the intervals? Are they all the same? Are any colored red? What does that denote?

b) Does the value of the population proportion change as we take new samples?

Now change the Number of intervals to 100 and click the button.

c) About what percentage of the intervals seem to be successful at capturing (overlapping) the population proportion?

d) Use the button to sort the intervals, and comment on what the intervals that fail to capture the population proportion have in common.

e) In practice, you only take one sample and construct one confidence interval. Can you be sure that the confidence interval successfully captures the true (but unknown) value of the parameter? In what sense can you be confident of this?

f) Now change the confidence level to 80%. Before pressing the button, what changes do you expect to see? Then press the button. What two things change about the intervals?

g) Now change the sample size to 300 (return the confidence level to 95%). Does this produce a dramatically higher percentage of successful intervals? What does change about the intervals?

h) Is it desirable to have larger or smaller confidence levels? Explain.

i) Is it desirable to have wider or narrower confidence intervals? Explain.

j) What’s a drawback of using a very high confidence level such as 99.9%?

k) What would it take to achieve a very high confidence and a very narrow confidence interval? Why is this so difficult to achieve?
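The applet experiments above can be approximated with a short simulation. This is a sketch under stated assumptions, not the applet's own code: it uses the textbook Wald formula, the function name `simulate` and the `summary` dictionary are introduced here, and 2000 intervals per setting (rather than the applet's 100) is a choice made to reduce noise.

```python
import math
import random
from statistics import NormalDist


def simulate(pi_true, n, confidence, num_intervals, rng):
    """Return (coverage rate, average width) over simulated Wald intervals."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    captured = 0
    total_width = 0.0
    for _ in range(num_intervals):
        # One sample of n candies, each orange with probability pi_true.
        orange = sum(rng.random() < pi_true for _ in range(n))
        p_hat = orange / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        # Count the interval as a success if it contains the true proportion.
        captured += (p_hat - margin) <= pi_true <= (p_hat + margin)
        total_width += 2 * margin
    return captured / num_intervals, total_width / num_intervals


rng = random.Random(4)  # fixed seed so the sketch is repeatable
settings = [(75, 0.95), (75, 0.80), (300, 0.95), (300, 0.999)]
summary = {}
for n, level in settings:
    summary[(n, level)] = simulate(0.45, n, level, num_intervals=2000, rng=rng)
    coverage, width = summary[(n, level)]
    print(f"n={n:3d}  level={level:.1%}  coverage={coverage:.3f}  avg width={width:.3f}")
```

Reading the output row by row lets you compare how coverage and average width respond separately to the confidence level and to the sample size, which is the trade-off the questions above ask about.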