# University of Arizona Normal Distributions Statistics Questions

SIE 430/530 : Engineering Statistics

( Lecture 13 )

Jian Liu

Department of Systems and Industrial Engineering

The University of Arizona

jianliu@email.arizona.edu, 520-621-6548(O)

SIE 430/530, Engineering Statistics, Jian Liu, the University of Arizona, jianliu@email.arizona.edu, 520-621-6548(O)


List of Topics in Lecture 13

• Recall topics in lecture 12

• Order Statistics

• Principles of data reduction

– Sufficiency principle: sufficient statistics


Recall Topics in Lecture 12


Principles of Data Reduction

• Experimenters use the information in a sample X1, …, Xn, denoted X, to make inferences about an unknown parameter θ.

• Any statistic T(X) defines a form of data reduction, or data summary.

• Data reduction in terms of a particular statistic can be thought of as a partition of the sample space: all sample points x with the same value of T(x) fall in the same set.

• Principles of data reduction:

– Sufficiency principle

– Likelihood principle


Sufficiency Principle

• A sufficient statistic for a parameter θ is a statistic that, in a certain sense, captures all the information about θ contained in the sample.

• Definition: A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ.

• Theorem: If p(x|θ) is the joint pdf or pmf of X and q(T(x)|θ) is the pdf or pmf of T(X), then T(X) is a sufficient statistic for θ if, for every x in the sample space, the ratio p(x|θ)/q(T(x)|θ) is constant as a function of θ.
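
The ratio criterion in the theorem can be checked numerically. The sketch below assumes an iid Bernoulli(θ) sample with T(X) = ΣXi, which is binomial(n, θ); the function names and the grid of θ values are illustrative choices, not part of the lecture. For every sample point the ratio should equal 1/C(n, t), whatever θ is.

```python
from itertools import product
from math import comb

def joint_pmf(x, theta):
    """Joint pmf of an iid Bernoulli(theta) sample x (a tuple of 0/1 values)."""
    t = sum(x)
    return theta**t * (1 - theta) ** (len(x) - t)

def pmf_of_T(t, n, theta):
    """pmf of T(X) = sum of the sample, which is binomial(n, theta)."""
    return comb(n, t) * theta**t * (1 - theta) ** (n - t)

def ratio(x, theta):
    """p(x | theta) / q(T(x) | theta) from the theorem."""
    return joint_pmf(x, theta) / pmf_of_T(sum(x), len(x), theta)

n = 4
thetas = [0.1, 0.35, 0.5, 0.9]  # arbitrary interior values of theta
for x in product([0, 1], repeat=n):
    values = [ratio(x, th) for th in thetas]
    # The ratio should be 1 / C(n, t) for every theta.
    assert all(abs(v - 1 / comb(n, sum(x))) < 1e-12 for v in values)
print("ratio does not depend on theta at any sample point")
```

Because the ratio is free of θ at every sample point, the theorem gives that ΣXi is sufficient for θ in this Bernoulli setting.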


Ex. 13.1: Binomial sufficient statistic


Ex. 13.2: Normal sufficient statistic


Another Way to Identify Sufficient Statistic

Factorization Theorem: Let f(x|θ) denote the joint pdf or pmf of a sample X. A statistic T(X) is a sufficient statistic for θ if and only if there exist functions g(T(x)|θ) and h(x) such that, for all sample points x and all parameter points θ,

f(x|θ) = g(T(x)|θ)∙h(x)
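
One consequence of the factorization is easy to test numerically: if two sample points x and y share the same value of T, then f(x|θ)/f(y|θ) = h(x)/h(y), which cannot change with θ. The sketch below assumes a N(µ, 1) sample with known variance and T(X) equal to the sample mean; the two samples and the grid of µ values are made up for illustration.

```python
from math import exp, pi, sqrt

def joint_pdf(x, mu, sigma=1.0):
    """Joint pdf of an iid N(mu, sigma^2) sample x."""
    out = 1.0
    for xi in x:
        out *= exp(-((xi - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
    return out

# Two hypothetical samples with the same sample mean (2.0) but different spread.
x = [1.0, 2.0, 3.0]
y = [1.5, 2.0, 2.5]

def likelihood_ratio(mu):
    """f(x | mu) / f(y | mu); the g factors cancel because the means agree."""
    return joint_pdf(x, mu) / joint_pdf(y, mu)

ratios = [likelihood_ratio(mu) for mu in (-1.0, 0.0, 2.0, 5.0)]
print(ratios)  # the same number for every mu, up to rounding
```

Here the common value is exp(−(2 − 0.5)/2) = exp(−0.75), which depends only on the within-sample sums of squares, that is, only on h.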


Ex. 13.3: Normal sufficient statistic (another method)


SIE 430/530 : Engineering Statistics

( Lecture 12 )


List of Topics in Lecture 12

• Recall topics in lecture 11

• Chi-Squared RVs

• Derived distributions:

– t-Distribution

– F-Distribution

– Applications


Recall Topics in Lecture 11


The Derived Distributions

• Student’s t-distribution

– Let X1, X2, …, Xn be a random sample of size n from a N(µ, σ²) distribution. The quantity

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has Student’s t-distribution with v = n−1 degrees of freedom (i.e., T ~ t_v), with pdf

$$f_T(t \mid v) = \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\Gamma\!\left(\frac{v}{2}\right)} \, \frac{1}{(v\pi)^{1/2}} \, \frac{1}{(1 + t^2/v)^{(v+1)/2}}, \quad -\infty < t < \infty$$

– T is the ratio of a N(0,1) RV and the square root of an independent χ² RV divided by its degrees of freedom, v.
Facts about Student’s t Distribution
• E T = 0, for v > 1

• Var T = v/(v−2), for v > 2

• Related to the F distribution: if T ~ t_v, then T² ~ F_{1,v}


• Ex. 12.1: The mean time it takes a crew to restart an aluminum rolling mill after a failure is of interest. The crew was observed on 25 occasions, and the results were a sample mean of X̄ = 26.42 minutes and a sample variance of S² = 12.28 minutes². If the repair time is normally distributed and the true mean (µ) is 30 minutes, what is the probability of observing X̄ ≤ 26.42?
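
A minimal numerical sketch of this example, assuming the intended route is the statistic T = (X̄ − µ)/(S/√n) with 24 degrees of freedom. The standard library has no t cdf, so the helper `t_cdf` below (a name chosen here, not a library function) integrates the t density with Simpson's rule.

```python
from math import gamma, pi, sqrt

def t_pdf(t, v):
    """Density of Student's t with v degrees of freedom."""
    c = gamma((v + 1) / 2) / (gamma(v / 2) * sqrt(v * pi))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(t, v, steps=2000):
    """P(T <= t) by Simpson's rule on [0, |t|], using symmetry about 0."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

n, xbar, s2, mu = 25, 26.42, 12.28, 30.0
t_obs = (xbar - mu) / sqrt(s2 / n)   # observed value of T
p = t_cdf(t_obs, n - 1)              # P(T <= t_obs) with v = 24
print(f"t = {t_obs:.3f}, P(T <= t) = {p:.2e}")
```

The observed statistic is about −5.1, far in the lower tail of t with 24 degrees of freedom, so the probability is very small.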


Variance Ratio Distribution

• Ex. 12.2: Let X1, X2, …, Xn be a random sample of size n from a N(µX, σX²) population. Let Y1, Y2, …, Yn be a random sample of size n from a N(µY, σY²) population. How can we compare the variability of the two populations?


The Derived Distributions

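• Snedecor’s F-distribution

– Let X1, X2, …, Xn be a random sample of size n from a N(µX, σX²) population. Let Y1, Y2, …, Ym be a random sample of size m from a N(µY, σY²) population. The random variable

$$F = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2}$$

has Snedecor’s F-distribution with v1 = n−1 and v2 = m−1 degrees of freedom (i.e., F ~ F_{v1,v2}), with pdf

$$f_F(x \mid v_1, v_2) = \frac{\Gamma\!\left(\frac{v_1+v_2}{2}\right)}{\Gamma\!\left(\frac{v_1}{2}\right)\Gamma\!\left(\frac{v_2}{2}\right)} \left(\frac{v_1}{v_2}\right)^{v_1/2} \frac{x^{(v_1-2)/2}}{\left(1 + \frac{v_1}{v_2}x\right)^{(v_1+v_2)/2}}, \quad 0 \le x < \infty$$

• If X ~ F_{v1,v2}, then

$$\mathrm{E}X = \frac{v_2}{v_2-2}, \quad v_2 > 2$$

$$\mathrm{Var}X = 2\left(\frac{v_2}{v_2-2}\right)^2 \frac{v_1+v_2-2}{v_1(v_2-4)}, \quad v_2 > 4$$

• As v1 and v2 increase, F tends to normal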


Facts about F-Distribution

• If X ~ F_{v1,v2}, then 1/X ~ F_{v2,v1}

• If X ~ t_q, then X² ~ F_{1,q}

• If X ~ F_{v1,v2}, then

$$\frac{(v_1/v_2)X}{1 + (v_1/v_2)X} \sim \mathrm{beta}\!\left(\frac{v_1}{2}, \frac{v_2}{2}\right)$$


• Ex. 12.3: If two random samples of sizes n = 6 and m = 10 are drawn from two normal populations with equal population variances, find the number b such that

$$P\!\left(\frac{S_1^2}{S_2^2} \le b\right) = 0.95$$
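
With equal population variances the ratio S1²/S2² follows F with 5 and 9 degrees of freedom, so b is the 0.95 quantile of that distribution (about 3.48 in standard F tables). The sketch below is only a Monte Carlo sanity check of that value, assuming independent samples; the seed, the replication count, and the helper names are arbitrary choices.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_variance_ratios(n, m, reps, rng):
    """Draw reps values of S1^2 / S2^2 from two independent N(0, 1) samples."""
    ratios = []
    for _ in range(reps):
        s1 = sample_variance([rng.gauss(0, 1) for _ in range(n)])
        s2 = sample_variance([rng.gauss(0, 1) for _ in range(m)])
        ratios.append(s1 / s2)
    return ratios

rng = random.Random(430)  # fixed seed so the sketch is reproducible
ratios = sorted(simulate_variance_ratios(6, 10, 20000, rng))
b_hat = ratios[int(0.95 * len(ratios))]  # empirical 95th percentile
print(f"estimated b = {b_hat:.2f}")
```

The estimate fluctuates from seed to seed; an exact answer comes from an F table or a statistics package.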


Summary of Sampling Distribution

• For a random sample from any distribution with finite mean and variance, the standardized sample mean converges in distribution to N(0,1) as n increases (CLT).

• In the normal case, the sample mean standardized with S instead of σ in the denominator follows Student’s t(n−1).

• The sum of n squared independent unit normal variates follows Chi-squared(n).

• In the normal case, the sample variance has a scaled Chi-squared distribution.

• In the normal case, the ratio of the sample variances of two independent samples, each divided by its population variance, has an F distribution.


SIE 430/530 : Engineering Statistics

( Lecture 11 )


List of Topics in Lecture 11

• Recall topics in lecture 10

• Function of a Random Vector; Random samples

• Statistic: definition and properties

• Sample from normal distribution

• Statistics of exponential family

• Central Limit Theorem


Recall Topics in Lecture 10


Statistics Starts from Today!


Random Sample

• Definition: The random variables X1, X2, …, Xn are called a random sample of size n from the population f(x) if X1, X2, …, Xn are mutually independent random variables and the marginal pdf or pmf of each Xi is the same function f(x).

• Equivalently, X1, X2, …, Xn are independent and identically distributed (iid) random variables with pmf/pdf f(x): an iid sample.

• Examples:

– Rolling a die n times

– Selecting 100 UA male students and measuring their heights

• Note:

– Sampling from a large population gives observations that are “nearly independent”


Joint pdf/pmf of Random Sample

• Given the iid random sample X1, X2, …, Xn, the joint pdf/pmf is

$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i)$$

and, for a parametric family,

$$f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

• Ex. 11.1: Let X1, X2, …, Xn be the times to failure of n identical circuit boards randomly selected from a population that has an exponential distribution exp(β). What is the probability that ALL the boards last more than 2 years?
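
Because the sample is iid, the joint probability in Ex. 11.1 factors into a product of identical marginal probabilities. The sketch below assumes β is the mean lifetime in years, so that P(X > t) = exp(−t/β); the values n = 5 and β = 10 are made-up numbers for illustration only.

```python
from math import exp, prod

def survival(t, beta):
    """P(X > t) for an exponential lifetime with mean beta (an assumption)."""
    return exp(-t / beta)

def prob_all_survive(n, t, beta):
    """P(X1 > t, ..., Xn > t) for an iid sample: the product of the marginals."""
    return prod(survival(t, beta) for _ in range(n))

# Hypothetical numbers: 5 boards, mean life of 10 years, horizon of 2 years.
p = prob_all_survive(n=5, t=2.0, beta=10.0)
print(round(p, 4))
```

In closed form the product is exp(−2n/β), which for these numbers is exp(−1).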


Statistic

• Definition: Let X1, X2, …, Xn be a random sample of size n from the population f(x), and let T(x1, x2, …, xn) be a real-valued or vector-valued function whose domain includes the sample space of (X1, X2, …, Xn). Then the random variable or random vector Y = T(X1, X2, …, Xn) is called a statistic.

• The probability distribution of a statistic Y is called the sampling distribution of Y.


Sample Mean and Variance

• Sample Mean: denoted by X̄n, is a statistic defined as the arithmetic average of the values in a random sample of size n:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

• Sample Variance: denoted by Sn², is a statistic defined as:

$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

• Remember: the observed value of a statistic is denoted by lowercase letters. So x̄, s², and s denote observed values of the RVs X̄, S², and S.


Properties of Sample Statistics

• Let X1, X2, …, Xn be a random sample of size n from the population f(x) with mean µ (finite) and variance σ² (finite). Then
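
A simulation sketch, assuming the properties meant here are the standard results E X̄ = µ, Var X̄ = σ²/n, and E S² = σ². The Uniform(0, 1) population (µ = 1/2, σ² = 1/12), the sample size, and the replication count are hypothetical choices made only to illustrate these three facts.

```python
import random

def sample_mean(xs):
    """Arithmetic average of the values."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(11)  # fixed seed so the sketch is reproducible
n, reps = 5, 40000
means, variances = [], []
for _ in range(reps):
    xs = [rng.random() for _ in range(n)]  # one Uniform(0, 1) sample of size n
    means.append(sample_mean(xs))
    variances.append(sample_variance(xs))

avg_mean = sample_mean(means)          # compare with mu = 1/2
var_of_mean = sample_variance(means)   # compare with sigma^2 / n = 1/60
avg_s2 = sample_mean(variances)        # compare with sigma^2 = 1/12
print(avg_mean, var_of_mean, avg_s2)
```

The three printed numbers should sit close to 0.5, 0.0167, and 0.0833; none of this relies on normality, only on finite mean and variance.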


MGF of Sample Mean

• Theorem: Let X1, X2, …, Xn be a random sample from a population with mgf M_X(t). Then the mgf of the sample mean is

$$M_{\bar{X}}(t) = \left[M_X(t/n)\right]^n$$

• Ex. 11.2 (Distribution of the mean): Let X1, X2, …, Xn be a random sample from a N(µ, σ²) population. Find the distribution of the sample mean.
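
A worked sketch of Ex. 11.2 using the theorem above, assuming the usual normal mgf M_X(t) = exp(µt + σ²t²/2):

```latex
M_{\bar{X}}(t)
  = \left[ M_X\!\left(\tfrac{t}{n}\right) \right]^n
  = \left[ \exp\!\left( \mu \frac{t}{n} + \frac{\sigma^2}{2} \frac{t^2}{n^2} \right) \right]^n
  = \exp\!\left( \mu t + \frac{1}{2} \cdot \frac{\sigma^2}{n} \, t^2 \right)
```

The last expression is the mgf of a N(µ, σ²/n) distribution, so the sample mean of a normal sample is normal with the same mean and variance σ²/n.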


Statistics of Members of Exponential Family

• Suppose X1, X2, …, Xn is a random sample from a population with pdf/pmf f(x|θ), where

$$f(x \mid \theta) = h(x)\, c(\theta) \exp\!\left(\sum_{i=1}^{k} w_i(\theta)\, t_i(x)\right)$$

is a member of an exponential family. Define statistics T1, …, Tk by

$$T_i(X_1, \ldots, X_n) = \sum_{j=1}^{n} t_i(X_j), \quad i = 1, 2, \ldots, k$$

If the set {(w1(θ), w2(θ), …, wk(θ)), θ ∈ Θ} contains an open subset of R^k, then the distribution of (T1, …, Tk) is an exponential family of the form

$$f_T(u_1, \ldots, u_k \mid \theta) = H(u_1, \ldots, u_k)\, [c(\theta)]^n \exp\!\left(\sum_{i=1}^{k} w_i(\theta)\, u_i\right)$$


Ex. 11.3: Sum of Bernoulli random variables
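
A small check of the fact this example is presumably after, namely that the sum of n iid Bernoulli(p) variables is binomial(n, p). The sketch builds the pmf of the sum by repeated convolution and compares it with the binomial pmf; the values n = 6 and p = 0.3 and the helper names are illustrative assumptions.

```python
from math import comb

def convolve(p1, p2):
    """pmf of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b
    return out

def sum_of_bernoulli_pmf(n, p):
    """pmf of X1 + ... + Xn for iid Bernoulli(p), built one term at a time."""
    pmf = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, [1 - p, p])
    return pmf

def binomial_pmf(n, p):
    """Binomial(n, p) pmf on 0, 1, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 6, 0.3
gap = max(abs(a - b) for a, b in zip(sum_of_bernoulli_pmf(n, p), binomial_pmf(n, p)))
print(gap)  # difference at rounding level only
```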


Distribution of Sample Means

(Exponential Family)


Distribution of Sample Means (General)

• Generally, the exact distribution of the sample mean is difficult to calculate.

• What can be said about the distribution of the sample mean when the sample is drawn from an arbitrary population?

• In many cases, when n is large, we can approximate the distribution of the sample mean by a normal distribution.

• This is the famous Central Limit Theorem.


Central Limit Theorem (CLT)

• Let X1, …, Xn be independent and identically distributed (iid) random variables with E(Xi) = µ (finite) and Var(Xi) = σ² (finite). Define

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Then, for any value x ∈ (−∞, ∞),

$$\lim_{n\to\infty} P\!\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} < x\right) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy = \Phi(x)$$

where Φ(x) is the standard normal cdf.

• In words: if the Xi’s are normally distributed, the sample mean is also normally distributed for every n. With the CLT, as n→∞, the standardized sample mean √n(X̄n − µ)/σ is approximately standard normal regardless of the distribution of the Xi’s.

• In practice, the CLT can be applied when n is sufficiently large.
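
A simulation sketch of the limit statement, assuming an Exponential(1) population (µ = σ = 1), which is clearly non-normal. It estimates P(√n(X̄n − µ)/σ < 1) for a few sample sizes and prints Φ(1) for comparison; the seed, replication count, and function names are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def standardized_mean(n, rng):
    """sqrt(n) * (mean - mu) / sigma for n draws from Exponential(1)."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    return sqrt(n) * (sum(xs) / n - 1.0) / 1.0  # mu = sigma = 1

def empirical_cdf(n, x, reps, rng):
    """Monte Carlo estimate of P(standardized mean < x)."""
    hits = sum(standardized_mean(n, rng) < x for _ in range(reps))
    return hits / reps

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
for n in (2, 10, 100):
    print(n, empirical_cdf(n, 1.0, 5000, rng))
print("Phi(1) =", round(NormalDist().cdf(1.0), 4))
```

How large n must be depends on the population: skewed or heavy-tailed populations need larger samples than symmetric ones.
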
Sampling from NORMAL Distribution
Properties of Sample Statistics from
Normal Distribution
• Let X1, X2, …, Xn be a random sample of size n from a N(µ, σ²) distribution, and let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Then
Ex. 11.4: The amount of fill (unit: ounce) dispensed by a bottling machine is normally distributed with mean µ and standard deviation σ = 1.0. A sample of n = 9 filled bottles is randomly selected, and the amount of fill is measured for each.
(i) Find the probability that the sample mean will be within 0.3 ounce of the true mean, µ, for the chosen machine setting.
(ii) How many observations should be randomly selected if we wish the sample mean to be within 0.3 ounce of the true mean, µ, with probability 0.95?
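
A short computational sketch of both parts, assuming the approach is to use X̄ ~ N(µ, σ²/n), so that P(|X̄ − µ| ≤ d) = 2Φ(d√n/σ) − 1. It uses `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal
sigma, n, d = 1.0, 9, 0.3

# (i) P(|Xbar - mu| <= d) = 2 * Phi(d * sqrt(n) / sigma) - 1
prob_within = 2 * Z.cdf(d * sqrt(n) / sigma) - 1

# (ii) smallest n with 2 * Phi(d * sqrt(n) / sigma) - 1 >= 0.95,
#      i.e. d * sqrt(n) / sigma >= z_{0.975}
z = Z.inv_cdf(0.975)
n_required = ceil((z * sigma / d) ** 2)

print(round(prob_within, 4), n_required)
```

With n = 9 the probability is about 0.63, and roughly 43 observations are needed to raise it to 0.95.
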
Facts about Chi Squared RVs
• We use the notation χ²_p to denote a chi-squared random variable with p degrees of freedom.
– If Z is a N(0,1) rv, then Z² ~ χ²_1; that is, the square of a standard normal rv is a chi-squared rv.
– If X1, X2, …, Xn are independent and Xi ~ χ²_{pi}, then X1 + X2 + ⋯ + Xn ~ χ²_{p1+p2+⋯+pn}.
Ex. 11.5 (Ex. 11.4 cont’d): Assume we select a random sample of 10 bottles and measure the fill. If we calculate the sample variance S² from these 10 bottles, how can we define the boundaries of an interval such that P(b1 ≤ S² ≤ b2) = 0.90?
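
One way to get such an interval, sketched under two assumptions: that (n−1)S²/σ² ~ χ² with n−1 degrees of freedom (the normal-sample result, with σ = 1.0 carried over from Ex. 11.4), and that the 0.10 is split equally between the two tails, which is a convention and not the only valid choice. The standard library has no chi-squared quantile function, so the helpers below (names chosen here) integrate the density numerically and invert by bisection.

```python
from math import exp, gamma

def chi2_pdf(x, k):
    """Density of a chi-squared variable with k degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

def chi2_cdf(x, k, steps=2000):
    """P(X <= x) by Simpson's rule on [0, x]; adequate for k >= 3."""
    if x <= 0:
        return 0.0
    h = x / steps  # steps must be even for Simpson's rule
    total = chi2_pdf(0.0, k) + chi2_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, k)
    return total * h / 3

def chi2_quantile(p, k, lo=0.0, hi=200.0):
    """Invert the cdf by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, sigma2 = 10, 1.0
b1 = sigma2 * chi2_quantile(0.05, n - 1) / (n - 1)
b2 = sigma2 * chi2_quantile(0.95, n - 1) / (n - 1)
print(round(b1, 3), round(b2, 3))
```

The two quantiles agree with the tabulated χ² values for 9 degrees of freedom (about 3.325 and 16.919), giving an interval of roughly (0.37, 1.88) for S².
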