# 4 questions due in 7 hours (9/12/2022, 10 PM PST)

PSTAT 120C, Summer 2022

Midterm


Background

The Environmental Protection Agency, or EPA, regularly publishes data on automotive trends by year. It
has maintained its database since 1975 and updates it annually to include the most up-to-date data available
for all model years.

Real-world miles per gallon (mpg) refers to an EPA-calculated weighted average of city and highway miles
per gallon. Engine displacement (displacement) is measured in cubic centimeters (cm³); it is considered an
expression of engine size, or a representation of the power an engine is capable of exerting and the amount of
fuel it can be expected to consume.

For this exam, you’ll investigate the relationship(s) between mpg, weight in pounds (weight), and

displacement. You’ll use a random sample of 15 vehicles from the summary automotive trends data, which

includes information about vehicle attributes for model years ranging from 1975 to 2021.

Below, Figure 1 shows the distribution of mpg in a random sample of 15 vehicles.

[Plot: histogram with an overlaid density curve; x-axis: miles per gallon (ticks from 15 to 30), y-axis: density (ticks from 0.00 to 0.15).]

Figure 1: Histogram and density curve for miles per gallon of 15 randomly sampled vehicles.

Figure 2 is a matrix of the correlation coefficients (r) between mpg, weight, and displacement. Note that
the diagonal is made up of the correlations between each variable and itself, or 1.

|              | mpg   | weight | displacement |
|--------------|-------|--------|--------------|
| mpg          | 1     | −0.72  | −0.87        |
| weight       | −0.72 | 1      | 0.81         |
| displacement | −0.87 | 0.81   | 1            |

Figure 2: Correlation plot of miles per gallon, weight in pounds, and engine displacement.
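To make the reading of Figure 2 concrete, the matrix can be rebuilt by hand and its entries squared. This is only a minimal sketch in base R: the object names `cor_mat` and `r_squared` are assumptions for illustration, and the inputs are the two-decimal correlations read off the figure, so anything derived from them agrees with an exact fit only up to rounding.

```r
# Correlations as printed in Figure 2 (rounded to two decimals)
vars <- c("mpg", "weight", "displacement")
cor_mat <- matrix(
  c( 1.00, -0.72, -0.87,
    -0.72,  1.00,  0.81,
    -0.87,  0.81,  1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(vars, vars)
)

# A correlation matrix is symmetric with ones on the diagonal
stopifnot(isSymmetric(cor_mat), all(diag(cor_mat) == 1))

# Squaring each entry gives the share of variance one variable
# explains in another under a simple linear regression
r_squared <- cor_mat^2
round(r_squared, 2)
```

The squared mpg/weight entry is 0.5184, which is close to, but not identical to, the Multiple R-squared of 0.5119 reported for the regression of mpg on weight; the gap comes from rounding the correlation to −0.72.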

Finally, the data themselves are presented in Table 1 at the end of this document. They are also available as

a .csv file for download on GauchoSpace.

The overall question of interest throughout this exam is: Should miles per gallon be predicted based

on weight alone, or on the linear combination of weight and displacement?
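A rough first look at this question is possible from Figure 2 alone, because the R² of a regression on standardized predictors can be written in terms of correlations. The sketch below is an approximation under stated assumptions: it uses the rounded correlations from the figure rather than the raw data, and the names `r_y`, `r_xx`, `r2_weight`, and `r2_both` are hypothetical. It does not replace a formal comparison of the two models on the actual sample.

```r
# Rounded correlations from Figure 2; exact values would need the raw data
r_y  <- c(weight = -0.72, displacement = -0.87)   # each predictor vs. mpg
r_xx <- matrix(c(1, 0.81, 0.81, 1), nrow = 2,
               dimnames = list(names(r_y), names(r_y)))

# R^2 with weight alone is the squared correlation with mpg
r2_weight <- unname(r_y["weight"]^2)

# R^2 with both predictors: r_y' R_xx^{-1} r_y
r2_both <- drop(t(r_y) %*% solve(r_xx, r_y))

c(weight_only = r2_weight, weight_and_displacement = r2_both)
```

A larger R² for the two-predictor model is expected, since adding a predictor can never lower R²; whether the improvement is statistically meaningful is a separate question that needs a test on the data in Table 1.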


1. Answer the following based on a simple linear regression, predicting mpg ($y$) with weight ($x_1$).

a. Fit the specified model. Write the model equation, including your estimates.

5 pts for model fitting; 5 pts for equation

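    library(tidyverse)

    # Drop the columns not used in this exam
    data <- data %>%
      select(-c(manufacturer, horsepower))

    # Simple linear regression of mpg on weight
    data %>%
      lm(mpg ~ weight, data = .) %>%
      summary()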

    ##
    ## Call:
    ## lm(formula = mpg ~ weight, data = .)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -3.4600 -2.1210 -0.6158  1.6716  7.0659
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) 40.267655   5.038457   7.992 2.26e-06 ***
    ## weight      -0.004678   0.001267  -3.692  0.00271 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 3.131 on 13 degrees of freedom
    ## Multiple R-squared:  0.5119, Adjusted R-squared:  0.4744
    ## F-statistic: 13.63 on 1 and 13 DF,  p-value: 0.002709
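Several entries in this output are tied together by simple identities, which gives a way to sanity-check a transcription of the table. The following is a minimal sketch using only the numbers printed above; the variable names are assumptions, and small discrepancies are expected because the printed values are rounded.

```r
# Values copied from the summary() output
n        <- 15
estimate <- -0.004678   # slope for weight
std_err  <-  0.001267
r2       <-  0.5119

# t statistic: estimate divided by its standard error
t_value <- estimate / std_err

# With a single predictor, the overall F statistic equals t^2
f_value <- t_value^2

# Adjusted R^2 penalizes for the number of predictors (p = 1 here)
p      <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(c(t = t_value, F = f_value, adj_r2 = adj_r2), 4)
```

The three results should land near the reported t value (−3.692), F-statistic (13.63), and Adjusted R-squared (0.4744).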

The model equation can be written as:

$$
\hat{y}_{mpg} = 40.27 - 0.005\,x_{weight}
$$
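As an illustration of how the fitted equation is used, the sketch below evaluates it at a few weights. The helper `predict_mpg` and the chosen weights are hypothetical, picked to sit within roughly the 3000 to 5000 lb span of the sample; the coefficients are taken at full printed precision from the `lm()` summary rather than from the rounded equation.

```r
# Coefficients from the fitted model (full precision from summary())
b0 <- 40.267655
b1 <- -0.004678

# Hypothetical helper: predicted mpg for a vehicle weight in pounds
predict_mpg <- function(weight) b0 + b1 * weight

# Illustrative weights in pounds
predict_mpg(c(3000, 4000, 5000))
```

Because weights are in the thousands, rounding the slope matters: at 4000 lb the rounded equation gives 40.27 − 0.005 × 4000 = 20.27, about 1.3 mpg below the value from the unrounded slope, so the unrounded coefficients are the safer choice for prediction.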

b. Create a scatterplot of mpg and weight. Add a line representing the model, with 95% confidence

bands. Does the model appear to fit the data?

5 pts for correct plot; 5 pts for discussion

    # Scatterplot with the fitted line and its 95% confidence band
    data %>%
      ggplot(aes(x = weight, y = mpg)) +
      geom_point() +
      geom_smooth(method = "lm", se = TRUE) +
      theme_bw()

[Plot: scatterplot of mpg (y-axis, ticks from 15 to 30) against weight (x-axis, ticks from 3000 to 5000) with the fitted regression line and its 95% confidence band.]

The model appears to fit the data reasonably well, although, since the sample size is small, this is

difficult to assess. There are some points that appear to fall far from the line, but this is likely to

be due to sampling error; overall, the model looks fine.

c. Test the null hypothesis that the slope of $x_1$, $\beta_1$, is equal to zero. State the hypotheses, test
statistic, rejection region(s), and p-value. Do not interpret the conclusion of this test.

2 pts for hypotheses; 2 pts for RR; 2 pts for test statistic; 2 pts for p-value; 2 pts for

not interpreting

Note that this was tested automatically when lm() was run; referring to those results is fine. You

can also test the hypothesis manually. Either way, though, you should state everything as listed.

The hypotheses and rejection region(s) are not printed by lm().

The hypotheses are $H_0: \beta_1 = 0$; $H_a: \beta_1 \neq 0$.

This is a two-tailed t-test with 13 degrees of freedom ($n - 2$), so the rejection region is
$t \le -t_{\alpha/2} \,\cup\, t \ge t_{\alpha/2}$, or, at $\alpha = 0.05$, $t \le -2.16 \,\cup\, t \ge 2.16$.
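The critical value and p-value for this test can be reproduced from the t distribution directly. This is a minimal sketch assuming a significance level of 0.05 (the level consistent with the 2.16 cutoff) and using the rounded t value printed by `lm()`; the variable names are illustrative.

```r
alpha <- 0.05
df    <- 13          # n - 2 with n = 15
t_obs <- -3.692      # t value for weight from the lm() summary

# Two-sided critical value: upper alpha/2 quantile of the t distribution
t_crit <- qt(1 - alpha / 2, df)

# Two-sided p-value: probability of a statistic at least as extreme as t_obs
p_value <- 2 * pt(-abs(t_obs), df)

round(c(t_crit = t_crit, p_value = p_value), 5)
```

Using `-abs(t_obs)` keeps the p-value calculation correct regardless of the sign of the observed statistic.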

Code to fit the model, without using lm():

X