PSTAT 120C, Summer 2022
Midterm
Background
The Environmental Protection Agency, or EPA, regularly publishes data on automotive trends by year; it
has maintained its database since 1975, and the database is updated annually to include the most up-to-date
data available for all model years.
Real-world miles per gallon (mpg) refers to an EPA-calculated weighted average of city and highway miles
per gallon. Engine displacement (displacement) is measured in cubic centimeters (cm³); it is considered an
expression of engine size, or a representation of the power an engine is capable of exerting and the amount of
fuel it can be expected to consume.
For this exam, you’ll investigate the relationship(s) between mpg, weight in pounds (weight), and
displacement. You’ll use a random sample of 15 vehicles from the summary automotive trends data, which
includes information about vehicle attributes for model years ranging from 1975 to 2021.
Below, Figure 1 shows the distribution of mpg in a random sample of 15 vehicles.
[Plot omitted: x-axis "Miles per gallon" (15 to 30); y-axis "Density" (0.00 to 0.15).]
Figure 1: Histogram and density curve for miles per gallon of 15 randomly sampled vehicles.
Figure 2 is a matrix of the correlation coefficients (r) between mpg, weight, and displacement. Note that
the diagonal is made up of the correlations between each variable and itself, which are always 1.
              mpg     weight  displacement
mpg            1      -0.72      -0.87
weight        -0.72    1          0.81
displacement  -0.87    0.81       1
Figure 2: Correlation plot of miles per gallon, weight in pounds, and engine displacement.
Finally, the data themselves are presented in Table 1 at the end of this document. They are also available as
a .csv file for download on GauchoSpace.
The overall question of interest throughout this exam is: Should miles per gallon be predicted based
on weight alone, or on the linear combination of weight and displacement?
1. Answer the following based on a simple linear regression, predicting mpg ($y$) with weight ($x_1$).
a. Fit the specified model. Write the model equation, including your estimates.
5 pts for model fitting; 5 pts for equation
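library(tidyverse)

# Drop the columns that are not used in this exam
# (assumes `data` already holds the sample read from the .csv file)
data <- data %>%
  select(-c(manufacturer, horsepower))

# Fit the simple linear regression of mpg on weight and print the summary
data %>%
  lm(mpg ~ weight, data = .) %>%
  summary()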
##
## Call:
## lm(formula = mpg ~ weight, data = .)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4600 -2.1210 -0.6158  1.6716  7.0659
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.267655   5.038457   7.992 2.26e-06 ***
## weight      -0.004678   0.001267  -3.692  0.00271 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.131 on 13 degrees of freedom
## Multiple R-squared:  0.5119, Adjusted R-squared:  0.4744
## F-statistic: 13.63 on 1 and 13 DF,  p-value: 0.002709
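As a quick consistency check on this output (a sketch using only the rounded values printed in this document, so agreement is approximate): in simple linear regression the F-statistic equals the square of the slope's t value, and the multiple R-squared equals the squared sample correlation between mpg and weight reported in Figure 2.

$$
t^2 = (-3.692)^2 \approx 13.63 = F, \qquad r^2 = (-0.72)^2 \approx 0.52 \approx R^2 = 0.5119
$$

The small gap in the second comparison comes from the correlation being rounded to two decimals.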
The model equation can be written as:
$$
\hat{y}_{mpg}=40.27-0.0047x_{weight}
$$
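To illustrate how the fitted equation is used, the sketch below plugs in a hypothetical 4,000-pound vehicle (an illustrative value, not a vehicle taken from the sample) and uses the unrounded estimates from the lm() output:

$$
\hat{y}_{mpg} = 40.267655 - 0.004678(4000) \approx 21.56
$$

Equivalently, each additional 1,000 pounds of weight is associated with a predicted decrease of about 4.68 mpg.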
b. Create a scatterplot of mpg and weight. Add a line representing the model, with 95% confidence
bands. Does the model appear to fit the data?
5 pts for correct plot; 5 pts for discussion
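# Scatterplot with the fitted line and a 95% confidence band
# (geom_smooth() uses level = 0.95 by default)
data %>%
  ggplot(aes(x = weight, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  theme_bw()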
[Plot omitted: scatterplot of mpg (y-axis, 15 to 30) against weight (x-axis, 3000 to 5000) with the fitted regression line and its 95% confidence band.]
The model appears to fit the data reasonably well, although the small sample size (n = 15) makes this
difficult to assess. A few points fall far from the line, but this is plausibly due to sampling
variability; overall, the model looks fine.
c. Test the null hypothesis that the slope of $x_1$, $\beta_1$, is equal to zero. State the hypotheses, test
statistic, rejection region(s), and p-value. Do not interpret the conclusion of this test.
2 pts for hypotheses; 2 pts for RR; 2 pts for test statistic; 2 pts for p-value; 2 pts for
not interpreting
Note that this was tested automatically when lm() was run; referring to those results is fine. You
can also test the hypothesis manually. Either way, though, you should state everything as listed.
The hypotheses and rejection region(s) are not printed by lm().
The hypotheses are $H_0: \beta_1 = 0$; $H_a: \beta_1 \neq 0$.
This is a two-tailed t-test with 13 degrees of freedom ($n - 2$), so the rejection region is
$t \le -t_{\alpha/2} \cup t \ge t_{\alpha/2}$, or $t \le -2.16 \cup t \ge 2.16$.
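The test statistic, critical value, and p-value can also be recovered from the numbers that lm() reports. The following is a minimal sketch, not the exam's own solution code: it hard-codes the slope estimate and standard error from the summary output, and it assumes a significance level of 0.05 (implied by the 2.16 critical value but not stated explicitly). The variable names are illustrative.

```r
# Values copied from the lm() summary output (rounded as printed there)
b1    <- -0.004678   # estimated slope for weight
se_b1 <-  0.001267   # standard error of the slope
n     <- 15          # sample size
alpha <- 0.05        # assumed significance level

df     <- n - 2                      # residual degrees of freedom
t_stat <- b1 / se_b1                 # test statistic: estimate / standard error
t_crit <- qt(1 - alpha / 2, df)      # two-tailed critical value
p_val  <- 2 * pt(-abs(t_stat), df)   # two-tailed p-value

round(c(t_stat = t_stat, t_crit = t_crit, p_value = p_val), 4)
```

Run as written, this should give a test statistic of about -3.69, a critical value of about 2.16, and a p-value of about 0.0027, in line with the lm() output.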
Code to fit the model, without using lm():
X