MAT 137 DePaul University Descriptive Statistics Discussion
CARAT  COLOR  CLARITY  CERT  PRICE
0.76   D      IF       GIA   9885
0.31   D      VS1      GIA   1641
0.53   D      VS1      GIA   3921
0.71   D      VS1      GIA   6372
1      D      VS1      GIA   11419
0.3    D      VS2      GIA   1302
0.52   D      VS2      GIA   3490
1      D      VVS1     GIA   15582
1.01   D      VVS1     GIA   16008
0.75   D      VVS2     GIA   7368
0.63   E      IF       GIA   6512
0.3    E      VS1      GIA   1510
0.31   E      VS1      GIA   1555
0.34   E      VS1      GIA   1693
0.35   E      VS1      GIA   1738
0.5    E      VS1      GIA   3501
0.5    E      VS1      GIA   3501
0.52   E      VS1      GIA   3635
0.54   E      VS1      GIA   3767
0.5    E      VS1      GIA   3501
0.56   E      VS1      GIA   3900
0.7    E      VS1      GIA   5800
0.71   E      VS1      GIA   5881
0.72   E      VS1      GIA   5961
1      E      VS1      GIA   10588
1.01   E      VS1      GIA   10692
1.03   E      VS1      GIA   10900
0.33   E      VS2      GIA   1327
0.73   E      VS2      GIA   5738
0.83   E      VS2      GIA   7156
1      E      VS2      GIA   9757
1.01   E      VS2      GIA   9853
0.62   E      VVS1     GIA   5845
0.46   E      VVS2     GIA   2942
0.55   E      VVS2     GIA   4138
1      F      IF       GIA   13913
0.31   F      VS1      GIA   1427
0.32   F      VS1      GIA   1468
0.34   F      VS1      GIA   1551
0.35   F      VS1      GIA   1593
0.36   F      VS1      GIA   1635
0.4    F      VS1      GIA   1911
0.41   F      VS1      GIA   1956
0.5    F      VS1      GIA   3293
0.52   F      VS1      GIA   3418
0.53   F      VS1      GIA   3480
0.55   F      VS1      GIA   3605
0.56   F      VS1      GIA   3667
0.6    F      VS1      GIA   4291
0.7    F      VS1      GIA   5510
0.71   F      VS1      GIA   5586
0.9    F      VS1      GIA   7680
1      F      VS1      GIA   10713
1.01   F      VS1      GIA   10272
1.02   F      VS1      GIA   10372
1.04   F      VS1      GIA   10571
0.37   F      VS2      GIA   1420
0.71   F      VS2      GIA   5193
0.72   F      VS2      GIA   5263
0.85   F      VS2      GIA   6805
0.7    F      VS2      GIA   5122
0.7    F      VS2      GIA   5122
1      F      VS2      GIA   9480
1.01   F      VS2      GIA   9573
1.02   F      VS2      GIA   9666
0.54   F      VVS1     GIA   4066
0.51   F      VVS1     GIA   3851
0.7    F      VVS1     GIA   6285
0.77   F      VVS1     GIA   6919
0.5    F      VVS2     GIA   3501
0.51   F      VVS2     GIA   3567
0.52   F      VVS2     GIA   3635
0.53   F      VVS2     GIA   3701
0.71   F      VVS2     GIA   5881
1      F      VVS2     GIA   10588
0.3    G      VS1      GIA   1260
0.34   G      VS1      GIA   1410
0.35   G      VS1      GIA   1447
0.5    G      VS1      GIA   3016
0.51   G      VS1      GIA   3205
0.7    G      VS1      GIA   5122
0.7    G      VS1      GIA   5122
1      G      VS1      GIA   9619
0.32   G      VS2      GIA   1202
0.34   G      VS2      GIA   1269
1      G      VS2      GIA   9169
1      G      VS2      GIA   9203
1.06   G      VS2      GIA   9743
0.3    G      VVS1     GIA   1510
0.64   G      VVS1     GIA   4759
0.5    G      VVS1     GIA   3432
0.31   G      VVS2     GIA   1427
0.48   G      VVS2     GIA   2532
0.53   G      VVS2     GIA   3407
0.55   G      VVS2     GIA   3529
0.57   G      VVS2     GIA   3651
0.59   G      VVS2     GIA   3773
0.63   G      VVS2     GIA   4401
0.74   G      VVS2     GIA   5815
1      G      VVS2     GIA   9896
1.02   G      VVS2     GIA   10090
0.55   H      IF       GIA   3605
0.6    H      IF       GIA   4291
0.34   H      VS1      GIA   1316
0.37   H      VS1      GIA   1420
0.4    H      VS1      GIA   1525
0.89   H      VS1      GIA   6709
0.73   H      VS1      GIA   5030
0.73   H      VS1      GIA   5030
0.78   H      VS1      GIA   5386
1.01   H      VS1      GIA   9153
0.31   H      VS2      GIA   1126
0.34   H      VS2      GIA   1222
0.35   H      VS2      GIA   1255
0.84   H      VS2      GIA   5705
0.74   H      VS2      GIA   4585
1      H      VS2      GIA   8788
1.01   H      VS2      GIA   8873
1.06   H      VS2      GIA   9302
1.1    H      VS2      GIA   9646
0.66   H      VVS1     GIA   4300
0.57   H      VVS1     GIA   3415
0.72   H      VVS1     GIA   5662
0.36   H      VVS2     GIA   1485
0.43   H      VVS2     GIA   1747
0.7    H      VVS2     GIA   5122
0.71   H      VVS2     GIA   5193
0.86   H      VVS2     GIA   6882
0.7    H      VVS2     GIA   5122
0.71   H      VVS2     GIA   5193
1.04   I      IF       GIA   9563
0.31   I      VS1      GIA   1126
0.45   I      VS1      GIA   1572
0.73   I      VS1      GIA   4221
0.75   I      VS1      GIA   4355
1      I      VS1      GIA   8095
0.33   I      VS2      GIA   1098
0.7    I      VS2      GIA   3861
0.82   I      VS2      GIA   4948
0.8    I      VS2      GIA   4832
1      I      VS2      GIA   7818
1.01   I      VS2      GIA   7895
0.73   I      VVS1     GIA   4727
1.01   I      VVS1     GIA   8873
0.56   I      VVS2     GIA   2892
0.8    I      VVS2     GIA   5441
0.9    I      VVS2     GIA   6682
0.75   I      VVS2     GIA   4667
1.01   I      VVS2     GIA   8455
1.05   I      VVS2     GIA   8781
1.07   I      VVS2     GIA   8945
1      D      VVS2     HRD   13775
1.01   D      VVS2     HRD   13909
1      E      VS1      HRD   10588
1.01   E      VS1      HRD   10692
0.52   E      VS2      HRD   3346
0.81   E      VS2      HRD   6988
0.7    E      VVS1     HRD   6867
0.81   E      VVS1     HRD   8715
1      E      VVS1     HRD   14051
0.7    E      VVS2     HRD   6285
1      E      VVS2     HRD   11419
1.01   E      VVS2     HRD   11531
0.8    F      IF       HRD   8611
0.56   F      VS1      HRD   3667
0.8    F      VS1      HRD   6905
0.81   F      VS1      HRD   6988
1.01   F      VS1      HRD   10272
0.56   F      VS2      HRD   3202
0.57   F      VS2      HRD   3256
0.73   F      VS2      HRD   5333
0.82   F      VS2      HRD   6572
0.85   F      VS2      HRD   6805
0.5    F      VVS1     HRD   3778
0.51   F      VVS1     HRD   3851
0.53   F      VVS1     HRD   3995
0.85   F      VVS1     HRD   8359
1      F      VVS1     HRD   11696
1.01   F      VVS1     HRD   11811
0.53   F      VVS2     HRD   3701
1      F      VVS2     HRD   10588
1.02   F      VVS2     HRD   10796
0.71   G      IF       HRD   6372
0.6    G      VS1      HRD   3925
0.81   G      VS1      HRD   6495
0.6    G      VS2      HRD   3421
1      G      VS2      HRD   9203
1.01   G      VS2      HRD   9293
0.5    G      VVS1     HRD   3432
0.7    G      VVS1     HRD   5800
0.73   G      VVS1     HRD   6041
0.85   G      VVS1     HRD   7711
1      G      VVS1     HRD   10450
0.55   G      VVS2     HRD   3529
0.7    G      VVS2     HRD   5510
0.8    G      VVS2     HRD   6905
0.82   G      VVS2     HRD   7072
1      G      VVS2     HRD   9896
1.01   G      VVS2     HRD   9993
0.58   H      IF       HRD   3792
0.81   H      IF       HRD   7358
0.8    H      VS1      HRD   6051
1      H      VS1      HRD   9065
1.01   H      VS1      HRD   9153
0.7    H      VS2      HRD   4346
0.86   H      VS2      HRD   5835
1      H      VS2      HRD   8788
1.02   H      VS2      HRD   8959
0.52   H      VVS1     HRD   3130
0.57   H      VVS1     HRD   3415
0.6    H      VVS1     HRD   3925
0.66   H      VVS1     HRD   4300
0.72   H      VVS1     HRD   5662
0.74   H      VVS1     HRD   5815
1      H      VVS1     HRD   9480
0.61   H      VVS2     HRD   3616
0.64   H      VVS2     HRD   3785
0.71   H      VVS2     HRD   5193
0.8    H      VVS2     HRD   6416
1.01   H      VVS2     HRD   9433
1.06   H      VVS2     HRD   9890
1      I      VS1      HRD   8095
1.01   I      VS1      HRD   8175
1      I      VS2      HRD   7818
1      I      VVS1     HRD   8788
1.01   I      VVS1     HRD   8873
0.62   I      VVS2     HRD   3615
0.65   I      VVS2     HRD   3643
1      I      VVS2     HRD   8372
1.09   I      VVS2     HRD   9107
0.2    D      VS1      IGI   880
0.21   D      VS1      IGI   919
0.71   D      VS1      IGI   6160
0.19   D      VVS2     IGI   967
0.19   E      IF       IGI   1050
0.21   E      IF       IGI   1149
0.22   E      IF       IGI   1198
0.23   E      IF       IGI   1248
0.58   E      VVS1     IGI   4831
0.3    E      VVS2     IGI   1580
0.31   E      VVS2     IGI   1628
0.51   E      VVS2     IGI   3722
0.56   E      VVS2     IGI   4070
0.19   F      IF       IGI   967
0.21   F      IF       IGI   1057
0.23   F      IF       IGI   1147
0.25   F      IF       IGI   1485
0.26   F      IF       IGI   1539
0.27   F      IF       IGI   1595
0.48   F      VS1      IGI   2383
0.18   F      VVS1     IGI   823
0.19   F      VVS1     IGI   863
0.26   F      VVS1     IGI   1365
0.31   F      VVS1     IGI   1628
0.34   F      VVS1     IGI   1773
0.35   F      VVS1     IGI   1821
0.51   F      VVS1     IGI   3722
0.58   F      VVS1     IGI   4209
0.18   F      VVS2     IGI   765
0.19   F      VVS2     IGI   800
0.26   F      VVS2     IGI   1260
0.3    F      VVS2     IGI   1459
0.34   F      VVS2     IGI   1636
0.47   F      VVS2     IGI   2651
0.55   F      VVS2     IGI   3706
0.76   F      VVS2     IGI   6095
0.18   G      IF       IGI   803
0.18   G      IF       IGI   803
0.19   G      IF       IGI   842
0.2    G      IF       IGI   880
0.21   G      IF       IGI   919
0.23   G      IF       IGI   995
0.25   G      IF       IGI   1283
0.29   G      IF       IGI   1471
0.4    G      IF       IGI   2276
0.5    G      IF       IGI   3652
0.2    G      VS1      IGI   705
1.01   G      VS1      IGI   9713
0.2    G      VS2      IGI   638
0.19   G      VVS1     IGI   800
0.3    G      VVS1     IGI   1459
0.58   G      VVS1     IGI   3821
0.7    G      VVS1     IGI   5607
0.18   G      VVS2     IGI   705
0.35   G      VVS2     IGI   1540
0.56   G      VVS2     IGI   3470
0.7    G      VVS2     IGI   5326
0.78   G      VVS2     IGI   5937
0.18   H      IF       IGI   725
0.19   H      IF       IGI   758
0.24   H      IF       IGI   1108
0.25   H      IF       IGI   1149
0.27   H      IF       IGI   1233
0.32   H      IF       IGI   1462
0.33   H      IF       IGI   1503
1.01   H      VS2      IGI   8873
0.3    H      VVS2     IGI   1218
1      H      VVS2     IGI   9342
0.25   I      IF       IGI   1082
0.26   I      IF       IGI   1121
0.28   I      IF       IGI   1199
0.29   I      IF       IGI   1238
0.3    I      IF       IGI   1299
0.31   I      IF       IGI   1337
0.52   I      IF       IGI   3095
1.01   I      VS1      IGI   8175
0.41   I      VVS1     IGI   1616
0.41   I      VVS2     IGI   1506
MAT 137: Business Statistics
Project 1: Descriptive Statistics
Purpose: The purpose of this project is to summarize a data set graphically and
numerically using Excel.
Problem Description: We have seen how to summarize a quantitative data set using
numerical and graphical methods, such as the histogram, sample mean, and sample standard
deviation. In this project, you will work with a real data set that records the carat
weight, color, clarity, certification, and asking price of diamonds.
Data Set Background: Diamonds are categorized according to the “four Cs”: carats, clarity,
color, and cut. Each diamond stone that is sold on the open market is provided a
certificate by an independent diamond assessor that lists these characteristics. Data for a
sample of 308 round diamond stones that were listed for sale in the February 18, 2000
edition of Singapore’s Business Times is given in the file diamond-data.xlsx [on D2L].
For the diamonds in the sample, color is classified as either
• D – absolutely colorless (highest grade)
• E – colorless
• F – colorless
• G – near colorless (slight warmth to their tone)
• H – near colorless (faint yellow hue)
• I – near colorless (slight yellow tint)
while clarity is classified as either
• IF – internally flawless
• VVS1 – very, very slightly included (1st degree)
• VVS2 – very, very slightly included (2nd degree)
• VS1 – very slightly included (1st degree)
• VS2 – very slightly included (2nd degree).
In addition to color and clarity, three other characteristics were recorded: the number
of carats (a unit of weight used to measure the sizes of gemstones), the asking price
(in Singapore dollars), and the independent certification group, which is one of
• GIA – Gemological Institute of America
• HRD – Hoge Raad voor Diamant (Flemish for Diamond High Council)
• IGI – International Gemological Institute
This data set contains five variables:
• Carat: quantitative
• Color: qualitative with six classes [D, E, F, G, H, and I]
• Clarity: qualitative with five classes [IF, VS1, VS2, VVS1, and VVS2]
• Cert: qualitative with three classes [GIA, HRD, and IGI]
• Price: quantitative
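The project itself is to be done in Excel, but the five-variable layout above can also be sketched in code. The following is a minimal illustration only, assuming each record arrives as one whitespace-separated line in the order carat, color, clarity, cert, price. The names `Diamond` and `parse_row` are hypothetical and are not part of the assignment.

```python
from dataclasses import dataclass

# Allowed classes for the three qualitative variables, as listed in the handout.
COLORS = ("D", "E", "F", "G", "H", "I")
CLARITIES = ("IF", "VVS1", "VVS2", "VS1", "VS2")
CERTS = ("GIA", "HRD", "IGI")


@dataclass(frozen=True)
class Diamond:
    """One row of the data set (hypothetical helper type)."""
    carat: float   # quantitative
    color: str     # qualitative, six classes
    clarity: str   # qualitative, five classes
    cert: str      # qualitative, three classes
    price: int     # quantitative, Singapore dollars

    def __post_init__(self):
        # Reject values outside the documented classes.
        if self.color not in COLORS:
            raise ValueError(f"unknown color: {self.color}")
        if self.clarity not in CLARITIES:
            raise ValueError(f"unknown clarity: {self.clarity}")
        if self.cert not in CERTS:
            raise ValueError(f"unknown cert: {self.cert}")


def parse_row(line: str) -> Diamond:
    """Parse an assumed 'carat color clarity cert price' text line."""
    carat, color, clarity, cert, price = line.split()
    return Diamond(float(carat), color, clarity, cert, int(price))


# Three rows copied from the data set.
sample = [
    parse_row("0.76 D IF GIA 9885"),
    parse_row("0.31 D VS1 GIA 1641"),
    parse_row("0.41 I VVS2 IGI 1506"),
]
```

Validating the qualitative classes at parse time catches typing errors early, before any counts or proportions are computed.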
Submission: This is an individual project. Your task for this project is to write a
two-page report that uses various numerical and graphical methods to summarize the
information in the data set. Your report should not simply be a collection of graphs,
tables, and numerical statistics, but should also include text that helps to summarize
and explain what the graphs, tables, and numerical statistics are showing. You will
submit both an overall report that includes appropriate graphs, tables, numerical
statistics, and interpretations of the results within it, and also the Excel file you used
to produce your work. The graphs, tables, and other calculations within the Excel file
should be labeled clearly enough that someone else could figure out the meaning of
things within the file. When writing the report, graphs and tables can be copied and
pasted from Excel into Word.
Writing Requirement: Writing skills are important throughout one’s career. The two-page
report should start with an introductory paragraph that describes and motivates the
project, and end with a summary paragraph that states the conclusions you would like to
show. The report should be written in complete English sentences organized into
paragraphs and should be readable by anyone unfamiliar with the specific project. It
should convey relevant information to the reader in a clear, concise, and effective
manner.
Answer the following questions in your written report (not point by point; incorporate the
answers into your report):
1. Consider only the “Color” variable, and ignore the other columns in the Excel file.
Using appropriate graphs or charts, show what proportion of the diamonds listed for
sale were of each color. In particular, discuss the most common and least
common colors in the sample.
2. Consider only the “Clarity” variable, and ignore the other columns in the Excel file.
Using appropriate graphs or charts, show what proportion of the diamonds listed
for sale were of each clarity classification. In particular, discuss the most common
and least common clarity classification in the sample.
3. Consider only the “PRICE” variable, and ignore the other columns in the Excel file.
Describe the prices (in Singapore dollars) of the diamonds in the sample: the
smallest and largest values, the range of prices, the mean and standard deviation,
and the median and first and third quartiles.
4. Describe the distribution of prices using either a frequency histogram or a relative
frequency histogram. Discuss the shape of the “PRICE” distribution: Is it left-skewed,
right-skewed, or symmetric? How is the shape of the data reflected in the
values of the mean price and median price?
5. Consider the two variables “CARAT” and “PRICE”, and ignore the other columns in
the Excel file. Describe the relationship between the size of the diamond (in carats)
and the asking price (in Singapore dollars) by creating a scatterplot with the
number of carats on the horizontal axis and the price on the vertical axis and
describing what sort of general relationship you observe.
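The report must be produced in Excel, but the calculations behind questions 1 to 5 can be cross-checked with a short script. The sketch below is illustrative only. It uses eight rows hand-picked from the data set rather than the full sample of 308, all function names are hypothetical, and comparing the mean with the median is a rule of thumb for skewness, not a formal test. The `inclusive` quantile method is chosen on the assumption that it matches the interpolation used by Excel's QUARTILE.INC.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, quantiles, stdev

# Eight rows copied from the data set: (carat, color, clarity, cert, price).
# A hand-picked subset for illustration, NOT the full sample of 308.
ROWS = [
    (0.76, "D", "IF", "GIA", 9885),
    (0.63, "E", "IF", "GIA", 6512),
    (0.30, "E", "VS1", "GIA", 1510),
    (1.00, "F", "IF", "GIA", 13913),
    (0.31, "F", "VS1", "GIA", 1427),
    (0.41, "I", "VVS2", "IGI", 1506),
    (0.20, "D", "VS1", "IGI", 880),
    (1.00, "E", "VS1", "GIA", 10588),
]


def proportions(values):
    """Questions 1 and 2: share of observations in each class."""
    counts = Counter(values)
    return {label: n / len(values) for label, n in counts.items()}


def price_summary(prices):
    """Question 3: extremes, range, mean, sample std. dev., quartiles."""
    q1, q2, q3 = quantiles(prices, n=4, method="inclusive")
    return {
        "min": min(prices), "max": max(prices),
        "range": max(prices) - min(prices),
        "mean": mean(prices), "stdev": stdev(prices),
        "q1": q1, "median": q2, "q3": q3,
    }


def histogram_counts(prices, bin_width):
    """Question 4: frequency per bin, keyed by the bin's lower edge."""
    return dict(sorted(Counter((p // bin_width) * bin_width for p in prices).items()))


def skew_direction(prices):
    """Question 4 rule of thumb: a mean above the median suggests right skew."""
    m, md = mean(prices), median(prices)
    if m > md:
        return "right-skewed"
    if m < md:
        return "left-skewed"
    return "roughly symmetric"


def pearson_r(xs, ys):
    """Question 5: linear correlation to accompany the scatterplot."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


carats = [row[0] for row in ROWS]
colors = [row[1] for row in ROWS]
prices = [row[4] for row in ROWS]

print(proportions(colors))
print(price_summary(prices))
print(histogram_counts(prices, 5000))
print(skew_direction(prices))
print(round(pearson_r(carats, prices), 3))
```

Results from this subset will not match the full data set. The value of the sketch is that each function mirrors one of the summaries the report asks for, so the Excel output can be sanity-checked against it.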
Submission on D2L: Download Project-1 Original Data from D2L, and then rename it
diamond-data-YOURLASTNAME.xlsx before you begin. Submit both the Excel spreadsheet
with all calculations and the written report through the submission box on D2L. The
due date is September 25, 2022.