Project 51
Project description

Use linear regression to find the best method you can for predicting the mass of an M&M from its diameter. The M&M may be any color and any type.
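As a starting point, here is a minimal sketch of a simple least-squares fit of mass on diameter in plain Python. The function names fit_line and predict_mass are illustrative choices, not part of the project, and the numbers in the usage lines are made-up placeholders rather than measurements from the data set.

```python
from statistics import mean


def fit_line(diameters, masses):
    """Ordinary least squares fit of mass = intercept + slope * diameter."""
    d_bar = mean(diameters)
    m_bar = mean(masses)
    # Sum of squared deviations of the predictor.
    sxx = sum((d - d_bar) ** 2 for d in diameters)
    # Sum of cross-deviations between predictor and response.
    sxy = sum((d - d_bar) * (m - m_bar) for d, m in zip(diameters, masses))
    slope = sxy / sxx
    intercept = m_bar - slope * d_bar
    return intercept, slope


def predict_mass(diameter, intercept, slope):
    """Predicted mass in grams for a diameter in millimeters."""
    return intercept + slope * diameter


# Placeholder values for illustration only (NOT taken from the data set).
example_diameters = [10.0, 12.0, 14.0]
example_masses = [1.0, 2.0, 3.0]
intercept, slope = fit_line(example_diameters, example_masses)
predicted = predict_mass(13.0, intercept, slope)
```

A single line fitted to all candies at once is only one option. Because the data include three types of M&M, it may be worth comparing it against separate fits per type or a model that uses type as an additional predictor.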

Background on the data set

This data set was collected in the summer of 2008. Every candy from three Medium Size bags of M&Ms was measured: one bag of plain M&Ms (14.0 oz., or 396.9 g), one bag of peanut M&Ms (also 14.0 oz., or 396.9 g), and one bag of peanut butter M&Ms (12.7 oz., or 360 g). As summarized in the table below, the data set has four variables: type, color, diameter, and mass.

The variable diameter refers to the shortest side-to-side distance across the candy, measured at the height where the candy is widest, when it is placed flat on a table with the "m" facing up. Put another way, when the candy is in that position, imagine taking horizontal cross-sections of it. They will be roughly elliptical. The diameter of the candy is the length of the minor axis of the largest such cross-sectional ellipse (which will generally be the cross-section at half the total height). As you might expect, this axis can be somewhat difficult to determine and was no doubt a source of measurement error, but this definition of diameter does correspond fairly well to the way that an M&M fits into a caliper.

Diameters were measured with a General Tools Ultratech Fraction+ Digital Fractional Caliper (claimed accurate to within plus or minus 0.02 mm), and masses were measured with a MyWeigh Durascale 50 (claimed accurate to within plus or minus 0.01 g). The candies were measured in the order given in the data set, which, although not entirely random, was not intentionally systematic in any way (other than by type).

Variables in the data set
The variables in the data set are as follows:
Name      Units or values                             Description
type      peanut, peanut butter, or plain             type of M&M
color     blue, brown, green, orange, red, or yellow  color of the candy
diameter  millimeters                                 diameter of the candy
mass      grams                                       mass of the candy
Link to the data set
The full data set, in CSV format, is at:
http://hoard.projectivespace.com/datasets/mms.csv
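One way to read the file is sketched below using only the Python standard library. It assumes that the CSV has a header row whose column names match the variable names in the table above (type, color, diameter, mass) and that the file is UTF-8 encoded; neither has been checked against the actual file. The helper names parse_mms, load_mms, and group_by_type are hypothetical.

```python
import csv
import io
import urllib.request

DATA_URL = "http://hoard.projectivespace.com/datasets/mms.csv"


def parse_mms(csv_text):
    """Parse CSV text into a list of dicts with numeric diameter and mass.

    Assumes a header row with the columns type, color, diameter, mass.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "type": row["type"].strip(),
            "color": row["color"].strip(),
            "diameter": float(row["diameter"]),  # millimeters
            "mass": float(row["mass"]),          # grams
        })
    return rows


def load_mms(url=DATA_URL):
    """Download the data set and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_mms(response.read().decode("utf-8"))


def group_by_type(rows):
    """Group parsed rows by the value of their type column."""
    groups = {}
    for row in rows:
        groups.setdefault(row["type"], []).append(row)
    return groups
```

Calling group_by_type(load_mms()) would then give one list of candies per type, which is convenient for comparing the diameter and mass relationship across plain, peanut, and peanut butter M&Ms.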