Project 72
Project description

  1. Suppose you would like to find a confidence interval for the proportion of students in introductory statistics at the University of Puget Sound in the Fall 2002 semester having a high school GPA above 3.5. However, you only have the resources to survey 10 students out of all the sections. To simulate taking a simple random sample from among all Fall 2002 introductory statistics students, suppose you had collected high school GPAs from only the students numbered 24, 51, 53, 80, 158, 160, 163, 181, 188, and 209 in the data set. (Those viewing the data in a spreadsheet will probably see those students on line numbers that are 1 larger than the student numbers because of the column headers.) These numbers were pseudo-randomly generated with a computer program. Pretend for the moment that you do not know the high school GPAs of the rest of the students. Using only the data from these students, who are essentially a simple random sample of the introductory statistics students that semester, find a 95% confidence interval for the proportion of University of Puget Sound Fall 2002 introductory statistics students having a high school GPA greater than 3.5.
  2. Now compute the actual proportion of University of Puget Sound Fall 2002 introductory statistics students having a high school GPA greater than 3.5, excluding students who did not answer the survey. Is this proportion consistent with the confidence interval you obtained by sampling?
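
The arithmetic behind both parts can be sketched as follows. The project does not say which interval method to use, so this is only one reasonable reading: it assumes the large-sample (Wald) formula, p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), and shows the Wilson score interval alongside it because the Wald formula can be unreliable with a sample of only 10. The function names are illustrative and do not come from the course materials.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def count_above(gpas, threshold=3.5):
    """Return (number strictly above threshold, number of non-missing values).

    Missing answers are assumed to be represented as None and are excluded.
    """
    answered = [g for g in gpas if g is not None]
    return sum(g > threshold for g in answered), len(answered)


def wald_interval(successes, n, z=Z_95):
    """Large-sample (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval, which tends to behave better for small n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half
```

For part 1, pass the 10 sampled GPAs to `count_above` and feed the resulting counts to one of the interval functions. For part 2, apply `count_above` to every Fall 2002 student, divide the first count by the second, and check whether that proportion lies between the interval's endpoints.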

Background on the data set

Each semester in all of the Introductory Statistics sections at the University of Puget Sound, a survey is given to the students during the first week of class. The survey is voluntary and is used as an example data set throughout the class.

The data set given here is a compilation of much of the data collected from the Fall 2002 semester through the Spring 2008 semester. Values that have been determined to be incorrect (such as 8-foot-tall students) have been removed from this data set.

Variables in the data set
The variables in the data set are as follows:
Name | Units | Description
semester | 1=Fall 2002, 2=Spring 2003, 3=Fall 2003, ... , 12=Spring 2008 | course semester
gender | F=female, M=male | student gender
collegeYear | 1=first-year student (freshman), 2=second-year student (sophomore), etc. | year number in college
height | inches | height of student
weight | pounds | student weight
pulse | beats per minute | student pulse at time of survey
hsGPA | traditional 4-point grade scale points | student grade point average in high school
collegeGPA | traditional 4-point grade scale points | student grade point average to date in college
SATM | SAT Mathematics points | student SAT Mathematics score (200-800 range)
SATV | SAT Verbal points | student SAT Verbal score (200-800 range)
shoeSize | US shoe size units | student shoe size
financialAid | N=no, Y=yes | is the student on financial aid for college?
tvHours | hours per week | average number of hours per week spent watching television during the school year
states | states | number of US states the student has been to
siblings | siblings | number of siblings the student has
motherAge | years | age of the student's mother
fatherAge | years | age of the student's father
salary | US dollars | annual salary that the student realistically expects to earn upon graduation

Link to the data set
The full data set in csv format is at:
http://hoard.projectivespace.com/datasets/introStats2002-2008.csv
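
Once the file has been downloaded, the high school GPAs can be pulled out with a few lines of code. The sketch below rests on several assumptions that are not confirmed by this page: that the csv file's header row uses the variable names listed above (`semester`, `hsGPA`), that unanswered questions appear as blank or non-numeric entries, and that student number k is the k-th data row after the header. The function names and the local file path are hypothetical.

```python
import csv
import math


def parse_gpa(text):
    """Return the GPA as a float, or None when the entry is blank or not numeric."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def load_rows(path):
    """Read the csv file into a list of dicts keyed by the header names."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def hs_gpas_for_students(rows, student_numbers):
    """High school GPAs for the given student numbers.

    Assumes student k is the k-th data row (1-based), i.e. rows[k - 1].
    """
    return [parse_gpa(rows[k - 1]["hsGPA"]) for k in student_numbers]


def hs_gpas_for_semester(rows, semester=1):
    """High school GPAs for every student in one semester (1 = Fall 2002)."""
    wanted = str(semester)
    return [parse_gpa(r["hsGPA"]) for r in rows if r["semester"].strip() == wanted]
```

With a local copy saved as, say, `introStats2002-2008.csv`, calling `rows = load_rows("introStats2002-2008.csv")` followed by `hs_gpas_for_students(rows, [24, 51, 53, 80, 158, 160, 163, 181, 188, 209])` would give the sample for part 1, and `hs_gpas_for_semester(rows, 1)` would give the full Fall 2002 group for part 2. It is worth checking the first few rows by eye to confirm the assumptions about column names and missing values.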