Project 19
Project description

In this project, use the model that you set up in Project 9.

  1. Test the hypothesis that the mean population of all cities and towns having zero area equals $0$. What does the result of this hypothesis test tell you about the suitability of the model for cities and towns with areas near zero, given what you know about the data?
  2. Explore the data graphically to try to understand why simple linear regression is producing such seemingly odd predictions for cities and towns with very small areas. Explain what you find.
  3. What happens if you develop a model using only cities and towns with small areas, throwing all those with larger areas out of the analysis entirely? (First, do some exploring to decide for yourself how small small is.) Does this produce an improved model that is more useful for predictions for cities and towns with small areas?
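As a starting point for items 1 and 3, the intercept test and the small-area refit might be sketched as follows. This is a minimal sketch, not the Project 9 model itself. It assumes that model is the simple linear regression of population on area, so that the mean population at zero area is the intercept. The names `intercept_test` and `small_area_subset` and the `cutoff` argument are hypothetical. To stay within the standard library, the p-value uses a normal approximation to the t distribution, which is reasonable only when the number of observations is large, as with the full data set. For a small subset, compare the t statistic against a t table with n - 2 degrees of freedom instead.

```python
import math
from statistics import NormalDist, fmean


def intercept_test(areas, populations):
    """Fit population = b0 + b1 * area by least squares and test H0: b0 = 0.

    Assumption: the model is simple linear regression of population on area.
    """
    n = len(areas)
    x_bar = fmean(areas)
    y_bar = fmean(populations)

    # Sums of squares and cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in areas)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(areas, populations))

    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept (mean population at area 0)

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(areas, populations))
    s = math.sqrt(sse / (n - 2))

    # Standard error of the intercept and the t statistic for H0: b0 = 0.
    se_b0 = s * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    t_stat = b0 / se_b0

    # Two-sided p-value via the normal approximation (assumes large n).
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))

    return {"b0": b0, "b1": b1, "se_b0": se_b0, "t": t_stat,
            "df": n - 2, "p_approx": p_approx}


def small_area_subset(areas, populations, cutoff):
    """Keep only the municipalities whose area is at most `cutoff`.

    `cutoff` is hypothetical: choose it yourself after exploring the data.
    """
    pairs = [(x, y) for x, y in zip(areas, populations) if x <= cutoff]
    return [x for x, _ in pairs], [y for _, y in pairs]
```

Calling `intercept_test` on the full data and again on the output of `small_area_subset` gives two sets of estimates to compare. Deciding whether the refit is actually more useful for small areas still requires looking at the residuals and at plots of the data.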

Background on the data set

The data consist of the April 1, 2007 populations and land areas of the 281 cities and towns of Washington State, obtained from the 2007 Washington State Data Book.

There are some "incorporated and unincorporated" areas in Washington state (with a total population of approximately 5.9 million in 2000) not included in this list of cities and towns. Also, the website notes that: "Land area by city was derived from a 1980 survey of cities with annexed territory added. Some of the city provided 1980 land areas are not accurate. Some land areas have been corrected. Others will be corrected."

Data Source: Washington State Data Book website, maintained by the Office of Financial Management for the State of Washington. (Accessed August 31, 2008.)

Variables in the data set
The variables in the data set are as follows:
Variable     Units                 Description
name         (municipality name)   name of Washington municipality
population   people                number of residents
area         square miles          area of municipality
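Reading the three variables into Python could look like the sketch below. It assumes the csv file has a header row whose column names match the variable names listed above (`name`, `population`, `area`) and that the numeric fields contain plain numbers. Neither assumption has been checked against the file, and the function name `load_rows` is hypothetical.

```python
import csv


def load_rows(csv_file):
    """Read municipality records from an open csv file object.

    Assumption: the header row uses the column names
    'name', 'population', and 'area'.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        rows.append({
            "name": record["name"],
            "population": float(record["population"]),  # people
            "area": float(record["area"]),              # square miles
        })
    return rows
```

The function takes an already opened file, for example one opened with `open(path, newline="")`, where `path` is wherever you saved your copy of the data set.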
Link to the data set
The full data set in csv format is at: