Education Deserts

Problem Statement & Objectives:

Where one lives can directly or indirectly affect whether and how people obtain post-secondary education. Students who live in “education deserts” may be limited in their professional development, and thus in the jobs they ultimately pursue. Education deserts are defined as places far from post-secondary educational institutions, where “college opportunities are few and far between”. Understanding the factors associated with education deserts, their geographic extent (e.g. county level, state level), and which groups of people are most vulnerable to living in one (e.g. minorities) can provide insight into geographic trends, indicate which communities are more likely to be affected, and show how local context influences post-secondary choices. In this vein, we took inspiration from previous literature, ran exploratory analyses on both published data and other data we thought were relevant, and produced descriptive figures and statistics.

Questions of interest:

Institution background information data:

  • How were institutions sampled across US regions (e.g., Great Lakes, Southwest)?
  • What is the distribution of urbanization levels for the universities?

Geographic extent of Education Desert:

  • What is the distribution of education deserts at the level of Core-Based Statistical Areas (CBSAs), i.e. Metropolitan and Micropolitan Statistical Areas?
  • What is the distribution of education deserts at the level of commute zones?
  • What is the distribution of education deserts at the county level?
  • How is the education desert ratio distributed?

Additional factors:

  • How do variables associated with GDP vary by geographic area and by education desert status?
  • Do GDP variables predict education deserts?

Integrated analysis:

  • How do urbanization levels vary by education deserts?

Defining Education Desert:

We define an education desert to be an area with two or fewer universities/colleges within a specified region*.
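The definition can be written as a small rule. The following is a minimal sketch, not our original analysis code: the function name, the region labels, and the sample data are hypothetical, and the threshold of 2 follows the definition stated here. Note that the full list of regions has to be passed in separately, because a region with no institutions at all never appears in the institution list.

```python
from collections import Counter

def classify_deserts(institution_regions, all_regions, threshold=2):
    """Return {region: True if it is an education desert}.

    institution_regions: one region label per institution (hypothetical input).
    all_regions: every region under study, including those with no institutions.
    threshold: a region with this many institutions or fewer counts as a desert.
    """
    counts = Counter(institution_regions)  # missing regions count as 0
    return {region: counts[region] <= threshold for region in all_regions}

# Illustrative, made-up data: region "A" has 3 institutions, "B" has 1, "C" has 0.
example = classify_deserts(["A", "A", "A", "B"], ["A", "B", "C"])
print(example)
```

Changing `threshold` makes it easy to check how sensitive the classification is to the cut-off.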

Data sources

  • The Integrated Postsecondary Education Data System (IPEDS) provides information about a majority of US colleges and institutions. The data are obtained through surveys conducted by the US Department of Education’s National Center for Education Statistics and cover roughly 7,000 colleges and universities that participate in federal student aid programs. From IPEDS we took the directory information for these colleges and universities, which gave us the region and urbanization level of each institution.
  • We use colleges and universities data from the Homeland Infrastructure Foundation-Level Data (HIFLD) website. The data cover all 50 states, as well as Puerto Rico and other assorted U.S. territories. They include the specific location of each college and university, which allows us to explore the geographical distribution of education deserts.
  • To study our variables by CBSA or Commute Zone, we downloaded the corresponding polygon features from here and here, respectively. We then merged these polygons with GDP data and university information to map the extent of education deserts for these two region types, and to study the potential influence of GDP on education deserts.
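Combining institution locations with region polygons amounts to counting how many points fall inside each polygon. The sketch below illustrates the idea with a plain ray-casting test. It is an assumption-laden illustration, not the GIS workflow we used: it treats coordinates as planar, ignores holes and multi-part polygons, and all names and sample shapes are hypothetical. A GIS package would normally handle this step.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test. polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_per_region(points, regions):
    """Count points per region; each point is assigned to the first region that contains it."""
    counts = {name: 0 for name in regions}
    for x, y in points:
        for name, polygon in regions.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts

# Made-up example: two adjacent unit squares and four institution locations.
regions = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (0.25, 0.75), (1.5, 0.5), (5, 5)]
counts = count_points_per_region(points, regions)
print(counts)
```

The resulting per-region counts can then be joined to region-level attributes such as GDP using the region identifier as the key.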

How did we visualize/analyze data?

  • To find the regional distribution of the institutions, we used the IPEDS directory information. One of its variables gives the regional classification of each institution. We aggregated this column to get the count of institutions per region and graphed the counts in a bar chart. We then graphed each region in a bar chart divided by education desert classification.
  • To find the urbanization level of the institutions, we again used the IPEDS directory information. One of the directory variables classifies each institution by a locale code (city, suburb, town, or rural), and each locale code has an additional classification (such as large, midsize, small, or fringe). This variable allows us to understand the urbanization level of each institution. To give a broad overview, we first sorted each institution into city, suburb, town, or rural, aggregated the data, and created a pie chart summarizing the broad urbanization level. Then we used the full locale column, aggregated it to get the count at each detailed urbanization level, and graphed the counts in a bar chart.
  • To visualize how education deserts are distributed in the US, we created maps in a geographic information system (GIS) with the HIFLD data, which provide the location of colleges and universities. We then calculated the average education desert ratio of each state.
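The aggregation behind the region and urbanization charts can be sketched as below. This is an illustration only: the field names and sample records are hypothetical, and the mapping assumes two-digit locale codes whose tens digit gives the broad type (1 city, 2 suburb, 3 town, 4 rural), which should be checked against the IPEDS data dictionary before use.

```python
from collections import Counter

# Assumed mapping from the tens digit of a locale code to its broad type.
BROAD_LOCALE = {1: "City", 2: "Suburb", 3: "Town", 4: "Rural"}

def broad_locale(code):
    """Map a two-digit locale code (e.g. 21) to its broad type (e.g. 'Suburb')."""
    return BROAD_LOCALE.get(code // 10, "Unknown")

def count_by(records, key):
    """Count institutions per value of `key`, most common first."""
    return Counter(record[key] for record in records).most_common()

# Made-up records standing in for rows of the directory file.
institutions = [
    {"name": "A", "region": "Southeast", "locale": 11},
    {"name": "B", "region": "Southeast", "locale": 21},
    {"name": "C", "region": "New England", "locale": 21},
    {"name": "D", "region": "Plains", "locale": 41},
]

region_counts = count_by(institutions, "region")   # input for the regional bar chart
locale_counts = count_by(institutions, "locale")   # input for the detailed bar chart
broad_counts = Counter(broad_locale(r["locale"]) for r in institutions)  # pie chart input
print(region_counts, locale_counts, dict(broad_counts))
```

Codes outside the assumed mapping, including negative "not available" values, fall into an "Unknown" bucket rather than being dropped silently.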


Institution background information data:

Based on the figure, we can see that the Southeast region of the US has the most universities. The Mid East, Great Lakes, and Far West also have a large count of institutions.

Bar plot of education deserts by region, estimated from commute zones. Based on the figure, we can see that the Southeast and Plains have the highest proportion of education deserts relative to non-education deserts (Desert=0, NonDesert=1). New England has very few education deserts, and the smallest proportion among these regions.

We compare the number of institutions with the education desert ratio. We expected the number of institutions to be negatively correlated with the ratio. However, the association in some regions runs contrary to this hypothesis. The Southeast has the most institutions, yet over 75% of the states in the Southeast are covered by education deserts. Conversely, New England does not have many institutions, yet its average desert ratio is relatively low. One possible reason is the degree of dispersion: institutions are spread out in the Southeast but clustered in New England.
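One way to put a number on the expected negative association is a Pearson correlation between institution counts and desert ratios across regions. The sketch below is illustrative only: the function is a textbook formula written out by hand, and the input values are made up to show the mechanics, not results from our data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up values: more institutions paired with a lower desert ratio.
institution_counts = [1, 2, 3, 4]
desert_ratios = [8, 6, 4, 2]
r = pearson_r(institution_counts, desert_ratios)
print(r)
```

A value near -1 would support the hypothesis, while a value near 0 or above would be consistent with the dispersion explanation, where the count alone says little about coverage.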

Almost half of the institutions are in cities, and a large majority are found at the city or suburb urbanization level.

This figure breaks the counts down further by detailed urbanization level, including the large, midsize, and small categories within each urbanization type. Large suburbs contain the most institutions, and large cities contain the second largest number.

Geographic extent of Education Deserts:

The geographical distribution of colleges and universities in the US at the county level. Counties with fewer than 2 colleges and universities are shown in red. More of the area covered by education deserts appears in the eastern region.

We calculate the education desert ratio of each state as the proportion of counties in that state defined as education deserts. States in red, e.g. Texas, Montana, and Alabama, have a very high ratio of education deserts (over 80%). States shown in green, e.g. California and Maryland, have a ratio below 20%.
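The state-level ratio described here is the number of desert counties divided by the total number of counties in the state. A minimal sketch follows, assuming each county record already carries a desert flag. The field names and sample records are hypothetical.

```python
from collections import defaultdict

def desert_ratio_by_state(county_records):
    """Return {state: share of its counties flagged as education deserts}."""
    totals = defaultdict(int)
    deserts = defaultdict(int)
    for record in county_records:
        totals[record["state"]] += 1
        if record["is_desert"]:
            deserts[record["state"]] += 1
    return {state: deserts[state] / totals[state] for state in totals}

# Made-up county records for two hypothetical states.
counties = [
    {"state": "S1", "county": "a", "is_desert": True},
    {"state": "S1", "county": "b", "is_desert": True},
    {"state": "S1", "county": "c", "is_desert": False},
    {"state": "S2", "county": "d", "is_desert": False},
]
ratios = desert_ratio_by_state(counties)
print(ratios)
```

Because every county carries equal weight, this ratio does not account for differences in county area or population.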

Brief discussion:

  • We calculated the education desert ratio of each state. We found that the U.S. Virgin Islands, Northern Mariana Islands, and American Samoa are entirely education deserts, while no education deserts appear in the District of Columbia, Delaware, or Guam.
  • We explored the distribution of education deserts at the county level. We found that a large proportion of the Northwest, the Southwest, and Alaska is covered by education deserts.

Future Work:

  1. Define ‘Education Desert’ more realistically. For simplicity, we defined deserts by the number of universities and colleges alone, without considering school type (public, private, community), tuition, or acceptance rate. Those variables do affect access to education and are worth considering.
  2. Analyze whether people from education deserts tend to move back to education deserts after graduating from a postsecondary institution. How might this lead to the reproduction of education deserts?
  3. Analyze the relationship between urbanization levels and education desert locations. At what urbanization level are most education deserts found?

Data Access/Code

Data and analysis for GDP, CBSA and Commute Zones can be found here. Note that not all preliminary data/code are uploaded, but they can be made available by contacting the Education Project group.