The Data

The data this week comes from Data.World and was originally collected by the NCES (National Center for Education Statistics).

The datasets describe different subsets of students: high school, bachelor’s degree, and HBCU. I’m not 100% sure how each group is defined (e.g., are graduate students included in the HBCU group?). But let’s look at the HBCU enrollment data in more detail.

library(tidyverse)  # readr, tidyr, dplyr, ggplot2, and the %>% pipe
hbcu_all <- readr::read_csv("hbcu_all.csv")
hbcu_black <- readr::read_csv("hbcu_black.csv")
names(hbcu_all)
##  [1] "Year"             "Total enrollment" "Males"            "Females"         
##  [5] "4-year"           "2-year"           "Total - Public"   "4-year - Public" 
##  [9] "2-year - Public"  "Total - Private"  "4-year - Private" "2-year - Private"

The main tidying step is to turn the current wide data into a long format with a few new descriptive columns. I looked at the existing columns and thought about which variables we could create, and came up with the following new columns: degree (total, 4-year, or 2-year), funding (all, public, or private), gender (male, female, or combined), and enrollment (the count itself).

I found help in the advanced pivoting section of the tidyr documentation and realized that if I renamed the variables to follow a consistent degree_funding_gender pattern, pivot_longer() could split each name into the new columns directly. Note that most of the columns don’t have any gender information (that is, males and females are combined into one column), so I indicate the combined gender with “X”.
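To see how names_sep splits the renamed columns, here is a minimal sketch on a made-up two-year table (the values are invented for illustration, not taken from the NCES data). Each column name is cut at the underscores into degree, funding, and gender, and the cell values land in enrollment:

```r
library(tibble)
library(tidyr)

# Hypothetical toy table using three of the renamed columns;
# the numbers are made up purely for illustration
toy <- tibble(
  year = c(2000, 2001),
  total_all_X = c(100, 110),
  total_all_M = c(40, 45),
  `4yr_public_X` = c(60, 70)
)

# Every non-year column becomes one row per year; its name is split
# at "_" into the three new columns
toy_long <- pivot_longer(
  toy,
  cols = -year,
  names_to = c("degree", "funding", "gender"),
  names_sep = "_",
  values_to = "enrollment"
)
print(toy_long)
```

Because every renamed column has exactly three underscore-separated parts, no further string cleaning is needed after the pivot.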

# New names follow a degree_funding_gender pattern; "X" marks combined gender
new_names <- c("year", "total_all_X", "total_all_M", "total_all_F",
               "4yr_all_X", "2yr_all_X", "total_public_X", "4yr_public_X", "2yr_public_X",
               "total_private_X", "4yr_private_X", "2yr_private_X")

# Both files have the same column layout, so they get the same names
names(hbcu_all) <- new_names
names(hbcu_black) <- new_names


hbcu <- hbcu_all %>%
  pivot_longer(
    cols = -year,
    names_to = c("degree", "funding", "gender"),
    names_sep = "_",
    values_to = "enrollment"
  ) %>%
  mutate(race = "all")

hbcuB <- hbcu_black %>%
  pivot_longer(
    cols = -year,
    names_to = c("degree", "funding", "gender"),
    names_sep = "_",
    values_to = "enrollment"
  ) %>%
  mutate(race = "Black")


hbcu_full <- rbind(hbcu, hbcuB)
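The two pivots above differ only in the input table and the race label, so one option is to wrap the shared steps in a small function. This is a sketch, not the approach used in the post: pivot_enrollment() is a hypothetical helper name, and the toy tables stand in for hbcu_all and hbcu_black with made-up values.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical helper (not from the original post): pivot one renamed
# enrollment table to long format and tag every row with a race label
pivot_enrollment <- function(df, race_label) {
  long <- pivot_longer(
    df,
    cols = -year,
    names_to = c("degree", "funding", "gender"),
    names_sep = "_",
    values_to = "enrollment"
  )
  mutate(long, race = race_label)
}

# Made-up toy inputs standing in for hbcu_all and hbcu_black
toy_all <- tibble(year = c(2000, 2001), total_all_X = c(100, 110), total_all_F = c(60, 65))
toy_black <- tibble(year = c(2000, 2001), total_all_X = c(80, 85), total_all_F = c(50, 52))

# Stack the two long tables, as rbind() does in the post
toy_full <- bind_rows(
  pivot_enrollment(toy_all, "all"),
  pivot_enrollment(toy_black, "Black")
)
print(toy_full)
```

bind_rows() is dplyr’s counterpart to rbind(); one difference is that it fills a column missing from one table with NA instead of stopping with an error.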

Visualize changes in enrollment

# Install once with: devtools::install_github("ciannabp/inauguration")
library(inauguration)
# Another palette option: inauguration("inauguration_2021_bernie")
hbcu_full %>%
  filter(gender == "X") %>%
  ggplot(aes(x = year, y = enrollment, color = race, 
             group = interaction(degree, funding, race), linetype = funding)) +
  geom_line() +
  facet_wrap(~degree) +
  scale_color_manual(values = inauguration("inauguration_2021")[c(6,1)])

hbcu_full %>%
  filter(gender == "X") %>%
  ggplot(aes(x = year, y = enrollment, color = degree, 
             group = interaction(degree, funding, race), linetype = funding)) +
  geom_line() +
  facet_wrap(~race) +
  scale_color_manual(values = inauguration("inauguration_2021")[c(2,3,5)])
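In both plots, group = interaction(degree, funding, race) states explicitly that each line is one degree × funding × race combination (ggplot2 would otherwise infer groups from the discrete aesthetics). A minimal base-R sketch on made-up rows shows how many groups interaction() creates; the toy grid below is an assumption for illustration, not the HBCU data:

```r
# Made-up long-format rows: 2 degrees x 2 funding types x 2 races, 2 years each
toy <- expand.grid(
  year = c(2000, 2001),
  degree = c("4yr", "2yr"),
  funding = c("public", "private"),
  race = c("all", "Black"),
  stringsAsFactors = FALSE
)

# interaction() builds one factor level per degree/funding/race combination;
# used as the group aesthetic, each level becomes one line
grp <- interaction(toy$degree, toy$funding, toy$race, drop = TRUE)
print(nlevels(grp))
print(table(grp))
```

Each of the 8 groups holds one row per year, which is exactly the set of points geom_line() connects.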

Important Closing

library(praise)
praise()
## [1] "You are remarkable!"