The data this week comes from Data.World and was originally from the National Center for Education Statistics (NCES).
The datasets describe different subsets of students: high school, bachelor’s degree, and HBCU. I’m not 100% sure how each group is defined (e.g., are there graduate students included in the HBCU group?). But let’s look at the HBCU enrollment data in more detail.
library(tidyverse)  # loads readr, tidyr, dplyr, and ggplot2, all used below
hbcu_all <- readr::read_csv("hbcu_all.csv")
hbcu_black <- readr::read_csv("hbcu_black.csv")
names(hbcu_all)
## [1] "Year" "Total enrollment" "Males" "Females"
## [5] "4-year" "2-year" "Total - Public" "4-year - Public"
## [9] "2-year - Public" "Total - Private" "4-year - Private" "2-year - Private"
The main tidying for the dataset is to create a few long columns instead of the current wide data. I looked at the different columns and thought about which variables we could create. I came up with the following list of new columns that I’d like to have: degree (total, 4-year, or 2-year), funding (all, public, or private), and gender (male, female, or combined).
I found some help on advanced pivoting and realized that if I renamed the variables to follow a single pattern (degree, funding, and gender separated by underscores), I could use pivot_longer() quite directly. Note that most of the columns don’t carry any gender information (that is, the genders are combined into one column), so I mark the combined gender with “X”.
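To see why this naming pattern helps, here is a minimal sketch on a toy tibble. The column names follow the degree_funding_gender pattern, but the enrollment numbers are made up for illustration and are not the NCES values. With names_sep = "_", pivot_longer() splits each column name into three new columns.

```r
library(tidyr)
library(tibble)

# Toy data: made-up enrollment numbers, NOT the real NCES values
toy <- tibble(
  year = c(2000, 2001),
  total_all_X = c(100, 110),
  total_all_M = c(40, 45),
  `4yr_public_X` = c(60, 65)
)

# Each column name is split at "_" into degree, funding, and gender
toy_long <- pivot_longer(
  toy,
  cols = -year,
  names_to = c("degree", "funding", "gender"),
  names_sep = "_",
  values_to = "enrollment"
)

toy_long
```

Each of the three value columns contributes one row per year, so the two-year toy table becomes six rows with columns year, degree, funding, gender, and enrollment.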
names(hbcu_all) <- c("year", "total_all_X", "total_all_M", "total_all_F",
                     "4yr_all_X", "2yr_all_X", "total_public_X", "4yr_public_X", "2yr_public_X",
                     "total_private_X", "4yr_private_X", "2yr_private_X")
names(hbcu_black) <- c("year", "total_all_X", "total_all_M", "total_all_F",
                       "4yr_all_X", "2yr_all_X", "total_public_X", "4yr_public_X", "2yr_public_X",
                       "total_private_X", "4yr_private_X", "2yr_private_X")
hbcu <- hbcu_all %>%
  pivot_longer(
    cols = -year,
    names_to = c("degree", "funding", "gender"),
    names_sep = "_",
    values_to = "enrollment"
  ) %>%
  mutate(race = "all")
hbcuB <- hbcu_black %>%
  pivot_longer(
    cols = -year,
    names_to = c("degree", "funding", "gender"),
    names_sep = "_",
    values_to = "enrollment"
  ) %>%
  mutate(race = "Black")
hbcu_full <- rbind(hbcu, hbcuB)
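Since the two tables share the same column names, one alternative (a sketch, not what I ran above) is to stack them first and pivot once. The toy tibbles below are hypothetical stand-ins for hbcu_all and hbcu_black with made-up numbers; bind_rows() with .id records which table each row came from.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-ins for hbcu_all and hbcu_black (made-up numbers)
toy_all <- tibble(year = c(2000, 2001),
                  total_all_X = c(100, 110),
                  `4yr_all_X` = c(80, 90))
toy_black <- tibble(year = c(2000, 2001),
                    total_all_X = c(70, 75),
                    `4yr_all_X` = c(55, 60))

# Stack the tables, tagging each row with its list name, then pivot once
toy_full <- bind_rows(list(all = toy_all, Black = toy_black), .id = "race") %>%
  pivot_longer(
    cols = -c(race, year),
    names_to = c("degree", "funding", "gender"),
    names_sep = "_",
    values_to = "enrollment"
  )

count(toy_full, race)
```

The result has the same columns as hbcu_full, except that race comes first instead of last. The pivot is written only once, so the two halves cannot drift apart.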
#devtools::install_github("ciannabp/inauguration")
library(inauguration)
#inauguration("inauguration_2021_bernie")
hbcu_full %>%
  filter(gender == "X") %>%
  ggplot(aes(x = year, y = enrollment, color = race,
             group = interaction(degree, funding, race), linetype = funding)) +
  geom_line() +
  facet_wrap(~degree) +
  scale_color_manual(values = inauguration("inauguration_2021")[c(6,1)])
hbcu_full %>%
  filter(gender == "X") %>%
  ggplot(aes(x = year, y = enrollment, color = degree,
             group = interaction(degree, funding, race), linetype = funding)) +
  geom_line() +
  facet_wrap(~race) +
  scale_color_manual(values = inauguration("inauguration_2021")[c(2,3,5)])
praise::praise()
## [1] "You are remarkable!"