library(tidyverse) # ggplot, lubridate, dplyr, stringr, readr...
library(praise)
library(tidytext)
library(rvest)

This week we’re exploring the weather prediction of Zurich’s infamous exploding snowman!
The Boeoegg is a snowman effigy made of cotton wool and stuffed with fireworks, created every year for Zurich’s “Sechselaeuten” spring festival. The saying goes that the quicker the Boeoegg’s head explodes, the finer the summer will be.
Thank you to Matt for curating this week’s dataset.
sechselaeuten <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-12-02/sechselaeuten.csv') |>
  filter(year > 1950)

global <- read_csv("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series/globe/land_ocean/tavg/12/0/1923-2025/data.csv",
                   skip = 3) |>
  mutate(year = as.numeric(str_sub(Date, 1, 4)),
         month = as.numeric(str_sub(Date, 5, 6)))

Using data from NOAA, we can find the global temperature anomaly for each month: the difference, in degrees Celsius, from the 1901-2000 average. The variable temp is the mean of that anomaly over July, August, and September, computed for each year and then matched to the years in the Sechselaeuten dataset.
summer <- global |>
  filter(month %in% c(7, 8, 9)) |>
  group_by(year) |>
  summarize(temp = mean(Anomaly))

global_snowman <- sechselaeuten |>
  inner_join(summer, by = "year")

The lore suggests that the time to explosion predicts the weather in the following summer, but it also seems plausible that the temperature predicts the time to explosion. The relationship is not strong enough to make a good predictive model, but the variables do seem to be correlated. (I think the average temp is averaged over all months in the prior year, but I'm not certain.)
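To make the summer average and the join above easier to follow, here is a minimal sketch of the same steps on a tiny made-up table. The values and the names toy_global, toy_summer, toy_snowman, and toy_joined are invented for illustration and are not part of the NOAA or Sechselaeuten data; only the column layout (Date as YYYYMM, Anomaly, year, duration) mirrors the code above.

```r
library(dplyr)
library(stringr)

# Made-up monthly anomalies in the NOAA layout (Date as YYYYMM); not real data
toy_global <- tibble(
  Date    = c(200006, 200007, 200008, 200009, 200107, 200108, 200109),
  Anomaly = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70)
)

# Split Date into year and month, keep July-September, average per year
toy_summer <- toy_global |>
  mutate(year  = as.numeric(str_sub(Date, 1, 4)),
         month = as.numeric(str_sub(Date, 5, 6))) |>
  filter(month %in% c(7, 8, 9)) |>
  group_by(year) |>
  summarize(temp = mean(Anomaly))

# Made-up burn times; 1999 has no anomaly rows, so inner_join() drops it
toy_snowman <- tibble(year = c(1999, 2000, 2001), duration = c(12, 17, 23))

toy_joined <- toy_snowman |>
  inner_join(toy_summer, by = "year")

toy_joined
```

The June row is filtered out before averaging, and the 1999 row disappears because inner_join() keeps only years present in both tables. The same thing happens to any Sechselaeuten year without a matching summer anomaly.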
global_snowman |>
  ggplot(aes(y = duration, x = tre200m0, color = year)) +
  geom_text(aes(color = year), label = "snowman", size = 3,
            family = "Font Awesome 5 Free Solid") +
  labs(x = "Average monthly air temp, Celsius",
       y = "Time from ignition until explosion, minutes")

Looking at a few more variable relationships:
global_snowman |>
  ggplot(aes(x = year, y = duration)) +
  geom_point()

global_snowman |>
  ggplot(aes(x = temp, y = duration)) +
  geom_point() +
  labs(x = "")

cor(global_snowman$duration, global_snowman$temp, use = "pairwise.complete.obs")
[1] 0.3841555

cor(global_snowman$duration, global_snowman$year, use = "pairwise.complete.obs")
[1] 0.3792312

cor(global_snowman$duration, global_snowman$tre200m0, use = "pairwise.complete.obs")
[1] 0.2611826

praise()
[1] "You are stylish!"
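One rough way to read the correlations above: for a simple linear fit of duration on a single variable, the squared Pearson correlation equals the share of variance in duration that the fit explains. The sketch below just squares the three values printed above; the vector names r and r_squared are mine, and this assumes a one-predictor linear relationship, which the post has not checked.

```r
# Correlations with duration, copied from the output above
r <- c(temp = 0.3841555, year = 0.3792312, tre200m0 = 0.2611826)

# Squared correlation = R^2 of a one-predictor linear fit
r_squared <- round(r^2, 3)
r_squared
```

That works out to roughly 15% for temp, 14% for year, and 7% for tre200m0, which fits the impression that none of these would make a strong predictive model on its own.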