Stack Overflow Annual Developer Survey 2024

Author

Jo Hardin

Published

September 3, 2024

library(tidyverse) # ggplot, lubridate, dplyr, stringr, readr...
library(praise)

Data

This week’s dataset is derived from the 2024 Stack Overflow Annual Developer Survey. Conducted in May 2024, the survey gathered responses from over 65,000 developers across seven key sections.

qname_levels_single_response_crosswalk <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-03/qname_levels_single_response_crosswalk.csv')
stackoverflow_survey_questions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-03/stackoverflow_survey_questions.csv')
stackoverflow_survey_single_response <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-03/stackoverflow_survey_single_response.csv') |>
  mutate(ai_sent2 = case_when(
    ai_sent == 1 ~ "favorable",
    ai_sent == 2 ~ "indifferent",
    ai_sent == 3 ~ "unfavorable",
    ai_sent == 4 ~ "unsure",
    ai_sent == 5 ~ "very favorable",
    ai_sent == 6 ~ "very unfavorable"
  )) |> 
  mutate(ai_sent2 = factor(ai_sent2,
                           levels = c("very unfavorable",
                                     "unfavorable",
                                     "unsure",
                                     "indifferent",
                                     "favorable",
                                     "very favorable")))

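The recoding above takes two steps: integer codes to labels, then labels to an ordered factor. As a hedged alternative, base R's `factor()` can do both at once by listing the codes in display order under `levels` and the matching text under `labels`. The sketch below runs on a made-up vector of codes (an assumption for illustration, not the survey data).

```r
# Toy codes standing in for the survey's ai_sent column (made up for illustration).
ai_sent <- c(1, 5, 6, 4, NA, 3, 2)

# levels = the integer codes in the desired display order;
# labels = the text attached to each code, in the same order.
ai_sent2 <- factor(ai_sent,
                   levels = c(6, 3, 4, 2, 1, 5),
                   labels = c("very unfavorable", "unfavorable", "unsure",
                              "indifferent", "favorable", "very favorable"))

levels(ai_sent2)                    # ordered from least to most favorable
table(ai_sent2, useNA = "ifany")    # missing codes stay missing
```

Either approach yields the same factor; the two-step version in the post is arguably easier to read alongside the survey crosswalk.
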
AI sentiment and number of years programming

How is AI sentiment distributed across the number of years an individual has spent programming?

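stackoverflow_survey_single_response |> 
  ggplot() + 
  # position = "fill" rescales each bar to height 1, so bars show proportions
  geom_bar(aes(fill = ai_sent2, x = years_code),
           position = "fill") +
  labs(fill = "AI sentiment",
       y = "proportion of developers",
       x = "number of years spent programming")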

The proportion of developers with a particular AI sentiment, broken down by the number of years the developer has been coding.

Developers across the world

How much faith do developers across the world have in AI? We recode sentiment onto an ordered 1-5 scale (1 = very unfavorable, 5 = very favorable, with "unsure" treated as missing) and then average it within each country. Thank you to @Sarah Penir for the useful code at https://sarahpenir.github.io/r/making-maps/.
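
To make the rescoring concrete, here is a minimal sketch on a handful of made-up raw codes (an assumption for illustration only). It uses a base R lookup vector, where position i holds the score for raw code i, instead of the `case_when()` call used in the analysis itself.

```r
# Lookup: position i holds the 1-5 score for raw code i.
# Code 4 ("unsure") has no place on the ordered scale, so it maps to NA.
score_lookup <- c(4, 3, 2, NA, 5, 1)

# Toy raw codes (made up for illustration).
raw_codes <- c(5, 1, 4, 6, 2)
scores <- score_lookup[raw_codes]

scores                       # 5 4 NA 1 3
mean(scores, na.rm = TRUE)   # 3.25; the "unsure" response is ignored
```

Because "unsure" becomes `NA`, `na.rm = TRUE` is what keeps those responses out of the country averages.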

survey_country <- stackoverflow_survey_single_response |> 
  mutate(ai_sent = case_when(
    ai_sent == 1 ~ 4,
    ai_sent == 2 ~ 3,
    ai_sent == 3 ~ 2,
    ai_sent == 4 ~ NA_real_, # "unsure" has no position on the ordered scale
    ai_sent == 5 ~ 5,
    ai_sent == 6 ~ 1
  )) |> 
  rename(region = country) |> 
  mutate(region = recode(region, 
                         "United States of America" = "USA",
                         "United Kingdom of Great Britain and Northern Ireland" = "UK",
                         "Republic of Korea" = "South Korea",
                         "Democratic People's Republic of Korea" = "North Korea",
                         "Congo, Republic of the..." = "Republic of Congo",
                         "Russian Federation" = "Russia",
                         "United Republic of Tanzania" = "Tanzania",
                         "Côte d'Ivoire" = "Ivory Coast",
                         "Venezuela, Bolivarian Republic of..." = "Venezuela"
)) |>
  group_by(region) |> 
  summarize(ave_sent = mean(ai_sent, na.rm = TRUE),
            n_devel = n()) 
world <- map_data("world")
full_world <- left_join(world, survey_country, by = "region")
full_world |> 
  ggplot(aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = ave_sent)) + 
  scale_fill_distiller(palette ="RdBu", direction = -1) + 
  coord_fixed(1.3) +
  theme_void() + 
  labs(fill = "average sentiment")

AI favorability averaged over country.
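
The join to the map data matches on `region`, so any country whose name is spelled differently in the two tables is left without a fill value. One hedged way to check for this is to compare the two sets of names. The vectors below are hypothetical stand-ins for `survey_country$region` and `unique(world$region)`; the real values may differ.

```r
# Hypothetical name vectors (assumptions for illustration, not the real data).
survey_regions <- c("USA", "UK", "Viet Nam", "Mali")
map_regions    <- c("USA", "UK", "Vietnam", "Mali", "Chad")

# Survey names with no counterpart in the map data:
# these countries' averages would never reach the map.
unmatched <- setdiff(survey_regions, map_regions)
unmatched
```

On the real tables, `dplyr::anti_join(survey_country, world, by = "region")` would list the unmatched survey rows in the same spirit, and each one could then be added to the `recode()` call.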

Note that in Mali, the average AI favorability is extremely high. One might suspect that such an extreme average reflects a very small sample. Below is a map showing how many developers were surveyed in each country. Indeed, only two developers from Mali filled out the survey (n=2).

full_world |> 
  ggplot(aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = log(n_devel, 10))) + 
  scale_fill_distiller(palette ="RdBu", direction = -1) + 
  coord_fixed(1.3) +
  theme_void() + 
  labs(fill = "number of developers\n log10 scale")

Number of developers per country on log10 scale.
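
Since averages from a handful of respondents can dominate the color scale, one option is to blank out countries below a minimum sample size before mapping. The sketch below is only an illustration: the data frame, its values, and the `min_n` threshold are all made up, with column names chosen to mirror `survey_country`.

```r
# Toy summary table mirroring survey_country (all values made up).
survey_country_toy <- data.frame(
  region   = c("A", "B", "C"),
  ave_sent = c(5.0, 3.5, 3.2),
  n_devel  = c(2, 850, 40)
)

# Hypothetical threshold: hide averages based on fewer than 10 respondents.
min_n <- 10

survey_country_toy$ave_sent_masked <- ifelse(
  survey_country_toy$n_devel >= min_n,
  survey_country_toy$ave_sent,
  NA
)

survey_country_toy
```

Mapping `ave_sent_masked` instead of `ave_sent` would leave low-count countries in the default missing-value color.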

praise()
[1] "You are slick!"