Chapter 3 Functions

This chapter explains the functions used throughout our research.

3.1 Data Uploading Functions

The functions here clean each dataset so that a ready-to-use version is uploaded to the database as a data table.

convertLogicalToInt
input: dataset
output: dataset with logical values set to binary
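
As an illustration of the conversion described above, here is a minimal base R sketch. It is not the project's actual convertLogicalToInt; the name convert_logical_to_int_sketch is hypothetical, and it assumes that "binary" means the integers 0 and 1, with missing values left as NA.

```r
# Hypothetical sketch: convert every logical column of a data frame to 0/1 integers.
convert_logical_to_int_sketch <- function(df) {
  # Find the columns that hold logical (TRUE/FALSE) values
  is_lgl <- vapply(df, is.logical, logical(1))
  # Convert only those columns; all other columns are left untouched
  for (col in names(df)[is_lgl]) {
    df[[col]] <- as.integer(df[[col]])
  }
  df
}

# Example with made-up data
toy <- data.frame(
  search_conducted = c(TRUE, FALSE, NA),
  city = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
converted <- convert_logical_to_int_sketch(toy)
```

Storing the flags as integers avoids relying on how a particular database driver maps R logical values to column types.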

getDFName
input: name of dataset (this could also be the name of a link from the web scrape)
output: clean dataset name that includes state and city
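
The sketch below shows one way such a name could be cleaned. It is only an illustration: the function name get_df_name_sketch, the example URL, and the file-name layout `<id>_<state>_<city>_<yyyy>_<mm>_<dd>.rds` are all assumptions, not details taken from the project's getDFName.

```r
# Hypothetical sketch: derive "<state>_<city>" from a link or file name.
# ASSUMPTION: file names look like <id>_<state>_<city>_<yyyy>_<mm>_<dd>.rds
get_df_name_sketch <- function(link) {
  # Drop the directory part and the file extension
  name <- tools::file_path_sans_ext(basename(link))
  parts <- strsplit(name, "_", fixed = TRUE)[[1]]
  n <- length(parts)
  # Need at least: id, state, one city token, and three date tokens
  stopifnot(n >= 6)
  # Drop the leading id and the trailing year, month, and day
  paste(parts[2:(n - 3)], collapse = "_")
}

# Made-up link used purely for illustration
clean_name <- get_df_name_sketch("https://example.org/files/abc123_ca_san_diego_2020_04_01.rds")
```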

uploadLinksToDatabase
input: link from the web scrape (string)
output: 0, indicating that the code finished executing
This function takes an RDS link and writes the dataset to the database.

3.2 Veil of Darkness Functions

Veil of Darkness functions work with lutz and lubridate to determine whether a stop took place in the dark.

The first step in getting the sunset and sunrise times is to get the coordinates for the city. To do this, we run a Google search for the city name and scrape the search results.

Two functions perform this process. get_cityNames takes a data table's name and cleans it so that the Google search returns the page that has the coordinates.

get_coordinates takes the cleaned city names and scrapes the Google search results. The function returns a vector of doubles: the first element is the latitude and the second is the longitude.
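
The scraping step itself depends on a live web page, so the sketch below covers only the last part: turning a scraped coordinate string into the vector of doubles described above. The function name parse_coordinates_sketch and the assumed text format (for example "37.8044° N, 122.2712° W") are illustrative assumptions, not the project's get_coordinates.

```r
# Hypothetical sketch: parse a string such as "37.8044° N, 122.2712° W"
# into c(latitude, longitude). ASSUMPTION: the scraped text has this shape.
parse_coordinates_sketch <- function(text) {
  pattern <- "([0-9]+\\.?[0-9]*)[^0-9NS]*([NS])[^0-9]*([0-9]+\\.?[0-9]*)[^0-9EW]*([EW])"
  m <- regmatches(text, regexec(pattern, text))[[1]]
  # No match: return missing values rather than failing
  if (length(m) == 0) {
    return(c(NA_real_, NA_real_))
  }
  # Southern latitudes and western longitudes are negative
  lat_sign <- if (m[3] == "S") -1 else 1
  lng_sign <- if (m[5] == "W") -1 else 1
  c(lat_sign * as.numeric(m[2]), lng_sign * as.numeric(m[4]))
}

# Made-up input string for illustration
coords <- parse_coordinates_sketch("37.8044\u00b0 N, 122.2712\u00b0 W")
```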

Next, we define two helper functions that are used in getting the times; they call the lutz package and retrieve the times.

oursunriseset
input: latitude (dbl), longitude (dbl), date (Date or POSIX), timezone (tz), direction (string)
output: date-time of the desired sun direction

time_to_minute is the helper function the Stanford Open Policing Project uses in its tutorial to convert times into numeric values that are easier to manipulate. This is useful when filtering out stops between sunset and dusk to remove ambiguity in the inter-twilight zone.
input: time (character)
output: minutes (double)
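
To make the conversion concrete, here is a base R sketch that maps "HH:MM:SS" strings to minutes since midnight. It is a stand-in written for illustration, not the tutorial's own implementation; the name time_to_minute_sketch and the assumption that times arrive as "HH:MM" or "HH:MM:SS" strings are ours.

```r
# Hypothetical sketch: convert "HH:MM" or "HH:MM:SS" strings to minutes since midnight.
# Seconds, if present, are ignored.
time_to_minute_sketch <- function(time) {
  parts <- strsplit(time, ":", fixed = TRUE)
  vapply(
    parts,
    function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
    numeric(1)
  )
}

# Example: midnight and a quarter to seven in the evening
minutes <- time_to_minute_sketch(c("00:00:00", "18:45:00"))
```

Working in minutes lets comparisons such as "after sunset but before dusk" be written as plain numeric inequalities.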

The add_night_day function uses the oursunriseset function to add (via mutate) the sunrise and sunset times and a binary variable indicating whether the stop happened at night or during the day. In addition, this function removes the inter-twilight zone, i.e., stops between sunset and dusk and stops between dawn and sunrise.
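
The filtering and flagging logic can be sketched in base R as follows. This is an illustration under stated assumptions rather than the project's add_night_day: the function name and the column names (stop_minute, dawn_minute, sunrise_minute, sunset_minute, dusk_minute) are hypothetical, and the sun times are assumed to be already computed and expressed as minutes since midnight.

```r
# Hypothetical sketch: drop ambiguous inter-twilight stops and flag darkness.
# ASSUMPTION: all *_minute columns are minutes since midnight in local time.
flag_night_day_sketch <- function(stops) {
  m <- stops$stop_minute
  # Ambiguous lighting: between sunset and dusk, or between dawn and sunrise
  ambiguous <- (m > stops$sunset_minute & m < stops$dusk_minute) |
    (m > stops$dawn_minute & m < stops$sunrise_minute)
  stops <- stops[!ambiguous, , drop = FALSE]
  # Dark means at or after dusk, or at or before dawn
  stops$is_dark <- as.integer(
    stops$stop_minute >= stops$dusk_minute |
      stops$stop_minute <= stops$dawn_minute
  )
  stops
}

# Made-up stops sharing one set of sun times
toy_stops <- data.frame(
  stop_minute = c(300, 340, 720, 1100, 1200),
  dawn_minute = 330, sunrise_minute = 360,
  sunset_minute = 1090, dusk_minute = 1120
)
flagged <- flag_night_day_sketch(toy_stops)
```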

3.3 Nationwide Functions

These functions are used when analyzing multiple data tables from the database.

relevant_datasets
input:
- all dataset names, which can be acquired through the SHOW TABLES SQL command
- a vector of strings naming the variables we wish to examine
output: a character vector of dataset names
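
The selection step can be sketched without a database connection by standing in a named list of column names for the tables. Everything here is hypothetical: the function name relevant_datasets_sketch, the list-based input, and the table and column names are assumptions chosen for illustration; the project's function works from the database itself.

```r
# Hypothetical sketch: keep the tables that contain every required variable.
# ASSUMPTION: table_columns is a named list, one character vector of
# column names per table, standing in for a lookup against the database.
relevant_datasets_sketch <- function(table_columns, required_vars) {
  keep <- vapply(
    table_columns,
    function(cols) all(required_vars %in% cols),
    logical(1)
  )
  names(table_columns)[keep]
}

# Made-up tables and columns
toy_tables <- list(
  ca_oakland = c("date", "time", "subject_age", "subject_race"),
  tx_austin = c("date", "subject_race"),
  wa_seattle = c("date", "time", "subject_age", "subject_race", "search_conducted")
)
selected <- relevant_datasets_sketch(toy_tables, c("subject_age", "subject_race"))
```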

query_data
input: name (character)
output: dataframe
Note: the command variable must be modified to select the variables you are examining.
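
One way to avoid editing the command by hand is to build it from a vector of variable names. The sketch below only constructs the SQL string; the function name build_query_sketch and the table and column names are hypothetical, and sending the string to the database is left out because the connection details are not given here.

```r
# Hypothetical sketch: build the SELECT command for one table.
# ASSUMPTION: table and variable names are trusted identifiers,
# not user input, so they are pasted in without quoting.
build_query_sketch <- function(name, vars) {
  sprintf("SELECT %s FROM %s", paste(vars, collapse = ", "), name)
}

# Made-up table and variables
command <- build_query_sketch("ca_oakland", c("subject_age", "subject_race"))
```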

fix_ages converts any age values to the dbl data type.
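
A minimal base R sketch of this coercion is shown below. The function name fix_ages_sketch and the default column name subject_age are assumptions made for illustration; values that cannot be read as numbers become NA.

```r
# Hypothetical sketch: coerce an age column to double.
# ASSUMPTION: the age column is called "subject_age" unless told otherwise.
fix_ages_sketch <- function(df, age_col = "subject_age") {
  # as.character() first, so factor columns convert by label, not by level code
  df[[age_col]] <- suppressWarnings(as.numeric(as.character(df[[age_col]])))
  df
}

# Made-up data with one unparseable value
toy_ages <- data.frame(subject_age = c("25", "40", "unknown"), stringsAsFactors = FALSE)
fixed <- fix_ages_sketch(toy_ages)
```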

logistic_regression
input:
- city dataset (dataframe)
- name of city (character)
output: a dataframe whose columns are the coefficients. Only one row of the coefficient matrix is returned; this function is intended to be used in a for loop or with mapply so that the logistic regression is run on every dataset.
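
The one-row-per-city pattern can be sketched with glm from base R. This is an illustration, not the project's model: the function name logistic_regression_sketch, the formula (search_conducted regressed on subject_age), and the toy data are all assumptions.

```r
# Hypothetical sketch: fit a logistic regression for one city and
# return its coefficients as a single-row data frame.
# ASSUMPTION: the outcome is search_conducted (0/1) and the only
# predictor is subject_age; the real model may differ.
logistic_regression_sketch <- function(city_df, city_name) {
  fit <- glm(search_conducted ~ subject_age, data = city_df, family = binomial())
  data.frame(
    city = city_name,
    t(coef(fit)),
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Made-up data: the share of searches falls as age rises
toy_city <- data.frame(
  subject_age = rep(c(20, 30, 40, 50), each = 5),
  search_conducted = c(1, 1, 1, 0, 1,  1, 1, 0, 0, 1,  1, 0, 0, 1, 0,  0, 0, 1, 0, 0)
)
cities <- list(city_a = toy_city, city_b = toy_city)

# Run the regression on every dataset and stack the one-row results
results <- do.call(
  rbind,
  mapply(logistic_regression_sketch, cities, names(cities), SIMPLIFY = FALSE)
)
```

Returning a single row per city keeps the results easy to stack into one table for cross-city comparison.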