Reading in data

The data is read in, and all records containing NA values are dropped. This removes the entries for which weather data is not present.

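library(dplyr); library(caret); library(splines); library(ggplot2)
dat <- read.csv("~/INST377/Food_Inspection_Build/dc2020/dots_data.csv")
dat <- na.omit(dat)  # drop every row with an NA in any column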
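na.omit() removes rows with a missing value in any column, not only in the weather columns. If the intent is to drop only rows without weather data, the check can be restricted with complete.cases(). The sketch below makes two assumptions: that Temperature is representative of the weather fields (it is the only weather column used in the models here), and that the toy data frame and its `notes` column are hypothetical.

```r
# Hypothetical toy data: 'Temperature' stands in for the weather columns and
# 'notes' for an unrelated column that happens to contain a missing value.
toy <- data.frame(
  time_of_day = c(0, 15, 30, 45),
  Temperature = c(40, NA, 42, 43),
  cars        = c(10, 12, 15, 11),
  notes       = c("a", "b", NA, "d")
)

# na.omit() drops every row with an NA in ANY column (rows 2 and 3 here).
all_complete <- na.omit(toy)

# complete.cases() on the weather columns only drops just row 2.
weather_cols <- c("Temperature")
weather_complete <- toy[complete.cases(toy[, weather_cols, drop = FALSE]), ]

nrow(all_complete)
nrow(weather_complete)
```

On dots_data.csv the two approaches give the same rows only if the weather columns are the only ones with missing values.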

Including Plots

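To keep this model simple, we focus on one location, the South Gate, South View. The data is then split into 20% testing and 80% training data with createDataPartition(), which samples at random within quantile groups of time_of_day. The spline knots are placed at quantiles of time_of_day in the training data. This is a simpler set of knots than the final model uses, which keeps the fitted curve smoother. Because the 0 and 1 quantiles coincide with the boundary knots that bs() already places at the ends of the data, they add two redundant basis columns. These columns are the source of the rank-deficiency warnings and the two NA coefficients below.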

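# Keep only the South Gate, South View location
dat_filter <- filter(dat, location == 'South_Gate_South_View')
set.seed(123)
# 80% of rows for training, sampled within quantile groups of time_of_day
training.samples <- dat_filter$time_of_day %>%
    createDataPartition(p = 0.8, list = FALSE)
train.data <- dat_filter[training.samples, ]
test.data <- dat_filter[-training.samples, ]
# Knots at quantiles of time_of_day (0 and 1 equal the boundary knots bs() adds)
knots <- quantile(train.data$time_of_day, p = c(0, 0.25, 0.5, 0.75, 1))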
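Knots at the 0 and 1 quantiles duplicate the boundary knots that bs() places at range(x). That should leave redundant spline columns without changing the fitted curve. The minimal sketch below checks this on simulated times by comparing design-matrix rank and fitted values for both knot choices. The data, the 0 to 1440 minute range, and the response curve are all assumptions.

```r
library(splines)

# Hypothetical stand-in for train.data$time_of_day: 300 random times (minutes).
set.seed(123)
time_of_day <- runif(300, min = 0, max = 1440)
y <- sin(2 * pi * time_of_day / 1440) + rnorm(300, sd = 0.1)

# Knots as in the model above: the 0 and 1 quantiles equal range(time_of_day),
# which bs() already uses as its boundary knots.
knots_all <- quantile(time_of_day, p = c(0, 0.25, 0.5, 0.75, 1))
# Interior quantiles only.
knots_int <- quantile(time_of_day, p = c(0.25, 0.5, 0.75))

# Design matrices with an intercept column, as lm() builds them.
X_all <- cbind(1, bs(time_of_day, knots = knots_all))
X_int <- cbind(1, bs(time_of_day, knots = knots_int))
c(columns = ncol(X_all), rank = qr(X_all)$rank)
c(columns = ncol(X_int), rank = qr(X_int)$rank)

# Both fits span the same spline space, so their fitted values should agree;
# only the boundary-knot version reports NA (aliased) coefficients.
fit_all <- lm(y ~ bs(time_of_day, knots = knots_all))
fit_int <- lm(y ~ bs(time_of_day, knots = knots_int))
sum(is.na(coef(fit_all)))
all.equal(unname(fitted(fit_all)), unname(fitted(fit_int)))
```

If the same holds for dots_data.csv, using p = c(0.25, 0.5, 0.75) should leave predictions unchanged while removing the warning and the NA rows in summary().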

Model of Cars

The model uses spline regression, a cubic B-spline in time_of_day, to follow the daily curve of the data, while Temperature enters as a linear term.

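# Cubic B-spline in time_of_day plus a linear Temperature term
cmodel <- lm(cars ~ bs(time_of_day, knots = knots) + Temperature, data = train.data)
# Training data with a separate df = 5 spline smooth for display (not cmodel itself)
ggplot(train.data, aes(time_of_day, cars)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ splines::bs(x, df = 5))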
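Temperature enters this model linearly. At any fixed time of day, raising Temperature by one degree therefore shifts the prediction by exactly its coefficient, while the spline handles the shape of the daily curve. The sketch below uses simulated data with a known temperature effect of 2 per degree. The hours, temperature range, and curve are assumptions, not values from dots_data.csv.

```r
library(splines)

# Simulated data (hypothetical): a daily curve plus 2 cars per degree.
set.seed(42)
n <- 500
sim <- data.frame(
  time_of_day = runif(n, 0, 24),   # hours
  Temperature = runif(n, 30, 80)   # degrees Fahrenheit
)
sim$cars <- 200 + 150 * sin(pi * sim$time_of_day / 12) +
  2 * sim$Temperature + rnorm(n, sd = 20)

knots <- quantile(sim$time_of_day, p = c(0.25, 0.5, 0.75))
fit <- lm(cars ~ bs(time_of_day, knots = knots) + Temperature, data = sim)

# Same time of day, temperatures one degree apart.
newdat <- data.frame(time_of_day = c(8, 8), Temperature = c(60, 61))
pred <- predict(fit, newdata = newdat)
diff(pred)                      # equals the Temperature coefficient
coef(fit)[["Temperature"]]      # close to the simulated value of 2
```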

predictions <- cmodel %>% predict(test.data)

## Warning in predict.lm(., test.data): prediction from a rank-deficient fit
## may be misleading

data.frame(RMSE = RMSE(predictions, test.data$cars),
           R2 = R2(predictions, test.data$cars))

##       RMSE       R2
## 1 136.4363 0.830443
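The test metrics come from caret's RMSE() and R2(). The sketch below gives the equivalent base-R calculations, assuming caret's default formula = "corr" for R2() (worth confirming against the installed version). The observed and predicted values are made up.

```r
# Root mean squared error of predictions against observations.
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
# Squared Pearson correlation, matching R2(..., formula = "corr").
r2_corr <- function(pred, obs) cor(pred, obs)^2

# Toy example with made-up counts.
obs  <- c(100, 250, 400, 320)
pred <- c(120, 230, 390, 300)
rmse(pred, obs)
r2_corr(pred, obs)

# Shifting every prediction by a constant leaves the correlation-based R^2
# unchanged but increases RMSE.
rmse(pred + 50, obs)
r2_corr(pred + 50, obs)
```

Because the correlation-based R^2 ignores a constant offset, it can stay high when predictions are systematically biased. This is one reason to read it alongside RMSE.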

summary(cmodel)

## 
## Call:
## lm(formula = cars ~ bs(time_of_day, knots = knots) + Temperature, 
##     data = train.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -546.52  -75.94   -3.26   82.51  489.27 
## 
## Coefficients: (2 not defined because of singularities)
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      239.4151    34.8120   6.877 1.03e-11 ***
## bs(time_of_day, knots = knots)1  -73.1228    37.1033  -1.971   0.0490 *  
## bs(time_of_day, knots = knots)2  -25.3999    40.2329  -0.631   0.5280    
## bs(time_of_day, knots = knots)3 -363.8199    36.5769  -9.947  < 2e-16 ***
## bs(time_of_day, knots = knots)4  -98.0190    39.4937  -2.482   0.0132 *  
## bs(time_of_day, knots = knots)5  686.8060    32.8200  20.926  < 2e-16 ***
## bs(time_of_day, knots = knots)6  630.4445    47.6572  13.229  < 2e-16 ***
## bs(time_of_day, knots = knots)7        NA         NA      NA       NA    
## bs(time_of_day, knots = knots)8        NA         NA      NA       NA    
## Temperature                        0.2050     0.5025   0.408   0.6833    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 144.8 on 1071 degrees of freedom
## Multiple R-squared:  0.8092, Adjusted R-squared:  0.8079 
## F-statistic: 648.9 on 7 and 1071 DF,  p-value: < 2.2e-16

On the test data, the model has an RMSE of 136.4 and an R^2 of 0.830. The Temperature coefficient is 0.205 with a p-value of 0.683, so temperature is not a significant predictor of the number of cars driving through the South Gate, South View.
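The significance statement can be checked from the three numbers in summary(cmodel): the Temperature estimate, its standard error, and the 1071 residual degrees of freedom. The sketch below recomputes the t statistic, two-sided p-value, and a 95% confidence interval from those printed values.

```r
# Values copied from summary(cmodel) above.
est    <- 0.2050   # Temperature estimate
se     <- 0.5025   # its standard error
res_df <- 1071     # residual degrees of freedom

t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), res_df)            # two-sided p-value
ci_95   <- est + c(-1, 1) * qt(0.975, res_df) * se # 95% confidence interval

round(c(t = t_stat, p = p_value), 3)
round(ci_95, 3)
```

The interval runs from below zero to above zero, which is consistent with temperature having no detectable effect on car counts here. On the fitted model, confint(cmodel, "Temperature") should give the same interval directly.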

Model of Pedestrians

The model is built the same way as above, but pedestrian traffic spikes when classes change over during the week, which makes the model less reliable.

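# Same structure as the car model, with pedestrian counts as the response
tmodel <- lm(pedestrians ~ bs(time_of_day, knots = knots) + Temperature, data = train.data)
# Training data with a separate df = 5 spline smooth for display (not tmodel itself)
ggplot(train.data, aes(time_of_day, pedestrians)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ splines::bs(x, df = 5))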

predictions <- tmodel %>% predict(test.data)

## Warning in predict.lm(., test.data): prediction from a rank-deficient fit
## may be misleading

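# Score against observed pedestrian counts (the output below was generated
# with test.data$cars in place of test.data$pedestrians)
data.frame(RMSE = RMSE(predictions, test.data$pedestrians),
           R2 = R2(predictions, test.data$pedestrians))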

##       RMSE        R2
## 1 331.4444 0.7901993

summary(tmodel)

## 
## Call:
## lm(formula = pedestrians ~ bs(time_of_day, knots = knots) + Temperature, 
##     data = train.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -247.11  -36.48   -6.08   19.73  698.71 
## 
## Coefficients: (2 not defined because of singularities)
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -31.1998    23.3243  -1.338    0.181    
## bs(time_of_day, knots = knots)1   32.9545    24.8596   1.326    0.185    
## bs(time_of_day, knots = knots)2  -19.5978    26.9564  -0.727    0.467    
## bs(time_of_day, knots = knots)3   11.7775    24.5068   0.481    0.631    
## bs(time_of_day, knots = knots)4 -132.1387    26.4611  -4.994 6.91e-07 ***
## bs(time_of_day, knots = knots)5  397.1060    21.9897  18.059  < 2e-16 ***
## bs(time_of_day, knots = knots)6  299.0134    31.9308   9.364  < 2e-16 ***
## bs(time_of_day, knots = knots)7        NA         NA      NA       NA    
## bs(time_of_day, knots = knots)8        NA         NA      NA       NA    
## Temperature                        1.4470     0.3367   4.298 1.88e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 97 on 1071 degrees of freedom
## Multiple R-squared:  0.6389, Adjusted R-squared:  0.6365 
## F-statistic: 270.7 on 7 and 1071 DF,  p-value: < 2.2e-16

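On the training data, the pedestrian model explains less of the variance than the car model (R^2 of 0.639 versus 0.809). The test RMSE of 331.4 and R^2 of 0.790 printed above were computed against test.data$cars rather than test.data$pedestrians, so they do not measure how well this model predicts pedestrians. The Temperature coefficient is 1.45 with a p-value of 1.88e-05. At a significance level of 0.01, each additional degree Fahrenheit is therefore associated with about 1.45 more people walking past the South Gate, South View every 15 minutes. Because of the outliers and spikes mentioned above, changing the seed changes these estimates considerably, but the Temperature coefficient tends to stay positive.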
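One way to quantify how much the Temperature coefficient moves with the seed is to refit over many random training splits and summarize the estimates. The minimal sketch below does this on simulated pedestrian counts with occasional large spikes. Every number in it is an assumption, and it uses a simple sample() split rather than caret's stratified createDataPartition(). To apply it to the real data, dat_filter would replace sim.

```r
library(splines)

# Simulated stand-in (hypothetical): daily curve, +1.5 per degree, rare spikes.
set.seed(1)
n <- 1350
sim <- data.frame(
  time_of_day = runif(n, 0, 24),
  Temperature = runif(n, 30, 80)
)
spike <- rbinom(n, 1, 0.05) * rexp(n, rate = 1 / 400)
sim$pedestrians <- 50 + 80 * sin(pi * sim$time_of_day / 12)^2 +
  1.5 * sim$Temperature + spike + rnorm(n, sd = 30)

# For each seed, draw an 80% training split, refit, keep the Temperature estimate.
temp_coef <- sapply(1:50, function(seed) {
  set.seed(seed)
  idx <- sample(n, size = round(0.8 * n))
  train <- sim[idx, ]
  knots <- quantile(train$time_of_day, p = c(0.25, 0.5, 0.75))
  fit <- lm(pedestrians ~ bs(time_of_day, knots = knots) + Temperature,
            data = train)
  coef(fit)[["Temperature"]]
})

summary(temp_coef)     # spread of the estimate across splits
mean(temp_coef > 0)    # share of splits with a positive estimate
```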

Conclusions

As temperature rises, pedestrian counts at the South Gate, South View increase. The estimated effect on car counts is small and not statistically significant. In both cases, the effect of temperature is minute compared to the effect of time of day, so temperature will not be included in the final model.
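Before dropping Temperature, one way to check that its contribution is small relative to time of day is to compare nested models, with and without it. The comparison uses R^2 and an F-test from anova(). The sketch below runs this on simulated data with a weak temperature effect. The values are assumptions, and for the real data train.data would replace sim.

```r
library(splines)

# Simulated data (hypothetical): strong time-of-day curve, weak temperature effect.
set.seed(7)
n <- 1000
sim <- data.frame(time_of_day = runif(n, 0, 24), Temperature = runif(n, 30, 80))
sim$cars <- 300 + 250 * sin(pi * sim$time_of_day / 12) +
  0.2 * sim$Temperature + rnorm(n, sd = 140)

knots <- quantile(sim$time_of_day, p = c(0.25, 0.5, 0.75))
fit_time <- lm(cars ~ bs(time_of_day, knots = knots), data = sim)
fit_both <- update(fit_time, . ~ . + Temperature)

# Variance explained with and without Temperature.
c(time_only = summary(fit_time)$r.squared,
  with_temp = summary(fit_both)$r.squared)

# F-test for adding Temperature to the time-of-day spline.
anova(fit_time, fit_both)
```

A negligible gain in R^2 and a non-significant F-test would support leaving Temperature out. A significant but tiny gain, as with the pedestrian model, is a judgment call about practical importance.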

Effect of Temperature on the Model

Luke Gibson

February 29, 2020
