Today, we will work with daily water temperature and air temperature data observed for 31 rivers in Spain. The goal of this tutorial is to identify the best model for predicting the maximum water temperature given the maximum air temperature. In the preview below, W
represents the daily maximum water temperature and A
represents the daily maximum air temperature. The data contains almost a full year of data for each of the 31 different rivers.
## # A tibble: 6 x 8
## JULIAN_DAY YEAR L W A TIME MONTH DAY
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2003 103 14.2 21.2 1 1 1
## 2 2 2003 103 14.4 16.8 2 1 2
## 3 3 2003 103 14.4 15.4 3 1 3
## 4 4 2003 103 10.9 10.8 4 1 4
## 5 5 2003 103 10.8 11.7 5 1 5
## 6 6 2003 103 10.7 12.4 6 1 6
Please modify the code chunk to visualize the relationship between A
and W
with scatterplot (set alpha=0.3
) and smooth line.
#
Do You Think This Relationship is the Same for All Locations? Please write a function to visualize the relationship between A
and W
with scatterplot (set alpha=0.3
) and smooth line for a specified Location L
.
Test your function to see if it works:
WAPLOT.func(103)
WAPLOT.func(105)
WAPLOT.func(918)
set.seed(216)
TEST.LOCATIONS=sample(x=unique(DATA$L),size=3,replace=F)
TRAIN = anti_join(DATA,tibble(L=TEST.LOCATIONS),by="L")
TEST = semi_join(DATA,tibble(L=TEST.LOCATIONS),by="L")
Please write a function to visualize the relationship between A
and W
with scatterplot (set alpha=0.3
) and smooth line for a specified Dataset.
Test the function:
WAPLOT2.func(TRAIN)
WAPLOT2.func(TEST)
Please complete the code chunk to fit a linear regression model on the training set. (Formula: W~A
)
TRAIN2 = TRAIN %>% add_predictions(linmod,var="linpred")
TEST2 = TEST %>% add_predictions(linmod,var="linpred")
TRAIN3 = TRAIN2 %>% add_residuals(linmod,var="linres")
TEST3 = TEST2 %>% add_residuals(linmod,var="linres")
poly2mod=lm(W~A+I(A^2),data=TRAIN)
poly3mod=lm(W~A+I(A^2)+I(A^3),data=TRAIN)
poly4mod=lm(W~A+I(A^2)+I(A^3)+I(A^4),data=TRAIN)
anova(linmod,poly2mod,poly3mod,poly4mod,test="Chisq")
Please add the predictions of polynomial regression models to the training and testing sets with column names poly2pred
, poly3pred
and poly4pred
.
TRAIN4 =TRAIN3 %>%
COMPLETE
TEST4 =TEST3 %>%
COMPLETE
Please add the residuals of polynomial regression models to the training and testing sets with column names poly2res
, poly3res
and poly4res
.
TRAIN5 =TRAIN4 %>%
COMPLETE
TEST5 =TEST4 %>%
COMPLETE
\[y=d+\frac{a}{1+e^{(b-cx)}}\]
Please write a function to calculate mean squared error for the nonlinear logistic model.
logistic.model=function(COEF,DATA){
pred=COEF[1]+COEF[2]/(1+exp(COEF[3]-COEF[4]*DATA$A))
}
MSE.logistic=function(COEF,DATA){
COMPLETE
return(mse)
}
logistic.mod=optim(
par=c(min(TRAIN$W,na.rm=T),
max(TRAIN$W,na.rm=T)-min(TRAIN$W,na.rm=T),
median(TRAIN$A,na.rm=T),
1), #Smart Starting Values
fn=MSE.logistic, #Function to Minimize
DATA=TRAIN #Required Argument
)
print(logistic.mod)
TRAIN6=TRAIN5 %>% mutate(logpred=COMPLETE,
logres=COMPLETE)
TEST6=TEST5 %>% mutate(logpred=COMPLETE,
logres=COMPLETE)
The function save.image()
in R can be used to save all objects in the global environment. This is very helpful when you want work off your results without rerunning all previous R code. The name of the exported information should contain the file extension .Rdata. These files can be extremely large depending how much RAM was utilized in your R session. The function load()
can be used to import a previous workspace.
For more information on .Rdata file types, see https://fileinfo.com/extension/rdata for help.
save.image("Tutorial.Rdata")
load("PATH_TO_THE_FILE")
bias.func=function(res){
bias=mean(res,na.rm=T)
return(bias)
}
mae.func=function(res){
mae=mean(abs(res),na.rm=T)
return(mae)
}
rmse.func=function(res){
mse=mean(res^2,na.rm=T)
rmse=sqrt(mse)
return(rmse)
}
Please use apply
function to calculate Bias, MAE and rMSE for the models.
ex.res.mat=TEST6 %>% select(linres,poly2res,poly3res,poly4res,logres)
apply(COMPLETE)
apply(COMPLETE)
apply(COMPLETE)