R - Correlation & Regression
Change the scale of the x and y axes (log10):
+ coord_trans(x = "log10", y = "log10"), or
+ scale_y_log10() + scale_x_log10()
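The two options are not identical. With scale_*_log10() the data are transformed before any statistics are computed, so a geom_smooth() line is fitted on the log-log scale. With coord_trans() the statistics are computed on the raw data and only the drawing is transformed. Below is a minimal sketch of the scale version; the data frame `df` is simulated, hypothetical data (log scales need strictly positive values).

```r
library(ggplot2)

# Hypothetical positive, right-skewed data
set.seed(1)
df <- data.frame(x = exp(rnorm(100, mean = 2)))
df$y <- df$x^1.5 * exp(rnorm(100, sd = 0.3))

# Scale transformation: x and y are log10-transformed before geom_smooth()
# computes its fit, so the line is straight on the log-log plot
p_scale <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10() +
  scale_y_log10()
```

Call print(p_scale) to draw it.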
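cor(x, y) calculates the correlation coefficient; the use argument controls how missing values are handled (e.g. use = "pairwise.complete.obs" skips NA pairs instead of returning NA):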
library(dplyr)
library(openintro)  # provides the ncbirths data
ncbirths %>%
  summarize(N = n(), r = cor(weight, weeks, use = "pairwise.complete.obs"))
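A small base-R sketch of why the use argument matters, on hypothetical toy vectors: with the default (use = "everything") a single NA makes the whole result NA, while "pairwise.complete.obs" computes r on the complete pairs only.

```r
# Hypothetical toy vectors with one missing value
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 8, 10)

r_default  <- cor(x, y)                                 # NA: default use = "everything"
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")  # drops the incomplete pair

r_default
r_pairwise
```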
Add the best-fit (least-squares) line to a ggplot, i.e. the line that minimizes the sum of squared residuals:
+ geom_smooth(method = "lm", se = FALSE)
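For simple linear regression the least-squares line has a closed form: slope = r * sd(y) / sd(x) and intercept = mean(y) - slope * mean(x). The sketch below checks this against lm(), using the built-in mtcars data (mpg against wt) purely as a stand-in example.

```r
# Built-in mtcars data: fuel economy (mpg) against car weight (wt)
mod <- lm(mpg ~ wt, data = mtcars)

# Closed-form least-squares estimates for simple linear regression
r         <- cor(mtcars$wt, mtcars$mpg)
slope     <- r * sd(mtcars$mpg) / sd(mtcars$wt)
intercept <- mean(mtcars$mpg) - slope * mean(mtcars$wt)

c(intercept = intercept, slope = slope)
coef(mod)  # the same two numbers
```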
Details of the best-fit line come from fitting the linear model:
mod <- lm(y ~ x, data = df)
Useful functions for the lm object (mod):
coef(mod)
fitted.values(mod)
residuals(mod)
summary(mod)
df.residual(mod)
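A quick sketch of these functions on a concrete model (mtcars is again only a stand-in). Each observed value splits into fitted value plus residual, and with an intercept and one slope the residual degrees of freedom are n - 2.

```r
mod <- lm(mpg ~ wt, data = mtcars)

coef(mod)             # intercept and slope
head(fitted(mod))     # fitted.values() is an alias of fitted()
head(residuals(mod))
df.residual(mod)      # n - 2 = 32 - 2 = 30

# Observed = fitted + residual
all.equal(unname(fitted(mod) + residuals(mod)), mtcars$mpg)
```

summary(mod) adds standard errors, t-tests and R-squared on top of these.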
Make predictions from a model on new data (newdata must be a data frame whose columns have the SAME names as the explanatory variables in the model):
predict(mod, newdata)
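A minimal sketch, assuming a model of mpg on wt from mtcars: the new data frame must have a column named wt, and each prediction is just intercept + slope * wt. The weights in `new_cars` are hypothetical values.

```r
mod <- lm(mpg ~ wt, data = mtcars)

# newdata needs a column called `wt`, matching the model formula
new_cars <- data.frame(wt = c(2.5, 3.5))  # hypothetical weights (1000 lbs)
pred <- predict(mod, newdata = new_cars)

# Same result by hand: intercept + slope * wt
manual <- coef(mod)[1] + coef(mod)[2] * new_cars$wt
pred
```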
broom package: augment(mod) returns the model data as a data frame with extra per-observation columns such as .fitted, .resid, .hat and .cooksd.
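A short sketch of augment() on a stand-in mtcars model, showing the per-observation columns that the leverage and influence notes refer to.

```r
library(broom)

mod <- lm(mpg ~ wt, data = mtcars)
aug <- augment(mod)

# Original variables plus per-observation diagnostics
names(aug)
head(aug[, c("mpg", "wt", ".fitted", ".resid", ".hat", ".cooksd")])
```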
Leverage concept: leverage depends only on how far an observation's explanatory (x) value is from the mean of x. Points close to the center of the x range have low leverage, while those far from it have high leverage. Leverage is the .hat column after augment().
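For simple regression the leverage has an explicit formula, h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2), which makes the "distance from the center" idea concrete. The sketch checks it against base R's hatvalues() on a stand-in mtcars model.

```r
mod <- lm(mpg ~ wt, data = mtcars)
x   <- mtcars$wt
n   <- length(x)

# Leverage in simple regression grows with squared distance from mean(x)
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)
h_stats  <- hatvalues(mod)

all.equal(unname(h_manual), unname(h_stats))

# The highest-leverage car is the one whose weight is farthest from the mean
names(which.max(h_stats))
```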
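Influence = leverage + residual size: how much an individual observation affects the slope of the regression line. A point with both high leverage and a large residual is highly influential. Influence is measured by Cook's distance, the .cooksd column after augment().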
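Cook's distance combines the two ingredients explicitly: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of coefficients and s^2 the residual mean square. A sketch verifying this against base R's cooks.distance(), again on a stand-in mtcars model:

```r
mod <- lm(mpg ~ wt, data = mtcars)

e  <- residuals(mod)
h  <- hatvalues(mod)
p  <- length(coef(mod))             # number of estimated coefficients (2)
s2 <- sum(e^2) / df.residual(mod)   # residual mean square

# Large residual AND high leverage => large Cook's distance
d_manual <- (e^2 / (p * s2)) * h / (1 - h)^2
d_stats  <- cooks.distance(mod)

all.equal(d_manual, d_stats)
head(sort(d_stats, decreasing = TRUE), 3)  # most influential cars
```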