R - Correlation & Regression

 


Change scale of x,y coordinates:

+ coord_trans(x = "log10", y = "log10"), or
+ scale_y_log10() + scale_x_log10()
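A minimal sketch of both options, assuming ggplot2 is installed. The built-in mtcars data and the names p, p_coord and p_scale are placeholders chosen for illustration. The practical difference: scale_*_log10() transforms the data before any statistics (such as a smoother) are computed, while coord_trans() transforms the coordinate system afterwards.

```r
library(ggplot2)

# mtcars ships with R; it stands in for any data frame with positive x and y.
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# Option 1: transform the coordinate system (applied after stats are computed).
p_coord <- p + coord_trans(x = "log10", y = "log10")

# Option 2: transform the scales (data are logged before stats are computed).
p_scale <- p + scale_x_log10() + scale_y_log10()
```

Printing p_coord or p_scale draws the plot.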

cor(x, y) calculates the correlation coefficient; the use argument controls how NA values are handled

ncbirths %>%
  summarize(N = n(), r = cor(weight, weeks, use = "pairwise.complete.obs"))
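To see what the use argument changes, here is a small sketch on made-up data (the tibble df and its values are hypothetical, not from ncbirths). With the default use = "everything", a single missing value makes the result NA; "pairwise.complete.obs" drops the incomplete pair first.

```r
library(dplyr)

# Toy data with one missing y value; the complete pairs lie exactly on y = 2x.
df <- tibble(x = c(1, 2, 3, 4, 5), y = c(2, 4, NA, 8, 10))

res <- df %>%
  summarize(
    N = n(),
    r_default  = cor(x, y),                                # NA because of the missing y
    r_pairwise = cor(x, y, use = "pairwise.complete.obs")  # computed on the 4 complete pairs
  )

res
```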

Add a best-fit line to a ggplot (least squares: the line that minimizes the sum of squared residuals)

geom_smooth(method = "lm", se = FALSE)
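In context, the layer goes on top of a scatterplot. A short sketch, again using mtcars as stand-in data (p_fit is a name chosen for illustration):

```r
library(ggplot2)

# Scatterplot of fuel economy against weight, with the least-squares line on top.
p_fit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # se = FALSE hides the confidence band
```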

Details of the best-fit line:

mod <- lm(y ~ x, data = df)

Useful functions for the lm model object (mod):

coef(mod)
fitted.values(mod)
residuals(mod)
summary(mod)
df.residual(mod)
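A quick sketch of these accessors on a concrete model, assuming mtcars as example data (mpg regressed on wt is an arbitrary choice for illustration):

```r
# Fit a simple linear regression on a built-in data set.
mod <- lm(mpg ~ wt, data = mtcars)

coef(mod)                  # named vector: (Intercept) and the slope for wt
head(fitted.values(mod))   # predicted mpg for the first few cars
head(residuals(mod))       # observed minus fitted
df.residual(mod)           # 32 observations - 2 coefficients = 30
summary(mod)$r.squared     # R^2 pulled out of the summary object

# Fitted values plus residuals recover the observed response.
all.equal(unname(fitted.values(mod) + residuals(mod)), mtcars$mpg)
```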

Make predictions from a model and newdata (a data frame whose variables have the SAME names as the predictors in the model)

predict(mod, newdata)
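For example, with a model fitted on mtcars (an assumed stand-in data set; new_cars and its values are made up), the new data frame must contain a column named wt because that is the predictor in the formula:

```r
mod <- lm(mpg ~ wt, data = mtcars)

# New observations: the column name must match the predictor used in the model.
new_cars <- data.frame(wt = c(2.5, 3.5))

pred <- predict(mod, newdata = new_cars)

# The same numbers by hand: intercept + slope * wt.
by_hand <- coef(mod)[["(Intercept)"]] + coef(mod)[["wt"]] * new_cars$wt

all.equal(unname(pred), by_hand)
```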

broom package: augment(mod) returns a data frame with the data used to fit the model plus per-observation results (.fitted, .resid, .hat, .cooksd, ...)

Leverage concept: leverage depends only on the explanatory variable. Points whose x value is close to the mean of x have low leverage, while those far from it have high leverage. Leverage is the .hat column after augment()

Influence combines leverage and residual size: it describes how much an individual observation affects the slope of the regression line. It is measured by Cook's distance, the .cooksd column after augment()
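A sketch of inspecting leverage and influence, assuming broom and dplyr are installed and using mtcars as example data (aug, top_leverage and top_influence are names chosen for illustration):

```r
library(broom)
library(dplyr)

mod <- lm(mpg ~ wt, data = mtcars)

# One row per observation, with model columns prefixed by a dot.
aug <- augment(mod)

# Observations with the highest leverage (x far from the mean of x).
top_leverage <- aug %>%
  arrange(desc(.hat)) %>%
  select(mpg, wt, .fitted, .resid, .hat, .cooksd) %>%
  head(5)

# Observations with the highest influence (Cook's distance).
top_influence <- aug %>%
  arrange(desc(.cooksd)) %>%
  select(mpg, wt, .fitted, .resid, .hat, .cooksd) %>%
  head(5)

top_leverage
top_influence
```

The two lists need not match: a point can sit far out in x yet lie close to the line, giving it high leverage but little influence.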
