R Programming/Estimation utilities
This page covers methods that are available for most estimation commands. They are useful for all kinds of regression models.
Formulas
Most estimation commands use a formula interface. The outcome is on the left of the ~ and the covariates are on the right.
y ~ x1 + x2
It is easy to include a categorical (multinomial) variable as a predictor in a model. If the variable is not already a factor, one just needs to use the as.factor() function. This will create a set of dummy variables.
y ~ as.factor(x)
For instance, we can use the Star data in the Ecdat package:
library("Ecdat")
data(Star)
summary(lm(tmathssk ~ as.factor(classk), data = Star))
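To see the dummy variables that as.factor() generates, one can inspect the design matrix with model.matrix(). The following is a minimal sketch on a small made-up data frame (the names d, x and y are hypothetical and are not part of the Star data):

```r
# Hypothetical toy data: x is a numeric code for three categories
d <- data.frame(
  y = c(1.0, 2.0, 1.5, 3.0, 2.5, 3.5),
  x = c(1, 2, 3, 1, 2, 3)
)

# Design matrix implied by the formula: an intercept plus one dummy
# per level, except the first level, which is the reference category
X <- model.matrix(y ~ as.factor(x), data = d)
colnames(X)
# "(Intercept)" "as.factor(x)2" "as.factor(x)3"
```

The first level of the factor has no column of its own; its effect is absorbed by the intercept.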
I() takes arguments "as is". For instance, if you want to include in your equation a modified variable such as a squared term or the sum of two variables, you may use I().
lm(y ~ x1 + I(x1^2) + x2)
lm(y ~ I(x1 + x2))
lm(I(y-100) ~ I(x1-100) + I(x2 - 100))
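Without I(), operators such as ^ and + keep their formula meaning rather than their arithmetic one. A small sketch with simulated data (the variable names and the seed are arbitrary) shows the difference:

```r
set.seed(1)
x1 <- rnorm(50)
y  <- 1 + x1 + 0.5 * x1^2 + rnorm(50)

# In a formula, x1^2 means "x1 crossed with itself", which reduces to x1
names(coef(lm(y ~ x1^2)))
# "(Intercept)" "x1"

# I() protects the expression, so the squared term enters the model
names(coef(lm(y ~ x1 + I(x1^2))))
# "(Intercept)" "x1" "I(x1^2)"
```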
It is easy to include interactions between variables by using : or *. The : operator adds only the interaction term, whereas * adds both the interaction term and the individual terms.
lm(y~x1:x2) # interaction term only
lm(y~x1*x2) # interaction and individual terms
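The terms generated by each operator can be checked with model.matrix(). This is a minimal sketch on simulated data (names and seed are arbitrary):

```r
set.seed(2)
x1 <- rnorm(20)
x2 <- rnorm(20)
y  <- 1 + x1 + x2 + x1 * x2 + rnorm(20)

# ":" produces the interaction column only
colnames(model.matrix(y ~ x1:x2))
# "(Intercept)" "x1:x2"

# "*" produces the individual terms and the interaction
colnames(model.matrix(y ~ x1 * x2))
# "(Intercept)" "x1" "x2" "x1:x2"

# x1 * x2 is shorthand for x1 + x2 + x1:x2
same_fit <- all.equal(coef(lm(y ~ x1 * x2)), coef(lm(y ~ x1 + x2 + x1:x2)))
```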
It is also possible to generate polynomials using the poly() function with the option raw = TRUE.
lm(y ~ poly(x, degree = 3, raw = TRUE))
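The role of raw = TRUE can be illustrated by comparing three equivalent specifications. The sketch below uses simulated data with arbitrary names; with raw = TRUE the coefficients match those obtained by writing the powers with I(), while the default orthogonal polynomials give different coefficients but the same fitted values:

```r
set.seed(3)
x <- runif(100, -2, 2)
y <- 1 + x - 0.5 * x^2 + 0.2 * x^3 + rnorm(100)

fit_poly <- lm(y ~ poly(x, degree = 3, raw = TRUE))  # raw powers of x
fit_I    <- lm(y ~ x + I(x^2) + I(x^3))              # same powers via I()
fit_orth <- lm(y ~ poly(x, degree = 3))              # orthogonal polynomials

# raw = TRUE gives the same coefficients as the I() specification
same_coef <- all.equal(unname(coef(fit_poly)), unname(coef(fit_I)))

# Orthogonal polynomials change the coefficients but not the fitted values
same_fitted <- all.equal(fitted(fit_poly), fitted(fit_orth))
```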
There is also an advanced formula interface which is useful for instrumental variables models and mixed models. For instance, ivreg() (AER) uses this advanced formula interface. The instrumental variables are entered after the |. See the Instrumental Variables section if you want to learn more.
library("AER")
ivreg(y ~ x | z)
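For a model with one endogenous regressor and one instrument, the point estimates of a two-stage least squares fit such as ivreg(y ~ x | z) can be reproduced by hand in base R. The sketch below does so on simulated data (the data-generating process, the seed and the variable names are assumptions made for illustration). The standard errors reported by the second-stage lm() are not valid, which is one reason to use ivreg() in practice.

```r
set.seed(4)
n <- 5000
z <- rnorm(n)                  # instrument
e <- rnorm(n)                  # unobserved confounder
x <- z + e + rnorm(n)          # endogenous regressor, correlated with e
y <- 1 + 2 * x + e + rnorm(n)  # true slope on x is 2

# OLS is biased because x is correlated with the error term
b_ols <- unname(coef(lm(y ~ x))["x"])

# Stage 1: project x on the instrument
x_hat <- fitted(lm(x ~ z))

# Stage 2: regress y on the projection
b_2sls <- unname(coef(lm(y ~ x_hat))["x_hat"])

c(ols = b_ols, tsls = b_2sls)
```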
Output
In addition to the summary() and print() functions, which display the output for most estimation commands, some authors have developed simplified output functions. One of them is the display() function in the arm package. Another is coefplot(), also in the arm package, which displays the coefficients with confidence intervals in a plot. Following the standards defined by Nathaniel Beck[1], Jeff Gill developed graph.summary()[2]. This command does not show useless auxiliary statistics.
R code:
source("http://artsci.wustl.edu/~jgill/Models/graph.summary.R")
N <- 1000
u <- rnorm(N)
x1 <- 1 + rnorm(N)
x2 <- 1 + rnorm(N) + x1
y <- 1 + x1 + x2 + u
graph.summary(lm(y ~ x1 + x2))

Output:
Family: gaussian
Link function: identity
             Coef   Std.Err.  0.95 Lower  0.95 Upper  CIs:ZE+RO
(Intercept)  0.980  0.056     0.871       1.089       |o|
x1           1.040  0.043     0.955       1.125       |o|
x2           0.984  0.031     0.923       1.045       |o|
N: 1000 Estimate of Sigma: 0.998

R code:
library("arm")
display(lm(y ~ x1 + x2))

Output:
lm(formula = y ~ x1 + x2)
             coef.est  coef.se
(Intercept)  0.89      0.05
x1           1.05      0.04
x2           1.02      0.03
---
n = 1000, k = 3
residual sd = 0.96, R-Squared = 0.86
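When neither package is at hand, a similarly compact table (estimates, standard errors and confidence intervals only) can be put together in base R. This is a rough sketch and not a reimplementation of display() or graph.summary(); the helper name compact_table and the seed are hypothetical:

```r
set.seed(5)
N  <- 1000
u  <- rnorm(N)
x1 <- 1 + rnorm(N)
x2 <- 1 + rnorm(N) + x1
y  <- 1 + x1 + x2 + u
fit <- lm(y ~ x1 + x2)

# Hypothetical helper: coefficients, standard errors and confidence bounds
compact_table <- function(model, level = 0.95) {
  est <- coef(summary(model))[, c("Estimate", "Std. Error"), drop = FALSE]
  ci  <- confint(model, level = level)
  round(cbind(est, ci), 3)
}

tab <- compact_table(fit)
tab
```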
Weights
This section is a stub.
Tests
This section is a stub.
Confidence intervals
This section is a stub.
Delta Method
If you want to know the standard error of a transformation of one of your parameters, you need to use the delta method. Several packages implement it:
- deltamethod() in the msm package[3].
- delta.method() in the alr3 package.
- deltaMethod() in the car package.
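The idea behind these functions can be sketched in base R. For a smooth function g of the estimated coefficients b, the delta method approximates the variance of g(b) by the quadratic form of the gradient of g and the estimated covariance matrix of b. The example below, on simulated data with arbitrary names, computes by hand the standard error of the ratio of two coefficients:

```r
set.seed(6)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 4 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

b <- coef(fit)   # estimated coefficients
V <- vcov(fit)   # their estimated covariance matrix

# Quantity of interest: g(b) = b_x1 / b_x2 (true value 2 / 4 = 0.5)
g_hat <- unname(b["x1"] / b["x2"])

# Analytic gradient of g with respect to (Intercept, x1, x2)
grad <- unname(c(0, 1 / b["x2"], -b["x1"] / b["x2"]^2))

# Delta-method standard error: sqrt(grad' V grad)
se_g <- sqrt(drop(t(grad) %*% V %*% grad))

c(estimate = g_hat, se = se_g)
```

The packaged functions automate the gradient step, which is convenient when g is more complicated than a simple ratio.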
Zelig: the pseudo-bootstrap method
Zelig[4] is a postestimation package which simulates from the distribution of the estimated parameters and computes quantities of interest such as marginal effects or predicted probabilities. This is especially useful for non-linear models. Zelig comes with a set of vignettes which explain how to deal with each kind of model. There are three commands:
- zelig() estimates the model and draws from the distribution of estimated parameters.
- setx() fixes the values of explanatory variables.
- sim() computes the quantities of interest.
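The logic of these three steps can be imitated in base R for a logit model. This is only a sketch of the simulation idea, not Zelig's implementation or API: the data are simulated, the names are arbitrary, and the parameter draws come from a multivariate normal approximation built with a Cholesky factor.

```r
set.seed(7)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))

# Step 1 (analogous to zelig()): estimate the model and draw parameters
fit    <- glm(y ~ x, family = binomial)
n_sims <- 2000
L      <- t(chol(vcov(fit)))               # lower Cholesky factor of vcov
z      <- matrix(rnorm(2 * n_sims), nrow = 2)
draws  <- t(coef(fit) + L %*% z)           # n_sims x 2 matrix of draws

# Step 2 (analogous to setx()): fix the explanatory variable at x = 0.5
x_fixed <- c(1, 0.5)                       # intercept term and value of x

# Step 3 (analogous to sim()): predicted probability for each draw
p_sim <- plogis(drop(draws %*% x_fixed))

# Simulation-based point estimate and 95% interval
c(mean = mean(p_sim), quantile(p_sim, c(0.025, 0.975)))
```

Because the quantity of interest is computed draw by draw, the same approach works for any function of the parameters, which is what makes it attractive for non-linear models.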
References
1. Nathaniel Beck, "Making regression and related output more helpful to users", The Political Methodologist, 2010. http://politics.as.nyu.edu/docs/IO/2576/beck_tpm_edited.pdf
2. Jeff Gill, graph.summary(). http://artsci.wustl.edu/~jgill/Models/graph.summary.s
3. See the example on the UCLA Statistics webpage: http://www.ats.ucla.edu/stat/r/faq/deltamethod.htm
4. Kosuke Imai, Gary King and Olivia Lau (2009). Zelig: Everyone's Statistical Software. R package version 3.4-5. http://CRAN.R-project.org/package=Zelig