Stata/Linear Models

From Wikibooks, open books for an open world

Simple Linear Model

We generate a simple simulated data set:

clear
set obs 1000
gen u = invnorm(uniform())
gen x = invnorm(uniform()) 
gen y = 1 + x + u
reg y x
eret list /*gives the list of all stored results */
predict yhat /*gives the predicted value of y*/
predict res, res /*gives the residuals*/
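
The stored results listed by ereturn list can be reused directly in later commands. The lines below are a minimal sketch, assuming the regression of y on x above has just been run; they only display a few of the results that regress stores.

```stata
* Coefficient and standard error of x from the last regression
display _b[x]
display _se[x]
* Number of observations and R-squared stored by regress
display e(N)
display e(r2)
* Full coefficient vector and its variance matrix
matrix list e(b)
matrix list e(V)
```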

leanout is a prefix which simplifies the output[1]. It suppresses ancillary statistics that are rarely needed and focuses on confidence intervals rather than null hypothesis testing.

ssc install leanout
leanout : reg y x 

Performing multiple regression on the same subsample

Sometimes you want to run several regressions on the same subsample. This does not happen automatically, because an observation is dropped from a model whenever one of that model's variables is missing. One way to make sure the same subsample is used is the 'e(sample)' function, which equals 1 for the observations used in the last estimation and 0 otherwise. In the example below we store 'e(sample)' in the variables 'samp1' and 'samp2' after each regression, then re-estimate both models on the condition 'samp1 & samp2'. Both estimations are then guaranteed to use the same observations.

 
. clear
. set obs 1000
. gen u = invnorm(uniform())
. gen x = invnorm(uniform())
. gen y1 = 1 + x + u if uniform() < .8
. gen y2 = 1 + x + u if uniform() < .9 
. qui reg y1 x
. gen samp1 = e(sample)
. ta samp1 
. qui reg y2 x
. gen samp2 = e(sample)
. ta samp2
. eststo clear
. eststo : qui : reg y1 x if samp1 & samp2 
. eststo : qui : reg y2 x if samp1 & samp2
. esttab ,  star(* 0.1 ** 0.05 *** 0.01) se
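
An alternative is to flag the common sample before estimating anything. The sketch below assumes the variables y1, y2 and x generated above are in memory; the variable name common is only illustrative and does not come from the original example.

```stata
* 1 if none of the model variables is missing, 0 otherwise
gen byte common = !missing(y1, y2, x)
count if common
* Both regressions are restricted to the flagged observations
qui reg y1 x if common
display e(N)
qui reg y2 x if common
display e(N)
```

The two displayed values of e(N) should match the count, which confirms that both models use the same observations.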

Instrumental Variables

Here is a data generating process for an instrumental variable setting. u is correlated with x, which makes x endogenous. z is independent of u and correlated with x, which makes it a valid instrument for x.

clear
set obs 1000
gen u = invnorm(uniform())
gen z = invnorm(uniform())
gen x = invnorm(uniform()) + z + u
gen y = 1 + 2*x + u

It is easy to see that the ordinary least squares estimate is biased, whereas the IV estimate is consistent:

eststo clear
eststo : reg y x 
eststo : ivreg y (x=z)
esttab , se
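
In recent versions of Stata the same model can also be fit with the built-in ivregress command. The lines below are a minimal sketch, assuming the simulated y, x and z above are still in memory.

```stata
* Two-stage least squares with z as the excluded instrument for x
ivregress 2sls y (x = z)
* First-stage statistics, useful to check the strength of the instrument
estat firststage
```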

You can perform an overidentification test using overid or ivreg2, both user-written commands available from SSC:

clear
set obs 1000
gen u = invnorm(uniform())
gen z1 = invnorm(uniform())
gen z2 = invnorm(uniform())
gen x = invnorm(uniform()) + z1 - 2*z2 + u
gen y = 2*x + u

ivreg y (x=z1 z2)
overid
ivreg2 y (x=z1 z2)
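
With the built-in ivregress command, an overidentification test is available as a postestimation command, so nothing has to be installed. This is a sketch assuming the variables y, x, z1 and z2 generated above.

```stata
* Two instruments for one endogenous regressor: one overidentifying restriction
ivregress 2sls y (x = z1 z2)
* Sargan and Basmann tests of the overidentifying restrictions
estat overid
```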

Seemingly Unrelated Equations

. clear
. set obs 1000
. local s11 = 1
. local s12 = .5 
. local s22 = 1
. local s13 = .5
. local s23 = .5
. local s33 = 1 
. forvalues k = 1/3 {
      tempvar u`k'
      gen `u`k'' = invnorm(uniform())
  }
. gen eta1 = `s11' * `u1'
. gen eta2 = `s12' * `u1' + `s22' * `u2'
. gen eta3 = `s13' * `u1' + `s23' * `u2' + `s33' * `u3' 
. gen x = invnorm(uniform()) 
. forvalues k = 1/3 {
      gen z`k' = invnorm(uniform())
  }
. gen y1 = 1 + 2*x + z1 + eta1 
. gen y2 = - 1 + x + z2 + eta2 
. gen y3 = 4 + z3 + eta3
. global eq1 =  "y1 x z1"
. global eq2 =  "y2 x z2"
. global eq3 =  "y3 x z3" 
. reg $eq1
. reg $eq2
. reg $eq3
. sureg (toto1 : $eq1) (toto2 : $eq2) (toto3 : $eq3)
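
sureg can also report the correlation of the residuals across equations and be followed by tests of cross-equation restrictions. The sketch below assumes the globals eq1 to eq3 defined above and reuses the equation names toto1 to toto3 from the example.

```stata
* corr adds the residual correlation matrix and the Breusch-Pagan test of independence
sureg (toto1 : $eq1) (toto2 : $eq2) (toto3 : $eq3), corr
* Test that the coefficient on x is the same in the first two equations
test [toto1]x = [toto2]x
```

In the simulated data the coefficient on x is 2 in the first equation and 1 in the second, so this restriction should be rejected.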

Linear Panel Data

  • xtset
  • xtreg
  • xtabond
  • xtabond2
  • ivreg2
  • xtivreg2
  • ivendog
  • ivhettest
  • overid : overidentification test
  • xtoverid : overidentification test
  • xttest2
  • ivgmm0
  • xtarsim
  • xtdpd
  • xtdpdsys
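
Most of the commands listed above require the panel structure to be declared first. The lines below are a minimal sketch using the nlswork example data set available through webuse; the choice of data set and variables is an assumption made for illustration, and any panel with an individual and a time identifier works the same way.

```stata
webuse nlswork, clear
* Declare the panel: idcode identifies individuals, year is the time variable
xtset idcode year
* Describe the participation pattern of the panel
xtdescribe
* Fixed-effects regression of log wage on age and tenure
xtreg ln_wage age tenure, fe
```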

Random effect estimator

We assume $y_{it} = 1 + x_{it} + z_i + f_i + u_{it}$, with the individual effect f independent of x and z, and the idiosyncratic error u independent of x and z.

. clear
. set obs 1000
. gen id = _n
. gen f = invnorm(uniform())
. gen z = uniform()
. expand 10
. gen u = invnorm(uniform())
. gen x = uniform()
. gen y = 1 + x + z + f + u
. eststo clear
. eststo : qui : reg y x z
. eststo : qui : reg y x z, robust
. eststo : qui : reg y x z, cluster(id)
. eststo : qui : xtreg y x z, i(id) re
. eststo : qui : xtreg y x z, i(id) mle
. eststo : qui : xtmixed y x z || id : , mle
. esttab * , se  
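
Whether an individual random effect is present at all can be checked with the Breusch and Pagan Lagrange multiplier test. This is a sketch assuming the simulated panel above is in memory, with id as the individual identifier.

```stata
* Declare the panel identifier (no time variable is needed here)
xtset id
qui xtreg y x z, re
* H0: the variance of the individual effect is zero
xttest0
```

Since f has unit variance in the simulation, the null hypothesis should be clearly rejected.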

Dynamic Linear Panel Data

The Layard and Nickell unemployment data set:

. use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta, clear
(Layard & Nickell, Unemployment in Britain, Economica 53, 1986 from Ox dist)

You can also generate simulated data:

clear
set obs 10000
set seed 123456
gen id = _n
gen f = invnorm(uniform())
forvalues t = 1/5 {
	gen u`t' = invnorm(uniform())
}
* Initial value centred on the stationary mean f/(1 - .7)
gen y1 = f/.3 + u1
forvalues t = 2/5 {
	local z = `t' - 1
	gen y`t' = .7 * y`z' + f + u`t'
}
save wide, replace
reshape long y, i(id) j(year)
drop u* f
* Declare the panel with the identifiers created above
tsset id year
save long, replace

It is easy to see that the standard random-effects and fixed-effects estimators are biased when the lagged dependent variable is a regressor, whereas the instrumented random-effects and first-difference estimators are consistent:

eststo clear
eststo : qui : xtreg y l.y, re 
eststo : qui : xtreg y l.y, fe 
eststo : qui : xtivreg y (l.y= l2.d.y) , re 
eststo : qui : xtivreg y (l.y= l2.y) , fd 
esttab  ,se
eststo clear
eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level))  nomata  robust
eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level))  ivstyle( , e(diff)) nomata  robust
eststo : qui : xi : xtabond2 y l.y, iv(l.y l2.y l3.y, equation(diff))   nomata  robust
esttab , se
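
The built-in xtabond command fits the Arellano-Bond difference GMM estimator without installing xtabond2. The sketch below assumes the simulated long data set is in memory, with id as the panel identifier and year as the time variable created by reshape.

```stata
* Declare the panel structure of the simulated data
xtset id year
* One-step Arellano-Bond estimator with one lag of y
xtabond y, lags(1)
* Sargan test of the overidentifying restrictions
estat sargan
* Refit with robust standard errors, then test for serial correlation
* in the first-differenced errors
xtabond y, lags(1) vce(robust)
estat abond
```

The coefficient on the lagged dependent variable can be compared with the value .7 used to generate the data.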

References

  1. Nathaniel Beck, "leanout: A prefix to regress (and similar commands) to produce less output that is more useful", Stata Journal, forthcoming. http://politics.as.nyu.edu/docs/IO/2576/sj_driver.pdf