Stata/Data Management
Read and import data
Usually, data are loaded into memory using the use command. The clear option ensures that the dataset currently in memory is discarded, without saving any unsaved changes.
use "W:\Data\…\table.dta" , clear
The cd command sets the working directory, which makes it easier to load datasets into memory by file name alone.
cd "W:\Data\"
use table, clear
Stata 9 users can read Stata 10 datasets using the user-written use10 command.
use10 table, clear
Some example datasets are stored in the Stata directory. They can be loaded into memory using the sysuse command.
. sysuse cancer, clear
. sysuse smoking, clear
. sysuse auto, clear
. sysuse jspmix, clear
You can import a comma-separated values (CSV) file using insheet. The example below reads a semicolon-delimited file:
insheet using "W:\Data\…\table.csv", delim(";")
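In Stata 13 and later, insheet is superseded by import delimited (and outsheet by export delimited). As a minimal sketch, assuming such a version, the round trip below writes the shipped auto dataset to a temporary semicolon-delimited file and reads it back. The temporary file name csvfile is only an illustration.

```stata
* Assumes Stata 13 or later; csvfile is a hypothetical temporary file name
sysuse auto, clear
tempfile csvfile
* Write the data with ";" as the delimiter
export delimited using "`csvfile'", delimiter(";") replace
* Read it back, again declaring ";" as the delimiter
import delimited using "`csvfile'", delimiters(";") clear
describe, short
```

Variable names are taken from the first row of the file, so the reloaded dataset should have the same variables as auto.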
See also
- 'webuse' for internet data
- 'xmluse' for xml files
- 'infile' for text files
- 'input' for entering data from keyboard
- 'infix'
- 'fdause' for SAS xport data
- If none of these commands works, you may use Stat/Transfer
- FTRANS: module to batch convert file formats
Save and export data
- save
save table, replace
If you use Stata 10, you can save a dataset in Stata 9 format using saveold
saveold table, replace
- outsheet : export to tab delimited or csv format.
outsheet using "W:\Data\…\table.csv", replace comma
See also
- outfile
- xmlsave
- fdasave
Append and merge
The standard Stata command for merging datasets is merge. However, the user-written command mmerge is safer and gives more informative output. It can be installed using ssc install mmerge or located using findit mmerge.
- dmerge
- joinby: forms all pairwise combinations of observations, within groups, between the two datasets
- append: if you have two datasets with the same variables but different observations, you can stack them into one dataset using the append command.
use data_1, clear
append using data_2
br
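To illustrate merge itself, here is a minimal sketch using the merge 1:1 syntax, which assumes Stata 11 or later (earlier versions use a different syntax). The two small datasets, the key variable id, and the temporary file names left and right are made up for the example.

```stata
* Hypothetical master dataset: ids 1 to 5
clear
set obs 5
gen id = _n
gen x = id^2
tempfile left
save `left'

* Hypothetical using dataset: ids 3 to 7, so only 3, 4 and 5 overlap
clear
set obs 5
gen id = _n + 2
gen y = -id
tempfile right
save `right'

* One-to-one merge on id; _merge records where each observation came from
use `left', clear
merge 1:1 id using `right'
tab _merge
```

The _merge variable takes the value 1 for observations found only in the master dataset, 2 for those found only in the using dataset, and 3 for matches, so tabulating it is a quick check that the merge behaved as intended.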
Describe a dataset
- des
- des, s
- codebook
- codebook2
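As a quick illustration, the built-in commands from this list can be tried on the auto dataset shipped with Stata (des is an abbreviation of describe, and s of the short option). The choice of dataset and of the variable mpg is arbitrary.

```stata
sysuse auto, clear
* Full description: variable names, storage types, formats and labels
describe
* Short description: only the number of observations and variables
describe, short
* Detailed summary of a single variable
codebook mpg
```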
Detect missing values
- tabmiss
- npresent
- nmissing
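You can convert missing values to numeric values using the mvencode command. The mv() option gives the replacement value, and override allows the change even when a variable already contains that value.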
mvencode exg ga dvg verts eco dr dvd fn reg mnr div, mv(0) override
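The same idea on a small self-contained dataset, as a sketch: the variables id, x and y are invented for the example. Because no observation already contains 0, the override option is not needed here.

```stata
* Hypothetical data with one missing value in x and one in y
clear
input id x y
1 10 .
2 .  5
3 7  8
end
* Replace every missing value in x and y by 0
mvencode x y, mv(0)
list
```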
Variables
Very often you have to convert a variable from string to numeric format. There are several ways to do it. If the string variable already contains numeric values, you should use destring. Otherwise you should use the encode command. Encode automatically creates a new numeric variable and uses the string values of the original variable as its value labels.
- gen
- egen
- replace
- recode
- drop
- keep
- rename
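The difference between destring and encode for string-to-numeric conversion can be sketched on a toy dataset. The variable names price_s, city, price and city_id are hypothetical.

```stata
* Hypothetical data: price_s holds digits stored as text, city holds real text
clear
input str4 price_s str6 city
"12" "Paris"
"7"  "Lyon"
"30" "Paris"
end
* destring converts text that is already numeric
destring price_s, gen(price)
* encode maps each distinct string to an integer code and labels it
encode city, gen(city_id)
label list city_id
list
```

encode assigns the codes in alphabetical order of the strings, so here Lyon is coded 1 and Paris 2, and the listing still shows the city names because they are attached as value labels.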
'vallist' gives the list of all categories of a categorical variable in Stata.
vallist codep
Dealing with labels
- lab var
- lab list
- lab define
- lab value
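A short sketch of how these commands fit together, written with the unabbreviated command names (label variable, label define, label values, label list). The indicator highmpg, the label name yesno and the threshold of 25 are assumptions made for the example.

```stata
sysuse auto, clear
* Attach a descriptive label to a variable
label variable mpg "Mileage (miles per gallon)"
* Define a value label, then attach it to a new 0/1 indicator
label define yesno 0 "No" 1 "Yes"
gen byte highmpg = mpg > 25
label values highmpg yesno
* Show the mapping and a labelled tabulation
label list yesno
tab highmpg
```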
Expand
You can expand a dataset (i.e., replicate each observation a given number of times) using the expand command. This is useful, for example, when simulating panel data. In the first example, we draw 10 observations from a standard normal distribution and duplicate each of them, which gives 20 observations.
clear
set obs 10
gen u = invnorm(uniform())
expand 2
sort u
br
It is also possible to pass an integer variable as an argument to expand.
clear
set obs 10
gen u = uniform()
gen var = 1 + int(10 * uniform())
expand var
sort u
br
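The expandcl command duplicates whole clusters of observations and stores an identifier for the duplicated clusters in a new variable. It requires a cluster() option, so the example first creates an identifier (named id here for illustration).
clear
set obs 10
gen u = invnorm(uniform())
gen id = _n
expandcl 2, cluster(id) gen(cl)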
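As a sketch of the panel data use case, expand can turn a cross-section of individuals into a balanced panel. The names id, t, a and y, the seed, and the choice of 10 individuals observed over 5 periods are all assumptions made for the illustration.

```stata
clear
set seed 12345
* 10 individuals, each with a time-invariant effect a
set obs 10
gen id = _n
gen a = invnorm(uniform())
* 5 copies of each individual, one per period
expand 5
bysort id: gen t = _n
* Outcome = individual effect + idiosyncratic noise
gen y = a + invnorm(uniform())
* Declare the panel structure
tsset id t
```

Because a is drawn before expand, it stays constant within each individual, while the noise term is drawn afterwards and so varies over time.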
Data Storage types
All numeric types in Stata are ordinary signed quantities, except that the highest 27 values are reserved for the missing values (., .a, .b, ..., .z). For example, a byte holds integers from -127 to 100 rather than up to 127. The storage size of each type is as follows:
Storage type | Size (in bytes) |
---|---|
byte | 1 |
int | 2 |
long | 4 |
float | 4 |
double | 8 |
string | 1 per character (single-byte characters only in Stata 13 and earlier; from Stata 14 onward strings are UTF-8 encoded, so a non-ASCII character takes more than one byte) |
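Storage types matter in practice because a variable stored in a wider type than it needs wastes memory. As a small sketch, the compress command demotes variables to the smallest type that holds their values without loss. The variable price_d is created only for the illustration.

```stata
sysuse auto, clear
* Store the integer-valued price in an unnecessarily wide 8-byte type
gen double price_d = price
describe price_d
* compress demotes the variable to the smallest type that loses no information
compress price_d
describe price_d
```

Since the prices in auto are whole numbers well within the int range, price_d should end up stored as a 2-byte int.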