Jump to content

Statistical Analysis: an Introduction using R/R/Vectors

From Wikibooks, open books for an open world
One of the most fundamental objects in R is the vector, used to store multiple measurements of the same type (e.g. data variables). There are several different sorts of data that can be stored in a vector. Most common is the numeric vector, in which each element of the vector is simply a number. Other commonly used types of vector are character vectors (where each element is a piece of text) and logical vectors (where each element is either TRUE or FALSE[1]). In this topic we will use some example vectors provided by the "datasets" package, containing data on States of the USA (see ?state). R is an inherently vector-based program; in fact the numbers we have been using in previous calculations are just treated as vectors with a single element. This means that most basic functions in R will behave sensibly when given a vector as a argument, as shown below.
Input:
state.area                #a NUMERIC vector giving the area of US states, in square miles
state.name                #a CHARACTER vector (note the quote marks) of state names 
sq.km <- state.area*2.59  #Arithmetic works on numeric vectors, e.g. convert sq miles to sq km
sq.km                     #... the new vector has the calculation applied to each element in turn
sqrt(sq.km)               #Many mathematical functions also apply to each element in turn 
range(state.area)         #But some functions return different length vectors (here, just the max & min).
length(state.area)        #and some, like this useful one, just return a single value.
Result:
> state.area #a NUMERIC vector giving the area of US states, in square miles
[1]  51609 589757 113909  53104 158693 104247   5009   2057  58560  58876   6450  83557  56400

[14] 36291 56290 82264 40395 48523 33215 10577 8257 58216 84068 47716 69686 147138 [27] 77227 110540 9304 7836 121666 49576 52586 70665 41222 69919 96981 45333 1214 [40] 31055 77047 42244 267339 84916 9609 40815 68192 24181 56154 97914 > state.name #a CHARACTER vector (note the quote marks) of state names

[1] "Alabama"            "Alaska"             "Arizona"            "Arkansas"          
[5] "California"         "Colorado"           "Connecticut"        "Delaware"          
[9] "Florida"            "Georgia"            "Hawaii"             "Idaho"             

[13] "Illinois" "Indiana" "Iowa" "Kansas" [17] "Kentucky" "Louisiana" "Maine" "Maryland" [21] "Massachusetts" "Michigan" "Minnesota" "Mississippi" [25] "Missouri" "Montana" "Nebraska" "Nevada" [29] "New Hampshire" "New Jersey" "New Mexico" "New York" [33] "North Carolina" "North Dakota" "Ohio" "Oklahoma" [37] "Oregon" "Pennsylvania" "The smallest state" "South Carolina" [41] "South Dakota" "Tennessee" "Texas" "Utah" [45] "Vermont" "Virginia" "Washington" "West Virginia" [49] "Wisconsin" "Wyoming" > sq.km <- state.area*2.59 #Standard arithmatic works on numeric vectors, e.g. convert sq miles to sq km > sq.km #... giving another vector with the calculation performed on each element in turn

[1]  133667.31 1527470.63  295024.31  137539.36  411014.87  269999.73   12973.31    5327.63
[9]  151670.40  152488.84   16705.50  216412.63  146076.00   93993.69  145791.10  213063.76

[17] 104623.05 125674.57 86026.85 27394.43 21385.63 150779.44 217736.12 123584.44 [25] 180486.74 381087.42 200017.93 286298.60 24097.36 20295.24 315114.94 128401.84 [33] 136197.74 183022.35 106764.98 181090.21 251180.79 117412.47 3144.26 80432.45 [41] 199551.73 109411.96 692408.01 219932.44 24887.31 105710.85 176617.28 62628.79 [49] 145438.86 253597.26 > sqrt(sq.km) #Many mathematical functions also apply to each element in turn

[1]  365.60540 1235.90883  543.16140  370.86299  641.10441  519.61498  113.90044   72.99062
[9]  389.44884  390.49819  129.24976  465.20171  382.19890  306.58390  381.82601  461.58830

[17] 323.45487 354.50609 293.30334 165.51263 146.23826 388.30328 466.62203 351.54579 [25] 424.83731 617.32278 447.23364 535.06878 155.23324 142.46136 561.35100 358.33202 [33] 369.04978 427.81111 326.74911 425.54695 501.17940 342.65503 56.07370 283.60615 [41] 446.71213 330.77479 832.11058 468.96955 157.75712 325.13205 420.25859 250.25745 [49] 381.36447 503.58441 > range(state.area) #But some functions return different length vectors (here, just the max & min). [1] 1214 589757 > length(state.area) #and some, like this useful one, just return a single value. [1] 50

Note that the first part of your output may look slightly different to that above. Depending on the width of your screen, the number of elements printed on each line of output may differ. This is the reason for the numbers in square brackets, which are produced when vectors are printed to the screen. These bracketed numbers give the position of the first element on that line, which is a useful visual aid. For instance, looking at the printout of state.name, and counting across from the second line, we can tell that the eighth state is Delaware.
You may occasionally need to create your own vectors from scratch (although most vectors are obtained from processing data in already-existing files). The most commonly used function for constructing vectors is c(), so named because it concatenates objects together. However, if you wish to create vectors consisting of regular sequences of numbers (e.g. 2,4,6,8,10,12, or 1,1,2,2,1,1,2,2) there are several alternative functions you can use, including seq(), rep(), and the : operator.
Input:
c("one", "two", "three", "pi")  #Make a character vector
c(1,2,3,pi)                     #Make a numeric vector
seq(1,3)                        #Create a sequence of numbers
1:3                             #A shortcut for the same thing (but less flexible)
i <- 1:3                        #You can store a vector
i
i <- c(i,pi)                    #To add more elements, you must assign again, e.g. using c() 
i                             
i <- c(i, "text")               #A vector cannot contain different data types, so ... 
i                               #... R converts all elements to the same type
i+1                             #The numbers are now strings of text: arithmetic is impossible 
rep(1, 10)                      #The "rep" function repeats its first argument
rep(3:1,10)                     #The first argument can also be a vector
huge.vector <- 0:(10^7)         #R can easily cope with very big vectors
#huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers
rm(huge.vector)                 #"rm" removes objects. Deleting huge unused objects is sensible
Result:
> c("one", "two", "three", "pi") #Make a character vector

[1] "one" "two" "three" "pi" > c(1,2,3,pi) #Make a numeric vector [1] 1.000000 2.000000 3.000000 3.141593 > seq(1,3) #Create a sequence of numbers [1] 1 2 3 > 1:3 #A shortcut for the same thing (but less flexible) [1] 1 2 3 > i <- 1:3 #You can store a vector > i [1] 1 2 3 > i <- c(i,pi) #To add more elements, you must assign again, e.g. using c() > i [1] 1.000000 2.000000 3.000000 3.141593 > i <- c(i, "text") #A vector cannot contain different data types, so ... > i #... R converts all elements to the same type [1] "1" "2" "3" "3.14159265358979" "text" > i+1 #The numbers are now strings of text: arithmetic is impossible Error in i + 1 : non-numeric argument to binary operator > rep(1, 10) #The "rep" function repeats its first argument

[1] 1 1 1 1 1 1 1 1 1 1

> rep(3:1,10) #The first argument can also be a vector

[1] 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1

> huge.vector <- 0:(10^7) #R can easily cope with very big vectors > #huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers > rm(huge.vector) #"rm" removes objects. Deleting huge unused objects is sensible


Notes

[edit | edit source]
  1. These are special words in R, and cannot be used as names for objects. The objects T and F are temporary shortcuts for TRUE and FALSE, but if you use them, watch out: since T and F are just normal object names you can change their meaning by overwriting them.