Biomedical Engineering Theory And Practice/R Language
Data Types
[edit | edit source]R has a various objects for holding data, including scalars, vectors, matrices, arrays, data frames, and lists.
Scalar and Constant
[edit | edit source]"Scalar" generally means "one-dimensional"vector. Constants only have one value ever. You can constants is similar to zero-dimensional values (a single point).
- Scalar
> x<-3
> y<-6
> z<-x+y
> z
[1] 9
- Constant
> 2+3
[1] 5
> 5-4
[1] 1
> 6*4
[1] 24
Vector
[edit | edit source]Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. The combine function c() is used to form the vector. Here are examples of each type of vector:
> a<-c(1,2,5,-3,-6,5) #nummeric vector
> b<-c("one","two","three") #character vector
> d<-c(TRUE,FALSE,TRUE,FALSE,TRUE,TRUE) #logical vector
> a[c(2,4)]
[1] 2 -3
> a[4]
[1] -3
> a[2:4]
[1] 2 5 -3
Matrix
[edit | edit source]A matrix is a two-dimensional array where each element has the same mode (numeric, character, or logical). Matrices are created with the "matrix" function . The general format is as follows:
> mymatrix <- matrix(vector, nrow=number of rows, ncol=number of columns,byrows=logical value, dimnames=list(vector-of-rownames,vector-of-colnames))
> A<-matrix(1:9,nrow=3)
> A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> A[2,1]
[1] 2
> A<-matrix(1:9,nrow=3,byrow=T)
> A
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
Array
[edit | edit source]> myarray <-array(vector, dimensions, dimnames)
> dim1 <- c("A1","A2","A3")
> dim2 <- c("B1","B2","B3","B4")
> dim3 <- c("C1","C2")
> x<-array(1:24,c(3,4,2),dimnames=list(dim1,dim2,dim3))
> x
, , C1
B1 B2 B3 B4
A1 1 4 7 10
A2 2 5 8 11
A3 3 6 9 12
, , C2
B1 B2 B3 B4
A1 13 16 19 22
A2 14 17 20 23
A3 15 18 21 24
Data Frame
[edit | edit source]>mydata <-data.frame(col1,col2,col3....)
> patientID<-LETTERS[1:4]
> age<-c(24,35,28,52)
> diabetes<-c("Type1","Type2","Type1","Type2")
> stats<-c("Poor","Improved","Poor","Excellent")
> patientDATA<-data.frame(patientID,age,diabetes,stats,row.names=letters[1:4])
> patientDATA
patientID age diabetes stats
a A 24 Type1 Poor
b B 35 Type2 Improved
c C 28 Type1 Poor
d D 52 Type2 Excellent
Factors
[edit | edit source]> patientID<-LETTERS[1:4]
> age<-c(24,35,28,52)
> diabetes<-c("Type1","Type2","Type1","Type2")
> stats<-c("Poor","Improved","Poor","Excellent")
> status <- factor(stats, order=TRUE)
> patientdata <- data.frame(patientID, age, diabetes, status)
> str(patientdata)
'data.frame': 4 obs. of 4 variables:
$ patientID: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
$ age : num 24 35 28 52
$ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 2
$ status : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 3 1
> summary(patientdata)
patientID age diabetes status
A:1 Min. :24.00 Type1:2 Excellent:1
B:1 1st Qu.:27.00 Type2:2 Improved :1
C:1 Median :31.50 Poor :2
D:1 Mean :34.75
3rd Qu.:39.25
Max. :52.00
Lists
[edit | edit source]>mylist <- list(name1=object1,name2=object2,...)
> x<-"TheList"
> y<-c(25,19,20)
> z<-matrix(1:10,nrow=2,byrow=TRUE)
> theta<-LETTERS[1:10]
> delta<-c(2+3i,4-6i)
> mylist<-list(title=x,components=y,z,theta,delta)
> mylist
$title
[1] "TheList"
$components
[1] 25 19 20
[[3]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[[4]]
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
[[5]]
[1] 2+3i 4-6i
> mylist[[2]]
[1] 25 19 20
> mylist[["components"]]
[1] 25 19 20
> lapply(mylist,length)
$title
[1] 1
$components
[1] 3
[[3]]
[1] 10
[[4]]
[1] 10
[[5]]
[1] 2
> lapply(mylist,class)
$title
[1] "character"
$components
[1] "numeric"
[[3]]
[1] "matrix"
[[4]]
[1] "character"
[[5]]
[1] "complex"
> lapply(mylist,mean)
$title
[1] NA
$components
[1] 21.33333
[[3]]
[1] 5.5
[[4]]
[1] NA
[[5]]
[1] 3-1.5i
Warning messages:
1: In mean.default(X[[1L]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(X[[4L]], ...) :
Basic Functions
[edit | edit source]Arithmetic Operators
[edit | edit source]The arithmetic operators and their examples which are used in R programming are listed in the table below.
Function | R Command | Example |
---|---|---|
Exponentiation, | > a^b | > 3+6
[1] 9 |
Multiplication, | > a*b | > 22*5
[1] 110 |
Division, | > a/b | > 30/3
[1] 10 |
Addition, | > a+b | > 10+9
[1] 19 |
Subtraction, | > a-b | > 10-3
[1] 7 |
Integer(Quotient) | > a%/%b | > 20%/%3
[1] 6 |
Modulo(Remainder) | > a%%b | > 20%%3
[1] 2 |
Complex Number
[edit | edit source]> x<-5.2-3i
R command | R command | ||||
---|---|---|---|---|---|
Complex number |
> Re(x)
[1] 5.2
|
Real part | > Im(x)
[1] -3
|
||
Imaginary part | > Im(x)
[1] -3
|
Modulus | > Mod(x)
[1] 6.003332
|
||
Argument | > Arg(x)
[1] -0.5232783
|
Conjugate | > Conj(x)
[1] 5.2+3i
|
||
Membership |
> is.complex(x)
[1] TRUE
|
Coercion |
> as.complex(19.6)
[1] 19.6+0i
|
Rounding
[edit | edit source]Function | R Command | Function | R Command |
---|---|---|---|
Greatest integer less than | > floor(9.9)
[1] 9
> floor(-9.9)
[1] -10
|
Next integer | > ceiling(9.9)
[1] 10
> ceiling(-9.9)
[1] -9
|
Rounding function | > round(9.9)
[1] 10
> round(9.2)
[1] 9
|
Strip off the decimal | > trunc(8.6)
[1] 8
> trunc(-8.6)
[1] -8
|
Trigonometric Functions
[edit | edit source]Function | Trigometric Function | Trigometric Inverse Function | Hyperbolic Function | Hyperbolic Inverse Function |
---|---|---|---|---|
sine | sin(x) | asin(x) | sinh(x) | asinh(x) |
cosine | cos(x) | acos(x) | cosh(x) | acosh(x) |
tangent | tan(x) | atan(x) | tanh(x) | atanh(x) |
Log and Exponential Functions
[edit | edit source]Function | R command | R Example |
---|---|---|
Absolute, | abs(x) | > abs(-7.4)
[1] 7.4 |
Log to the base e, | > log(10)
[1] 2.302585 |
|
Log to the base 10, | log10(x) | > log10(100)
[1] 2 |
Log to the base n of x | log(x,n) | > log(64,4)
[1] 3 |
exp(x) | > exp(3)
[1] 20.08554 | |
sqrt(x) | > sqrt(25)
[1] 5 | |
factorial(x) | > factorial(10)
[1] 3628800 | |
combinations(n,r) | > choose(5,4)
[1] 5 |
Relational Operators and Logical Variables
[edit | edit source]Relational Operators
[edit | edit source]Relational Operator | |
---|---|
Equal | == |
Not equal | != |
Less than | < |
Greater than | > |
Less than or equal | <= |
Greater than or equal | >= |
- TRUE=1,FALSE=0
> x<-c(6,3,4)
> y<-c(5,15,9)
> z<-(x<y)
> z
[1] FALSE TRUE TRUE
> z<-(x<y)+5
> z
[1] 5 6 6
Logical Operators
[edit | edit source]& | |||||
---|---|---|---|---|---|
False(0) | False(0) | True(1) | False(0) | False(0) | False(0) |
False(0) | True(1) | True(1) | False(0) | True(1) | True(1) |
True(1) | False(0) | False(0) | False(0) | True(1) | True(1) |
True(1) | True(1) | False(0) | True(1) | True(1) | False(0) |
> x<-c(6,2,8)
> y<-c(14,6,7)
> z<-c(4,5,11)
> z1<-x>y
> z1
[1] FALSE FALSE TRUE
> z2<-y>z
> z2
[1] TRUE TRUE FALSE
> z3<-(x>y) & (y>z)
> z3
[1] FALSE FALSE FALSE
> z1<-xor(x,y) > z1 [1] FALSE FALSE FALSE
Sequence Generation and Repeats
[edit | edit source]> x1
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0
> x2 <- seq(from=0.4,by=0.01,length=15)
> x2
[1] 0.40 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50 0.51 0.52 0.53 0.54
> x3<-seq(1.4,2.1,0.3)
> x3
[1] 1.4 1.7 2.0
> x4<-rep(15,7)
> x4
[1] 15 15 15 15 15 15 15
> x5<-rep(1:4,3)
> x5
[1] 1 2 3 4 1 2 3 4 1 2 3 4
> x6<-rep(1:3,each=2,times=3)
> x6
[1] 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
> x7<-rep(c("a","b","c"),c(1,2,3))
> x7
[1] "a" "b" "b" "c" "c" "c"
Random Number Generation
[edit | edit source]> set.seed(100)
> runif(5)
[1] 0.5465586 0.1702621 0.6249965 0.8821655 0.2803538
> runif(5)
[1] 0.3984879 0.7625511 0.6690217 0.2046122 0.3575249
> x<-c(5,10,8,6,9,11,14,16,18)
> sample(x)
[1] 6 11 18 9 8 16 10 5 14
> sample(x)
[1] 16 9 10 8 18 14 11 6 5
> sample(x,4)
[1] 10 11 14 5
Vector Functions
[edit | edit source]Length and Statistics
[edit | edit source]> x<-c(6,9,11,14,12,2,33,76,0,90)
Function | R command | Function | R command |
---|---|---|---|
Length | > length(x)
[1] 10
|
Mean | > mean(x)
[1] 25.3
|
Max | > max(x)
[1] 90
|
Min | > min(x)
[1] 0
|
Distribution | > quantile(x)
0% 25% 50% 75% 100%
0.00 6.75 11.50 28.25 90.00
|
Sort | > sort(x)
[1] 0 2 6 9 11 12 14 33 76 90
|
Function | R command |
---|---|
Reference the 5th element of Vector from the vector | > x[5]
[1] 12
|
Delete the 3rd element of vector from the vector | > x1<-x[-3]
> x1
[1] 6 9 14 12 2 33 76 0 90
|
Delete the last element of vector from the vector | > x2<-x[-length(x)]
> x2
[1] 6 9 11 14 12 2 33 76 0
|
Delete 1st and the last element of vector from the vector | > x3<-x[c(-1,-length(x))]
> x3
[1] 9 11 14 12 2 33 76 0
|
Remove the smallest 2 and the largest 3 element from the vector | > trim <-function(x)sort(x)[-c(1,2,length(x)-2,length(x)-1,length(x))]
> trim(x)
[1] 6 9 11 12 14
|
R code | ||
---|---|---|
Sum | > sum(x)
[1] 253
|
|
Mean,Median | > mean(x)
[1] 25.3
|
> median(x)
[1] 11.5
|
Range | > range(x)
[1] 0 90 |
|
Standard Deviation,variance | > sd(x)
[1] 31.87841
|
> var(x)
[1] 1016.233
|
Which is the largest and smallest number | > which(x==max(x))
[1] 10
|
> which(x==min(x))
[1] 9
|
sort and reverse sort | > sort(x)
[1] 0 2 6 9 11 12 14 33 76 90
|
> rev(sort(x))
[1] 90 76 33 14 12 11 9 6 2 0
|
> x<-matrix(rpois(15,1.2),nrow=3)
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 2 1 0 3 3
[2,] 0 2 3 1 2
[3,] 2 2 0 1 1
> mean(x[,5])
[1] 2
> var(x[3,])
[1] 0.7
> rowSums(x)
[1] 9 8 6
> colSums(x)
[1] 4 5 3 5 6
> rowMeans(x)
[1] 1.8 1.6 1.2
> colMeans(x)
[1] 1.333333 1.666667 1.000000 1.666667 2.000000
Parallel min and max
[edit | edit source]> x<-c(2,5,10,-6,29,45)
> y<-c(5,9,15,-22,38,88)
> z<-c(9,10,2,7,55,24)
> q<-c(22,3,5,6,-23,88)
> pmin(x,y,z,q)
[1] 2 3 2 -22 -23 24
> pmax(x,y,z,q)
[1] 22 10 15 7 55 88
'table' and 'tapply'
[edit | edit source]> data(ChickWeight)
weight Time Chick Diet
1 42 0 1 1
2 51 2 1 1
3 59 4 1 1
4 64 6 1 1
5 76 8 1 1
6 93 10 1 1
.......................
576 234 18 50 4
577 264 20 50 4
578 264 21 50 4
> tapply(ChickWeight$weight,ChickWeight$Time,mean)
0 2 4 6 8 10 12 14
41.06000 49.22000 59.95918 74.30612 91.24490 107.83673 129.24490 143.81250
16 18 20 21
168.08511 190.19149 209.71739 218.68889
> tapply(ChickWeight$weight,ChickWeight$Diet,median)
1 2 3 4
88.0 104.5 125.5 129.5
> codon1=c("UUU","UUC","UUA","UUG","UUA","UUG","UUC")
> table(codon1)
codon1
UUA UUC UUG UUU
2 2 2 1
> aminoacid=list(Phe=c("UUU","UUC"),Leu=c("UUA","UUG"))
> codon=as.factor(codon1)
> levels(codon)=aminoacid
> codon
[1] Phe Phe Leu Leu Leu Leu Phe
Levels: Phe Leu
> table(codon)
codon
Phe Leu
3 4
'apply'
[edit | edit source]> x<-matrix(1:15,nrow=3,byrow=T)
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
> apply(x,1,sum)
[1] 15 40 65
> apply(x,2,sum)
[1] 18 21 24 27 30
> apply(x,1,sqrt)
[,1] [,2] [,3]
[1,] 1.000000 2.449490 3.316625
[2,] 1.414214 2.645751 3.464102
[3,] 1.732051 2.828427 3.605551
[4,] 2.000000 3.000000 3.741657
[5,] 2.236068 3.162278 3.872983
> apply(x,2,sqrt)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.000000 1.414214 1.732051 2.000000 2.236068
[2,] 2.449490 2.645751 2.828427 3.000000 3.162278
[3,] 3.316625 3.464102 3.605551 3.741657 3.872983
Closets
[edit | edit source]> x<-c(3,22,15,11,50,85)
> x-10
[1] -7 12 5 1 40 75
> abs(x-10)
[1] 7 12 5 1 40 75
> min(abs(x-10))
[1] 1
> which(abs(x-10)==min(abs(x-10)))
[1] 4
Sort,Rank,Order
[edit | edit source]> x<-c(2,5,10,-6,29,45)
> # rank: the rank of unsorted vector
> rank(x)
[1] 2 3 4 1 5 6
> # order:the rank of the sorted vector
> order(x)
[1] 4 1 2 3 5 6
Unique and Duplicated
[edit | edit source]> x<-c("a","b","c","a","a","a","b","c")
> table(x)
x
a b c
4 2 2
> unique(x)
[1] "a" "b" "c"
> duplicated(x)
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
> x[!duplicated(x)]
[1] "a" "b" "c"
Run length
[edit | edit source]> x<-rpois(20,0.5)
> x
[1] 2 0 0 1 0 1 0 0 1 1 0 0 2 0 0 0 0 0 0 0
> rle(x)
Run Length Encoding
lengths: int [1:10] 1 2 1 1 1 2 2 2 1 7
values : int [1:10] 2 0 1 0 1 0 1 0 2 0
Set functions
[edit | edit source]> setA <-c("I","II","III","IV","V")
> setB <-c("III","IV","V","VI")
> union(setA,setB)
[1] "I" "II" "III" "IV" "V" "VI"
> intersect(setA,setB)
[1] "III" "IV" "V"
> setdiff(setA,setB)
[1] "I" "II"
> setdiff(setB,setA)
[1] "VI"