R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
R and statistics
It is an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. There are about 25 packages supplied with R (called “standard” and “recommended” packages) and many more are available through the CRAN family of Internet sites (via https://CRAN.R-project.org) and elsewhere.
Using R interactively
Make the working directory
Getting help with functions and features
help(solve)
?solve
help("[[")
help.start()
## starting httpd help server ... done
## If the browser launched by 'xdg-open' is already running, it is *not*
## restarted, and you must switch to its window.
## Otherwise, be patient ...
??solve
help.search("solve")
Data permanency and removing objects
The entities that R creates and manipulates are known as objects. These may be variables, arrays of numbers, character strings, functions, or more general structures built from such components.
During an R session, objects are created and stored by name.
The following commands create three objects and then list them:
x <- 5
y <- 7
z <- seq(1,10)
z
## [1] 1 2 3 4 5 6 7 8 9 10
objects()
## [1] "x" "y" "z"
ls()
## [1] "x" "y" "z"
The R command objects() (alternatively, ls()) can be used to display the names of (most of) the objects which are currently stored within R. The collection of objects currently stored is called the workspace. Objects are removed with rm():
rm(x, y, z)
ls()
## character(0)
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
x
## [1] 10.4 5.6 3.1 6.4 21.7
assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
x
## [1] 10.4 5.6 3.1 6.4 21.7
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
x
## [1] 10.4 5.6 3.1 6.4 21.7
1/x
## [1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295
y <-c(x,0,x)
ls()
## [1] "x" "y"
x
## [1] 10.4 5.6 3.1 6.4 21.7
y
## [1] 10.4 5.6 3.1 6.4 21.7 0.0 10.4 5.6 3.1 6.4 21.7
v <- 2*x+y+1
## Warning in 2 * x + y: longer object length is not a multiple of shorter object
## length
v
## [1] 32.2 17.8 10.3 20.2 66.1 21.8 22.6 12.8 16.9 50.8 43.5
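The assignment v <- 2*x+y+1 generates a new vector v of length 11 constructed by adding together, element by element, 2*x repeated 2.2 times, y repeated just once, and 1 repeated 11 times. Because the length of y (11) is not a multiple of the length of x (5), R still recycles x but issues the warning shown above.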
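The recycling rule is easier to see with shorter vectors. The sketch below uses made-up vectors a, b and d (hypothetical names, not part of the session above): when the longer length is a multiple of the shorter one, recycling is silent; otherwise R recycles anyway and warns.

```r
# Illustrative vectors (hypothetical, not from the session above)
a <- c(1, 2, 3, 4, 5, 6)

b <- c(10, 20)                 # length 2 divides length 6: recycled 3 times, no warning
ab <- a + b
ab
# 11 22 13 24 15 26

d <- c(10, 20, 30, 40)         # length 4 does not divide 6: recycled 1.5 times, with a warning
ad <- suppressWarnings(a + d)  # the warning is suppressed here only to keep the output short
ad
# 11 22 33 44 15 26
```

In real code the warning is usually worth keeping, since a partial recycle often signals mismatched data.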
x/y
## Warning in x/y: longer object length is not a multiple of shorter object length
## [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 Inf 0.5384615
## [8] 0.5535714 2.0645161 3.3906250 0.4792627
x*y
## Warning in x * y: longer object length is not a multiple of shorter object
## length
## [1] 108.16 31.36 9.61 40.96 470.89 0.00 58.24 17.36 19.84 138.88
## [11] 225.68
x-y
## Warning in x - y: longer object length is not a multiple of shorter object
## length
## [1] 0.0 0.0 0.0 0.0 0.0 10.4 -4.8 -2.5 3.3 15.3 -11.3
log(x)
## [1] 2.341806 1.722767 1.131402 1.856298 3.077312
exp(x)
## [1] 3.285963e+04 2.704264e+02 2.219795e+01 6.018450e+02 2.655769e+09
sin(x)
## [1] -0.82782647 -0.63126664 0.04158066 0.11654920 0.28705265
cos(x)
## [1] -0.5609843 0.7755659 -0.9991352 0.9931849 -0.9579148
tan(x)
## [1] 1.47566791 -0.81394328 -0.04161665 0.11734895 -0.29966407
sqrt(x)
## [1] 3.224903 2.366432 1.760682 2.529822 4.658326
min(x)
## [1] 3.1
max(x)
## [1] 21.7
sum(x)
## [1] 47.2
range(x)
## [1] 3.1 21.7
length(x)
## [1] 5
Two statistical functions are mean(x), which calculates the sample mean and is the same as sum(x)/length(x), and var(x), which gives the sample variance sum((x-mean(x))^2)/(length(x)-1). The standard deviation sd(x) is the square root of var(x).
mean(x)
## [1] 9.44
var(x)
## [1] 53.853
sd(x)
## [1] 7.33846
sqrt(var(x))
## [1] 7.33846
sum(x)
## [1] 47.2
length(x)
## [1] 5
sum(x)/length(x)
## [1] 9.44
sum((x-mean(x))^2)/(length(x)-1)
## [1] 53.853
To work with complex numbers, supply an explicit complex part. Thus
sqrt(-17)
## Warning in sqrt(-17): NaNs produced
## [1] NaN
sqrt(-17+0i)
## [1] 0+4.123106i
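As a small sketch of what "supplying a complex part" gives you, the result can be taken apart with Re(), Im() and Mod(). The use of as.complex() here is an alternative to writing -17+0i, chosen for illustration.

```r
z <- sqrt(as.complex(-17))  # as.complex() is another way to make the argument complex
Re(z)                       # real part: 0
Im(z)                       # imaginary part: sqrt(17), about 4.123106
Mod(z)^2                    # squared modulus recovers 17, up to rounding
```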
x <-1:30
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
y <-2*1:15
y
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
x1 <-seq(1,30)
x1
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
seq(from=30, to=1)
## [1] 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6
## [26] 5 4 3 2 1
seq(-5,5, by=.2)
## [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
## [16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8
## [31] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8
## [46] 4.0 4.2 4.4 4.6 4.8 5.0
seq(length=51, from=-5, by=.2)
## [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
## [16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8
## [31] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8
## [46] 4.0 4.2 4.4 4.6 4.8 5.0
rep(x,times=5)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## [51] 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## [76] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10
## [101] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5
## [126] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
rep(x, each=5)
## [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5
## [26] 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 9 9 10 10 10 10 10
## [51] 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 14 14 15 15 15 15 15
## [76] 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18 19 19 19 19 19 20 20 20 20 20
## [101] 21 21 21 21 21 22 22 22 22 22 23 23 23 23 23 24 24 24 24 24 25 25 25 25 25
## [126] 26 26 26 26 26 27 27 27 27 27 28 28 28 28 28 29 29 29 29 29 30 30 30 30 30
x <-seq(2,30, by=2)-1
x
## [1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29
temp <- x>13
temp
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE
temp <- x>=13
temp
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE
temp <- x<13
temp
## [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
temp <- x<=13
temp
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
temp1 <- x==13
temp1
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
temp2 <- x!=13
temp2
## [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE
!temp1
## [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE
temp1|temp2
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
temp1&temp2
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
In some cases the components of a vector may not be completely known. When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA.
z <- c(1:3,NA)
ind <- is.na(z)
ind
## [1] FALSE FALSE FALSE TRUE
Note that there is a second kind of “missing” value, produced by numerical computation: the so-called Not a Number (NaN) values. Examples are
0/0
## [1] NaN
Inf-Inf
## [1] NaN
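The difference between the two kinds of missing value shows up in the test functions. A minimal sketch, using a made-up vector v: is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.

```r
v <- c(1, NA, 0/0)      # one ordinary value, one NA, one NaN

na_flags  <- is.na(v)   # TRUE for both NA and NaN
nan_flags <- is.nan(v)  # TRUE only for NaN

na_flags
# FALSE  TRUE  TRUE
nan_flags
# FALSE FALSE  TRUE
```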
x <- c("x-value", "New iteration results")
x
## [1] "x-value" "New iteration results"
x[1]
## [1] "x-value"
x[2]
## [1] "New iteration results"
length(x)
## [1] 2
The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings.
labs <- paste(c("X","Y"), 1:10, sep="")
labs
## [1] "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10"
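The example above relies on recycling of the shorter argument. As a further sketch (the strings are made up), the sep argument controls what goes between the pieces of each element, and collapse joins the resulting elements into a single string.

```r
# sep joins the arguments within each element
parts <- paste("X", 1:3, sep = "_")
parts
# "X_1" "X_2" "X_3"

# collapse joins the elements themselves into one string
one_string <- paste(parts, collapse = ", ")
one_string
# "X_1, X_2, X_3"
```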
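A logical vector. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted.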
x <- c(1:3,NA)
x
## [1] 1 2 3 NA
y<-x[!is.na(x)]
y
## [1] 1 2 3
x[is.na(x)] <- 4
x
## [1] 1 2 3 4
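Logical index vectors can combine several conditions and can index any expression, not just a named vector. A minimal sketch with a made-up vector w:

```r
w <- c(-2, NA, 3, 0, 5)

keep <- !is.na(w) & w > 0   # TRUE only where w is present and positive
keep
# FALSE FALSE  TRUE FALSE  TRUE

pos <- (w + 1)[keep]        # index an arbitrary expression
pos
# 4 6
```

Testing !is.na(w) first matters: without it, the NA would propagate into the index and produce an NA in the result.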
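A vector of positive integral quantities. The corresponding elements of the vector are selected and concatenated, in that order, in the result.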
x <-seq(1,15)
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
x[1:10]
## [1] 1 2 3 4 5 6 7 8 9 10
y<-seq(16,30)
c(x,y)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
c("x","y")[rep(c(1,2,2,1),times=4)]
## [1] "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x"
A vector of negative integral quantities. Such an index vector specifies the values to be excluded rather than included.
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
y <- x[-(1:5)]
y
## [1] 6 7 8 9 10 11 12 13 14 15
A vector of character strings. This possibility applies only where an object has a names attribute to identify its components.
fruit <- c(5, 10, 1, 20)
names(fruit) <- c("orange", "banana", "apple", "peach")
fruit
## orange banana apple peach
## 5 10 1 20
lunch <- fruit[c("apple","orange")]
lunch
## apple orange
## 1 5
The entities R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as “atomic” structures since their components are all of the same type, or mode, namely numeric, complex, logical, character and raw.
Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw.
R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. Lists are known as recursive rather than atomic structures since their components can themselves be lists in their own right.
The other recursive structures are those of mode function and expression.
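One consequence of the single-mode rule is that c() silently coerces mixed inputs to a common mode, whereas a list keeps each component as it is. A short sketch with made-up values:

```r
mixed <- c(1, TRUE, "a")      # the common mode is character
mode(mixed)
# "character"
mixed
# "1"    "TRUE" "a"

nums <- c(1.5, TRUE, FALSE)   # logical values become 1 and 0
mode(nums)
# "numeric"

rec <- list(1, "a", TRUE)     # a list keeps each component's own mode
sapply(rec, mode)
# "numeric"   "character" "logical"
```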
z <-0:9
mode(z)
## [1] "numeric"
length(z)
## [1] 10
digits <- as.character(z)
mode(digits)
## [1] "character"
length(digits)
## [1] 10
d <- as.integer(digits)
mode(d)
## [1] "numeric"
length(d)
## [1] 10
e<-numeric() # An "empty" object
mode(e)
## [1] "numeric"
length(e)
## [1] 0
e[3] <-17
mode(e)
## [1] "numeric"
length(e)
## [1] 3
alpha <- 2*1:5
alpha
## [1] 2 4 6 8 10
length(alpha) <- 3 # truncate the size of alpha
alpha
## [1] 2 4 6
Getting and setting attributes
The function attributes(object) returns a list of all the non-intrinsic attributes currently defined for that object.
The function attr(object, name) can be used to select a specific attribute.
z <-1:100
z
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
attr(z,"dim") <-c(10,10)
z
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 11 21 31 41 51 61 71 81 91
## [2,] 2 12 22 32 42 52 62 72 82 92
## [3,] 3 13 23 33 43 53 63 73 83 93
## [4,] 4 14 24 34 44 54 64 74 84 94
## [5,] 5 15 25 35 45 55 65 75 85 95
## [6,] 6 16 26 36 46 56 66 76 86 96
## [7,] 7 17 27 37 47 57 67 77 87 97
## [8,] 8 18 28 38 48 58 68 78 88 98
## [9,] 9 19 29 39 49 59 69 79 89 99
## [10,] 10 20 30 40 50 60 70 80 90 100
attributes(z)
## $dim
## [1] 10 10
All objects in R have a class, reported by the function class. For simple vectors this is just the mode, for example “numeric”, “logical”, “character” or “list”, but “matrix”, “array”, “factor” and “data.frame” are other possible values.
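A quick way to get a feel for classes is to query a few small objects. The sketch below uses made-up objects v, m, f and df. Because class() can return more than one value (recent versions of R report both "matrix" and "array" for a matrix), inherits() is used for the matrix test.

```r
v  <- c(1.5, 2.5)
m  <- matrix(1:4, nrow = 2)
f  <- factor(c("a", "b"))
df <- data.frame(n = 1:2)

class(v)                     # "numeric"
class(f)                     # "factor"
class(df)                    # "data.frame"

# inherits() tests membership without assuming class() has length one
inherits(m, "matrix")        # TRUE
inherits(df, "data.frame")   # TRUE

# unclass() strips the class and exposes the underlying representation
unclass(f)                   # integer codes 1 2 with a "levels" attribute
```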
z <-as.data.frame(z)
z
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1 1 11 21 31 41 51 61 71 81 91
## 2 2 12 22 32 42 52 62 72 82 92
## 3 3 13 23 33 43 53 63 73 83 93
## 4 4 14 24 34 44 54 64 74 84 94
## 5 5 15 25 35 45 55 65 75 85 95
## 6 6 16 26 36 46 56 66 76 86 96
## 7 7 17 27 37 47 57 67 77 87 97
## 8 8 18 28 38 48 58 68 78 88 98
## 9 9 19 29 39 49 59 69 79 89 99
## 10 10 20 30 40 50 60 70 80 90 100
unclass(z)
## $V1
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $V2
## [1] 11 12 13 14 15 16 17 18 19 20
##
## $V3
## [1] 21 22 23 24 25 26 27 28 29 30
##
## $V4
## [1] 31 32 33 34 35 36 37 38 39 40
##
## $V5
## [1] 41 42 43 44 45 46 47 48 49 50
##
## $V6
## [1] 51 52 53 54 55 56 57 58 59 60
##
## $V7
## [1] 61 62 63 64 65 66 67 68 69 70
##
## $V8
## [1] 71 72 73 74 75 76 77 78 79 80
##
## $V9
## [1] 81 82 83 84 85 86 87 88 89 90
##
## $V10
## [1] 91 92 93 94 95 96 97 98 99 100
##
## attr(,"row.names")
## [1] 1 2 3 4 5 6 7 8 9 10
z$V1
## [1] 1 2 3 4 5 6 7 8 9 10
Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia and their individual state of origin is specified by a character vector of state mnemonics as
state <-c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
"qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
"sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
"sa", "act", "nsw", "vic", "vic", "act")
length(state)
## [1] 30
statef <-factor(state)
statef
## [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa tas sa nt wa
## [20] vic qld nsw nsw wa sa act nsw vic vic act
## Levels: act nsw nt qld sa tas vic wa
levels(statef)
## [1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa"
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
59, 46, 58, 43)
incrmean <-tapply(incomes, statef, mean)
incrmean
## act nsw nt qld sa tas vic wa
## 44.50000 57.33333 55.50000 53.60000 55.00000 60.50000 56.00000 52.25000
The function tapply() is used to apply a function, here mean(), to each group of components of the first argument, here incomes, defined by the levels of the second component, here statef, as if they were separate vector structures.
stdError <- function(x) sqrt(var(x)/length(x))
incrster <-tapply(incomes, statef, stdError)
incrster
## act nsw nt qld sa tas vic wa
## 1.500000 4.310195 4.500000 4.106093 2.738613 0.500000 5.244044 2.657536
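What tapply() does can be spelled out as "split, then apply". The sketch below uses made-up data (score and group are hypothetical, independent of the accountants example) and compares tapply() with an explicit split() followed by sapply(). The two are roughly equivalent for a simple case like this, though the returned objects differ slightly in type.

```r
# Made-up data, independent of the accountants example
score <- c(10, 20, 30, 40, 80)
group <- factor(c("a", "b", "a", "b", "a"))

by_tapply <- tapply(score, group, mean)

# Split into a list of vectors, one per level, then apply the function
by_split <- sapply(split(score, group), mean)

by_tapply
#  a  b
# 40 30
```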
Ordered factors
The levels of factors are stored in alphabetical order, or in the order they were specified to factor if they were specified explicitly.
ordered(state)
## [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa tas sa nt wa
## [20] vic qld nsw nsw wa sa act nsw vic vic act
## Levels: act < nsw < nt < qld < sa < tas < vic < wa
ordered(state, c("wa", "vic", "tas","sa","qld", "nt","nsw", "act"))
## [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa tas sa nt wa
## [20] vic qld nsw nsw wa sa act nsw vic vic act
## Levels: wa < vic < tas < sa < qld < nt < nsw < act
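The practical effect of an ordering is that comparisons become meaningful. A minimal sketch, using a made-up factor of sizes rather than the state data:

```r
sizes <- ordered(c("small", "large", "medium", "small"),
                 levels = c("small", "medium", "large"))

below_large <- sizes < "large"   # comparisons follow the level order
below_large
# TRUE FALSE  TRUE  TRUE

codes <- as.integer(sizes)       # underlying integer codes follow the level order
codes
# 1 3 2 1

max(sizes)                       # the highest level present: large
```

The same comparison on an unordered factor is not meaningful and yields NA with a warning.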
Array
An array can be considered as a multiply subscripted collection of data entries, for example numeric. R allows simple facilities for creating and handling arrays, and in particular the special case of matrices.
A dimension vector is a vector of non-negative integers. If its length is k then the array is k-dimensional, e.g. a matrix is a 2-dimensional array. The dimensions are indexed from one up to the values given in the dimension vector.
z <-1:1500
dim(z) <- c(3,5,100)
class(z)
## [1] "array"
dim(z)
## [1] 3 5 100
z[,1,1]
## [1] 1 2 3
z[1,,1]
## [1] 1 4 7 10 13
z[1,1,]
## [1] 1 16 31 46 61 76 91 106 121 136 151 166 181 196 211
## [16] 226 241 256 271 286 301 316 331 346 361 376 391 406 421 436
## [31] 451 466 481 496 511 526 541 556 571 586 601 616 631 646 661
## [46] 676 691 706 721 736 751 766 781 796 811 826 841 856 871 886
## [61] 901 916 931 946 961 976 991 1006 1021 1036 1051 1066 1081 1096 1111
## [76] 1126 1141 1156 1171 1186 1201 1216 1231 1246 1261 1276 1291 1306 1321 1336
## [91] 1351 1366 1381 1396 1411 1426 1441 1456 1471 1486
z[,,2]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 16 19 22 25 28
## [2,] 17 20 23 26 29
## [3,] 18 21 24 27 30
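The subscripts above work because the data vector is laid out with the first subscript moving fastest (column-major order). A sketch of the arithmetic, for the same dimension vector c(3,5,100); the helper lin_index() is defined here purely for illustration:

```r
dims <- c(3, 5, 100)
a <- array(1:1500, dim = dims)

# Position of element [i, j, k] in the underlying data vector
lin_index <- function(i, j, k) {
  i + (j - 1) * dims[1] + (k - 1) * dims[1] * dims[2]
}

lin_index(2, 4, 2)   # 2 + 3*3 + 1*15 = 26
a[2, 4, 2]           # 26, because the data are 1:1500
```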
x <- array(1:20, dim=c(4,5))
x
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
i <- array(c(1:3,3:1), dim=c(3,2))
i
## [,1] [,2]
## [1,] 1 3
## [2,] 2 2
## [3,] 3 1
x[i]
## [1] 9 6 3
x[i]<-0
x
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 5 0 13 17
## [2,] 2 0 10 14 18
## [3,] 0 7 11 15 19
## [4,] 4 8 12 16 20
xb <-matrix(c(1:20),4,5)
xb
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
h <- 1:24
Z <- array(h, dim=c(3,4,2))
Z
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 13 16 19 22
## [2,] 14 17 20 23
## [3,] 15 18 21 24
X <- array(1, c(3,4,2))
X
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 1 1 1 1
## [3,] 1 1 1 1
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 1 1 1 1
## [3,] 1 1 1 1
D <- 2*Z*X+Z+1
D
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 4 13 22 31
## [2,] 7 16 25 34
## [3,] 10 19 28 37
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 40 49 58 67
## [2,] 43 52 61 70
## [3,] 46 55 64 73
DD <- aperm(D, c(2,1,3))
DD
## , , 1
##
## [,1] [,2] [,3]
## [1,] 4 7 10
## [2,] 13 16 19
## [3,] 22 25 28
## [4,] 31 34 37
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 40 43 46
## [2,] 49 52 55
## [3,] 58 61 64
## [4,] 67 70 73
t(D[,,1])
## [,1] [,2] [,3]
## [1,] 4 7 10
## [2,] 13 16 19
## [3,] 22 25 28
## [4,] 31 34 37
t(D[,,2])
## [,1] [,2] [,3]
## [1,] 40 43 46
## [2,] 49 52 55
## [3,] 58 61 64
## [4,] 67 70 73
cX <- cbind(D[,,1],D[,,2])
cX
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 4 13 22 31 40 49 58 67
## [2,] 7 16 25 34 43 52 61 70
## [3,] 10 19 28 37 46 55 64 73
rX <- rbind(D[,, 1],D[,,2])
rX
## [,1] [,2] [,3] [,4]
## [1,] 4 13 22 31
## [2,] 7 16 25 34
## [3,] 10 19 28 37
## [4,] 40 49 58 67
## [5,] 43 52 61 70
## [6,] 46 55 64 73
cbind(1, D[,,1], D[,,2])
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 1 4 13 22 31 40 49 58 67
## [2,] 1 7 16 25 34 43 52 61 70
## [3,] 1 10 19 28 37 46 55 64 73
The concatenation function, c(), with arrays
vec1 <-as.vector(1:10)
vec1
## [1] 1 2 3 4 5 6 7 8 9 10
class(vec1)
## [1] "integer"
vec2 <-c(1:10)
vec2
## [1] 1 2 3 4 5 6 7 8 9 10
class(vec2)
## [1] "integer"
Frequency tables from factors
statefr <- table(statef)
statefr
## statef
## act nsw nt qld sa tas vic wa
## 2 6 2 5 4 2 5 4
statefr <- tapply(statef, statef, length)
statefr
## act nsw nt qld sa tas vic wa
## 2 6 2 5 4 2 5 4
factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef
table(incomef,statef)
## statef
## incomef act nsw nt qld sa tas vic wa
## (35,45] 1 1 0 1 0 0 1 0
## (45,55] 1 1 1 1 2 0 1 3
## (55,65] 0 3 1 3 2 2 2 1
## (65,75] 0 1 0 0 0 0 1 0
Lst <- list(name="Fred", wife="Mary", no.children=3,
child.ages=c(4,7,9))
length(Lst)
## [1] 4
Lst
## $name
## [1] "Fred"
##
## $wife
## [1] "Mary"
##
## $no.children
## [1] 3
##
## $child.ages
## [1] 4 7 9
Lst$name
## [1] "Fred"
Lst$wife
## [1] "Mary"
Lst$no.children
## [1] 3
Lst$child.ages
## [1] 4 7 9
Lst[[1]]
## [1] "Fred"
Lst[[2]]
## [1] "Mary"
Lst[[3]]
## [1] 3
Lst[[4]]
## [1] 4 7 9
Child_name <-c("Eric", "Chile", "Mary")
Lst_new <- c(Lst, Child_name)
length(Lst_new)
## [1] 7
Lst_new
## $name
## [1] "Fred"
##
## $wife
## [1] "Mary"
##
## $no.children
## [1] 3
##
## $child.ages
## [1] 4 7 9
##
## [[5]]
## [1] "Eric"
##
## [[6]]
## [1] "Chile"
##
## [[7]]
## [1] "Mary"
Lst1 <-list(Child_name=Child_name)
Lst <- c(Lst, Lst1)
length(Lst)
## [1] 5
Lst
## $name
## [1] "Fred"
##
## $wife
## [1] "Mary"
##
## $no.children
## [1] 3
##
## $child.ages
## [1] 4 7 9
##
## $Child_name
## [1] "Eric" "Chile" "Mary"
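The examples above use [[ ]] and $ to extract components. Single brackets behave differently: they return a sublist. A minimal sketch with a small made-up list (person is a hypothetical name):

```r
person <- list(name = "Fred", child.ages = c(4, 7, 9))  # illustrative list

sub <- person[2]            # single brackets: a list of length 1
class(sub)                  # "list"

ages <- person[[2]]         # double brackets: the component itself
class(ages)                 # "numeric"
ages[1]                     # 4

person[["child.ages"]][3]   # components can also be picked by name: 9
```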
A data frame is a list with class “data.frame”. There are restrictions on lists that may be made into data frames, namely
The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames.
Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.
Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.
A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions.
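The matrix-like view can be sketched with a small made-up data frame whose columns have different modes. The names df, id, label and ok are hypothetical; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version's default.

```r
# Hypothetical data, not the accountants example
df <- data.frame(id = 1:3,
                 label = c("a", "b", "c"),
                 ok = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

col_modes <- sapply(df, mode)   # one mode per column
col_modes

dim(df)                         # 3 rows, 3 columns
df[2, ]                         # second row, still a data frame
df[, "label"]                   # one column, returned as a vector
df[df$ok, c("id", "label")]     # logical row index, two columns
```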
accountants <- data.frame(home=statef, loot=incomes, shot=incomef)
accountants
## home loot shot
## 1 tas 60 (55,65]
## 2 sa 49 (45,55]
## 3 qld 40 (35,45]
## 4 nsw 61 (55,65]
## 5 nsw 64 (55,65]
## 6 nt 60 (55,65]
## 7 wa 59 (55,65]
## 8 wa 54 (45,55]
## 9 qld 62 (55,65]
## 10 vic 69 (65,75]
## 11 nsw 70 (65,75]
## 12 vic 42 (35,45]
## 13 qld 56 (55,65]
## 14 qld 61 (55,65]
## 15 sa 61 (55,65]
## 16 tas 61 (55,65]
## 17 sa 58 (55,65]
## 18 nt 51 (45,55]
## 19 wa 48 (45,55]
## 20 vic 65 (55,65]
## 21 qld 49 (45,55]
## 22 nsw 49 (45,55]
## 23 nsw 41 (35,45]
## 24 wa 48 (45,55]
## 25 sa 52 (45,55]
## 26 act 46 (45,55]
## 27 nsw 59 (55,65]
## 28 vic 46 (45,55]
## 29 vic 58 (55,65]
## 30 act 43 (35,45]
attach(accountants)
home
## [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa tas sa nt wa
## [20] vic qld nsw nsw wa sa act nsw vic vic act
## Levels: act nsw nt qld sa tas vic wa
loot
## [1] 60 49 40 61 64 60 59 54 62 69 70 42 56 61 61 61 58 51 48 65 49 49 41 48 52
## [26] 46 59 46 58 43
shot
## [1] (55,65] (45,55] (35,45] (55,65] (55,65] (55,65] (55,65] (45,55] (55,65]
## [10] (65,75] (65,75] (35,45] (55,65] (55,65] (55,65] (55,65] (55,65] (45,55]
## [19] (45,55] (55,65] (45,55] (45,55] (35,45] (45,55] (45,55] (45,55] (55,65]
## [28] (45,55] (55,65] (35,45]
## Levels: (35,45] (45,55] (55,65] (65,75]
detach()
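attach() puts the columns on the search path until detach() is called. One alternative, sketched here on a small stand-in data frame (staff and its values are made up), is with(), which makes the columns visible only for the duration of a single expression.

```r
# Small stand-in data frame (hypothetical values)
staff <- data.frame(home = c("nsw", "vic", "nsw"),
                    loot = c(60, 49, 61))

# Columns are visible only inside the with() call
avg <- with(staff, mean(loot))
avg

# The same computation with explicit $ access
mean(staff$loot)
```

This avoids the risk of a forgotten detach() or of attached columns being masked by workspace objects of the same name.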
Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible. There is a clear presumption by the designers of R that you will be able to modify your input files using other tools.
The read.csv() function
To read an entire data frame directly, the external file will normally have a special form.
The first line of the file should have a name for each variable in the data frame.
Each additional line of the file has as its first item a row label and the values for each variable.
BankData <- read.csv("BANK1.csv")
BankData[1:5, ]
## X Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary
## 1 1 1 3 1 92 69 Male 1 No 32.0
## 2 2 2 1 1 81 57 Female 1 No 39.1
## 3 3 3 1 1 83 60 Female 0 No 33.2
## 4 4 4 2 1 87 55 Female 7 No 30.6
## 5 5 5 3 1 92 67 Male 0 No 29.0
BankData <- read.csv("BANK1.csv", header=FALSE)
BankData[1:5, ]
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1 NA Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary
## 2 1 1 3 1 92 69 Male 1 No 32
## 3 2 2 1 1 81 57 Female 1 No 39.1
## 4 3 3 1 1 83 60 Female 0 No 33.2
## 5 4 4 2 1 87 55 Female 7 No 30.6
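Since BANK1.csv is not distributed with these notes, here is a self-contained sketch that writes a tiny file of the same shape (an unnamed first column of row labels; the contents are made up) and reads it back. It illustrates where the X column above comes from and how the row.names argument of read.csv() turns that column into row labels instead.

```r
# Write a small CSV to a temporary file (contents are made up for illustration)
path <- tempfile(fileext = ".csv")
writeLines(c('"","Employee","Salary"',
             '"1",1,32.0',
             '"2",2,39.1'), path)

with_x <- read.csv(path)                   # the unnamed first column is read as "X"
names(with_x)
# "X" "Employee" "Salary"

labelled <- read.csv(path, row.names = 1)  # use the first column as row labels
names(labelled)
# "Employee" "Salary"

unlink(path)                               # remove the temporary file
```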
Accessing built-in datasets
data()
data(infert)
Loading data from other R packages
data(package="rpart")
data(Puromycin, package="datasets")
attributes(Puromycin)
## $names
## [1] "conc" "rate" "state"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
##
## $class
## [1] "data.frame"
##
## $reference
## [1] "A1.3, p. 269"
Puromycin
## conc rate state
## 1 0.02 76 treated
## 2 0.02 47 treated
## 3 0.06 97 treated
## 4 0.06 107 treated
## 5 0.11 123 treated
## 6 0.11 139 treated
## 7 0.22 159 treated
## 8 0.22 152 treated
## 9 0.56 191 treated
## 10 0.56 201 treated
## 11 1.10 207 treated
## 12 1.10 200 treated
## 13 0.02 67 untreated
## 14 0.02 51 untreated
## 15 0.06 84 untreated
## 16 0.06 86 untreated
## 17 0.11 98 untreated
## 18 0.11 115 untreated
## 19 0.22 131 untreated
## 20 0.22 124 untreated
## 21 0.56 144 untreated
## 22 0.56 158 untreated
## 23 1.10 160 untreated
edit(Puromycin)
## conc rate state
## 1 0.02 76 treated
## 2 0.02 47 treated
## 3 0.06 97 treated
## 4 0.06 107 treated
## 5 0.11 123 treated
## 6 0.11 139 treated
## 7 0.22 159 treated
## 8 0.22 152 treated
## 9 0.56 191 treated
## 10 0.56 201 treated
## 11 1.10 207 treated
## 12 1.10 200 treated
## 13 0.02 67 untreated
## 14 0.02 51 untreated
## 15 0.06 84 untreated
## 16 0.06 86 untreated
## 17 0.11 98 untreated
## 18 0.11 115 untreated
## 19 0.22 131 untreated
## 20 0.22 124 untreated
## 21 0.56 144 untreated
## 22 0.56 158 untreated
## 23 2.20 160 untreated
All R functions and datasets are stored in packages. Only when a package is loaded are its contents available. This is done both for efficiency (the full list would take more memory and would take longer to search than a subset), and to aid package developers, who are protected from name clashes with other code.
library()
## Warning in library(): libraries '/usr/local/lib/R/site-library', '/usr/lib/R/
## site-library' contain no packages
library(boot) # to load a particular package
search() # to see which packages are currently loaded
## [1] ".GlobalEnv" "package:boot" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" "package:methods" "Autoloads"
## [10] "package:base"
# Some packages may be loaded but not available on the search list; these are included in the list given by
loadedNamespaces()
## [1] "grDevices" "digest" "R6" "jsonlite" "magrittr" "evaluate"
## [7] "datasets" "stringi" "rlang" "cachem" "utils" "cli"
## [13] "jquerylib" "bslib" "graphics" "boot" "rmarkdown" "base"
## [19] "tools" "stringr" "xfun" "yaml" "fastmap" "compiler"
## [25] "stats" "htmltools" "knitr" "methods" "sass"
Standard packages
The standard (or base) packages are considered part of the R source code. They contain the basic functions that allow R to work, and the datasets and standard statistical and graphical functions that are described in this manual. They should be automatically available in any R installation.
Contributed packages and CRAN
There are thousands of contributed packages for R, written by many different authors.
Some of these packages implement specialized statistical methods, others give access to data or hardware, and others are designed to complement textbooks.
Some (the recommended packages) are distributed with every binary distribution of R. Most are available for download from CRAN (https://CRAN.R-project.org/ and its mirrors) and other repositories such as Bioconductor (https://www.bioconductor.org/).
The R FAQ contains a list of CRAN packages current at the time of release, but the collection of available packages changes very frequently.
Namespace
Packages have namespaces, which do three things:
they allow the package writer to hide functions and data that are meant only for internal use,
they prevent functions from breaking when a user (or other package writer) picks a name that clashes with one in the package,
and they provide a way to refer to an object within a particular package.
The double-colon operator :: selects definitions from a particular namespace.
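A minimal sketch of the double-colon operator in use: the function sd() from the stats package is deliberately masked by a user-defined object of the same name (the masking function is made up for illustration), and the qualified name still reaches the original.

```r
# Call a function through its namespace without relying on the search path
stats::sd(c(1, 2, 3, 4))

# A user-defined object can mask a package function of the same name...
sd <- function(x) "masked"
sd(c(1, 2, 3, 4))          # "masked"

# ...but the qualified name still reaches the original
stats::sd(c(1, 2, 3, 4))   # about 1.290994

rm(sd)                     # remove the mask again
```

The same form works for packages that are installed but not attached with library().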