1. Introduction and preliminaries

 

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

 

R and statistics

It is an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. There are about 25 packages supplied with R (called “standard” and “recommended” packages) and many more are available through the CRAN family of Internet sites (via https://CRAN.R-project.org) and elsewhere.

 

Using R interactively

help(solve)

?solve

help("[[")

help.start()
## starting httpd help server ... done
## If the browser launched by 'xdg-open' is already running, it is *not*
##     restarted, and you must switch to its window.
## Otherwise, be patient ...

 

??solve

help.search("solve")

 

Data permanency and removing objects

x <- 5

y <- 7

z <-seq(1,10)

z
##  [1]  1  2  3  4  5  6  7  8  9 10
objects()
## [1] "x" "y" "z"
ls()
## [1] "x" "y" "z"

The function objects() (or, equivalently, ls()) can be used to display the names of (most of) the objects which are currently stored within R. The collection of objects currently stored is called the workspace.

 

rm(x, y, z)

ls()
## character(0)
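
The same create, list and remove cycle can be sketched for a single object. The name demo_value below is made up for illustration and does not appear in the notes above.

```r
# demo_value is a hypothetical object name used only for this illustration
demo_value <- 42
present_before <- "demo_value" %in% ls()  # is the name listed in the workspace?
rm(demo_value)                            # remove just this one object
present_after <- "demo_value" %in% ls()
present_before
present_after
```

To clear every object at once, rm(list = ls()) can be used, so it is worth checking ls() first.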

 

2. Simple manipulations; numbers and vectors

 

x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

x
## [1] 10.4  5.6  3.1  6.4 21.7
assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))

x
## [1] 10.4  5.6  3.1  6.4 21.7
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x

x
## [1] 10.4  5.6  3.1  6.4 21.7
1/x
## [1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295
y <-c(x,0,x)

ls()
## [1] "x" "y"
x
## [1] 10.4  5.6  3.1  6.4 21.7
y
##  [1] 10.4  5.6  3.1  6.4 21.7  0.0 10.4  5.6  3.1  6.4 21.7

 

v <- 2*x+y+1
## Warning in 2 * x + y: longer object length is not a multiple of shorter object
## length
v
##  [1] 32.2 17.8 10.3 20.2 66.1 21.8 22.6 12.8 16.9 50.8 43.5

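The assignment v <- 2*x + y + 1 generates a new vector v of length 11 constructed by adding together, element by element, 2*x repeated 2.2 times, y repeated just once, and 1 repeated 11 times. Because 11 is not a multiple of 5 (the length of x), R issues the warning shown above.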
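
A minimal sketch of the recycling rule, using two small vectors (a and b are illustrative names, not objects from the session above). Here the longer length is an exact multiple of the shorter one, so R recycles silently without a warning.

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)

# b is recycled three times to match the length of a
recycled_sum <- a + b

# Writing the recycling out explicitly gives the same result
explicit_sum <- a + rep_len(b, length(a))

recycled_sum                           # 11 22 13 24 15 26
identical(recycled_sum, explicit_sum)  # TRUE
```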

x/y
## Warning in x/y: longer object length is not a multiple of shorter object length
##  [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000       Inf 0.5384615
##  [8] 0.5535714 2.0645161 3.3906250 0.4792627
x*y
## Warning in x * y: longer object length is not a multiple of shorter object
## length
##  [1] 108.16  31.36   9.61  40.96 470.89   0.00  58.24  17.36  19.84 138.88
## [11] 225.68
x-y
## Warning in x - y: longer object length is not a multiple of shorter object
## length
##  [1]   0.0   0.0   0.0   0.0   0.0  10.4  -4.8  -2.5   3.3  15.3 -11.3
log(x)
## [1] 2.341806 1.722767 1.131402 1.856298 3.077312
exp(x)
## [1] 3.285963e+04 2.704264e+02 2.219795e+01 6.018450e+02 2.655769e+09
sin(x)
## [1] -0.82782647 -0.63126664  0.04158066  0.11654920  0.28705265
cos(x)
## [1] -0.5609843  0.7755659 -0.9991352  0.9931849 -0.9579148
tan(x)
## [1]  1.47566791 -0.81394328 -0.04161665  0.11734895 -0.29966407
sqrt(x)
## [1] 3.224903 2.366432 1.760682 2.529822 4.658326
min(x)
## [1] 3.1
max(x)
## [1] 21.7
sum(x)
## [1] 47.2
range(x)
## [1]  3.1 21.7
length(x)
## [1] 5

 

Two statistical functions are mean(x), which calculates the sample mean and is the same as sum(x)/length(x), and var(x), which gives the sample variance sum((x-mean(x))^2)/(length(x)-1). The standard deviation sd(x) is the same as sqrt(var(x)).

mean(x)
## [1] 9.44
var(x)
## [1] 53.853
sd(x)
## [1] 7.33846
sqrt(var(x))
## [1] 7.33846
sum(x)
## [1] 47.2
length(x)
## [1] 5
sum(x)/length(x)
## [1] 9.44
sum((x-mean(x))^2)/(length(x)-1)
## [1] 53.853

 

To work with complex numbers, supply an explicit complex part. Thus

sqrt(-17)
## Warning in sqrt(-17): NaNs produced
## [1] NaN
sqrt(-17+0i)
## [1] 0+4.123106i
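
As a small sketch, as.complex() is another way to supply the complex part, and Re(), Im() and Mod() take the result apart. The name root is chosen here only for illustration.

```r
root <- sqrt(as.complex(-17))  # as.complex() turns -17 into -17+0i

Re(root)   # real part
Im(root)   # imaginary part, equal to sqrt(17)
Mod(root)  # modulus, also sqrt(17) here
```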

 

x <-1:30

x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
y <-2*1:15

y
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30
x1 <-seq(1,30)


x1
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
seq(from=30, to=1)
##  [1] 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6
## [26]  5  4  3  2  1
seq(-5,5, by=.2)
##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
## [16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
## [31]  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6  3.8
## [46]  4.0  4.2  4.4  4.6  4.8  5.0
seq(length=51, from=-5, by=.2)
##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
## [16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
## [31]  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6  3.8
## [46]  4.0  4.2  4.4  4.6  4.8  5.0
rep(x,times=5)
##   [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
##  [26] 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
##  [51] 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
##  [76] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10
## [101] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5
## [126]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
rep(x, each=5)
##   [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5  5  5
##  [26]  6  6  6  6  6  7  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10 10 10 10 10
##  [51] 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 14 14 15 15 15 15 15
##  [76] 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18 19 19 19 19 19 20 20 20 20 20
## [101] 21 21 21 21 21 22 22 22 22 22 23 23 23 23 23 24 24 24 24 24 25 25 25 25 25
## [126] 26 26 26 26 26 27 27 27 27 27 28 28 28 28 28 29 29 29 29 29 30 30 30 30 30

 

x <-seq(2,30, by=2)-1

x
##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29
temp <- x>13

temp
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE
temp <- x>=13

temp
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE
temp <- x<13

temp
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
temp <- x<=13

temp
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
temp1 <- x==13

temp1
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
temp2 <- x!=13

temp2
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE
!temp1
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE
temp1|temp2
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
temp1&temp2
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE

 

In some cases the components of a vector may not be completely known. When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA.

z <- c(1:3,NA) 

ind <- is.na(z)

ind 
## [1] FALSE FALSE FALSE  TRUE
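
The reason is.na() is needed can be sketched as follows: comparing with NA using == never returns TRUE or FALSE, and most summaries return NA unless the missing values are dropped explicitly. The variable names other than z are illustrative.

```r
z <- c(1:3, NA)

eq_result  <- z == NA                # every comparison with NA is itself NA
n_missing  <- sum(is.na(z))          # count the missing entries
mean_all   <- mean(z)                # NA, because one value is unknown
mean_known <- mean(z, na.rm = TRUE)  # mean of 1, 2, 3

eq_result
n_missing
mean_all
mean_known
```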

 

Note that there is a second kind of “missing” value, produced by numerical computation: the so-called Not a Number (NaN) values. Examples are

0/0
## [1] NaN
Inf-Inf
## [1] NaN
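
A short sketch of how the two kinds of missing value are told apart: is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN. The vector vals is made up for illustration.

```r
vals <- c(1, NA, NaN)

na_flags  <- is.na(vals)   # FALSE TRUE TRUE
nan_flags <- is.nan(vals)  # FALSE FALSE TRUE

na_flags
nan_flags
```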

 

x <-c("x-value", "New iteration results")

x
## [1] "x-value"               "New iteration results"
x[1]
## [1] "x-value"
x[2]
## [1] "New iteration results"
length(x)
## [1] 2

The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings.

labs <- paste(c("X","Y"), 1:10, sep="")

labs
##  [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9"  "Y10"
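
A few further variations, as a sketch: sep sets the separator between arguments, paste0() is shorthand for sep="", and collapse joins the elements of the result into a single string. The object names are illustrative.

```r
joined     <- paste("a", "b", sep = "-")                  # "a-b"
axis_labs  <- paste0(c("X", "Y"), 1:4)                    # "X1" "Y2" "X3" "Y4"
one_string <- paste(c("X", "Y", "Z"), collapse = "+")     # "X+Y+Z"

joined
axis_labs
one_string
```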

 

 

A logical vector

x <- c(1:3,NA)

x
## [1]  1  2  3 NA
y<-x[!is.na(x)]

y
## [1] 1 2 3
x[is.na(x)] <- 4

x
## [1] 1 2 3 4
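
Logical index vectors can combine several conditions. The sketch below, on a made-up vector obs, keeps the values that are both non-missing and greater than 6.

```r
obs <- c(5, NA, 12, 7, NA, 20)

# FALSE & NA is FALSE, so the missing entries are safely excluded
keep <- !is.na(obs) & obs > 6

obs[keep]  # 12 7 20
```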

 

A vector of positive integral quantities.

x <-seq(1,15)
x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
x[1:10]
##  [1]  1  2  3  4  5  6  7  8  9 10
y<-seq(16,30)

c(x,y)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
c("x","y")[rep(c(1,2,2,1),times=4)]
##  [1] "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x"

 

A vector of negative integral quantities.

x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
y <- x[-(1:5)]

y
##  [1]  6  7  8  9 10 11 12 13 14 15

 

A vector of character strings.

fruit <- c(5, 10, 1, 20)

names(fruit) <- c("orange", "banana", "apple", "peach")

fruit
## orange banana  apple  peach 
##      5     10      1     20
lunch <- fruit[c("apple","orange")]

lunch
##  apple orange 
##      1      5

 

3. Objects, their modes and attributes

 

The entities R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as “atomic” structures since their components are all of the same type, or mode, namely numeric, complex, logical, character and raw.

z <-0:9

mode(z)
## [1] "numeric"
length(z)
## [1] 10
digits <- as.character(z)

mode(digits)
## [1] "character"
length(digits)
## [1] 10
d <- as.integer(digits)

mode(d)
## [1] "numeric"
length(d)
## [1] 10

 

e<-numeric()  # An "empty" object

mode(e)
## [1] "numeric"
length(e)
## [1] 0
e[3] <-17

mode(e)
## [1] "numeric"
length(e)
## [1] 3
alpha <- 2*1:5

alpha
## [1]  2  4  6  8 10
length(alpha) <- 3 # truncate the size of alpha

alpha
## [1] 2 4 6
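
What happens to the unassigned positions when an object grows can be sketched like this: they are filled with NA. The name short is illustrative.

```r
e <- numeric()
e[3] <- 17          # positions 1 and 2 are filled with NA
e                   # NA NA 17

short <- c(2, 4, 6)
length(short) <- 5  # extending the length also pads with NA
short               # 2 4 6 NA NA
```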

 

z <-1:100

z
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100
attr(z,"dim") <-c(10,10)

z
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1   11   21   31   41   51   61   71   81    91
##  [2,]    2   12   22   32   42   52   62   72   82    92
##  [3,]    3   13   23   33   43   53   63   73   83    93
##  [4,]    4   14   24   34   44   54   64   74   84    94
##  [5,]    5   15   25   35   45   55   65   75   85    95
##  [6,]    6   16   26   36   46   56   66   76   86    96
##  [7,]    7   17   27   37   47   57   67   77   87    97
##  [8,]    8   18   28   38   48   58   68   78   88    98
##  [9,]    9   19   29   39   49   59   69   79   89    99
## [10,]   10   20   30   40   50   60   70   80   90   100
attributes(z)
## $dim
## [1] 10 10
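
A smaller sketch of the same idea: setting the dim attribute is what turns a plain vector into a matrix, and removing it turns the matrix back into a vector. The names used here are illustrative.

```r
v <- 1:6
dim(v) <- c(2, 3)              # v is now a 2 x 3 matrix, filled column by column
was_matrix <- is.matrix(v)     # TRUE
second_row <- v[2, ]           # 2 4 6

dim(v) <- NULL                 # drop the attribute again
is_matrix_now <- is.matrix(v)  # FALSE
```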

 

All objects in R have a class, reported by the function class(). For simple vectors this is just the mode, for example “numeric”, “logical”, “character” or “list”, but “matrix”, “array”, “factor” and “data.frame” are other possible values.

z <-as.data.frame(z)

z
##    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1   1 11 21 31 41 51 61 71 81  91
## 2   2 12 22 32 42 52 62 72 82  92
## 3   3 13 23 33 43 53 63 73 83  93
## 4   4 14 24 34 44 54 64 74 84  94
## 5   5 15 25 35 45 55 65 75 85  95
## 6   6 16 26 36 46 56 66 76 86  96
## 7   7 17 27 37 47 57 67 77 87  97
## 8   8 18 28 38 48 58 68 78 88  98
## 9   9 19 29 39 49 59 69 79 89  99
## 10 10 20 30 40 50 60 70 80 90 100
unclass(z)
## $V1
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $V2
##  [1] 11 12 13 14 15 16 17 18 19 20
## 
## $V3
##  [1] 21 22 23 24 25 26 27 28 29 30
## 
## $V4
##  [1] 31 32 33 34 35 36 37 38 39 40
## 
## $V5
##  [1] 41 42 43 44 45 46 47 48 49 50
## 
## $V6
##  [1] 51 52 53 54 55 56 57 58 59 60
## 
## $V7
##  [1] 61 62 63 64 65 66 67 68 69 70
## 
## $V8
##  [1] 71 72 73 74 75 76 77 78 79 80
## 
## $V9
##  [1] 81 82 83 84 85 86 87 88 89 90
## 
## $V10
##  [1]  91  92  93  94  95  96  97  98  99 100
## 
## attr(,"row.names")
##  [1]  1  2  3  4  5  6  7  8  9 10
z$V1
##  [1]  1  2  3  4  5  6  7  8  9 10

 

4. Ordered and unordered factors

 

Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia and their individual state of origin is specified by a character vector of state mnemonics as

state <-c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
"qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
"sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
"sa", "act", "nsw", "vic", "vic", "act")

length(state)
## [1] 30
statef <-factor(state)


statef
##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: act nsw nt qld sa tas vic wa
levels(statef)
## [1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"

 

incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
59, 46, 58, 43)

incrmean <-tapply(incomes, statef, mean)

incrmean
##      act      nsw       nt      qld       sa      tas      vic       wa 
## 44.50000 57.33333 55.50000 53.60000 55.00000 60.50000 56.00000 52.25000

The function tapply() is used to apply a function, here mean(), to each group of components of the first argument, here incomes, defined by the levels of the second component, here statef, as if they were separate vector structures.
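
One way to picture what tapply() does is to split the values by group and apply the function to each piece. The sketch below uses a small made-up data set (pay and grp are hypothetical) and compares the two routes.

```r
pay <- c(10, 20, 30, 40, 80)
grp <- factor(c("a", "b", "a", "b", "a"))

by_tapply <- tapply(pay, grp, mean)         # group means in one call
by_split  <- sapply(split(pay, grp), mean)  # split into groups, then apply mean

by_tapply  # a = 40, b = 30
by_split   # same values
```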

stdError <- function(x) sqrt(var(x)/length(x))

incrster <-tapply(incomes, statef, stdError)

incrster
##      act      nsw       nt      qld       sa      tas      vic       wa 
## 1.500000 4.310195 4.500000 4.106093 2.738613 0.500000 5.244044 2.657536

Ordered factors

The levels of factors are stored in alphabetical order, or in the order they were specified to factor if they were specified explicitly.

ordered(state)
##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: act < nsw < nt < qld < sa < tas < vic < wa
ordered(state, c("wa", "vic", "tas","sa","qld", "nt","nsw", "act"))
##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: wa < vic < tas < sa < qld < nt < nsw < act
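
Unlike plain factors, ordered factors support comparisons based on the level order. A minimal sketch with made-up size labels:

```r
sizes <- ordered(c("small", "large", "medium"),
                 levels = c("small", "medium", "large"))

below_large <- sizes < "large"           # TRUE FALSE TRUE
codes       <- as.integer(sizes)         # positions in the level order: 1 3 2
largest     <- as.character(max(sizes))  # "large"
```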

 

5. Arrays and matrices

z <-1:1500

dim(z) <- c(3,5,100)

class(z)
## [1] "array"
dim(z)
## [1]   3   5 100
z[,1,1]
## [1] 1 2 3
z[1,,1]
## [1]  1  4  7 10 13
z[1,1,]
##   [1]    1   16   31   46   61   76   91  106  121  136  151  166  181  196  211
##  [16]  226  241  256  271  286  301  316  331  346  361  376  391  406  421  436
##  [31]  451  466  481  496  511  526  541  556  571  586  601  616  631  646  661
##  [46]  676  691  706  721  736  751  766  781  796  811  826  841  856  871  886
##  [61]  901  916  931  946  961  976  991 1006 1021 1036 1051 1066 1081 1096 1111
##  [76] 1126 1141 1156 1171 1186 1201 1216 1231 1246 1261 1276 1291 1306 1321 1336
##  [91] 1351 1366 1381 1396 1411 1426 1441 1456 1471 1486
z[,,2]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   16   19   22   25   28
## [2,]   17   20   23   26   29
## [3,]   18   21   24   27   30

 

x <- array(1:20, dim=c(4,5))

x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20
i <- array(c(1:3,3:1), dim=c(3,2))

i
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    2
## [3,]    3    1
x[i]
## [1] 9 6 3
x[i]<-0

x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    0   13   17
## [2,]    2    0   10   14   18
## [3,]    0    7   11   15   19
## [4,]    4    8   12   16   20
xb <-matrix(c(1:20),4,5)

xb
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20
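
In the index-matrix example above, each row of i is read as one (row, column) pair. The sketch below spells this out on a smaller matrix; m and idx are illustrative names.

```r
m <- matrix(1:12, nrow = 3)  # 3 x 4 matrix, filled column by column

# Each row of idx is a (row, column) pair: (1,4), (2,1), (3,2)
idx <- cbind(c(1, 2, 3), c(4, 1, 2))

picked <- m[idx]
picked  # 10 2 6, i.e. m[1,4], m[2,1], m[3,2]
```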

 

h <- 1:24

Z <- array(h, dim=c(3,4,2))

Z
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24
X <- array(1, c(3,4,2))

X
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1    1    1
## [3,]    1    1    1    1
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1    1    1
## [3,]    1    1    1    1
D <- 2*Z*X+Z+1

D
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    4   13   22   31
## [2,]    7   16   25   34
## [3,]   10   19   28   37
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   40   49   58   67
## [2,]   43   52   61   70
## [3,]   46   55   64   73
DD <- aperm(D, c(2,1,3))

DD
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    4    7   10
## [2,]   13   16   19
## [3,]   22   25   28
## [4,]   31   34   37
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   40   43   46
## [2,]   49   52   55
## [3,]   58   61   64
## [4,]   67   70   73
t(D[,,1])
##      [,1] [,2] [,3]
## [1,]    4    7   10
## [2,]   13   16   19
## [3,]   22   25   28
## [4,]   31   34   37
t(D[,,2])
##      [,1] [,2] [,3]
## [1,]   40   43   46
## [2,]   49   52   55
## [3,]   58   61   64
## [4,]   67   70   73
cX <- cbind(D[,,1],D[,,2])

cX
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    4   13   22   31   40   49   58   67
## [2,]    7   16   25   34   43   52   61   70
## [3,]   10   19   28   37   46   55   64   73
rX <- rbind(D[,, 1],D[,,2])

rX
##      [,1] [,2] [,3] [,4]
## [1,]    4   13   22   31
## [2,]    7   16   25   34
## [3,]   10   19   28   37
## [4,]   40   49   58   67
## [5,]   43   52   61   70
## [6,]   46   55   64   73
cbind(1, D[,,1], D[,,2])
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    1    4   13   22   31   40   49   58   67
## [2,]    1    7   16   25   34   43   52   61   70
## [3,]    1   10   19   28   37   46   55   64   73

 

The concatenation function, c(), with arrays

vec1 <-as.vector(1:10)

vec1
##  [1]  1  2  3  4  5  6  7  8  9 10
class(vec1)
## [1] "integer"
vec2 <-c(1:10)

vec2 
##  [1]  1  2  3  4  5  6  7  8  9 10
class(vec2)
## [1] "integer"

Frequency tables from factors

statefr <- table(statef)

statefr
## statef
## act nsw  nt qld  sa tas vic  wa 
##   2   6   2   5   4   2   5   4
statefr <- tapply(statef, statef, length)

statefr 
## act nsw  nt qld  sa tas vic  wa 
##   2   6   2   5   4   2   5   4
factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef

table(incomef,statef)
##          statef
## incomef   act nsw nt qld sa tas vic wa
##   (35,45]   1   1  0   1  0   0   1  0
##   (45,55]   1   1  1   1  2   0   1  3
##   (55,65]   0   3  1   3  2   2   2  1
##   (65,75]   0   1  0   0  0   0   1  0
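
By default cut() builds intervals that are open on the left and closed on the right, which is why an income of exactly 65 falls in (55,65] above. A small sketch with made-up values:

```r
ages <- c(35.5, 45, 45.1, 55)
bins <- cut(ages, breaks = c(35, 45, 55))

bin_labels <- as.character(bins)  # "(35,45]" "(35,45]" "(45,55]" "(45,55]"
bin_counts <- table(bins)         # two values in each interval

bin_labels
bin_counts
```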

 

6. Lists and data frames

 

Lst <- list(name="Fred", wife="Mary", no.children=3,
child.ages=c(4,7,9))

length(Lst)
## [1] 4
Lst
## $name
## [1] "Fred"
## 
## $wife
## [1] "Mary"
## 
## $no.children
## [1] 3
## 
## $child.ages
## [1] 4 7 9
Lst$name
## [1] "Fred"
Lst$wife
## [1] "Mary"
Lst$no.children
## [1] 3
Lst$child.ages
## [1] 4 7 9
Lst[[1]]
## [1] "Fred"
Lst[[2]]
## [1] "Mary"
Lst[[3]]
## [1] 3
Lst[[4]]
## [1] 4 7 9
Child_name <-c("Eric", "Chile", "Mary")

Lst_new <- c(Lst, Child_name)

length(Lst_new)
## [1] 7
Lst_new
## $name
## [1] "Fred"
## 
## $wife
## [1] "Mary"
## 
## $no.children
## [1] 3
## 
## $child.ages
## [1] 4 7 9
## 
## [[5]]
## [1] "Eric"
## 
## [[6]]
## [1] "Chile"
## 
## [[7]]
## [1] "Mary"
Lst1 <-list(Child_name=Child_name)

Lst <- c(Lst, Lst1)

length(Lst)
## [1] 5
Lst
## $name
## [1] "Fred"
## 
## $wife
## [1] "Mary"
## 
## $no.children
## [1] 3
## 
## $child.ages
## [1] 4 7 9
## 
## $Child_name
## [1] "Eric"  "Chile" "Mary"
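
The difference between single and double brackets on a list can be sketched as follows: [ ] returns a sublist, while [[ ]] (like $) returns the component itself. The list rec is a made-up example.

```r
rec <- list(name = "Fred", child.ages = c(4, 7, 9))

sub_list <- rec["child.ages"]    # still a list, of length 1
ages     <- rec[["child.ages"]]  # the numeric vector itself

class(sub_list)  # "list"
class(ages)      # "numeric"
```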

A data frame is a list with class “data.frame”. There are restrictions on lists that may be made into data frames.

A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions.

accountants <- data.frame(home=statef, loot=incomes, shot=incomef)

accountants
##    home loot    shot
## 1   tas   60 (55,65]
## 2    sa   49 (45,55]
## 3   qld   40 (35,45]
## 4   nsw   61 (55,65]
## 5   nsw   64 (55,65]
## 6    nt   60 (55,65]
## 7    wa   59 (55,65]
## 8    wa   54 (45,55]
## 9   qld   62 (55,65]
## 10  vic   69 (65,75]
## 11  nsw   70 (65,75]
## 12  vic   42 (35,45]
## 13  qld   56 (55,65]
## 14  qld   61 (55,65]
## 15   sa   61 (55,65]
## 16  tas   61 (55,65]
## 17   sa   58 (55,65]
## 18   nt   51 (45,55]
## 19   wa   48 (45,55]
## 20  vic   65 (55,65]
## 21  qld   49 (45,55]
## 22  nsw   49 (45,55]
## 23  nsw   41 (35,45]
## 24   wa   48 (45,55]
## 25   sa   52 (45,55]
## 26  act   46 (45,55]
## 27  nsw   59 (55,65]
## 28  vic   46 (45,55]
## 29  vic   58 (55,65]
## 30  act   43 (35,45]
attach(accountants)

home
##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: act nsw nt qld sa tas vic wa
loot 
##  [1] 60 49 40 61 64 60 59 54 62 69 70 42 56 61 61 61 58 51 48 65 49 49 41 48 52
## [26] 46 59 46 58 43
shot 
##  [1] (55,65] (45,55] (35,45] (55,65] (55,65] (55,65] (55,65] (45,55] (55,65]
## [10] (65,75] (65,75] (35,45] (55,65] (55,65] (55,65] (55,65] (55,65] (45,55]
## [19] (45,55] (55,65] (45,55] (45,55] (35,45] (45,55] (45,55] (45,55] (55,65]
## [28] (45,55] (55,65] (35,45]
## Levels: (35,45] (45,55] (55,65] (65,75]
detach()
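
As an alternative to attach() and detach(), with() evaluates a single expression inside the data frame and leaves the search path unchanged. The sketch below uses a small made-up data frame, staff, not the accountants data.

```r
staff <- data.frame(home = c("nsw", "vic", "nsw"),
                    loot = c(60, 50, 70))

mean_loot <- with(staff, mean(loot))          # 60
nsw_loot  <- with(staff, loot[home == "nsw"]) # 60 70

mean_loot
nsw_loot
```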

 

7. Reading data from files

 

Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible. There is a clear presumption by the designers of R that you will be able to modify your input files using other tools.

 

The read.csv() function

To read an entire data frame directly, the external file will normally have a special form.

BankData <- read.csv("BANK1.csv")

BankData[1:5, ]
##   X Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary
## 1 1        1       3        1      92     69   Male        1    No   32.0
## 2 2        2       1        1      81     57 Female        1    No   39.1
## 3 3        3       1        1      83     60 Female        0    No   33.2
## 4 4        4       2        1      87     55 Female        7    No   30.6
## 5 5        5       3        1      92     67   Male        0    No   29.0
BankData <- read.csv("BANK1.csv", header=FALSE)

BankData[1:5, ]
##   V1       V2      V3       V4      V5     V6     V7       V8    V9    V10
## 1 NA Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary
## 2  1        1       3        1      92     69   Male        1    No     32
## 3  2        2       1        1      81     57 Female        1    No   39.1
## 4  3        3       1        1      83     60 Female        0    No   33.2
## 5  4        4       2        1      87     55 Female        7    No   30.6
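
The effect of the header argument can be reproduced without BANK1.csv. The sketch below writes a tiny made-up CSV to a temporary file and reads it both ways; the file contents and object names are assumptions for illustration only.

```r
# Write a small, hypothetical CSV file to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,9.5", "2,7.0"), csv_path)

with_header <- read.csv(csv_path)                  # first line becomes the column names
no_header   <- read.csv(csv_path, header = FALSE)  # first line is treated as data

names(with_header)  # "id" "score"
names(no_header)    # "V1" "V2"
nrow(with_header)   # 2
nrow(no_header)     # 3

unlink(csv_path)    # remove the temporary file
```

With header=FALSE the column names end up in the first data row, so every column is read as text rather than as numbers, as in the BankData output above.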

 

Accessing builtin datasets

data()

data(infert)

Loading data from other R packages

data(package="rpart")

data(Puromycin, package="datasets")

attributes(Puromycin)
## $names
## [1] "conc"  "rate"  "state"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## 
## $class
## [1] "data.frame"
## 
## $reference
## [1] "A1.3, p. 269"
Puromycin
##    conc rate     state
## 1  0.02   76   treated
## 2  0.02   47   treated
## 3  0.06   97   treated
## 4  0.06  107   treated
## 5  0.11  123   treated
## 6  0.11  139   treated
## 7  0.22  159   treated
## 8  0.22  152   treated
## 9  0.56  191   treated
## 10 0.56  201   treated
## 11 1.10  207   treated
## 12 1.10  200   treated
## 13 0.02   67 untreated
## 14 0.02   51 untreated
## 15 0.06   84 untreated
## 16 0.06   86 untreated
## 17 0.11   98 untreated
## 18 0.11  115 untreated
## 19 0.22  131 untreated
## 20 0.22  124 untreated
## 21 0.56  144 untreated
## 22 0.56  158 untreated
## 23 1.10  160 untreated
edit(Puromycin)
##    conc rate     state
## 1  0.02   76   treated
## 2  0.02   47   treated
## 3  0.06   97   treated
## 4  0.06  107   treated
## 5  0.11  123   treated
## 6  0.11  139   treated
## 7  0.22  159   treated
## 8  0.22  152   treated
## 9  0.56  191   treated
## 10 0.56  201   treated
## 11 1.10  207   treated
## 12 1.10  200   treated
## 13 0.02   67 untreated
## 14 0.02   51 untreated
## 15 0.06   84 untreated
## 16 0.06   86 untreated
## 17 0.11   98 untreated
## 18 0.11  115 untreated
## 19 0.22  131 untreated
## 20 0.22  124 untreated
## 21 0.56  144 untreated
## 22 0.56  158 untreated
## 23 2.20  160 untreated

Packages

All R functions and datasets are stored in packages. Only when a package is loaded are its contents available. This is done both for efficiency (the full list would take more memory and would take longer to search than a subset), and to aid package developers, who are protected from name clashes with other code.

library()
## Warning in library(): libraries '/usr/local/lib/R/site-library', '/usr/lib/R/
## site-library' contain no packages
library(boot) # to load a particular package

search() # to see which packages are currently loaded
##  [1] ".GlobalEnv"        "package:boot"      "package:stats"    
##  [4] "package:graphics"  "package:grDevices" "package:utils"    
##  [7] "package:datasets"  "package:methods"   "Autoloads"        
## [10] "package:base"
# Some packages may be loaded but not available on the search list; these are included in the list given by

loadedNamespaces() 
##  [1] "grDevices" "digest"    "R6"        "jsonlite"  "magrittr"  "evaluate" 
##  [7] "datasets"  "stringi"   "rlang"     "cachem"    "utils"     "cli"      
## [13] "jquerylib" "bslib"     "graphics"  "boot"      "rmarkdown" "base"     
## [19] "tools"     "stringr"   "xfun"      "yaml"      "fastmap"   "compiler" 
## [25] "stats"     "htmltools" "knitr"     "methods"   "sass"
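
A function can also be called from a package without attaching it, using the package::function form. The sketch below uses the standard stats package, so nothing extra needs to be installed; the object names are illustrative.

```r
# Call sd() through its namespace; this works whether or not stats is attached
sd_value <- stats::sd(c(1, 2, 3))

# After the call above, the stats namespace is certainly loaded
stats_loaded <- "stats" %in% loadedNamespaces()

sd_value      # 1
stats_loaded  # TRUE
```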

 

Standard packages

The standard (or base) packages are considered part of the R source code. They contain the basic functions that allow R to work, and the datasets and standard statistical and graphical functions that are described in this manual. They should be automatically available in any R installation.

 

Contributed packages and CRAN

 

Namespace

Packages have namespaces, which do three things: