R introduction

1. Introduction and preliminaries

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has

an effective data handling and storage facility,
a suite of operators for calculations on arrays, in particular matrices,
a large, coherent, integrated collection of intermediate tools for data analysis,
graphical facilities for data analysis and display either directly at the computer or on hard copy, and
a well-developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user-defined recursive functions and input and output facilities. (Indeed most of the system-supplied functions are themselves written in the S language.)
R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly, and has been extended by a large collection of packages.

R and statistics

It is an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. There are about 25 packages supplied with R (called “standard” and “recommended” packages) and many more are available through the CRAN family of Internet sites (via https://CRAN.R-project.org) and elsewhere.

Using R interactively

Setting the working directory
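The working directory is the folder against which relative file names (such as the BANK1.csv file read later) are resolved. Below is a minimal sketch using getwd(), dir.create() and setwd(). The folder name r-intro-work is a hypothetical example, placed under tempdir() so the code can run on any machine.

```r
# Show the current working directory (where relative paths are resolved)
old_wd <- getwd()
print(old_wd)

# Create a scratch folder under the session's temporary directory and switch to it.
# "r-intro-work" is a hypothetical folder name used only for this example.
work_dir <- file.path(tempdir(), "r-intro-work")
dir.create(work_dir, showWarnings = FALSE)
setwd(work_dir)
print(getwd())

# Switch back to the original working directory
setwd(old_wd)
```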
Getting help with functions and features

help(solve)

?solve

help("[[")

help.start()

## starting httpd help server ... done

## If the browser launched by 'xdg-open' is already running, it is *not*
##     restarted, and you must switch to its window.
## Otherwise, be patient ...

The help.search command (alternatively ??) allows searching for help in various ways. For example,

??solve

help.search("solve")

Data permanency and removing objects

The entities that R creates and manipulates are known as objects. These may be variables, arrays of numbers, character strings, functions, or more general structures built from such components.
During an R session, objects are created and stored by name.
For example, the following commands create three objects, x, y and z:

x <- 5

y <- 7

z <- seq(1, 10)

z

##  [1]  1  2  3  4  5  6  7  8  9 10

objects()

## [1] "x" "y" "z"

ls()

## [1] "x" "y" "z"

The functions objects() and ls() (which are equivalent) can be used to display the names of (most of) the objects which are currently stored within R. The collection of objects currently stored is called the workspace.

To remove objects the function rm is available:

rm(x, y, z)

ls()

## character(0)
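Objects in the workspace only last for the current session unless they are written to disk. Below is a minimal sketch of making them permanent with save() and restoring them with load(). The file name my_objects.RData is a hypothetical example, written to tempdir().

```r
x <- 5
y <- c(1, 2, 3)

# Write the two objects to an .RData file (hypothetical file name)
rdata_file <- file.path(tempdir(), "my_objects.RData")
save(x, y, file = rdata_file)

# Remove them from the workspace ...
rm(x, y)
print(exists("x"))   # FALSE once x has been removed

# ... and restore them; load() returns the names of the objects it re-created
restored <- load(rdata_file)
print(restored)      # "x" "y"
print(y)             # 1 2 3
```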

2. Simple manipulations; numbers and vectors

Vectors and assignment

x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

x

## [1] 10.4  5.6  3.1  6.4 21.7

assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))

x

## [1] 10.4  5.6  3.1  6.4 21.7

c(10.4, 5.6, 3.1, 6.4, 21.7) -> x

x

## [1] 10.4  5.6  3.1  6.4 21.7

1/x

## [1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295

y <- c(x, 0, x)

ls()

## [1] "x" "y"

x

## [1] 10.4  5.6  3.1  6.4 21.7

y

##  [1] 10.4  5.6  3.1  6.4 21.7  0.0 10.4  5.6  3.1  6.4 21.7

Vector arithmetic

v <- 2*x + y + 1

## Warning in 2 * x + y: longer object length is not a multiple of shorter object
## length

v

##  [1] 32.2 17.8 10.3 20.2 66.1 21.8 22.6 12.8 16.9 50.8 43.5

The assignment generates a new vector v of length 11, constructed by adding together, element by element, 2*x repeated 2.2 times, y repeated just once, and 1 repeated 11 times.
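The warning appears because 11 is not a multiple of 5. When the longer length is an exact multiple of the shorter one, the shorter vector is recycled silently. A small sketch with made-up values:

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)

# b is recycled three times (10 20 10 20 10 20), so no warning is given
s <- a + b
print(s)       # 11 22 13 24 15 26

# A length-1 vector (a scalar) is recycled to every position
print(a * 2)   # 2 4 6 8 10 12
```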

x/y

## Warning in x/y: longer object length is not a multiple of shorter object length

##  [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000       Inf 0.5384615
##  [8] 0.5535714 2.0645161 3.3906250 0.4792627

x*y

## Warning in x * y: longer object length is not a multiple of shorter object
## length

##  [1] 108.16  31.36   9.61  40.96 470.89   0.00  58.24  17.36  19.84 138.88
## [11] 225.68

x-y

## Warning in x - y: longer object length is not a multiple of shorter object
## length

##  [1]   0.0   0.0   0.0   0.0   0.0  10.4  -4.8  -2.5   3.3  15.3 -11.3

log(x)

## [1] 2.341806 1.722767 1.131402 1.856298 3.077312

exp(x)

## [1] 3.285963e+04 2.704264e+02 2.219795e+01 6.018450e+02 2.655769e+09

sin(x)

## [1] -0.82782647 -0.63126664  0.04158066  0.11654920  0.28705265

cos(x)

## [1] -0.5609843  0.7755659 -0.9991352  0.9931849 -0.9579148

tan(x)

## [1]  1.47566791 -0.81394328 -0.04161665  0.11734895 -0.29966407

sqrt(x)

## [1] 3.224903 2.366432 1.760682 2.529822 4.658326

min(x)

## [1] 3.1

max(x)

## [1] 21.7

sum(x)

## [1] 47.2

range(x)

## [1]  3.1 21.7

length(x)

## [1] 5

Two statistical functions are mean(x), which calculates the sample mean (the same as sum(x)/length(x)), and var(x), which gives the sample variance sum((x-mean(x))^2)/(length(x)-1). The sample standard deviation sd(x) is the square root of var(x).

mean(x)

## [1] 9.44

var(x)

## [1] 53.853

sd(x)

## [1] 7.33846

sqrt(var(x))

## [1] 7.33846

sum(x)

## [1] 47.2

length(x)

## [1] 5

sum(x)/length(x)

## [1] 9.44

sum((x-mean(x))^2)/(length(x)-1)

## [1] 53.853

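To work with complex numbers, supply an explicit complex part. Thus sqrt(-17) gives NaN with a warning, whereas sqrt(-17+0i) carries out the computation in complex arithmetic: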

sqrt(-17)

## Warning in sqrt(-17): NaNs produced

## [1] NaN

sqrt(-17+0i)

## [1] 0+4.123106i

Generating regular sequences

x <- 1:30

x

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

y <- 2*1:15

y

##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30
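In 2*1:15 the colon operator binds more tightly than multiplication, so 1:15 is generated first and then doubled. The same precedence rule often catches people out in expressions such as 1:n-1. A short illustration, with n = 10 chosen arbitrarily:

```r
n <- 10
a <- 1:n - 1     # parsed as (1:n) - 1, giving 0 1 2 ... 9
b <- 1:(n - 1)   # the sequence 1 2 3 ... 9

print(a)
print(b)
print(length(a)) # 10
print(length(b)) # 9
```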

x1 <- seq(1, 30)

x1

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

seq(from=30, to=1)

##  [1] 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6
## [26]  5  4  3  2  1

seq(-5, 5, by=.2)

##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
## [16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
## [31]  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6  3.8
## [46]  4.0  4.2  4.4  4.6  4.8  5.0

seq(length.out=51, from=-5, by=.2)

##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2
## [16] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
## [31]  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6  3.8
## [46]  4.0  4.2  4.4  4.6  4.8  5.0

rep(x, times=5)

##   [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
##  [26] 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
##  [51] 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
##  [76] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10
## [101] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30  1  2  3  4  5
## [126]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

rep(x, each=5)

##   [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5  5  5
##  [26]  6  6  6  6  6  7  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10 10 10 10 10
##  [51] 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 14 14 15 15 15 15 15
##  [76] 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18 19 19 19 19 19 20 20 20 20 20
## [101] 21 21 21 21 21 22 22 22 22 22 23 23 23 23 23 24 24 24 24 24 25 25 25 25 25
## [126] 26 26 26 26 26 27 27 27 27 27 28 28 28 28 28 29 29 29 29 29 30 30 30 30 30
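The times and each arguments can also be combined, times may be a vector giving a separate count for each element, and length.out fixes the length of the result. A small sketch:

```r
# each is applied first, then the whole result is repeated 'times' times
r1 <- rep(1:2, each = 2, times = 2)
print(r1)   # 1 1 2 2 1 1 2 2

# times as a vector: element i is repeated times[i] times
r2 <- rep(c("a", "b"), times = c(3, 1))
print(r2)   # "a" "a" "a" "b"

# length.out recycles the input until the requested length is reached
r3 <- rep(1:3, length.out = 7)
print(r3)   # 1 2 3 1 2 3 1
```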

Logical vectors

x <- seq(2, 30, by=2) - 1

x

##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29

temp <- x>13

temp

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE

temp <- x>=13

temp

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE

temp <- x<13

temp

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE

temp <- x<=13

temp

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE

temp1 <- x==13

temp1

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE

temp2 <- x!=13

temp2

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE

!temp1

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE

temp1|temp2

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

temp1&temp2

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
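Because TRUE and FALSE are coerced to 1 and 0 in arithmetic, logical vectors such as temp are easy to summarise. The sketch below rebuilds the same x and uses sum(), which(), any() and all():

```r
x <- seq(2, 30, by = 2) - 1   # 1 3 5 ... 29, as above

print(sum(x > 13))            # 8: number of elements greater than 13
print(which(x == 13))         # 7: position of the element equal to 13
print(any(x > 28))            # TRUE: at least one element exceeds 28
print(all(x %% 2 == 1))       # TRUE: every element is odd
print(x[x > 13 & x < 20])     # 15 17 19: conditions combined with &
```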

Missing values

In some cases the components of a vector may not be completely known. When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA.

z <- c(1:3,NA) 

ind <- is.na(z)

ind

## [1] FALSE FALSE FALSE  TRUE
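NA is not really a value but a marker for an unavailable quantity, so any comparison with it is itself unknown; this is why is.na(z) is used rather than z == NA. Most summary functions likewise return NA unless missing values are dropped with na.rm = TRUE. A short sketch:

```r
z <- c(1:3, NA)

print(z == NA)                # NA NA NA NA: comparing with NA is always unknown
print(is.na(z))               # FALSE FALSE FALSE TRUE

print(mean(z))                # NA: one missing value makes the mean unknown
print(mean(z, na.rm = TRUE))  # 2: mean of the three available values
```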

Note that there is a second kind of “missing” value, produced by numerical computation: the so-called Not a Number (NaN) values. Examples are

0/0

## [1] NaN

Inf-Inf

## [1] NaN
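is.na() returns TRUE for both NA and NaN, whereas is.nan() is TRUE only for NaN, so the two functions together tell the two kinds of missing value apart:

```r
v <- c(1, NA, 0/0, Inf - Inf)

print(is.na(v))    # FALSE  TRUE  TRUE  TRUE
print(is.nan(v))   # FALSE FALSE  TRUE  TRUE
```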

Character vectors

x <- c("x-value", "New iteration results")

x

## [1] "x-value"               "New iteration results"

x[1]

## [1] "x-value"

x[2]

## [1] "New iteration results"

length(x)

## [1] 2

The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings.

labs <- paste(c("X","Y"), 1:10, sep="")

labs

##  [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9"  "Y10"
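Arguments to paste() are recycled to a common length, as c("X","Y") is above. By default the pieces are separated by a single space; sep="" removes it (paste0() is shorthand for that), and collapse joins the resulting vector into one string. A few illustrations:

```r
p1 <- paste("x", 1:3)              # default sep = " "
p2 <- paste0("x", 1:3)             # same as paste("x", 1:3, sep = "")
p3 <- paste("x", 1:3, sep = "_")   # custom separator
p4 <- paste(p2, collapse = "+")    # join all elements into a single string

print(p1)   # "x 1" "x 2" "x 3"
print(p2)   # "x1" "x2" "x3"
print(p3)   # "x_1" "x_2" "x_3"
print(p4)   # "x1+x2+x3"
```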

Index vectors; selecting and modifying subsets of a data set

A logical vector

x <- c(1:3,NA)

x

## [1]  1  2  3 NA

y <- x[!is.na(x)]

y

## [1] 1 2 3

x[is.na(x)] <- 4

x

## [1] 1 2 3 4

A vector of positive integral quantities.

x <- seq(1, 15)

x

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

x[1:10]

##  [1]  1  2  3  4  5  6  7  8  9 10

y <- seq(16, 30)

c(x,y)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

c("x","y")[rep(c(1,2,2,1),times=4)]

##  [1] "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x"

A vector of negative integral quantities.

x

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

y <- x[-(1:5)]

y

##  [1]  6  7  8  9 10 11 12 13 14 15

A vector of character strings.

fruit <- c(5, 10, 1, 20)

names(fruit) <- c("orange", "banana", "apple", "peach")

fruit

## orange banana  apple  peach 
##      5     10      1     20

lunch <- fruit[c("apple","orange")]

lunch

##  apple orange 
##      1      5

3. Objects, their modes and attributes

The entities R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as “atomic” structures since their components are all of the same type, or mode, namely numeric, complex, logical, character and raw.

Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw.
R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. Lists are known as recursive rather than atomic structures since their components can themselves be lists in their own right.
The other recursive structures are those of mode function and expression.

z <- 0:9

mode(z)

## [1] "numeric"

length(z)

## [1] 10

digits <- as.character(z)

mode(digits)

## [1] "character"

length(digits)

## [1] 10

d <- as.integer(digits)

mode(d)

## [1] "numeric"

length(d)

## [1] 10

Changing the length of an object

e <- numeric()  # an "empty" numeric vector of length 0

mode(e)

## [1] "numeric"

length(e)

## [1] 0

e[3] <- 17

mode(e)

## [1] "numeric"

length(e)

## [1] 3

alpha <- 2*1:5

alpha

## [1]  2  4  6  8 10

length(alpha) <- 3 # truncate the size of alpha

alpha

## [1] 2 4 6
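Assigning to length() can also lengthen a vector, in which case the new positions are filled with NA, just as e[3] <- 17 filled the first two positions of the empty vector e. A small sketch:

```r
e <- numeric()
e[3] <- 17
print(e)          # NA NA 17: positions 1 and 2 were created as missing

a <- 1:3
length(a) <- 5
print(a)          # 1 2 3 NA NA
```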

Getting and setting attributes
The function attributes(object) returns a list of all the non-intrinsic attributes currently defined for that object.
The function attr(object, name) can be used to select a specific attribute.

z <- 1:100

z

##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100

attr(z, "dim") <- c(10, 10)

z

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1   11   21   31   41   51   61   71   81    91
##  [2,]    2   12   22   32   42   52   62   72   82    92
##  [3,]    3   13   23   33   43   53   63   73   83    93
##  [4,]    4   14   24   34   44   54   64   74   84    94
##  [5,]    5   15   25   35   45   55   65   75   85    95
##  [6,]    6   16   26   36   46   56   66   76   86    96
##  [7,]    7   17   27   37   47   57   67   77   87    97
##  [8,]    8   18   28   38   48   58   68   78   88    98
##  [9,]    9   19   29   39   49   59   69   79   89    99
## [10,]   10   20   30   40   50   60   70   80   90   100

attributes(z)

## $dim
## [1] 10 10
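For the dim attribute there is a dedicated replacement function, dim(), which does the same job as assigning through attr(z, "dim"). Other attributes can be attached by name with attr(); the attribute name "units" below is an arbitrary example, not one that R treats specially.

```r
z <- 1:6
dim(z) <- c(2, 3)             # equivalent to attr(z, "dim") <- c(2, 3)
print(attr(z, "dim"))         # 2 3

attr(z, "units") <- "cm"      # an arbitrary, user-chosen attribute
print(names(attributes(z)))   # "dim" "units"
print(z[2, 3])                # 6: z now behaves as a 2 by 3 matrix
```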

The class of an object

All objects in R have a class, reported by the function class. For simple vectors this is just the mode, for example “numeric”, “logical”, “character” or “list”, but “matrix”, “array”, “factor” and “data.frame” are other possible values.

z <- as.data.frame(z)

z

##    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1   1 11 21 31 41 51 61 71 81  91
## 2   2 12 22 32 42 52 62 72 82  92
## 3   3 13 23 33 43 53 63 73 83  93
## 4   4 14 24 34 44 54 64 74 84  94
## 5   5 15 25 35 45 55 65 75 85  95
## 6   6 16 26 36 46 56 66 76 86  96
## 7   7 17 27 37 47 57 67 77 87  97
## 8   8 18 28 38 48 58 68 78 88  98
## 9   9 19 29 39 49 59 69 79 89  99
## 10 10 20 30 40 50 60 70 80 90 100

unclass(z)

## $V1
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $V2
##  [1] 11 12 13 14 15 16 17 18 19 20
## 
## $V3
##  [1] 21 22 23 24 25 26 27 28 29 30
## 
## $V4
##  [1] 31 32 33 34 35 36 37 38 39 40
## 
## $V5
##  [1] 41 42 43 44 45 46 47 48 49 50
## 
## $V6
##  [1] 51 52 53 54 55 56 57 58 59 60
## 
## $V7
##  [1] 61 62 63 64 65 66 67 68 69 70
## 
## $V8
##  [1] 71 72 73 74 75 76 77 78 79 80
## 
## $V9
##  [1] 81 82 83 84 85 86 87 88 89 90
## 
## $V10
##  [1]  91  92  93  94  95  96  97  98  99 100
## 
## attr(,"row.names")
##  [1]  1  2  3  4  5  6  7  8  9 10

z$V1

##  [1]  1  2  3  4  5  6  7  8  9 10
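class() makes the conversion above visible: the same numbers change class as they move from matrix to data frame, and unclass() strips the class attribute to expose the list that underlies a data frame. A small sketch (note that in R 4.0.0 and later a matrix reports two classes, "matrix" and "array"):

```r
m <- matrix(1:4, nrow = 2)
print(class(m))             # "matrix" "array" (R >= 4.0.0)

df <- as.data.frame(m)
print(class(df))            # "data.frame"
print(class(unclass(df)))   # "list": a data frame is a list underneath

print(class(df$V1))         # "integer": each column keeps its own class
```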

4. Ordered and unordered factors

Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia and their individual state of origin is specified by a character vector of state mnemonics as

state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
"qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
"sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
"sa", "act", "nsw", "vic", "vic", "act")

length(state)

## [1] 30

statef <- factor(state)

statef

##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: act nsw nt qld sa tas vic wa

levels(statef)

## [1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"
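Internally a factor is stored as integer codes that point into its levels, which is why the levels are printed separately. A small sketch on the first five states from the example above:

```r
s <- c("tas", "sa", "qld", "nsw", "nsw")
f <- factor(s)

print(levels(f))                 # "nsw" "qld" "sa" "tas" (sorted alphabetically)
print(as.integer(f))             # 4 3 2 1 1: each value's position in levels(f)
print(nlevels(f))                # 4
print(levels(f)[as.integer(f)])  # recovers the original strings
```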

The function tapply() and ragged arrays

incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
59, 46, 58, 43)

incrmean <- tapply(incomes, statef, mean)

incrmean

##      act      nsw       nt      qld       sa      tas      vic       wa 
## 44.50000 57.33333 55.50000 53.60000 55.00000 60.50000 56.00000 52.25000

The function tapply() is used to apply a function, here mean(), to each group of components of the first argument, here incomes, defined by the levels of the second argument, here statef, as if they were separate vector structures.

stdError <- function(x) sqrt(var(x)/length(x))

incrster <- tapply(incomes, statef, stdError)

incrster

##      act      nsw       nt      qld       sa      tas      vic       wa 
## 1.500000 4.310195 4.500000 4.106093 2.738613 0.500000 5.244044 2.657536

Ordered factors

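The levels of factors are stored in alphabetical order, or in the order they were specified to factor() if they were given explicitly. The function ordered() creates an ordered factor, whose levels additionally carry that order (printed with <).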

ordered(state)

##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: act < nsw < nt < qld < sa < tas < vic < wa

ordered(state, c("wa", "vic", "tas","sa","qld", "nt","nsw", "act"))

##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: wa < vic < tas < sa < qld < nt < nsw < act
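The ordering is what distinguishes an ordered factor: comparison operators and functions such as max() and sort() respect it, whereas for an unordered factor comparisons are not meaningful. The size labels below are illustrative only:

```r
sizes <- ordered(c("small", "large", "medium", "small"),
                 levels = c("small", "medium", "large"))

print(sizes < "large")   # TRUE FALSE TRUE TRUE: compared by level order
print(max(sizes))        # large
print(sort(sizes))       # small small medium large
```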

5. Arrays and matrices

Arrays
An array can be considered as a multiply subscripted collection of data entries, for example numeric. R provides simple facilities for creating and handling arrays, and in particular the special case of matrices.
A dimension vector is a vector of non-negative integers. If its length is k then the array is k-dimensional, e.g. a matrix is a 2-dimensional array. The dimensions are indexed from one up to the values given in the dimension vector.

z <- 1:1500

dim(z) <- c(3,5,100)

class(z)

## [1] "array"

dim(z)

## [1]   3   5 100

z[,1,1]

## [1] 1 2 3

z[1,,1]

## [1]  1  4  7 10 13

z[1,1,]

##   [1]    1   16   31   46   61   76   91  106  121  136  151  166  181  196  211
##  [16]  226  241  256  271  286  301  316  331  346  361  376  391  406  421  436
##  [31]  451  466  481  496  511  526  541  556  571  586  601  616  631  646  661
##  [46]  676  691  706  721  736  751  766  781  796  811  826  841  856  871  886
##  [61]  901  916  931  946  961  976  991 1006 1021 1036 1051 1066 1081 1096 1111
##  [76] 1126 1141 1156 1171 1186 1201 1216 1231 1246 1261 1276 1291 1306 1321 1336
##  [91] 1351 1366 1381 1396 1411 1426 1441 1456 1471 1486

z[,,2]

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   16   19   22   25   28
## [2,]   17   20   23   26   29
## [3,]   18   21   24   27   30
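The slices above follow from how R stores arrays: in column-major order, with the first subscript moving fastest. For a 3 by 5 by 100 array, element z[i, j, k] is therefore entry i + 3*(j - 1) + 15*(k - 1) of the underlying vector. A quick check of that formula (pos() is a hypothetical helper written for this example):

```r
z <- 1:1500
dim(z) <- c(3, 5, 100)

# Hypothetical helper: position of z[i, j, k] in the underlying vector
pos <- function(i, j, k) i + 3 * (j - 1) + 15 * (k - 1)

print(z[2, 4, 7])        # 101
print(pos(2, 4, 7))      # 101
print(z[pos(2, 4, 7)])   # 101: a single subscript indexes the underlying vector
```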

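Index matrices

A matrix with one column per dimension can also be used as an index: each row of the index matrix selects a single element. Below, i selects x[1,3], x[2,2] and x[3,1].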

x <- array(1:20, dim=c(4,5))

x

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20

i <- array(c(1:3,3:1), dim=c(3,2))

i

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    2
## [3,]    3    1

x[i]

## [1] 9 6 3

x[i] <- 0

x

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    0   13   17
## [2,]    2    0   10   14   18
## [3,]    0    7   11   15   19
## [4,]    4    8   12   16   20

xb <- matrix(1:20, nrow=4, ncol=5)

xb

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20

The array() function

h <- 1:24

Z <- array(h, dim=c(3,4,2))

Z

## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24

X <- array(1, c(3,4,2))

X

## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1    1    1
## [3,]    1    1    1    1
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1    1    1
## [3,]    1    1    1    1

D <- 2*Z*X+Z+1

D

## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    4   13   22   31
## [2,]    7   16   25   34
## [3,]   10   19   28   37
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   40   49   58   67
## [2,]   43   52   61   70
## [3,]   46   55   64   73

DD <- aperm(D, c(2,1,3))

DD

## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    4    7   10
## [2,]   13   16   19
## [3,]   22   25   28
## [4,]   31   34   37
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   40   43   46
## [2,]   49   52   55
## [3,]   58   61   64
## [4,]   67   70   73

t(D[,,1])

##      [,1] [,2] [,3]
## [1,]    4    7   10
## [2,]   13   16   19
## [3,]   22   25   28
## [4,]   31   34   37

t(D[,,2])

##      [,1] [,2] [,3]
## [1,]   40   43   46
## [2,]   49   52   55
## [3,]   58   61   64
## [4,]   67   70   73
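aperm(D, c(2,1,3)) swaps the first two dimensions, so each 3 by 4 slice of D becomes a 4 by 3 slice of DD; that is why the slices of DD match t(D[,,1]) and t(D[,,2]). The equivalence can be checked directly:

```r
Z <- array(1:24, dim = c(3, 4, 2))
D <- 3 * Z + 1                   # same values as 2*Z*X + Z + 1 when X is all ones

DD <- aperm(D, c(2, 1, 3))
print(dim(DD))                              # 4 3 2
print(identical(DD[, , 1], t(D[, , 1])))    # TRUE
print(identical(DD[, , 2], t(D[, , 2])))    # TRUE
```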

Forming partitioned matrices, cbind() and rbind()

cX <- cbind(D[,,1], D[,,2])

cX

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    4   13   22   31   40   49   58   67
## [2,]    7   16   25   34   43   52   61   70
## [3,]   10   19   28   37   46   55   64   73

rX <- rbind(D[,,1], D[,,2])

rX

##      [,1] [,2] [,3] [,4]
## [1,]    4   13   22   31
## [2,]    7   16   25   34
## [3,]   10   19   28   37
## [4,]   40   49   58   67
## [5,]   43   52   61   70
## [6,]   46   55   64   73

cbind(1, D[,,1], D[,,2])

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    1    4   13   22   31   40   49   58   67
## [2,]    1    7   16   25   34   43   52   61   70
## [3,]    1   10   19   28   37   46   55   64   73

The concatenation function, c(), with arrays

vec1 <- as.vector(1:10)

vec1

##  [1]  1  2  3  4  5  6  7  8  9 10

class(vec1)

## [1] "integer"

vec2 <- c(1:10)

vec2

##  [1]  1  2  3  4  5  6  7  8  9 10

class(vec2)

## [1] "integer"
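The examples above start from plain vectors. With an actual array, both c() and as.vector() discard the dim (and dimnames) attributes and return the elements in column-major order, which is the usual way to turn an array back into a simple vector:

```r
X <- array(1:6, dim = c(2, 3))

v1 <- c(X)
v2 <- as.vector(X)

print(dim(X))             # 2 3
print(dim(v1))            # NULL: the dim attribute has been dropped
print(v1)                 # 1 2 3 4 5 6 (column-major order)
print(identical(v1, v2))  # TRUE
```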

Frequency tables from factors

statefr <- table(statef)

statefr

## statef
## act nsw  nt qld  sa tas vic  wa 
##   2   6   2   5   4   2   5   4

statefr <- tapply(statef, statef, length)

statefr

## act nsw  nt qld  sa tas vic  wa 
##   2   6   2   5   4   2   5   4

factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef

table(incomef,statef)

##          statef
## incomef   act nsw nt qld sa tas vic wa
##   (35,45]   1   1  0   1  0   0   1  0
##   (45,55]   1   1  1   1  2   0   1  3
##   (55,65]   0   3  1   3  2   2   2  1
##   (65,75]   0   1  0   0  0   0   1  0

6. Lists and data frames

Lists

Lst <- list(name="Fred", wife="Mary", no.children=3,
child.ages=c(4,7,9))

length(Lst)

## [1] 4

Lst

## $name
## [1] "Fred"
## 
## $wife
## [1] "Mary"
## 
## $no.children
## [1] 3
## 
## $child.ages
## [1] 4 7 9

Lst$name

## [1] "Fred"

Lst$wife

## [1] "Mary"

Lst$no.children

## [1] 3

Lst$child.ages

## [1] 4 7 9

Lst[[1]]

## [1] "Fred"

Lst[[2]]

## [1] "Mary"

Lst[[3]]

## [1] 3

Lst[[4]]

## [1] 4 7 9

Child_name <- c("Eric", "Chile", "Mary")

Lst_new <- c(Lst, Child_name)

length(Lst_new)

## [1] 7

Lst_new

## $name
## [1] "Fred"
## 
## $wife
## [1] "Mary"
## 
## $no.children
## [1] 3
## 
## $child.ages
## [1] 4 7 9
## 
## [[5]]
## [1] "Eric"
## 
## [[6]]
## [1] "Chile"
## 
## [[7]]
## [1] "Mary"

Lst1 <- list(Child_name=Child_name)

Lst <- c(Lst, Lst1)

length(Lst)

## [1] 5

Lst

## $name
## [1] "Fred"
## 
## $wife
## [1] "Mary"
## 
## $no.children
## [1] 3
## 
## $child.ages
## [1] 4 7 9
## 
## $Child_name
## [1] "Eric"  "Chile" "Mary"
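The difference between c(Lst, Child_name) and c(Lst, Lst1) comes from how lists are indexed and combined: [[ ]] extracts a single component, [ ] returns a sub-list, and c() adds one component per element of a plain vector. A short sketch on a smaller, illustrative list:

```r
info <- list(name = "Fred", child.ages = c(4, 7, 9))

a <- info[[2]]          # the component itself: a numeric vector
b <- info[2]            # a list of length 1 containing that component
print(class(a))         # "numeric"
print(class(b))         # "list"
print(info[["name"]])   # "Fred": [[ ]] also accepts a component name

# c() of a list and a character vector adds one component per element;
# wrapping the vector in list() adds it as a single component instead
print(length(c(info, c("x", "y"))))         # 4
print(length(c(info, list(c("x", "y")))))   # 3
```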

Data Frames

A data frame is a list with class “data.frame”. There are restrictions on lists that may be made into data frames, namely

The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames.
Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.
Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.

A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions.

accountants <- data.frame(home=statef, loot=incomes, shot=incomef)

accountants

##    home loot    shot
## 1   tas   60 (55,65]
## 2    sa   49 (45,55]
## 3   qld   40 (35,45]
## 4   nsw   61 (55,65]
## 5   nsw   64 (55,65]
## 6    nt   60 (55,65]
## 7    wa   59 (55,65]
## 8    wa   54 (45,55]
## 9   qld   62 (55,65]
## 10  vic   69 (65,75]
## 11  nsw   70 (65,75]
## 12  vic   42 (35,45]
## 13  qld   56 (55,65]
## 14  qld   61 (55,65]
## 15   sa   61 (55,65]
## 16  tas   61 (55,65]
## 17   sa   58 (55,65]
## 18   nt   51 (45,55]
## 19   wa   48 (45,55]
## 20  vic   65 (55,65]
## 21  qld   49 (45,55]
## 22  nsw   49 (45,55]
## 23  nsw   41 (35,45]
## 24   wa   48 (45,55]
## 25   sa   52 (45,55]
## 26  act   46 (45,55]
## 27  nsw   59 (55,65]
## 28  vic   46 (45,55]
## 29  vic   58 (55,65]
## 30  act   43 (35,45]

attach(accountants)

home

##  [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa 
## [20] vic qld nsw nsw wa  sa  act nsw vic vic act
## Levels: act nsw nt qld sa tas vic wa

loot

##  [1] 60 49 40 61 64 60 59 54 62 69 70 42 56 61 61 61 58 51 48 65 49 49 41 48 52
## [26] 46 59 46 58 43

shot

##  [1] (55,65] (45,55] (35,45] (55,65] (55,65] (55,65] (55,65] (45,55] (55,65]
## [10] (65,75] (65,75] (35,45] (55,65] (55,65] (55,65] (55,65] (55,65] (45,55]
## [19] (45,55] (55,65] (45,55] (45,55] (35,45] (45,55] (45,55] (45,55] (55,65]
## [28] (45,55] (55,65] (35,45]
## Levels: (35,45] (45,55] (55,65] (65,75]

detach()
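attach() keeps the columns visible by name until detach() is called, which is easy to forget. An alternative that leaves the search path untouched is with(), which evaluates a single expression with the columns of a data frame in scope. A small sketch on an illustrative data frame:

```r
df <- data.frame(home = c("tas", "sa", "tas"), loot = c(60, 49, 61))

# Evaluate expressions using the columns of df; nothing is attached
avg <- with(df, mean(loot))
by_home <- with(df, tapply(loot, home, mean))

print(avg)       # 56.66667
print(by_home)   # sa = 49, tas = 60.5
```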

7. Reading data from files

Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible. There is a clear presumption by the designers of R that you will be able to modify your input files using other tools, such as file editors, to fit in with R's requirements.

The read.csv() function

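To read an entire data frame directly, the external file will normally have a special form. read.csv() expects comma-separated fields and, by default (header=TRUE), takes the first line as the variable names.

The first line of the file should have a name for each variable in the data frame.
Each additional line of the file holds the values for each variable. If the first line has one field fewer than the lines that follow, the first item of each additional line is used as a row label instead.

In BANK1.csv the first header field appears to be empty, so read.csv() names that column X and keeps it as an ordinary variable.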

BankData <- read.csv("BANK1.csv")

BankData[1:5, ]

##   X Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary
## 1 1        1       3        1      92     69   Male        1    No   32.0
## 2 2        2       1        1      81     57 Female        1    No   39.1
## 3 3        3       1        1      83     60 Female        0    No   33.2
## 4 4        4       2        1      87     55 Female        7    No   30.6
## 5 5        5       3        1      92     67   Male        0    No   29.0

BankData <- read.csv("BANK1.csv", header=FALSE)

BankData[1:5, ]

##   V1       V2      V3       V4      V5     V6     V7       V8    V9    V10
## 1 NA Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary
## 2  1        1       3        1      92     69   Male        1    No     32
## 3  2        2       1        1      81     57 Female        1    No   39.1
## 4  3        3       1        1      83     60 Female        0    No   33.2
## 5  4        4       2        1      87     55 Female        7    No   30.6
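Since BANK1.csv is not reproduced here, the same header behaviour can be tried on a small inline CSV through the text argument (the two-column data below is made up for illustration). With header = FALSE the header line is read as an ordinary row, so columns that mix names and numbers are read as character (as factors before R 4.0.0).

```r
csv_text <- "id,score\n1,3.5\n2,4.0"

with_header <- read.csv(text = csv_text)                  # header = TRUE is the default
no_header   <- read.csv(text = csv_text, header = FALSE)  # first line becomes data

str(with_header)   # 2 rows: id is integer, score is numeric
str(no_header)     # 3 rows: V1 and V2 hold the header text as data
```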

Accessing built-in datasets

data()

data(infert)

Loading data from other R packages

data(package="rpart")

data(Puromycin, package="datasets")

attributes(Puromycin)

## $names
## [1] "conc"  "rate"  "state"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## 
## $class
## [1] "data.frame"
## 
## $reference
## [1] "A1.3, p. 269"

Puromycin

##    conc rate     state
## 1  0.02   76   treated
## 2  0.02   47   treated
## 3  0.06   97   treated
## 4  0.06  107   treated
## 5  0.11  123   treated
## 6  0.11  139   treated
## 7  0.22  159   treated
## 8  0.22  152   treated
## 9  0.56  191   treated
## 10 0.56  201   treated
## 11 1.10  207   treated
## 12 1.10  200   treated
## 13 0.02   67 untreated
## 14 0.02   51 untreated
## 15 0.06   84 untreated
## 16 0.06   86 untreated
## 17 0.11   98 untreated
## 18 0.11  115 untreated
## 19 0.22  131 untreated
## 20 0.22  124 untreated
## 21 0.56  144 untreated
## 22 0.56  158 untreated
## 23 1.10  160 untreated

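edit() opens the data frame in R's data editor and returns the edited copy; Puromycin itself is left unchanged unless the result is assigned, for example Puromycin2 <- edit(Puromycin). In the output below, row 23 shows conc = 2.20 instead of 1.10, presumably a value changed in the editor.

edit(Puromycin)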

##    conc rate     state
## 1  0.02   76   treated
## 2  0.02   47   treated
## 3  0.06   97   treated
## 4  0.06  107   treated
## 5  0.11  123   treated
## 6  0.11  139   treated
## 7  0.22  159   treated
## 8  0.22  152   treated
## 9  0.56  191   treated
## 10 0.56  201   treated
## 11 1.10  207   treated
## 12 1.10  200   treated
## 13 0.02   67 untreated
## 14 0.02   51 untreated
## 15 0.06   84 untreated
## 16 0.06   86 untreated
## 17 0.11   98 untreated
## 18 0.11  115 untreated
## 19 0.22  131 untreated
## 20 0.22  124 untreated
## 21 0.56  144 untreated
## 22 0.56  158 untreated
## 23 2.20  160 untreated

Packages

All R functions and datasets are stored in packages. Only when a package is loaded are its contents available. This is done both for efficiency (the full list would take more memory and would take longer to search than a subset), and to aid package developers, who are protected from name clashes with other code.

library()

## Warning in library(): libraries '/usr/local/lib/R/site-library', '/usr/lib/R/
## site-library' contain no packages

library(boot) # to load a particular package

search() # to see which packages are currently loaded

##  [1] ".GlobalEnv"        "package:boot"      "package:stats"    
##  [4] "package:graphics"  "package:grDevices" "package:utils"    
##  [7] "package:datasets"  "package:methods"   "Autoloads"        
## [10] "package:base"

Some packages may be loaded but not on the search list; these are included in the list returned by

loadedNamespaces()

##  [1] "grDevices" "digest"    "R6"        "jsonlite"  "magrittr"  "evaluate" 
##  [7] "datasets"  "stringi"   "rlang"     "cachem"    "utils"     "cli"      
## [13] "jquerylib" "bslib"     "graphics"  "boot"      "rmarkdown" "base"     
## [19] "tools"     "stringr"   "xfun"      "yaml"      "fastmap"   "compiler" 
## [25] "stats"     "htmltools" "knitr"     "methods"   "sass"

Standard packages

The standard (or base) packages are considered part of the R source code. They contain the basic functions that allow R to work, and the datasets and standard statistical and graphical functions that are described in this introduction. They should be automatically available in any R installation.

Contributed packages and CRAN

There are thousands of contributed packages for R, written by many different authors.
Some of these packages implement specialized statistical methods, others give access to data or hardware, and others are designed to complement textbooks.
Some (the recommended packages) are distributed with every binary distribution of R. Most are available for download from CRAN (https://CRAN.R-project.org/ and its mirrors) and other repositories such as Bioconductor (https://www.bioconductor.org/).
The R FAQ contains a list of CRAN packages current at the time of release, but the collection of available packages changes very frequently.

Namespace

Packages have namespaces, which do three things:

they allow the package writer to hide functions and data that are meant only for internal use,
they prevent functions from breaking when a user (or other package writer) picks a name that clashes with one in the package,
and they provide a way to refer to an object within a particular package.
The double-colon operator :: selects definitions from a particular namespace.
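One practical use of :: is reaching a package's function when a name clash hides it. In the sketch below a user-defined function deliberately masks base::mean in the workspace, and base::mean() still reaches the original; the masking function is purely illustrative.

```r
# A user-defined function that deliberately masks base::mean
mean <- function(x) "not the real mean"

masked <- mean(c(1, 2, 3))        # the workspace definition is found first
real   <- base::mean(c(1, 2, 3))  # :: looks inside the base namespace directly

print(masked)   # "not the real mean"
print(real)     # 2

rm(mean)        # remove the mask so later calls reach base::mean again
```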