In Chapter 3, we learned about the different data types in R (numbers, text, logical values, etc.). Just as we organize physical items into containers, R needs special containers to organize and store different types of data. These containers are called data structures.
Each data structure in R has specific characteristics, such as how many dimensions it has and whether it can hold more than one type of data.
Before exploring each data structure, here are three functions we can use to inspect any container:
- length(): shows how many elements are stored
- dim(): shows the dimensions (rows and columns)
- str(): displays the structure and contents in detail

An atomic vector is the simplest data structure in R. Think of it as a single row or column that can hold only one type of data. To create a vector, assign a value to a name with the assignment operator, <-.
# Single Vector
fruit <- "papaya"
fruit
[1] "papaya"
To combine multiple values into a single vector, use c().
# Multivalue vector
fruits <- c("papaya", "orange", "apple", "pineapple", "grape",
            "strawberries", "avocado")
fruits
[1] "papaya" "orange" "apple" "pineapple" "grape"
[6] "strawberries" "avocado"
To get the number of elements in an object, use length().
length(fruit) # shows the number of elements in fruit
[1] 1
length(fruits) # shows the number of elements in fruits
[1] 7
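Because an atomic vector holds only one type of data, c() quietly converts (coerces) mixed inputs to a single common type. The sketch below uses two illustrative vectors, mixed and nums_and_logicals (names chosen for this example), to show what happens.

```r
# Numbers, text and logical values mixed together: everything becomes text
mixed <- c(1, "two", TRUE)
mixed          # "1" "two" "TRUE"
class(mixed)   # "character"

# Numbers and logical values mixed together: TRUE becomes 1, FALSE becomes 0
nums_and_logicals <- c(1, TRUE, FALSE)
nums_and_logicals          # 1 1 0
class(nums_and_logicals)   # "numeric"
```

No error or warning is raised, so it is worth checking class() when a vector is built from mixed values.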
Next, we will look at the operations we can perform on vectors, such as arithmetic, indexing, and subsetting.
Vectors support arithmetic operations such as addition (+), subtraction (-), multiplication (*), and division (/), as well as the remainder operator (%%).
# Arithmetic Operation
x <- 15
y <- 20
y + x
[1] 35
y %% x
[1] 5
y / x
[1] 1.333333
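The %% operator above returns the remainder of a division. A short sketch of a few other base R arithmetic operators, reusing the same x and y, plus an element-wise operation on two vectors of equal length:

```r
x <- 15
y <- 20
y - x     # subtraction: 5
y * x     # multiplication: 300
y %/% x   # integer division, how many whole times x fits into y: 1
y %% x    # remainder: 5
y ^ 2     # exponent: 400

# Two vectors of the same length are combined element by element
c(1, 2, 3) * c(4, 5, 6)   # 4 10 18
```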
When vectors have more than one element, arithmetic is performed element-wise: the first element of one vector is paired with the first element of the other, the second with the second, and so on. If one vector is shorter, R reuses its elements, starting again from the first one, until every element of the longer vector has been used. When the longer vector's length is a multiple of the shorter one's, this happens silently; when it is not, R still completes the operation but issues a warning that the lengths are not multiples. This behavior of starting the shorter vector over again is called vector recycling.
# Vector recycling
a <- 1:5 # Creates vector of 1, 2, 3, 4, 5
b <- 2
c <- 7:9 # Creates vector of 7, 8, 9
a + b
[1] 3 4 5 6 7
a / b
[1] 0.5 1.0 1.5 2.0 2.5
a / c
Warning in a/c: longer object length is not a multiple of shorter object length
[1] 0.1428571 0.2500000 0.3333333 0.5714286 0.6250000
When operating on vectors of different lengths, R will “recycle” the shorter vector’s values to match the longer vector’s length. This can be useful but also dangerous if not used carefully!
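The danger is that recycling gives no warning at all when one length is an exact multiple of the other. A minimal sketch with two illustrative vectors, long and short; rep_len() is used only to display the recycled version of short that R effectively pairs with long.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

long + short                  # 11 22 13 24, with no warning (4 is a multiple of 2)
rep_len(short, length(long))  # 10 20 10 20, what short is stretched into
```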
To access a value in a vector, use its index. Indexing lets us access or modify specific elements in different data structures. We place the index position inside square brackets, []. R starts counting at 1, so to get the first element of the fruits vector, papaya, we put its index position, 1, inside the square brackets.
Vector | papaya | orange | apple | pineapple | grape | strawberries | avocado |
---|---|---|---|---|---|---|---|
Index number | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
fruits
[1] "papaya" "orange" "apple" "pineapple" "grape"
[6] "strawberries" "avocado"
fruits[1]
[1] "papaya"
To return multiple elements, combine their index positions with c() and pass the result inside the square brackets. To return a range of elements, use the colon operator, :.
# select elements by specific position
fruits[c(7, 5, 2)]
[1] "avocado" "grape" "orange"
# select the range of elements from position 2 to 5
fruits[2:5]
[1] "orange" "apple" "pineapple" "grape"
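A few more indexing cases, sketched on a copy of the fruits values stored under the illustrative name basket so that fruits itself is left untouched: length() gives the position of the last element, an index beyond the end returns NA rather than an error, and an index may be repeated.

```r
basket <- c("papaya", "orange", "apple", "pineapple", "grape",
            "strawberries", "avocado")

basket[length(basket)]  # last element: "avocado"
basket[10]              # past the end: NA
basket[c(1, 1, 2)]      # repeated index: "papaya" "papaya" "orange"
```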
There are two ways to add new elements to a vector. First, use the append() function:
fruits <- append(fruits, "banana") # add a new element at the end
length(fruits) # the number of elements has increased by one
[1] 8
fruits # view the updated vector
[1] "papaya" "orange" "apple" "pineapple" "grape"
[6] "strawberries" "avocado" "banana"
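append() also takes an after argument that controls where the new value goes, and c() can add to the end as well. A small sketch on an illustrative vector named basket:

```r
basket <- c("papaya", "orange", "apple")

basket <- append(basket, "kiwi", after = 1)  # insert after position 1
basket   # "papaya" "kiwi" "orange" "apple"

basket <- c(basket, "lime")                  # c() adds to the end
basket   # "papaya" "kiwi" "orange" "apple" "lime"
```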
Second, assign the new value to a new index position:
fruit
[1] "papaya"
fruit[2] <- "mango" # add a new element at index 2
fruit
[1] "papaya" "mango"
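If the new index skips past the end of the vector, R fills the gap with missing values, NA. A minimal sketch with an illustrative one-element vector named basket:

```r
basket <- c("papaya")
basket[4] <- "lime"   # positions 2 and 3 do not exist yet
basket                # "papaya" NA NA "lime"
is.na(basket)         # FALSE TRUE TRUE FALSE
```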
To modify an element of a vector, assign a new value to its index position.
fruits[7] # element to be changed
[1] "avocado"
fruits[7] <- "tomato" # replace avocado with tomato
fruits # view the modified vector
[1] "papaya" "orange" "apple" "pineapple" "grape"
[6] "strawberries" "tomato" "banana"
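Several elements can be changed in one assignment by combining their positions with c(), as long as the replacement has the same number of values. Sketched on an illustrative vector named basket:

```r
basket <- c("papaya", "orange", "apple", "grape")
basket[c(2, 4)] <- c("mango", "lemon")  # replace positions 2 and 4 together
basket   # "papaya" "mango" "apple" "lemon"
```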
To remove an element from a vector, put a minus sign in front of its index number.
fruits[-1] # shows the values left after removing the first element
[1] "orange" "apple" "pineapple" "grape" "strawberries"
[6] "tomato" "banana"
fruits <- fruits[-1] # reassign to make the removal permanent
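Negative indices also work with c() and with ranges, which removes several elements at once. The parentheses in -(1:3) matter: without them, -1:3 means the sequence -1, 0, 1, 2, 3, which mixes negative and positive indices and causes an error. Sketched on an illustrative vector named basket:

```r
basket <- c("papaya", "orange", "apple", "grape", "avocado")

basket[-c(2, 4)]   # drop positions 2 and 4: "papaya" "apple" "avocado"
basket[-(1:3)]     # drop positions 1 to 3: "grape" "avocado"
```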
To check an object's data structure, use the corresponding is.*() function: is.vector() for vectors, is.matrix() for matrices, and so on. To convert from one structure to another, use the corresponding as.*() function, such as as.matrix() or as.data.frame().
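A short sketch of the is.*() and as.*() pair on an illustrative numeric vector named nums. Converting it with as.matrix() gives a one-column matrix, and because the result now carries dimensions, is.vector() reports FALSE for it.

```r
nums <- c(3, 1, 2)
is.vector(nums)        # TRUE
is.matrix(nums)        # FALSE

nums_mat <- as.matrix(nums)   # a matrix with 3 rows and 1 column
is.matrix(nums_mat)    # TRUE
dim(nums_mat)          # 3 1
is.vector(nums_mat)    # FALSE
```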
A matrix is a two-dimensional data structure that holds elements of the same type arranged in rows and columns. Think of it as a table in which every cell has the same data type. To create a matrix, use the matrix() function and specify either the number of rows, nrow, or the number of columns, ncol.
mat <- matrix(1:6, nrow = 2)
mat
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
By default, a matrix is filled column by column. To fill it row by row instead, set byrow to TRUE.
matrix(1:6, nrow = 2, byrow = TRUE)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
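Giving ncol instead of nrow describes the same shape from the other side; R works out the missing dimension from the length of the data. A minimal sketch with illustrative names:

```r
by_rows <- matrix(1:6, nrow = 2)
by_cols <- matrix(1:6, ncol = 3)

identical(by_rows, by_cols)   # TRUE: both are 2 x 3, filled column by column
dim(matrix(1:6, ncol = 2))    # 3 2
```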
We can confirm that an object is a matrix by using is.matrix().
is.matrix(mat) # confirm object is a matrix
[1] TRUE
You can make a matrix from a vector by passing it to the matrix() function.
fruit_mat <- matrix(fruits, nrow = 4) # make matrix from fruits vector
Warning in matrix(fruits, nrow = 4): data length [7] is not a sub-multiple or
multiple of the number of rows [4]
fruit_mat
[,1] [,2]
[1,] "orange" "strawberries"
[2,] "apple" "tomato"
[3,] "pineapple" "banana"
[4,] "grape" "orange"
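Because fruits has seven elements but a matrix with four rows and two columns has eight cells, R recycled the first element, "orange", to fill the last cell and issued the warning above. To get the dimensions of a matrix, use the dim() function.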
dim(fruit_mat)
[1] 4 2
The result, [1] 4 2, means four rows by two columns, so fruit_mat is called a four-by-two (4 × 2) matrix. Figure 5.1 shows how the elements of a simple 3 × 2 matrix are indexed.
Like vectors, matrices support arithmetic operations. When a matrix is combined with a single number (a scalar), the operation is applied to each element of the matrix.
my_mat <- matrix(1:6, nrow = 3)
my_mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
my_mat * 5
[,1] [,2]
[1,] 5 20
[2,] 10 25
[3,] 15 30
my_mat - 15
[,1] [,2]
[1,] -14 -11
[2,] -13 -10
[3,] -12 -9
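When both operands are matrices with the same dimensions, the operation is applied element by element:
my_mat2 <- matrix(2:7, nrow = 3)
my_mat2 / my_mat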
[,1] [,2]
[1,] 2.000000 1.250000
[2,] 1.500000 1.200000
[3,] 1.333333 1.166667
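Element-wise arithmetic between two matrices requires them to have the same dimensions. Matrices with different dimensions cannot be added together, and R stops with an error:
my_mat3 <- matrix(7:15, nrow = 3)
my_mat3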
[,1] [,2] [,3]
[1,] 7 10 13
[2,] 8 11 14
[3,] 9 12 15
my_mat3 + my_mat
Error in my_mat3 + my_mat: non-conformable arrays
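One way to avoid this error is to compare the dimensions before combining two matrices. The sketch below rebuilds my_mat and my_mat3 as above and, as an illustration only, uses tryCatch() to capture the error message instead of letting it stop the script.

```r
my_mat  <- matrix(1:6, nrow = 3)    # 3 x 2
my_mat3 <- matrix(7:15, nrow = 3)   # 3 x 3

identical(dim(my_mat), dim(my_mat3))   # FALSE, so + cannot work element by element

msg <- tryCatch(my_mat3 + my_mat,
                error = function(e) conditionMessage(e))
msg   # "non-conformable arrays"
```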
Because a matrix is two-dimensional, its printout labels the columns [,1], [,2], and so on, and the rows [1,], [2,], and so on. These are the column and row indices of the matrix, and they can be replaced with names of our choosing. The colnames() and rownames() functions set the names of the columns and rows, respectively.
# represent column names with values a and b
colnames(fruit_mat) <- letters[1:2]
fruit_mat
a b
[1,] "orange" "strawberries"
[2,] "apple" "tomato"
[3,] "pineapple" "banana"
[4,] "grape" "orange"
# represent the row names with uppercase A, B, C, and D
rownames(fruit_mat) <- LETTERS[1:4]
fruit_mat
a b
A "orange" "strawberries"
B "apple" "tomato"
C "pineapple" "banana"
D "grape" "orange"
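Row and column names can also be given when the matrix is created, through the dimnames argument of matrix(), which takes a list of row names followed by column names. A minimal sketch with an illustrative matrix named labelled:

```r
labelled <- matrix(1:4, nrow = 2,
                   dimnames = list(c("A", "B"),    # row names
                                   c("a", "b")))   # column names
labelled
dimnames(labelled)   # a list: row names first, then column names
```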
As with vectors, you can access any element of a matrix using square brackets, []. Because a matrix is two-dimensional, you must specify both a row and a column: the syntax is [row_index, column_index]. For example, let's access the element in the third row and second column of fruit_mat.
fruit_mat
a b
A "orange" "strawberries"
B "apple" "tomato"
C "pineapple" "banana"
D "grape" "orange"
fruit_mat[3, 2]
[1] "banana"
To select more than one element along a dimension (rows or columns), pass a vector of indices for that dimension.
fruit_mat[c(1, 3, 2), 1] # returns rows 1, 3, and 2 of column 1
A C B
"orange" "pineapple" "apple"
fruit_mat[2, c(1, 2)] # returns columns 1 and 2 of row 2
a b
"apple" "tomato"
You can also use : within a dimension to select a range of rows or columns.
fruit_mat[2:4, c(1, 2)] # returns rows 2, 3, and 4 of columns 1 and 2
a b
B "apple" "tomato"
C "pineapple" "banana"
D "grape" "orange"
To return all the rows of a particular column, leave the row position inside the square brackets empty and specify the index of the column you want.
fruit_mat[, 1]
A B C D
"orange" "apple" "pineapple" "grape"
fruit_mat[, 2]
A B C D
"strawberries" "tomato" "banana" "orange"
To return all the columns of a particular row, the column space is left empty.
fruit_mat[3:4, ] # returns rows 3 and 4 for all columns
a b
C "pineapple" "banana"
D "grape" "orange"
You can also access an element using the row and column names.
"A", "a"] fruit_mat[
[1] "orange"
To add new columns to a matrix, use cbind(), and to add rows, use rbind(). We'll create a new matrix of fruits to show this.
new_fruit_list <- matrix(c("raspberry", "blue berries",
                           "kiwi", "clementine"),
                         nrow = 4,
                         dimnames = list(letters[1:4], "C"))
new_fruit_list
C
a "raspberry"
b "blue berries"
c "kiwi"
d "clementine"
fruit_list <- cbind(fruit_mat, new_fruit_list) # row names are taken from fruit_mat
fruit_list
a b C
A "orange" "strawberries" "raspberry"
B "apple" "tomato" "blue berries"
C "pineapple" "banana" "kiwi"
D "grape" "orange" "clementine"
When binding columns, the matrices must have the same number of rows; when binding rows, they must have the same number of columns.
rbind(fruit_list, c("avocado", "pear", "lemon"))
a b C
A "orange" "strawberries" "raspberry"
B "apple" "tomato" "blue berries"
C "pineapple" "banana" "kiwi"
D "grape" "orange" "clementine"
"avocado" "pear" "lemon"
To transpose a matrix, use t(). This turns the rows into columns and the columns into rows.
fruit_mat_transposed <- t(fruit_mat)
fruit_mat_transposed
A B C D
a "orange" "apple" "pineapple" "grape"
b "strawberries" "tomato" "banana" "orange"
dim(fruit_mat_transposed)
[1] 2 4
Data frames are two-dimensional, like matrices, and are the standard way to store tabular data in R. They are similar to Excel spreadsheets. Unlike a matrix, a data frame can store different data types in different columns. To create a data frame, use the data.frame() function. Below is a simple data frame that stores a store's fruit inventory.
fruit_inventory <- data.frame(
  type = c("pineapple", "mango", "apple"), # character data type
stock = c(5, 3, 0), # double data type
available = c(TRUE, TRUE, FALSE) # logical data type
)
fruit_inventory
type stock available
1 pineapple 5 TRUE
2 mango 3 TRUE
3 apple 0 FALSE
The data.frame() function can also combine existing vectors into a table, with each vector becoming a column. The vectors should have the same length; vectors of incompatible lengths cause an error.
fruit <- c("orange", "mango", "apple")
stock <- c(5, 3, 0)
available <- c(TRUE, TRUE, FALSE)

fruit_tbl <- data.frame(fruit, stock, available)
fruit_tbl
fruit stock available
1 orange 5 TRUE
2 mango 3 TRUE
3 apple 0 FALSE
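The str() function from the start of the chapter is a quick way to confirm that each column kept its own type. A minimal sketch that rebuilds fruit_tbl as above; it assumes R 4.0.0 or later, where character columns stay character (older versions converted them to factors by default).

```r
fruit_tbl <- data.frame(
  fruit = c("orange", "mango", "apple"),
  stock = c(5, 3, 0),
  available = c(TRUE, TRUE, FALSE)
)

str(fruit_tbl)            # one line per column, showing its type
sapply(fruit_tbl, class)  # fruit: "character", stock: "numeric", available: "logical"
dim(fruit_tbl)            # 3 3
```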
Elements of a data frame are accessed in the same way as those of a matrix, using [row, column]. Data frames also support the dollar sign, $, for accessing a column by name.
fruit_tbl$stock
[1] 5 3 0
fruit_tbl$available
[1] TRUE TRUE FALSE
A column extracted this way is an ordinary vector, so any vector operation works on it. For example, we can compute the total stock.
sum(fruit_tbl$stock)
[1] 8
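Matrix-style [row, column] indexing works on the same table, and columns can be referred to by name. A short sketch that rebuilds fruit_tbl as above:

```r
fruit_tbl <- data.frame(
  fruit = c("orange", "mango", "apple"),
  stock = c(5, 3, 0),
  available = c(TRUE, TRUE, FALSE)
)

fruit_tbl[2, "stock"]                # row 2 of the stock column: 3
fruit_tbl[, "fruit"]                 # the whole column as a vector
fruit_tbl[1:2, c("fruit", "stock")]  # a smaller data frame
mean(fruit_tbl$stock)                # 2.666667
```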
Square brackets can also remove or add columns. A negative column index drops that column from the result (fruit_tbl itself is unchanged unless you reassign it), and assigning values to a new column name adds a column.
# Removing a Column
fruit_tbl[-1]
stock available
1 5 TRUE
2 3 TRUE
3 0 FALSE
# Adding a column
"location"] <- c("online", "on site", "online")
fruit_tbl[ fruit_tbl
fruit stock available location
1 orange 5 TRUE online
2 mango 3 TRUE on site
3 apple 0 FALSE online
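The $ operator can add a column too, and assigning NULL to a column removes it from the data frame. A minimal sketch on an illustrative data frame named inventory, so that fruit_tbl is left as shown above:

```r
inventory <- data.frame(
  fruit = c("orange", "mango", "apple"),
  stock = c(5, 3, 0)
)

inventory$location <- c("online", "on site", "online")  # add a column with $
names(inventory)   # "fruit" "stock" "location"

inventory$location <- NULL                               # remove it again
names(inventory)   # "fruit" "stock"
```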
A list is a versatile data structure that can store elements of different types and sizes, including other data structures such as vectors, matrices, and even other lists. Think of it as a container that can hold different kinds of boxes.
# Basic list creation
my_list <- list(
  fruits,
  fruit_mat,
  c(TRUE, FALSE),
  FALSE,
  fruit_mat_transposed,
  fruit_tbl
)
my_list
[[1]]
[1] "orange" "apple" "pineapple" "grape" "strawberries"
[6] "tomato" "banana"
[[2]]
a b
A "orange" "strawberries"
B "apple" "tomato"
C "pineapple" "banana"
D "grape" "orange"
[[3]]
[1] TRUE FALSE
[[4]]
[1] FALSE
[[5]]
A B C D
a "orange" "apple" "pineapple" "grape"
b "strawberries" "tomato" "banana" "orange"
[[6]]
fruit stock available location
1 orange 5 TRUE online
2 mango 3 TRUE on site
3 apple 0 FALSE online
The elements of a list can be given names with the names() function.
names(my_list) <- c("fruits", "fruit_mat", "logical_1", "logical_2",
"fruit_transposed_matrix", "fruit_dataframe")
When we print the list now, each element is shown with its name.
my_list
$fruits
[1] "orange" "apple" "pineapple" "grape" "strawberries"
[6] "tomato" "banana"
$fruit_mat
a b
A "orange" "strawberries"
B "apple" "tomato"
C "pineapple" "banana"
D "grape" "orange"
$logical_1
[1] TRUE FALSE
$logical_2
[1] FALSE
$fruit_transposed_matrix
A B C D
a "orange" "apple" "pineapple" "grape"
b "strawberries" "tomato" "banana" "orange"
$fruit_dataframe
fruit stock available location
1 orange 5 TRUE online
2 mango 3 TRUE on site
3 apple 0 FALSE online
Single square brackets, [], return a sub-list containing the selected elements, while double square brackets, [[]], extract the element itself. As with data frames, you can also use the dollar sign, $, to extract an element by name.
my_list[1]
$fruits
[1] "orange" "apple" "pineapple" "grape" "strawberries"
[6] "tomato" "banana"
Notice the difference when we use [[]]: the result is the vector itself, the same as using $ with the element's name.
my_list[[1]]
[1] "orange" "apple" "pineapple" "grape" "strawberries"
[6] "tomato" "banana"
my_list$fruits # same as my_list[[1]]
[1] "orange" "apple" "pineapple" "grape" "strawberries"
[6] "tomato" "banana"
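The difference between [] and [[]] is easiest to see by checking what each one returns. A small sketch with an illustrative list named basket_list, whose element names are given directly in list() as an alternative to names():

```r
basket_list <- list(fruits = c("orange", "apple", "pineapple"),
                    flags = c(TRUE, FALSE))

one   <- basket_list[1]     # a list of length 1 that still wraps the vector
inner <- basket_list[[1]]   # the character vector itself

is.list(one)     # TRUE
is.list(inner)   # FALSE
identical(inner, basket_list$fruits)       # TRUE
identical(inner, basket_list[["fruits"]])  # TRUE: [[]] also accepts a name
```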
To access an item inside one of the list's elements, add a second set of square brackets after the [[]].
my_list[[1]][3]
[1] "pineapple"
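Elements can be added to and removed from a list much like columns of a data frame: assign to a new name to add one, and assign NULL to remove it. A minimal sketch on an illustrative list named basket_list:

```r
basket_list <- list(fruits = c("orange", "apple", "pineapple"))

basket_list$stock <- c(5, 3, 0)   # add a new element by name
names(basket_list)                # "fruits" "stock"
length(basket_list)               # 2

basket_list$stock <- NULL         # remove it by assigning NULL
names(basket_list)                # "fruits"
basket_list[["fruits"]][2]        # "apple"
```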
In this chapter, you learned about data structures in R: how to create each one, how to access its elements, and how to add and remove items. You were also introduced to checking the properties of each data structure, such as its length and dimensions. Next, you will learn about packages in R and how to install them. After that, you will bring together the knowledge you have gathered since Chapter 1 to create R scripts and R projects.