4  Functions

This section is under review.

In the previous chapters, functions such as c(), as.factor(), and class(), to mention a few, were used. But what exactly are functions, and why are they important? Functions in R are self-contained blocks of code designed to perform specific tasks.

Think of a function like a kitchen knife—its purpose is to cut, and how you use it depends on the task at hand. For example, the function c() stands for “concatenate” or “combine,” and its role is to combine objects into a list or vector.

We use functions because they simplify our work and make analyses more efficient. If we make mistakes, functions allow for easy corrections and re-use. In R, function names are usually verbs, representing actions to be taken. Functions are named objects followed by parentheses. Inside these parentheses are arguments, which are the values passed to the function to control how it operates and the type of result it returns.

Just like how a knife can cut onions in different ways—rings, half-moons, juliennes, dice, or rough chops—arguments allow us to specify the exact type of result we want from a function. Each variation produces a different output, but the underlying tool (the function) remains the same.
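
As a small illustration of how an argument changes the result while the function stays the same, the sketch below sorts a vector with sort(). The object name onion_weights and its values are invented for this example; decreasing is a documented argument of sort().

```r
# Hypothetical onion weights in grams (values invented for illustration)
onion_weights <- c(120, 85, 143, 97)

# Default behaviour: sort in increasing order
ascending <- sort(onion_weights)
ascending

# Same function, different argument value: sort in decreasing order
descending <- sort(onion_weights, decreasing = TRUE)
descending
```

The tool is the same in both calls; only the value passed to decreasing differs.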

As an example, let’s take a hypothetical tomato nursery experiment. In this experiment, we collect some tomato fruits and measure their diameters. To record the diameters, we use the concatenation function, c().

tomato_diameter <- c(43.68, 70.23, 29.31, 83.08, 27.42, 53.50, 30.95, 10.51, 41.41, 68.06)

There are two ways to view the recorded diameters. The first is to use the print() function.

print(tomato_diameter)
 [1] 43.68 70.23 29.31 83.08 27.42 53.50 30.95 10.51 41.41 68.06

The other option is to call the object by name, which automatically prints the result. This is more common than using print(), although print() is still widely used.

tomato_diameter
 [1] 43.68 70.23 29.31 83.08 27.42 53.50 30.95 10.51 41.41 68.06

4.1 Some Built-in Functions

R comes with many functions already loaded to equip you for the analysis tasks you want to undertake. For tasks that R’s built-in functions can’t handle, you can install packages (covered in Chapter 7) or build your own functions (covered in Section 4.2). The built-in functions are designed to perform specific tasks on the various data types, and some are designed to work with files and folders as well. For example, you may want to know the total number of values in our record, that is, the count. You can do this with length().

length(tomato_diameter)
[1] 10

To find the average of the records of tomato_diameter use mean().

mean(x = tomato_diameter)
[1] 45.815

The above can also be written as:

mean(tomato_diameter)

Here x is the name of the argument that receives the data, in this case tomato_diameter. To learn more about any function, use the help() function or type a question mark followed by the function name, for example help(mean) or ?mean.

To check the median of the data use (you guessed it right) median(). We also have sd(), var(), and cor() for estimating the standard deviation, variance and correlation respectively.

median(tomato_diameter)
[1] 42.545
sd(tomato_diameter)
[1] 22.70789
var(tomato_diameter)
[1] 515.6485
cor(tomato_diameter, 1:10)
[1] -0.1669699
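
These summary functions are related to one another. As a quick check, and only as a sketch, the code below confirms that the standard deviation is the square root of the variance. The object name sd_from_var is ours; all.equal() is used because it tolerates tiny floating-point differences.

```r
tomato_diameter <- c(43.68, 70.23, 29.31, 83.08, 27.42, 53.50, 30.95, 10.51, 41.41, 68.06)

# The standard deviation is the square root of the variance
sd_from_var <- sqrt(var(tomato_diameter))
sd_from_var

# Compare with the value returned by sd()
all.equal(sd_from_var, sd(tomato_diameter))
```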

Using round(), you can round the recorded tomato diameters to the nearest whole number.

round(tomato_diameter)
 [1] 44 70 29 83 27 54 31 11 41 68

To specify the number of decimal places to round to, pass a whole number to the digits argument.

round(tomato_diameter, digits = 1)
 [1] 43.7 70.2 29.3 83.1 27.4 53.5 31.0 10.5 41.4 68.1
# Also the same as
# round(tomato_diameter, 1)
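
You may have noticed that 53.50 was rounded to 54 above. According to the documentation of round(), R follows the IEC 60559 convention of rounding a value that ends in exactly .5 to the nearest even number, which can differ from the "always round half up" rule. A short sketch with made-up values (the object names are ours):

```r
# Values that sit exactly halfway between two whole numbers
halves <- c(0.5, 1.5, 2.5, 3.5)

# Each one goes to the nearest even whole number: 0 2 2 4
rounded_halves <- round(halves)
rounded_halves
```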

To round in a specific direction, use one of two relatives of round(): ceiling() to round up and floor() to round down.

floor(3.544)
[1] 3
ceiling(3.544)
[1] 4
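
The direction matters most with negative numbers. The sketch below repeats the example with a negative value and also uses trunc(), a base R function not covered in the text above, which drops the decimal part.

```r
# floor() always moves towards minus infinity, ceiling() towards plus infinity
floor(-3.544)    # -4
ceiling(-3.544)  # -3

# trunc() drops the decimal part, moving towards zero
trunc(-3.544)    # -3
trunc(3.544)     # 3
```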

To create a sequence of numbers, use the colon (:) operator or the seq() function. seq() gives more control over the sequence you want to generate. The important arguments of seq() to remember are from, to, and by.

seq(from = 1, to = 100, by = 1)
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100

The above is equivalent to the code below, which relies on the position of the arguments instead of writing out their names. Writing calls in this manner is common in the R community.

seq(1, 100, 1)
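
When arguments are named, their order no longer matters; when they are not named, R matches them by position. A minimal sketch of this, with object names chosen for illustration:

```r
# Positional matching: values are assigned to from, to, by in that order
by_position <- seq(2, 10, 4)

# Named matching: the order of the arguments is free
by_name <- seq(by = 4, to = 10, from = 2)

by_position
identical(by_position, by_name)
```

Naming arguments takes more typing, but it protects against accidentally swapping values.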

With the by argument, we can create sequences of even or odd numbers easily.

even_num <- seq(2, 100, 2)
even_num
 [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
[20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
[39]  78  80  82  84  86  88  90  92  94  96  98 100
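
For completeness, here is a short sketch of the colon operator mentioned earlier, together with length.out, another documented argument of seq() that asks for a fixed number of evenly spaced values instead of a fixed step. The object names are ours.

```r
# The colon operator counts in steps of 1
one_to_ten <- 1:10
one_to_ten

# length.out requests a fixed number of evenly spaced values
five_points <- seq(from = 0, to = 1, length.out = 5)
five_points
```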

R also comes with built-in constants such as pi.

pi
[1] 3.141593
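
Constants can be used directly in calculations. As a sketch, and assuming for illustration that each tomato's cross-section is a perfect circle, the area can be computed from the diameter. The object name tomato_area is ours.

```r
tomato_diameter <- c(43.68, 70.23, 29.31, 83.08, 27.42, 53.50, 30.95, 10.51, 41.41, 68.06)

# Area of a circle = pi * radius^2, with radius = diameter / 2.
# The calculation is applied to every element of the vector at once.
tomato_area <- pi * (tomato_diameter / 2)^2
round(tomato_area, digits = 1)
```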

4.1.1 Making Graphs with the plot() Function

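To see a graph of the tomato_diameter object, pass it to the plot() function as the first argument, x. When x is the only data supplied, each value is drawn on the y-axis against its position (index) in the vector on the x-axis, and a scatter plot is produced as shown in Figure 4.1.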

plot(x = tomato_diameter)
Figure 4.1: Graph generated for tomato diameters using plot

The graph can be customized by passing values to other arguments of plot(). For example, col changes the color and pch changes the shape of each point, as in the plot() call below. You can also use hist() to see the distribution of the data, Figure 4.2.

# Simple plot customization: point shape changed to a cross and color changed to red
plot(tomato_diameter, col = "red", pch = 3)

hist(tomato_diameter, col = "coral3", xlab = "Tomato Diameter", main = "Distribution of Tomato Plant Diameter")
Figure 4.2: Histogram of Tomato Plant Diameters (mm) showing a normal distribution

The xlab argument customizes the label of the x-axis, while ylab customizes the label of the y-axis. The main argument changes the title of the graph. More on making visualizations with R will be discussed in Chapter 11 and Chapter 12.

The code used to produce the graph is clear, but as you write more complex code, readability suffers. Readability improves when you write one or two arguments per line rather than all of them on a single line. The code above can also be written as:

hist(
  tomato_diameter,
  col = "coral3",
  xlab = "Tomato Diameter",
  main = "Distribution of Tomato Plant Diameter"
)
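
hist() also returns the numbers behind the picture. The sketch below stores them instead of drawing the graph; plot = FALSE is a documented argument of hist(), and the object name tomato_hist is ours.

```r
tomato_diameter <- c(43.68, 70.23, 29.31, 83.08, 27.42, 53.50, 30.95, 10.51, 41.41, 68.06)

# Compute the histogram without drawing it
tomato_hist <- hist(tomato_diameter, plot = FALSE)

tomato_hist$breaks  # boundaries of the bars
tomato_hist$counts  # number of values falling in each bar
```

Looking at the counts is a handy way to check what a histogram is actually showing.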

4.1.2 Simulating Numbers

An important part of research is data simulation and R provides a robust set of functions for generating random numbers from different probability distributions. For example, the runif() function generates random numbers from a uniform distribution, rnorm() for the normal distribution, rpois() for the Poisson distribution, and rbinom() for the binomial distribution. These functions belong to a family that also includes functions for calculating density (d), distribution functions (p), quantiles (q), and random deviates (r) for various statistical distributions. A comprehensive list of these functions can be found in Table 4.1.

For example, to simulate the diameters of 100 Tectona grandis trees of the same age in a plantation with a mean diameter of 35 cm and a standard deviation of 2.3 cm, use rnorm(). In forestry, the distribution of tree diameters in a plantation is usually a bell-shaped curve. With some code you can generate such a curve, as in Figure 4.3 (b), while the distribution of the diameters represented as a histogram can be seen in Figure 4.3 (a).

teak_diameter <- rnorm(100, mean = 35, sd = 2.3)
teak_diameter
  [1] 37.71687 37.96147 36.25440 33.98531 32.20917 35.04696 35.13248 36.61667
  [9] 31.68080 35.16442 38.63116 32.75703 35.73393 32.11662 37.58475 33.39289
 [17] 32.46347 36.65268 33.55001 36.02513 34.90087 32.37007 33.54168 36.78130
 [25] 33.38721 37.30889 34.25650 33.32478 35.00257 33.90966 36.80041 35.88863
 [33] 35.81288 32.47677 32.03593 38.21180 36.03176 37.05294 36.08896 35.60389
 [41] 34.43017 33.06870 36.30671 32.26768 32.62285 32.53437 34.10586 38.21818
 [49] 32.62178 31.41916 36.58848 34.35291 38.21399 34.58229 33.39589 39.43068
 [57] 33.79520 33.16219 34.05911 36.11644 36.69954 37.75015 34.59669 33.08292
 [65] 38.98542 34.77281 34.23120 35.36860 32.90113 33.05766 38.78572 37.90296
 [73] 32.12923 33.71422 34.85916 37.44048 32.21309 35.83139 32.98868 36.35195
 [81] 30.82363 32.40731 33.24594 36.90119 36.52155 34.36964 40.56553 36.35661
 [89] 37.09842 35.84358 35.88458 36.78003 32.77127 35.03908 33.46750 36.54347
 [97] 37.48490 32.17271 33.56138 34.57702

The numbers generated above will differ from the ones you produce, and you may never get exactly the same values, because they are random. The beauty of R is its reproducibility. With set.seed(), you can fix the starting point of R’s pseudorandom number generator, which makes your work replicable. As long as the seed number is the same, the pseudorandom numbers produced can be replicated by another person.

set.seed(123)
tree_diameter <- rnorm(100, mean = 25, sd = 12.3)
tree_diameter
  [1] 18.106150 22.168817 44.172112 25.867253 26.590239 46.095299 30.669269
  [8]  9.439747 16.551710 19.518358 40.056206 29.425710 29.929489 26.361397
 [15] 18.163154 46.979032 31.123561  0.810609 33.626678 19.184666 11.865768
 [22] 22.318909 12.380145 16.034638 17.312017  4.253672 35.304781 26.886489
 [29] 11.000916 40.421924 30.245510 21.370621 36.010046 35.801042 35.105447
 [36] 33.470275 31.813187 24.238486 21.236659 20.320207 16.455104 22.442617
 [43]  9.435625 51.678158 39.857933 11.185764 20.044517 19.260139 34.593571
 [50] 23.974560 28.115818 24.648875 24.472693 41.833808 22.223017 43.652588
 [57]  5.950341 32.190749 26.523407 27.656081 29.669566 18.821422 20.901549
 [64] 12.471523 11.816968 28.733402 30.512980 25.651952 36.343890 50.216042
 [71] 18.960317 -3.402777 37.370584 16.276831 16.537494 37.614528 21.497292
 [78]  9.985172 27.230033 23.291636 25.070899 29.738949 20.440882 32.925832
 [85] 22.288015 29.080918 38.491120 30.352732 20.991041 39.130334 37.220097
 [92] 31.745283 27.936400 17.276755 41.736025 17.616807 51.904196 43.851111
 [99] 22.100886 12.375023
Figure 4.3: Histogram of simulated tree diameter distribution. Panels: (a) histogram, (b) bell-shaped curve.
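
The code behind Figure 4.3 is not shown in this section. One possible way to draw similar panels, offered only as a sketch and not as the code used for the figure, combines hist() with density(), a base R function for smoothed density estimates. The axis labels, titles, and the object name tree_density are our own choices.

```r
set.seed(123)
tree_diameter <- rnorm(100, mean = 25, sd = 12.3)

# (a) histogram of the simulated diameters
hist(
  tree_diameter,
  xlab = "Tree diameter",
  main = "Simulated tree diameters"
)

# (b) smoothed density estimate, which traces the bell shape
tree_density <- density(tree_diameter)
plot(tree_density, main = "Density of simulated tree diameters")
```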

runif() is used for generating numbers from a uniform distribution, Figure 4.4.

set.seed(123)
dt <- runif(100, min = 10, max = 20)
dt
  [1] 12.87578 17.88305 14.08977 18.83017 19.40467 10.45556 15.28105 18.92419
  [9] 15.51435 14.56615 19.56833 14.53334 16.77571 15.72633 11.02925 18.99825
 [17] 12.46088 10.42060 13.27921 19.54504 18.89539 16.92803 16.40507 19.94270
 [25] 16.55706 17.08530 15.44066 15.94142 12.89160 11.47114 19.63024 19.02299
 [33] 16.90705 17.95467 10.24614 14.77796 17.58460 12.16408 13.18181 12.31626
 [41] 11.42800 14.14546 14.13724 13.68845 11.52445 11.38806 12.33034 14.65962
 [49] 12.65973 18.57828 10.45831 14.42200 17.98925 11.21899 15.60948 12.06531
 [57] 11.27532 17.53308 18.95045 13.74463 16.65115 10.94841 13.83970 12.74384
 [65] 18.14640 14.48516 18.10064 18.12390 17.94342 14.39832 17.54475 16.29221
 [73] 17.10182 10.00625 14.75317 12.20119 13.79817 16.12771 13.51798 11.11135
 [81] 12.43619 16.68056 14.17647 17.88196 11.02865 14.34893 19.84957 18.93051
 [89] 18.86469 11.75053 11.30696 16.53102 13.43516 16.56758 13.20373 11.87691
 [97] 17.82294 10.93595 14.66779 15.11505
Figure 4.4: Uniformly distributed randomly generated numbers.
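
Both simulations above begin with set.seed(123), which is why rerunning them gives the same values. A minimal check of that behaviour, with object names invented for this sketch:

```r
# Same seed, same call: the two vectors match exactly
set.seed(123)
first_run <- runif(5, min = 10, max = 20)

set.seed(123)
second_run <- runif(5, min = 10, max = 20)

identical(first_run, second_run)

# Without resetting the seed, the generator simply carries on
third_run <- runif(5, min = 10, max = 20)
identical(first_run, third_run)
```

The first comparison returns TRUE and the second returns FALSE, so set the seed immediately before the code you want others to reproduce.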

The random number generators, together with the matching density, cumulative distribution, and quantile functions, are listed in Table 4.1.

Table 4.1: Functions for generating random numbers and for computing densities, cumulative probabilities, and quantiles in R

Function  Distribution           Description
--------  ---------------------  -----------------------------------------------------------------------------
runif     Uniform distribution   Generates random numbers from a uniform distribution
rnorm     Normal distribution    Generates random numbers from a normal distribution
rpois     Poisson distribution   Generates random numbers from a Poisson distribution
rbinom    Binomial distribution  Generates random numbers from a binomial distribution
dunif     Uniform distribution   Computes the density of a uniform distribution
dnorm     Normal distribution    Computes the density of a normal distribution
dpois     Poisson distribution   Computes the density of a Poisson distribution
dbinom    Binomial distribution  Computes the density of a binomial distribution
punif     Uniform distribution   Computes the cumulative distribution function (CDF) of a uniform distribution
pnorm     Normal distribution    Computes the cumulative distribution function (CDF) of a normal distribution
ppois     Poisson distribution   Computes the cumulative distribution function (CDF) of a Poisson distribution
pbinom    Binomial distribution  Computes the cumulative distribution function (CDF) of a binomial distribution
qunif     Uniform distribution   Computes the quantiles of a uniform distribution
qnorm     Normal distribution    Computes the quantiles of a normal distribution
qpois     Poisson distribution   Computes the quantiles of a Poisson distribution
qbinom    Binomial distribution  Computes the quantiles of a binomial distribution
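
To make the d, p, and q prefixes concrete, here is a small sketch using the standard normal distribution (mean 0, standard deviation 1, the defaults of these functions). The object names p_value and q_value are ours.

```r
# d: height of the standard normal curve at 0 (about 0.399)
dnorm(0)

# p: probability that a standard normal value is at most 1.96 (about 0.975)
p_value <- pnorm(1.96)
p_value

# q: the inverse of p, the value below which 97.5% of the curve lies (about 1.96)
q_value <- qnorm(0.975)
q_value

# p and q undo each other
all.equal(pnorm(qnorm(0.975)), 0.975)
```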

4.1.3 Sampling with sample()

Another important function for randomization is sample(), which selects values at random from a set of data. Let’s take the example below:

new_dt <- seq(1, 20, 2)
new_dt
 [1]  1  3  5  7  9 11 13 15 17 19

We can randomly select five values from new_dt using sample().

set.seed(10)
sample(new_dt, 5)
[1] 17 13 15 11  5

4.1.3.1 Weighted Samples and Sampling with Replacement

Beyond selecting values at random, sample() can also sample with replacement when the replace argument is set to TRUE. This is required when, as below, more values are requested than the data contains.

sample(new_dt, 20, replace = TRUE)
 [1] 15 19 13 19  3 15 15 13 11 13 11  3  9 17  3 19  9 19  1 13

The probability of selecting each value can also be set with the prob argument, which is ideal for weighted sampling. An example is a die whose faces are not equally likely.

die <- 1:6
die_prob <- c(0.23, 0.23, 0.23, 0.21, 0.07, 0.03)
set.seed(123)
die_selected <- sample(x = die, size = 100, replace = TRUE, prob = die_prob)
die_selected
  [1] 1 4 1 4 5 2 3 4 3 1 5 1 3 3 2 4 1 2 1 5 4 4 3 6 3 4 3 3 1 2 5 5 4 4 2 3 4
 [38] 2 1 1 2 1 1 1 2 2 1 3 1 4 2 1 4 2 3 2 2 4 4 1 3 2 1 1 4 1 4 4 4 1 4 3 4 2
 [75] 3 2 1 3 1 2 1 3 1 4 2 1 6 4 4 2 2 3 1 3 1 2 4 2 3 3

To get the number of occurrences of each die face, we can use the table() function. Notice that faces with a lower probability of occurring, such as 5 and 6, were sampled less often.

table(die_selected)
die_selected
 1  2  3  4  5  6 
27 22 20 24  5  2 
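
Counts are easier to compare with die_prob once they are turned into proportions. The sketch below uses prop.table(), a base R function not introduced above; wrapping the draws in factor() with explicit levels is our own addition so that a face that is never drawn still appears with a count of zero.

```r
die <- 1:6
die_prob <- c(0.23, 0.23, 0.23, 0.21, 0.07, 0.03)

set.seed(123)
die_selected <- sample(x = die, size = 100, replace = TRUE, prob = die_prob)

# Count every face, then convert the counts to proportions
face_counts <- table(factor(die_selected, levels = die))
observed_prop <- prop.table(face_counts)
observed_prop
```

With only 100 draws the observed proportions will be close to, but not exactly equal to, the values in die_prob.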

4.1.4 String Functions

Some functions in R make it easy to work with strings. Let’s create a simple string to begin.

tree_1 <- " Adansonia digitata "
tree_1
[1] " Adansonia digitata "

We can count the number of characters in the object using nchar(). Note that nchar() also counts the spaces in the text.

nchar(tree_1)
[1] 20

The text can be changed to lower case with tolower() and to upper case with toupper(). Unfortunately, base R does not provide a function to change strings to title or sentence case, but there are open-source packages that do this. You can explore stringr by Hadley Wickham and stringi by Marek Gagolewski, which contain functions for such operations and for other complex string manipulation in R.

tolower(tree_1)
[1] " adansonia digitata "
toupper(tree_1)
[1] " ADANSONIA DIGITATA "

With trimws() you can remove the white space around your strings. As the name suggests, trimws() trims white space.

tree_1 <- trimws(tree_1)
tree_1
[1] "Adansonia digitata"

After trimming the white spaces, the number of characters would reduce.

nchar(tree_1)
[1] 18

To extract part of your text, use substring() or substr(). substr() takes the argument x, which is your data; start, which is the position of the first character to be extracted or replaced; and stop, which is the position of the last character to be extracted or replaced. For example, we can extract Adansonia from the text with the following code.

substr(tree_1, start = 1, stop = 9)
[1] "Adansonia"

substr() is also used to replace strings in data.

substr(tree_1, start = 11, nchar(tree_1)) <- "gregorii"
tree_1
[1] "Adansonia gregorii"

There are times when we would like to split text data. Using strsplit(), we can split the tree_1 string into genus and species at the space between them.

strsplit(tree_1, split = " ")
[[1]]
[1] "Adansonia" "gregorii" 

Notice that in the output each word now has its own quotation marks: the string has become two separate elements, returned inside a list.
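
Because strsplit() returns a list, the pieces have to be pulled out before they can be used on their own. A minimal sketch, with object names chosen for illustration:

```r
tree_name <- "Adansonia gregorii"

# strsplit() returns a list with one element per input string
name_parts <- strsplit(tree_name, split = " ")

# [[1]] pulls out the character vector for the first (and only) string
genus_species <- name_parts[[1]]
genus <- genus_species[1]
species <- genus_species[2]

genus
species
```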

There are many more functions for string manipulation in R; not all will be covered here, as that would make for a very large book. However, concatenation will be covered. To concatenate strings, use either paste() or paste0(). They are almost identical, with slight differences.

paste(tree_1, " man")
[1] "Adansonia gregorii  man"
paste("My best tree species is ", tree_1)
[1] "My best tree species is  Adansonia gregorii"

4.2 Custom Functions

R comes preloaded with a lot of functions, but they will likely not meet all our statistical or operational needs. For some tasks, you will need to write your own function.

4.2.1 Creating Custom Functions

For example, we can estimate the z-scores of some of the data we have created so far. The z-score tells you how many standard deviations an element is from the mean of the vector. It transforms our data onto a common scale. The formula for the z-score is given as:

\[Z = \frac{x - \mu}{\sigma}\]

where:

  • \(x\) is the value in the vector,
  • \(\mu\) is the mean of the vector,
  • \(\sigma\) is the standard deviation of the vector.

Without a custom function, we would do the following:

average_tomato <- mean(tomato_diameter)
sd_tomato <- sd(tomato_diameter)
z_tomato <- (tomato_diameter - average_tomato) / sd_tomato

average_tree_diameter <- mean(tree_diameter)
sd_tree_diameter <- sd(tree_diameter)
z_tree_diameter <- (tree_diameter - average_tree_diameter) / sd_tree_diameter

average_dt <-mean(dt)
sd_tomato <- sd(dt)
z_dt <- (dt - average_dt) / sd_tomato

We get the following results:

z_tomato
 [1] -0.09402017  1.07517674 -0.72683973  1.64105924 -0.81007070  0.33842856
 [7] -0.65461816 -1.55474564 -0.19398540  0.97961526
z_tree_diameter
  [1] -0.71304802 -0.35120270  1.60854170 -0.02179795  0.04259548  1.77983218
  [7]  0.40589817 -1.48492941 -0.85149566 -0.58726835  1.24195461  0.29513939
 [13]  0.34000892  0.02221347 -0.70797086  1.85854263  0.44636008 -2.25349176
 [19]  0.66930255 -0.61698896 -1.26885349 -0.33783464 -1.22304003 -0.89754917
 [25] -0.78377819 -1.94683206  0.81876439  0.06898128 -1.34588242  1.27452758
 [31]  0.36815564 -0.42229479  0.88157948  0.86296437  0.80101058  0.65537241
 [37]  0.50778230 -0.16686565 -0.43422620 -0.51585092 -0.86009994 -0.32681639
 [43] -1.48529653  2.27707482  1.22429519 -1.32941869 -0.54040552 -0.61026684
 [49]  0.75541982 -0.19037243  0.17847258 -0.13031397 -0.14600575  1.40027842
 [55] -0.34637532  1.56226982 -1.79571669  0.54141021  0.03664303  0.13752572
 [61]  0.31685861 -0.64934164 -0.46407310 -1.21490140 -1.27319995  0.23347834
 [67]  0.39197814 -0.04097396  0.91131364  2.14685001 -0.63697081 -2.62876100
 [73]  1.00275711 -0.87597805 -0.85276181  1.02448422 -0.41101270 -1.43635058
 [79]  0.09957931 -0.25119772 -0.09272595  0.32303830 -0.50510289  0.60688103
 [85] -0.34058618  0.26443017  1.10255872  0.37770550 -0.45610238  1.15949090
 [91]  0.98935390  0.50173432  0.16249260 -0.78691881  1.39156928 -0.75663177
 [97]  2.29720706  1.57995139 -0.35725306 -1.22349625
z_dt
  [1] -0.74029816  1.01667000 -0.31432828  1.34899927  1.55058114 -1.58950882
  [7]  0.10367363  1.38198803  0.18553298 -0.14717528  1.60800687 -0.15868628
 [13]  0.62812145  0.25991452 -1.38821361  1.40797416 -0.88587877 -1.60177908
 [19] -0.59874073  1.59983236  1.37188355  0.68157065  0.49807079  1.73936495
 [25]  0.55140145  0.73675424  0.15967644  0.33538461 -0.73474643 -1.23316204
 [31]  1.62972964  1.41665527  0.67420868  1.04180123 -1.66299360 -0.07285392
 [37]  0.91194687 -0.99002015 -0.63291572 -0.93662330 -1.24829781 -0.29478615
 [43] -0.29767044 -0.45514279 -1.21445611 -1.26231194 -0.93168176 -0.11437574
 [49] -0.81610602  1.26061293 -1.58854506 -0.19775388  1.05393276 -1.32163504
 [55]  0.21891237 -1.02467527 -1.30187194  0.89387052  1.39120332 -0.43543256
 [61]  0.58441742 -1.41657907 -0.40207459 -0.78659323  1.10907471 -0.17559117
 [67]  1.09301940  1.10117798  1.03785346 -0.20606415  0.89796636  0.45847125
 [73]  0.74255060 -1.74716662 -0.08155371 -0.97699906 -0.41664711  0.40075054
 [79] -0.51495972 -1.35940351 -0.89453948  0.59473476 -0.28390722  1.01628648
 [85] -1.38842427 -0.22339407  1.70668793  1.38420585  1.36111055 -1.13512882
 [91] -1.29076986  0.54226490 -0.54401788  0.55509389 -0.62522354 -1.09078258
 [97]  0.99557901 -1.42094993 -0.11151046  0.04542695

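If we look closely, we can spot an error in the code that computes z_dt: the standard deviation of dt was stored under the name sd_tomato, a leftover from copying and pasting that silently overwrites the tomato value. Such repeated steps can be done more easily and safely with a custom function. To create a custom function, we need to solidify our understanding of functions. In R, every function has three basic parts: a name, a body, and a set of arguments. To create a function in R, we use the function keyword followed by parentheses () and curly brackets {}.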

new_function <- function() {}

The names of the arguments are placed inside the parentheses of function(), and the body, that is, the expressions to run, is placed inside the curly brackets.

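z_score <- function(x) {
  # Mean and standard deviation of x, computed with na.rm = TRUE
  average_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  # Distance of each value from the mean, in units of standard deviation
  z_value <- (x - average_x) / sd_x
  return(z_value)
}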

Next, we use the custom function. Note that custom functions are called in the same way as R’s built-in functions: the object name followed by parentheses.

z_tomato_2 <- z_score(tomato_diameter)
z_tomato_2
 [1] -0.09402017  1.07517674 -0.72683973  1.64105924 -0.81007070  0.33842856
 [7] -0.65461816 -1.55474564 -0.19398540  0.97961526
z_tree_diameter_2 <- z_score(tree_diameter)
z_tree_diameter_2
  [1] -0.71304802 -0.35120270  1.60854170 -0.02179795  0.04259548  1.77983218
  [7]  0.40589817 -1.48492941 -0.85149566 -0.58726835  1.24195461  0.29513939
 [13]  0.34000892  0.02221347 -0.70797086  1.85854263  0.44636008 -2.25349176
 [19]  0.66930255 -0.61698896 -1.26885349 -0.33783464 -1.22304003 -0.89754917
 [25] -0.78377819 -1.94683206  0.81876439  0.06898128 -1.34588242  1.27452758
 [31]  0.36815564 -0.42229479  0.88157948  0.86296437  0.80101058  0.65537241
 [37]  0.50778230 -0.16686565 -0.43422620 -0.51585092 -0.86009994 -0.32681639
 [43] -1.48529653  2.27707482  1.22429519 -1.32941869 -0.54040552 -0.61026684
 [49]  0.75541982 -0.19037243  0.17847258 -0.13031397 -0.14600575  1.40027842
 [55] -0.34637532  1.56226982 -1.79571669  0.54141021  0.03664303  0.13752572
 [61]  0.31685861 -0.64934164 -0.46407310 -1.21490140 -1.27319995  0.23347834
 [67]  0.39197814 -0.04097396  0.91131364  2.14685001 -0.63697081 -2.62876100
 [73]  1.00275711 -0.87597805 -0.85276181  1.02448422 -0.41101270 -1.43635058
 [79]  0.09957931 -0.25119772 -0.09272595  0.32303830 -0.50510289  0.60688103
 [85] -0.34058618  0.26443017  1.10255872  0.37770550 -0.45610238  1.15949090
 [91]  0.98935390  0.50173432  0.16249260 -0.78691881  1.39156928 -0.75663177
 [97]  2.29720706  1.57995139 -0.35725306 -1.22349625
z_dt_2 <- z_score(dt)
z_dt_2
  [1] -0.74029816  1.01667000 -0.31432828  1.34899927  1.55058114 -1.58950882
  [7]  0.10367363  1.38198803  0.18553298 -0.14717528  1.60800687 -0.15868628
 [13]  0.62812145  0.25991452 -1.38821361  1.40797416 -0.88587877 -1.60177908
 [19] -0.59874073  1.59983236  1.37188355  0.68157065  0.49807079  1.73936495
 [25]  0.55140145  0.73675424  0.15967644  0.33538461 -0.73474643 -1.23316204
 [31]  1.62972964  1.41665527  0.67420868  1.04180123 -1.66299360 -0.07285392
 [37]  0.91194687 -0.99002015 -0.63291572 -0.93662330 -1.24829781 -0.29478615
 [43] -0.29767044 -0.45514279 -1.21445611 -1.26231194 -0.93168176 -0.11437574
 [49] -0.81610602  1.26061293 -1.58854506 -0.19775388  1.05393276 -1.32163504
 [55]  0.21891237 -1.02467527 -1.30187194  0.89387052  1.39120332 -0.43543256
 [61]  0.58441742 -1.41657907 -0.40207459 -0.78659323  1.10907471 -0.17559117
 [67]  1.09301940  1.10117798  1.03785346 -0.20606415  0.89796636  0.45847125
 [73]  0.74255060 -1.74716662 -0.08155371 -0.97699906 -0.41664711  0.40075054
 [79] -0.51495972 -1.35940351 -0.89453948  0.59473476 -0.28390722  1.01628648
 [85] -1.38842427 -0.22339407  1.70668793  1.38420585  1.36111055 -1.13512882
 [91] -1.29076986  0.54226490 -0.54401788  0.55509389 -0.62522354 -1.09078258
 [97]  0.99557901 -1.42094993 -0.11151046  0.04542695

Using a custom function not only makes our code more readable, it also prevents errors from copying and pasting, and if we need to change our formula, we only need to do it in one place: the function definition.
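
A custom function is also easy to check against an existing tool. Base R's scale() function, which is not covered in this chapter, centers a vector on its mean and divides by its standard deviation by default, so it should agree with z_score(). The sketch below repeats the function definition so that it runs on its own; the object names z_custom and z_builtin are ours.

```r
z_score <- function(x) {
  average_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  (x - average_x) / sd_x
}

tomato_diameter <- c(43.68, 70.23, 29.31, 83.08, 27.42, 53.50, 30.95, 10.51, 41.41, 68.06)

z_custom <- z_score(tomato_diameter)

# scale() returns a one-column matrix, so convert it to a plain vector
z_builtin <- as.numeric(scale(tomato_diameter))

all.equal(z_custom, z_builtin)
```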

4.2.2 When to Write a Custom Function

When do you need to write a function?

  • When you find yourself copying, pasting, and adapting blocks of code. As a rule of thumb, you should not copy, paste, and adapt the same code for new data or objects more than three times.
  • When the code is clunky and hard to read. Writing custom functions improves the readability of your code.
  • When you regularly submit reports to different people, stakeholders, and groups. Parameterizing the inputs of your report lets you produce multiple focused reports at once.
  • When you want to avoid duplicating code.

Getting Help

Use the help() function and ? to get help. Also, use args() to see the arguments of a function. Just to let you know, there are more than 2300 functions loaded when an R session starts, and that’s a lot. You will remember some and forget others, but as you keep using R they become a part of you.

You are not expected to remember all these functions. Use help(), ?, and args(), and check online resources when you need help.
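
As a quick sketch of inspecting a function's arguments, the code below looks at sd(). It also uses formals(), a base R function not mentioned above, to get the argument names as a vector; the object name sd_arguments is ours.

```r
# args() prints the argument list of a function
args(sd)

# formals() returns the arguments as a list; names() keeps just their names
sd_arguments <- names(formals(sd))
sd_arguments
```

formals() returns NULL for primitive functions such as c(), so args() is the more general of the two.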

4.3 Summary

Functions are small pieces of code that perform specific tasks. Armed with them, we can fly on eagles’ wings. Functions such as mean(), length(), plot(), and so on make it easier to perform tasks. However, this does not imply that there is a function for every task we want to accomplish. In such cases, we need to develop our own functions, called custom functions. There are also functions in other packages which extend the capabilities of R.

Exercise

  1. Using seq(), create a sequence of odd numbers between 1 and 100.
  2. Read the documentation of mean(), median(), sd(), and var(). What does the na.rm argument do?
  3. What is the difference between substring() and substr()?
  4. What is the difference between paste() and paste0()?