R is a free software environment for statistical computing and graphics. We can work in RStudio, the most popular R IDE, or run R directly in a terminal. Either way, R must first be downloaded and installed. To install it we can create a virtual environment with conda, build R from its source code, or use a Linux package manager (apt, yum, etc.). In this article we cover some fundamental R syntax: data structures and operators, control flow, functions, and an overview of R packages.
You may find more about plotting and programming in R at:
R operators include:
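Arithmetic: +, -, *, /, ^, and the special %...% arithmetic operators such as %/%, %% and %*%
Logical negation: !
Indexing: [, [[
Sequence: :
Component and slot access: $, @
Logical AND and OR: &, &&, |, ||
Matching: %in%
Assignment: =, <-, ->
Comparison: <, >, <=, >=, ==, !=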
For example:
a = 8
b = 3
n = 2
a %/% b    # integer division
## [1] 2
a %% b     # remainder
## [1] 2
a ^ n      # nth power
## [1] 64
a ^ (1/n)  # nth root; the parentheses matter because a ^ 1/n is (a^1)/n
## [1] 2.828427
A = matrix(c(1,2,3,4), ncol = 2)
A
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
A %*% A    # matrix multiplication
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22
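The example above covers the arithmetic operators only. The following is a minimal sketch of the comparison, logical, matching, and assignment operators from the list; the variables x, y and z are illustrative and do not come from the article.

```r
x <- c(1, 5, 10)       # illustrative values
y <- 5

x > y                  # element-wise comparison: FALSE FALSE TRUE
x >= 5 & x < 10        # element-wise AND:        FALSE TRUE FALSE
x < 5 | x > 5          # element-wise OR:         TRUE FALSE TRUE
!(x == y)              # negation:                TRUE FALSE TRUE
(y > 1) && (y < 10)    # && and || work on single TRUE/FALSE values: TRUE
y %in% x               # is y one of the elements of x? TRUE
20 -> z                # right assignment, equivalent to z <- 20
```

The single-character forms & and | are the ones to use when filtering vectors or data frame rows, while && and || are intended for conditions inside if statements.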
There are four major data structures in R:
vector (created with c)
matrix
data.frame
list
Vectors are created with the c command, which combines values into a vector. Vectors are subscriptable, mutable objects that can be concatenated. We access their elements with vector[index]. All elements of a vector share the same type.
matrix creates a matrix from the given set of values. Matrices are subscriptable, mutable objects, and we use matrix[row,col] to access rows and columns. All elements of a matrix share the same type. Matrices cannot be concatenated with c, which flattens them into a vector; rows and columns are added with rbind and cbind instead.
data.frame creates a data frame, which stores each column as a separate variable with its own observations (n obs. of m variables). When we read a csv file, the result is stored as a data frame. Data frames are subscriptable: data.frame[row,col] or data.frame[col] selects rows and columns, and data.frame$col_name selects a column by its name. Data frames can also be extended with new columns.
An R list is an object whose elements can have different types, such as strings, numbers, vectors, matrices, functions, and even other lists. Its components may also have different lengths. For example, if a loop does not produce the same number of results at each iteration, we can store the results in a list. Lists are subscriptable: list$name or list[[index]] extracts a component (list[index] returns a sub-list that contains it), and list$name[index_2] or list[[index]][index_2] extracts a member of that component. Lists can also be concatenated.
# Vectors
c1 = c(1:3,7)      # coerced to double because 7 is a double
typeof(c1)
## [1] "double"
str(c1)            # structure of c1
##  num [1:4] 1 2 3 7
c2 = c(1:3,'a',7)  # coerced to character
typeof(c2)
## [1] "character"
str(c2)
##  chr [1:5] "1" "2" "3" "a" "7"
letter = c('a','b','c','d')
letter[1]          # first element
## [1] "a"
letter[1:3]        # elements 1 to 3
## [1] "a" "b" "c"
letter[4] = 'z'    # mutable
letter
## [1] "a" "b" "c" "z"
c(letter, 'cat')   # concatenate
## [1] "a"   "b"   "c"   "z"   "cat"
append(letter, 'append')
## [1] "a"      "b"      "c"      "z"      "append"
# Matrices
mm = matrix(c(1:8), 2, 4)
mm
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
typeof(mm)
## [1] "integer"
str(mm)
##  int [1:2, 1:4] 1 2 3 4 5 6 7 8
mm[1,2]        # row 1, col 2
## [1] 3
mm[2,4] = 100  # mutable
mm
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6  100
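Calling c on a matrix drops its dimensions, which is why matrices are usually extended with rbind and cbind instead. A minimal sketch, using an illustrative matrix m that is not part of the article:

```r
m <- matrix(1:4, 2, 2)   # illustrative 2 x 2 matrix

c(m, 5)                  # c() drops the dimensions and returns a plain vector
rbind(m, c(9, 9))        # add a row
cbind(m, c(7, 8))        # add a column
dim(cbind(m, m))         # two matrices side by side: 2 rows, 4 columns
```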
# Data frames
df = data.frame(col1 = 1:3, col2 = letters[1:3], col3 = 31:33)
df
##   col1 col2 col3
## 1    1    a   31
## 2    2    b   32
## 3    3    c   33
typeof(df)
## [1] "list"
# In R >= 4.0 col2 is a character column; older versions show a Factor with 3 levels
str(df)
## 'data.frame':    3 obs. of  3 variables:
##  $ col1: int  1 2 3
##  $ col2: chr  "a" "b" "c"
##  $ col3: int  31 32 33
df$col1        # column col1
## [1] 1 2 3
df[,1]         # column 1
## [1] 1 2 3
df[,"col1"]    # column 1
## [1] 1 2 3
df[["col1"]]
## [1] 1 2 3
df[1,]         # row 1
##   col1 col2 col3
## 1    1    a   31
df[1,1]        # row 1 and col 1
## [1] 1
df[1,1] = 100  # mutable
df
##   col1 col2 col3
## 1  100    a   31
## 2    2    b   32
## 3    3    c   33
df[3,3] = 500  # change row 3 of col3, as the output below assumes
df$col4 = c(103,102,101)  # add a column
df
##   col1 col2 col3 col4
## 1  100    a   31  103
## 2    2    b   32  102
## 3    3    c  500  101
# Lists
ls = list(x = 11:15, y = 1:7)
typeof(ls)
## [1] "list"
str(ls)
## List of 2
##  $ x: int [1:5] 11 12 13 14 15
##  $ y: int [1:7] 1 2 3 4 5 6 7
ls$y[7]        # or ls[[2]][7]
## [1] 7
ls$y[8] = 80   # append to component y
ls$y
## [1]  1  2  3  4  5  6  7 80
empty_list = vector("list", 2)  # make an empty list
names(empty_list) = paste("list", 1:2, sep = "_")  # name its components
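The description of lists mentions storing loop results that differ in length from one iteration to the next. The following is a minimal sketch of that idea, which also shows the difference between single and double brackets; the names results and iter_1 to iter_3 are illustrative only.

```r
results <- vector("list", 3)        # pre-allocate three empty components
for (i in 1:3) {
  results[[i]] <- seq_len(i)        # iteration i produces i values
}
names(results) <- paste("iter", 1:3, sep = "_")

lengths(results)                    # number of values in each component
class(results[2])                   # single bracket returns a list
class(results[[2]])                 # double bracket extracts the vector itself
results$iter_3[2]                   # second member of the third component
```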
Note that indexing can remove entries as well as select them. For instance:
letter[-4]      # remove 4th element
## [1] "a" "b" "c"
mm[,c(-2,-3)]   # remove columns 2 and 3
##      [,1] [,2]
## [1,]    1    7
## [2,]    2  100
df = df[,-4]    # remove 4th col
df
##   col1 col2 col3
## 1  100    a   31
## 2    2    b   32
## 3    3    c  500
And since most R data structures are subscriptable, we can easily filter them as well. For example:
# Select rows where:
df[df$col1 < 100,]                 # col1 < 100
##   col1 col2 col3
## 2    2    b   32
## 3    3    c  500
df[df$col3 %in% c(31,32),]         # col3 is 31 or 32
##   col1 col2 col3
## 1  100    a   31
## 2    2    b   32
df[!df$col3 %in% c(31,32),]        # col3 is neither 31 nor 32
##   col1 col2 col3
## 3    3    c  500
df[df$col1 > 10 & df$col3 > 30, ]  # col1 > 10 and col3 > 30
##   col1 col2 col3
## 1  100    a   31
df[df$col1 > 10 | df$col3 > 40, ]  # col1 > 10 or col3 > 40
##   col1 col2 col3
## 1  100    a   31
## 3    3    c  500
# Order the rows by col1
df[order(df$col1),]
##   col1 col2 col3
## 2    2    b   32
## 3    3    c  500
## 1  100    a   31
# Find which elements of col3 are > 31
which(df$col3 > 31)
## [1] 2 3
# Find the proportion of col3 values > 31
length(which(df$col3 > 31))/nrow(df)
## [1] 0.6666667
# Recode col1 to 0/1: 0 below 100, 1 otherwise
df$col1[df$col1 < 100] = 0
df$col1[df$col1 >= 100] = 1
df
##   col1 col2 col3
## 1    1    a   31
## 2    0    b   32
## 3    0    c  500
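Base R also offers subset, mean on a logical vector, and ifelse as shorter ways to write some of the filters above. The following is a minimal sketch on a small illustrative data frame; scores and its columns are made up for this example.

```r
# Illustrative data, not taken from the article
scores <- data.frame(id = 1:4,
                     grp = c("a", "b", "a", "b"),
                     val = c(31, 32, 500, 40))

subset(scores, val > 31 & grp == "b")          # rows where both conditions hold
subset(scores, val > 31, select = c(id, val))  # filter rows and keep two columns
mean(scores$val > 31)                          # proportion of rows with val > 31
ifelse(scores$val >= 100, 1, 0)                # recode to 0/1 in a single step
```

Bracket indexing as used in the article remains the safer choice inside functions, since subset evaluates its conditions using the column names of the data frame.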
We can use the following commands to convert main R objects to other types:
as.numeric
as.integer
as.character
as.matrix
as.data.frame
as.list
as.Date
as.factor
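A minimal sketch of these conversion functions in use; all input values are illustrative.

```r
as.numeric("3.14")                     # character to number
as.integer(3.9)                        # truncates toward zero, giving 3
as.character(1:3)                      # numbers to strings "1" "2" "3"
as.Date("2020-01-31")                  # ISO-formatted string to a Date

f <- as.factor(c("lo", "hi", "lo"))
levels(f)                              # levels are sorted: "hi" "lo"
as.integer(f)                          # underlying integer codes: 2 1 2

m <- as.matrix(data.frame(u = 1:2, v = 3:4))
dim(m)                                 # 2 rows, 2 columns
as.data.frame(m)                       # matrix back to a data frame
as.list(1:3)                           # vector to a list of three elements

suppressWarnings(as.numeric("abc"))    # values that cannot be converted become NA
```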
Control statements let us control the flow of an R script. The most common control statements include:
if, else if, else
while
for
break
stopifnot
The following are some simple examples of using these statements in R.
n = 10
if (n == 7) {
  print("n is equal to 7")
} else if (n > 7) {
  print("n is greater than 7")
} else {
  print("n is smaller than 7")
}
## [1] "n is greater than 7"
n = 7
while (n < 10) {
  print(n)
  n = n + 1
}
## [1] 7
## [1] 8
## [1] 9
mysum = 0
for (i in c(10,20,30)) {
  mysum = mysum + i
}
print(mysum)
## [1] 60
mysum = 0
for (i in 1:100) {
  mysum = mysum + i
  if (mysum > 25) {
    break
  }
}
print(mysum)
## [1] 28
a = 1:2
b = 1:2
for (i in a) {
  stopifnot(all.equal(a,b))  # stop unless a and b are equal
  cat("'a' and 'b' both are equal to: ", i,"\n")
}
## 'a' and 'b' both are equal to:  1
## 'a' and 'b' both are equal to:  2
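Base R has two further control statements that the examples above do not use: next, which skips to the next iteration, and repeat, which loops until a break is reached. A minimal sketch with illustrative variable names:

```r
odd_sum <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next    # skip even numbers
  odd_sum <- odd_sum + i
}
odd_sum                    # 1 + 3 + 5 + 7 + 9 = 25

k <- 0
repeat {                   # runs until break is reached
  k <- k + 1
  if (k^2 > 50) break
}
k                          # smallest k with k^2 > 50, which is 8
```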
With the function command we can define our own functions in R. For instance, let's define the function Δ = b² − 4ac and evaluate it for a = 2, b = 3 and c = 4:
# Delta
delta = function(a, b, c) {
  b^2 - 4*a*c
}
delta(a = 2, b = 3, c = 4)
## [1] -23
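Since Δ is negative here, the quadratic 2x² + 3x + 4 = 0 has no real roots. The following is a minimal sketch of how the discriminant could be used to compute the roots anyway, using complex numbers; quad_roots is a hypothetical helper, not a function from the article or from base R.

```r
delta <- function(a, b, c) b^2 - 4*a*c

# Hypothetical helper: both roots of a*x^2 + b*x + c = 0
quad_roots <- function(a, b, c) {
  d <- sqrt(as.complex(delta(a, b, c)))   # complex square root handles a negative delta
  # c(...) below still calls the base function c, because R skips
  # non-function objects when it looks up the name of a function call
  c((-b + d) / (2*a), (-b - d) / (2*a))
}

roots <- quad_roots(2, 3, 4)
roots
2 * roots^2 + 3 * roots + 4               # numerically zero for both roots
```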
Some other examples:
# Norm
norm = function(x) sqrt(x %*% x)
norm(1:4)
##          [,1]
## [1,] 5.477226
# Square
square = function(x) return(x * x)
square(2)
## [1] 4
# Factorial
fact_iter = function(n) {
  p = 1
  for (i in seq_len(n)) {  # seq_len(0) is empty, so fact_iter(0) returns 1
    p = p * i  # iterative, not recursive
  }
  return(p)
}
fact_iter(8)
## [1] 40320
# Recursive function that computes n!
fact_rec = function(n) {
  if (n <= 1)  # base case also covers n = 0
    return(1)
  else
    return(n * fact_rec(n - 1))  # recursive call
}
fact_rec(8)
## [1] 40320
# Recursive function that computes a * b for a positive integer b
mult = function(a, b) {
  if (b == 1) {
    return(a)
  } else {
    return(a + mult(a, b-1))  # recursive call
  }
}
mult(6, 5)
## [1] 30
# Recursive function that computes a matrix power
matrix.power = function(p, n) {
  if (n == 1)
    return(p)
  else
    return(p %*% matrix.power(p, n-1))  # recursive call
}
matrix.power(matrix(c(4,2,2,4), 2, 2), 3)
##      [,1] [,2]
## [1,]  112  104
## [2,]  104  112
# Matrix symmetry test
sym = function(a) {
  if (is.matrix(a)) {
    if (identical(a, t(a))) {
      return("Matrix is symmetric")
    } else return("Matrix is not symmetric")
  } else return("Entry is not a matrix")
}
sym(matrix(c(4,2,2,4), 2, 2))
## [1] "Matrix is symmetric"
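Several of the hand-written functions above have counterparts in base R, which can be useful for checking results. A minimal sketch; the matrix P is the one from the examples above, and the Reduce call is one possible way to write a general matrix power, not the article's method.

```r
factorial(8)                        # built-in factorial
prod(1:8)                           # product of 1 to 8, the same value

P <- matrix(c(4, 2, 2, 4), 2, 2)
isSymmetric(P)                      # built-in symmetry test
P %*% P %*% P                       # third matrix power by repeated %*%
Reduce(`%*%`, rep(list(P), 3))      # the same power, written for a general exponent

sqrt(sum((1:4)^2))                  # Euclidean norm returned as a plain number
```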
In R we can use the read.* and write.* families of functions, such as read.table, write.table, read.csv and write.csv, to read and write the file types that we want.
gpa = data.frame(name = c("Ashki", "Ari", "Dori", "Pishi"), gpa = c(3.4,3.7,3.9,3.5))
# write
write.table(gpa, file = "~/Documents/gpa.txt", sep = " ", row.names = FALSE, col.names = TRUE)
# append a row
write.table(data.frame(name = "Ellie", gpa = 3.3), file = "~/Documents/gpa.txt", append = TRUE, sep = " ", row.names = FALSE, col.names = FALSE)
# read
read.table("~/Documents/gpa.txt", header = TRUE)
# csv
write.csv(gpa, file = "~/Documents/gpa.csv", row.names = FALSE)
read.csv("~/Documents/gpa.csv")  # header is TRUE by default
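The examples above write into ~/Documents. As a minimal sketch that runs anywhere, the same round trip can be done with a temporary file; gpa_demo, path and back are illustrative names, and the data is a two-row excerpt of the table above.

```r
gpa_demo <- data.frame(name = c("Ashki", "Ari"), gpa = c(3.4, 3.7),
                       stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")                  # temporary file path
write.csv(gpa_demo, file = path, row.names = FALSE)
back <- read.csv(path, stringsAsFactors = FALSE)    # read the file back in

str(back)
all.equal(gpa_demo, back)                           # TRUE if the round trip kept the data
unlink(path)                                        # delete the temporary file
```

Setting row.names = FALSE matters here: without it, write.csv adds a column of row names that read.csv would read back as an extra variable.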
Packages are a very important component of R. RStudio is a great IDE for R, and R itself ships with some basic libraries. Depending on your requirements, however, you may need to install and load other packages. We can use the install.packages("package name") and library("package name") functions to install and load packages in RStudio. Some of the packages that I use include:
rmarkdown, knitr, kableExtra
shiny
lattice, ggplot2
sf, maps, leaflet
R2OpenBUGS, RStan (need OpenBUGS and Stan)
reticulate
rjson
MASS
class
boot
glmnet
pls
splines
gam
gbm
tree, randomForest
e1071
lme4, nlme, MASS
profileR
plm, splm
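A common pattern is to install a package only when it is missing and then load it. The following is a minimal sketch of that pattern; load_or_install is a hypothetical helper written for this example, not a function provided by R or RStudio.

```r
# Hypothetical helper: install a package only when it is missing, then attach it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # needs internet access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)  # character.only lets us pass the name as a string
}

load_or_install("splines")             # splines ships with R, so nothing is downloaded
"package:splines" %in% search()        # TRUE once the package is attached
packageVersion("splines")
```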