Part 1
Programming: Writing instructions for a computer to perform specific tasks.
R Language: A language and environment designed for statistical computing and graphics.
R provides a wide array of statistical tests, models, and analyses.
Packages such as ggplot2 allow for sophisticated data visualizations.
The CRAN repository offers packages for various applications.
R can process both structured and unstructured data.
Notable Companies Using R: Google, Facebook, Airbnb, Uber, and many more use R for data analysis.
Definition: A variable is a named storage area in a program used to hold and manipulate data.
Importance: Allows for data storage, retrieval, and manipulation.
Analogy: Think of variables as labeled storage boxes.
Assignment: Storing a value inside a variable.
Use the print() function to display a variable's value.
Data types: Classifications of data based on its nature.
[1] 5
[1] "Hello"
[1] TRUE
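The code that produced the three outputs above is not shown, so the following is only a minimal sketch of assignment, print(), and the basic data types. The variable names (x, greeting, is_ready) are illustrative assumptions.

```r
# Assign values of three basic types (names are illustrative assumptions)
x <- 5                # numeric
greeting <- "Hello"   # character
is_ready <- TRUE      # logical

# print() displays the value stored in each variable
print(x)          # [1] 5
print(greeting)   # [1] "Hello"
print(is_ready)   # [1] TRUE

# class() reports the data type of each value
class(x)          # "numeric"
class(greeting)   # "character"
class(is_ready)   # "logical"
```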
[1] 90 85 88
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
  Name Age
1 Anna  23
2  Bob  25
$Name
[1] "John"
$Scores
[1] 90 85 88
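The outputs above correspond to a vector, a matrix, a data frame, and a list. One way to build structures that print like this is sketched below; the object names (scores, m, people, student) are assumptions, since the original code is not shown.

```r
# Vector: an ordered collection of values of one type
scores <- c(90, 85, 88)

# Matrix: two-dimensional, filled column by column by default
m <- matrix(1:6, nrow = 2)

# Data frame: a table whose columns can have different types
people <- data.frame(Name = c("Anna", "Bob"), Age = c(23, 25))

# List: named elements that may differ in type and length
student <- list(Name = "John", Scores = c(90, 85, 88))

print(scores)
print(m)
print(people)
print(student)
```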
Methods to extract specific data or subsets from data structures.
[1] 1 3 5
[1] 3 4
[1] 3
  Name Age
1 Anna  23
[1] 23 25
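The indexing calls behind these outputs are not shown. A sketch that yields the same results, assuming a 2 x 3 matrix m and a data frame people like the ones printed earlier (both names are assumptions), could look like this:

```r
# Example objects (names and construction are assumptions)
m <- matrix(1:6, nrow = 2)
people <- data.frame(Name = c("Anna", "Bob"), Age = c(23, 25))

row1   <- m[1, ]       # whole first row
col2   <- m[, 2]       # whole second column
cell   <- m[1, 2]      # single element: row 1, column 2
first  <- people[1, ]  # first row of the data frame
ages   <- people$Age   # one column, selected by name with $

print(row1)    # [1] 1 3 5
print(col2)    # [1] 3 4
print(cell)    # [1] 3
print(first)
print(ages)    # [1] 23 25
```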
$ for Data Frames: To access specific columns by name.
class(): Returns the type (class) of a value.
as.numeric(), as.character(), as.logical(), etc.: Convert a value to another type.
[1] 5
[1] "numeric"
[1] "5"
[1] "character"
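The four outputs above are consistent with checking a value's type, converting it, and checking again. A minimal sketch, with x and x_chr as assumed variable names:

```r
x <- 5
print(x)          # [1] 5
class(x)          # [1] "numeric"

# Convert the number to text; the value now prints with quotes
x_chr <- as.character(x)
print(x_chr)      # [1] "5"
class(x_chr)      # [1] "character"

# Conversions in the other direction
num  <- as.numeric("3.14")   # 3.14
flag <- as.logical("TRUE")   # TRUE
```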
NA: Missing data
Inf: Infinity, e.g., 10/0
NaN: Result of invalid operations, e.g., 0/0
NULL: Absence of a value
Data type for categorical data
[1] male female male
Levels: female male
[1] "female" "male"
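Question: What will be the output of as.numeric(gender)?
Answer: 2, 1, 2. The levels are sorted alphabetically (female = 1, male = 2), and as.numeric() returns the level number of each element.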
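The factor example can be reproduced with a short sketch. The name gender comes from the question above; how the factor was originally created is not shown, so the factor() call here is an assumption.

```r
# Build a factor from character values (construction is an assumption)
gender <- factor(c("male", "female", "male"))

print(gender)
levels(gender)                # [1] "female" "male"

# Each element is stored as the index of its level
codes <- as.numeric(gender)   # 2 1 2

# To get the labels back instead of the codes, convert to character
labels <- as.character(gender)
```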
+: Addition
-: Subtraction
*: Multiplication
/: Division
^: Exponentiation (raising to a power)
%%: Modulus (remainder after division)
PEDMAS Rule:
P: Parentheses - Always start with operations inside parentheses or brackets.
E: Exponents - Next, handle powers and square root operations.
MD: Multiplication and Division - Process them as they appear from left to right.
AS: Addition and Subtraction - Handle them last, moving from left to right.
Q: 3 + 5 * 2
A: 13
Q: (3 + 5) * 2
A: 16
Q: 2 ^ 2 * 3
A: 12
Tip: Always use parentheses for clarity, even if not strictly needed.
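The operators and the three questions above can be checked directly in R. This is a small sketch; the operands 7 and 2 and the names q1 to q3 are arbitrary choices for illustration.

```r
# Arithmetic operators
7 + 2    # 9
7 - 2    # 5
7 * 2    # 14
7 / 2    # 3.5
7 ^ 2    # 49
7 %% 2   # 1 (remainder of 7 divided by 2)

# Order of operations, matching the questions above
q1 <- 3 + 5 * 2     # multiplication first: 13
q2 <- (3 + 5) * 2   # parentheses first: 16
q3 <- 2 ^ 2 * 3     # exponent first: 12

# A case where parentheses change the result:
# ^ binds tighter than a leading minus sign
neg <- -2 ^ 2       # -4
pos <- (-2) ^ 2     # 4
```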
Operations that return TRUE or FALSE based on certain conditions:
==: Equal to
!=: Not equal to
<: Less than
>: Greater than
<=: Less than or equal to
>=: Greater than or equal to
&: Logical AND
|: Logical OR
!: Logical NOT
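A brief sketch of the comparison and logical operators listed above, using assumed example values x, y, and scores:

```r
x <- 5
y <- 8

x == y    # FALSE
x != y    # TRUE
x < y     # TRUE
x >= y    # FALSE

# Combine conditions with AND, OR, NOT
both   <- (x < y) & (y < 10)   # TRUE: both conditions hold
either <- (x > y) | (y == 8)   # TRUE: the second condition holds
negate <- !(x == y)            # TRUE: x and y are not equal

# Comparisons work element by element on vectors
scores <- c(90, 85, 88)
above <- scores > 86           # TRUE FALSE TRUE
```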
sort(): Returns the elements arranged in ascending or descending order.
order(): Returns the indices that would arrange the data into ascending or descending order.
rank(): Provides the rank of each element when the data is sorted. In case of ties, it assigns the average rank by default.
Briefly, sort() directly arranges the data, order() provides indices for the arranged data, and rank() gives the position of each data point in the sorted order.
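The difference between the three functions is easiest to see on one vector. The sketch below uses an assumed scores vector containing a tie, so that the average-rank behaviour shows up.

```r
scores <- c(90, 85, 88, 85)

sorted     <- sort(scores)                     # 85 85 88 90
sorted_dec <- sort(scores, decreasing = TRUE)  # 90 88 85 85

# order() gives positions in the original vector, smallest value first
idx <- order(scores)                           # 2 4 3 1

# rank() gives each element's position in the sorted order;
# the two 85s tie for ranks 1 and 2, so each gets 1.5
rnk <- rank(scores)                            # 4.0 1.5 3.0 1.5

# Indexing with order() reproduces sort()
same <- scores[idx]                            # 85 85 88 90
```

Indexing with order() is mainly useful for reordering one object by another, for example sorting the rows of a data frame by one of its columns.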