[1] 1 2 3 4 5
unique()
The unique() function removes duplicated elements from a vector or duplicated rows from a data frame.
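A minimal sketch of unique() on a vector and on a data frame. The input values and the names x and df_dup are made up for illustration.

```r
# Hypothetical vector with repeated values
x = c(1, 2, 2, 3, 3, 3, 4, 5, 5)
unique_x = unique(x)      # keeps the first occurrence of each value
print(unique_x)

# On a data frame, unique() drops duplicated rows
df_dup = data.frame(id = c(1, 1, 2), label = c("a", "a", "b"))
unique_df = unique(df_dup)
print(unique_df)
```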
any() and all()
any() returns TRUE if any of the values are TRUE.
all() returns TRUE if all of the values are TRUE.
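As a small illustration, any() and all() are usually applied to the logical vector produced by a comparison. The vector x below is a hypothetical example.

```r
x = c(3, 8, 12, 5)        # hypothetical values

any_big = any(x > 10)     # TRUE: at least one value (12) exceeds 10
all_big = all(x > 10)     # FALSE: not every value exceeds 10
all_pos = all(x > 0)      # TRUE: every value is positive

print(c(any_big, all_big, all_pos))
```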
ifelse()
The ifelse() function tests a condition on each element of a vector and returns one value where the condition is TRUE and another value where it is FALSE.
Example:
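A minimal sketch of ifelse(test, yes, no), assuming a hypothetical vector of scores and a pass mark of 60 chosen only for illustration:

```r
scores = c(45, 72, 58, 90)    # hypothetical scores

# Element-wise: "pass" where the condition holds, "fail" otherwise
result = ifelse(scores >= 60, "pass", "fail")
print(result)
```

The result has the same length as the condition, one label per score.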
cbind() and rbind()
cbind() combines vectors, matrices, or data frames by columns.
rbind() combines vectors, matrices, or data frames by rows.
Set Theory
union()
intersect()
setdiff()
setequal()
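The four set functions listed above can be sketched on two small vectors. The vectors a and b are hypothetical and chosen so that they overlap in two elements.

```r
a = c(1, 2, 3, 4, 5)
b = c(4, 5, 6, 7)

u = union(a, b)                           # elements in a or b
i = intersect(a, b)                       # elements in both a and b
d = setdiff(a, b)                         # elements in a but not in b
same = setequal(a, b)                     # do a and b contain the same elements?
same_any_order = setequal(c(1, 2, 3), c(3, 2, 1))

print(u); print(i); print(d); print(same); print(same_any_order)
```

Note that setdiff() is not symmetric: setdiff(b, a) would return the elements of b that are missing from a.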
Simple Random Sample (SRS): a subset of a population chosen in such a way that every possible sample of a given size has an equal chance of being selected. As a result, each individual or item in the population has an equal probability of being included in the sample, and the selection is made entirely by chance, without bias.
Random Sampling
Sampling With Replacement (SWR): In this method, after an individual or item is selected for the sample, it is placed back into the population before the next selection is made, allowing for the possibility of being chosen more than once. This method is particularly useful when dealing with small population sizes or when it’s important to maintain the same population size for each draw.
Sampling Without Replacement (SWOR): Contrary to SWR, in Sampling Without Replacement, once an individual or item is selected, it is not placed back into the population, and hence, cannot be selected again. This method is often utilized when the population size is large, or when maintaining the same population size for each draw is not crucial.
sample() function
The sample() function draws random samples from a vector.
Syntax: sample(x, size, replace = FALSE, prob = NULL)
[1] 3 9 7 10 5
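A minimal sketch of both sampling schemes with sample(). The seed and the population 1:10 are arbitrary choices for illustration, so the drawn values will not necessarily match the output shown above.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
population = 1:10

# Sampling without replacement (the default): no value can repeat
swor = sample(population, size = 5)

# Sampling with replacement: the same value may be drawn more than once
swr = sample(population, size = 5, replace = TRUE)

# Without replacement, the sample cannot be larger than the population
too_many = tryCatch(
  sample(population, size = 15),
  error = function(e) "not possible without replacement"
)

print(swor); print(swr); print(too_many)
```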
Write an R function coin_flip() that simulates flipping a coin. The function should return H (for head) or T (for tail).
Now, extend your function to perform multiple simulations of coin flips and return the number of heads and tails.
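coin_flip = function(n) {
  # Draw n flips with replacement so that each flip is independent
  flips = sample(c("H", "T"), size = n, replace = TRUE)
  # Count how many heads and tails were observed
  return(table(flips))
}
coin_flip(5)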
H | T |
---|---|
3 | 2 |
Analyze the results of your multiple simulations. What do you observe as the number of flips increases?
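One possible way to explore this question is to track the proportion of heads for increasing numbers of flips. The helper prop_heads(), the seed, and the chosen values of n are assumptions made for this sketch and are not part of the exercise.

```r
set.seed(42)              # arbitrary seed, only for reproducibility

# Hypothetical helper: proportion of heads in n simulated flips
prop_heads = function(n) {
  flips = sample(c("H", "T"), size = n, replace = TRUE)
  mean(flips == "H")
}

n_values = c(10, 100, 1000, 10000, 100000)
props = sapply(n_values, prop_heads)

print(data.frame(n = n_values, prop_heads = props))
```

With a fair coin, the proportion of heads should settle closer to 0.5 as n grows, while small samples can deviate noticeably.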
The apply functions in R provide a concise and efficient way to apply a function to the elements of data structures such as vectors, lists, data frames, or matrices.
Function | Description | Usage | Example |
---|---|---|---|
apply() | Applies a function over the margins of an array or matrix. | apply(X, MARGIN, FUN, ...) | apply(matrix(1:9, nrow = 3), 1, sum) |
lapply() | Applies a function to each element of a list, returning a list. | lapply(X, FUN, ...) | lapply(list(1:5, 6:10), sum) |
sapply() | Similar to lapply(), but tries to simplify the result. | sapply(X, FUN, ..., simplify = TRUE) | sapply(list(1:5, 6:10), sum) |
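The three examples from the table can be run side by side to compare what each function returns. The variable names are chosen here for illustration.

```r
m = matrix(1:9, nrow = 3)     # filled column by column

# apply(): MARGIN = 1 applies sum() to each row
row_sums = apply(m, 1, sum)

# lapply(): always returns a list, one element per input element
list_sums = lapply(list(1:5, 6:10), sum)

# sapply(): same computation, simplified to a vector when possible
vec_sums = sapply(list(1:5, 6:10), sum)

print(row_sums); print(list_sums); print(vec_sums)
```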
list
$a
[1] 1 2 3 4 5
$b
[1] 3 4 5 6 7
$c
[1] 10 11 12 13 14
mean() using lapply()
sum() using sapply()
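A minimal sketch of both tasks, rebuilding the list printed above (components a, b, and c). The name my_list is an assumption, since the original object name is not shown.

```r
# List with the same contents as the output shown above
my_list = list(a = 1:5, b = 3:7, c = 10:14)

# mean() of each component, returned as a list
means = lapply(my_list, mean)

# sum() of each component, simplified to a named vector
sums = sapply(my_list, sum)

print(means)
print(sums)
```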
sweep() Function
The sweep() function in R allows you to perform operations on arrays by “sweeping” out values of a summary statistic across margins.
sweep(x, MARGIN, STATS, FUN = "-", ...) takes the following arguments:
- x: the array to sweep out statistics from.
- MARGIN: the margin to apply the sweep on.
- STATS: the summary statistic to be used.
- FUN: the function to apply.
The \(Z\)-score of an observation is a metric that indicates how many standard deviations the observation is from the mean of the whole set.
\(z = \frac{x - \mu}{\sigma}\)
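where \(x\) is the observation, \(\mu\) is the mean, and \(\sigma\) is the standard deviation of the set.
Note: The \(Z\)-score is unitless, i.e., it has no units of measurement.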
Example: centering and scaling the columns of a \(10 \times 10\) matrix with sweep(). The original matrix:
0.7720323 | -0.0236110 | 0.0700578 | 0.5499061 | -0.8822566 | -0.5208829 | 1.2807723 | -2.2090599 | 1.3172705 | -0.7553829 |
1.1352704 | 0.9457801 | 1.0327003 | 0.9907401 | 0.5247179 | -0.0071550 | -1.0176867 | 2.6768273 | -0.2675364 | -0.8119132 |
-0.3556700 | -1.3081242 | -0.2164506 | 0.1440630 | 0.9130161 | -0.0147323 | 2.1405846 | -1.1328485 | -0.1673459 | -0.6183830 |
0.3695155 | 3.1066999 | -2.7725347 | -0.7530670 | 0.1043633 | 0.6799525 | -0.8807629 | 1.8317829 | -0.4001457 | -1.4748364 |
1.5371188 | -1.4549000 | -0.9155727 | -0.8977446 | -1.1268794 | -1.8834163 | 0.2358407 | -1.1359993 | 0.9284241 | -0.5760723 |
0.4551764 | 0.9378018 | 0.8704419 | 1.2940218 | -0.0578897 | 0.4757888 | -0.6402558 | -0.2850556 | 2.0390161 | 0.4287817 |
2.0869911 | 1.4019592 | 0.8649523 | 0.0084196 | 0.2193383 | -1.8027268 | 2.0185865 | -0.7242505 | -0.5220771 | -1.5277922 |
-2.0259216 | -0.6943732 | -0.7830958 | -0.2226866 | 0.1012604 | -1.6709143 | 0.0444366 | 0.2291196 | -0.4010404 | -0.3937708 |
-0.3312715 | 0.4724834 | -0.6503196 | -0.9633905 | 1.1221714 | -0.6869030 | 0.0825795 | 1.6361345 | -0.1416749 | 0.3748363 |
1.0570444 | 1.5559028 | 0.3398075 | -0.2588452 | -0.0381289 | -0.5262985 | 1.0498150 | -0.3115017 | -0.1811098 | -0.0527232 |
Column means:
[1] 0.47002857 0.49396187 -0.21600136 -0.01085833 0.08797128 -0.59572878
[7] 0.43139097 0.05751487 0.22037806 -0.54072559
Matrix after subtracting the column means:
0.3020037 | -0.5175729 | 0.2860591 | 0.5607645 | -0.9702279 | 0.0748459 | 0.8493813 | -2.2665748 | 1.0968924 | -0.2146573 |
0.6652418 | 0.4518182 | 1.2487017 | 1.0015984 | 0.4367466 | 0.5885738 | -1.4490777 | 2.6193124 | -0.4879144 | -0.2711876 |
-0.8256986 | -1.8020861 | -0.0004493 | 0.1549214 | 0.8250448 | 0.5809965 | 1.7091936 | -1.1903634 | -0.3877239 | -0.0776574 |
-0.1005131 | 2.6127380 | -2.5565334 | -0.7422087 | 0.0163920 | 1.2756813 | -1.3121539 | 1.7742681 | -0.6205237 | -0.9341108 |
1.0670902 | -1.9488618 | -0.6995713 | -0.8868863 | -1.2148507 | -1.2876875 | -0.1955503 | -1.1935142 | 0.7080460 | -0.0353468 |
-0.0148522 | 0.4438399 | 1.0864433 | 1.3048802 | -0.1458610 | 1.0715176 | -1.0716467 | -0.3425705 | 1.8186380 | 0.9695073 |
1.6169625 | 0.9079973 | 1.0809537 | 0.0192779 | 0.1313670 | -1.2069980 | 1.5871955 | -0.7817654 | -0.7424552 | -0.9870666 |
-2.4959502 | -1.1883350 | -0.5670945 | -0.2118283 | 0.0132891 | -1.0751855 | -0.3869544 | 0.1716047 | -0.6214184 | 0.1469548 |
-0.8013000 | -0.0214784 | -0.4343183 | -0.9525322 | 1.0342001 | -0.0911742 | -0.3488115 | 1.5786196 | -0.3620529 | 0.9155619 |
0.5870158 | 1.0619410 | 0.5558089 | -0.2479869 | -0.1261002 | 0.0694302 | 0.6184240 | -0.3690166 | -0.4014878 | 0.4880023 |
Matrix after dividing each centered column by its standard deviation:
0.2594376 | -0.3669410 | 0.2494626 | 0.7242210 | -1.3855529 | 0.0806603 | 0.7410379 | -1.4707892 | 1.2424232 | -0.3215159 |
0.5714790 | 0.3203233 | 1.0889509 | 1.2935530 | 0.6237045 | 0.6342973 | -1.2642395 | 1.6996821 | -0.5526487 | -0.4061875 |
-0.7093201 | -1.2776158 | -0.0003918 | 0.2000792 | 1.1782214 | 0.6261313 | 1.4911762 | -0.7724314 | -0.4391654 | -0.1163160 |
-0.0863462 | 1.8523396 | -2.2294670 | -0.9585541 | 0.0234089 | 1.3747829 | -1.1447812 | 1.1513295 | -0.7028520 | -1.3991208 |
0.9166887 | -1.3816747 | -0.6100727 | -1.1454036 | -1.7348913 | -1.3877218 | -0.1706067 | -0.7744760 | 0.8019864 | -0.0529427 |
-0.0127588 | 0.3146669 | 0.9474508 | 1.6852379 | -0.2082996 | 1.1547587 | -0.9349521 | -0.2222953 | 2.0599267 | 1.4521380 |
1.3890589 | 0.6437382 | 0.9426635 | 0.0248972 | 0.1876013 | -1.3007640 | 1.3847397 | -0.5072906 | -0.8409608 | -1.4784385 |
-2.1441573 | -0.8424878 | -0.4945441 | -0.2735738 | 0.0189778 | -1.1587116 | -0.3375962 | 0.1113550 | -0.7038654 | 0.2201104 |
-0.6883604 | -0.0152275 | -0.3787544 | -1.2301845 | 1.4769097 | -0.0982571 | -0.3043186 | 1.0243724 | -0.4100885 | 1.3713381 |
0.5042786 | 0.7528789 | 0.4847023 | -0.3202722 | -0.1800798 | 0.0748239 | 0.5395405 | -0.2394563 | -0.4547554 | 0.7309349 |
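The numbers above are consistent with a two-step column standardization: subtract each column mean, then divide by each column standard deviation. The sketch below reproduces that procedure on a freshly generated matrix. The seed and the variable names are assumptions, so the values will differ from those printed above.

```r
set.seed(1)                                  # arbitrary seed
m = matrix(rnorm(100), nrow = 10)            # 10 x 10 matrix of random normals

# Step 1: compute the column summaries (MARGIN = 2 means columns)
col_means = apply(m, 2, mean)
col_sds = apply(m, 2, sd)

# Step 2: sweep the means out of each column (centering)
centered = sweep(m, 2, col_means, FUN = "-")

# Step 3: sweep the standard deviations out of each column (scaling)
z_scores = sweep(centered, 2, col_sds, FUN = "/")

print(round(colMeans(z_scores), 10))         # approximately 0 in every column
print(apply(z_scores, 2, sd))                # 1 in every column
```

The same result can be cross-checked against the built-in scale() function, which centers and scales columns in one call.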
sweep()
Normalize a data frame df (with columns X1, X2, X3, each containing \(10\) random integers between \(1\) and \(100\)) by subtracting the median and dividing by the interquartile range of each column.
set.seed(123)
df = data.frame(X1 = sample(1:100, 10),
                X2 = sample(1:100, 10),
                X3 = sample(1:100, 10))
df
X1 | X2 | X3 |
---|---|---|
31 | 90 | 7 |
79 | 91 | 42 |
51 | 69 | 9 |
14 | 99 | 83 |
67 | 57 | 36 |
42 | 92 | 78 |
50 | 9 | 81 |
43 | 93 | 43 |
97 | 72 | 76 |
25 | 26 | 15 |
X1 | X2 | X3 |
---|---|---|
-0.5299145 | 0.2834646 | -0.6200873 |
1.1111111 | 0.3149606 | -0.0087336 |
0.1538462 | -0.3779528 | -0.5851528 |
-1.1111111 | 0.5669291 | 0.7074236 |
0.7008547 | -0.7559055 | -0.1135371 |
-0.1538462 | 0.3464567 | 0.6200873 |
0.1196581 | -2.2677165 | 0.6724891 |
-0.1196581 | 0.3779528 | 0.0087336 |
1.7264957 | -0.2834646 | 0.5851528 |
-0.7350427 | -1.7322835 | -0.4803493 |
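One way to obtain the normalized values shown above is to compute the column medians and interquartile ranges with apply() and then sweep them out. This is a sketch, not necessarily the original solution. To avoid depending on the random number generator, the data frame is rebuilt here from the values printed in the first table, and the names medians, iqrs, and normalized are assumptions.

```r
# Data frame rebuilt from the values printed above
df = data.frame(
  X1 = c(31, 79, 51, 14, 67, 42, 50, 43, 97, 25),
  X2 = c(90, 91, 69, 99, 57, 92, 9, 93, 72, 26),
  X3 = c(7, 42, 9, 83, 36, 78, 81, 43, 76, 15)
)

m = as.matrix(df)                 # sweep() is applied to a numeric matrix here

medians = apply(m, 2, median)     # one median per column
iqrs = apply(m, 2, IQR)           # one interquartile range per column

# Subtract the column medians, then divide by the column IQRs
centered = sweep(m, 2, medians, FUN = "-")
normalized = as.data.frame(sweep(centered, 2, iqrs, FUN = "/"))

print(normalized)
```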
Generate a dataset that simulates the heights (in centimeters) of 1000 individuals. Assume an average height of 170 cm and a standard deviation of 10 cm. Follow these steps:
- Generate the heights using the rnorm() function.
[1] 167.8413 166.6509 159.1430 169.1458 180.7061 168.5461
- Compute the \(Z\)-score of each height.
[1] -0.17243735 -0.29236253 -1.04868498 -0.04103338 1.12352467 -0.10144587
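A minimal sketch of this exercise, assuming the \(Z\)-scores are computed from the sample mean and sample standard deviation of the simulated heights. The seed is arbitrary, so the values will differ from the output shown above.

```r
set.seed(2024)                                # arbitrary seed

# Simulate 1000 heights with mean 170 cm and standard deviation 10 cm
heights = rnorm(1000, mean = 170, sd = 10)
print(head(heights))

# Z-score: distance from the mean in units of standard deviations
z = (heights - mean(heights)) / sd(heights)
print(head(z))
```

By construction, the resulting z has mean 0 and standard deviation 1, up to floating-point error.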
apply() and sweep()
Feature | apply() | sweep() |
---|---|---|
Purpose | Apply a function over the margins of an array or matrix to summarize or transform it. | Apply arithmetic operations to an array, “sweeping” out array summaries. |
Usage | Used for summarizing data with a function over specified margins (rows or columns). | Used for adjusting data using a summary statistic for operations like centering or scaling. |
Functionality | Used to apply a wide range of functions for summarizing or transforming data across dimensions. | Used to perform arithmetic operations using a summary statistic and is often used after summarizing data with apply(). |
Arguments | apply(X, MARGIN, FUN, ...) where X is the array, MARGIN specifies rows (1) or columns (2), and FUN is the function to be applied. | sweep(x, MARGIN, STATS, FUN = "-", ...) where x is the array, MARGIN specifies the dimension, STATS is the summary statistic, and FUN is the arithmetic function to be applied. |
Return Value | Returns an array, matrix, or list with the results of the function application, which may be of a different dimension from the input. | Returns an adjusted array with the same dimensions as the input, with element-wise arithmetic operations performed. |
Exclusive Actions | Can return different structures (vector, array, list) based on the function and margin. Can work with higher-dimensional arrays beyond matrices. | Directly performs arithmetic sweep operations using a summary statistic. Ideal for data adjustments after using apply() to calculate the summary statistic. |
Limitations | Cannot directly adjust data using a summary statistic; additional steps are required to integrate the summary before or after using apply(). | Not designed for summarizing data; it requires pre-calculated statistics to perform the sweep operation. |
Flexibility | Can use any function, including user-defined ones, for summarization or transformation. More general-purpose in data manipulation. | Limited to arithmetic sweep operations; custom functions must conform to the expected input and output format of sweep(). |
Common Use Case | Computing aggregate statistics like means, sums, etc., across rows or columns. General data manipulation tasks requiring the application of a function. | Standardizing or normalizing data. Centering data by subtracting the mean or dividing by a standard deviation after calculating these with apply(). |
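The difference in return shape described in the table can be seen on a small matrix. This is an illustrative sketch with a hypothetical \(2 \times 3\) matrix: apply() reduces each column to one number, while sweep() keeps the original dimensions.

```r
m = matrix(1:6, nrow = 2)                   # 2 x 3 matrix, filled by column

# apply(): summarize each column, giving one value per column
col_sums = apply(m, 2, sum)

# sweep(): adjust every element using the pre-computed column sums
col_props = sweep(m, 2, col_sums, FUN = "/")

print(col_sums)                             # vector of length 3
print(dim(col_props))                       # same dimensions as m
print(colSums(col_props))                   # each column now sums to 1
```

Here apply() produces the summary statistic and sweep() consumes it, which is the typical division of labor between the two functions.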