As part of this series on learning data analysis with R, let's look at different ways to compute descriptive statistics in R. These include measures of central tendency, variability, and distribution shape for continuous variables.
For this tutorial, we shall use the built-in dataset mtcars. This dataset consists of 32 observations and 11 variables. We shall use only three of these variables (mpg, hp, and wt) to calculate descriptive statistics.
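Before selecting columns, it can help to confirm the shape of the dataset. The following is a small sketch using only base R functions; nothing here is specific to this tutorial beyond the three column names.

```r
# Load the built-in dataset and inspect its size and structure.
data("mtcars")

dim(mtcars)    # rows (observations) and columns (variables)
names(mtcars)  # names of all variables

# Structure of the three variables used in this tutorial.
str(mtcars[c("mpg", "hp", "wt")])
```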
Refine the data
> data("mtcars")
> data<-c("mpg","hp","wt")
> head(mtcars[data])
                   mpg  hp    wt
Mazda RX4         21.0 110 2.620
Mazda RX4 Wag     21.0 110 2.875
Datsun 710        22.8  93 2.320
Hornet 4 Drive    21.4 110 3.215
Hornet Sportabout 18.7 175 3.440
Valiant           18.1 105 3.460
Base R provides the summary() function, which can be used to obtain descriptive statistics.
Example for descriptive statistics
> summary(mtcars[data])
      mpg              hp              wt
 Min.   :10.40   Min.   : 52.0   Min.   :1.513
 1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581
 Median :19.20   Median :123.0   Median :3.325
 Mean   :20.09   Mean   :146.7   Mean   :3.217
 3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610
 Max.   :33.90   Max.   :335.0   Max.   :5.424
The summary() function provides the minimum, maximum, quartiles (including the median), and mean for numeric variables, and frequencies for factors and logical vectors. The results above do not include the standard deviation, variance, skewness, or kurtosis. What if you need these statistics? For this you may use the stat.desc() function in the pastecs package, which reports the variance and standard deviation by default and adds skewness and kurtosis when called with norm = TRUE.
> install.packages("pastecs")
> library(pastecs)
> stat.desc(mtcars[data])
                     mpg           hp          wt
nbr.val       32.0000000   32.0000000  32.0000000
nbr.null       0.0000000    0.0000000   0.0000000
nbr.na         0.0000000    0.0000000   0.0000000
min           10.4000000   52.0000000   1.5130000
max           33.9000000  335.0000000   5.4240000
range         23.5000000  283.0000000   3.9110000
sum          642.9000000 4694.0000000 102.9520000
median        19.2000000  123.0000000   3.3250000
mean          20.0906250  146.6875000   3.2172500
SE.mean        1.0654240   12.1203173   0.1729685
CI.mean.0.95   2.1729465   24.7195501   0.3527715
var           36.3241028 4700.8669355   0.9573790
std.dev        6.0269481   68.5628685   0.9784574
coef.var       0.2999881    0.4674077   0.3041285
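If installing a package is not an option, the variability and shape measures can also be computed with base R alone. The sketch below is one way to do it, not the method used by pastecs. The helper names skewness and excess_kurtosis are hypothetical, and they use the simple moment-based definitions, which is an assumption: packages often apply bias corrections, so their skewness and kurtosis values may differ slightly.

```r
# Sketch: variance, standard deviation, skewness and kurtosis with base R only.
vars <- c("mpg", "hp", "wt")

# Moment-based skewness (assumed definition, no bias correction).
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis (assumed definition; 0 for a normal distribution).
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Apply every statistic to each selected column. sapply() returns one column
# per variable, so t() flips it to one row per variable.
shape_stats <- t(sapply(mtcars[vars], function(x) {
  c(var      = var(x),
    sd       = sd(x),
    skewness = skewness(x),
    kurtosis = excess_kurtosis(x))
}))

print(round(shape_stats, 3))
```

The var and sd columns should agree with the var and std.dev rows of the stat.desc() output, since var() and sd() in base R use the usual sample (n - 1) formulas.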
Many other packages are available; for example, you may try the describe() function in the psych package. Let me know which function or package you prefer for calculating descriptive statistics.
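As a starting point for trying psych, here is a minimal sketch of describe() on the same three columns. It assumes the psych package is already installed (install.packages("psych")). The names vars and desc are illustrative, and the selection of columns printed is just one possible choice.

```r
# Sketch: descriptive statistics with psych::describe().
vars <- c("mpg", "hp", "wt")

if (requireNamespace("psych", quietly = TRUE)) {
  # describe() returns one row per variable and one column per statistic.
  desc <- psych::describe(mtcars[vars])

  # Convert to a plain data frame and keep a few of the reported statistics.
  keep <- c("n", "mean", "sd", "median", "min", "max", "skew", "kurtosis")
  print(as.data.frame(desc)[, keep])
} else {
  message("psych is not installed; run install.packages(\"psych\") first")
}
```

Unlike summary(), this reports the standard deviation, skewness, and kurtosis in a single call, which makes it convenient for a quick side-by-side comparison of variables.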