Data visualizations give business users a visual representation of analytics in the form of charts, graphs, maps and other graphical formats. They illustrate difficult concepts, reveal relationships among data elements and help in spotting hidden trends and patterns within a data set.
R is a powerful open-source programming language and software environment for statistical computing and graphics. It compiles and runs on various UNIX platforms, Windows and macOS.
ggplot2, a plotting system based on the grammar of graphics, is a good starting point for data visualizations or graphics in R. You can use ggplot2 to plot complex multi-layered graphs with ease.
I will be using the following R version in my examples:
R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Here are some examples to get you started:
If you do not have ggplot2 installed, you can use the following syntax to install it:
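Assuming you are installing from CRAN (the default repository), the usual call is sketched below. The requireNamespace() guard is my own addition so that the package is not reinstalled when it is already present; a plain install.packages("ggplot2") works as well.

```r
# Install ggplot2 from CRAN only if it is not already available.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
```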
To load the ggplot2 package, use the following syntax:
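Loading is normally a single library() call, run once per R session:

```r
# Attach ggplot2 so that qplot() and ggplot() are available by name.
library(ggplot2)
```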
I will be using the famous mtcars (Motor Trend Car Road Tests) dataset, which is one of the preloaded datasets in R. To find more details about the mtcars dataset, please check the following link:
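The same details are also available from within R itself, since every built-in dataset ships with a help page. A minimal way to open it:

```r
# Open the built-in documentation page for the mtcars dataset.
help("mtcars")   # equivalent to typing ?mtcars at the console
```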
You can use the following syntax to inspect the mtcars dataset:
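A few base R functions are enough for a quick look. This is only one reasonable selection; summary(mtcars) or View(mtcars) would serve equally well.

```r
# Peek at the first rows, the column types, and the overall dimensions.
head(mtcars)
str(mtcars)
dim(mtcars)   # 32 rows (cars) and 11 columns (variables)
```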
Let us first try to plot the columns miles per gallon (mpg) and displacement (disp) using the qplot (quick plot) function:
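One way to write this, assuming ggplot2 is already installed, is to pass the two columns to qplot() directly as vectors. Treat it as a sketch of the idea rather than the only possible form.

```r
library(ggplot2)

# Scatter plot of displacement against miles per gallon,
# passing the two columns to qplot() as plain vectors.
p <- qplot(mtcars$mpg, mtcars$disp)
print(p)
```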
Here is another variant of the qplot function for the above plot:
qplot(mpg, disp, data = mtcars)
You can appreciate the difference between qplot() and the base plot() function in R by plotting the same two columns with plot(). The syntax to be used is:
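A base R call that draws the same two columns, with mpg on the X-axis and disp on the Y-axis to match the qplot examples, could look like this:

```r
# Base R scatter plot of the same two columns, for comparison with qplot().
plot(mtcars$mpg, mtcars$disp)
```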
To understand the power of ggplot2, let us try to customize the visualization using qplot. Let us use the following syntax to add colors to a basic scatter plot:
qplot(mpg, disp, data = mtcars, color = cyl)
Here we pass the color argument to qplot and map it to the number of cylinders (cyl) column. Because cyl is stored as a number, the points are drawn in different shades of blue along a continuous scale. If you need distinct colors for different groups, convert the column to a factor using the following syntax:
qplot(mpg, disp, data = mtcars, colour=factor(cyl))
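The change in behavior comes from the type of the column rather than from qplot itself. A quick check in base R shows why factor() matters here:

```r
# cyl is stored as a number, so ggplot2 maps it to a continuous colour gradient.
is.numeric(mtcars$cyl)        # TRUE
# Wrapping it in factor() yields three discrete groups, one colour each.
levels(factor(mtcars$cyl))    # "4" "6" "8"
```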
To also vary the size of the points by the cyl column, you can use the following syntax:
qplot(mpg, disp, data = mtcars, colour=factor(cyl), size=cyl)
To use a different shape for each value of the cyl column as well, you can try the following syntax:
qplot(mpg, disp, data = mtcars, colour=factor(cyl), size=cyl, shape=factor(cyl))
To provide a customized title for your chart, you can use the main argument of the qplot function:
qplot(mpg, disp, data = mtcars, colour=factor(cyl), size=cyl, shape=factor(cyl), main="MTCars")
To customize the labels of the X-axis and Y-axis, use the xlab and ylab arguments of the qplot function:
qplot(mpg, disp, data = mtcars, colour=factor(cyl), size=cyl, shape=factor(cyl), main="MTCars", ylab="Displacement", xlab="Miles per Gallon")
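For comparison, here is a sketch of the same chart written with the layered ggplot() interface, which is what qplot wraps. This is my own translation of the call above, not part of the original examples; it may be useful because recent ggplot2 releases mark qplot() as deprecated in favor of ggplot().

```r
library(ggplot2)

# Same chart as the final qplot() call, built with the layered ggplot() interface.
p <- ggplot(mtcars, aes(x = mpg, y = disp,
                        colour = factor(cyl),
                        size = cyl,
                        shape = factor(cyl))) +
  geom_point() +
  labs(title = "MTCars", x = "Miles per Gallon", y = "Displacement")
print(p)
```

Each aesthetic (colour, size, shape) sits inside aes(), while the title and axis labels move to labs(), so individual pieces can be changed without rewriting the whole call.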
That covers the basics of ggplot2. I am going to cover more features and different types of charts in the next blog post. Stay tuned.