Introduction to plotly and ggplot2 Animation in R

Al-Fazrin Banapon
3 min read · Apr 12, 2019


Hello readers! In this post, I will show you a short tutorial on making visualizations in R with the plotly and ggplot2 packages.

What is Data Visualization?

Data visualization is the final piece of the skill set of an accomplished data scientist or data analyst. It involves communicating findings effectively through graphical means: a comprehensive presentation is developed so that a layperson, often a business analyst or corporate executive, can comprehend the data scientist's complex findings. It might include graphs, charts, mind maps, infographics, and other visuals to help convey key findings and insights. [1]


There are several ways to visualize data in R; two of them are ggplot2 and plotly. ggplot2 lets you "Create Elegant Data Visualisations Using the Grammar of Graphics", while plotly makes our plots interactive. We will also use dplyr for data manipulation.
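As a minimal sketch of how the two packages fit together, the snippet below builds a static ggplot2 scatter plot from R's built-in mtcars dataset and converts it into an interactive plotly chart with ggplotly(). The mtcars data is only an illustration; it is not the data used in this tutorial.

```r
library(ggplot2)
library(plotly)

# Static ggplot2 chart: car weight vs. fuel efficiency (built-in mtcars data)
p_static <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

# ggplotly() converts the ggplot object into an interactive plotly widget
# (hover tooltips, zoom, pan) when it is printed in RStudio or a browser
p_interactive <- ggplotly(p_static)
p_interactive
```

The same ggplot code works unchanged in both cases; plotly only changes how the finished plot is rendered.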

So, let's do this!

You can access the data here. Before visualizing the data, we have to build a panel dataset. Briefly, panel data combines time-series and cross-sectional data: the same units (here, countries) are observed over several periods (here, years), with one row per country-year.
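To make the idea concrete, here is a toy example with made-up numbers (two hypothetical countries, A and B, over three years). A spreadsheet like ours is in "wide" format (one column per year), while panel data is in "long" format (one row per country-year). Base R's reshape() is one way to convert between the two; the tutorial itself builds the long columns by hand with loops instead.

```r
# Toy wide table: one row per country, one column per year (made-up values)
wide <- data.frame(
  country = c("A", "B"),
  `1970` = c(1, 2),
  `1971` = c(3, 4),
  `1972` = c(5, 6),
  check.names = FALSE  # keep "1970" as the column name instead of "X1970"
)

# Convert to long (panel) format: one row per (country, year) pair
long <- reshape(
  wide,
  direction = "long",
  varying   = c("1970", "1971", "1972"),  # the year columns to stack
  v.names   = "gdp",                      # name of the stacked value column
  timevar   = "year",
  times     = 1970:1972,
  idvar     = "country"
)

# Sort so that each country's years are consecutive
long <- long[order(long$country, long$year), ]
rownames(long) <- NULL
long
```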

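library(openxlsx)
gdp <- read.xlsx("gapminer.xlsx", sheet = 1, startRow = 1, colNames = TRUE)

# Country vector column: repeat each country name once per year (47 years, 1970-2016)
country_vec <- gdp[, 1]
head(country_vec)
country_panel <- c()
for (i in 1:170)   # 170 countries in the sheet
{
  x = rep(country_vec[i], 47)
  country_panel <- append(country_panel, x)
}

# Years vector column: 1970-2016, repeated for each of the 170 countries
years_panel <- rep(1970:2016, 170)

# GDP vector column: take each country's row, drop the first 3 (non-year)
# columns, and append its 47 yearly values in order
gdp_panel <- c()
for (i in 1:170)
{
  x = gdp[i, ]
  x = x[-c(1:3)]
  x = t(x)
  gdp_panel = append(gdp_panel, x)
}
head(gdp_panel)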

In the same way, build a vector column for each of the other variables (population and life expectancy), and then combine all the columns into a data frame.
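Because the same loops have to be repeated for every variable, it may be easier to wrap the conversion in a small function. The sketch below is a vectorized alternative to the loops, assuming (as the code above does) that every sheet has the country name in column 1, two other non-year columns, and then one column per year. The function name sheet_to_panel() and the sheet numbers for population and life expectancy are hypothetical; adjust them to your workbook.

```r
# Hypothetical helper: turn a wide sheet (countries x years) into panel columns.
# Assumes column 1 = country, the next `skip - 1` columns = other info,
# and all remaining columns = one column per year.
sheet_to_panel <- function(sheet, years = 1970:2016, skip = 3) {
  values <- as.matrix(sheet[, -seq_len(skip), drop = FALSE])  # countries x years
  stopifnot(ncol(values) == length(years))
  data.frame(
    country = rep(sheet[[1]], each = length(years)),  # A, A, ..., B, B, ...
    year    = rep(years, times = nrow(sheet)),        # years repeated per country
    value   = as.vector(t(values))                    # row by row, like the loop
  )
}

# Small made-up sheet with the same layout (2 countries, 2 years)
toy <- data.frame(country = c("A", "B"), code = c("AAA", "BBB"),
                  region = c("X", "Y"), y2000 = c(1, 2), y2001 = c(3, 4))
sheet_to_panel(toy, years = 2000:2001)

# Possible use with the real workbook (sheet numbers 2 and 3 are assumptions):
# pop  <- read.xlsx("gapminer.xlsx", sheet = 2)
# life <- read.xlsx("gapminer.xlsx", sheet = 3)
# populasi_panel <- sheet_to_panel(pop)$value
# life_panel     <- sheet_to_panel(life)$value
```

Transposing the matrix before flattening matters: R stores matrices column by column, so as.vector(t(values)) lists all years of the first country before moving to the next, which matches the order produced by the loops.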

df <- data.frame(country_panel, years_panel, gdp_panel, populasi_panel, life_panel)

Now, let's visualize the data!

Using the panel data, we want to visualize which 10 countries have the highest mean GDP. To answer this question, we first use the dplyr package to compute each country's mean GDP and keep the top 10.

library(dplyr)
library(ggplot2)
df %>%
  group_by(country_panel) %>%
  summarise(mean_gdp = mean(gdp_panel)) %>%
  arrange(desc(mean_gdp)) %>%
  head(n = 10) %>%   # the pipe continues into ggplot() below

This gives us the top 10 countries; the next step is to visualize them with ggplot2.

ggplot() +
  geom_bar(aes(x = reorder(country_panel, -mean_gdp), y = mean_gdp, fill = country_panel), stat = "identity") +
  geom_label(aes(x = reorder(country_panel, -mean_gdp), y = mean_gdp, label = round(mean_gdp, digits = 2))) +
  guides(fill = "none") +   # hide the redundant country legend
  labs(title = "Top 10 Countries by Mean GDP", x = "Country", y = "Mean GDP") +
  theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1))   # tilt country names

Hint: run both code chunks above at the same time. The first chunk ends with %>%, which pipes the top-10 table into ggplot(); if you run them separately, you will get an error.
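If you prefer to run the two steps separately, one option is to save the summary table in a variable first and then pass it to ggplot() as its data argument. The sketch below does this with a small made-up data frame standing in for df (the countries and GDP values are invented), keeping the tutorial's column names; it also swaps geom_bar(stat = "identity") for the equivalent geom_col().

```r
library(dplyr)
library(ggplot2)

# Made-up panel data standing in for df (same column names as the tutorial)
df_toy <- data.frame(
  country_panel = rep(c("A", "B", "C"), each = 2),
  years_panel   = rep(2000:2001, times = 3),
  gdp_panel     = c(10, 20, 50, 70, 30, 30)
)

# Step 1: store the summary table in its own variable
top_gdp <- df_toy %>%
  group_by(country_panel) %>%
  summarise(mean_gdp = mean(gdp_panel)) %>%
  arrange(desc(mean_gdp)) %>%
  head(n = 2)   # top 2 here, since the toy data has only 3 countries

# Step 2: pass the stored table to ggplot() explicitly
p_top <- ggplot(top_gdp, aes(x = reorder(country_panel, -mean_gdp), y = mean_gdp)) +
  geom_col(aes(fill = country_panel)) +
  geom_label(aes(label = round(mean_gdp, digits = 2))) +
  guides(fill = "none") +
  labs(title = "Top Countries by Mean GDP", x = "Country", y = "Mean GDP")
p_top
```

Because top_gdp now lives in its own variable, each chunk can be run on its own, and the table can be inspected before plotting.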

Output from R

Based on the graph, the USA has the highest mean GDP and Japan the second highest, which means Japan has the highest mean GDP of any Asian country in the data.

Next, I want to make a scatter plot of GDP against life expectancy, with point size based on population and color based on country. Adding a frame aesthetic for the year lets plotly animate the plot over time.

library(plotly)
gap <- ggplot(df, aes(x = log(gdp_panel), y = life_panel, size = populasi_panel, color = country_panel)) +
  geom_point(aes(frame = years_panel))   # one animation frame per year

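After running this code, ggplot2 will give you a warning such as "Ignoring unknown aesthetics: frame", because frame is not a ggplot2 aesthetic. Don't worry, it is not an error: plotly uses the frame aesthetic to build the animation. To show the interactive, animated graph, call

ggplotly(gap)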
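For a self-contained version of the animated scatter plot that runs without the spreadsheet, the sketch below uses a few made-up rows with the same column names as df. Besides frame, it sets an ids aesthetic so plotly can track each country from one year to the next, and it calls plotly's animation_opts() to control how long each frame is shown; the values and the 1000 ms frame duration are illustrative choices, not taken from the original data.

```r
library(ggplot2)
library(plotly)

# Made-up panel data with the same column names as df
df_toy <- data.frame(
  country_panel  = rep(c("A", "B"), each = 3),
  years_panel    = rep(2000:2002, times = 2),
  gdp_panel      = c(1000, 1200, 1500, 5000, 5200, 5600),
  populasi_panel = c(10, 11, 12, 50, 51, 53),
  life_panel     = c(60, 61, 63, 75, 75.5, 76)
)

# frame and ids are not ggplot2 aesthetics, so ggplot2 warns
# "Ignoring unknown aesthetics"; ggplotly() uses them for the animation
gap_toy <- ggplot(df_toy, aes(x = log(gdp_panel), y = life_panel,
                              size = populasi_panel, color = country_panel)) +
  geom_point(aes(frame = years_panel, ids = country_panel)) +
  labs(x = "log(GDP)", y = "Life expectancy")

# Convert to plotly and show each year for 1000 ms
fig <- ggplotly(gap_toy) %>%
  animation_opts(frame = 1000)
fig
```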
