Using R to Enhance Our Dataset

R is a great programming language for expanding a raw dataset and extracting more insight from it. We are going to use R to enhance our Cyclistic dataset by adding two more columns to our spreadsheets: one for the day of the week a ride took place on, and another for the distance between a ride’s start point and end point.

To get started, we are going to want to make sure that we have an R console installed and set up on our computer.

Here is an in-depth tutorial from Dataquest on how to install either RGui or RStudio:

https://www.dataquest.io/blog/tutorial-getting-started-with-r-and-rstudio/

The concepts and functions that we will be using for this analysis apply to both RGui and RStudio. Because RStudio’s cloud-hosted service is built around a subscription model, we will be using the free RGui to conduct our analysis and will set our working directories within our “Marketing Analysis” folder.

All of the various possibilities that R enables for data analysis come from its wealth of library packages. Each package is designed for a specific role and has unique functions to help us achieve our goals. In order to move forward, we need to install the particular packages that contain the functions we will use throughout this project.

In the free RGui console, any package we use that does not already ship with R must first be downloaded and installed.

Here is a quick tutorial that you can refer back to on how to search for and install packages into either RGui or RStudio.

https://www.geeksforgeeks.org/how-to-install-a-package-in-r/
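If you prefer to install from the console instead of the menus, the following is a minimal sketch of one way to do it. The helper names find_missing and install_missing are made up for illustration, and the package names are the ones listed later in this post. The call that actually downloads anything is left commented out because it needs an internet connection.

```r
# Packages named in this post's list ("utils" ships with R, so it is left out).
needed <- c("dplyr", "geosphere", "lubridate", "readr", "readxl",
            "rmarkdown", "tidyr", "tidyverse", "xlsx")

# Hypothetical helper: return the names that are not installed yet.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the packages that are missing.
install_missing <- function(pkgs) {
  to_install <- find_missing(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Uncomment to run; this downloads from CRAN.
# install_missing(needed)
```

One thing to keep in mind: the xlsx package depends on rJava, so it typically needs a Java installation on your computer before it will load.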

When using RGui, it’s necessary not only to have the appropriate packages installed, but also to load those packages during each session. It’s good practice to keep a file that lists all of the packages you could potentially use in your analysis. That way, you can load whichever packages you need for a particular session in RGui.

We are going to create a file in R that will act as both a list of our needed packages and the place where we can run the command to load them. We will call this a “Loading Page”.

To start, we’ll go back into our “Marketing Analysis” folder and create a new folder inside it. This folder will be named “R Desktop files” and will be the place where we store the core R files that make up our analysis.
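The folder can also be created from the R console rather than the file explorer. This is only a sketch: the helper name make_r_folder is made up, and the example path in the comments is an assumption that you would replace with wherever your “Marketing Analysis” folder actually lives.

```r
# Hypothetical helper: create "R Desktop files" inside a project folder
# and return the full path to it.
make_r_folder <- function(project_dir) {
  r_dir <- file.path(project_dir, "R Desktop files")
  if (!dir.exists(r_dir)) {
    dir.create(r_dir, recursive = TRUE)
  }
  r_dir
}

# Example (the path below is an assumption; adjust it to your own computer):
# r_dir <- make_r_folder("C:/Users/yourname/Documents/Marketing Analysis")
# setwd(r_dir)   # make it the working directory for this session
# getwd()        # confirm the change
```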

Next, open a new file in RGui by opening the “File” menu at the top of the window and selecting “New script”. This brings up a blank script window for us to work in. With the new script window selected, open the “File” menu once again and click “Save as”. In the Save As window, use the main panel to find your “R Desktop files” folder and double-click it to select it as the destination. In the “File name” field at the bottom of the window, save the file with the name “Loading-Page”.

Once we have the file saved to our folder, the final step is to type in the commands to load in all of the packages we will need to use. We will load each package individually using the library() function.

Simply type the name of the package within the parentheses of the function. For instance, to load the dplyr package (which contains some of the most essential functions for data manipulation) we would type “library(dplyr)”. From there, highlight that line of code and right-click to bring up a dropdown menu. Select the “Run line or selection” option to load the package. You should see the console window in RGui run the selection and print output indicating whether or not it was successful.
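To see what a successful load and a failed load look like, here is a small sketch. The helper try_load is a made-up name and not part of any package. It wraps library() so that a missing package prints a message instead of stopping with an error.

```r
# Hypothetical helper: try to load one package and report TRUE or FALSE.
try_load <- function(pkg) {
  tryCatch({
    library(pkg, character.only = TRUE)  # pkg is passed as a string
    TRUE
  }, error = function(e) {
    message("Could not load '", pkg, "': ", conditionMessage(e))
    FALSE
  })
}

try_load("utils")    # a package that ships with R
# try_load("dplyr")  # should succeed once dplyr has been installed
```

If the message says there is no package with that name, the package still needs to be installed before it can be loaded.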

Repeat this process for all of the packages we will be using for this project. Here is the complete list of packages:

  • dplyr
  • geosphere
  • lubridate
  • readr
  • readxl
  • rmarkdown
  • tidyr
  • tidyverse
  • utils
  • xlsx

Throughout the course of the project, I will go over what most of these packages are used for individually. As a rule of thumb, you can always use the previously mentioned steps to find, install, and load any package that is missing from your console. This will be useful whenever you run a function and the console tells you that the necessary package is missing.

Once you are finished, your “Loading Page” window should contain one library() line for each package in the list above.
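Based on the package list above, the contents of the file would look roughly like the following. This is a reconstruction from that list, not a copy of the original file. The comment line at the top is optional.

```r
# Loading-Page: run each line to load the packages for this session.
library(dplyr)
library(geosphere)
library(lubridate)
library(readr)
library(readxl)
library(rmarkdown)
library(tidyr)
library(tidyverse)
library(utils)
library(xlsx)
```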

Whenever you start up a new session of RGui to work on the project, it will be important to open up the “Loading-Page” file and run each line the same way that we did for our dplyr example.
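After running the lines, you can double-check that nothing was skipped. The sketch below uses a made-up helper, not_attached, which compares the package list against what R reports as loaded in the current session.

```r
# The packages from the list in this post.
pkgs <- c("dplyr", "geosphere", "lubridate", "readr", "readxl",
          "rmarkdown", "tidyr", "tidyverse", "utils", "xlsx")

# Hypothetical helper: return the packages that are not loaded right now.
not_attached <- function(pkgs) {
  # search() lists loaded packages as "package:name"; strip the prefix.
  attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))
  setdiff(pkgs, attached)
}

# An empty result, character(0), means every package in the list is loaded.
not_attached(pkgs)
```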

With all of these packages loaded, we are finally ready to start adding the new columns to our dataset using R.
