Adding “Day of Week” to Version 2
By this point, we should have a basic template, or “baseline,” set up in our R workspace for every spreadsheet in our dataset. We will use this “Version 2” baseline as the starting point for our edits.
The edits we will be making to our spreadsheets are two new columns, which we will name “day_of_week” and “trip_dist_miles”.
These column names stand for “Day of Week” and “Trip Distance (Miles)”, respectively. “Day of Week” shows the day of the week on which each ride began. “Trip Distance (Miles)” is the straight-line distance between a ride’s starting location and its ending location.
It’s important to note that “Trip Distance (Miles)” does not necessarily represent how far the rider actually travelled on the bike; it only measures the distance between the start point and the end point. Later on, we will see that several rides have a value of “0.00”. These are rides that started and ended at the same location.
Now that we have a better understanding of the data we want to add to our dataset, it’s time to start making changes.
Just like in the last post, where we assigned values to our spreadsheets, we will make our first edits to our January spreadsheet.
Open our “Basic-Formatting-Page” file in R and continue from the last month that you saved into the workspace. Underneath your last line of code, leave a comment to separate the following code from what comes above it: type a pound sign (#) followed by your comment. I wrote “Editing” in my comment to act as a title for this section of the file. On the next line, I used double pound signs (##) to add a sub-comment beneath the first one, which explains the code we are about to type.
We will begin our edits by adding a “day_of_week” column to our “jan_21_v2” data frame. To do this, we will write “jan_21_v2” followed by a dollar sign ($). In R, the dollar sign refers to a specific column in a data frame. We can also use it to add a new column to a data frame, and to assign the data that should go inside that column.
On the other side of the dollar sign, we will add the name of our new column (day_of_week). The resulting code should look like this:
jan_21_v2$day_of_week
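To see what the dollar sign does on its own, here is a minimal sketch on a small made-up data frame, laid out with the same # / ## comment headers described earlier. The name `rides`, the `note` column, and all values are hypothetical stand-ins for `jan_21_v2`, not the real Divvy data.

```r
# Editing
## Toy example: reading and creating columns with $

# Hypothetical stand-in for jan_21_v2 (two rides, made-up values)
rides <- data.frame(
  ride_id    = c("A1", "B2"),
  started_at = c("01/23/2021 16:14", "01/24/2021 09:05")
)

# Reading an existing column returns it as a plain vector
rides$started_at

# Assigning to a column name that does not exist yet creates that column
rides$note <- c("first", "second")

names(rides)  # "ride_id" "started_at" "note"
```

Because the new column did not exist before the assignment, R appends it as the last column of the data frame.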
Before we run jan_21_v2$day_of_week, we need to add the function that will fill our new column with the data we want to see there. We will use the weekdays() function to get the day of the week on which each ride started. To do this, we need to nest the data in our “started_at” column inside the weekdays() function.
To pass our “started_at” column into weekdays(), we use the dollar sign once again (jan_21_v2$started_at). That alone isn’t enough, though: the values in “started_at” are stored as plain text, and weekdays() only works on date or date-time objects, so passing it the raw column produces an error. We first need to convert the column into a date-time format that R recognizes.
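The sketch below illustrates why the raw text can’t go straight into weekdays(): the function has methods for Date and POSIXct/POSIXlt objects but not for character strings. It assumes a single hypothetical value written as `01/23/2021 16:14`, and it uses base R’s `as.POSIXct()` for the conversion so it runs without extra packages; this is an illustration, not the step used in the post.

```r
# Force English day names so the output does not depend on the system locale
invisible(Sys.setlocale("LC_TIME", "C"))

started_text <- "01/23/2021 16:14"  # hypothetical value, stored as plain text

# weekdays() has no method for character strings, so this raises an error
msg <- tryCatch(weekdays(started_text), error = function(e) conditionMessage(e))
print(msg)

# Converting the text to a date-time first lets weekdays() do its job
started_time <- as.POSIXct(started_text, format = "%m/%d/%Y %H:%M", tz = "UTC")
weekdays(started_time)  # "Saturday"
```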
We’re going to nest another function inside weekdays() to do that conversion. If we look at our “started_at” column, we can see that each value is ordered month, day, year, followed by hour and then minute.
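To make that month, day, year, hour, minute order concrete, the sketch below spells it out with base R format codes. It assumes the values use slashes and a 24-hour clock (for example `01/23/2021 16:14`); that layout is an assumption about the CSV, and the sample values are made up. lubridate’s `mdy_hm()` reads the same order from its name instead of from a format string, and it defaults to the UTC time zone, which is why `tz = "UTC"` is set here.

```r
# Hypothetical started_at values in month/day/year hour:minute order
started_at <- c("01/23/2021 16:14", "01/31/2021 08:02")

# %m = month, %d = day, %Y = four-digit year, %H = hour (00-23), %M = minute
parsed <- as.POSIXct(started_at, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Print the parsed values in year-month-day order to confirm each piece was read
format(parsed, "%Y-%m-%d %H:%M")  # "2021-01-23 16:14" "2021-01-31 08:02"
```

If the format string does not match the text, `as.POSIXct()` returns `NA` rather than an error, so checking for `NA` values after parsing is a cheap sanity check.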
The function in R that matches this format is mdy_hm(), from the lubridate package (load it with library(lubridate) if it isn’t already loaded). Each letter in the function’s name stands for a unit of time, in the order the data is written: month, day, year, then hour and minute. To complete our line of code, we will nest jan_21_v2$started_at inside mdy_hm(), and then nest mdy_hm() inside weekdays(). The final line of code should look like this:
jan_21_v2$day_of_week <- weekdays(mdy_hm(jan_21_v2$started_at))
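To see the whole step end to end without needing lubridate, the sketch below runs the same idea on a small made-up data frame using only base R. Here `as.POSIXct()` with an explicit format string stands in for `mdy_hm()` (a swapped-in technique, not the call used above), and `weekdays()` supplies the day names. The name `jan_demo` and its values are hypothetical.

```r
# Force English day names regardless of the system locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical stand-in for jan_21_v2
jan_demo <- data.frame(
  ride_id    = c("A1", "B2", "C3"),
  started_at = c("01/01/2021 00:15", "01/23/2021 16:14", "01/25/2021 07:30")
)

# Base R equivalent of mdy_hm(): parse month/day/year hour:minute text as UTC
start_time <- as.POSIXct(jan_demo$started_at, format = "%m/%d/%Y %H:%M", tz = "UTC")

# New column holding the weekday each ride began on
jan_demo$day_of_week <- weekdays(start_time)

jan_demo
```

One design note: weekdays() returns plain text, so a later sort or summary will list the days alphabetically. lubridate’s `wday(x, label = TRUE, abbr = FALSE)` is an alternative that returns an ordered factor, by default running Sunday through Saturday.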
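After running the line that creates “day_of_week”, we can check our results by running view() on “jan_21_v2” (view() comes with the tidyverse; base R’s View() does the same thing). The new “day_of_week” column appears as the last column in the table, showing the weekday on which each ride started.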
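The viewer pane is handy for eyeballing a few rows; a console check can also catch problems across the whole month. The sketch below assumes a small hypothetical data frame that already has a “day_of_week” column (the values are made up) and counts rides per weekday with `table()`. An `NA` entry in the counts would point to rows where the date parsing failed.

```r
# Hypothetical data frame that already has a day_of_week column
jan_demo <- data.frame(
  started_at  = c("01/01/2021 00:15", "01/23/2021 16:14", "01/23/2021 18:40"),
  day_of_week = c("Friday", "Saturday", "Saturday")
)

# Quick look at the first rows in the console instead of the viewer pane
head(jan_demo[, c("started_at", "day_of_week")])

# Count rides per weekday; useNA = "ifany" adds an NA count only if one exists
ride_counts <- table(jan_demo$day_of_week, useNA = "ifany")
ride_counts
```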
Congrats! You’ve just added a new column of data using R! This one needed only a fairly simple line of code to fill the column. The code for our next column, “trip_dist_miles” (“Trip Distance in Miles”), will be a bit more complicated. We’ll cover that in the next post.