{"id":236,"date":"2022-11-10T09:47:48","date_gmt":"2022-11-10T09:47:48","guid":{"rendered":"https:\/\/dcarr-projects.com\/?p=236"},"modified":"2022-11-10T09:47:48","modified_gmt":"2022-11-10T09:47:48","slug":"data-visualization-with-r-ggplot2","status":"publish","type":"post","link":"https:\/\/dcarr-projects.com\/index.php\/2022\/11\/10\/data-visualization-with-r-ggplot2\/","title":{"rendered":"Data Visualization with R: GGPlot2"},"content":{"rendered":"\n<p>The last post was a light refresher on working in an R workspace and uploading data into that workspace. With our desired SQL exports loaded into R, we can now begin creating visualizations of our data. This will help produce images to represent the numbers, and will aid in our presentation of this data to our stakeholders. <\/p>\n\n\n\n<p>One of the great R packages for data visualization is <em>ggplot2<\/em>. If you recall from our Loading Page, it is one of the packages that we will want to load up every time. <em>ggplot2<\/em> provides a range of functions that we can use not only to create visualizations, but also to fine-tune them to suit the needs of our projects. <\/p>\n\n\n\n<p>The main function in <em>ggplot2<\/em> is the base function, <em>ggplot()<\/em>. We use <em>ggplot()<\/em> both to indicate that we are making a visualization and to specify what data we will be working with. There are multiple ways to make use of the <em>ggplot()<\/em> command, but for now we&#8217;ll just use it to specify the data that we are working with. <\/p>\n\n\n\n<p>We&#8217;ll start our first visualization by using <em>ggplot()<\/em> to specify that we want to work with the Docked Bikes dataset. For this part, we just need to specify that the <em>data<\/em> for the chart will be <em>docked_bikes<\/em>, which is the name that I assigned to the data loaded from the Docked Bikes CSV file. Simply nest data = docked_bikes into the <em>ggplot()<\/em> function. 
So far, your line of code should look like this:<\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">ggplot(data = docked_bikes)<\/p>\n\n\n\n<p>Now that the data has been specified, we can set the parameters for the actual graph. First, we&#8217;ll have to decide what kind of graph or chart we&#8217;ll want to use. <\/p>\n\n\n\n<p>In order to get a bird&#8217;s-eye view of our data, we&#8217;ll want to pick a visualization that can show the information of riders both individually and as a whole. This would be a good opportunity to use a scatterplot. <\/p>\n\n\n\n<p>A scatterplot is a graph that plots each data point as a point on a chart, positioned by its values for the two variables placed along the x-axis and y-axis. We can use a scatterplot to see how far a rider may have gone on a trip (Trip Distance) and how long they took on their ride (Ride Length). When data for the whole year is placed onto a scatterplot, we can begin to see patterns emerge between all of the points that have been plotted on the graph, with each point being an individual rider. <\/p>\n\n\n\n<p>Within the library of functions made for the <em>ggplot2<\/em> package, the <em>geom_point()<\/em> function is used to create scatterplots. There are several elements for the graph that will need to be nested inside of the <em>geom_point()<\/em> function, but we will first need to attach <em>geom_point()<\/em> to our base <em>ggplot()<\/em> statement. We can do this by putting a &#8220;+&#8221; in between <em>ggplot()<\/em> and <em>geom_point()<\/em>. So far, our code line should look like this:<\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">ggplot(data = docked_bikes)+geom_point()<\/p>\n\n\n\n<p>With <em>geom_point()<\/em> added to our line of code, we can now add the elements that will make up the visuals of our graph. Our <em>geom_point()<\/em> statement will need the instructions for how we want the graph to be constructed. 
This is known as the <em>mapping<\/em> of the graph. We&#8217;ll start off by putting <em>mapping =<\/em> into our <em>geom_point()<\/em> function:<\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">ggplot(data = docked_bikes)+geom_point(mapping =)<\/p>\n\n\n\n<p>Next, we&#8217;ll need to tell the function exactly what kind of <em>mapping<\/em> we&#8217;ll want to do. For this graph, we will focus on the aesthetics, or visual qualities, we wish to implement. Aesthetics are written in <em>ggplot2<\/em> as <em>aes()<\/em>, and we can nest the specific aesthetic elements we want to use inside the <em>aes()<\/em> function. <\/p>\n\n\n\n<p>This is a good time to ask ourselves: What kind of elements do we want making up this graph?<\/p>\n\n\n\n<p>Since we want to compare how far people rode their docked bikes versus how long their ride actually took, we are going to put ride length on the x-axis and the trip distance on the y-axis. We also want to compare our results between casual riders and members. This can be done by making the points for members one color and the points for casuals another. <\/p>\n\n\n\n<p>These elements can be implemented rather easily from within our <em>aes()<\/em> function. We would just have to nest in the following values: x = ride_length, y = trip_dist_miles, color = member_casual.<\/p>\n\n\n\n<p>As you can see, each value entered into the <em>aes()<\/em> function is assigned a column in the data that possesses the information we want translated into the graph. Your complete line of code should now look like this:<\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">ggplot(data = docked_bikes)+geom_point(mapping = aes(x = ride_length, y = trip_dist_miles, color = member_casual))<\/p>\n\n\n\n<p>Now we get to run the line of code and see what kind of visualization R generates with it. 
Here&#8217;s what was generated from running this code on the given dataset:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" width=\"680\" height=\"711\" src=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-docked_bikes-all-year.png\" alt=\"\" class=\"wp-image-242\" srcset=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-docked_bikes-all-year.png 680w, https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-docked_bikes-all-year-287x300.png 287w\" sizes=\"(max-width: 680px) 100vw, 680px\" \/><\/figure><\/div>\n\n\n<p>As you can see, there are thousands of points scattered across the chart. There are certain patterns of behavior that we can make out from this visualization. For instance, we can see that no matter how long riders spent on their ride, they tend to stay within 5 miles of their original station. <\/p>\n\n\n\n<p>Another glaring detail about this graph is that there are virtually no members shown riding a docked bike in 2021.  
In fact, when we go back into the SQL file that our dataset comes from, we can see that there is literally only one member who rode a docked bike in 2021:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" width=\"556\" height=\"419\" src=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-SQL-1-row-for-docked-bikes.png\" alt=\"\" class=\"wp-image-243\" srcset=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-SQL-1-row-for-docked-bikes.png 556w, https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-SQL-1-row-for-docked-bikes-300x226.png 300w\" sizes=\"(max-width: 556px) 100vw, 556px\" \/><\/figure><\/div>\n\n\n<p>If we were actually employed at the fictional Cyclistic bike-share company, we would want to confirm with the appropriate department that docked bikes were even a service offered to our members. For the sake of this analysis, let&#8217;s pretend that docked bikes are usually only rented out to casuals due to their one-time payment method. <\/p>\n\n\n\n<p>Let&#8217;s take a look at a visualization that has clear representation of both casuals and members. 
We can create a similar graph for our Classic Bike dataset using this line of code:<\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">ggplot(data = classic_bikes)+geom_point(mapping = aes(x = ride_length, y = trip_dist_miles, color = member_casual))<\/p>\n\n\n\n<p>And this is the visualization that results from it:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" width=\"685\" height=\"701\" src=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-classic-bikes-all-year.png\" alt=\"\" class=\"wp-image-244\" srcset=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-classic-bikes-all-year.png 685w, https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/10\/R-Visualization-classic-bikes-all-year-293x300.png 293w\" sizes=\"(max-width: 685px) 100vw, 685px\" \/><\/figure><\/div>\n\n\n<p>Here we get a better picture of how members and casual riders use their vehicles differently. It&#8217;s easy to see that members tend to use their bikes for shorter durations, even when they travel far. This suggests that members are more likely to use these bikes for a routine commute, rather than as a leisure activity. Such an insight can help us draw conclusions for our analysis.<\/p>\n\n\n\n<p>This covers the basic concepts that go into making a visualization in R. In the next post, we will go more in depth into other kinds of graphs that can be made, as well as ways that we can tailor our visualization to be more presentable to our stakeholders. 
<\/p>\n\n\n\n<div class=\"wp-container-1 is-content-justification-center wp-block-buttons\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link\" href=\"https:\/\/dcarr-projects.com\/?p=226\">Previous Page: Import for Viz<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link\" href=\"https:\/\/dcarr-projects.com\/?page_id=11\">Main Page<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link\" href=\"https:\/\/dcarr-projects.com\/?p=254\">Next Page: Viz Improvement<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The last post was a light refresher on working in an R workspace and uploading data into that workspace. With our desired SQL exports loaded into R, we can now begin creating visualizations of our data. This will help produce images to represent the numbers, and will aid in our presentation of this data to&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false},"categories":[3],"tags":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/236"}],"collection":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"h
ref":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/comments?post=236"}],"version-history":[{"count":12,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/236\/revisions"}],"predecessor-version":[{"id":374,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/236\/revisions\/374"}],"wp:attachment":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/media?parent=236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/categories?post=236"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/tags?post=236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}