{"id":64,"date":"2022-11-10T09:42:42","date_gmt":"2022-11-10T09:42:42","guid":{"rendered":"https:\/\/dcarr-projects.com\/?p=64"},"modified":"2022-11-10T09:42:43","modified_gmt":"2022-11-10T09:42:43","slug":"downloading-new-sheets-for-version-2","status":"publish","type":"post","link":"https:\/\/dcarr-projects.com\/index.php\/2022\/11\/10\/downloading-new-sheets-for-version-2\/","title":{"rendered":"Downloading New Sheets for Version 2"},"content":{"rendered":"\n<p>We just did a lot of heaving lifting in our R workspace, creating new columns using several nested functions. Now that we have completed adding the three additional columns to all 12 of our spreadsheets for 2021, we can move on to combining these spreadsheets and saving them to new ones.<\/p>\n\n\n\n<p>We will combine these spreadsheets into the four quarterly tables, and then combine the quarters into a dataset that shows the entire year of 2021. To combine our current datasets into quarters, we will use the <em>rbind()<\/em> function. <\/p>\n\n\n\n<p>Starting off with the first quarter. place the names of the tables you wish to combine (ex. jan_21_v2, feb_21_v2, mar_21_v2) into the <em>rbind() <\/em>function and assign them the name q1_2021. You&#8217;re resulting line of code should look like this: <\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">q1_2021 &lt;- rbind(jan_21_v2, feb_21_v2, mar_21_v2)<\/p>\n\n\n\n<p>This will result in a large dataset that we have assigned the value of q1_2021. To make sure that the data has combined correctly, we can either use a <em>View()<\/em> to see the entire spreadsheet, or we can use a <em>tibble()<\/em> function to get a preview of the dataset in the R console window. We use <em>tibble()<\/em> in the same way as <em>view()<\/em>, by inserting the desired value into the function:<\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">tibble(q1_2021)<\/p>\n\n\n\n<p>Running this line of code should bring up this result in the R console window:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"675\" height=\"471\" src=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/09\/R-Tibble-for-Q1.png\" alt=\"\" class=\"wp-image-166\" srcset=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/09\/R-Tibble-for-Q1.png 675w, https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/09\/R-Tibble-for-Q1-300x209.png 300w\" sizes=\"(max-width: 675px) 100vw, 675px\" \/><\/figure>\n\n\n\n<p>Note that the the first line of the <em>tibble()<\/em> preview will show how many rows of data there are, and how many columns. You can use this knowledge and compare it to that of the datasets which you have combined to confirm if the new value of q1_2021 was made correctly using <em>rbind()<\/em>.  <\/p>\n\n\n\n<p>Do the same for the other three quarters, matching each quarter with the tables that correspond to the months that fall under it. Once you have all four quarters bound and declared, you can use the <em>rbind()<\/em> function once again to combine them into a table for the whole year. Name the table &#8220;all_2021&#8221;. You should now have five new dataset values to work with: q1_2021, q2_2021, q3_2021,  q4_2021, and all_2021. <\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"655\" height=\"320\" src=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/09\/R-Quarters-and-All-Year-bound.png\" alt=\"\" class=\"wp-image-167\" srcset=\"https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/09\/R-Quarters-and-All-Year-bound.png 655w, https:\/\/dcarr-projects.com\/wp-content\/uploads\/2022\/09\/R-Quarters-and-All-Year-bound-300x147.png 300w\" sizes=\"(max-width: 655px) 100vw, 655px\" \/><\/figure>\n\n\n\n<p>With all of our desired datasets assigned values in our R workspace, the final step in R is to download these new tables into spreadsheets that we will use in our analysis. For this, we will use the <em>write_csv()<\/em> function. <\/p>\n\n\n\n<p>The <em>write_csv()<\/em> function will create a new CSV file and place it in our working directory. To do this, we simply nest the name of the value we wish to turn into a CSV file, and the we follow it by what we wish to name the file in quotation marks. <\/p>\n\n\n\n<p>Before I show you an example of a <em>write_csv <\/em>function, it&#8217;s important for us to remember that we will want to put these new files into a folder made specifically for our Version 2 dataset.  We should make a subfolder in &#8220;Edited Trip Data 2021&#8221; and call it &#8220;Edited Data V2&#8221;. Once we have that folder made, we&#8217;ll make it our working directory in our R workspace.<\/p>\n\n\n\n<p>Your <em>setwd()<\/em> code should look something like this: <\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">setwd(&#8220;&#8230;\/Marketing Analysis\/2021 Edited Trip Data\/Edited Data V2&#8221;)<\/p>\n\n\n\n<p>Once we&#8217;ve set &#8220;Edited Data V2&#8221; as our working directory, it will become the destination of our new CSV files that will be made using <em>write_csv()<\/em>. We&#8217;ll start with our January file for Version 2. Using the value of <em>jan_21_v2<\/em>, our <em>write_csv()<\/em> code should look like this:<\/p>\n\n\n\n<p class=\"has-theme-palette-7-background-color has-background\">write_csv(jan_21_v2, &#8220;2021-01-tripdata-v2.csv&#8221;)<\/p>\n\n\n\n<p>Note that the naming convention that we&#8217;re using for the file is the same as with our original, raw spreadsheets. The only difference is the &#8220;v2&#8221; added at the end of the name, for Version 2. We will use this convention for all of the monthly tables. For our new quarterly and yearly tables, we can put the timeframe information at the front of the file name, where the date would be. For example, Q1 2021 would be named &#8220;q1-2021-tripdata-v2.csv&#8221;. <\/p>\n\n\n\n<p>Make a CSV file for all 12 of our monthly tables, then do the same to the four quarterly tables as well as the yearly table for 2021. We now have CSV files saved for all of the tables that we have worked on in R.&nbsp;<\/p>\n\n\n\n<p>There is one last step we should take before diving into a deeper analysis of our data. We should upload our yearly dataset into a platform that can handle such a high volume of information. <\/p>\n\n\n\n<p>The data in \u20182021-all-year-tripdata-v2.csv\u2019 is going to have well over 4.5 million rows! This is well over the limit that Excel is capable of processing. If we were to open this csv file using Excel, the file would cut the data off at around 1 million rows, and all the rest of our data would be lost. <\/p>\n\n\n\n<p>To work around this, we will use a free extension to Python known as \u2018Mito\u2019, which can be used to work with an unlimited amount of data. To use Mito, we will first need to install Python and create a Jupyter Notebook within an Anaconda environment. <\/p>\n\n\n\n<p>All of that may sound like gibberish to anyone who is unfamiliar with Python and it&#8217;s various extensions. That&#8217;s fine, because this is not a major part of the project and can be skipped over. This would just be a safe practice to have if this were for a job, because such a large dataset should have a backup in which it can be viewed in it&#8217;s entirety. <\/p>\n\n\n\n<p>If you are going to back up your dataset using Mito, you&#8217;ll first need to follow these tutorials for setting up a Jupyter Notebook using Python and Anaconda. <\/p>\n\n\n\n<ul><li>Python: <a href=\"https:\/\/phoenixnap.com\/kb\/how-to-install-python-3-windows\" title=\"\">https:\/\/phoenixnap.com\/kb\/how-to-install-python-3-windows<\/a><\/li><li>Jupyter Notebook &amp; Anaconda: <a href=\"https:\/\/www.dataquest.io\/blog\/jupyter-notebook-tutorial\/\" title=\"\">https:\/\/www.dataquest.io\/blog\/jupyter-notebook-tutorial\/<\/a><\/li><\/ul>\n\n\n\n<p> Once you are able to make an environment in Jupyter Notebook, it&#8217;s time to set up Mito. Here is a tutorial on how to install Mito.<\/p>\n\n\n\n<ul><li><a href=\"https:\/\/docs.trymito.io\/getting-started\/installing-mito\" title=\"\">https:\/\/docs.trymito.io\/getting-started\/installing-mito<\/a><\/li><\/ul>\n\n\n\n<p>With Mito up and running, we can import our spreadsheet into the virtual environment we just created. This will be useful if we want to work on this data in the future using spreadsheet functions. At the very least, its a safe place to store our yearly data. We are now ready to finish up the &#8220;Process&#8221; Phase of our project and go into the most critical phase: Analysis. <\/p>\n\n\n\n<p> <\/p>\n\n\n\n<div class=\"wp-container-1 is-content-justification-center wp-block-buttons\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link\" href=\"https:\/\/dcarr-projects.com\/?p=56\">Previous Page: Trip Distance<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link\" href=\"https:\/\/dcarr-projects.com\/?page_id=11\">Main Page<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link\" href=\"https:\/\/dcarr-projects.com\/?p=69\">Next Page: Analysis Phase<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>We just did a lot of heaving lifting in our R workspace, creating new columns using several nested functions. Now that we have completed adding the three additional columns to all 12 of our spreadsheets for 2021, we can move on to combining these spreadsheets and saving them to new ones. We will combine these&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false},"categories":[3],"tags":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/64"}],"collection":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/comments?post=64"}],"version-history":[{"count":7,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/64\/revisions"}],"predecessor-version":[{"id":355,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/posts\/64\/revisions\/355"}],"wp:attachment":[{"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/media?parent=64"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/categories?post=64"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dcarr-projects.com\/index.php\/wp-json\/wp\/v2\/tags?post=64"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}