Visualizing Cost Savings in Tableau: Live Code-Along

The Art of Creating a Visual Dashboard: A Step-by-Step Guide

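Connor, a Salesforce developer at DataCamp, began his presentation by explaining how he would build a visual dashboard in Tableau to answer questions about cancer diagnoses and treatment costs. The data covered cancer cases by the stage at which they were diagnosed and the cost of breast cancer treatment by stage. The motivation was to find trends that could inform public policy on cancer screening and help lower costs. The webinar, hosted by DataCamp, covered dashboard creation and the use of Tableau.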

As Connor walked through his process, he built a series of visualizations, each answering one question. He worked with two Excel data sources: counts of cancer diagnoses in California from 2004 to 2012, broken down by cancer type and stage, and the cost of breast cancer treatment by stage. To answer his final question, he linked the two sources with Tableau's blend relationships (Data > Edit Blend Relationships), which let him combine data from both sources in a single view.

Connor then discussed the challenges of working with linked data sets. Tableau would not let him use fields from both sources in the same view until a relationship was defined, so he created a custom blend relationship that maps Disease Stage in the cost data to Stage Name in the diagnosis data. Although the field names differ, Tableau matched the stages once the mapping was in place.
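Tableau configures blending through dialogs rather than code, but the behavior can be approximated in a few lines of Python. In the sketch below, which is an illustration under stated assumptions and not Tableau's implementation, the cost data acts as the primary source. The diagnosis counts are summed per linking value, and each primary row picks up the matching total, much like a left join on aggregated data. The field names `disease_stage`, `stage_name`, `cost_0_12`, and `breast`, the `blend` helper, and all numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical primary source: breast cancer cost per case by stage.
# Field names and numbers are illustrative placeholders, not the webinar's data.
cost_rows = [
    {"disease_stage": "0", "cost_0_12": 100.0},
    {"disease_stage": "1", "cost_0_12": 200.0},
    {"disease_stage": "4", "cost_0_12": 400.0},
]

# Hypothetical secondary source: breast cancer diagnoses by stage.
# The linking field has a different name here ("stage_name").
count_rows = [
    {"stage_name": "0", "breast": 10},
    {"stage_name": "1", "breast": 20},
    {"stage_name": "1", "breast": 5},  # repeated key: blending aggregates it
    {"stage_name": "?", "breast": 7},  # no matching stage in the primary
]


def blend(primary, secondary, primary_key, secondary_key, measure):
    """Roughly mimic a data blend with a custom field mapping.

    The secondary source is summed per linking value, then attached to each
    primary row like a left join: primary rows without a match get None,
    and unmatched secondary rows are dropped.
    """
    totals = defaultdict(int)
    for row in secondary:
        totals[row[secondary_key]] += row[measure]
    # dict.get does not trigger defaultdict's factory, so misses return None.
    return [{**row, measure: totals.get(row[primary_key])} for row in primary]


blended = blend(cost_rows, count_rows, "disease_stage", "stage_name", "breast")
for row in blended:
    print(row)
```

Because the primary source drives the rows, the "?" stage (present only in the counts) disappears, and stage 4, which has no count row in this toy data, comes back as `None`.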

Next, Connor walked through the visualizations on his dashboard, each designed to answer a specific question. The first, a bar chart of total cases by cancer type with each bar colored and labeled by stage, showed which cancers are most prevalent: breast cancer, followed by prostate, lung, colon, and rectum. The second placed stage on the columns to show at which stage most diagnoses happen. Stages 1 and 2 were generally the most common, while lung cancer stood out as most often diagnosed at stage 4, which Connor contrasted with cancers that are routinely screened for, such as breast, prostate, and colon cancer.
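The two questions behind these charts reduce to simple aggregations, sketched below in Python under clear assumptions: the counts are hypothetical placeholders, and `prevalence` and `most_common_stage` are illustrative helpers, not Tableau features. Summing each cancer type across stages and sorting in descending order mirrors the sorted prevalence chart. The optional `exclude` argument mimics the Exclude filter Connor demonstrated during the Q&A, and taking the largest stage per type answers the stage question.

```python
# Hypothetical diagnosis counts: cancer type -> stage -> number of cases.
# All numbers are illustrative placeholders, not the webinar's data.
counts = {
    "breast": {"0": 50, "1": 120, "2": 60, "3": 20, "4": 10, "?": 5},
    "lung": {"0": 1, "1": 30, "2": 15, "3": 25, "4": 90, "?": 20},
    "colon": {"0": 5, "1": 40, "2": 35, "3": 30, "4": 20, "?": 2},
}


def prevalence(counts, exclude=()):
    """Total cases per cancer type, largest first (like the sorted bar chart).

    `exclude` plays the role of a dimension filter set to Exclude.
    """
    totals = {
        cancer: sum(by_stage.values())
        for cancer, by_stage in counts.items()
        if cancer not in exclude
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def most_common_stage(counts):
    """The stage with the most diagnoses for each cancer type."""
    return {cancer: max(by_stage, key=by_stage.get) for cancer, by_stage in counts.items()}


print(prevalence(counts))                     # [('breast', 265), ('lung', 181), ('colon', 132)]
print(prevalence(counts, exclude={"colon"}))  # [('breast', 265), ('lung', 181)]
print(most_common_stage(counts))              # {'breast': '1', 'lung': '4', 'colon': '1'}
```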

Connor then loaded the second data source to answer two final questions. The first was the cost of breast cancer treatment by stage, shown as bars for 0 to 12 months and for 12 to 24 months. Cost in the first year rose with stage, but stage 4 was an exception, with its cost dropping in the second year; Connor attributed this to the poorer prognosis of patients diagnosed at stage 4. The second question combined both data sets to estimate the total cost of breast cancer treatment in California from 2004 to 2012 for each individual stage.
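The stage 4 observation amounts to checking where cost stops rising from one stage to the next. A minimal Python sketch of that check follows; the cost figures are hypothetical placeholders chosen only to reproduce the pattern, and `trend_breaks` is an illustrative helper, not part of Tableau.

```python
# Hypothetical average cost per breast cancer case, by period and stage.
# Placeholder numbers chosen only to illustrate the check, not the webinar's data.
cost_by_period = {
    "0-12 months": {"0": 10.0, "1": 12.0, "2": 15.0, "3": 20.0, "4": 25.0},
    "12-24 months": {"0": 3.0, "1": 4.0, "2": 5.0, "3": 7.0, "4": 6.0},
}
STAGE_ORDER = ["0", "1", "2", "3", "4"]


def trend_breaks(cost_by_stage, order=STAGE_ORDER):
    """Return the stages whose cost is lower than the previous stage's,
    i.e. the places where "cost rises with stage" does not hold."""
    return [
        later
        for earlier, later in zip(order, order[1:])
        if cost_by_stage[later] < cost_by_stage[earlier]
    ]


for period, by_stage in cost_by_period.items():
    print(period, trend_breaks(by_stage))
```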

Throughout the presentation, Connor emphasized creating meaningful visualizations and linking data sets correctly to get accurate results. He also showed how to customize the look and feel of a chart by changing its colors, for example by dragging a field onto Color on the Marks card. Finally, he encouraged the audience to ask questions and provide feedback on his dashboard.

The Presentation Continued

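Connor concluded the walkthrough by creating a calculated field, Total Cost for CA, that multiplies the number of breast cancer cases from the diagnosis data by the per-case cost for the first 12 months from the cost data, and sums the result for each stage. A calculated field was needed because neither source holds total cost on its own: one holds case counts and the other holds cost per case.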
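The arithmetic behind this calculated field is a per-stage product followed by a sum. The Python sketch below assumes hypothetical per-stage inputs (placeholders, not the webinar's figures) and a hypothetical `total_cost_by_stage` helper; in Tableau itself, the equivalent is the calculated field that multiplies the blended case count by the cost field and aggregates the result.

```python
# Hypothetical inputs (placeholders, not the webinar's figures):
# breast cancer cases by stage from the diagnosis data, and average
# cost per case in the first 12 months from the cost data.
cases_by_stage = {"0": 10, "1": 20, "4": 5, "?": 3}
cost_0_12_by_stage = {"0": 100.0, "1": 200.0, "4": 400.0}


def total_cost_by_stage(cases, cost_per_case):
    """Cases x cost per case for each stage present in both inputs.

    Stages without a cost row (such as '?') are skipped, since there is
    nothing to multiply them by.
    """
    return {
        stage: cases[stage] * cost_per_case[stage]
        for stage in cases
        if stage in cost_per_case
    }


per_stage = total_cost_by_stage(cases_by_stage, cost_0_12_by_stage)
print(per_stage)                # {'0': 1000.0, '1': 4000.0, '4': 2000.0}
print(sum(per_stage.values()))  # 7000.0
```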

One member of the audience asked how to link two data sets that were not already linked. Connor explained that he had used Tableau's blend relationships to create a custom relationship between Disease Stage and Stage Name. He also explained that the color palette can be changed through the Color options on the Marks card.

Another member of the audience asked how to make the dashboard more visually appealing. Connor suggested experimenting with different palettes and customization options to find a color scheme that works best for their needs.

The Audience Q&A Session

As the presentation came to a close, Connor opened up the floor for questions from the audience. Several members of the audience asked questions about how to link data sets and create meaningful visualizations. Others asked about how to customize the look and feel of the dashboard.

One member of the audience asked if it was possible to change the color palette on the dashboard. Connor explained that he had used Tableau's color options in the marks card to select a custom palette. He also suggested playing around with different palettes to find one that worked best for their needs.

Another member of the audience asked how to create a calculated field to summarize data. Connor explained that a calculated field, created from the drop-down arrow at the top of the Data pane, lets you derive values that are not stored in the source, such as the total cases for each stage or the total cost for each stage.
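As an illustration of the Total Cases field from the walkthrough, which adds the per-type case counts for each stage, here is a row-level Python analogue. The column names, counts, and the `add_total_cases` helper are hypothetical, and the Tableau formula in the docstring is only an example of the shape such a field would take.

```python
# Hypothetical wide table: one row per stage, one column per cancer type.
# Column names and counts are illustrative placeholders.
rows = [
    {"stage_name": "0", "breast": 50, "colon": 5, "lung": 1},
    {"stage_name": "4", "breast": 10, "colon": 20, "lung": 90},
]
CANCER_COLUMNS = ("breast", "colon", "lung")


def add_total_cases(rows, columns=CANCER_COLUMNS):
    """Return copies of the rows with a 'total_cases' field added, analogous
    to a calculated field such as [Breast] + [Colon] + [Lung]."""
    return [{**row, "total_cases": sum(row[c] for c in columns)} for row in rows]


for row in add_total_cases(rows):
    print(row["stage_name"], row["total_cases"])  # 0 56, then 4 120
```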

The Audience Member's Experience with Cancer Costs

One member of the audience shared their own experience with cancer diagnosis costs. They mentioned that they had been surprised by how much cancer diagnosis costs varied by stage. This sparked a discussion about the importance of screening programs and how they can affect cancer diagnosis costs.

At the end of the session, Connor thanked the audience for their participation and encouraged them to keep experimenting with Tableau. He also reminded them that more training sessions would be available in the future, including one on becoming a business analyst.

"WEBVTTKind: captionsLanguage: enforeign I'm excited to walk you through some Tableau visualizations today uh as Richie said I'm a Salesforce developer here at datacamp um I have been here a couple years and uh I uh for those who don't know Salesforce is one of the um you know Key Systems that tracks our business to business data so moving to Tableau and working on these visualizations is uh definitely close to uh some of the stuff I work with um so I believe we're going to uh shut off uh cameras uh just to save bandwidth but uh nice to meet you all and uh let's let's dive into it so um so for our agenda today um uh we're going to be looking at a couple of data sets on uh cancer cancer cases and when people are diagnosed what stage they are diagnosed at and the cost for what happens by stage and the idea behind this is to maybe determine what sort of public policy um what c-trends advise towards policy that could you know help prevent um help lower costs uh help uh know when and uh how to screen for uh for Cancers and just improve uh uh the the uh the Whole Health Care System behind cancer treatment and diagnosis um so that's the the overall arching uh motivation behind this but we'll get in and Define a few uh key questions that we can uh use Tableau to help us answer um so first of all I'll share a little bit more info on the data we'll Define these key questions and then we'll jump into Tableau um hopefully everyone's able to download the data in Tableau a little bit visualize build our visualizations and move on to our question and answer session so first let's take a let's let's take a look at our data we're gonna be working with um uh and before we look at our actual sheets I wanted to cover a few key definitions that are really essential for the data we're looking at so both of our our files you'll see have cancer broken up by stage different stages so the uh when cancer is diagnosed a stage is determined between 0 and 4 and 4 being the worst zero being uh 
worst as in most uh most progressed disease and zero being the the least progression so uh zero here is defined as abnormal cells are present but have not spread in nearby tissue and in some cases may be a precursor to cancer but not an actual cancer now with stage one two and three share a definition uh cancer is present um the severity is then graded between one two and three based off of uh factors such as the size of tumor and the spread into nearby tissue and finally stage four uh is where cancer spread to distant parts of the body so you'll see there is uh one um some stage question mark in our data and these are just diagnoses with unknown stage in our data so we may not be able to gather a ton of info from those but uh it does happen that um the stage is not known or determined for for some of these diagnoses All Right Moving On uh we will look to answer some key questions about these this data in Tableau um and I've written these out here so first uh first we can try and figure out what types of cancer are most prevalent then we can look at what stage do most diagnoses happen and then we'll pull in um our cost data and try and determine what is the cost of breast cancer treatment depending on stage and uh finally we'll try and pull it all together combine our uh um are uh cancer the the count of cancer diagnoses in California with the cost per uh per treatment and trying to answer this last question what is the was the total cost for breast cancer treatment in California from 2004 to 2012. 
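The last question can be written as a weighted sum over stages. As an assumption based on the walkthrough, which uses only the 0 to 12 month cost, let n_s be the number of breast cancer diagnoses at stage s in California from 2004 to 2012 and c_s the average cost per case in the first 12 months at that stage. The "?" stage has no cost row, so it drops out of the sum:

```latex
C_{\mathrm{CA}} = \sum_{s \in \{0,1,2,3,4\}} n_s \, c_s^{(0\text{-}12)}
```

Including second-year costs would mean adding the 12 to 24 month cost per case to each term; that extension is not part of the walkthrough.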
I'll cover why we're working with California for those dates. So, moving on, let's take a look at our data in Excel. I have it open here, and we have two sources. The first is these counts by stage: you can see we have five different types of cancer (breast, colon, rectum, lung, and prostate), and each is broken up by stage. So, for example, we have 8,200 instances of colon cancer diagnosed at stage zero and one. One thing I want to note about this particular data set is that it only covers California from 2004 to 2012. That's why our final question, where we determine the cost of all of these by stage, is specific to California.

Moving over to our second source, it's again broken up by stage, but here we're looking at cost, and this is just for breast cancer. I'd like to highlight that we only have this data for breast cancer. So for breast cancer at stage 0, the cost per case is about $60,000 for 0 to 12 months and around $18,000 for 12 to 24 months. These are the two data sets we're going to pull in and visualize in Tableau.

All right, hopefully everyone has Tableau Public downloaded and installed, so let's get it up and running and dive in. Now that we have Tableau open, let's load our first data set. The first file we're going to look at is the count of our cancer diagnoses. Since we're working with Excel files today, let's choose Microsoft Excel; we'll look at the count first and get to the cost later. Let's load this file and click Open. Tableau has pulled in our data and gives us a nice preview, broken up into the rows and columns we expect. Everything looks good: the column names and row names are as expected, so we've successfully loaded our data.

Let's pull it into our worksheet, so we can just hop over to our worksheet here. To make it easier to keep track of our questions and tie everything back together: as a reminder, the first question we wanted to answer is what types of cancer are most prevalent, so we can use that as the sheet's title. Since we're going to be answering multiple questions, we'll have multiple worksheets open, so let's get ahead of it and rename our worksheet tab down here at the bottom to Cancer Prevalence. Perfect.

Now that we've named our sheet, we could jump straight into the visualization, but let's first verify that our data looks good here in the worksheet. We're going to start with a simple table. To do that, I'm going to drag Measure Names to Columns, Stage Name to Rows, and Measure Values to Text. Now we have essentially what we had in Excel. We saw a preview of this table when we loaded our data source; this just verifies that we can in fact pull our data onto the worksheet and that everything's working. You'll see this count of cancer diagnoses field here, this is...

Just getting some questions from the audience: is it possible to maximize Tableau Public and increase the text size, just for people's eyesight?

Yes, I can maximize. Let me try Command-Plus to increase the size of the text. It did not, unfortunately. Sorry, bear with me one sec; I'll see if I can increase the size of the text. Oh, I think it has to be done in macOS system settings rather than in Tableau itself. Oh yeah, I can actually change my resolution. Give me one second, everybody; I'll just update my screen resolution. I think I'm on a high-res monitor, and that's probably creating some issues.

Just to jump in: I think people can also use the zoom function at the top of their GoToWebinar preview screen to zoom in on your screen, so that's also an option. Does that work better for anybody? Yeah, it should be fine, actually. Sorry, let's continue.

Okay, perfect. Hopefully changing the resolution on the screen helps. All right. So as of now, we've basically loaded exactly what we had in Excel, but we want to do more than that. Say we're interested in seeing the total number of cases across all types of cancer, not just broken up between breast, colon, and so on. What we can do for that is create a calculated field. I'm just going to go up to this drop-down caret here and create a calculated field, and we'll call it Total Cases. This is easy enough: we just want the sum of all of these together, so we can drag them in. Is the calculation valid? It is, so we can save our field. Now we have this Total Cases field, and we can see it's added to our table, so for each stage we can see the total cases. It's pretty easy to do a quick calculation like that in Tableau. Again, this is still something we could have done in Excel, but it's very easy to do here, we might want to pull it in later to look at this data, and it lets us pull it into a visualization very easily.

Now that we have this table and have verified our data, let's actually use what Tableau shines at: creating visualizations, so we can better understand our data and draw conclusions from it more easily. Going back to our question, what types of cancer are most prevalent, it's going to be much easier to tell once we set up a graph. So let's clear our sheet, since we don't need this table, and start creating a bar graph. To do this, let's drag Measure Names to Columns and Measure Values to Rows, and you'll see Tableau has smartly created a bar graph for us. This is getting towards what we're looking for. Again, we have this Count of Cancer Diagnoses field here; we don't need that, so let's drag it off. And though it was cool that we could create this Total Cases calculated field, it isn't what we're trying to answer: we're looking at prevalence, so we're really just comparing the individual types. We can drag that off too.

All right, now we're getting there. One thing you always want to do in your Tableau graphs is make sure the scales and such are very clear. Right here we just have "Value" on our axis, so let's rename it to something clearer: Total Cases. Perfect. Now we have a graph, and you can see immediately that breast cancer is the highest, but I think we can improve it a bit more. If you've noticed, we haven't incorporated stage into this in any way, so let's play around with some ways of doing that. We can divide these bars up by stage: let's drag Stage Name to Color. Here we go: now each bar has separate sections for each stage, which is nice but not perfectly clear to read. We have our legend over here, but it's not perfectly clear either, so let's increase the clarity by dragging Stage Name to Label. Cool. Now that these are labeled, we can improve it even further. Say we want the labels to read "Stage 1", "Stage 2", "Stage 0", and so on, so it's immediately clear what they stand for. We can go into Label, edit the text, and type "Stage" and a space in front of the field. Cool.

So now we have our graph set up. It's colorful but looks a little busy with all these labels, and we have all this unused space over here, so let's spread the graph out to take up the entire width: up here we can select Entire View. There we go. The last thing we'll do to improve this is sort these bars so we can better visualize the prevalence of cancer. We can use this button up here, Sort Measure Names ascending by Measure Values, and there we go. It's very clear that the highest prevalence is breast cancer, then prostate second, then lung, colon, and rectum. So we've created a pretty interesting visualization: we can see the stage breakdowns, but most importantly we've answered our question, what types of cancer are most prevalent, very clearly. You can just look at this and know the answer.

Moving on to our second question: we're also interested in the individual stages, their prevalence, and knowing at what stage the diagnoses happen. You can kind of see on here that for breast cancer, for example, stages 1 and 2 are the vast majority. But looking at this graph for lung cancer, for example, you have to hover over the graph to differentiate between stages 1, 2, and 3. So if we really want to dive into the question of at what stage most diagnoses happen and visualize it, we can improve this graph. Let's do it in another worksheet, because we want to keep this one, since it answers one of our key questions. Let's go down here and duplicate our sheet. All right, now we have a duplicated sheet. We're going to answer the question, which I'll add here: at what stage do most diagnoses happen? Perfect. And let's get ahead of renaming our tab: rename it here to Cancer Stage. I forgot an S there; let me fix the spelling. Third time's the charm. Perfect.

Our goal here is to better visualize at what stage the diagnosis actually happens. To do that, let's remove Stage Name from the Marks card and add it to Columns, since the easiest way to visualize the stage name better is to just add it to Columns. Now the view is broken up on the columns as well: at the top we have each type of cancer, and the stages at which those get diagnosed are clear. So, getting back to our question, at what stage do most diagnoses happen: we can see stages 1 and 2 are generally the most diagnosed, while stage 0 is very common for breast cancer. But the standout here, which you'll easily notice, is that stage 4 is by far the most common stage at which lung cancer is diagnosed, whereas for other cancers that's far from the norm. Again, just by creating a simple visualization, we've been able to see an interesting trend in the data, and an interesting deviation from that trend in the case of lung cancer.

Thinking about why this might be: prostate cancer, colon cancer, and breast cancer are all regularly screened for in the United States (this is U.S. data). There are routine screenings; for example, breast cancer is one of the most screened-for cancers, and it seems to be caught very often at stage 0. For lung cancer, unfortunately, it seems these cases are not getting caught early; they're getting caught later as the disease progresses.

So we've successfully answered both of our questions here: what types of cancer are most prevalent, and at what stage most diagnoses happen. But it would be useful to bring this all together in a central place so we can keep track of our questions and the answers to them, and the best way to do that is to create a dashboard. So let's go ahead and do that. Here on this dashboard, you'll see the white area doesn't spread across our whole screen. We can fix that: rather than having this predefined size range, let's change it to Automatic. Now we can use our whole screen to drop our sheets in. For our central place for questions, let's drag both of our sheets onto the dashboard. Cool. So now we've answered our first two questions and created a dashboard, so we can go back and easily see our answers to both.

Before we move on to answering our third question, where we'll pull in our cost data set, I wanted to pause for a second and see if there are any immediate questions or issues. I'll give it a few minutes before we move on.

We just got some questions about the data source. Some people arrived late and didn't see where the data came from, so can you go back to the link to the data sets, so people can download the files, and show how you imported them into Tableau?

Sure. I think we can post the link again in the chat; the data is in this Google Drive I've set up for us. There are two Excel files, and you should be able to download each of them to your local machine and then load them into Tableau. I'll give everyone a second as you follow along. Once you have them downloaded, I'll switch back over to Tableau. Once you have it open, you'll be on this Data Source screen and you'll see this window here; you just click Microsoft Excel and then choose the file you want to load. We're working first with the breast cancer cost, excuse me, with the cancer diagnoses California 2004 to 2012 file.
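The Measure Names and Measure Values steps in this walkthrough work because Measure Values holds all of a data source's measures (its numeric fields), which is why the count field appeared alongside the cancer types and had to be dragged off. The reshaping involved is an unpivot from a wide table to long records. The Python sketch below is an analogy, not how Tableau stores data; the rows, field names, and `to_measure_rows` helper are hypothetical.

```python
# Hypothetical wide table shaped like the diagnosis spreadsheet: one row per
# stage, one numeric column per cancer type (placeholder numbers).
wide_rows = [
    {"stage_name": "0", "breast": 50, "lung": 1},
    {"stage_name": "4", "breast": 10, "lung": 90},
]


def to_measure_rows(rows, dimension):
    """Unpivot every numeric column into (dimension value, measure name,
    measure value) records: one field naming the measure and one holding
    its value, roughly like Measure Names / Measure Values."""
    records = []
    for row in rows:
        for name, value in row.items():
            if name != dimension and isinstance(value, (int, float)):
                records.append((row[dimension], name, value))
    return records


for record in to_measure_rows(wide_rows, "stage_name"):
    print(record)
```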
that's what we've that's what we've loaded so far we'll get into loading the second one in a second so um we'll be you'll be able to follow along for that as well but uh yeah this is the first one we've pulled in all right um we've got two more audience questions um if anyone else has any questions please do please do add them to the chat now um so there's one uh just a reminder on what the question mark stage means question mark said yes uh so you'll see that on here um it's interesting uh in our data there were uh basically uh looking at let's look at let's look at this so this is lung cancer there were 20 000 question mark cases right here so essentially what these are defined as is someone was diagnosed with lung cancer but the stage couldn't be precisely determined and they were not put into one of these categories so uh it's kind of uh you know it's a question mark we're not sure exactly what category to be best fit in um so you can draw some interesting conclusions from the the size of this as well uh for example looks like uh the ratio of question mark is much lower for colon cancer but higher for things like lung cancer so maybe it's harder to actually determine a stage for these types of cancer things like that all right and there's a couple of people asking uh could you show again how you renamed the uh in in the first graph how you rename the stage titles to include the word stage yep perfect of course yeah so we just wanted to include stage here just to make it a little clearer um maybe for someone who doesn't know the data as well as us or who are building this so uh what we did to get the stage titles onto These Bars was just drag stage name here to the label but that only gave us the question mark and it didn't give us the the first part so what we were able to do is again click on this label and then uh modify the text to whatever we want here so we could uh just go in here and I just typed the word stage here um so we can just even though I put 
tests here and click apply it adds it in um and you'll see this stage name in carrots is just the uh the um pulling in the actual value from from the field and this is our custom text we'll apply again there we go and then if we really want it to spice it up we could change things such as the font color things like that but uh we'll leave it as is for now nice okay um and so uh before we move on there's a couple questions around um color so um in the first graph you've got different colors for stages but then there's a sort of it's just blue in the second sheet so just why is that and can you replicate the color scheme from one to the other yeah uh so for this one um we set up this color scheme just to really uh make it make it pop like the definite the difference between the stages in these bars um here we're not really breaking it up as much but uh it it the blue does look a bit boring um so we could uh replicate a bit of color here and honestly let's let's do it and so we can just follow through and see what it could look like in this uh in this different form of graph um so what I would be able to do here is uh we're um so we're breaking it up in columns by uh the type of cancer and then the stage is here um so one thing we can do perhaps here is we want to splitting it up by the name so we can drag measure names to color and now we have um each bar we have all these like split up into different colors looks a little bit better it's easier to tell the difference between um where like flung to prostate for example uh and it does like spice up the graph a little makes it make it look a little bit nicer and uh gets rid of that monotonous blue so I think um as far as uh setting up different colors on this this looks pretty good um as always you can pretty much drag anything to color on here see what it looks like but uh I don't know what else we we do to get better information out here if we dragged the stage name from the tables into color that would give us the 
same color scheme as in the other plot uh let's let's try it yeah yeah so this has uh now given it um the same as the other plot let's just go back to the dashboard we can visualize um yeah and so we have stage two is red here stage three so this uh this actually does um provide like a a really cool illustration of how we switched up this graph um we've essentially rotated these bar graphs to be out uh out just like this um so yeah honestly let's keep it just like this I like the way this one looks all right great we'll have time for more audience questions at the end so perhaps we can move on to the second data set all right cool man thank you for the questions everybody all right so now it's time to move on to our second data set and our final two questions so um let's create our worksheet and then load our data source I'm going to create the sheet first and then we'll go back and load our source uh so the next question we're trying to answer is what is the cost of breasts cancer treatment depending on stage there so there we have our question and let's just rename our sheet to rest cancer costs so now we're going to load our breast cancer cost data set data source we'll go back to our data source tab over here um first before we do that I've noticed that this is not the the cleanest name for this one let's just rename you just double click in here I'm going to rename this to just the file name just so it's a little cleaner there we go that shouldn't affect anything in our sheets it's just renaming the the data set data source so it's a little clearer what we'll do to add our next data source is this um Source drop down here we'll click here click on new data source and again we're going to be loading from Microsoft Excel so uh let's load from Excel the cost set this one here breast cancer cost by stage click open and wait for Tableau to pull our data in there we go uh again this looks like sorry our Excel we have broken up by stage you'll notice there is no 
question mark set on this cost one um that just was not available in the data for this so unfortunately we don't have a question mark here but we still do have it broken up by zero one two three and four um perfect so now we have uh this data set loaded let's go back to our worksheet and we'll see at the top here we have our two different data sets to choose from we can go from this is the one we were working with already which was the number of cases and now we have our cost and we have all of our values pulled in here so uh um our goal here is to answer what is the cost of breast cancer treatment depending on stage um foreign and what we want to do is see separate sections for 0 to 12 months to 12 to 24 months so what I think we can do is uh we're going to build a graph very similar to this one here except we want to have two columns or 0 to 12 months and then one for 12 to 24 months and since we're building the same one let's I'm going to turn you loose and let you let you take a stab at it so we'll I'll give you five minutes to try and just create a quick version of this and then we'll go through and build it together uh so go ahead and take a stab at it and then we'll come back together there's a timer just so everyone's clearing this can you just repeat what you expect for people to draw in the plot yeah so uh while that's going um our goal is to build a bar graph with two um uh to answer this question here what is the cost of breast cancer treatment on depending on stage and what we want to see is uh 0 to 12 months and 12 to 24.4 months in columns and then the cost by stage on the side in a similar way to what we have um what we have here where we have total cases but instead of total cases we're going to be tracking cost and we're going to track stage just just like this uh 12 to 24 10 0 to 12 months and then 12 to 24 months would be how we hope it to come out all right wonderful foreign just while the time is taken down there's a question about uh can you 
show the process again of adding another data source uh yeah of course I can do that real quick so uh step one we'll be going to this data source tab and once we're in here uh you can see you can select between your data sources you've loaded up here and view them uh but for a new data source you're going to click right here click new data source here this plus icon and then choose your file and in this case we're using our cost data set so this second one here this is the second one we loaded and I think we'll time for one more question while the other time is taken um was how do you decide which of the um illnesses were showing in those first graphs like can you filter out so only some of them are shown yeah of course uh so we would be able to do a filter we were just uh including all of them uh but say we wanted to create a filter um we don't want to consider colon cancer for whatever reason easiest way to do that is down here you can click on it and just click exclude and now we don't have there what you see is this in this filters tab it's added a filter here you can click on here edit filter and and view that we've pulled out colon here so let's just add it back here and this method and there you go so uh if we really wanted to get fancy we could do some Dynamic filtering on the side here or you can right click and uh hit show filter and then we'll be able to just select dynamically on here and you can even pull this into the dashboard if you wanted um but we're uh We're Not Gonna do that right now all right uh how are we doing with that time are we about ready to move on made yep uh so hopefully everyone uh made some good progress on this let's just go over it real quickly how I would make it uh so we're going to drag measure names it to the columns and then we also wanted uh oops I'm on my wrong data set that's the key it's getting the right data set so we're going to drag measure names to columns and then disease stage as well as the columns because we 
wanted that uh by in the columns and then let's pull in measure values to rows there we go um this is what we're going for uh and let's remove this count column we don't need that um so so when you use measure values does that do is that all the numeric columns is that putting in yeah correct that is correct uh when I'm referring to measure values those would be these uh continuous or these measures here um use uh these values here um and they're all the number values so that's just an easy way of uh pulling them to this section here you can take it out like that and only have one um and then uh you can add it back just add it back like this there we go uh so while I I'm going to rename a few of these access and stuff just to make our graph look good but I had a question for everyone uh I'm noticing an interesting Trend here um where the cost is increasing for the first year by stage but for stage four here uh it this looks like an exception I'm wondering if anyone has any uh theories on why that could be or what can we uh potentially draw from this data I'm just calling this total cost here and there we go do you want to talk us through what your theories are for why costs go down that's the second year with stage four yeah um so this is a bit of a an unfortunate Trend we we can pull from here um and uh is is that cost in stage four is really decreasing and not falling a trend and potentially that can be um If you're diagnosed in stage four your prognosis is really not as good so uh it's unfortunate but um really uh like illustrates how important it is to improve methods for screening to catch cancer early and to not uh like hopefully eliminate these type of diagnoses where you're actually being diagnosed in stage four um and then last I'm going to move this to entire view just so it's a little bit uh it looks a little better um and then we'll we'll add this to our dashboard as well later once we've answered our final question so let's uh let's move on to our last 
question. Let's create a new sheet and add our question here: what was the cost by stage? I'm going to rename the sheet to Total Cost for California. What we want to do is combine our two data sets. At the beginning we noticed that both are broken up by stage, so we want to take advantage of that to calculate total cost. Let's start out by creating a table. First, let's go to our cost data source, this one here, and drag Disease Stage to Columns. Now we want to pull in our count sheet, but one thing we need to keep in mind is that our cost data is only for breast cancer, so if we're trying to calculate total cost, we can really only look at breast cancer here. Let's drag the breast cancer field to Rows, and we'll see this pop-up: in order to use fields from both data sources, a relationship needs to be created with the breast cancer costs by stage source, and it tells us to select Data > Edit Blend Relationships to open the Blend Relationships dialog box. So we need to link our data together, and the best way to do that is described in the pop-up. We'll go to Data > Edit Blend Relationships, and we'll see our data sources here. I'll just stay on this one, click Custom, then Add, and a mapping from Disease Stage to Stage Name appears automatically. You'll notice the names are slightly different, but Tableau still matches them together. Let's click OK, and immediately the view changes to show that the data is linked; we can see our link icon right here. Cool.

Our goal is to answer what the cost was in California, but let's pull it together in a table first to see what we're working with. Let's click Show Me and then the table. This isn't quite what we want, so let's go to the cost-by-stage source, take Measure Names, and move it to Rows.
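The linking step above can be sketched in plain Python to show roughly what a blend does. This is a minimal sketch under stated assumptions, not Tableau's implementation: the field names (`Disease Stage`, `Stage Name`, `Breast Cases`, `Cost 0-12 Months`) and all numbers are hypothetical stand-ins for the two Excel files. It assumes the cost source acts as the primary source, since its field (Disease Stage) was placed in the view first, and that the secondary measure is summed per linking value before being attached to matching primary rows.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two Excel sources (made-up numbers).
# Primary source: cost per case by stage, keyed on "Disease Stage".
cost_rows = [
    {"Disease Stage": "0", "Cost 0-12 Months": 60},
    {"Disease Stage": "1", "Cost 0-12 Months": 70},
    {"Disease Stage": "4", "Cost 0-12 Months": 90},
]
# Secondary source: case counts keyed on "Stage Name"; a stage may span several rows.
count_rows = [
    {"Stage Name": "0", "Breast Cases": 10},
    {"Stage Name": "1", "Breast Cases": 25},
    {"Stage Name": "1", "Breast Cases": 5},
    {"Stage Name": "?", "Breast Cases": 3},
]


def blend(primary, secondary, primary_key, secondary_key, measure):
    """Attach a summed secondary measure to each primary row via a custom key mapping.

    Sketch of the blend idea: aggregate the secondary source per linking value,
    then left-join onto the primary rows. Primary rows with no match get None,
    and values that exist only in the secondary source (here "?") never appear.
    """
    totals = defaultdict(int)
    for row in secondary:
        totals[row[secondary_key]] += row[measure]
    # dict.get on a defaultdict does not insert defaults, so misses stay None.
    return [{**row, measure: totals.get(row[primary_key])} for row in primary]


view = blend(cost_rows, count_rows, "Disease Stage", "Stage Name", "Breast Cases")
for row in view:
    print(row)
```

In this made-up data, stage 4 comes back as `None` because the secondary rows have no stage-4 entry, and the "?" count is dropped because the primary source has no "?" stage, which is the same situation as the cost file having no question-mark stage.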
We don't want Disease Stage in Rows anymore, so let's pull that out. One second: let's go to the cancer diagnosis source, drag Stage Name to Columns, and now we can remove Disease Stage. There we go. Then we go back to our cost-by-stage source and drag Measure Values to Text, and there we go: we have a table representing all of this combined. Let's drag out this count.

This is just to help visualize what exactly we've done to get this all together: here are the costs and the rates. But this isn't exactly useful to us yet, so we're going to create a calculated field to get the total cost. We're already in the breast cancer cost data source, so let's create a calculated field here that combines our two data sources. Let's call it Total Cost for CA. Let's drag the number of breast cancer cases in; you'll see it pulls the field in and indicates that it comes from the other data source. Since we're calculating total cost, we multiply this number of cases by our cost, just for 0 to 12 months here, and we need to take the sum of this. All right, the calculation is valid, so we'll add that in.

Now that we have a calculated field for total cost, let's visualize it. We have Stage Name on Columns, so let's drag our Total Cost field to Rows, remove this, and take Measure Values out, since it adds labels we don't want. Now on the left we have the total cost for each individual stage: roughly 13 billion for stage one and two, 3 billion for stage three, and 1.3 for stage four. Let's drag this onto Label so it's easy to see without hovering, and switch to Entire View. So now we have the total cost in California for each individual stage. There we go. So finally, let's just add these
last two questions to our dashboard. They're easy to drag in; let's put them here, along with Total Cost for California. Now we have a central place to answer all of our questions: we can see the total cost by stage, the cost depending on stage, what types of cancer are most prevalent, and at what stage most diagnoses happen. That's all the visualizations I had prepared. Hopefully they do a decent job of answering our questions. Obviously we'd go in and color-code these and improve them, but we're running out of time, so let's move on to the final question and answer.

All right, brilliant, thank you very much, Connor. I have to say, the finding that most cancers get diagnosed at stage one or two, but lung cancer gets diagnosed much later, is really interesting. It really shows the effect of screening programs for some types of cancer but not others. All right, we have a few more questions from the audience. One of them is: can you go over how you linked the two data sets again? That was kind of the crux of those last plots.

Yep. Linking data sets is one of the harder things we're doing here, so I'm happy to go over it. To prompt the linking of the two data sources, I dragged a field from each source onto the view, and Tableau gave me a message: it can't really work with this data unless I tell it how those data sets are linked. We know from examining the data at the beginning that they could be linked based on stage, since both have similar stage values. So we went up into the Data menu and created a blend relationship: we went to Edit Blend Relationships and created a custom relationship linking Disease Stage to Stage Name. I did that by clicking Add on this screen, and Tableau is smart enough to recognize that the field names were similar, so we're able to
just select from the list, pick that, and link them together here. After linking, we had to do some work with the data to create a calculated field for our answer, because just adding cost to Rows doesn't make a lot of sense for our question; we needed a calculated field.

Okay, wonderful. There's one question about using your own color scheme: how do you change the palette?

Yeah, let's take a look. Let's go back to our first one, which has a lot of color. You can change the palette through Color on your Marks card: find your Marks card, go into Color, then Edit Colors, and you can select from a ton of different palettes in here. You can get really fancy with it and change things like opacity, and even add borders, so you can really make it look good if you have some more time to play around with those.

All right, wonderful. Well, we're at time now, so thank you once again, Connor; that was a really nice presentation, and it was really interesting to get to see a whole dashboard made in a pretty short amount of time. Thank you also to Eddie for moderating, thank you to everyone who asked a question, and thank you to everyone who showed up. We have more Tableau training coming next Tuesday, so make sure you register for that, and there's another session, I believe next Wednesday, on how to become a business analyst. If you're interested in dashboard creation and that sort of thing as a career, then please do register for that one as well; that's on datacamp.com webinars. I will see you all in future sessions. Have a nice day, and goodbye.

I'm excited to walk you through some Tableau visualizations today. As Richie said, I'm a Salesforce developer here at DataCamp, and I've been here a couple of years. For those who don't know, Salesforce is one of the key systems that tracks our business-to-business data, so moving to Tableau and working on these visualizations is definitely close to some of the stuff I work with. I believe we're going to shut off cameras to save bandwidth, but nice to meet you all, and let's dive into it.

For our agenda today, we're going to be looking at a couple of data sets on cancer cases: when people are diagnosed, what stage they are diagnosed at, and the cost of treatment by stage. The idea behind this is to find trends that could inform public policy: policy that might help with prevention, lower costs, guide when and how to screen for cancers, and generally improve the whole health care system behind cancer treatment and diagnosis. That's the overarching motivation, but we'll define a few key questions that we can use Tableau to help us answer. First I'll share a little more info on the data, then we'll define these key questions, and then we'll jump into Tableau. Hopefully everyone's been able to download the data and Tableau; we'll build our visualizations and then move on to our question and answer session.

So first, let's take a look at the data we're going to be working with. Before we look at the actual sheets, I want to cover a few key definitions that are essential for this data. Both of our files have cancer broken up by stage. When cancer is diagnosed, a stage is determined between 0 and 4, with 4 being the worst
(worst meaning the most progressed disease) and 0 being the least progressed. Stage zero is defined as abnormal cells that are present but have not spread into nearby tissue; in some cases they may be a precursor to cancer, but they are not an actual cancer. Stages one, two, and three share a definition: cancer is present, and its severity is graded as one, two, or three based on factors such as the size of the tumor and its spread into nearby tissue. Finally, stage four is where the cancer has spread to distant parts of the body. You'll also see a stage "?" in our data; these are diagnoses with an unknown stage. We may not be able to gather a ton of information from those, but it does happen that the stage is not known or not determined for some diagnoses.

All right, moving on. We will look to answer some key questions about this data in Tableau, and I've written them out here. First, we can try to figure out what types of cancer are most prevalent. Then we can look at what stage most diagnoses happen at. Then we'll pull in our cost data and try to determine the cost of breast cancer treatment depending on stage. And finally, we'll pull it all together, combining the count of cancer diagnoses in California with the cost per treatment, to answer this last question: what was the total cost for breast cancer treatment in California from 2004 to 2012?
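The last question comes down to a small piece of arithmetic: for each stage, multiply the number of diagnoses by the cost per case, then add up the stages. A minimal Python sketch follows. The stage labels and every number are made up for illustration (they are not the webinar's data), and it assumes a single cost figure per stage, such as the 0 to 12 month cost the build uses later.

```python
# Made-up inputs for illustration only; not the webinar's data.
cases_by_stage = {"0": 100, "1": 200, "2": 150, "3": 50, "4": 20, "?": 30}
cost_per_case = {"0": 10, "1": 20, "2": 30, "3": 40, "4": 50}  # e.g. thousands of dollars


def total_cost_by_stage(cases, costs):
    """Return {stage: cases * cost per case} for stages present in both inputs."""
    return {stage: n * costs[stage] for stage, n in cases.items() if stage in costs}


per_stage = total_cost_by_stage(cases_by_stage, cost_per_case)
grand_total = sum(per_stage.values())

# Multiplying the overall sums instead gives a different, meaningless number.
naive_total = sum(cases_by_stage.values()) * sum(cost_per_case.values())

print(per_stage)
print(grand_total, naive_total)
```

The "?" stage drops out because it has no matching cost, and the sum of per-stage products differs from the product of the sums. That is why the cost has to be multiplied stage by stage rather than on the grand totals.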
And I'll cover why we're working with California for those dates. Moving on, let's just take a look at our data in Excel; I have it open here. We have two sources. The first is these counts by stage: you see we have five different types of cancer (breast, colon, rectum, lung, and prostate), broken up by stage, so, for example, we have 8,200 instances of colon cancer diagnosed at stage zero and one. One thing I want to note about this particular data set is that it's only for California, from the years 2004 to 2012. That's why, at the very end, we're going to determine the cost for all of these by stage for California. Moving over to our second source, it's again broken up by stage, but here we're looking at cost, and I'd like to highlight that we only have this data for breast cancer. So for breast cancer at stage zero, the cost per case is about sixty thousand dollars for 0 to 12 months and around eighteen thousand dollars for 12 to 24 months. These are the two data sets we're going to be pulling in and visualizing in Tableau.

All right, hopefully everyone has Tableau Public downloaded and installed, so let's get it up and running and dive in. Let's open up Tableau. Now that we have Tableau open, let's load our first data set. The first file we're going to look at is the count of our cancer diagnoses; we'll get to the cost later. Since we're working with Excel files today, let's just choose Microsoft Excel, select this file, and click Open. We see Tableau has pulled in our data for us and is giving us a nice preview, broken up into the rows and columns we expect. Everything looks good: the column names and row names are as expected, so we've successfully loaded our data. Let's pull
it into our worksheet, so we can just hop over to our worksheet here. To make it easier to keep track of our questions and tie everything back together: as a reminder, our first question is what types of cancer are most prevalent, so we can title our sheet with that. Next, since we're going to be answering multiple questions, we're going to have multiple worksheets open, so let's get ahead of it and rename our worksheet tab down here at the bottom to Cancer Prevalence. Perfect.

Now that we've named our sheet, we could jump straight into the visualization, but let's first verify that our data looks good here in our worksheet. We're going to start with a simple table. To do that, I'm going to drag Measure Names to the Columns shelf, Stage Name to Rows, and Measure Values to Text. Now we have essentially what we had in Excel. When we loaded our data source we saw a preview of this table; this just verifies that we can in fact pull our data onto the worksheet and that everything's working right. You'll see this count of cancer diagnoses field here.

I'm just getting some questions from the audience: is it possible to maximize Tableau Public and increase the text size, for people's eyesight?

Yes, I can maximize it. Let me try Command+Plus to increase the size of the text. That didn't work, unfortunately. Sorry, bear with me one second; I'll see if I can increase the size of the text.

I think it has to be done in macOS System Settings rather than in Tableau itself.

Oh yeah, I can actually change my resolution. Give me one second, everybody; I'll just update my screen resolution. I think I'm on a high-res monitor, which is probably creating some issues.

Just to jump in, I think people can also use the zoom function at the top of their GoToWebinar
preview screen and zoom in on your screen, so that's also an option. Does that work better for anybody?

Yeah, it should be fine, actually. Sorry, let's continue.

Okay, perfect. Hopefully changing the resolution on the screen helps. All right.

So far we've basically just loaded exactly what we had in Excel, but we want to do more than that. Say we're interested in the total number of cases across all types of cancer, not just broken up into breast, colon, and so on. For that, we can create a calculated field. I'm going to click this caret right here, create a calculated field, and call it Total Cases. This is easy enough: we just want the sum of all of these columns together, so we can drag them in. Is the calculation valid? It is, so we can save our field. Now we have this Total Cases field, we can see it's added to our table, and for each stage we can see the total cases.

So it's pretty easy to do a quick calculation like that in Tableau. Again, this is something we could have done in Excel, but it's very easy to do here, we might want to pull it in later, and it lets us pull the result into a visualization very easily. Now that we've verified our data with this table, let's use what Tableau shines at, creating visualizations, so we can better understand our data and draw conclusions from it more easily. Going back to our question, it's going to be much easier to tell what types of cancer are most prevalent once we set up a graph. So let's clear our sheet, since we don't need this table, and start creating a bar graph. To do this, let's drag Measure Names to Columns and Measure Values over here to Rows, and you'll see Tableau has smartly created a bar graph just for
us, and this is getting toward what we're looking for. Again, we have this count of cancer diagnoses field here; let's just drag that off, since we don't need it. And though it was cool that we could create this Total Cases calculated field, it isn't what we're trying to answer here. We're looking at prevalence, so we're really just comparing the individual types, and we can drag that off too. All right, now we're getting there.

One thing you always want to do in your Tableau graphs is make it very clear what all the scales and such are. Right here we just have "Value" on our axis, so let's rename it to something clearer: let's call it Total Cases. Perfect. Now we have a graph, and you can see immediately that breast cancer is the highest, but I think we can improve it a little more.

If you've noticed, we haven't incorporated stage in any way, so let's play around with some ways of doing that. We can divide up these bars by stage: let's drag Stage Name to Color, and here we go. Now we have separate sections for each stage name, and the totals are broken up by stage, which is nice but not perfectly clear to read. We have our legend over here, but it's not perfectly clear, so let's increase the clarity by dragging Stage Name to Label. Cool. Now that these are labeled, we can improve it even further. Say we want these to read stage one and two, stage zero, stage three, stage four, just so it's immediately clear what they stand for. We can go into Label, edit the text, and type "Stage" followed by a space there. Cool.

So now we have our graph set up. It's colorful, but it looks a little busy with all these labels, and we have all of this unused space over here, so let's spread out our graph and make it take up the entire width. Up here we can select Entire
View, and there we go. The last thing we're going to do to improve this is sort these bars so we can better visualize the prevalence of cancer. We can just use this button up here, which sorts Measure Names ascending by Measure Values, and there we go. It's now very clear that breast cancer has the highest prevalence, then prostate second, then lung, colon, and rectum. So we've created a pretty interesting visualization: we can see the stage breakdowns, but most importantly we've answered our question of what types of cancer are most prevalent very clearly. You can just look at this and know the answer.

Moving on toward our second question, we're also interested in the individual stages: their prevalence, and at what stage the diagnoses happen. You can kind of see here that for breast cancer, for example, stage one and two are the vast majority. But looking at just this graph, for lung cancer, for example, you have to hover over the graph to try to differentiate between one, two, and three. So if we really want to dive into the question of what stage most diagnoses happen at and visualize it, we can improve this graph. Let's do it in another worksheet, because we want to keep this one: it answers one of our key questions. Let's go down here and duplicate our sheet. All right, now we have a duplicated sheet, and I'll add the question we're answering here: what stage do most diagnoses happen at? Perfect. Then let's get ahead of things and rename our tab to Cancer Stage; I forgot an S there, so let me just fix the spelling. Cool.

Our goal here is to better visualize at what stage the diagnosis actually happens. To do that, let's remove Stage Name from our Marks card and add it to Columns. We want to better visualize
the stage name, and the easiest way to do that will be to just add it to Columns here. Now the view is broken up on Columns as well: at the top we have each type of cancer, and the stages at which each gets diagnosed are clear. To get back to our question of what stage most diagnoses happen at, we can see that stage one and two is generally the most diagnosed, while stage zero is very common for breast cancer. But the standout here, which we can easily notice, is that stage four is by far the most common stage at which lung cancer is diagnosed, whereas for other cancers that's far from the norm. Again, just by creating a simple visualization we've been able to see an interesting trend in the data, and an interesting deviation from that trend in the case of lung cancer.

Thinking about why this might be: prostate cancer, colon cancer, breast cancer, all of these are regularly screened for in the United States (this is U.S. data), and there are routine screenings. Breast cancer, for example, is one of the most screened-for cancers, and it seems to be very often caught at stage zero, whereas lung cancer, unfortunately, seems not to be caught early on; it's caught later, as the disease progresses. So we've successfully answered both of our questions here: what types of cancer are most prevalent, and at what stage most diagnoses happen.

It would be useful to bring all of this together in a central place so we can keep track of our questions and the answers to them, and the best way to do that is to create a dashboard. So let's go ahead and do that. On this dashboard you'll see the white area doesn't spread across our whole screen. We can fix that: rather than having this predefined range, let's change it
to Automatic. Now we can use our whole screen to drop our sheets in as our central place for questions. Let's drag both of our sheets onto our dashboard. Cool. So now we've answered our first two questions and created a dashboard, so we can go back and easily see our answers to both. There we go.

Before we move on to answering our third question, where we'll pull in our cost data set, I wanted to pause for a second and see if there are any immediate questions or issues. I'll give it a few minutes before we move on.

We just got some questions about the data source. Some people arrived late and didn't see where the data came from, so can you go back to the link to the data sets, so they can download the files, and show how you imported them into Tableau?

Sure. I think we can post the link again in the chat. The data is in this Google Drive I've set up for us: there are these two Excel files, and you should be able to download each of them to your local machine and then load them into Tableau. I'll give everyone a second to download them as you follow along. Once we have them downloaded, I'll switch back over to Tableau. Once you have it open, you'll be on this Data Source screen, you'll see this window here, and you can just click Microsoft Excel and then choose the file you're loading. We're working first with the cancer diagnoses California 2004 to 2012 file.
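With the first file loaded, it may help to see that the first two questions, and the Total Cases calculated field built earlier, reduce to simple aggregations over the diagnosis table. A minimal Python sketch, assuming the table is shaped like the Excel sheet (one row per stage, one column per cancer type); the three cancer types shown and all counts are made up for illustration and are not the webinar's data.

```python
# Made-up counts shaped like the diagnosis sheet: stage -> {cancer type: cases}.
counts = {
    "0": {"Breast": 40, "Lung": 1, "Prostate": 2},
    "1": {"Breast": 90, "Lung": 10, "Prostate": 90},
    "2": {"Breast": 60, "Lung": 15, "Prostate": 30},
    "3": {"Breast": 20, "Lung": 25, "Prostate": 5},
    "4": {"Breast": 10, "Lung": 80, "Prostate": 8},
}


def total_cases(table):
    """Like the Total Cases calculated field: sum across cancer types for each stage."""
    return {stage: sum(row.values()) for stage, row in table.items()}


def prevalence(table):
    """Question 1: total cases per cancer type, largest first."""
    totals = {}
    for row in table.values():
        for cancer, n in row.items():
            totals[cancer] = totals.get(cancer, 0) + n
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def most_common_stage(table):
    """Question 2: for each cancer type, the stage with the most diagnoses."""
    # Assumes every stage row lists the same cancer types.
    cancers = next(iter(table.values())).keys()
    return {c: max(table, key=lambda stage: table[stage][c]) for c in cancers}


print(total_cases(counts))
print(prevalence(counts))
print(most_common_stage(counts))
```

The made-up lung column is deliberately skewed toward stage 4 to mirror the pattern discussed above; with real data, the same functions would simply read from the loaded sheet instead of a hard-coded dictionary.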
That's what we've loaded so far. We'll get into loading the second one in a moment, and you'll be able to follow along for that as well, but this is the first one we've pulled in.

All right, we've got two more audience questions. If anyone else has questions, please add them to the chat now. The first is just a reminder of what the question mark stage means.

Yes, you'll see that on here, and it's interesting. Let's look at this: this is lung cancer, and there were 20,000 question mark cases right here. Essentially, these are cases where someone was diagnosed with lung cancer but the stage couldn't be precisely determined, so they were not put into one of these categories. It's a question mark: we're not sure exactly which category they'd best fit in. You can draw some interesting conclusions from the size of this as well. For example, it looks like the share of question mark cases is much lower for colon cancer but higher for things like lung cancer, so maybe it's harder to determine a stage for those types of cancer.

All right, and a couple of people are asking: could you show again how, in the first graph, you renamed the stage titles to include the word "stage"?

Yep, of course. We just wanted to include "stage" here to make it a little clearer, maybe for someone who doesn't know the data as well as we do. To get the stage titles onto these bars, we first just dragged Stage Name to Label, but that only gave us the stage value itself, like the question mark, and not the first part. So we clicked on Label again and modified the text to whatever we want; I just typed the word "stage" here. For example, if I put
"test" here and click Apply, it adds it in. You'll see this Stage Name in angle brackets just pulls in the actual value from the field, and this is our custom text. We'll apply again, and there we go. If we really wanted to spice it up, we could change things such as the font color, but we'll leave it as is for now.

Nice. Okay, before we move on, there are a couple of questions around color. In the first graph you've got different colors for the stages, but it's just blue in the second sheet. Why is that, and can you replicate the color scheme from one to the other?

Yeah. For the first one, we set up this color scheme to really make the difference between the stages in these bars pop. Here we're not really breaking it up as much, but the blue does look a bit boring, so we could replicate a bit of color here. Honestly, let's do it, so we can follow through and see what it could look like in this different form of graph. Here we're breaking it up in Columns by the type of cancer and then by stage, so one thing we could do is split it up by name: we can drag Measure Names to Color. Now each bar is split up into different colors. It looks a little better, it's easier to tell the difference between, say, lung and prostate, and it spices up the graph a little and gets rid of that monotonous blue. So as far as setting up different colors goes, I think this looks pretty good. As always, you can drag pretty much anything to Color and see what it looks like, but I don't know what else we'd do to get better information out of it.

If we dragged Stage Name from the Data pane onto Color, that would give us the
same color scheme as in the other plot?

Let's try it. Yeah, this has now given it the same scheme as the other plot. Let's go back to the dashboard so we can visualize it. So we have stage 2 in red here, stage 3, and so on. This actually provides a really cool illustration of how we switched up this graph: we've essentially rotated these bar graphs out, just like this. Honestly, let's keep it like this; I like the way this one looks.

All right, great. We'll have time for more audience questions at the end, so perhaps we can move on to the second data set.

All right, cool. Thank you for the questions, everybody. Now it's time to move on to our second data set and our final two questions. Let's create our worksheet and then load our data source; I'm going to create the sheet first and then go back and load the source. The next question we're trying to answer is: what is the cost of breast cancer treatment depending on stage? There we have our question, and let's rename our sheet to "Breast Cancer Costs". Now we're going to load our breast cancer cost data source, so we'll go back to the Data Source tab over here. Before we do that, I've noticed that this is not the cleanest name for this one, so let's rename it: just double-click in here. I'm going to rename it to just the file name so it's a little cleaner. There we go. That shouldn't affect anything in our sheets; it's just renaming the data source so it's a little clearer. To add our next data source, we'll click this data source drop-down here, click New Data Source, and again we're going to be loading from Microsoft Excel. Let's load the cost set, this one here, "breast cancer cost by stage", click Open, and wait for Tableau to pull our data in. There we go. Again, like our Excel sheet, it's broken up by stage. You'll notice there is no
question mark stage in this cost data; that just wasn't available for this data set. Unfortunately, we don't have a question mark here, but we do still have it broken up by stages 0, 1, 2, 3, and 4. Perfect. Now that we have this data set loaded, let's go back to our worksheet. At the top here we have our two data sources to choose from: this is the one we were already working with, which has the number of cases, and now we have our cost source with all of its values pulled in. Our goal here is to answer what the cost of breast cancer treatment is depending on stage, and we want to see separate sections for 0 to 12 months and 12 to 24 months. So we're going to build a graph very similar to this one here, except with two columns: one for 0 to 12 months and one for 12 to 24 months. Since we're building something similar, I'm going to turn you loose and let you take a stab at it. I'll give you five minutes to create a quick version of this, and then we'll go through and build it together. So go ahead, and then we'll come back together. There's a timer.

Just so everyone's clear on this, can you repeat what you expect people to draw in the plot?

Yeah. While that's going: our goal is to build a bar graph to answer this question, what is the cost of breast cancer treatment depending on stage? We want to see 0 to 12 months and 12 to 24 months in columns, and then the cost by stage, in a similar way to what we have here with total cases. Instead of total cases we're going to be tracking cost, and we're going to track stage just like this: 0 to 12 months and then 12 to 24 months is how we hope it comes out.

All right, wonderful. While the timer is counting down, there's a question: can you
show the process again of adding another data source?

Yeah, of course, I can do that real quick. Step one, we go to this Data Source tab. Once we're in here, you can select between the data sources you've loaded up here and view them. For a new data source, you click right here on New Data Source (this plus icon) and then choose your file. In this case we're using our cost data set, so it's this second one here; this is the second one we loaded.

I think we have time for one more question while the timer runs. The question was: how do you decide which of the illnesses are shown in those first graphs? Can you filter so only some of them are shown?

Yes, of course, we can use a filter. We were just including all of them, but say we wanted to create a filter because we don't want to consider colon cancer for whatever reason. The easiest way to do that is down here: you can click on it and just click Exclude, and now we don't have it there. What you'll see is that it has added a filter in this Filters shelf. You can click on it, choose Edit Filter, and see that we've pulled out colon here. Let's add it back using this method, and there you go. If we really wanted to get fancy, we could do some dynamic filtering on the side here: you can right-click and hit Show Filter, and then we'll be able to select dynamically on here. You can even pull this into the dashboard if you wanted, but we're not going to do that right now.

All right, how are we doing with that time? Are we about ready to move on?

Yep. Hopefully everyone made some good progress on this. Let's go over quickly how I would make it. We're going to drag Measure Names to Columns, and then we also wanted... oops, I'm on the wrong data source. That's the key: getting the right data source. So we're going to drag Measure Names to Columns, and then Disease Stage to Columns as well, because we
wanted it broken up by stage in the columns. Then let's pull Measure Values to Rows. There we go; this is what we're going for. Let's remove this count column; we don't need that.

So when you use Measure Values, is that putting in all the numeric columns?

Yes, that's correct. Measure Values refers to these continuous fields, the measures here, and they're all the numeric values. So it's an easy way of pulling them all into this section. You can take one out like that and only have one, and then you can add it back like this. There we go.

While I rename a few of these axes to make our graph look good, I have a question for everyone. I'm noticing an interesting trend here: the cost is increasing by stage for the first year, but stage 4 here looks like an exception. I'm wondering if anyone has any theories on why that could be, or what we can potentially draw from this data. I'm just calling this "Total Cost" here, and there we go.

Do you want to talk us through your theories on why costs go down in the second year for stage 4?

Yeah. This is an unfortunate trend we can pull from here: cost in stage 4 is really decreasing and not following the trend, and potentially that's because if you're diagnosed at stage 4, your prognosis is really not as good. It's unfortunate, but it really illustrates how important it is to improve screening methods to catch cancer early and, hopefully, to eliminate these types of diagnoses where someone is first diagnosed at stage 4. Lastly, I'm going to move this to Entire View so it looks a little better, and we'll add this to our dashboard later, once we've answered our final question. So let's move on to our last
question. Let's create a new sheet and write our question here: what was the cost by stage? I'm going to rename the sheet to "Total Cost for California". What we want to do is combine our two data sets. At the beginning we noticed they were both broken up by stage, so we want to take advantage of that to calculate the total cost. Let's start out by creating a table. First, let's go to our cost data source, this one here, and drag Disease Stage to Columns. Now we want to pull in our count sheet. One thing we need to keep in mind is that our cost data is only for breast cancer, so we can really only look at breast cancer here if we're trying to calculate total cost. Let's drag Breast Cancer to Rows, and we'll see this pop up: in order to use fields from both data sources, a relationship needs to be created with the breast cancer costs by stage source. So we're just going to follow the instructions here: select Data > Edit Blend Relationships to open the Blend Relationships dialog box. We need to link our data together, and the best way to do that is described in the pop-up. We'll go to Data > Edit Blend Relationships, and we'll see our data sources here. I'll stay on this one, click Custom, then Add, and automatically there will be a mapping from Disease Stage to Stage Name. You'll notice the names are slightly different, but Tableau will still match them together. Let's click OK, and immediately our graph has changed to show that the data is linked; we'll see our link icon right here. Cool.

Our goal is to answer what the cost in California was, but let's pull it together in a table first to get a visualization. Let's click Show Me and then the table. This isn't quite what we want, so let's go to the cost-by-stage source. We're going to take Measure Names and move it to Rows.
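Conceptually, the blend set up above behaves roughly like a left join. The first data source used in the view acts as the primary source. Fields from the secondary source are aggregated at the level of the linking field (here, stage) before being matched to it. The following is a minimal Python sketch of that idea, not how Tableau is implemented. The field names echo the ones mentioned in the session ("Disease Stage", "Stage Name", "Breast", a 0 to 12 month cost column), but the exact column names, all the numbers, and the `blend` helper are hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL stand-ins for the two Excel sheets; every number here is invented.
# Primary source (the cost sheet): one row per stage, keyed by "Disease Stage".
cost_by_stage = [
    {"Disease Stage": "0", "Cost 0-12 months": 100},
    {"Disease Stage": "1", "Cost 0-12 months": 200},
]

# Secondary source (the count sheet): keyed by "Stage Name". Stage "1" appears
# twice to show aggregation; "?" has no matching row in the cost sheet.
case_rows = [
    {"Stage Name": "0", "Breast": 10},
    {"Stage Name": "1", "Breast": 5},
    {"Stage Name": "1", "Breast": 7},
    {"Stage Name": "?", "Breast": 3},
]


def blend(primary, secondary, primary_key, secondary_key, measure):
    """Sum `measure` over the secondary rows per linking value, then attach the
    total to each primary row. Secondary values with no primary match are
    dropped; primary rows with no match get None (left-join style)."""
    totals = defaultdict(int)
    for row in secondary:
        totals[row[secondary_key]] += row[measure]
    blended = []
    for row in primary:
        merged = dict(row)
        # .get() on a defaultdict does not create missing keys, so an
        # unmatched stage comes back as None instead of 0.
        merged[measure] = totals.get(row[primary_key])
        blended.append(merged)
    return blended


for row in blend(cost_by_stage, case_rows, "Disease Stage", "Stage Name", "Breast"):
    print(row)
```

With these made-up rows, the loop prints two rows with Breast totals of 10 and 12. The "?" cases vanish because the cost sheet has no question-mark stage. Multiplying each row's case count by its cost column gives the per-stage figure that a total cost calculated field works out.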
We don't want Disease Stage in Rows anymore, so let's pull that out. One second. Let's go to the cancer diagnoses source, drag Stage Name to Columns, and now we can remove Disease Stage. There we go. Then we go back to our cost-by-stage source and drag Measure Values to Text, and there we go: we have a table representing all of this combined. Let's drag out this count. This is just to help visualize what exactly we've done to bring it all together: here's the cost, and here are the case counts. But this isn't exactly useful to us, so we're going to create a calculated field to get the total cost. Let's go to the breast cancer cost data source, which we're already in, and create a calculated field here; we're going to combine our two data sources in the calculated field. Let's call it "Total Cost for CA". Let's drag the number of breast cancer cases in here. You'll see it pulls it in and indicates that it comes from the other data source. We're trying to calculate total cost, so we multiply this number of cases by our cost. Let's just do it for 0 to 12 months here, and we need to take the sum of this. All right, the calculation is valid, so we'll add that in. Now we have a calculated field for total cost.

Let's visualize it. We have Stage Name on Columns, so let's drag our total cost field to Rows, remove this, and take out the Measure Values labels we don't want. Now on our left here we have the total cost for each individual stage. You can see roughly 13 billion for stages 1 and 2, 3 billion for stage 3, and 1.3 billion for stage 4. Let's drag this onto Label so it's easy to see without hovering, and move to Entire View. So now we have the total cost in California for each individual stage. There we go. So finally, let's just add these
last two questions to our dashboard. It's easy to drag these in; let's put them here, along with Total Cost for California. Now we have a central place to answer all of our questions: we can see the total cost by stage, the cost of treatment depending on stage, what types of cancer are most prevalent, and at what stage most diagnoses happen. That's all the visualizations I had prepared. Hopefully they do a decent job of answering our questions. Obviously we'd go in and color-code these and improve them, but we're running out of time, so let's move on to the final question-and-answer session.

All right, brilliant, thank you very much, Connor. I have to say, the point about how most cancers get diagnosed at stage 1 or 2 but lung cancer is diagnosed much later is really interesting. It really shows the effect of screening programs for some types of cancer but not others. All right, we have a few more questions from the audience. One of them is: can you go over how you linked the two data sets again? That was kind of the crux of those last plots.

Yep. Linking data sets is one of the harder things we're doing here, so I'm happy to go over it. What I did to prompt the linking of the two data sources was drag a field from each source onto the sheet. Tableau then gave me a message saying it can't work with this data unless I tell it how those data sets are linked. We know from examining the data at the beginning that they can be linked on stage, since the stage fields have similar values. So we're able to go up into the Data menu and create a blend relationship: we go to Edit Blend Relationships and create a custom relationship linking Disease Stage to Stage Name. I did that by clicking Add here, and Tableau is smart enough to see that the field names were similar, so we're able to
just select it from the list, pick that, and link them together here. After linking, we had to do some work with the data to create a calculated field for our answer, because simply adding cost to the rows doesn't make a lot of sense for our question; we needed a calculated field.

Okay, wonderful. There's one question about using your own color scheme: how do you change the palette?

Yeah, let's take a look. Let's go back to our first sheet, which has a lot of color. You can change the palette by going into Color on your Marks card: find your Marks card, click Color, then Edit Colors, and you can select from a ton of different palettes in here. You can get really fancy with it and change things like opacity, or even add borders, so you can really make it look good if you have some more time to play around with those.

All right, wonderful. Well, we're at time now, so thank you once again, Connor. That was a really nice presentation, and it was really interesting to get to see a whole dashboard made in a pretty short amount of time. Thank you also to Eddie for moderating, thank you to everyone who asked a question, and thank you to everyone who showed up. We have more Tableau training coming next Tuesday, so make sure you register for that, and there's another session, I believe next Wednesday, on how to become a business analyst. If you're interested in dashboard creation and that sort of thing as a career, then please do register for that one as well; that's on datacamp.com webinars. I will see you all in future sessions. Have a nice day, and goodbye.

I'm excited to walk you through some Tableau visualizations today. As Richie said, I'm a Salesforce developer here at DataCamp, and I've been here a couple of years. For those who don't know, Salesforce is one of the key systems that tracks our business-to-business data, so moving to Tableau and working on these visualizations is definitely close to some of the stuff I work with. I believe we're going to shut off cameras to save bandwidth, but nice to meet you all, and let's dive into it.

For our agenda today, we're going to be looking at a couple of data sets on cancer cases: the stage at which people are diagnosed, and the cost of treatment by stage. The idea behind this is to find trends that could inform public policy: policy that helps lower costs, helps decide when and how to screen for cancers, and generally improves the whole health care system behind cancer treatment and diagnosis. That's the overarching motivation, but we'll define a few key questions that we can use Tableau to help us answer. First, I'll share a little more info on the data, we'll define these key questions, and then we'll jump into Tableau. Hopefully everyone was able to download the data and Tableau; we'll build our visualizations and then move on to our question-and-answer session. So first, let's take a look at the data we're going to be working with. Before we look at our actual sheets, I wanted to cover a few key definitions that are really essential for this data. Both of our files, you'll see, have cancer broken up by stage. When cancer is diagnosed, a stage is determined between 0 and 4, with 4 being the worst
(that is, the most progressed disease), and 0 being the least progressed. Stage 0 is defined as abnormal cells that are present but have not spread into nearby tissue; in some cases this may be a precursor to cancer, but it is not an actual cancer. Stages 1, 2, and 3 share a definition: cancer is present, and the severity is graded between 1 and 3 based on factors such as the size of the tumor and its spread into nearby tissue. Finally, stage 4 is where the cancer has spread to distant parts of the body. You'll also see a question mark stage in our data; these are diagnoses with an unknown stage. We may not be able to gather a ton of info from those, but it does happen that the stage is not known or determined for some diagnoses.

All right, moving on. We'll look to answer some key questions about this data in Tableau, and I've written them out here. First, we can try to figure out what types of cancer are most prevalent. Then we can look at what stage most diagnoses happen at. Then we'll pull in our cost data and try to determine the cost of breast cancer treatment depending on stage. Finally, we'll pull it all together, combining the count of cancer diagnoses in California with the cost per treatment, to answer this last question: what was the total cost of breast cancer treatment in California from 2004 to 2012?
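The last question comes down to simple arithmetic once the two sources share a stage field. Here is a sketch of it, using symbols introduced only for illustration. N_s is the number of breast cancer cases diagnosed at stage s in California from 2004 to 2012. c_s is the per-case treatment cost at stage s for the 0 to 12 month window, which is the cost column the session's calculated field uses. The per-stage and overall totals are then:

```latex
T_s = N_s \, c_s, \qquad
T_{\mathrm{CA}} = \sum_{s \in \{0,1,2,3,4\}} N_s \, c_s
```

The question mark stage cannot contribute, because the cost data has no question-mark row. Folding the 12 to 24 month cost into c_s would be a natural extension, but that goes beyond what the session computes.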
And I'll cover why we're working with California for those dates. Moving on, let's take a look at our data in Excel; I have it open here. We have two sources. The first is these counts by stage: you can see we have five types of cancer (breast, colon, rectum, lung, and prostate), and then it's broken up by stage, so we have 8,200 instances of colon cancer diagnosed at stage 0. One thing I want to note about this particular data set is that it is only for California from the years 2004 to 2012. That's why, at the very end, we're going to determine the cost for all of these by stage for California. Moving over to our second source, it's again broken up by stage, but here we're looking at cost, and this is just for breast cancer; I'd like to highlight that we only have this data for breast cancer. So for breast cancer at stage 0, the cost per case is about $60,000 for 0 to 12 months and around $18,000 for 12 to 24 months. These are the two data sets we're going to be pulling in and visualizing in Tableau.

All right, hopefully everyone has Tableau Public downloaded and installed, so let's get it up and running and dive in. Let's open up Tableau. Now that we have Tableau open, let's load our first data set. The first file we're going to look at is the count of our cancer diagnoses. Since we're working with Excel files today, let's choose Microsoft Excel. We're going to look at the count first and get to the cost later, so let's load this file and click Open. We see Tableau has pulled in our data and is giving us a nice preview, broken up into the rows and columns we expect. Everything looks good; the column names and row names are as expected, so we've successfully loaded our data. Let's pull
it into our worksheet, so let's hop over to our worksheet here. This will make it easier to keep track of our questions and tie everything back together. As a reminder, the first question we wanted to answer is what types of cancer are most prevalent, so we can title our sheet with that. Next, since we're going to be answering multiple questions, we're going to have multiple worksheets open, so let's get ahead of it and rename our worksheet tab down here at the bottom to "Cancer Prevalence". Perfect. Now that we've named our sheet, we could jump straight into the visualization, but let's first verify that our data looks good here in the worksheet. We're going to start with a simple table. To do that, I'm going to drag Measure Names to the Columns shelf, Stage Name to Rows, and Measure Values to Text. Now we have essentially what we had in Excel. We saw a preview of this table when we loaded our data source; this just verifies that we can in fact pull our data onto the worksheet and everything's working right. You'll see this "Count of cancer diagnoses" field here.

Just getting some questions from the audience: is it possible to maximize Tableau Public and increase the text size, for people's eyesight?

Yes, I can maximize it. Let me try Command-plus to increase the size of the text. It did not, unfortunately. Sorry, bear with me one sec; I'll see if I can increase the size of the text.

Oh, I think it has to be done in the macOS system settings rather than in Tableau itself.

Oh yeah, I can change my resolution, actually. Give me one second, everybody; I'll update my screen resolution. I think I'm on a high-res monitor, which is probably creating some issues.

Just to jump in, I think people can also use the zoom function at the top of their GoToWebinar
preview screen and zoom into your screen, so that's also an option. Does that work better for anybody?

Yeah, it should be fine, actually. Sorry, let's continue.

Okay, perfect. Hopefully changing the resolution on the screen helps. All right. As of now, we've basically loaded exactly what we had in Excel, but we want to do more than that. Say we're interested in seeing the total number of cases across all types of cancer, not just broken up into breast, colon, and so on. What we can do for that is create a calculated field. I'm going to go up to this caret right here and choose Create Calculated Field, and we'll call this "Total Cases". This will be easy enough: we just want the sum of all of these together, so we can drag them in. The calculation is valid, so we can save our field. Now we have this Total Cases field, and we can see it's added to our table, so for each stage we can see the total cases. It's pretty easy to do a quick calculation like that in Tableau. Again, this is still something we could have done in Excel, but it's very easy to do here. We might want to pull it in later to take a look at this data, and it lets us pull it into a visualization very easily.

Now that we have this table and we've verified our data, let's use what Tableau really shines at: creating visualizations, so we can better understand our data and draw conclusions from it more easily. Going back to our question, what types of cancer are most prevalent, it's going to be much easier to tell once we set up a graph. So let's clear our sheet, since we don't need this table, and start creating a bar graph. To do this, let's drag Measure Names to Columns and Measure Values over here to Rows, and you'll see Tableau has smartly created a bar graph just for
us, and this is getting toward what we're looking for. Again, we have this "Count of cancer diagnoses" field here; let's drag that off, since we don't need it. And though it was cool that we could create this calculated field for total cases, it isn't what we're trying to answer here. We're looking at prevalence, so we're really comparing the individual types, and we can drag that off too. All right, now we're getting there.

One thing you always want to do in your Tableau graphs is make sure it's very clear what all the scales are. Right here we just have "Value" on our axis, so let's rename it to something clearer: let's call it "Total Cases". Perfect. Now we have a graph, and you can see immediately that breast cancer is the highest, but I think we can improve it a bit more. You may have noticed we haven't incorporated stage in any way yet, so let's play around with some ways of doing that. We can divide up these bars by stage: let's drag Stage Name to Color. Here we go: now each bar has separate sections for each stage, which is nice but not perfectly clear to read. We have our legend over here, but it's not perfectly clear either, so let's increase the clarity by dragging Stage Name to Label. Cool. Now that these are labeled, we can improve it even further. Say we want these labels to read "Stage 0", "Stage 1", and so on, so it's immediately clear what they stand for. We can go into Label, edit the text, and type "Stage" followed by a space. Cool. Now we have our graph set up. It's colorful, but it looks a little busy with all these labels, and we have all this unused space over here, so let's spread out our graph and make it take up the entire width. Up here we can select Entire
View, and there we go. The last thing we're going to do to improve this is sort these bars, so we can better see the prevalence of each cancer. We can just use this button up here, "Sort Measure Names ascending by Measure Values", and there we go. It's very clear here that the cancer with the highest prevalence is breast cancer, with prostate second, then lung, colon, and rectum. So we've created a pretty interesting visualization. We can see the stage breakdowns, but most importantly, we've answered our question, what types of cancer are most prevalent, very clearly. You can just look at this and know the answer.

Moving on toward our second question, we're also interested in the individual stages: how prevalent each stage is, and at what stage diagnoses happen. You can kind of see here that for breast cancer, for example, stages 1 and 2 are the vast majority. But for lung cancer, for example, you have to hover over the graph to differentiate between 1, 2, and 3. So if we really want to dive into the question of at what stage most diagnoses happen and visualize it, we can improve this graph. Let's do it in another worksheet, because we want to keep this one; it answers one of our key questions. Let's go down here and duplicate our sheet. All right, now we have a duplicated sheet. We're going to be answering this question, which I'll add in here: at what stage do most diagnoses happen? Perfect. Let's get ahead of renaming our tab: rename it to "Cancer Stage". I forgot an "s" there; let me just fix the spelling. Cool. Our goal here is to better visualize at what stage the diagnosis actually happens. Third time's the charm; perfect. To do that, let's remove Stage Name from our Marks card and add it to Columns, since we want to better visualize
the stage name, and the easiest way to do that is to add it to Columns here. Now the view is broken up on our columns as well: at the top we have each type of cancer, and beneath it, the stages at which those cancers get diagnosed are clear. To get back to our question, what stage do most diagnoses happen: stages one and two are generally the most diagnosed, while stage zero is very common for breast cancer. But the standout here, which you'll notice easily, is that for lung cancer, stage four is by far the most common stage at diagnosis, whereas for other cancers that's far from the norm. Again, just by creating a simple visualization, we've been able to see an interesting trend in the data, and an interesting deviation from that trend in the case of lung cancer.

Thinking about why this might be: prostate cancer, colon cancer and breast cancer are all regularly screened for in the United States, and this is U.S. data. There are routine screenings. Breast cancer, for example, is one of the most screened-for cancers, and it seems to be caught at stage zero very often. For lung cancer, unfortunately, it seems cases are not getting caught early. They're getting caught later, as the disease progresses.

So we've successfully answered both of our questions: at what stage most diagnoses happen, and what types of cancer are most prevalent. But it would be useful to bring this all together in a central place, so we can keep track of our questions and the answers to them. The best way to do that is to create a dashboard, so let's go ahead and do that. Again, on this dashboard you'll see the white area isn't spreading across our whole screen. We can fix that: rather than having this predefined size, let's change it 
to Automatic. Now we can use our whole screen for dropping in our sheets, as our central place for the questions. Let's drag both of our sheets onto the dashboard. Cool. So we've answered our first two questions and created a dashboard where we can go back and easily see the answers to both.

Before we move on to our third question, where we'll pull in our cost data set, I wanted to pause for a second and see if there are any immediate questions or issues. I'll give it a few minutes before we move on.

We've had some questions about the data source. Some people arrived late and didn't see where the data came from. Can you go back to the link to the data sets, so people can download the files, and show how you imported them into Tableau?

Sure. I think we can post the link in the chat again. The data is in this Google Drive I've set up for us. There are these two Excel files, and you should be able to download each of them to your local machine and then load them into Tableau. I'll give everyone a second, since you're following along. Once you have them downloaded, switch over to Tableau. When you open it, you'll be on the Data Source screen and you'll see this window here. Just click Microsoft Excel and choose the file you're loading. We're working first with the cancer diagnoses file, California 2004 to 2012. 
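The two questions on this first dashboard are which cancer types are most prevalent and at which stage most diagnoses happen. Both come down to grouping case counts and ranking them. Below is a minimal Python sketch of that logic as a cross-check outside Tableau. It assumes the case data has been flattened into hypothetical (cancer type, stage, cases) rows. The sample numbers are invented for illustration and are not from the California data set.

```python
from collections import defaultdict

# Hypothetical long-format rows: (cancer_type, stage_name, cases).
# These numbers are invented for illustration; they are NOT the webinar's data.
SAMPLE_ROWS = [
    ("Breast", "0", 50), ("Breast", "1", 120), ("Breast", "2", 80), ("Breast", "?", 10),
    ("Lung", "1", 30), ("Lung", "4", 90), ("Lung", "?", 40),
    ("Prostate", "2", 100), ("Prostate", "4", 20),
]


def prevalence_ranking(rows):
    """Total cases per cancer type, most prevalent first (like the sorted bar chart)."""
    totals = defaultdict(int)
    for cancer, _stage, cases in rows:
        totals[cancer] += cases
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def most_common_stage(rows, skip_stages=("?",)):
    """Stage with the most diagnoses for each cancer type.

    By default the "?" stage (stage could not be determined) is skipped,
    so the answer is always an actual stage.
    """
    by_type = defaultdict(lambda: defaultdict(int))
    for cancer, stage, cases in rows:
        if stage not in skip_stages:
            by_type[cancer][stage] += cases
    return {cancer: max(stages, key=stages.get) for cancer, stages in by_type.items()}


if __name__ == "__main__":
    print(prevalence_ranking(SAMPLE_ROWS))  # → [('Breast', 260), ('Lung', 160), ('Prostate', 120)]
    print(most_common_stage(SAMPLE_ROWS))   # → {'Breast': '1', 'Lung': '4', 'Prostate': '2'}
```

Skipping the "?" stage is a design choice for this sketch. Pass `skip_stages=()` to count unstaged cases as a stage of their own, closer to what the stacked bars show.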
That's what we've loaded so far. We'll get into loading the second one in a moment, and you'll be able to follow along for that as well, but this is the first one we've pulled in.

All right, we've got two more audience questions, and if anyone else has questions, please do add them to the chat now. The first is a reminder of what the question mark stage means.

Question mark, yes. You'll see it on here, and it's interesting. Let's look at this: this is lung cancer, and there were 20,000 question mark cases right here. Essentially, these are cases where someone was diagnosed with lung cancer but the stage couldn't be precisely determined, so they weren't put into one of these categories. It's a question mark: we're not sure which category they'd best fit in. You can draw some interesting conclusions from the size of this group as well. For example, the share of question mark cases looks much lower for colon cancer but higher for lung cancer, so maybe it's harder to determine a stage for those types of cancer.

All right, and a couple of people are asking: could you show again how, in the first graph, you renamed the stage titles to include the word "stage"?

Yep, of course. We just wanted to include "Stage" here to make it a little clearer, maybe for someone who doesn't know the data as well as those of us building this. To get the stage titles onto these bars, we dragged Stage Name to Label, but that only gave us the stage value, like the question mark, and not the first part. So we clicked on Label again and modified the text to whatever we want. I just typed the word "Stage" here. Even if I put 
"test" here and click Apply, it adds it in. You'll see that Stage Name in angle brackets pulls in the actual value from the field, and the rest is our custom text. We'll apply again, and there we go. If we really wanted to spice it up, we could change things like the font color, but we'll leave it as is for now.

Nice. Before we move on, there are a couple of questions about color. In the first graph you've got different colors for the stages, but the second sheet is just blue. Why is that, and can you replicate the color scheme from one to the other?

Yeah. For the first one we set up this color scheme to really make the differences between the stages in these bars pop. Here we're not breaking it up that way, but the blue does look a bit boring. So let's replicate some color here and see what it could look like in this different form of graph. We're breaking it up in columns by type of cancer and then by stage, so one thing we can do is split it up by name: we can drag Measure Names to Color. Now each bar is split into different colors. It looks a bit better, it's easier to tell the difference between, say, lung and prostate, and it spices up the graph a little and gets rid of that monotonous blue. As far as setting up colors goes, this looks pretty good. As always, you can drag pretty much anything to Color and see what it looks like, but I'm not sure what else we'd do to get better information out of it.

If we dragged Stage Name from the tables into Color, that would give us the 
same color scheme as in the other plot.

Let's try it. Yeah, this has now given it the same scheme as the other plot. Let's go back to the dashboard to see it. Stage two is red here, then stage three, and so on. This actually provides a really cool illustration of how we switched up this graph: we've essentially rotated these bars out, just like this. Honestly, let's keep it like this; I like the way it looks.

All right, great. We'll have time for more audience questions at the end, so perhaps we can move on to the second data set.

All right, cool. Thank you for the questions, everybody. Now it's time to move on to our second data set and our final two questions. I'm going to create the worksheet first, and then we'll go back and load our data source. The next question we're trying to answer is: what is the cost of breast cancer treatment depending on stage? There we have our question, so let's rename our sheet to Breast Cancer Costs. Now we're going to load our breast cancer cost data set, so we'll go back to the Data Source tab over here. First, though, I've noticed this isn't the cleanest name for this one, so let's rename it: just double-click in here. I'm renaming it to just the file name so it's a little cleaner. There we go. That shouldn't affect anything in our sheets; it only renames the data source so it's a little clearer. To add our next data source, we use this drop-down here: click New Data Source, and again we're loading from Microsoft Excel. Let's load the cost set, this one here, breast cancer cost by stage. Click Open and wait for Tableau to pull the data in. There we go. Again, just like our Excel file, this is broken up by stage, but you'll notice there is no 
question mark stage in this cost data. That just wasn't available for it, so unfortunately we don't have a question mark here, but it's still broken up by stages zero, one, two, three and four. Perfect. Now that this data set is loaded, let's go back to our worksheet. At the top here we have our two data sets to choose from: the one we were already working with, the number of cases, and now our cost data, with all of its values pulled in. Our goal is to answer: what is the cost of breast cancer treatment depending on stage? We want to see separate sections for 0 to 12 months and 12 to 24 months. So we're going to build a graph very similar to this one, except with two columns, one for 0 to 12 months and one for 12 to 24 months. Since we're building the same kind of graph, I'm going to turn you loose and let you take a stab at it. I'll give you five minutes to create a quick version, and then we'll build it together. Go ahead, and we'll come back together; there's a timer.

Just so everyone's clear, can you repeat what you expect people to draw in the plot?

Yeah. While that's going: our goal is to build a bar graph that answers this question, what is the cost of breast cancer treatment depending on stage. We want to see 0 to 12 months and 12 to 24 months in columns, and then the cost by stage, similar to what we have here with total cases, except that instead of total cases we're tracking cost. We track stage just like this, with 0 to 12 months and then 12 to 24 months. That's how we hope it comes out.

All right, wonderful. While the timer is counting down, there's a question: can you 
show the process of adding another data source again?

Yeah, of course, I can do that quickly. Step one is going to the Data Source tab. Once you're in here, you can switch between the data sources you've already loaded up here and view them. For a new data source, you click right here, New Data Source, this plus icon, and then choose your file. In this case we're using our cost data set, so this second one here is the second one we loaded.

I think we have time for one more question while the timer's running. How do you decide which of the illnesses are shown in those first graphs? Can you filter it so only some of them are shown?

Yes, of course. We could add a filter; we were just including all of them. Say we want to create a filter because we don't want to consider colon cancer for whatever reason. The easiest way is down here: click on it and choose Exclude, and now it's gone. You'll see in the Filters shelf that it's added a filter. You can click on it, choose Edit Filter, and see that we've pulled out colon. Let's add it back here the same way. There you go. If we really wanted to get fancy, we could do some dynamic filtering on the side here: right-click and choose Show Filter, and then we can select values dynamically on here. You can even pull this into the dashboard if you want, but we're not going to do that right now.

All right, how are we doing with that time? Are we about ready to move on?

Yep. Hopefully everyone made some good progress on this. Let's go over quickly how I would make it. We drag Measure Names to Columns... oops, I'm on the wrong data set. The key is getting the right data set. So we drag Measure Names to Columns, and then Disease Stage to Columns as well, because we 
wanted that in the columns, and then let's pull Measure Values to Rows. There we go; this is what we're going for. Let's remove this count column, since we don't need it.

So when you use Measure Values, is that putting in all the numeric columns?

Yes, that's correct. Measure Values refers to these continuous fields, the measures here. They're all the numeric values, so it's an easy way of pulling all of them into this section. You can take one out like that and have only one, and then add it back, like this. There we go.

While I rename a few of these axes to make our graph look good, I have a question for everyone. I'm noticing an interesting trend here: cost increases by stage for the first year, but stage four here looks like an exception. Does anyone have theories on why that could be, or what we can potentially draw from this data? I'm just calling this axis Total Cost here, and there we go.

Do you want to talk us through your theories for why costs go down in the second year for stage four?

Yeah. This is an unfortunate trend we can pull from here. Cost in stage four is decreasing rather than following the trend, and potentially that's because if you're diagnosed at stage four, your prognosis is really not as good. It's unfortunate, but it really illustrates how important it is to improve screening methods, to catch cancer early and hopefully eliminate these kinds of diagnoses at stage four. Last, I'm going to set this to Entire View so it looks a little better, and we'll add it to our dashboard later, once we've answered our final question. So let's move on to our last 
question. Let's create a new sheet, and our question here is: what was the total cost by stage? I'm going to rename the sheet to Total Cost for California. What we want to do is combine our two data sets. At the beginning we noticed that both were broken up by stage, so we want to take advantage of that to calculate total cost. Let's start by creating a table. First, go to our cost data source, this one here, and drag Disease Stage to Columns. Now we want to pull in our case count data. One thing to keep in mind: our cost data covers only breast cancer, so we can really only look at breast cancer here if we're trying to calculate total cost. So let's drag Breast Cancer to Rows, and we'll see this pop up: in order to use fields from both data sources, a relationship needs to be created with Breast Cancer Costs by Stage. We're just going to follow the instructions: select Data > Edit Blend Relationships to open the Blend Relationships dialog box. We need to link our data together, and the pop-up describes the best way to do that. We go to Data > Edit Blend Relationships and see our data sources here. I'll stay on this one here. We click Custom, then Add, and automatically there's a mapping from Disease Stage to Stage Name. You'll notice the names are slightly different, but Tableau still matches them together. Let's click OK, and immediately the graph changes to reflect that the data is linked; you can see the link icon right here. Cool. Our goal is to answer what the cost was in California, but let's pull it together in a table first to get a visualization. Let's click Show Me and then the table. This isn't quite what we want, so let's go to the cost by stage data and move Measure Names to Rows, and 
we don't want Disease Stage on Rows anymore, so let's pull that out. One second. Let's go to Cancer Diagnosis, drag Stage Name to Columns, and now we can remove Disease Stage. There we go. Then we go back to our cost by stage data and drag Measure Values to Text, and there we have a table representing all of this combined. Let's drag out this count. This is just to help visualize exactly what we've done to get this all together: here's the cost, and here are the rates. But this isn't exactly useful to us yet, so we're going to create a calculated field to get the total cost. Let's go to the breast cancer cost data set, which we're already in, and create a calculated field here. We're going to combine our two data sources in the calculated field. Let's call it Total Cost for CA. Let's drag the number of breast cancer cases in here. You can see it pulls it in and indicates that it comes from the other data set. We're trying to calculate total cost, so we multiply the number of cases by our cost. Let's just do it for 0 to 12 months here, and we need to take the SUM of this. All right, the calculation is valid, so we'll add it in. Now we have a calculated field for total cost, so let's visualize it. We have Stage Name on Columns. Let's drag our total cost field to Rows, remove this, and take out the Measure Values labels we don't want. Now on our left we have the total cost for each individual stage. You can see about 13 billion for stages one and two, about three billion for stage three, and 1.3 billion for stage four. Let's drag this onto Label so it's easy to see without hovering, and set it to Entire View. So now we have the total cost in California for each individual stage, and there we go. Finally, let's add these 
last two questions to our dashboard. It's easy to drag them in; let's put them here, along with Total Cost for California. Now we have a central place to answer all of our questions. We can see the total cost by stage, the cost depending on stage, what types of cancer are most prevalent, and at what stage most diagnoses happen. That's all the visualizations I had prepared. Hopefully they do a decent job of answering our questions. Obviously we'd go in, color-code these and improve them, but we're running out of time, so let's move on to the final Q&A.

All right, brilliant, thank you very much, Connor. I have to say, the finding that most cancers get diagnosed at stage one or two, but lung cancer gets diagnosed much later, is really interesting. It really shows the effect of screening programs existing for some types of cancer but not others. We have a few more questions from the audience. One of them: can you go over again how you linked the two data sets? That was kind of the crux of those last plots.

Yep. Linking data sets is one of the harder things we're doing here, so I'm happy to go over it. What prompted the linking of the two data sources was dragging a field from each source onto our sheet. Tableau then gave me a message saying it couldn't work with this data unless I told it how the data sets can be linked. We knew from examining the data at the beginning that they could be linked on stage, since both have similar stage values. So we went up to the Data menu and created a blend relationship: we went to Edit Blend Relationships and created a custom relationship linking Disease Stage to Stage Name. I did that by clicking Add on this screen, and Tableau is smart enough to recognize that the field names were similar, so we were able to 
just select it from the list and link them together. After linking, we had to do some work with the data to create a calculated field for our answer, because simply adding cost to the rows doesn't make much sense for our question. We needed a calculated field.

Okay, wonderful. There's one more question: if you want your own color scheme, how do you change the palette?

Yeah, let's take a look. Let's go back to our first sheet, which has a lot of color. You can change the palette by finding your Marks card, going into Color, and choosing Edit Colors. You can select from a ton of different palettes in here and get really fancy with it, changing things like opacity and even adding borders. You can really make it look good if you have some more time to play around with those.

All right, wonderful. Well, we're at time now, so thank you once again, Connor. That was a really nice presentation, and it was really interesting to see a whole dashboard made in a pretty short amount of time. Thank you also to Eddie for moderating, to everyone who asked a question, and to everyone who showed up. We have more Tableau training coming next Tuesday, so make sure you register for that. There's also another session, I believe next Wednesday, on how to become a business analyst. If you're interested in dashboard creation as a career, please do register for that one on the DataCamp webinars page at datacamp.com. I'll see you all in future sessions. Have a nice day, and goodbye.
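The final chart links the cost and case-count sources on stage and then multiplies cases by cost. Below is a minimal Python sketch of that blend-and-multiply logic. It rests on several assumptions. Tableau data blending is modeled in a simplified way: each source is aggregated by its linking field, and the secondary source is left-joined onto the primary, so the primary keeps all of its stage values. The field names ("Disease Stage", "Stage Name", "Breast Cancer", "Cost 0-12 Months") and all numbers are hypothetical stand-ins. The cost field is assumed to be a per-case cost, so total cost is cases × cost. Because the secondary source is aggregated before the join, its values enter as sums, which is likely why the calculated field needed a SUM.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two Excel sources; field names and numbers are invented.
COST_ROWS = [  # assumed per-case cost, one row per stage
    {"Disease Stage": "0", "Cost 0-12 Months": 20000.0},
    {"Disease Stage": "1", "Cost 0-12 Months": 40000.0},
    {"Disease Stage": "4", "Cost 0-12 Months": 80000.0},
]
CASE_ROWS = [  # may hold several rows per stage, so it gets aggregated
    {"Stage Name": "0", "Breast Cancer": 10},
    {"Stage Name": "0", "Breast Cancer": 15},
    {"Stage Name": "1", "Breast Cancer": 30},
    {"Stage Name": "4", "Breast Cancer": 5},
    {"Stage Name": "?", "Breast Cancer": 7},  # no cost exists for the "?" stage
]


def aggregate(rows, key, value):
    """SUM(value) grouped by the linking field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def blend(primary, secondary, primary_key, secondary_key, primary_value, secondary_value):
    """Left-join aggregated secondary values onto aggregated primary values.

    Every linking value in the primary source is kept; a value missing from
    the secondary source gets None, and secondary-only values are dropped.
    """
    left = aggregate(primary, primary_key, primary_value)
    right = aggregate(secondary, secondary_key, secondary_value)
    return {stage: (left_val, right.get(stage)) for stage, left_val in left.items()}


def total_cost(blended):
    """Primary sum times secondary sum per stage; None where the secondary has no match."""
    return {
        stage: (a * b if b is not None else None)
        for stage, (a, b) in blended.items()
    }


if __name__ == "__main__":
    # Cost data as primary: the "?" stage drops out entirely.
    by_cost = blend(COST_ROWS, CASE_ROWS, "Disease Stage", "Stage Name",
                    "Cost 0-12 Months", "Breast Cancer")
    print(total_cost(by_cost))   # → {'0': 500000.0, '1': 1200000.0, '4': 400000.0}
    # Case data as primary: "?" survives but has no cost to multiply by.
    by_cases = blend(CASE_ROWS, COST_ROWS, "Stage Name", "Disease Stage",
                     "Breast Cancer", "Cost 0-12 Months")
    print(total_cost(by_cases))  # → {'0': 500000.0, '1': 1200000.0, '4': 400000.0, '?': None}
```

The two runs show why the choice of primary source matters. With the case data as primary, the unstaged "?" cases stay visible but cannot be costed, because the cost file has no "?" stage.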