Create viz story in Tableau – example¶
Story link in tableau public server¶
https://public.tableau.com/profile/publish/CASPER_R13_with_cluster/R13
Data Courtesy of¶
Rebecca Rose and Susanna Lamers @ bioinfox.com
Original data source:¶
| All sample data: | |
|---|---|
| SequenceID columns is “Sequence Name” | |
| All couples in R13: | |
| Each SequenceID only listed once, so 91 couples means total 182 SequenceID are with couple. | |
| clusters: | Five cluster files, each for one cluster. | 
Processed data source¶
- GP41_R13_couples_checkcls.csv - process: - read the couples data, for each couple (two IDs), add columns for their ClusterID.
- calculate the Catergory. The “Category” column has three values: IN, OUT, INOUT.
 - IN – If the couple in same cluster
- OUT – If the couple not in cluster
- INOUT – If only one of the couple in cluster
 - columns: - c1id,c1region,c1subtype,
- c2id,c2region,c2subtype,
- c1clsid_01,c2clsid_01,Category01,
- c1clsid_02,c2clsid_02,Category02,
- c1clsid_03,c2clsid_03,Category03,
- c1clsid_04,c2clsid_04,Category04,
- c1clsid_05,c2clsid_05,Category05,
- c1clsid_053,c2clsid_053,Category053
 
- GP41_R13_couples_category.csv - Process: - Same as the couples_checkcls.csv file, but drop the ClusterID columns.
- Create a new dataframe with reversed c1 and c2 columns, leave the Category columns unchanged. Then concatenate two dataframe. This is to make sure columne “c1id” covers all IDs with couple.
 - columns: - c1id,c1region,c1subtype,
- c2id,c2region,c2subtype,
- Category01,Category02,Category03,Category04,Category05,Category053
 
- GP41_R13_with_cluster.csv - Process: - read all sample data, filter out R13 subset
- read cluster datasets, add clusterID columns for each SequenceID
 - Columns: - original data columns
- plus the new clusterID columns: ClusterID_01,ClusterID_02,ClusterID_03,ClusterID_04,ClusterID_05,ClusterID_053
 - Note: - Here the clusterID columns replaces the dropped clusterID columns in couples_category.csv file. 
Created data source for mapping¶
Open the Regions image in Illustrator, use the pen tool and trace a path for each region area. Click each path, from menu Window->Attributes, set “image map” to “polygon”, and “URL” an identification you want, e.g. “#R7”. Then from menu File->Save for Web and Devices, save the image map as web html file. Open the html file in a text editor, there you can find a list of control points for each path identification (URL). Add the point list for each Region to your csv file, with incrementing PointOrder (start from 1)
| file: | region_polygon.csv | 
|---|---|
| columns: | X,Y,Region,PointOrder | 
| X range: | 0-668 | 
| Y range: | 0-669 | 
data join¶
| Left join: | GP41_R13_with_cluster.csv (Sequence Name) with GP41_R13_couples_category.csv (c1id), data type: String | 
|---|---|
| Inner join: | GP41_R13_with_cluster.csv (Region) with region_polygon.csv (Region), data type: Number (Whole) | 
Calculated fields in Tableau¶
| hasCouple: | NOTNULL([c2id]) | 
|---|---|
| coupleStatus: | IF hasCouple THEN “Couple” ELSE “NoCouple” | 
| coupleInCluster: | |
| Category==”IN” | |
| coupleInClusterStatus: | |
| IF coupleInCluster THEN “IN” ELSE “OUT” | |
Worksheets and Dashboards:¶
| Cluster tree map: | |
|---|---|
| color by “number of records” | |
| Regions filled map: | |
| plot AVG(X) and AVG(Y), turn “Region” to dimension and set as detail. path by “PointOrder”, color by “number of records”. Also set map as background image. | |
| Subtype Pie chart: | |
| color by “Subtype”, angle by “number of records” | |
| Subtype bar chart: | |
| color by “Subtype”, size by “number of records” | |
| hasCouple pie chart: | |
| color by “hasCouple”, angle by “number of records” | |
| coupleInCluster pie chart: | |
| color by “coupleInCluster”, angle by “number of records” | |
| dual axis charts: | |
| 
 | |
Stories¶
Answer the questions:
- What’s the most affected Region, and what’s the least?
- Explore subType distribution among regions
- For each cluster, do the subtype distribution change?
- For each cluster, how many couples get into the same cluster?