Creating a viz story in Tableau: an example
Story link on the Tableau Public server
https://public.tableau.com/profile/publish/CASPER_R13_with_cluster/R13
Data courtesy of
Rebecca Rose and Susanna Lamers @ bioinfox.com
Original data sources
All sample data: | The SequenceID column is “Sequence Name” |
---|---|
All couples in R13: | Each SequenceID is listed only once, so 91 couples means a total of 182 SequenceIDs belong to a couple. |
Clusters: | Five cluster files, one per cluster. |
Processed data sources
GP41_R13_couples_checkcls.csv
Process: - Read the couples data and, for each couple (two IDs), add columns for their ClusterIDs.
- Calculate the Category. The “Category” column has three values: IN, OUT, INOUT.
  - IN: both members of the couple are in the same cluster
  - OUT: neither member of the couple is in the cluster
  - INOUT: only one member of the couple is in the cluster
Columns: - c1id,c1region,c1subtype,
- c2id,c2region,c2subtype,
- c1clsid_01,c2clsid_01,Category01,
- c1clsid_02,c2clsid_02,Category02,
- c1clsid_03,c2clsid_03,Category03,
- c1clsid_04,c2clsid_04,Category04,
- c1clsid_05,c2clsid_05,Category05,
- c1clsid_053,c2clsid_053,Category053
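A minimal plain-Python sketch of the checkcls step follows, using lists of dicts in place of dataframes. It assumes each Category column refers to a single cluster file, so "same cluster" means both IDs appear in that file. The `clusters` mapping (column suffix to ClusterID and member IDs) and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def couple_category(c1_cls: Optional[str], c2_cls: Optional[str]) -> str:
    """Classify a couple against one cluster from its two ClusterID values."""
    if c1_cls is not None and c2_cls is not None:
        return "IN"      # both members are in the cluster
    if c1_cls is None and c2_cls is None:
        return "OUT"     # neither member is in the cluster
    return "INOUT"       # exactly one member is in the cluster


def add_cluster_columns(
    couples: List[Dict[str, str]],
    clusters: Dict[str, Tuple[str, Set[str]]],
) -> List[Dict[str, Optional[str]]]:
    """clusters maps a column suffix (e.g. "01") to (ClusterID, member SequenceIDs)."""
    result = []
    for row in couples:
        new_row: Dict[str, Optional[str]] = dict(row)
        for suffix, (cluster_id, members) in clusters.items():
            # Hypothetical: a member gets the ClusterID, a non-member gets None.
            c1_cls = cluster_id if row["c1id"] in members else None
            c2_cls = cluster_id if row["c2id"] in members else None
            new_row[f"c1clsid_{suffix}"] = c1_cls
            new_row[f"c2clsid_{suffix}"] = c2_cls
            new_row[f"Category{suffix}"] = couple_category(c1_cls, c2_cls)
        result.append(new_row)
    return result


if __name__ == "__main__":
    couples = [
        {"c1id": "A", "c2id": "B"},
        {"c1id": "C", "c2id": "D"},
        {"c1id": "E", "c2id": "F"},
    ]
    clusters = {"01": ("01", {"A", "B", "C"})}
    for row in add_cluster_columns(couples, clusters):
        print(row["c1id"], row["c2id"], row["Category01"])
```

With the toy cluster above, the three couples come out as IN, INOUT and OUT respectively.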
GP41_R13_couples_category.csv
Process: - Same as GP41_R13_couples_checkcls.csv, but with the ClusterID columns dropped.
- Create a new dataframe with the c1 and c2 columns swapped, leaving the Category columns unchanged, then concatenate the two dataframes. This ensures that column “c1id” covers every ID that has a couple.
Columns: - c1id,c1region,c1subtype,
- c2id,c2region,c2subtype,
- Category01,Category02,Category03,Category04,Category05,Category053
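The drop, swap and concatenate steps can be sketched in plain Python. The lists of dicts stand in for dataframes, and the helper names are hypothetical. Any column starting with `c1` is exchanged with its `c2` counterpart, and every other column is copied unchanged.

```python
from typing import Dict, List


def drop_cluster_ids(row: Dict[str, str]) -> Dict[str, str]:
    """Remove the c1clsid_*/c2clsid_* columns, keeping the Category columns."""
    return {k: v for k, v in row.items() if "clsid" not in k}


def swap_couple(row: Dict[str, str]) -> Dict[str, str]:
    """Exchange every c1* column with the matching c2* column."""
    swapped = {}
    for key, value in row.items():
        if key.startswith("c1"):
            swapped["c2" + key[2:]] = value
        elif key.startswith("c2"):
            swapped["c1" + key[2:]] = value
        else:
            swapped[key] = value  # Category columns stay unchanged
    return swapped


def build_couples_category(couples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Original rows followed by their swapped copies, so c1id lists every coupled ID."""
    base = [drop_cluster_ids(row) for row in couples]
    return base + [swap_couple(row) for row in base]
```

Applied to the 91 R13 couples, this produces 182 rows, one per coupled SequenceID.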
GP41_R13_with_cluster.csv
Process: - Read all sample data and keep only the R13 subset.
- Read the cluster datasets and add ClusterID columns for each SequenceID.
Columns: - original data columns
- plus the new ClusterID columns: ClusterID_01,ClusterID_02,ClusterID_03,ClusterID_04,ClusterID_05,ClusterID_053
Note: These ClusterID columns replace the ClusterID columns that were dropped from GP41_R13_couples_category.csv.
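Here is a plain-Python sketch of adding one ClusterID column per cluster to the sample rows. The source does not say which column identifies R13, so the subset filter is passed in as a hypothetical `keep` predicate. The `clusters` mapping (suffix to ClusterID and member SequenceIDs) is also an assumption.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def add_cluster_ids(
    samples: List[Dict[str, str]],
    clusters: Dict[str, Tuple[str, Set[str]]],
    keep: Callable[[Dict[str, str]], bool] = lambda row: True,
) -> List[Dict[str, Optional[str]]]:
    """Keep the rows selected by `keep` and add a ClusterID_<suffix> column per cluster."""
    result = []
    for row in samples:
        if not keep(row):  # hypothetical R13 filter
            continue
        new_row: Dict[str, Optional[str]] = dict(row)
        seq_id = row["Sequence Name"]  # SequenceID column in the sample data
        for suffix, (cluster_id, members) in clusters.items():
            new_row[f"ClusterID_{suffix}"] = cluster_id if seq_id in members else None
        result.append(new_row)
    return result
```

Sequences that belong to no cluster keep None in every ClusterID column, which Tableau reads as Null.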
Data source created for mapping
Open the Regions image in Illustrator and use the Pen tool to trace a path around each region area. Select each path and, from the menu Window->Attributes, set “Image Map” to “Polygon” and “URL” to an identifier of your choice, e.g. “#R7”. Then use File->Save for Web and Devices to save the image map as an HTML file. Open the HTML file in a text editor: it contains a list of control points for each path identifier (URL). Add each region's point list to your CSV file, with an incrementing PointOrder (starting from 1).
file: | region_polygon.csv |
---|---|
columns: | X,Y,Region,PointOrder |
X range: | 0-668 |
Y range: | 0-669 |
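Instead of copying the point lists by hand, the exported HTML can be parsed with Python's standard `html.parser`. This sketch assumes each region appears as an `<area shape="poly" coords="x1,y1,x2,y2,..." href="#R7">` tag with integer coordinates, and it uses a hypothetical convention that turns the URL "#R7" into Region 7 so the value can join as a whole number. The helper names are illustrative.

```python
import csv
import sys
from html.parser import HTMLParser
from typing import Dict, List, TextIO, Tuple


class PolygonAreaParser(HTMLParser):
    """Collects (region, points) from <area shape="poly" ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.polygons: List[Tuple[int, List[Tuple[int, int]]]] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <area ... /> tags are routed here as well by HTMLParser.
        if tag != "area":
            return
        attr = dict(attrs)
        if (attr.get("shape") or "").lower() != "poly":
            return
        # Hypothetical convention: URL set to "#R<number>", e.g. "#R7" -> 7.
        region = int((attr.get("href") or "").lstrip("#").lstrip("Rr"))
        values = [int(v) for v in (attr.get("coords") or "").split(",") if v.strip()]
        # coords is a flat "x1,y1,x2,y2,..." list; pair it into (x, y) points.
        points = list(zip(values[0::2], values[1::2]))
        self.polygons.append((region, points))


def image_map_to_rows(html_text: str) -> List[Dict[str, int]]:
    """One row per control point, with PointOrder starting at 1 for each region."""
    parser = PolygonAreaParser()
    parser.feed(html_text)
    parser.close()
    rows = []
    for region, points in parser.polygons:
        for order, (x, y) in enumerate(points, start=1):
            rows.append({"X": x, "Y": y, "Region": region, "PointOrder": order})
    return rows


def write_region_polygon_csv(rows: List[Dict[str, int]], out: TextIO) -> None:
    writer = csv.DictWriter(out, fieldnames=["X", "Y", "Region", "PointOrder"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    demo = '<area shape="poly" coords="10,20,30,40,50,60" href="#R7">'
    write_region_polygon_csv(image_map_to_rows(demo), sys.stdout)
```

Image-map coordinates use the image's top-left corner as the origin, so Y grows downward. Keep that in mind when mapping the background image's Y range in Tableau.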
Data joins
Left join: | GP41_R13_with_cluster.csv (Sequence Name) with GP41_R13_couples_category.csv (c1id), data type: String |
---|---|
Inner join: | GP41_R13_with_cluster.csv (Region) with region_polygon.csv (Region), data type: Number (Whole) |
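The semantics of the two joins can be illustrated in plain Python with lists of dicts. This sketch does not show how Tableau runs joins internally, and the function names are hypothetical. GP41_R13_couples_category.csv lists every coupled ID in c1id, so after the left join, c2id is empty only for sequences without a couple. The inner join repeats each sequence row once per polygon point of its region. Region is assumed to already be an int on both sides, matching the Number (Whole) join type.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def _index(rows: List[Row], key: str) -> Dict[Any, List[Row]]:
    """Group rows by the value of one key column."""
    index: Dict[Any, List[Row]] = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def left_join(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Every left row is kept; unmatched rows get None for all right-hand columns."""
    index = _index(right, right_key)
    empty = {col: None for row in right for col in row}
    joined = []
    for row in left:
        for match in index.get(row[left_key], [empty]):
            joined.append({**row, **match})  # right-hand values win on name clashes
    return joined


def inner_join(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Only left rows with at least one match are kept, once per match."""
    index = _index(right, right_key)
    return [{**row, **match} for row in left for match in index.get(row[left_key], [])]
```

For example, `left_join(samples, couples, "Sequence Name", "c1id")` followed by `inner_join(..., polygons, "Region", "Region")` reproduces the two joins listed above.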
Calculated fields in Tableau
hasCouple: | NOT ISNULL([c2id]) |
---|---|
coupleStatus: | IF [hasCouple] THEN "Couple" ELSE "NoCouple" END |
coupleInCluster: | [Category] == "IN" |
coupleInClusterStatus: | IF [coupleInCluster] THEN "IN" ELSE "OUT" END |
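To cross-check the dashboard counts outside Tableau, the calculated fields can be mirrored in Python over the joined rows. This is a hypothetical sketch. The formulas refer to a single [Category] field while the file has Category01 through Category053, so the column is passed in as an assumed `category_col` parameter. A missing Category (no couple) is treated as "OUT" here.

```python
from collections import Counter
from typing import Dict, Optional

Row = Dict[str, Optional[str]]


def has_couple(row: Row) -> bool:
    # Mirrors NOT ISNULL([c2id]): sequences without a couple have c2id = None.
    return row.get("c2id") is not None


def couple_status(row: Row) -> str:
    # Mirrors IF [hasCouple] THEN "Couple" ELSE "NoCouple" END.
    return "Couple" if has_couple(row) else "NoCouple"


def couple_in_cluster_status(row: Row, category_col: str) -> str:
    # Mirrors [Category] == "IN" wrapped in IF ... THEN "IN" ELSE "OUT" END.
    return "IN" if row.get(category_col) == "IN" else "OUT"


if __name__ == "__main__":
    rows = [
        {"c2id": "B", "Category01": "IN"},
        {"c2id": None, "Category01": None},
        {"c2id": "D", "Category01": "INOUT"},
    ]
    print(Counter(couple_status(r) for r in rows))
    print(Counter(couple_in_cluster_status(r, "Category01") for r in rows))
```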
Worksheets and dashboards
Cluster tree map: | color by “number of records” |
---|---|
Regions filled map: | Plot AVG(X) and AVG(Y), convert “Region” to a dimension and place it on Detail, place “PointOrder” on Path, and color by “number of records”. Set the Regions image as the background image. |
Subtype pie chart: | color by “Subtype”, angle by “number of records” |
Subtype bar chart: | color by “Subtype”, size by “number of records” |
hasCouple pie chart: | color by “hasCouple”, angle by “number of records” |
coupleInCluster pie chart: | color by “coupleInCluster”, angle by “number of records” |
Dual axis charts: | |
Stories
Answer the following questions:
- What’s the most affected Region, and what’s the least?
- Explore the subtype distribution across regions.
- For each cluster, does the subtype distribution change?
- For each cluster, how many couples fall into the same cluster?