Create viz story in Tableau – example¶

Story link in tableau public server¶

https://public.tableau.com/profile/publish/CASPER_R13_with_cluster/R13

Data Courtesy of¶

Rebecca Rose and Susanna Lamers @ bioinfox.com

Original data source:¶

All sample data:
	SequenceID columns is “Sequence Name”
All couples in R13:
	Each SequenceID only listed once, so 91 couples means total 182 SequenceID are with couple.
clusters:	Five cluster files, each for one cluster.

Processed data source¶

GP41_R13_couples_checkcls.csv

process:

read the couples data, for each couple (two IDs), add columns for their ClusterID.
calculate the Catergory. The “Category” column has three values: IN, OUT, INOUT.

IN – If the couple in same cluster

OUT – If the couple not in cluster

INOUT – If only one of the couple in cluster

columns:

c1id,c1region,c1subtype,
c2id,c2region,c2subtype,
c1clsid_01,c2clsid_01,Category01,
c1clsid_02,c2clsid_02,Category02,
c1clsid_03,c2clsid_03,Category03,
c1clsid_04,c2clsid_04,Category04,
c1clsid_05,c2clsid_05,Category05,
c1clsid_053,c2clsid_053,Category053

GP41_R13_couples_category.csv

Process:	Same as the couples_checkcls.csv file, but drop the ClusterID columns. Create a new dataframe with reversed c1 and c2 columns, leave the Category columns unchanged. Then concatenate two dataframe. This is to make sure columne “c1id” covers all IDs with couple.
columns:	c1id,c1region,c1subtype, c2id,c2region,c2subtype, Category01,Category02,Category03,Category04,Category05,Category053

GP41_R13_with_cluster.csv

Process:	read all sample data, filter out R13 subset read cluster datasets, add clusterID columns for each SequenceID
Columns:	original data columns plus the new clusterID columns: ClusterID_01,ClusterID_02,ClusterID_03,ClusterID_04,ClusterID_05,ClusterID_053
Note:	Here the clusterID columns replaces the dropped clusterID columns in couples_category.csv file.

Created data source for mapping¶

Open the Regions image in Illustrator, use the pen tool and trace a path for each region area. Click each path, from menu Window->Attributes, set “image map” to “polygon”, and “URL” an identification you want, e.g. “#R7”. Then from menu File->Save for Web and Devices, save the image map as web html file. Open the html file in a text editor, there you can find a list of control points for each path identification (URL). Add the point list for each Region to your csv file, with incrementing PointOrder (start from 1)

file:	region_polygon.csv
columns:	X,Y,Region,PointOrder
X range:	0-668
Y range:	0-669

data join¶

Left join:	GP41_R13_with_cluster.csv (Sequence Name) with GP41_R13_couples_category.csv (c1id), data type: String
Inner join:	GP41_R13_with_cluster.csv (Region) with region_polygon.csv (Region), data type: Number (Whole)

Calculated fields in Tableau¶

coupleInCluster:
hasCouple:	NOTNULL([c2id])
coupleStatus:	IF hasCouple THEN “Couple” ELSE “NoCouple”
	Category==”IN”
coupleInClusterStatus:
	IF coupleInCluster THEN “IN” ELSE “OUT”

Worksheets and Dashboards:¶

Cluster tree map:
	color by “number of records”
Regions filled map:
	plot AVG(X) and AVG(Y), turn “Region” to dimension and set as detail. path by “PointOrder”, color by “number of records”. Also set map as background image.
Subtype Pie chart:
	color by “Subtype”, angle by “number of records”
Subtype bar chart:
	color by “Subtype”, size by “number of records”
hasCouple pie chart:
	color by “hasCouple”, angle by “number of records”
coupleInCluster pie chart:
	color by “coupleInCluster”, angle by “number of records”
dual axis charts:
	Donut chart of “hasCouple/Subtype” dual axis Donut chart of “coupleInCluster/Subtype” dual axis

Stories¶

Answer the questions:

What’s the most affected Region, and what’s the least?
Explore subType distribution among regions
For each cluster, do the subtype distribution change?
For each cluster, how many couples get into the same cluster?