Create viz story in Tableau – example

Story link in tableau public server

Data Courtesy of

Rebecca Rose and Susanna Lamers @

Original data source:

All sample data:
 SequenceID columns is “Sequence Name”
All couples in R13:
 Each SequenceID only listed once, so 91 couples means total 182 SequenceID are with couple.
clusters:Five cluster files, each for one cluster.

Processed data source

  • GP41_R13_couples_checkcls.csv

    1. read the couples data, for each couple (two IDs), add columns for their ClusterID.
    2. calculate the Catergory. The “Category” column has three values: IN, OUT, INOUT.
    • IN – If the couple in same cluster
    • OUT – If the couple not in cluster
    • INOUT – If only one of the couple in cluster
    • c1id,c1region,c1subtype,
    • c2id,c2region,c2subtype,
    • c1clsid_01,c2clsid_01,Category01,
    • c1clsid_02,c2clsid_02,Category02,
    • c1clsid_03,c2clsid_03,Category03,
    • c1clsid_04,c2clsid_04,Category04,
    • c1clsid_05,c2clsid_05,Category05,
    • c1clsid_053,c2clsid_053,Category053
  • GP41_R13_couples_category.csv

    1. Same as the couples_checkcls.csv file, but drop the ClusterID columns.
    2. Create a new dataframe with reversed c1 and c2 columns, leave the Category columns unchanged. Then concatenate two dataframe. This is to make sure columne “c1id” covers all IDs with couple.
    • c1id,c1region,c1subtype,
    • c2id,c2region,c2subtype,
    • Category01,Category02,Category03,Category04,Category05,Category053
  • GP41_R13_with_cluster.csv

    1. read all sample data, filter out R13 subset
    2. read cluster datasets, add clusterID columns for each SequenceID
    • original data columns
    • plus the new clusterID columns: ClusterID_01,ClusterID_02,ClusterID_03,ClusterID_04,ClusterID_05,ClusterID_053

    Here the clusterID columns replaces the dropped clusterID columns in couples_category.csv file.

Created data source for mapping

Open the Regions image in Illustrator, use the pen tool and trace a path for each region area. Click each path, from menu Window->Attributes, set “image map” to “polygon”, and “URL” an identification you want, e.g. “#R7”. Then from menu File->Save for Web and Devices, save the image map as web html file. Open the html file in a text editor, there you can find a list of control points for each path identification (URL). Add the point list for each Region to your csv file, with incrementing PointOrder (start from 1)

X range:0-668
Y range:0-669

data join

Left join:GP41_R13_with_cluster.csv (Sequence Name) with GP41_R13_couples_category.csv (c1id), data type: String
Inner join:GP41_R13_with_cluster.csv (Region) with region_polygon.csv (Region), data type: Number (Whole)

Calculated fields in Tableau

coupleStatus:IF hasCouple THEN “Couple” ELSE “NoCouple”
 IF coupleInCluster THEN “IN” ELSE “OUT”

Worksheets and Dashboards:

Cluster tree map:

color by “number of records”

Regions filled map:

plot AVG(X) and AVG(Y), turn “Region” to dimension and set as detail. path by “PointOrder”, color by “number of records”. Also set map as background image.

Subtype Pie chart:

color by “Subtype”, angle by “number of records”

Subtype bar chart:

color by “Subtype”, size by “number of records”

hasCouple pie chart:

color by “hasCouple”, angle by “number of records”

coupleInCluster pie chart:

color by “coupleInCluster”, angle by “number of records”

dual axis charts:
  • Donut chart of “hasCouple/Subtype” dual axis
  • Donut chart of “coupleInCluster/Subtype” dual axis


Answer the questions:

  1. What’s the most affected Region, and what’s the least?
  2. Explore subType distribution among regions
  3. For each cluster, do the subtype distribution change?
  4. For each cluster, how many couples get into the same cluster?