t-SNE, PCA and other dimensionality reduction algorithms
have always been a good way to visualize high-dimensional datasets.
Unfortunately, most of the time, the visualization stops there. If
another view of the data is needed, a new chart needs to be computed,
projected and displayed.
Libraries such as Bokeh have been a great help in building interactive
charts, but the analysis
dimension has always been a bit neglected. Last weekend, I took a few
hours to glue some Bokeh components together and built an interactive
visualization explorer: Chart Miner.
The tool is available on GitHub, where you're free to download, clone
or fork it.
Chart Miner takes a JSON file as input; this file describes the
dataset to be displayed. The format is pretty simple: every data point
has a unique identifier (id), a name (title), coordinates (x and y)
and a bunch of attributes, each having a name and a list of values.
The attributes are the variables that can be filtered, used as
dimensions, etc.
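As a rough illustration of that format, here is a minimal Python sketch
that builds two data points and serializes them to JSON. The keys "id",
"title", "x", "y" and "values" come from the description above. The
"attributes" and "name" keys, the top-level list, the make_point helper
and the sample coordinates are assumptions made for this example, so
check them against the sample file in the repo before relying on them.

```python
import json


def make_point(point_id, title, x, y, attributes):
    """Build one data-point record (hypothetical helper).

    `attributes` maps an attribute name to its list of values.
    Assumption: attributes are stored under an "attributes" key as a
    list of {"name": ..., "values": [...]} objects.
    """
    return {
        "id": point_id,
        "title": title,
        "x": x,
        "y": y,
        "attributes": [
            {"name": name, "values": list(values)}
            for name, values in attributes.items()
        ],
    }


# Placeholder data: the coordinates would normally come from a
# projection such as t-SNE or PCA.
points = [
    make_point(1, "name of point 1", 0.12, -1.30,
               {"tags": ["tag 1", "tag 2", "tag 3", "tag 4"]}),
    make_point(2, "name of point 2", 2.45, 0.87,
               {"tags": ["tag 2", "tag 5"]}),
]

# Serialize the whole dataset as a single JSON document.
payload = json.dumps(points, indent=2)
print(payload)
```

Writing payload to a file would give something in the spirit of the
input Chart Miner expects, provided the assumed key names match.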
An input data file would look like this (see the
file in the repo):
"title":"name of point 1",
"values" : ["tag 1", "tag 2", "tag 3", "tag 4"]
Chart Miner has many dependencies; most of them are tied to the Bokeh
toolkit. Running "pip install bokeh" should install Bokeh and its
related dependencies. Once those dependencies are resolved, the next
step is to launch the Bokeh server instance (using "bokeh serve") from
the code repository. Then you can launch Chart Miner by executing it
from the command line:
This will launch a web browser window and open a page with two visible
panes as shown in the above screenshot.
The left pane shows the data points shaded by the dimension (or
attribute) chosen in the drop-down menu underneath the chart. When you
select data points with the selection tools available in the chart
menu, their attributes are displayed in the table in the right pane.
The right pane contains two sections. The first section is an
attribute selection drop-down menu that restricts the table to a given
dimension of the points. The threshold field is used to filter out
unimportant data that can clutter the table. The second section is the
table that counts the values of the dimension selected in the
drop-down menu.
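The counting and threshold behaviour of the right pane can be sketched
in a few lines of plain Python. This is not Chart Miner's actual code:
the function name, the record layout (an "attributes" list of "name" /
"values" objects) and the reading of the threshold as a minimum count
are all assumptions made for illustration.

```python
from collections import Counter


def count_attribute_values(points, selected_ids, attribute, threshold=1):
    """Count values of one attribute among the selected points.

    Assumption: `threshold` is a minimum count; values seen fewer
    times than that are dropped from the table.
    """
    selected = set(selected_ids)
    counts = Counter()
    for point in points:
        if point["id"] not in selected:
            continue
        for attr in point["attributes"]:
            if attr["name"] == attribute:
                # Count each value at most once per point.
                counts.update(set(attr["values"]))
    # Most frequent values first, keeping only those at or above the threshold.
    return [(value, n) for value, n in counts.most_common() if n >= threshold]


# Placeholder records, for illustration only.
points = [
    {"id": 1, "attributes": [{"name": "tags", "values": ["tag 1", "tag 2"]}]},
    {"id": 2, "attributes": [{"name": "tags", "values": ["tag 2", "tag 3"]}]},
    {"id": 3, "attributes": [{"name": "tags", "values": ["tag 2", "tag 1"]}]},
]

# Points 1 and 2 are "selected"; only values seen at least twice are kept.
rows = count_attribute_values(points, [1, 2], "tags", threshold=2)
print(rows)  # [('tag 2', 2)]
```

Raising the threshold shortens the table to the values shared by many
of the selected points, which is the decluttering effect described
above.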
At my day job, we've been using this tool as a visual aid to understand
our video recommendation data. Chart Miner gives us a bird's-eye view
of our catalog and helps us optimize the recommendations. While
exploration was possible before this tool, it was more cumbersome and
introduced a lag: every time a chart was shown, more questions were
raised, and new charts had to be computed before they could be
communicated to the leadership.
Now we use this tool. It's a great conversation starter and enables us
to raise - and answer - questions as they arise.