Data Visualization using Bokeh package in Python

Tavish Aggarwal

In the previous post, Data Visualization using Python, we learned about various techniques to plot data and saw the visualizations provided by the Seaborn and matplotlib packages. Continuing with data visualization, in this post we will look at how to visualize data using the Bokeh package.

In the examples below, we will use the house property sales dataset from Kaggle. The data has been loaded into the housePropertyDataset variable:

import pandas as pd

housePropertyDataset = pd.read_csv('house_property_sales.csv')

Let's get started.

Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets.

Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

Let's look at sample code to draw a Bokeh visualization:

from bokeh.plotting import figure, show
from bokeh.io import output_file

# box_select lets the user drag a rectangle to select points
p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price', tools='box_select')

# Red, half-transparent circles; selected points turn green, unselected ones fade out
p.circle(x=housePropertyDataset['GarageArea'], y=housePropertyDataset['SalePrice'], size=5,
         selection_color='green', nonselection_alpha=0.1, color='red', alpha=0.5)

output_file('out_sales_price_garage_area.html')
show(p)


In the example above, we used circle glyphs to generate the visualization. Bokeh provides many other glyphs, such as line, cross, and patches.
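As an illustration, here is a minimal sketch that draws a line glyph, cross markers, and a patches glyph on one figure. The small DataFrame is made-up sample data standing in for housePropertyDataset (only the column names match the Kaggle dataset). It uses `scatter(marker='cross')` rather than a marker-specific method, since recent Bokeh releases recommend `scatter` for markers.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical sample rows; only the column names mirror the Kaggle dataset
sample = pd.DataFrame({
    'YearRemodAdd': [1990, 1995, 2000, 2005, 2010],
    'SalePrice': [120000, 135000, 160000, 190000, 210000],
})
source = ColumnDataSource(sample)

p = figure(x_axis_label='Year Remod Add', y_axis_label='Sales Price')

# A line glyph connecting the points in row order
p.line(x='YearRemodAdd', y='SalePrice', source=source, line_width=2)

# Marker glyphs drawn with scatter(); 'cross' is one of Bokeh's built-in markers
p.scatter(x='YearRemodAdd', y='SalePrice', source=source,
          marker='cross', size=12, color='firebrick')

# patches takes one list of x values and one list of y values per polygon
p.patches(xs=[[1990, 2000, 1995]], ys=[[120000, 120000, 150000]],
          fill_alpha=0.3, color='green')
```

Passing `p` to `show()` renders all three glyphs on the same axes.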

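We have also passed the tools argument to the figure function. It takes a comma-separated string of tool names (for example 'box_select' or 'pan,wheel_zoom,box_select'), and each tool adds an interactive capability to the plot. Here, box_select lets us drag a rectangle to select points.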

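Three of the most important glyph arguments are size, color, and alpha, which control the size of the glyphs, their color, and their transparency respectively. The selection_color and nonselection_alpha arguments control how the selected and unselected points look after a box selection.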
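The selection_* and nonselection_* arguments work by creating separate glyphs that Bokeh swaps in when points are selected. The sketch below, using made-up values instead of the dataset, inspects them on the renderer returned by the glyph method. It assumes Bokeh's usual behaviour of expanding color and alpha into the fill and line properties of each glyph.

```python
from bokeh.plotting import figure

# Hypothetical values standing in for GarageArea and SalePrice
garage_area = [300, 450, 500, 620, 800]
sale_price = [110000, 150000, 165000, 200000, 260000]

p = figure(tools='box_select')

# scatter() returns the GlyphRenderer that holds the normal, selected and unselected glyphs
renderer = p.scatter(x=garage_area, y=sale_price, size=5,
                     color='red', alpha=0.5,     # normal appearance
                     selection_color='green',    # appearance of selected points
                     nonselection_alpha=0.1)     # appearance of the remaining points

print(renderer.glyph.fill_color, renderer.glyph.fill_alpha)
print(renderer.selection_glyph.fill_color)
print(renderer.nonselection_glyph.fill_alpha)
```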

ColumnDataSource

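We can also create a ColumnDataSource from the dataset. A ColumnDataSource is Bokeh's container for plot data: it maps column names to columns of values, so once a glyph is given source=source, its x and y arguments can simply name columns. Let's look at an example shown below: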

from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.io import output_file

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')

# Wrap the DataFrame; each of its columns becomes a named field of the source
source = ColumnDataSource(housePropertyDataset)

# x and y now refer to column names in the source
p.circle(x='GarageArea', y='SalePrice', source=source)
output_file('out_sales_price_garage_area_cds.html')
show(p)

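To see what the source actually holds, here is a short sketch built from a small hypothetical DataFrame with the dataset's column names. Each DataFrame column becomes a key in `source.data`, and the DataFrame index is added as an extra 'index' column. A source can also be built directly from a plain dict of equal-length lists.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical rows; only the column names mirror the Kaggle dataset
df = pd.DataFrame({
    'GarageArea': [400, 500, 600],
    'SalePrice': [150000, 180000, 220000],
})

source = ColumnDataSource(df)

# DataFrame columns become keys of source.data; the unnamed index becomes 'index'
print(sorted(source.data.keys()))

# The same kind of source built from a plain dict of equal-length lists
manual = ColumnDataSource(data={'GarageArea': [400, 500], 'SalePrice': [150000, 180000]})
print(sorted(manual.data.keys()))
```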

Hover Tool

As discussed earlier, we can pass the tools option to the figure function to specify the tools that we want to use in our visualization.

Tools can also be added to an existing figure. Let's use this to add a hover tool:

from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.io import output_file
from bokeh.models import HoverTool

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')

source = ColumnDataSource(housePropertyDataset)

# Faint grey points that turn firebrick red while the mouse is over them
p.circle(x='GarageArea', y='SalePrice', size=10,
         fill_color='grey', alpha=0.1, line_color=None,
         hover_fill_color='firebrick', hover_alpha=0.5,
         hover_line_color='white', source=source)

# Each tooltip row is (label, '@column'), showing that column's value for the hovered point
hover = HoverTool(tooltips=[('sales price', '@SalePrice'),
                            ('Condition', '@SaleCondition')])
p.add_tools(hover)

output_file('out_sales_price_garage_area_ht.html')
show(p)


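In the example shown above, we create a HoverTool and attach it to the figure with the add_tools method. The hover_fill_color, hover_alpha, and hover_line_color arguments highlight the point under the mouse, and the tooltips show the values of the SalePrice and SaleCondition columns for that point, referenced with the @column syntax.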
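Tooltip fields are not limited to raw column values. As a sketch (column names follow the dataset, the rows are hypothetical), the tooltip below also shows the row number through Bokeh's special `$index` field, formats the price with a thousands separator using the `{0,0}` number format, and restricts the tool to a single renderer with the `renderers` argument.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical rows; only the column names mirror the Kaggle dataset
source = ColumnDataSource(data={
    'GarageArea': [300, 480, 640],
    'SalePrice': [125000, 180000, 245000],
    'SaleCondition': ['Normal', 'Abnorml', 'Partial'],
})

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
points = p.scatter(x='GarageArea', y='SalePrice', size=10, source=source)

hover = HoverTool(
    renderers=[points],                        # only react to this renderer
    tooltips=[
        ('row', '$index'),                     # special field: index of the hovered point
        ('sales price', '@SalePrice{0,0}'),    # column value with a thousands separator
        ('condition', '@SaleCondition'),
    ],
)
p.add_tools(hover)
```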

Color Mapping

We can also create color-mapped plots with the Bokeh package, where each glyph is colored according to its category.

Refer to the example shown below:

from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.io import output_file
from bokeh.models import HoverTool, CategoricalColorMapper

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
source = ColumnDataSource(housePropertyDataset)

# Assign a fixed color to each heating type found in the Heating column
color_mapper = CategoricalColorMapper(factors=['GasA', 'GasW', 'Grav', 'Wall', 'OthW', 'Floor'],
                                      palette=['red', 'green', 'orange', 'blue', 'pink', 'yellow'])

# Color each point through the mapper; legend_field creates one legend entry per category
p.circle(x='GarageArea', y='SalePrice', source=source,
         color=dict(field='Heating', transform=color_mapper), legend_field='Heating')

hover = HoverTool(tooltips=[('Heating', '@Heating')], mode='hline')
p.add_tools(hover)
p.legend.location = 'top_right'
p.legend.background_fill_color = 'lightgray'
output_file('out_sales_price_heating.html')
show(p)


In the example shown above, we have defined a categorical color mapper that assigns a color to each category of the Heating column.

After defining the categorical color mapper, we passed it to the color argument of the glyph function (i.e. the circle function), so each point is colored according to its heating type.

Also note that we have added a legend to the plot and positioned it at the top right. Additionally, we have given the legend a background color for visibility.
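Hard-coding the factors means any heating type missing from the list is drawn in the mapper's nan_color (grey by default). One alternative, sketched below under the assumption that Bokeh's built-in Category10 palette is acceptable and that there are at most 10 categories, derives the factors from the column itself. The DataFrame is a hypothetical stand-in with the dataset's column names.

```python
import pandas as pd
from bokeh.models import CategoricalColorMapper, ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure
from bokeh.transform import transform

# Hypothetical rows; only the column names mirror the Kaggle dataset
df = pd.DataFrame({
    'GarageArea': [460, 520, 300, 610, 280],
    'SalePrice': [180000, 210000, 95000, 240000, 88000],
    'Heating': ['GasA', 'GasA', 'Grav', 'GasW', 'Wall'],
})

# One factor per distinct heating type actually present in the data
factors = sorted(df['Heating'].unique())

# Category10 maps palette size to colors; take the first len(factors) of the 10-color palette
color_mapper = CategoricalColorMapper(factors=factors,
                                      palette=list(Category10[10][:len(factors)]))

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
# transform() attaches the mapper to the Heating column, like the dict used above
p.scatter(x='GarageArea', y='SalePrice', source=ColumnDataSource(df),
          color=transform('Heating', color_mapper), legend_field='Heating')
```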

Layouts

We can also place multiple plots in a single HTML file. To do so, we need to define how the plots are laid out. Bokeh provides various functions for defining layouts.

Before defining layouts, let's define the figures that we want to plot:

from bokeh.plotting import figure, ColumnDataSource

# All four figures read from the same data source
source = ColumnDataSource(housePropertyDataset)

p1 = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p1.circle(x='GarageArea', y='SalePrice', source=source)

p2 = figure(x_axis_label='Overall Quality', y_axis_label='Sales Price')
p2.circle(x='OverallQual', y='SalePrice', source=source)

p3 = figure(x_axis_label='Total Basement Area', y_axis_label='Sales Price')
p3.circle(x='TotalBsmtSF', y='SalePrice', source=source)

p4 = figure(x_axis_label='Year Remod Add', y_axis_label='Sales Price')
p4.x(x='YearRemodAdd', y='SalePrice', source=source)

Now that the figures are defined, let's define layouts to plot them.

Rows and columns

We can arrange figures in rows and columns. Let's look at the example shown below:

from bokeh.layouts import column, row
from bokeh.io import output_file, show

# Two rows of two figures, stacked in a single column
layout = column(row(p1, p2), row(p3, p4))
output_file('out_column_row.html')
show(layout)


As you can see in the output, the four plots are arranged in two rows of two.

Gridplots

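Instead of nesting rows and columns, we can use gridplot, which gives a similar result and is easier to define: it takes a list of rows, each of which is a list of figures. By default, gridplot also merges the toolbars of all the figures into a single toolbar.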

from bokeh.layouts import gridplot
from bokeh.io import output_file, show

# Each inner list is one row of the grid
row1 = [p1, p2]
row2 = [p3, p4]

layout = gridplot([row1, row2])

output_file('out_grid_output.html')
show(layout)

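gridplot can also take a flat list of figures together with an ncols argument and wrap them into rows. The sketch below uses four small placeholder figures in place of p1 to p4 so that it runs on its own.

```python
from bokeh.layouts import gridplot
from bokeh.models import LayoutDOM
from bokeh.plotting import figure

# Four placeholder figures standing in for p1..p4 from the Layouts section
figures = []
for i in range(4):
    f = figure(title=f'Plot {i + 1}')
    f.scatter(x=[1, 2, 3], y=[i, i + 1, i + 2])
    figures.append(f)

# A flat list plus ncols=2 gives the same 2x2 arrangement as [[p1, p2], [p3, p4]]
grid = gridplot(figures, ncols=2)
```

Passing `grid` to `show()` renders the four plots in two rows of two.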

Tabbed Layout

We can also create a tabbed view of the plots, where each plot is shown on its own tab, for a better user experience.

# TabPanel is the Bokeh 3.x name; before Bokeh 3.0 it was called Panel (bokeh.models.widgets)
from bokeh.models import Tabs, TabPanel
from bokeh.io import output_file, show

# Create tab1 from plot p1: tab1
tab1 = TabPanel(child=p1, title='Garage Area vs Sales Price')

# Create tab2 from plot p2: tab2
tab2 = TabPanel(child=p2, title='Overall Quality vs Sales Price')

# Create tab3 from plot p3: tab3
tab3 = TabPanel(child=p3, title='Total Basement Area vs Sales Price')

# Create tab4 from plot p4: tab4
tab4 = TabPanel(child=p4, title='Year Remod Add vs Sales Price')

# Create a Tabs layout: layout
layout = Tabs(tabs=[tab1, tab2, tab3, tab4])

# Specify the name of the output_file and show the result
output_file('out_tabs_output.html')
show(layout)

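Bokeh 3.0 renamed the tab class from Panel to TabPanel, so the old name cannot be imported there. If a script has to run on both Bokeh 2.x and 3.x, one option is to import whichever name exists. In the sketch below, the helper make_tabs and the two placeholder figures are hypothetical, not part of Bokeh.

```python
from bokeh.plotting import figure

try:
    from bokeh.models import TabPanel, Tabs            # Bokeh 3.x
except ImportError:
    from bokeh.models import Panel as TabPanel, Tabs   # Bokeh 2.x


def make_tabs(figures_by_title):
    """Hypothetical helper: one tab per (title, figure) pair, in insertion order."""
    return Tabs(tabs=[TabPanel(child=fig, title=title)
                      for title, fig in figures_by_title.items()])


# Placeholder figures standing in for p1 and p2 from the Layouts section
p_garage = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p_garage.scatter(x=[300, 500], y=[120000, 200000])
p_quality = figure(x_axis_label='Overall Quality', y_axis_label='Sales Price')
p_quality.scatter(x=[5, 8], y=[130000, 260000])

layout = make_tabs({'Garage Area vs Sales Price': p_garage,
                    'Overall Quality vs Sales Price': p_quality})
```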

Linked Plots

Now that we have plotted individual figures, it's time to see how to establish a relationship between two or more plots. Here we will discuss linked axes and linked selection (also called linked brushing).

Linked Axes

Linked axes plots share their x-axis or y-axis range with other plots, so panning or zooming in one plot updates the others as well. Consider the example shown below:

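from bokeh.layouts import gridplot
from bokeh.io import output_file, show

# Give p1 the same y-range object as p2, so both plots share the y-axis
p1.y_range = p2.y_range
row1 = [p1, p2]

layout = gridplot([row1])

output_file('out_shared_axes_output.html')
show(layout)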


In this example, we reuse the figures defined in the Layouts section. As you can observe in the output, the y-axis is shared between the two plots.

In the same way, we can share the x-axis (through x_range), or both axes.
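Besides assigning a range after a figure is created, ranges can be shared at creation time by passing another figure's range objects to figure(). A minimal sketch with placeholder values, linking both axes:

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

# Hypothetical values standing in for the dataset columns
p_left = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p_left.scatter(x=[300, 500, 700], y=[120000, 180000, 250000])

# Reusing p_left's range objects links panning and zooming on both axes
p_right = figure(x_range=p_left.x_range, y_range=p_left.y_range,
                 x_axis_label='Garage Area', y_axis_label='Sales Price')
p_right.scatter(x=[350, 550, 650], y=[110000, 190000, 230000], marker='triangle')

layout = gridplot([[p_left, p_right]])
```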

Linked Selection

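When a region is selected with a selection tool in one of the plots, the same data points are selected in all the other plots. This is called linked selection.

To achieve linked selection, the figures must share the same data source (i.e. the same ColumnDataSource object), and each figure needs a selection tool such as box_select or lasso_select. The figures p1 to p4 from the Layouts section already share one source, but they use Bokeh's default tools, which do not include a selection tool; add one through the tools argument, as in the first example, to see the selection linked across plots.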
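A minimal sketch of linked selection, assuming small hypothetical data with the dataset's column names: both figures read from one ColumnDataSource and include selection tools, so selecting points in either plot highlights the same rows in the other.

```python
from bokeh.layouts import gridplot
from bokeh.models import BoxSelectTool, ColumnDataSource
from bokeh.plotting import figure

# Hypothetical rows; only the column names mirror the Kaggle dataset
source = ColumnDataSource(data={
    'GarageArea': [300, 480, 640, 720],
    'OverallQual': [5, 6, 8, 9],
    'SalePrice': [125000, 180000, 245000, 310000],
})

# Selection tools are not in Bokeh's default toolbar, so request them explicitly
TOOLS = 'box_select,lasso_select,reset'

p_garage = figure(tools=TOOLS, x_axis_label='Garage Area', y_axis_label='Sales Price')
r_garage = p_garage.scatter(x='GarageArea', y='SalePrice', source=source, size=8)

p_quality = figure(tools=TOOLS, x_axis_label='Overall Quality', y_axis_label='Sales Price')
r_quality = p_quality.scatter(x='OverallQual', y='SalePrice', source=source, size=8)

# Both renderers share one data source, so a selection in one plot appears in the other
layout = gridplot([[p_garage, p_quality]])
```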

Summary

In this post, we explored creating visualizations with the Bokeh package. We used features such as the hover tool and color mapping to build Bokeh plots. With a basic understanding of Bokeh plots, we then looked at techniques to combine multiple plots into a single output. Finally, we saw how to create linked plots using techniques such as linked axes and linked selection.

I hope you liked the post. Don't forget to leave your opinions in the comment section below.

Author Info

Tavish Aggarwal

Website: http://tavishaggarwal.com

Tavish Aggarwal is a front-end developer working in Hyderabad. He is very passionate about technology and loves to work in a team.