Data Visualization using Bokeh package in Python

Tavish Aggarwal

December 13, 2023

In this post, we will look at how to visualize data using the Bokeh package. We will use the house property sales dataset from Kaggle. The data is loaded into the housePropertyDataset variable:

import pandas as pd

housePropertyDataset = pd.read_csv('house_property_sales.csv')

Let's get started.

Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets.

Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

Let's look at sample code that draws a Bokeh visualization:

from bokeh.plotting import figure, show
from bokeh.io import output_file

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price', tools='box_select')

p.circle(x=housePropertyDataset['GarageArea'], y=housePropertyDataset['SalePrice'], size=5,
         selection_color='green', nonselection_alpha=0.1, color='red', alpha=0.5)

output_file('out_sales_price_garbage_area.html')
show(p)

In the example above, we used circle glyphs to build the visualization. Bokeh provides many other glyphs, such as line, cross, and patches.

We also passed the tools argument to the figure function. It accepts a comma-separated string of tool names (here only box_select), and each tool adds an interactive capability to the plot.

Three of the most important glyph arguments are size, color, and alpha. They control the size, the color, and the transparency of the glyphs respectively. The selection_color and nonselection_alpha arguments set how selected and unselected points look once the box select tool is used.
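To make the points about glyphs and tools concrete, here is a minimal sketch that combines two glyph types on one figure and passes several tools at once. The sample values are hypothetical stand-ins for the GarageArea and SalePrice columns, not rows from the dataset.

```python
from bokeh.models import BoxSelectTool
from bokeh.plotting import figure

# Hypothetical sample values standing in for GarageArea and SalePrice.
garage_area = [250, 400, 480, 576, 840]
sale_price = [110000, 145000, 163000, 208500, 320000]

# tools takes a comma-separated string of tool names.
p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price',
           tools='pan,wheel_zoom,box_select,reset')

# A line glyph connecting the points, drawn faintly in the background.
line = p.line(x=garage_area, y=sale_price, color='gray', alpha=0.4)

# A scatter glyph with a square marker; size, color, and alpha behave
# the same way as in the circle example above.
points = p.scatter(x=garage_area, y=sale_price, marker='square',
                   size=10, color='navy', alpha=0.6)
```

Calling show(p) after this renders both glyphs in the same plot, with the listed tools available in the toolbar.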

ColumnDataSource

We can also create a ColumnDataSource from the dataset. ColumnDataSource is the object Bokeh uses to store the data behind a visualization. Once it is passed as the source argument, glyph methods can refer to columns by name. Let's look at an example:

from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.io import output_file

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
source = ColumnDataSource(housePropertyDataset)
p.circle(x='GarageArea', y='SalePrice', source=source)
output_file('out_sales_price_garbage_area_cds.html')
show(p)
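
A ColumnDataSource can be built from a DataFrame, as above, or from a plain dict of equal-length lists. The sketch below shows both and prints the resulting column names. The three rows are hypothetical values, used only so that the snippet runs without the CSV file.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical three-row stand-in for housePropertyDataset.
df = pd.DataFrame({
    'GarageArea': [548, 460, 608],
    'SalePrice': [208500, 181500, 223500],
})

# From a DataFrame: each column becomes a named entry in source.data,
# and the DataFrame index is stored as an additional column.
source_from_df = ColumnDataSource(df)

# From a plain dict that maps column names to equal-length lists.
source_from_dict = ColumnDataSource(data={
    'GarageArea': [548, 460, 608],
    'SalePrice': [208500, 181500, 223500],
})

print(sorted(source_from_df.column_names))
print(sorted(source_from_dict.column_names))
```

Either source can then be passed to a glyph method through the source argument, with x and y given as column names.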

Hover Tool

As discussed earlier, we can pass the tools argument to the figure function to specify the tools we want in the visualization.

The hover tool can be added in a similar way. Here we create a HoverTool object so that we can configure its tooltips:

from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.io import output_file
from bokeh.models import HoverTool

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')

source = ColumnDataSource(housePropertyDataset)

p.circle(x='GarageArea', y='SalePrice', size=10,
         fill_color='grey', alpha=0.1, line_color=None,
         hover_fill_color='firebrick', hover_alpha=0.5,
         hover_line_color='white', source=source)

hover = HoverTool(tooltips=[('Sales Price', '@SalePrice'),
                            ('Condition', '@SaleCondition')])
p.add_tools(hover)

output_file('out_sales_price_garbage_area_ht.html')
show(p)

In the example above, we attach the hover tool with the add_tools method. The hover_* arguments passed to circle highlight the glyph under the cursor, and the tooltip shows the sale price and sale condition of that point.
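Tooltip values are small templates: a name prefixed with @ reads a column of the data source, a name prefixed with $ reads a special field such as the row index, and a format in braces controls how a number is displayed. The following sketch illustrates this on three hypothetical rows. The column names match the dataset, but the values are made up for the example.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical rows; only the column names come from the dataset.
source = ColumnDataSource(data={
    'GarageArea': [548, 460, 608],
    'SalePrice': [208500, 181500, 223500],
    'SaleCondition': ['Normal', 'Abnorml', 'Partial'],
})

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p.scatter(x='GarageArea', y='SalePrice', size=10, source=source)

hover = HoverTool(tooltips=[
    ('Row', '$index'),                  # special field: row number in the source
    ('Garage Area', '@GarageArea'),     # @name reads a column of the source
    ('Sales Price', '@SalePrice{0,0}'), # {0,0} adds thousands separators
    ('Condition', '@SaleCondition'),
])
p.add_tools(hover)
```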

Color Mapping

We can also color glyphs by category with Bokeh, using a color mapper that assigns a color to each category value.

Refer to the example shown below:

from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.io import output_file
from bokeh.models import HoverTool, CategoricalColorMapper

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
source = ColumnDataSource(housePropertyDataset)

# Map each value of the Heating column to a fixed color
color_mapper = CategoricalColorMapper(factors=['GasA', 'GasW', 'Grav', 'Wall', 'OthW', 'Floor'],
                                      palette=['red', 'green', 'orange', 'blue', 'pink', 'yellow'])

# legend_field builds the legend from the Heating column
# (the older legend= keyword is deprecated)
p.circle(x='GarageArea', y='SalePrice', source=source,
         color=dict(field='Heating', transform=color_mapper), legend_field='Heating')

hover = HoverTool(tooltips=[('Heating', '@Heating')], mode='hline')
p.add_tools(hover)
p.legend.location = 'top_right'
p.legend.background_fill_color = 'lightgray'
output_file('out_credit_score.html')
show(p)

In the example above, we defined a CategoricalColorMapper that assigns a color to each category of the Heating column.

We then passed the mapper to the color argument of the glyph method (i.e. circle), as a dict that names the field and the transform to apply to it.

Also, note that we added a legend to the plot and positioned it at the top right. Additionally, we gave it a background color for visibility.
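As an alternative to writing the field and transform dict by hand, Bokeh also offers the factor_cmap helper in bokeh.transform. The sketch below is one possible way to use it, deriving the category list from the data and taking colors from a built-in palette. The six rows are hypothetical values, and the choice of the Category10 palette is an assumption made for the example.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Hypothetical rows; only the column names come from the dataset.
data = {
    'GarageArea': [548, 460, 608, 642, 836, 480],
    'SalePrice': [208500, 181500, 223500, 140000, 250000, 143000],
    'Heating': ['GasA', 'GasA', 'GasW', 'Grav', 'GasA', 'GasW'],
}
source = ColumnDataSource(data=data)

# Derive the categories from the data instead of typing them by hand.
factors = sorted(set(data['Heating']))

# Take as many colors from the ten-color palette as there are categories.
palette = Category10[10][:len(factors)]

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p.scatter(x='GarageArea', y='SalePrice', size=10, source=source,
          color=factor_cmap('Heating', palette=palette, factors=factors),
          legend_field='Heating')
p.legend.location = 'top_left'
```

Deriving the factors from the data avoids a mismatch between the hard-coded category list and the values that actually occur in the column.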

Layouts

We can also place multiple plots in a single HTML file. To do so, we need to define how the plots are laid out. Bokeh provides several functions for defining layouts.

Before defining layouts let's define the figures that we want to plot:

from bokeh.plotting import figure, ColumnDataSource

source = ColumnDataSource(housePropertyDataset)

p1 = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p1.circle(x='GarageArea', y='SalePrice', source=source)

p2 = figure(x_axis_label='Overall Quality', y_axis_label='Sales Price')
p2.circle(x='OverallQual', y='SalePrice', source=source)

p3 = figure(x_axis_label='Total Basement Area', y_axis_label='Sales Price')
p3.circle(x='TotalBsmtSF', y='SalePrice', source=source)

p4 = figure(x_axis_label='Year Remod Add', y_axis_label='Sales Price')
p4.x(x='YearRemodAdd', y='SalePrice', source=source)

Now that the figures are defined, let's arrange them using layouts.

Rows and columns

We can arrange figures in rows and columns. Let's look at an example:

from bokeh.layouts import column, row
from bokeh.io import output_file, show

layout = column(row(p1, p2), row(p3, p4))
output_file('out_column_row.html')
show(layout)

As you can see in the output, the four plots are arranged in two rows stacked in a single column.
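Rows and columns can be nested freely, which is useful when the arrangement is not a regular grid. The sketch below puts one plot on top and two plots side by side underneath. The small_plot helper and its data are hypothetical and exist only to keep the example self-contained.

```python
from bokeh.layouts import column, row
from bokeh.plotting import figure


def small_plot(title):
    """Hypothetical helper: build a tiny scatter plot with the given title."""
    p = figure(title=title)
    p.scatter(x=[1, 2, 3], y=[3, 1, 2], size=8)
    return p


top = small_plot('Top')
bottom_left = small_plot('Bottom left')
bottom_right = small_plot('Bottom right')

# One plot on the first line, two plots side by side on the second.
layout = column(top, row(bottom_left, bottom_right))
```

Passing layout to show renders the three plots in that arrangement.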

Gridplots

Instead of nesting rows and columns, we can use gridplot, which gives a similar result and is easier to define.

from bokeh.layouts import gridplot
from bokeh.io import output_file, show

row1 = [p1, p2]
row2 = [p3, p4]

layout = gridplot([row1, row2])

output_file('out_grid_output.html')
show(layout)
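
gridplot also accepts a flat list of plots together with an ncols argument, which saves building the rows by hand. The following is a minimal sketch with three hypothetical placeholder plots. With ncols=2 the third plot wraps onto a second row.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

# Three hypothetical placeholder plots.
plots = []
for i in range(3):
    p = figure(title='Plot {}'.format(i + 1))
    p.scatter(x=[1, 2, 3], y=[i, i + 1, i + 2], size=8)
    plots.append(p)

# A flat list plus ncols: plots fill the grid row by row, two per row.
grid = gridplot(plots, ncols=2)
```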

Tabbed Layout

We can also create a tabbed view, which shows one plot at a time and lets the reader switch between plots.

from bokeh.io import output_file, show
from bokeh.models import Tabs
try:
    from bokeh.models import TabPanel as Panel  # Bokeh 3.0 renamed Panel to TabPanel
except ImportError:
    from bokeh.models import Panel  # older Bokeh versions

# Create tab1 from plot p1
tab1 = Panel(child=p1, title='Garage Area vs Sales Price')

# Create tab2 from plot p2
tab2 = Panel(child=p2, title='Overall Quality vs Sales Price')

# Create tab3 from plot p3
tab3 = Panel(child=p3, title='Total Basement Area vs Sales Price')

# Create tab4 from plot p4
tab4 = Panel(child=p4, title='Year Remod Add vs Sales Price')

# Create a Tabs layout: layout
layout = Tabs(tabs=[tab1, tab2, tab3, tab4])

# Specify the name of the output_file and show the result
output_file('out_tabs_output.html')
show(layout)

Linked Plots

Now that we have plotted individual plots, let's see how to link two or more plots together. Here we will discuss linked axes and linked selection (also known as linked brushing).

Linked Axes

Plots with linked axes share their x-axis or y-axis range with other plots, so panning or zooming one plot moves the others along the shared axis. Consider the example shown below:

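from bokeh.io import output_file, show
from bokeh.layouts import gridplot

# Make p1 use p2's y-range object so both plots pan and zoom together vertically
p1.y_range = p2.y_range
row1 = [p1, p2]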

layout = gridplot([row1])

output_file('out_shared_axes_output.html')
show(layout)

This example uses the figures defined in the Layouts section. As you can see in the output, the y-axis is shared between the two plots.

In a similar way, we can share the x-axis, or both the x-axis and the y-axis.
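Sharing both axes can also be set up when the second figure is created, by passing the first figure's range objects to it. The sketch below shows this under the assumption of two small hypothetical plots. The names left and right are illustrative only.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

left = figure(title='Left')
left.scatter(x=[1, 2, 3, 4], y=[4, 2, 3, 1], size=8)

# Reusing the range objects of the first figure links both axes.
right = figure(title='Right', x_range=left.x_range, y_range=left.y_range)
right.scatter(x=[1, 2, 3, 4], y=[1, 3, 2, 4], size=8)

layout = gridplot([[left, right]])
```

Because both figures hold the same range objects, panning or zooming either plot updates the other in both directions.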

Linked Selection

When a region is selected with a selection tool in one plot, the corresponding points are selected in all the other plots. This is called linked selection.

To achieve linked selection, the figures must share the same data source (i.e. the same ColumnDataSource object). The figures in the Layouts section already share one source, so selections are linked between them as soon as a selection tool such as box_select is added to their tools.
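Here is a minimal sketch of linked selection, assuming two figures that plot different columns of one shared ColumnDataSource. The rows are hypothetical values. The important parts are the shared source and the selection tools passed through the tools argument.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical rows; only the column names come from the dataset.
source = ColumnDataSource(data={
    'GarageArea': [548, 460, 608, 642],
    'TotalBsmtSF': [856, 1262, 920, 756],
    'SalePrice': [208500, 181500, 223500, 140000],
})

# Selection tools must be requested explicitly.
tools = 'box_select,lasso_select,reset'

left = figure(tools=tools, x_axis_label='Garage Area', y_axis_label='Sales Price')
left.scatter(x='GarageArea', y='SalePrice', size=10, source=source,
             selection_color='green', nonselection_alpha=0.1)

right = figure(tools=tools, x_axis_label='Total Basement Area', y_axis_label='Sales Price')
right.scatter(x='TotalBsmtSF', y='SalePrice', size=10, source=source,
              selection_color='green', nonselection_alpha=0.1)

layout = gridplot([[left, right]])
```

Selecting points in either plot highlights the same rows in the other, because the selection is stored on the shared source rather than on an individual plot.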

Summary

In this post, we explored creating visualizations with the Bokeh package. We used features such as the hover tool and color mapping to build Bokeh plots. With that basic understanding, we looked at techniques for combining multiple plots into a single output. Then we saw how to link plots using linked axes and linked selection.

Hope you like the post. Don't forget to leave your opinions in the comment section below.
