Data Visualization Using the Bokeh Package in Python
In the previous post, Data Visualization using Python, we learned about various techniques for plotting data and saw the visualizations provided by the Seaborn and matplotlib packages. Continuing with data visualization, in this post we will look at how to visualize data using the Bokeh package.
In the examples shown below, we will use the house property sales dataset from Kaggle. The data has been loaded into a pandas DataFrame named housePropertyDataset:

    import pandas as pd

    housePropertyDataset = pd.read_csv('house_property_sales.csv')
Let's get started.
Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets.
Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
Let's look at sample code that draws a Bokeh visualization:

    from bokeh.plotting import figure, show
    from bokeh.io import output_file

    # Figure with labelled axes; box_select enables rectangular selection
    p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price', tools='box_select')

    # One red circle per house; selected points turn green, the rest fade out.
    # Note: Bokeh 3.4+ deprecates circle() with size in favour of scatter().
    p.circle(x=housePropertyDataset['GarageArea'], y=housePropertyDataset['SalePrice'],
             size=5, selection_color='green', nonselection_alpha=0.1,
             color='red', alpha=0.5)

    output_file('out_sales_price_garage_area.html')
    show(p)
In the example above, we used circle glyphs to generate the visualization. The Bokeh package provides many other glyphs, such as line, cross, and patches.
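To make the other glyphs concrete, here is a minimal sketch that draws a line, cross markers, and two patches on one figure. The coordinates are toy values made up for illustration and are not taken from the house sales dataset. The cross markers are drawn with scatter(marker='cross'), which is one of several ways to get them.

```python
from bokeh.plotting import figure

# Toy coordinates, made up purely for illustration
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

p = figure(x_axis_label='x', y_axis_label='y')

# line glyph: connects the points in the order given
line_renderer = p.line(x=x, y=y, line_width=2, color='navy')

# cross markers drawn at the same points
cross_renderer = p.scatter(x=x, y=y, marker='cross', size=12, color='red')

# patches glyph: each inner list describes the corners of one polygon
patch_renderer = p.patches(xs=[[1, 2, 2, 1], [4, 5, 5, 4]],
                           ys=[[1, 1, 2, 2], [1, 1, 2, 2]],
                           fill_color='green', fill_alpha=0.3)
```

Each glyph method returns a renderer, and all three end up on the same figure, so calling show(p) would display them layered on top of one another.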
We also passed the tools argument to the figure function. The tools parameter takes a comma-separated string of tool names, each of which adds an extra capability to the plot.
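As a small sketch of the comma-separated form, the snippet below requests four tools at once and then lists the tool objects that Bokeh attached to the figure. The particular tool names chosen here are just an illustration.

```python
from bokeh.plotting import figure

# Request several tools in one comma-separated string
p = figure(tools='pan,wheel_zoom,box_select,reset')

# Each name becomes a tool object on the figure's toolbar
tool_names = [type(tool).__name__ for tool in p.toolbar.tools]
print(tool_names)
```

Inspecting p.toolbar.tools like this is a quick way to confirm which tools a figure actually carries.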
Three of the most important arguments used in the plot are size, color, and alpha. These control the size, color, and transparency of the glyphs, respectively.
We can also create a ColumnDataSource from the dataset. A ColumnDataSource is Bokeh's way of storing the data for a visualization; glyph methods can then refer to its columns by name. Let's look at an example:

    from bokeh.plotting import figure, show, ColumnDataSource
    from bokeh.io import output_file

    p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')

    # Wrap the DataFrame so glyphs can refer to its columns by name
    source = ColumnDataSource(housePropertyDataset)
    p.circle(x='GarageArea', y='SalePrice', source=source)

    output_file('out_sales_price_garage_area_cds.html')
    show(p)
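To see what a ColumnDataSource holds, here is a minimal sketch that builds one from a plain dictionary instead of the DataFrame. The three rows are toy values made up for illustration; only the column names mirror the dataset.

```python
from bokeh.models import ColumnDataSource

# Toy values for illustration only; the column names mirror the dataset
data = {
    'GarageArea': [300, 450, 600],
    'SalePrice': [120000, 180000, 240000],
}

source = ColumnDataSource(data=data)

# The source exposes its columns by name, which is what lets a glyph call
# say x='GarageArea' instead of passing the values directly
print(source.column_names)
print(source.data['SalePrice'])
```

When a DataFrame is passed instead of a dictionary, its index is stored as an additional column alongside the data columns.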
As discussed earlier, we can pass the tools option to figure to specify the tools we want in our visualization.
In a similar way, we can use the hover tool as well:

    from bokeh.plotting import figure, show, ColumnDataSource
    from bokeh.io import output_file
    from bokeh.models import HoverTool

    p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
    source = ColumnDataSource(housePropertyDataset)

    # Faint grey circles that turn firebrick red when the cursor is over them
    p.circle(x='GarageArea', y='SalePrice', size=10,
             fill_color='grey', alpha=0.1, line_color=None,
             hover_fill_color='firebrick', hover_alpha=0.5,
             hover_line_color='white', source=source)

    # Tooltip rows: (label, '@column_name')
    hover = HoverTool(tooltips=[('sales price', '@SalePrice'),
                                ('Condition', '@SaleCondition')])
    p.add_tools(hover)

    output_file('out_sales_price_garage_area_ht.html')
    show(p)
In the example above, we used the add_tools method to attach the hover tool to the figure. The plot now has hover functionality: the glyph under the cursor is highlighted, and a tooltip shows its sale price and sale condition.
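Tooltips can show more than raw column values. The sketch below, built on three toy rows made up for illustration, adds the row index through the special $index variable and formats the price with a thousands separator using a {0,0} format suffix.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Toy rows for illustration only
source = ColumnDataSource(data={
    'GarageArea': [300, 450, 600],
    'SalePrice': [120000, 180000, 240000],
    'SaleCondition': ['Normal', 'Abnorml', 'Partial'],
})

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p.scatter(x='GarageArea', y='SalePrice', size=10, source=source)

hover = HoverTool(tooltips=[
    ('row', '$index'),                     # special variable: index of the hovered point
    ('sales price', '@SalePrice{0,0}'),    # column value with a thousands separator
    ('condition', '@SaleCondition'),       # plain column value
])
p.add_tools(hover)
```

Fields starting with @ refer to columns of the data source, while fields starting with $ refer to special values that Bokeh supplies.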
We can also create a color-mapped plot with the Bokeh package, where each glyph is colored according to its category.
Refer to the example below:

    from bokeh.plotting import figure, show, ColumnDataSource
    from bokeh.io import output_file
    from bokeh.models import HoverTool, CategoricalColorMapper

    p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
    source = ColumnDataSource(housePropertyDataset)

    # Map each heating type to a fixed color
    color_mapper = CategoricalColorMapper(
        factors=['GasA', 'GasW', 'Grav', 'Wall', 'OthW', 'Floor'],
        palette=['red', 'green', 'orange', 'blue', 'pink', 'yellow'])

    # legend_field replaces the older legend= keyword (deprecated since Bokeh 1.4)
    p.circle(x='GarageArea', y='SalePrice', source=source,
             color=dict(field='Heating', transform=color_mapper),
             legend_field='Heating')

    hover = HoverTool(tooltips=[('Heating', '@Heating')], mode='hline')
    p.add_tools(hover)

    p.legend.location = 'top_right'
    p.legend.background_fill_color = 'lightgray'

    output_file('out_credit_score.html')
    show(p)
In the example above, we defined a categorical color mapper that assigns a color to each category.
After defining the mapper, we passed it to the color argument of the glyph function (i.e. the circle function).
Also note that we added a legend to the plot and positioned it at the top right. Additionally, we gave the legend a background color for visibility.
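Hard-coding the category list works when the categories are known in advance. As an alternative sketch, the factors can be derived from the data itself. The rows below are toy values made up for illustration, and the approach assumes there are no more categories than colors in the list.

```python
from bokeh.models import CategoricalColorMapper, ColumnDataSource
from bokeh.plotting import figure

# Toy rows for illustration only
data = {
    'GarageArea': [300, 450, 600, 520],
    'SalePrice': [120000, 180000, 240000, 200000],
    'Heating': ['GasA', 'GasW', 'GasA', 'Grav'],
}
source = ColumnDataSource(data=data)

# Derive the categories from the column instead of typing them by hand
factors = sorted(set(data['Heating']))

# Take as many colors as there are categories (assumes enough colors are listed)
colors = ['red', 'green', 'orange', 'blue', 'pink', 'yellow']
color_mapper = CategoricalColorMapper(factors=factors,
                                      palette=colors[:len(factors)])

p = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
p.scatter(x='GarageArea', y='SalePrice', size=8, source=source,
          color={'field': 'Heating', 'transform': color_mapper},
          legend_field='Heating')
```

Sorting the factors keeps the category-to-color assignment stable between runs, since a set has no guaranteed order.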
We can also have multiple plots in a single HTML file. To do so, we need to define how the plots are laid out. Bokeh provides several functions for defining layouts.
Before defining layouts, let's define the figures that we want to plot:

    from bokeh.plotting import figure, ColumnDataSource

    # All four figures share one data source
    source = ColumnDataSource(housePropertyDataset)

    p1 = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
    p1.circle(x='GarageArea', y='SalePrice', source=source)

    p2 = figure(x_axis_label='Overall Quality', y_axis_label='Sales Price')
    p2.circle(x='OverallQual', y='SalePrice', source=source)

    p3 = figure(x_axis_label='Total Basement Area', y_axis_label='Sales Price')
    p3.circle(x='TotalBsmtSF', y='SalePrice', source=source)

    # p4 uses x-shaped markers instead of circles
    p4 = figure(x_axis_label='Year Remod Add', y_axis_label='Sales Price')
    p4.x(x='YearRemodAdd', y='SalePrice', source=source)

Now that the figures are defined, let's define layouts to display them.
Rows and columns
We can arrange figures in rows and columns. Let's look at an example:

    from bokeh.layouts import column, row
    from bokeh.io import output_file, show

    # Two rows of two plots, stacked in a column
    layout = column(row(p1, p2), row(p3, p4))

    output_file('out_column_row.html')
    show(layout)

As you can see in the output, the four plots are arranged in two rows of two.
Instead of nesting rows and columns, we can use a grid plot, which gives similar results and is much easier to define.

    from bokeh.layouts import gridplot
    from bokeh.io import output_file, show

    # Each inner list is one row of the grid
    row1 = [p1, p2]
    row2 = [p3, p4]
    layout = gridplot([row1, row2])

    output_file('out_grid_output.html')
    show(layout)
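gridplot also accepts a flat list of figures together with an ncols argument, which saves building the rows by hand. The sketch below uses four empty placeholder figures, made up for illustration, and passes toolbar_location=None to hide the shared toolbar so that the returned object is the bare grid.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

# Four placeholder figures; in the post these would be p1 to p4
plots = [figure(title='plot {}'.format(i)) for i in range(1, 5)]

# A flat list plus ncols=2 wraps the figures into rows of two
grid = gridplot(plots, ncols=2, toolbar_location=None)

# Each child is recorded together with its row and column position
positions = sorted((r, c) for _, r, c in grid.children)
print(positions)
```

With ncols=2 the four figures fill a 2 x 2 grid, the same arrangement as the nested-list version.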
We can also create a tabbed view of the plots for a better user experience.

    from bokeh.models.widgets import Tabs, Panel
    from bokeh.io import output_file, show

    # Note: Bokeh 3.x renamed Panel to TabPanel
    # (from bokeh.models import TabPanel, Tabs)

    # Wrap each plot in its own panel (tab)
    tab1 = Panel(child=p1, title='Garage Area vs Sales Price')
    tab2 = Panel(child=p2, title='Overall Quality vs Sales Price')
    tab3 = Panel(child=p3, title='Total Basement Area vs Sales Price')
    tab4 = Panel(child=p4, title='Year Remod Add vs Sales Price')

    # Combine the panels into a single tabbed layout
    layout = Tabs(tabs=[tab1, tab2, tab3, tab4])

    # Specify the name of the output file and show the result
    output_file('out_tabs_output.html')
    show(layout)
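The Panel class used above was renamed to TabPanel in Bokeh 3.0, so the import depends on the installed version. Here is a minimal sketch, assuming you want one snippet that runs on both older and newer releases; the two empty figures are placeholders made up for illustration.

```python
from bokeh.models import Tabs
from bokeh.plotting import figure

try:
    from bokeh.models import TabPanel            # name used by Bokeh 3.x
except ImportError:
    from bokeh.models import Panel as TabPanel   # name used by older releases

# Placeholder figures; in the post these would be p1 and p2
first = figure(x_axis_label='Garage Area', y_axis_label='Sales Price')
second = figure(x_axis_label='Overall Quality', y_axis_label='Sales Price')

tab1 = TabPanel(child=first, title='Garage Area vs Sales Price')
tab2 = TabPanel(child=second, title='Overall Quality vs Sales Price')

layout = Tabs(tabs=[tab1, tab2])
```

Aliasing the old class to the new name keeps the rest of the code identical regardless of which import succeeded.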
Now that we have plotted individual plots, it's time to see how to establish a relationship between two or more plots. Here we will discuss linked axes and linked selection (also known as linked brushing).
Linked axes plots are plots that share their x-axis or y-axis with other plots. Consider the example below:

    from bokeh.layouts import gridplot
    from bokeh.io import output_file, show

    # Share the y-axis: both plots now use the same range object
    p1.y_range = p2.y_range

    row1 = [p1, p2]
    layout = gridplot([row1])

    output_file('out_shared_axes_output.html')
    show(layout)
This example reuses the figures defined earlier for the layouts. As you can observe in the output, the y-axis is shared between the two plots: panning or zooming one plot vertically moves the other as well.
In a similar way, we can share the x-axis, or both the x-axis and the y-axis.
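Sharing both axes can be sketched by handing the first figure's range objects to the second figure when it is created. The points below are toy values made up for illustration.

```python
from bokeh.plotting import figure

# Toy points for illustration only
x = [1, 2, 3, 4]
y = [10, 20, 15, 30]

left = figure(x_axis_label='x', y_axis_label='y')
left.scatter(x=x, y=y, size=8)

# Passing the existing range objects links both axes of the two figures
right = figure(x_range=left.x_range, y_range=left.y_range,
               x_axis_label='x', y_axis_label='y')
right.scatter(x=x, y=[v * 0.5 for v in y], size=8, color='red')
```

Because both figures hold the very same range objects, panning or zooming either one updates the other.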
When a region is selected with a selection tool in one plot, the corresponding data points are selected in all the other plots as well. This is called linked selection.
To achieve linked selection, the figures must share the same data source (i.e. the same ColumnDataSource object). Since the figures defined above all share one source, they are already linked; to see it in action, the figures need a selection tool such as box_select, as in the first example.
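A minimal sketch of linked selection, using toy rows made up for illustration: two figures draw from one ColumnDataSource and both carry selection tools. The last line sets the selection in code to mimic what a box selection would do in the browser.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Toy rows for illustration only
source = ColumnDataSource(data={
    'GarageArea': [300, 450, 600, 520],
    'TotalBsmtSF': [800, 950, 1200, 1100],
    'SalePrice': [120000, 180000, 240000, 200000],
})

# Both figures need a selection tool for interactive selection
tools = 'box_select,lasso_select,reset'

left = figure(tools=tools, x_axis_label='Garage Area', y_axis_label='Sales Price')
r1 = left.scatter(x='GarageArea', y='SalePrice', size=8, source=source,
                  selection_color='green', nonselection_alpha=0.1)

right = figure(tools=tools, x_axis_label='Total Basement Area',
               y_axis_label='Sales Price')
r2 = right.scatter(x='TotalBsmtSF', y='SalePrice', size=8, source=source,
                   selection_color='green', nonselection_alpha=0.1)

# The selection lives on the shared source, so it applies to both plots
source.selected.indices = [0, 2]
```

The selection is stored on the data source rather than on either figure, which is why sharing the source is all that is needed to link the plots.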
In this post, we explored creating visualizations using the Bokeh package. We used features such as the hover tool and color mapping to build Bokeh plots. With that basic understanding in place, we looked at ways to combine multiple plots into a single output using rows and columns, grid plots, and tabs. Finally, we saw how to create linked plots using linked axes and linked selection.
I hope you liked the post. Don't forget to leave your opinions in the comment section below.