SQLAlchemy - ORM for Python

Tavish Aggarwal

February 9, 2022

 

In this post, we will discuss the SQLAlchemy package, which is used for connecting to relational databases like:

  1. SQLite
  2. Postgres
  3. MySQL, and a lot more

Some of you might wonder why we need a package to connect to the database when we can connect to it directly.

SQLAlchemy is an ORM package that simplifies our development effort and is very useful. You will understand the advantages of using the SQLAlchemy package by the end of this post.

An ORM (Object Relational Mapper) is always useful, and it helps us speed up development.

I am also sharing my Jupyter Notebook to follow along with the post.

Prerequisites

Basic knowledge of relational databases is a must before diving into this post. Below are the topics that you should know before starting:

  1. Creating a table
  2. Insert statements to insert data in a table
  3. Select statements to select rows based on conditions
  4. Update statements to update data in the table
  5. Ordering and aggregating data 
  6. Group by clause and joining the tables

We will start with the SQLite sample database: Chinook. If you are not able to download it from the link specified, you can always use the file from GitHub.

Topics Covered

Below are the topics covered in this post:

  1. Connecting to the database
  2. Creating a Table
  3. Inserting data in a table
    1. Inserting single row in a table
    2. Inserting multiple rows in a table
  4. Selecting data from a table
    1. Filtering data using where clause
    2. Filtering data using in clause
    3. Filtering data using and/or/not operator
    4. Ordering data
      1. Ordering data based on a single column
      2. The reverse ordering of data
      3. Ordering data based on multiple columns
    5. Aggregating and grouping data
      1. Aggregating data
      2. Grouping Data
    6. Renaming column labels
    7. Using case and cast statements
    8. Performing arithmetic operations
  5. Joining tables
  6. Update Operations
  7. Creating Pandas Dataframe out of Result Set
  8. Delete Operation
    1. Deleting data based on a condition
    2. Deleting all of the Data
    3. Deleting Table from the Database

Let's see how to connect to the database engine using SQLAlchemy package.

Connecting to the database

To connect to the database, we need the create_engine function provided by the SQLAlchemy package. Below is the code demonstrating how to connect to a SQLite engine:

# Import create_engine and MetaData
from sqlalchemy import create_engine, MetaData

engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')
# Print table names
print(engine.table_names())

# Defining metadata object
metadata = MetaData()

Also, we are using the table_names() function to get all the tables in the database to which we are connecting.

NOTE: Here we are using a SQLite database, but you can use any other SQL database as well. The only difference would be the connection string; the rest of the commands shared in this post will remain the same for any SQL engine.
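For reference, only the connection string changes between engines. A small sketch of the URL formats (the PostgreSQL and MySQL URLs below use hypothetical credentials, and a matching DB-API driver such as psycopg2 or pymysql would need to be installed before uncommenting them):

```python
from sqlalchemy import create_engine

# SQLite: path to a database file ('sqlite://' alone gives an in-memory database)
sqlite_engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')

# Other engines follow the dialect://user:password@host:port/database pattern.
# Hypothetical credentials; uncomment only with a matching driver installed:
# postgres_engine = create_engine('postgresql://user:secret@localhost:5432/chinook')
# mysql_engine = create_engine('mysql+pymysql://user:secret@localhost:3306/chinook')

# create_engine() is lazy: no connection is made until the engine is first used
print(sqlite_engine.url.drivername)  # sqlite
```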

Once we are connected to the database, the next step is to create tables in the database.

Creating a Table

Using the SQLAlchemy package, we can create a table in the database. The code used to create a table is shown below:

from sqlalchemy import MetaData, create_engine, Table, Column, String, Integer, Float
from sqlalchemy import Boolean, delete, insert, select,func, and_, desc, case, cast


engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')
metadata = MetaData()

data = Table('instrument', metadata,
             Column('InstrumentID', Integer(), unique=True),
             Column('ArtistID', Integer(), unique=True),
             Column('Instrument_Name', String(500)),
             Column('Cost', Integer())
)
# Use the metadata to create the table
metadata.create_all(engine)


# We have created the instrument table, which also has an ArtistID column

Here, we are using metadata and creating a table in the metadata; it is then created in the engine. Let's understand why we create tables in metadata. Metadata is one central object to which all the created tables are linked. Whenever we have to make an update to the table structure, we make it in the metadata, and it is thereafter applied to the database.

Metadata is also useful for adding foreign keys (linking tables) in the database.
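As a minimal sketch of that idea (using an in-memory SQLite database and toy table definitions rather than the Chinook schema):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, ForeignKey)

engine = create_engine('sqlite://')  # throwaway in-memory database
metadata = MetaData()

artist = Table('artist', metadata,
               Column('ArtistID', Integer(), primary_key=True),
               Column('Name', String(200)))

# ForeignKey points at 'tablename.columnname'; metadata resolves the link
instrument = Table('instrument', metadata,
                   Column('InstrumentID', Integer(), primary_key=True),
                   Column('ArtistID', Integer(), ForeignKey('artist.ArtistID')),
                   Column('Instrument_Name', String(500)))

metadata.create_all(engine)
print(sorted(metadata.tables.keys()))  # ['artist', 'instrument']
```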

Once the table is created let's see how we can access the information of the table.

# Get table info
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

# Print the column names
print(instrument.columns.keys())

# Print the table details
print(repr(metadata.tables['instrument']))

Once our table is created, it's time to insert data into it.

Inserting data in a table

Inserting single row in a table

Let's see how to insert data into the table created above. To insert data, we use the insert function provided by the SQLAlchemy package.

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = insert(instrument).values(InstrumentID=1, ArtistID = 1, Instrument_Name='Guitar', 
Cost=245)

results = engine.execute(stmt)

# Returns number of rows affected
print(results.rowcount)

We are printing rowcount once the data is inserted in the table. It will return the number of rows affected when the statement is executed.

Inserting multiple rows in a table

We can even insert multiple rows into the table at once by passing a list of dictionaries. Let's see this in action:

## Inserting multiple rows in the table
instrument_list = [
    {'InstrumentID': 2, 'ArtistID': 2, 'Instrument_Name': 'Tabla', 'Cost': 200},
    {'InstrumentID': 3, 'ArtistID': 3, 'Instrument_Name': 'Violin', 'Cost': 500}
]

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = insert(instrument)
results = engine.execute(stmt, instrument_list)

print(results.rowcount)

We can insert a CSV file into the database by converting it to a list of dictionaries. Let's see an example of inserting a CSV file into the table:

# Reading from csv file
import pandas as pd

instruments_csv = pd.read_csv('instruments.csv')
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = insert(instrument)

instrument_list = []

for index, row in instruments_csv.iterrows():
    data = {'InstrumentID': row[0], 'ArtistID': row[1], 
                'Instrument_Name': row[2], 'Cost': row[3]}
    instrument_list.append(data)

results = engine.execute(stmt, instrument_list)
print(results.rowcount)

Once we have tables created and data inserted, let's see how we can select data from tables based on conditions and a lot more.

Selecting data from a table

Let's first see how to select all the data from a table by executing raw SQL, without using the SQLAlchemy expression language.

# Selecting data without using SQLAlchemy package
stmt = 'SELECT * FROM instrument'

results = engine.execute(stmt).fetchall()
print(results)

NOTE: The reason for showing the above code is that sometimes we have a very complex query that is difficult to write using the ORM.

Now let's see how to select data from the table using SQLAlchemy package. 

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = select([instrument])

# SQL statement which will be executed
print(stmt)

result = engine.execute(stmt).fetchall()

# First row
first_row = result[0]

print(first_row)

# First column in first row
print(first_row[0])

print(first_row['Instrument_Name'])

There are various types of select queries that we can write with the SQLAlchemy package. Below is the list:

  1. Filtering data using where clause
  2. Filtering data using in clause
  3. Filtering data using and/or/not operator
  4. Ordering data
    1. Ordering data based on a single column
    2. The reverse ordering of data
    3. Ordering data based on multiple columns
  5. Aggregating and grouping data
  6. Renaming column labels
  7. Using case and cast statements
  8. Performing arithmetic operations

Filtering data using where clause

We can filter our results using the where clause, as shown in the example below:

# Selecting using where clause
customer = Table('customer', metadata, autoload=True, autoload_with=engine)

stmt = select([customer])

stmt = stmt.where(customer.columns.PostalCode == '12227-000')

# To get an equivalent SQL query that will be executed
print(stmt) 

results = engine.execute(stmt).fetchall()

for result in results:
    print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT customer."CustomerId", customer."FirstName", customer."LastName", 
customer."Company", customer."Address", customer."City", customer."State", 
customer."Country", customer."PostalCode", customer."Phone", customer."Fax", 
customer."Email", customer."SupportRepId" 
FROM customer 
WHERE customer."PostalCode" = :PostalCode_1
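The where() clause is not limited to equality; any comparison expression works. A self-contained sketch with a toy table (in-memory SQLite, and the newer select() calling style of SQLAlchemy 1.4+, where columns are passed directly instead of in a list):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select

engine = create_engine('sqlite://')  # throwaway in-memory database
metadata = MetaData()
instrument = Table('instrument', metadata,
                   Column('Instrument_Name', String(100)),
                   Column('Cost', Integer()))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(instrument.insert(), [
        {'Instrument_Name': 'Guitar', 'Cost': 245},
        {'Instrument_Name': 'Tabla', 'Cost': 200},
        {'Instrument_Name': 'Violin', 'Cost': 500},
    ])

# where() accepts any comparison expression, not just ==
stmt = (select(instrument.columns.Instrument_Name)
        .where(instrument.columns.Cost >= 245)
        .order_by(instrument.columns.Cost))

with engine.connect() as conn:
    names = [row[0] for row in conn.execute(stmt)]

print(names)  # ['Guitar', 'Violin']
```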

Filtering data using in clause

We can filter our results using the in clause, as shown in the example below:

# Selecting using in clause
customer = Table('customer', metadata, autoload=True, autoload_with=engine)

stmt = select([customer])

stmt = stmt.where(customer.columns.PostalCode.in_(['12227-000', '14700']))

results = engine.execute(stmt).fetchall()

for result in results:
    print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT customer."CustomerId", customer."FirstName", customer."LastName", 
customer."Company", customer."Address", customer."City", customer."State", 
customer."Country", customer."PostalCode", customer."Phone", customer."Fax", 
customer."Email", customer."SupportRepId" 
FROM customer 
WHERE customer."PostalCode" IN (:PostalCode_1, :PostalCode_2)

Filtering data using and/or/not operator

Filtering data is not limited to the 'where' or 'in' clause. We can also filter results using the and_(), or_() and not_() functions. The example shown below demonstrates the use of the and_() function; in a very similar way, we can use the other functions as well.

# Selecting using and clause
customer = Table('customer', metadata, autoload=True, autoload_with=engine)

stmt = select([customer])

stmt = stmt.where(
    and_(
        customer.columns.Country == 'Brazil',
        customer.columns.Email == '[email protected]'
    )
)

results = engine.execute(stmt).fetchall()

for result in results:
    print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT customer."CustomerId", customer."FirstName", customer."LastName", 
customer."Company", customer."Address", customer."City", customer."State", 
customer."Country", customer."PostalCode", customer."Phone", customer."Fax", 
customer."Email", customer."SupportRepId" 
FROM customer
WHERE customer."Country" = :Country_1 AND customer."Email" = :Email_1
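For completeness, here is a sketch of or_() under the same pattern (a toy table on in-memory SQLite, SQLAlchemy 1.4+ calling style); not_() is used the same way:

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, select, or_)

engine = create_engine('sqlite://')  # throwaway in-memory database
metadata = MetaData()
instrument = Table('instrument', metadata,
                   Column('Instrument_Name', String(100)),
                   Column('Cost', Integer()))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(instrument.insert(), [
        {'Instrument_Name': 'Guitar', 'Cost': 245},
        {'Instrument_Name': 'Tabla', 'Cost': 200},
        {'Instrument_Name': 'Violin', 'Cost': 500},
    ])

# or_() matches rows satisfying either condition
stmt = (select(instrument.columns.Instrument_Name)
        .where(or_(instrument.columns.Cost < 220,
                   instrument.columns.Cost > 400))
        .order_by(instrument.columns.Cost))

with engine.connect() as conn:
    names = [row[0] for row in conn.execute(stmt)]

print(names)  # ['Tabla', 'Violin']
```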

Ordering data

Ordering our results is an operation we need when we want a sorted dataset. The SQLAlchemy package allows us to order data from a table in ascending or descending order, based on one or multiple columns. The examples shown below demonstrate ordering of data.

Ordering data based on a single column

Let's begin with a basic ordering of data based on one column. The default order will return results sorted in ascending order.

#ordering data with one column
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = select([instrument])

stmt = stmt.order_by(instrument.columns.Instrument_Name)

result = engine.execute(stmt).fetchall()
print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT instrument."InstrumentID", instrument."ArtistID", 
instrument."Instrument_Name", instrument."Cost" 
FROM instrument 
ORDER BY instrument."Instrument_Name"

The reverse ordering of data

We can even sort the data from a table in descending order. To sort data in descending order we need to use an additional function called desc().

#ordering data with one column in reverse order
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = select([instrument])

stmt = stmt.order_by(desc(instrument.columns.Instrument_Name))

result = engine.execute(stmt).fetchall()
print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT instrument."InstrumentID", instrument."ArtistID", 
instrument."Instrument_Name", instrument."Cost" 
FROM instrument 
ORDER BY instrument."Instrument_Name" DESC

Ordering data based on multiple columns

What if the column we are ordering by has duplicate values? Then we may need another column to break ties and order the data correctly. In this case, we order data based on multiple columns. The example shown below demonstrates ordering based on multiple columns:

# ordering data with more than one column
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = select([instrument])

stmt = stmt.order_by(desc(instrument.columns.Cost), instrument.columns.Instrument_Name)

result = engine.execute(stmt).fetchall()
print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT instrument."InstrumentID", instrument."ArtistID", 
instrument."Instrument_Name", instrument."Cost" 
FROM instrument 
ORDER BY instrument."Cost" DESC, instrument."Instrument_Name"

Aggregating and grouping data

Aggregating and grouping similar data is again a common operation that data scientists need in their daily work. Let's explore the different types of aggregation operations and how to group our data with the SQLAlchemy package.

Aggregating data

We can perform various types of aggregation functions, like:

  1. Sum
  2. Avg
  3. Count
  4. Max
  5. Min

The example shown below demonstrates the use of the count() aggregation function. In a similar way, we can use the other aggregation functions as well.

# Aggregating data
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = select([func.count(instrument.columns.Instrument_Name.distinct())])

distinct_instruments = engine.execute(stmt).scalar()

print(distinct_instruments)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT count(DISTINCT instrument."Instrument_Name") AS count_1 
FROM instrument
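The other aggregation functions follow the same pattern as func.count(). A minimal sketch with a toy table (in-memory SQLite, SQLAlchemy 1.4+ calling style):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, select, func)

engine = create_engine('sqlite://')  # throwaway in-memory database
metadata = MetaData()
instrument = Table('instrument', metadata,
                   Column('Instrument_Name', String(100)),
                   Column('Cost', Integer()))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(instrument.insert(), [
        {'Instrument_Name': 'Guitar', 'Cost': 245},
        {'Instrument_Name': 'Tabla', 'Cost': 200},
        {'Instrument_Name': 'Violin', 'Cost': 500},
    ])

# func.sum / func.max / func.min work exactly like func.count
stmt = select(func.sum(instrument.columns.Cost),
              func.max(instrument.columns.Cost),
              func.min(instrument.columns.Cost))

with engine.connect() as conn:
    total, highest, lowest = conn.execute(stmt).first()

print(total, highest, lowest)  # 945 500 200
```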

Grouping Data

We can group data based on a specific column. The example shown below groups data based on the Instrument_Name column from the instrument table which we created earlier in this post.

# Grouping up the data
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = select([instrument.columns.Instrument_Name, 
               func.count(instrument.columns.Instrument_Name), instrument.columns.Cost])

stmt = stmt.group_by(instrument.columns.Instrument_Name)

results = engine.execute(stmt).fetchall()

print(results)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT instrument."Instrument_Name", count(instrument."Instrument_Name") AS count_1, 
instrument."Cost" 
FROM instrument 
GROUP BY instrument."Instrument_Name"

Renaming column labels

When we perform an aggregation or arithmetic operation on a column, we need to give it a name. We can use the label() function to rename the column, as demonstrated below:

# renaming column
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

instruments_count = func.count(instrument.columns.Instrument_Name).label('instruments_count')

stmt=select([instruments_count, instrument.columns.Instrument_Name, instrument.columns.Cost])

stmt = stmt.group_by(instrument.columns.Instrument_Name)

results = engine.execute(stmt).fetchall()

print(results)

print(results[0].keys())

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT count(instrument."Instrument_Name") AS instruments_count, 
instrument."Instrument_Name", instrument."Cost" 
FROM instrument 
GROUP BY instrument."Instrument_Name"

Using case and cast statements

If you have experience working with SQL databases, then you might have come across case and cast statements.

Case statements select data based on a condition, and cast statements are used to convert data types.

Let's see the example shown below using case and cast statements:

# case and cast statements
instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

# Case expression: True when Cost is below 250, False otherwise
cost_below_250 = case([
        (instrument.columns.Cost < 250, True)
    ], else_=False)

# cast() converts an expression to another type, here Boolean
cost_below_250_bool = cast(cost_below_250, Boolean)

stmt = select([instrument.columns.Instrument_Name, cost_below_250])

result = engine.execute(stmt).fetchall()
# Print the result
print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT instrument."Instrument_Name", 
CASE WHEN (instrument."Cost" < :Cost_1) THEN :param_1 ELSE :param_2 END AS anon_1 
FROM instrument
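Since the cast expression above is built but not selected, here is a sketch where cast() actually appears in the query, converting an Integer column to Float (a toy table on in-memory SQLite, SQLAlchemy 1.4+ calling style):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, Float, select, cast)

engine = create_engine('sqlite://')  # throwaway in-memory database
metadata = MetaData()
instrument = Table('instrument', metadata,
                   Column('Instrument_Name', String(100)),
                   Column('Cost', Integer()))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(instrument.insert(), [
        {'Instrument_Name': 'Guitar', 'Cost': 245},
        {'Instrument_Name': 'Violin', 'Cost': 500},
    ])

# cast converts Cost from Integer to Float inside the query
stmt = (select(instrument.columns.Instrument_Name,
               cast(instrument.columns.Cost, Float).label('cost_float'))
        .order_by(instrument.columns.Cost))

with engine.connect() as conn:
    result = [tuple(row) for row in conn.execute(stmt)]

print(result)  # [('Guitar', 245.0), ('Violin', 500.0)]
```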

Performing arithmetic operations

We can perform arithmetic operations using the SQLAlchemy package, like addition, subtraction, multiplication, division, and a lot more.

The example shown below demonstrates the concatenation of two string columns:

customer = Table('customer', metadata, autoload=True, autoload_with=engine)

stmt = select([customer.columns.CustomerId, (customer.columns.FirstName + 
           ' ' + customer.columns.LastName).label('customer_name')])

stmt = stmt.order_by('customer_name')

stmt = stmt.limit(5)

results = engine.execute(stmt).fetchall()

for result in results:
    print(result)

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT customer."CustomerId", 
customer."FirstName" || :FirstName_1 || customer."LastName" AS customer_name 
FROM customer 
ORDER BY customer_name
LIMIT :param_1

Joining tables

Joining tables is one of the operations that every data scientist needs in their toolkit, as the data we are interested in is rarely stored in one table. So we need to join multiple tables to get the data we are looking for.

Let's see the example shown below demonstrating the joining of two tables:

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)
artist = Table('Artist', metadata, autoload=True, autoload_with=engine)

stmt = select([instrument, artist])

stmt = stmt.select_from(
    instrument.join(artist, instrument.columns.ArtistID == artist.columns.ArtistId))

result = engine.execute(stmt).fetchall()

# Printing first two items from result set
print(result[:2])

You can see the equivalent SQL command by printing 'stmt' before executing the query. The equivalent SQL of the above query is:

SELECT instrument."InstrumentID", instrument."ArtistID", instrument."Instrument_Name", 
instrument."Cost", "Artist"."ArtistId", "Artist"."Name" 
FROM instrument 
JOIN "Artist" ON instrument."ArtistID" = "Artist"."ArtistId"
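join() performs an inner join by default; passing isouter=True gives a LEFT OUTER JOIN, which keeps rows from the left table even without a match. A sketch with a toy schema (in-memory SQLite, SQLAlchemy 1.4+ calling style):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, select)

engine = create_engine('sqlite://')  # throwaway in-memory database
metadata = MetaData()
artist = Table('artist', metadata,
               Column('ArtistId', Integer(), primary_key=True),
               Column('Name', String(200)))
instrument = Table('instrument', metadata,
                   Column('InstrumentID', Integer(), primary_key=True),
                   Column('ArtistID', Integer()),
                   Column('Instrument_Name', String(100)))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(artist.insert(), [{'ArtistId': 1, 'Name': 'AC/DC'},
                                   {'ArtistId': 2, 'Name': 'Accept'}])
    conn.execute(instrument.insert(),
                 [{'InstrumentID': 1, 'ArtistID': 1, 'Instrument_Name': 'Guitar'}])

# isouter=True turns the inner join into a LEFT OUTER JOIN:
# artists without an instrument still appear, with None for its columns
stmt = (select(artist.columns.Name, instrument.columns.Instrument_Name)
        .select_from(artist.join(
            instrument,
            artist.columns.ArtistId == instrument.columns.ArtistID,
            isouter=True))
        .order_by(artist.columns.ArtistId))

with engine.connect() as conn:
    result = [tuple(row) for row in conn.execute(stmt)]

print(result)  # [('AC/DC', 'Guitar'), ('Accept', None)]
```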

Update Operations

We can also perform update operations on the data: we can update either a single row or multiple rows in the table. Let's look at the example shown below:

from sqlalchemy import update

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = update(instrument).values(Cost = 200).where(instrument.columns.Cost < 250)

results = engine.execute(stmt)

print(results.rowcount)

# checking data after updating it
stmt = select([instrument])
results = engine.execute(stmt).fetchall()
print(results)

Creating Pandas Dataframe out of Result Set

This is a very interesting feature that we have with pandas: the result set that we get from SQLAlchemy as an output can be converted to a pandas DataFrame.

Yup, you heard it right!!

Let's see the example shown below:

import pandas as pd

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = select([instrument])
results = engine.execute(stmt).fetchall()

df = pd.DataFrame(results)

df.columns = results[0].keys()

print(df)
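Alternatively, pandas can execute the statement itself with pd.read_sql, which also picks up the column names automatically. A sketch against a toy table on an in-memory SQLite database (SQLAlchemy 1.4+ calling style):

```python
import pandas as pd
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, select)

engine = create_engine('sqlite://')  # throwaway in-memory database
metadata = MetaData()
instrument = Table('instrument', metadata,
                   Column('Instrument_Name', String(100)),
                   Column('Cost', Integer()))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(instrument.insert(), [
        {'Instrument_Name': 'Guitar', 'Cost': 245},
        {'Instrument_Name': 'Tabla', 'Cost': 200},
    ])

# pandas runs the statement and builds the DataFrame, columns included
df = pd.read_sql(select(instrument), engine)

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['Instrument_Name', 'Cost']
```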

Delete Operation

We can delete data from a table, or a table from the database, based on conditions. Let's see the different delete operations that we can perform using the SQLAlchemy package.

  1. Deleting data based on a condition
  2. Deleting all of the data
  3. Deleting table from the database

Deleting data based on a condition

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = delete(instrument)
stmt = stmt.where(instrument.columns.Cost < 250)

results = engine.execute(stmt)

print(results.rowcount)

Deleting all of the Data

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)

stmt = delete(instrument)

results = engine.execute(stmt)

print(results.rowcount)

Deleting Table from the Database

instrument = Table('instrument', metadata, autoload=True, autoload_with=engine)
customer = Table('customer', metadata, autoload=True, autoload_with=engine)

instrument.drop(engine)
print(instrument.exists(engine))

# Dropping all tables
metadata.drop_all(engine)
print(customer.exists(engine))

I have tried to cover the topics that any data scientist needs on a daily basis when dealing with relational databases. There is a lot more that the SQLAlchemy package has to offer. You can always learn more concepts from the official documentation of the SQLAlchemy package. Please leave a comment if you want me to explain any specific command of the SQLAlchemy package.

Author Info

Tavish Aggarwal

Website: http://tavishaggarwal.com

Living in Hyderabad and working as a research-based Data Scientist specializing in improving key business performance indicators in the areas of sales, marketing, logistics, and plant production. He is an innovative team leader with out-of-the-box data-wrangling capabilities such as outlier treatment, data discovery, and data transformation, with a focus on yielding high-quality results.