Python for Data Science - part 3

Tavish Aggarwal

Let's go one step further and understand some more concepts in Python. In my previous posts, I covered Python concepts that are essential for becoming a good data scientist. If you have missed them, I strongly recommend going through these posts:
  1. Python for Data Science - part 1
  2. Python for Data Science - part 2

In this post, I will be covering the following topics:

  1. Random Generators using numpy
  2. Functions in Depth
  3. Error Handling
  4. Iterators
  5. zip function
  6. List comprehension
  7. Dictionary Comprehension
  8. Generators
  9. yield keyword

Let's get started.

Random Generators using numpy

We have already explored NumPy in previous articles; if you haven't read them, I recommend the blog Python packages for Data Science. NumPy can also generate random numbers, for example with the np.random.rand() function, which returns a random float in the range [0, 1). The code shown below generates a random number:

import numpy as np

# Generates a different random float in [0, 1) every time it runs
print(np.random.rand())
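
rand() also accepts dimensions, in which case it returns a NumPy array of floats drawn uniformly from [0, 1). The sketch below is only an illustration; the seed value 0 and the rescaling to the range [5, 10) are arbitrary choices, not part of the original example:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the output is repeatable

values = np.random.rand(3)      # 1-D array of 3 floats in [0, 1)
grid = np.random.rand(2, 4)     # 2 x 4 array of floats in [0, 1)

# Rescale [0, 1) to [low, high): low + (high - low) * rand()
low, high = 5, 10
scaled = low + (high - low) * np.random.rand(3)

print(values)
print(grid.shape)  # (2, 4)
print(scaled)
```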

But what if we want to generate a random number within a specific range? Consider rolling a die: each roll gives 1, 2, 3, 4, 5 or 6. The code below simulates a die roll:

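import numpy as np

np.random.seed(123)  # fix the generator's starting state so results are reproducible
dice = np.random.randint(1, 7)  # random integer from 1 to 6 (the upper bound 7 is exclusive)
print(dice)

Here randint(1, 7) is what constrains the output to the range 1 to 6. The seed() function does not affect the range; it sets the starting state of the random number generator, so the same "random" numbers are produced every time the code runs. We can pass any integer between 0 and 2**32 - 1 to seed().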
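
To see the effect of the seed, we can roll the die several times. randint() accepts a size argument that returns an array of rolls; roll_dice is a hypothetical helper name used only for this sketch:

```python
import numpy as np

def roll_dice(n_rolls, seed=None):
    """Hypothetical helper: simulate n_rolls of a six-sided die."""
    if seed is not None:
        np.random.seed(seed)  # same seed -> same sequence of rolls
    return np.random.randint(1, 7, size=n_rolls)  # integers from 1 to 6

first = roll_dice(10, seed=42)
second = roll_dice(10, seed=42)

print(first)
print((first == second).all())  # True: identical seeds give identical rolls
```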

Functions in Depth

We discussed functions in Python for Data Science - part 2. Now it's time to go one step further and explore functions in more depth. Here I will be talking about:

  1. The scope of variables in functions
  2. Default and flexible arguments in functions
  3. Passing key-value pairs to functions
  4. Lambda functions

Let's start with the scope of variables inside and outside of a function. There are basically two levels of variable scope:

  1. Global scope
  2. Local scope

Python provides the global keyword, which lets a function assign to a variable in the global scope. Let's see this in action. Consider a variable defined outside of any function:

number = 1

def fun1():
    number = 10  # creates a new local variable; the global number is untouched

def fun2():
    global number
    number = 30  # with the global keyword, the global number is updated

print('Number value in global scope: ' + str(number))  # 1
fun1()
print('Number value in global scope after executing fun1: ' + str(number))  # 1
fun2()
print('Number value in global scope after executing fun2: ' + str(number))  # 30

Hmm! Strange, how come the number value changes to 30 after executing fun2? Inside fun1, the assignment number = 10 creates a new local variable that exists only while fun1 runs, so the global number stays 1. In fun2, the global number declaration tells Python that number refers to the global variable, so the assignment rebinds the global name to 30.
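
Besides variable scope, the list at the start of this section mentions default and flexible arguments, key-value arguments and lambda functions. A minimal sketch of each follows; greet, total and describe are hypothetical function names chosen for illustration:

```python
def greet(name, greeting='Hello'):
    """Default argument: greeting is 'Hello' unless the caller passes one."""
    return greeting + ', ' + name + '!'

def total(*numbers):
    """Flexible arguments: any number of positional values, collected in a tuple."""
    return sum(numbers)

def describe(**details):
    """Key-value arguments: keyword arguments collected in a dictionary."""
    return ', '.join(key + '=' + str(value) for key, value in details.items())

subjects = ['Maths', 'English', 'Computer science']
# Lambda: a small anonymous function, here used as the sort key
by_length = sorted(subjects, key=lambda subject: len(subject), reverse=True)

print(greet('Jack'))                  # Hello, Jack!
print(greet('Jack', greeting='Hi'))   # Hi, Jack!
print(total(1, 2, 3))                 # 6
print(describe(name='Jack', age=25))  # name=Jack, age=25
print(by_length)                      # ['Computer science', 'English', 'Maths']
```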

Error Handling

Sometimes we don't want the execution of our code to break at runtime, so it's important to catch errors that occur during execution. In Python, we handle exceptions using a try-except block. Let's explore it with the help of the example shown below:

try:
    word = 'test' * '23'  # raises TypeError: a string can't be multiplied by a string
except TypeError:
    print('Error Occurred.')

Since we cannot multiply a string by a string, Python raises a TypeError, the flow jumps to the except block, and 'Error Occurred.' is printed.

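We can also raise an exception ourselves using the raise keyword, and catch it in the same way:

try:
    raise Exception('Exception handled using raise keyword')
except Exception as e:
    print(e)  # Exception handled using raise keyword

The code shown above uses the raise keyword to raise an exception, which the except block then catches and prints.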

NOTE: It is generally not a good idea to catch the general Exception class. We should be specific about the type of exception we handle, e.g. `ValueError`.
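
A short sketch of catching a specific exception type, combining raise with try-except. parse_age and safe_parse_age are hypothetical names; the only assumption is that an age must be a non-negative integer:

```python
def parse_age(text):
    """Hypothetical helper: convert text to an age, raising ValueError if invalid."""
    age = int(text)  # int() itself raises ValueError for text like 'abc'
    if age < 0:
        raise ValueError('age cannot be negative: ' + str(age))
    return age

def safe_parse_age(text):
    """Return the parsed age, or None when parse_age raises ValueError."""
    try:
        return parse_age(text)
    except ValueError as e:  # catch only the error we expect, not every Exception
        print('Invalid input:', e)
        return None

for raw in ['42', 'abc', '-5']:
    print(safe_parse_age(raw))
```

Because only ValueError is caught, an unrelated bug (for example a TypeError from passing a list) still surfaces instead of being silently swallowed.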

Iterators in Python

We can iterate over data structures using a for loop, as shown below:

names = ['Jack', 'Alley', 'Wann', 'Haley']

for name in names:
    print(name)

This is the most common way to iterate over a list. But we can also iterate over a list using an iterator, as shown below:

names = ['Jack', 'Alley', 'Wann', 'Haley']

person = iter(names)

# Fetch the next element only when needed
print(next(person))  # Jack
print(next(person))  # Alley

The advantage of iterators is that they produce elements on demand, one at a time. This plays a major role when dealing with very large datasets: an iterator over a source such as a file does not need to load all the data into memory at once.
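
Under the hood, a for loop calls next() repeatedly until the iterator raises StopIteration. The sketch below writes that mechanism out by hand; it illustrates the iterator protocol rather than how you would normally loop:

```python
names = ['Jack', 'Alley', 'Wann', 'Haley']
person = iter(names)

collected = []
while True:
    try:
        name = next(person)  # fetch the next element on demand
    except StopIteration:    # raised once the iterator is exhausted
        break
    collected.append(name)

print(collected)                      # ['Jack', 'Alley', 'Wann', 'Haley']
print(next(person, 'no more names'))  # a default value avoids StopIteration
```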

zip function in Python

The zip() function takes one or more iterables and returns an iterator of tuples, where the i-th tuple contains the i-th element from each iterable. To see how zip() works, look at the example shown below:

names = ['Rock', 'Bob', 'Rony']
classes = ['10', '1', 'A1']
subjects = ['Maths', 'English', 'Computer science']

zipped_data = zip(names, classes, subjects)
print(zipped_data)  # a lazy zip object, not a list

zipped_to_list = list(zipped_data)
print('Zip to list: ' + str(zipped_to_list))

# A zip object can be consumed only once, so create a new one before unpacking it
zipped_data = zip(names, classes, subjects)

print('Unpacking zip object')
for value1, value2, value3 in zipped_data:
    print(value1, value2, value3)

# output
<zip object at 0x10339cd88>
Zip to list: [('Rock', '10', 'Maths'), ('Bob', '1', 'English'), ('Rony', 'A1', 'Computer science')]

Unpacking zip object
Rock 10 Maths
Bob 1 English
Rony A1 Computer science

In the code shown above, we created a zip object out of three lists and then converted it to a list. We also unpacked the zip object using a for loop. There is another way to unpack a zip object: the * operator passes each tuple of the zip object to zip() as a separate argument, which "unzips" the data back into one tuple per original list:

names = ['Rock', 'Bob', 'Rony']
classes = ['10', '1', 'A1']
subjects = ['Maths', 'English', 'Computer science']

zipped_data = zip(names, classes, subjects)
result1, result2, result3 = zip(*zipped_data)
print(result1, result2, result3)

# output
('Rock', 'Bob', 'Rony') ('10', '1', 'A1') ('Maths', 'English', 'Computer science')
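
Two everyday uses of zip() in data work are pairing two lists into a dictionary and walking lists side by side. Note that zip() stops at the shortest input; the scores list below is hypothetical data made deliberately shorter to show this:

```python
names = ['Rock', 'Bob', 'Rony']
subjects = ['Maths', 'English', 'Computer science']
scores = [90, 75]  # hypothetical data, deliberately one element shorter

# Pair two lists into a dictionary: names become keys, subjects become values
name_to_subject = dict(zip(names, subjects))
print(name_to_subject)  # {'Rock': 'Maths', 'Bob': 'English', 'Rony': 'Computer science'}

# zip() stops at the shortest iterable, so 'Rony' is silently dropped here
print(list(zip(names, scores)))  # [('Rock', 90), ('Bob', 75)]
```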

List Comprehension

A list comprehension is a concise way of creating a list. It also reduces the number of lines of code we would otherwise write with a for loop. Let's see the syntax by creating our very first list comprehension:

squares = [i * i for i in range(10)]
print(squares)

# output
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The syntax may seem confusing at first sight, but as you practice you will become a pro at writing list comprehensions. The code above is equivalent to the following for loop, which generates the same output without a list comprehension:

squares = []
for value in range(10):
    squares.append(value * value)
print(squares)

# output
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can even nest list comprehensions. To demonstrate, we will create a 3 x 3 matrix using a nested list comprehension:

matrix = [[col for col in range(0, 3)] for row in range(0, 3)]
print(matrix)

# output
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]

Some of you may find this syntax confusing, but as you work more with list comprehensions you will master it.
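
A sketch of two more nested patterns: a matrix whose values depend on both indices, and flattening it back into a single list. When a comprehension has several for clauses, they are read left to right, in the same order as the equivalent nested for loops:

```python
# Each value depends on both the row and the column index
matrix = [[row * 3 + col for col in range(3)] for row in range(3)]
print(matrix)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Flatten: the outer loop (row) comes first, then the inner loop (value)
flattened = [value for row in matrix for value in row]
print(flattened)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]

# The same flattening written with explicit loops
flattened_loop = []
for row in matrix:
    for value in row:
        flattened_loop.append(value)
```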

We can also create a conditional list comprehension by adding an if clause after the for clause:

subjects = ['Maths', 'English', 'Computer science']
math_subject = [subject for subject in subjects if subject == 'Maths']
print(math_subject)

# output
['Maths']

We can also use an if-else expression in a list comprehension. In this case the condition is written before the for clause, and every element is kept but transformed:

subjects = ['Maths', 'English', 'Computer science']
math_subject = [subject if subject == 'Maths' else subject + '!' for subject in subjects]
print(math_subject)

# output
['Maths', 'English!', 'Computer science!']
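
The position of the condition changes its meaning, which is easiest to see with numbers. In this sketch, an if at the end filters elements out, while an if-else before the for keeps every element and chooses a value for each:

```python
numbers = range(10)

# if at the end: filters, so only even numbers are kept
even_squares = [n * n for n in numbers if n % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# if-else before for: keeps every element, choosing a value for each
labels = ['even' if n % 2 == 0 else 'odd' for n in range(5)]
print(labels)  # ['even', 'odd', 'even', 'odd', 'even']
```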

Dictionary Comprehension

We can also create a dictionary using a dictionary comprehension. The syntax is very similar to that of a list comprehension. The differences are that we use '{}' instead of '[]', and each element is written as a key: value pair.

Consider the example shown below:

subjects = ['Maths', 'English', 'Computer science']
subject_dictionary = {member: len(member) for member in subjects}
print(subject_dictionary)

# output
{'Maths': 5, 'English': 7, 'Computer science': 16}

Here the keys are the subject names and the values are the number of characters in each subject name.
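
Dictionary comprehensions accept the same if clause as list comprehensions, and iterating over .items() lets us build a new dictionary from an existing one. A small sketch; swapping keys and values like this is only safe when the values are unique:

```python
subjects = ['Maths', 'English', 'Computer science']
subject_dictionary = {member: len(member) for member in subjects}

# A condition filters which key-value pairs are kept
long_subjects = {member: len(member) for member in subjects if len(member) > 5}
print(long_subjects)  # {'English': 7, 'Computer science': 16}

# Iterating over .items() lets us swap keys and values
length_to_subject = {length: member for member, length in subject_dictionary.items()}
print(length_to_subject)  # {5: 'Maths', 7: 'English', 16: 'Computer science'}
```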

Generators

List comprehensions and generator expressions look very similar in their syntax, except that generator expressions use parentheses () while list comprehensions use brackets [].

Difference between generators and list comprehension:

A generator expression produces the same values as the equivalent list comprehension. The difference is that a list comprehension builds and returns the whole list immediately, whereas a generator expression returns a generator object that produces the values lazily, one at a time, as we iterate over it.

Let's see how to create a generator expression:

subjects = ['Maths', 'English', 'Computer science']

subject_generators = (member for member in subjects)
print(subject_generators)

# output
<generator object <genexpr> at 0x10b754a40>
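
The memory difference can be checked with sys.getsizeof. This is a rough sketch: getsizeof reports only the size of the container object itself, and the exact numbers vary by Python version and platform, but the list grows with its length while the generator stays small:

```python
import sys

squares_list = [n * n for n in range(1_000_000)]  # all values stored in memory
squares_gen = (n * n for n in range(1_000_000))   # values produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)  # the generator object is far smaller

total = sum(squares_gen)  # consumes the generator one value at a time
print(total == sum(squares_list))  # True

# A generator can be iterated only once; it is now exhausted
print(list(squares_gen))  # []
```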

yield keyword

We can also use the yield keyword to produce results from a generator function: a function that contains yield returns a generator when it is called. Here is an example of the yield keyword:

subjects = ['Maths', 'English', 'Computer science']

def toUpperCase(inputList):
    for value in inputList:
        yield value.upper()

for subject in toUpperCase(subjects):
    print(subject)

# output
MATHS
ENGLISH
COMPUTER SCIENCE

Difference between yield and return:

To understand the difference between the yield and return keywords, consider the example shown below:

subjects = ['Maths', 'English', 'Computer science']

def toUpperCase(inputList):
    for value in inputList:
        yield value.upper()

subject = toUpperCase(subjects)
print(next(subject))
print(next(subject))

# output
MATHS
ENGLISH

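If we used the return keyword instead, the function would build the complete list of upper-case strings before returning it, and we could not call next() on that result, because a list is not an iterator. To put it simply, a function that uses yield returns a generator whose values are produced as needed, whereas return sends back a single result (such as a whole list) and ends the function.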
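
The difference in when the work happens can be made visible. In this sketch, upper_with_yield and upper_with_return are hypothetical names, and the processed list exists only to record which values each function has handled so far:

```python
subjects = ['Maths', 'English', 'Computer science']
processed = []  # records which values have actually been processed

def upper_with_yield(input_list):
    for value in input_list:
        processed.append(value)
        yield value.upper()  # pauses here until the next value is requested

def upper_with_return(input_list):
    result = []
    for value in input_list:
        processed.append(value)
        result.append(value.upper())
    return result  # all the work is done before anything is returned

gen = upper_with_yield(subjects)
print(processed)  # [] - calling a generator function runs none of its body
print(next(gen))  # MATHS
print(processed)  # ['Maths'] - only one value processed so far

processed.clear()
full = upper_with_return(subjects)
print(processed)  # ['Maths', 'English', 'Computer science'] - all at once
print(full)       # ['MATHS', 'ENGLISH', 'COMPUTER SCIENCE']
```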

That's all for this post. Here we have explored some more concepts in Python. Stay tuned for upcoming posts on Python and data science.

Author Info

Tavish Aggarwal


Tavish Aggarwal is a front-end developer working in Hyderabad. He is very passionate about technology and loves to work in a team.