Python for Data Science - part 1

Python is a vast programming language that provides a lot of packages we can use. Using Python we can develop:

  1. Automation
  2. Desktop applications
  3. Android apps
  4. Web applications
  5. IoT, such as home automation
  6. Data science, and the list goes on.

In this post, our major focus will be on the essentials required to be a data scientist. Below is the list of topics covered in this post:

  1. Downloading and Installing Python on system
  2. Executing Python Scripts
  3. Variables and types (Integer, Float, String, Boolean, None, Lists, Dictionaries)
    • Type Hinting
    • Escape Sequence
    • List Slicing
    • Looping over the list
  4. If/else blocks
  5. Truthy Values
  6. Exception Handling
  7. Other data Types (Complex, bytes and bytearray, tuple, set and frozenset)

Let's get started.

Download Python

Which Python version should I use: Python 2 or Python 3?

I would say just go for Python 3. Python 2 reached its end of life on January 1, 2020 and no longer receives updates, whereas Python 3 is actively developed and keeps getting improvements.

Link from where you can download Python: https://www.python.org/downloads/

Python scripts

All Python files are stored with the extension .py. Such a file contains a list of statements that are executed by the Python interpreter to achieve the desired results.

Let us create a simple <code>index.py</code> file. In the code below we have used #, which adds a comment to a Python file, and print, which is the function used to output our results.

#index.py

print(2 + 3)
print(4/10)
print(2**4) # ** is the exponentiation operator


#output:
5
0.4
16

As you can see in the code above, we have performed basic arithmetic operations. Python code can also be typed line by line into an interactive shell; IPython is one popular shell for this.

Variables and Types

In Python, it is not necessary to declare the type of a variable; the type is inferred from the assigned value.

We can save our data in a variable as shown below:

score = 32.7

And if we want to check the type of our variable, we can simply use:

score = 32.7
type(score)

# output
float

Below are some of the built-in types that Python supports:

  1. Integer
  2. Float
  3. String
  4. Boolean
  5. None
  6. Lists
  7. Dictionaries

Type Hinting

As we now know, there is no need to declare types when defining a variable in Python. But from a development perspective it is often useful to add type hints so that mistakes are caught before the code runs. Look at the example shown below:

def add_number(x: int, y: int) -> int:
    return x + y

In the code above, if we try to call the function as <code>add_number(5, "name")</code>, our IDE will start complaining even before the code is executed.

So my recommendation is to add type hints to the codebase; they do not change how the code runs, but they let tools catch type mistakes early.
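To make the point concrete, here is a minimal sketch showing that Python itself does not enforce hints at run time; they are stored as metadata for tools such as IDEs and static type checkers. The function name add_number comes from the example above, while the sample calls are only illustrative.

```python
def add_number(x: int, y: int) -> int:
    return x + y

# Hints are stored on the function but are not checked when it is called.
hints = add_number.__annotations__
print(hints)

# Correct usage.
total = add_number(5, 7)
print(total)

# A wrong argument type only fails when the + operation itself fails.
try:
    add_number(5, "name")
except TypeError as error:
    caught = str(error)
    print("TypeError:", caught)
```

In other words, the hint helps the editor warn us early, but the error at run time comes from the addition, not from the hint.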

 

Now we are all set to explore the different types that Python supports.

Integer and float

As we have seen above, we can define an integer and a float in Python simply as:

id = 3
pi = 3.14

To convert a float to an integer, or an integer to a float, we can do it as shown below:

int(3.14) # 3

float(3) # 3.0
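One detail worth seeing in action: int() drops the fractional part rather than rounding. A small sketch comparing it with the built-in round() (the sample values are chosen only for illustration):

```python
truncated_pos = int(3.99)    # fractional part is dropped
truncated_neg = int(-3.99)   # truncation is toward zero, not downward
rounded = round(3.99)        # round() goes to the nearest integer
as_float = float("2.5")      # float() also parses numeric strings

print(truncated_pos, truncated_neg, rounded, as_float)
```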

Strings

We can simply declare a string in Python as shown below:

name = "John"

Strings come with a lot of built-in methods:

"hello".capitalize() # Hello - Capitalize first letter of string

"hello".replace('e','a') # hallo

"hello".isalpha() # True

"123".isdigit() # True

'hello,hi,there'.split(',')  # ['hello', 'hi', 'there']
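Note that split(',') keeps any whitespace around the pieces. A short sketch, using a made-up sample string, of how strip() can tidy the result:

```python
raw = "hello, hi , there"

parts = raw.split(",")                       # whitespace is preserved
cleaned = [part.strip() for part in parts]   # remove it from each piece
joined = "-".join(cleaned)                   # join() is the reverse of split()

print(parts)
print(cleaned)
print(joined)
```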

String format

We can even format our string to display it in a way we want as shown below:

"Hello {0}".format("Tavish") # Hello Tavish

Another way is to use an f-string:

name = 'Tavish'

f"Hello {name}"

# output
Hello Tavish
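f-strings also accept format specifiers after a colon, which is handy when printing numeric results. A minimal sketch (the variable names and values are illustrative):

```python
name = "Tavish"
score = 32.7
ratio = 0.4567

line = f"{name} scored {score:.1f} points"  # one digit after the decimal point
percent = f"{ratio:.1%}"                    # multiply by 100 and add a % sign
padded = f"{7:03d}"                         # pad an integer with zeros to width 3

print(line)
print(percent)
print(padded)
```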

Escape Sequence

Like any other programming language, Python also has escape sequences, for example \n for a new line. Refer to the code shown below:

description = 'I am learning\nPython' # \n for new line
print(description)

#output
I am learning
Python

Refer to the Python documentation for the whole list of escape sequences that Python supports.
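A few more escape sequences in a short sketch, together with raw strings, which switch escaping off. The sample strings are made up for illustration:

```python
tabbed = "name\tscore"             # \t inserts a tab
quoted = 'It\'s Python'            # \' puts a quote inside a quoted string
raw_path = r"C:\new\table"         # raw string: backslashes stay literal
escaped_path = "C:\\new\\table"    # same text written with escaped backslashes

print(tabbed)
print(quoted)
print(raw_path == escaped_path)
```

Raw strings are useful for Windows paths and regular expressions, where backslashes are common.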

Boolean

A Boolean is simply True or False. In Python we define a Boolean variable as:

valid = True
invalid = False

None

Python also has a special value known as None. If the value of a variable is not known yet, we can define it as None.

name = None
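The idiomatic way to test for None is the is operator rather than ==. A minimal sketch with illustrative values:

```python
name = None

if name is None:
    status = "name is not known yet"
else:
    status = "name is " + name

name = "John"
is_still_none = name is None

print(status, is_still_none)
```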

Lists

name = ['name1', 'name2']

print(name[0]) # name1

name.append('name3') # appended at end of list

'name1' in name # True

len(name) # gives length of the list

In Python, a single list can hold values of multiple types.

We can even delete a value from the list as shown below:

del name[0]

List slicing:

If we need to work with only a part of the list, we can slice it as shown below:

name[1:] # skips first element

name[1:-1]  # skips first and last element in the list

NOTE: When we slice the list as name[1:2], name[1] is included in the result whereas name[2] is not.
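The slicing rules can be tried out on a slightly longer list. This is only a sketch; the list contents are sample data:

```python
names = ["john", "rohit", "jass", "tavish", "amit"]

first_two = names[:2]         # indexes 0 and 1
middle = names[1:3]           # indexes 1 and 2; index 3 is excluded
last_two = names[-2:]         # negative indexes count from the end
every_other = names[::2]      # the third value is the step
reversed_names = names[::-1]  # a step of -1 walks the list backwards

print(first_two, middle, last_two, every_other, reversed_names, sep="\n")
```

Slicing always returns a new list, so names itself is left unchanged.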

Looping:

To loop over all elements of the list we can use a for loop.

names = ['john', 'rohit', 'jass']

for name in names:
    print(name)

range is a function that defines how many times we want our loop to execute. In the example shown below, the loop will be executed 5 times.

for index in range(5):
    print(index)

The range function can also accept three parameters:

  1. Start index
  2. End index (not included)
  3. Increment

For example:

range(5, 10, 2) will generate the values 5, 7 and 9 to iterate over.
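In Python 3, range produces its numbers lazily instead of building a list, so wrapping it in list() is an easy way to inspect what it generates. A small sketch:

```python
up_to_five = list(range(5))          # start defaults to 0, end is excluded
stepped = list(range(5, 10, 2))      # start, end, increment
countdown = list(range(5, 0, -1))    # a negative increment counts down

print(up_to_five)
print(stepped)
print(countdown)
```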

If we need a counter while iterating over the list, we can use the enumerate() function.

names = ['john', 'rohit', 'jass']

for i, a in enumerate(names):
    print('student  ' + str(i) + ' name is  ' + str(a))

# output:
student  0 name is  john
student  1 name is  rohit
student  2 name is  jass
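If the counter should begin at 1 rather than 0, enumerate() takes an optional start argument. A minimal sketch using the same sample names:

```python
names = ["john", "rohit", "jass"]

lines = []
for position, name in enumerate(names, start=1):
    lines.append(f"student {position} name is {name}")

print("\n".join(lines))
```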

Dictionaries

Dictionaries allow us to store key-value pairs, similar to a JSON object.

student = {
    "name": "Python",
    "code": 123,
    "valid": True
}

If we want all the keys of the dictionary, we can get them by:

student.keys()

And if we want to check whether the key exists in the dictionary:

print('name' in student)

#output
True

If we want all the values in the dictionary, we can get them by:

student.values()

To delete both key and value from the dictionary:

del student['name']

Similar to lists, dictionaries can also hold values of multiple data types.
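A few more everyday dictionary operations, as a sketch: get() for a safe lookup, assignment to update or add a key, and items() to loop over pairs. The "city" key and its value are made up for this illustration:

```python
student = {"name": "Python", "code": 123, "valid": True}

first_name = student.get("firstName", "unknown")  # default instead of an error
student["code"] = 456                             # update an existing key
student["city"] = "Hyderabad"                     # add a new key (sample value)

pairs = [f"{key}={value}" for key, value in student.items()]

print(first_name)
print(pairs)
```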

We can also create a list of dictionaries.

students = [
    {"name": "Python", "code": 123, "valid": True},
    {"name": "Post", "code": 12, "valid": False}
]
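A list of dictionaries is a common shape for records in data work. Here is a sketch of filtering and reshaping such a list with comprehensions; the variable names valid_names and codes_by_name are chosen only for this example:

```python
students = [
    {"name": "Python", "code": 123, "valid": True},
    {"name": "Post", "code": 12, "valid": False},
]

# Keep only the names of the records marked as valid.
valid_names = [s["name"] for s in students if s["valid"]]

# Build a lookup table from name to code.
codes_by_name = {s["name"]: s["code"] for s in students}

print(valid_names)
print(codes_by_name)
```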

If/else

Python uses indentation instead of curly braces for defining code blocks.

Example showing if else:

name = 'Python'

if name == 'Python':
    print('I am python')
elif name == 'Learning Python':
    print('I am Learning Python')
else:
    print('I am not python')

Instead of nesting an if inside an else, Python provides a combined version: <code>elif</code>.

Truthy values

We can also perform Boolean checks in an if statement; the condition is evaluated for its truthy or falsy result. Let's look at the example shown below:

number = 10

if number and number > 5:
     print('I am defined')
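Which values count as truthy? A sketch that passes some common values through bool(), followed by one gotcha: 0 is a perfectly valid number but is falsy, so a check meant to detect "not set" is safer written against None. The sample values are illustrative.

```python
falsy_values = [0, 0.0, "", [], {}, None, False]
truthy_values = [1, -1, "0", [0], {"a": 1}, True]

falsy_flags = [bool(value) for value in falsy_values]
truthy_flags = [bool(value) for value in truthy_values]

# Gotcha: 0 is defined, yet a plain truthiness check treats it as missing.
number = 0
checked_loosely = bool(number)
checked_explicitly = number is not None

print(falsy_flags)
print(truthy_flags)
print(checked_loosely, checked_explicitly)
```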

Exception handling

We have explored the different data types that are available in Python. It's always good practice to handle errors so that our code doesn't break if something goes wrong. Let's explore how to handle errors.

Suppose we have a student dictionary as defined below:

student = {
    "name": "Python",
    "code" : 123,
    "valid" : True
}

and we try to access a key that doesn't exist as shown below:

student['firstName']

It will result in a KeyError. But wait, we can handle such errors as shown below:

try:
    student['firstName']
except KeyError as error:
    print("error occurred while accessing firstName")
    print(error)

In the case above, we are catching only KeyError. But what if we don't know what kind of error will occur in the application? How will we handle that?

To handle all errors we can simply use Exception instead of KeyError. We can rewrite the code above as shown below:

try:
    student['firstName']
except Exception:
    print("error occurred while accessing firstName")
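A try block can also carry else and finally clauses. The sketch below wraps the lookup in a hypothetical helper, read_key, and records what happens in a list named log; both names are invented for this illustration.

```python
student = {"name": "Python", "code": 123, "valid": True}
log = []

def read_key(data, key):
    """Return data[key], or None when the key is missing."""
    try:
        value = data[key]
    except KeyError as error:
        log.append(f"missing key: {error}")
        return None
    else:
        # Runs only when no exception was raised.
        log.append(f"found {key}")
        return value
    finally:
        # Runs in every case, even after a return.
        log.append("lookup finished")

found = read_key(student, "name")
missing = read_key(student, "firstName")

print(found, missing)
print(log)
```

Catching the specific KeyError here, rather than Exception, keeps unrelated bugs from being silently swallowed.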

Python 3 also has some other data types:

  1. Complex
  2. bytes and bytearray
  3. tuple
  4. set and frozenset

Let's look at them in detail:

Complex

complex is the data type used to store complex numbers. The syntax to define a complex number is shown below:

complex(3,5) # where 3 is real part and 5 is imaginary part of a complex number

The default value of the imaginary part of a complex number is zero.
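A short sketch of working with complex numbers, including the literal syntax with a j suffix; the values are chosen only for illustration:

```python
z = complex(3, 5)
same = 3 + 5j              # literal form of the same number
real_only = complex(3)     # imaginary part defaults to zero

parts = (z.real, z.imag)   # both parts are stored as floats
mirrored = z.conjugate()   # flips the sign of the imaginary part
magnitude = abs(complex(3, 4))

print(z == same, real_only, parts, mirrored, magnitude)
```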

bytes and bytearray

bytes and bytearray are almost the same. The only difference is that a bytearray is mutable whereas bytes is immutable.

Let's look at how to create bytes in Python 3:

arr = [1, 2, 30] # List which we need to convert to bytes
byt = bytes(arr) # Converted to bytes
list(byt) # To convert bytes back to the original list

Code to declare a bytearray in Python 3:

arr = [1, 2, 30] # List which we need to convert to bytearray
byt = bytearray(arr) # Converted to bytearray
list(byt) # To convert bytearray back to the original list
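The mutability difference can be demonstrated directly. This sketch also shows the most common source of bytes in practice, encoding a string; the sample data is illustrative.

```python
data = [1, 2, 30]

mutable = bytearray(data)
mutable[0] = 255                 # allowed: a bytearray can be changed in place

immutable = bytes(data)
try:
    immutable[0] = 255           # not allowed: bytes cannot be changed
except TypeError as error:
    bytes_error = str(error)

text_bytes = "Python".encode("utf-8")   # str -> bytes
decoded = text_bytes.decode("utf-8")    # bytes -> str

print(list(mutable), list(immutable))
print(bytes_error)
print(text_bytes, decoded)
```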

tuple

We also have the tuple data type. A tuple is similar to a list but is immutable. Refer to the code shown below to declare a tuple:

tup = (1,4, "Python")

And we can access a value from the tuple as shown below:

tup = (1,4, "Python")
print(tup[0]) # 1
print(tup[2]) # Python
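Two things worth trying with tuples are unpacking and the error raised on modification. A minimal sketch; the variable names are illustrative:

```python
tup = (1, 4, "Python")

# Unpacking assigns each element to its own variable.
first, second, label = tup

try:
    tup[0] = 10                  # tuples cannot be changed after creation
except TypeError as error:
    tuple_error = str(error)

single = (5,)        # the trailing comma makes a one-element tuple
not_a_tuple = (5)    # without the comma this is just the integer 5

print(first, second, label)
print(tuple_error)
print(type(single), type(not_a_tuple))
```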

set and frozenset

set and frozenset are also similar to lists, but they are unordered and hold only unique values. The difference between them is that a set is mutable whereas a frozenset is immutable.

Let's look at sample code to declare a set:

set([1,1,'name','Python',4, 'Python'])

# output
{1, 'name', 4, 'Python'}

Let's look at sample code to declare a frozenset:

frozenset([1,1,'name','Python',4, 'Python'])

# output
frozenset({1, 'name', 4, 'Python'})
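Sets support the usual set algebra, and because a frozenset is immutable it can be used as a dictionary key where a plain set cannot. A sketch with made-up values:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b       # elements in either set
common = a & b      # elements in both sets
only_a = a - b      # elements in a but not in b

# A quick way to de-duplicate a list (sorted() gives a stable order back).
unique_sorted = sorted(set([3, 1, 3, 2, 1]))

frozen = frozenset(["name", "Python"])
lookup = {frozen: "a frozenset works as a dict key"}

try:
    {set(["name"]): "a plain set does not"}
except TypeError as error:
    set_error = str(error)

print(union, common, only_a, unique_sorted)
print(lookup[frozenset(["Python", "name"])])
print(set_error)
```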

That's all for this post. In the upcoming post, I will share some more features of Python that are useful to know to be a data scientist. If you have any questions, please let me know in the comment section.

Author Info

Tavish Aggarwal

Website: http://tavishaggarwal.com

Tavish Aggarwal is a front-end developer working in Hyderabad. He is very passionate about technology and loves to work in a team.
