
This post covers Python topics such as reading files, NumPy, Pandas, Matplotlib, generators and iterators.









1) Reading Files:

We will start with a very basic file, the iris dataset. You can download the file from here:
https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv

We are going to use two methods that read the whole file at once:

1) read()
2) readlines()


Here is the program to read the file:

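import time

# Open in text mode so that read() returns a string
f = open("/home/amitplatinum/iris.csv", "r")

## read: the whole file as a single string
start = time.time()
f11 = f.read()
end = time.time()
print(end - start)

## readlines: rewind first, because read() left the file pointer at the end
f.seek(0)
start = time.time()
f12 = f.readlines()
end = time.time()
print(end - start)

f.close()

print(f11)
print(f12)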

So we have used read() as well as readlines(). Both functions read the whole file at once, but there are some differences that we should be aware of.

read() reads the file into a single string object, while readlines() returns a list object with one string per line. Each has advantages and disadvantages. If you want to do some kind of substitution across the file, read() is the better fit, since you can easily apply a regular expression to the string and get the desired output. readlines() is useful for line-by-line operations such as removing lines from a file.
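
To make the difference concrete, here is a minimal sketch. It uses io.StringIO with a few made-up iris-style rows as a stand-in for the real file (an assumption, so the example runs without the CSV on disk). The same read() and readlines() calls work on an object returned by open().

```python
import io
import re

# Stand-in for the CSV file: a small in-memory sample (hypothetical rows)
sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
)

# read(): the whole file as one string, convenient for regex substitution
text = io.StringIO(sample).read()
replaced = re.sub(r"setosa", "SETOSA", text)

# readlines(): a list of lines, convenient for dropping or keeping lines
lines = io.StringIO(sample).readlines()
without_header = lines[1:]  # remove the header line

print(type(text).__name__, type(lines).__name__)
print(replaced)
print(without_header)
```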

In terms of speed, I found read() faster than readlines() in my run:
readlines() execution time: 2.59876251221e-05 seconds
read() execution time: 6.91413879395e-06 seconds
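
Single time.time() measurements this small are noisy, so a repeated measurement gives a steadier comparison. The sketch below uses the standard timeit module on a temporary file filled with made-up rows. The file contents, the helper names use_read and use_readlines, and the repeat count are all assumptions for illustration, and absolute numbers will vary by machine.

```python
import os
import tempfile
import timeit

# Build a throwaway CSV-like file so the example does not depend on iris.csv
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("5.1,3.5,1.4,0.2,setosa\n" * 150)
    path = tmp.name

def use_read():
    # Hypothetical helper: read the whole file into one string
    with open(path, "r") as f:
        return f.read()

def use_readlines():
    # Hypothetical helper: read the whole file into a list of lines
    with open(path, "r") as f:
        return f.readlines()

# Run each function many times and report the total elapsed seconds
t_read = timeit.timeit(use_read, number=1000)
t_readlines = timeit.timeit(use_readlines, number=1000)
print("read():     ", t_read)
print("readlines():", t_readlines)

os.remove(path)
```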


What if our file is big and we want to read it line by line, or we only want the first 5 lines? The answer is readline(), or a for loop, both of which read the file one line at a time. Let's write a simple program that uses readline() to read the first two lines.


f = open("/home/amitplatinum/iris.csv", "r")
i = 0
# Read and print only the first two lines
while i < 2:
    n = f.readline()
    print(n)
    i = i + 1

The output will be:
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa

You can also use a for loop to read the file line by line:

f.seek(0)  # rewind to the start of the file
for line in f:
    print(line)
f.close()

The output is:
sepal_length,sepal_width,petal_length,petal_width,species

5.1,3.5,1.4,0.2,setosa

4.9,3,1.4,0.2,setosa

4.7,3.2,1.3,0.2,setosa

4.6,3.1,1.5,0.2,setosa

5,3.6,1.4,0.2,setosa

... and so on
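
For the "first 5 lines" case, one option is itertools.islice over the file object, together with a with block so the file is closed automatically. This is a sketch rather than the only way to do it, and it uses io.StringIO with a few sample rows in place of the real file (an assumption to keep it self-contained). Stripping the trailing newline from each line also avoids the blank lines seen in the output above.

```python
import io
from itertools import islice

sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3,1.4,0.2,setosa\n"
    "4.7,3.2,1.3,0.2,setosa\n"
    "4.6,3.1,1.5,0.2,setosa\n"
    "5,3.6,1.4,0.2,setosa\n"
)

# With a real file, replace io.StringIO(sample) with open(path, "r")
with io.StringIO(sample) as f:
    # islice stops after 5 lines without reading the rest of the file
    first_five = [line.rstrip("\n") for line in islice(f, 5)]

for line in first_five:
    print(line)
```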


2) NumPy Arrays:

NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing and data analysis.

Below are some of the things that it provides:

1) ndarray, a fast and space-efficient multidimensional array that provides vectorized arithmetic operations.

2) Standard mathematical functions for fast operations on entire arrays of data.

3) Low-level data analysis functionality.
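
As a small illustration of the first two points, the sketch below applies arithmetic and a mathematical function to a whole array without writing a Python loop. The variable names are chosen for this example only.

```python
import numpy as np

arr = np.array([6, 7.5, 8, 0, 1])

# Vectorized arithmetic: applied to every element, no Python loop
doubled = arr * 2
shifted = arr + 1

# Standard mathematical functions operate on the whole array at once
roots = np.sqrt(arr)
total = arr.sum()

print(doubled)
print(roots)
print(total)
```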


NumPy ndarray:

  • It is a multidimensional container for homogeneous data.
  • An array has a shape, a tuple giving the size of each dimension (for a 2-D array, the number of rows and columns).
  • An array has a dtype, an object describing the data type of the array.
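
These attributes can be inspected directly. The following is a minimal sketch with made-up values. Note that mixing integers and floats still gives a single dtype, because the data in an ndarray is homogeneous.

```python
import numpy as np

# Two rows, three columns; the ints are upcast so all elements share one dtype
arr = np.array([[1, 2, 3], [4.5, 5, 6]])

print(arr.ndim)   # number of dimensions
print(arr.shape)  # (rows, columns)
print(arr.dtype)  # a single dtype for all elements
```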


[Figure: NumPy array axes]



Creating ndarrays:

1) List Conversion: simple list

    import numpy as np

    data1 = [6, 7.5, 8, 0, 1]

    arr1 = np.array(data1)

    arr1 -> [ 6.   7.5  8.   0.   1. ]

    arr1.ndim -> 1

    type(arr1) -> <class 'numpy.ndarray'>


2) List Conversion: list of lists

    Let's say you have a nested list instead of a simple list. The inner lists must all have the same length to get a 2-D array:

    data1 = [[6, 7.5, 8], [0, 1, 2]]

    arr1 = np.array(data1)

    arr1.ndim -> 2

    type(arr1) -> <class 'numpy.ndarray'>



 





