NumPy (Numerical Python) is one of the best-known Python libraries and is used heavily in data science. If you have worked on any data science problem, you have probably heard of it.
PS: Second part is also released and linked at the end of this post.
What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. Source
So what? Why use NumPy when Python already has lists?
There are several reasons to use NumPy over Python lists.
- NumPy is fast, far faster than Python lists for numerical work.
- It facilitates advanced mathematical and other scientific and computational operations on large amounts of data. NumPy handles these with less code and executes them more efficiently.
Let's get a glimpse of how handy NumPy is.
Suppose we have two lists of the same length, and we want to multiply their corresponding elements and store the results in a third list. With plain Python lists, we would do something like this.
rows = 5
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
c = [0] * rows  # pre-allocate so that c[i] can be assigned
for i in range(rows):
    c[i] = a[i] * b[i]
Now assume we have two two-dimensional arrays (matrices) and we want to multiply their respective elements to form a new matrix. Then we would probably write something like this.
for i in range(rows):
    for j in range(columns):
        c[i][j] = a[i][j] * b[i][j]
Now here is the catch: if a and b are NumPy arrays, we can simply write:
c = a * b
It is cool, isn't it? 😇
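The comparison above can be sketched as one runnable snippet. The sample values and the names `c_list`, `a_np`, `b_np`, `m1`, `m2`, and `m3` are illustrative assumptions, not part of the original snippets.

```python
import numpy as np

# One-dimensional case: plain lists need an explicit loop or comprehension.
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
c_list = [x * y for x, y in zip(a, b)]

# With NumPy arrays, * multiplies element by element.
a_np = np.array(a)
b_np = np.array(b)
c_np = a_np * b_np
print(c_list)  # [6, 14, 24, 36, 50]
print(c_np)    # [ 6 14 24 36 50]

# Two-dimensional case: the same expression works without nested loops.
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
m3 = m1 * m2
print(m3.tolist())  # [[5, 12], [21, 32]]
```

Note that `*` here is element-wise multiplication, not matrix multiplication, and that `a * b` on two plain lists raises a `TypeError`.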
Difference between NumPy and Python Lists?
| NumPy arrays | Python lists |
|---|---|
| NumPy provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it. | Lists are used to store multiple items in a single variable. |
| NumPy arrays have a fixed size at creation. | Python lists can grow dynamically. |
| The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. | The elements in a list can be of different data types. |
| NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. | With Python's built-in sequences, such operations typically take more code and execute less efficiently. |
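A few of these differences can be checked directly. The following is a minimal sketch; the variable names and sample values are made up for illustration.

```python
import numpy as np

# A list can mix types freely.
mixed = [1, 2.5, "three"]
print([type(v).__name__ for v in mixed])  # ['int', 'float', 'str']

# An array has a single dtype; mixed ints and floats are upcast to float64.
arr = np.array([1, 2.5, 3])
print(arr.dtype)     # float64
print(arr.itemsize)  # 8 (bytes per element)

# A list grows in place.
nums = [1, 2, 3]
nums.append(4)

# An array's size is fixed; np.append returns a new, larger array.
fixed = np.array([1, 2, 3])
grown = np.append(fixed, 4)
print(fixed.size, grown.size)  # 3 4
```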
If you use pip, you can install NumPy with:
pip install numpy
If you use conda, you can install NumPy with:
# Best practice, use an environment rather than install in the base env
conda create -n my-env
conda activate my-env
# If you want to install from conda-forge
conda config --env --add channels conda-forge
# The actual install command
conda install numpy
Playing with NumPy
import numpy as np
Let's create the NumPy equivalent of a list.
normal_list = [1, 2, 3, 4, 5]
np_list = np.array(normal_list)
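To see what `np.array` produced, we can inspect a few attributes of the result. This is only a small sketch of how one might check it; the element dtype is left out because the default integer type depends on the platform.

```python
import numpy as np

normal_list = [1, 2, 3, 4, 5]
np_list = np.array(normal_list)

print(type(np_list))   # <class 'numpy.ndarray'>
print(np_list.shape)   # (5,)
print(np_list.ndim)    # 1

# Converting back gives an ordinary Python list with the same values.
print(np_list.tolist() == normal_list)  # True
```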
Let's check the speed of NumPy
normal_list = list(range(1, 10000000))
np_list = np.array(normal_list)
Let's subtract 1 from each element
%%time
a = [x-1 for x in normal_list]
CPU times: user 426 ms, sys: 132 ms, total: 558 ms
Wall time: 571 ms
%%time
a = []
for x in normal_list:
    a.append(x-1)
CPU times: user 1.12 s, sys: 155 ms, total: 1.28 s
Wall time: 1.28 s
%%time
a = np_list-1
CPU times: user 109 ms, sys: 20.3 ms, total: 129 ms
Wall time: 129 ms
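`%%time` is an IPython/Jupyter magic, so it will not work in a plain Python script. Outside a notebook, a rough equivalent can be sketched with the standard `timeit` module. The helper names and the smaller array size below are assumptions made so the sketch runs quickly; the exact timings will vary by machine.

```python
import timeit

import numpy as np

# Smaller than the 10 million elements used above, to keep the run short.
n = 1_000_000
normal_list = list(range(1, n))
np_list = np.array(normal_list)


def with_comprehension():
    # Pure-Python version: one subtraction per loop iteration.
    return [x - 1 for x in normal_list]


def with_numpy():
    # Vectorized version: a single expression over the whole array.
    return np_list - 1


# Total time for 5 runs of each approach.
t_list = timeit.timeit(with_comprehension, number=5)
t_numpy = timeit.timeit(with_numpy, number=5)
print(f"list comprehension: {t_list:.3f} s")
print(f"NumPy:              {t_numpy:.3f} s")
```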
Why is NumPy so fast?
- Vectorized code: the loops run in optimized, pre-compiled C code rather than in the Python interpreter
- Fewer lines of code, resulting in fewer bugs
- Code resembles standard mathematical notation
- Pythonic code
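As a small illustration of the points above, here is the same computation written as a loop and as a vectorized expression. The formula and the names `slope`, `intercept`, `y_loop`, and `y_vec` are hypothetical and chosen only for this sketch.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
slope, intercept = 2.0, 1.0

# Loop version: one Python-level operation per element.
y_loop = []
for value in x:
    y_loop.append(slope * value + intercept)

# Vectorized version: reads like the formula y = slope * x + intercept.
y_vec = slope * x + intercept
print(y_vec)  # [1. 3. 5. 7.]
```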
Indexing of NumPy Arrays
Given a NumPy array, how do we access specific elements of it? We can index the array just as we would a Python list.
a = np.array([1, 2, 3, 4, 5, 6, 7])
# Let's try to access the first, third, last and second last elements
print(a[0], a[2], a[-1], a[-2])

# Let's try it with a multi-dimensional array.
new_a = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
# Let's try to access 2nd row and 2nd column i.e. 4
new_a[1, 1]  # Output: 4
# Let's try to access 3rd row and 1st column i.e. 5
new_a[2, 0]  # Output: 5
# Let's try to access the whole second row i.e. [3, 4]
new_a[1]  # Output: array([3, 4])
# Let's try to access the whole second column i.e. [2, 4, 6]
new_a[:, 1]  # Output: array([2, 4, 6])
It is all cool, right? Every example is self-explanatory except the last one. In the index we passed a colon (:). What is it, and why did it not throw a syntax error? Let's take a look at that as well.
Access items via indexes
Items can be accessed by referring to their index number.
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a[0])
# Output: [1 2]
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a[-1])
# Output: [5 6]
Range of Indexes
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a[1:2])
# Output: [[3 4]]
Note: The search will start at index 1 (included) and end at index 2 (not included).
Leaving out the start value makes the range begin at the first item, and leaving out the end value makes it run through the last item.
Range of Negative Indexes
a = np.array([[1, 2], [3, 4], [5, 6]])
a[-3:-1]
# Output: [[1, 2], [3, 4]]
Note: The search will start at index -3 (included) and end at index -1 (not included).
So, in a nutshell, start:end signifies that we want to access a range of indexes from the start index to the end index, in which the start index is inclusive and the end index is exclusive.
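The slicing rules above can be tried out on the same array. This is a short sketch; `.tolist()` is used only so the printed output is plain nested lists.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Rows 1 and 2; index 3 is excluded (and is past the end anyway).
print(a[1:3].tolist())  # [[3, 4], [5, 6]]

# Start omitted: begin at row 0.
print(a[:2].tolist())   # [[1, 2], [3, 4]]

# End omitted: run through the last row.
print(a[1:].tolist())   # [[3, 4], [5, 6]]

# Both omitted: every row.
print(a[:].tolist())    # [[1, 2], [3, 4], [5, 6]]
```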
Coming back to our original problem: what is new_a[:, 1] in the examples above?
In that snippet, the colon selects all the rows and the 1 selects the column at index 1, so we get the second column of every row.
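To make the row and column positions concrete, here is a minimal sketch using the same `new_a` array. The names `col`, `row`, and `block` are introduced only for illustration.

```python
import numpy as np

new_a = np.array([[1, 2], [3, 4], [5, 6]])

col = new_a[:, 1]  # all rows, column index 1
row = new_a[1, :]  # row index 1, all columns
print(col.tolist())  # [2, 4, 6]
print(row.tolist())  # [3, 4]
print(col.shape, row.shape)  # (3,) (2,)

# A slice in both positions selects a sub-block and keeps two dimensions.
block = new_a[0:2, 0:1]
print(block.tolist())  # [[1], [3]]
```

The first position in the brackets always refers to rows and the second to columns, so a bare colon in either position means "everything along that axis".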
There is a difference between
Guess what? We have a video on the same topic. Do check it out.