NumPy

Posted on 2023-04-22 Edited on 2023-08-09 In Tool

NumPy is a library that extends the base capabilities of Python with a richer set of data types, including more numeric types, vectors, and matrices, along with many matrix functions.

Basic data structure

A Python list can hold elements of different types, so its elements are actually pointers to objects, which wastes a lot of memory and CPU time. The basic objects of NumPy are ndarray and ufunc: an ndarray stores data (bool, int, float, etc.), and a ufunc is a function that operates on ndarrays. The ndarray is the basic data structure of NumPy: an indexable, n-dimensional array whose elements all have the same type (dtype). The number of dimensions is the number of indices needed to reach a scalar in the array. Vectors are 1-D arrays and matrices are 2-D arrays.

Use .shape and .dtype to get the shape (the size of each dimension) and the element type of an array.

Instead of float64, we often use float32 to speed up computation.
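As a quick illustration of these attributes, here is a minimal sketch. The array names m and m32 are made up for this example, and .ndim, .itemsize and .nbytes are shown only to make the float64 versus float32 difference visible.

```python
import numpy as np

# A 2-D array of float64 values (NumPy's default floating-point type).
m = np.zeros((3, 4))
print(m.shape)   # (3, 4): 3 rows, 4 columns
print(m.ndim)    # 2: two indices are needed to reach a scalar
print(m.dtype)   # float64

# Converting to float32 halves the storage per element.
m32 = m.astype(np.float32)
print(m32.dtype)                  # float32
print(m.itemsize, m32.itemsize)   # 8 4 (bytes per element)
print(m.nbytes, m32.nbytes)       # 96 48 (bytes in total)
```

Whether float32 is actually faster depends on the hardware and the operation, so it is worth measuring on your own workload.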

Vectors

Vectors are 1-D arrays in NumPy. To create a vector, we can use:

import numpy as np

# Create a vector with 4 elements, all 0, of type float64
a = np.zeros(4)
# The same as above, passing the shape as a tuple. This form also creates n-D arrays
a = np.zeros((4,))  # np.ones works the same way but fills the array with 1
# Create a vector with 4 random float64 values drawn uniformly from [0, 1)
a = np.random.random_sample((4,))
# np.arange([start=0], stop, [step=1]): an arithmetic progression over [start, stop)
a = np.arange(4.)  # [0. 1. 2. 3.]
# Also 4 uniform random values in [0, 1); rand takes the dimensions as separate arguments, not a tuple
a = np.random.rand(4)
# Specify values manually
a = np.array([5, 4, 3, 2, 1])    # integer dtype
a = np.array([5.0, 4, 3, 2, 1])  # float64, because one value is a float

Matrices

Matrices are 2-D arrays in NumPy. To create a matrix, we use the same functions as for vectors. For example:

a = np.zeros((4, 2))

However, when specifying values manually, NumPy reads them row by row, so each inner list is one row:

a = np.array([[1, 2], [3, 4]])

We can also create a matrix from a vector:

a = np.zeros(6).reshape(-1, 2)

which creates a 3x2 matrix. -1 tells NumPy to infer the number of rows from the number of columns and the total number of elements. With .reshape(-1, ), we turn a matrix into a vector by concatenating its rows one after another, that is:

a = np.array([[2, 3],
              [4, 5]])
print(a.reshape(-1, ))
# We get: [2 3 4 5]

.reshape treats a matrix as a vector with $mn$ elements and refills the new shape row by row. Therefore, to transpose a matrix we must use a.T rather than .reshape. The object returned by .reshape shares memory with the original array whenever possible (otherwise NumPy makes a copy), and .reshape does not change the shape of the original array.
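To make the difference between reshaping and transposing concrete, here is a small sketch. The names r and t are chosen for this example only, and np.shares_memory is used to check whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2] [3 4 5]]

r = a.reshape(3, 2)   # refills row by row: [[0 1] [2 3] [4 5]]
t = a.T               # true transpose:     [[0 3] [1 4] [2 5]]
print(np.array_equal(r, t))      # False

# reshape left the original shape alone...
print(a.shape)                   # (2, 3)
# ...but here it returned a view, so a write through r is visible in a.
print(np.shares_memory(a, r))    # True
r[0, 0] = 100
print(a[0, 0])                   # 100
```

If the result must not alias the original, calling .copy() on the reshaped array is one simple option.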

Operations

Indexing & slicing

Arrays in NumPy can be indexed and sliced like Python lists, using the same syntax, though the values returned are NumPy types (float64, int32, ndarray, etc.) rather than built-in Python types.

To slice out column j, use a[:, j].

In general, there are 5 different ways to read an ndarray using []:

Integer

a = np.arange(12).reshape(3, 4)
# a = [[ 0  1  2  3]
#      [ 4  5  6  7]
#      [ 8  9 10 11]]
print(a[0])
# Get [0 1 2 3], shape (4,)

print(a[0][0])
# Get 0, shape (), scalar

The returned array (here a[0]) shares memory with the original array.

Slicing

print(a[:, 1])
# Get [1 5 9], column 1, shape (3,)

print(a[0:2, 1:3])
# Get [[1 2] [5 6]], shape (2, 2)

print(a[::2, ::2])  # take every second row and every second column
# Get [[0 2] [8 10]], shape (2, 2)

The returned object still shares memory with the original array.

Integer list

print(a[[0, 1]])
# Get [[0 1 2 3] [4 5 6 7]], shape (2, 4)

The returned object is a new array. It does not share memory with the original array.

Integer matrix

a = np.arange(12)
b = np.array([[1, 2, 4], [5, 6, 9]])
print(a[b])
# Get [[1 2 4] [5 6 9]], shape (2, 3)

c = np.array([[1, 2], [3, 4]])
d = np.array([[1, 0], [0, 1]])
print(c[d])
'''
[[[3 4]
  [1 2]]

 [[1 2]
  [3 4]]]
'''
# c[[1, 0]] is (2, 2) so c[d] is (2, 2, 2)

The returned object is a new array. It does not share memory with the original array. Each value in b is an index into a, and the result has the shape of b.

To understand the second example, focus on d rather than c. The values of d are indices into c's first dimension (rows), and NumPy replaces each index with the corresponding row of c.

Bool array

a = np.array([[1, 2], [3, 4]])
b = np.array([[True, False], [False, True]])
print(a[b])
# Get [1 4], shape (2,)

The returned object is a new array. It does not share memory with the original array. When indexing with a bool array, NumPy keeps only the elements where the mask is True and returns them as a vector.
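Which indexing forms return a view and which return a copy can be checked directly. The sketch below uses np.shares_memory for that; the variable names row, block, picked and masked are illustrative, not part of any API.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

row = a[0]             # integer index
block = a[0:2, 1:3]    # slice
picked = a[[0, 1]]     # integer list
masked = a[a > 5]      # bool array

for name, result in [("integer", row), ("slice", block),
                     ("integer list", picked), ("bool array", masked)]:
    print(name, np.shares_memory(a, result))
# integer True
# slice True
# integer list False
# bool array False

# Writing through a view changes the original; writing to a copy does not.
block[0, 0] = -1
picked[0, 0] = 99
print(a[0, 1], a[0, 0])   # -1 0
```

In short, integer and slice indexing give views, while indexing with an integer list, an integer array or a bool array gives a copy.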

Elementwise computations

In elementwise computations, NumPy applies the same operation to each element of a matrix. For example:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

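print(a + b)   # [[ 6  8] [10 12]]
print(a - b)   # [[-4 -4] [-4 -4]]
print(a * b)   # [[ 5 12] [21 32]]
print(a / b)   # [[1/5, 2/6], [3/7, 4/8]] as floats
print(a ** b)  # [[1**5, 2**6], [3**7, 4**8]], equal to np.power(a, b)
# e raised to each element of a
e = np.exp(a)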

Because of broadcasting, b = 5 * a, b = a + 1, and b = a**2 are also valid. The same rule applies to boolean operations, that is:

a = np.array([[0, 1],
              [1, 1],
              [0, 1]])
pos = a == 1  # a == 1 is applied to each element of a and returns a bool matrix
print(pos)

We get:
[[False  True]
 [ True  True]
 [False  True]]

Broadcasting

If a is a matrix and b is a vector whose length equals the number of columns of a, then a+-*/b applies +-*/ b to each row of a; a column vector with as many rows as a is applied to each column in the same way. This is broadcasting in NumPy. See more about it on Broadcasting.

More generally, broadcasting aligns the shapes of two arrays dimension by dimension, stretching each dimension to the larger of the two sizes. It works only when, comparing the shapes from the last dimension backwards, each pair of sizes is either equal or contains a 1; a dimension missing from the array with fewer dimensions counts as 1. For example:

a = np.arange(6).reshape(-1, 1) # shape (6, 1)
b = np.arange(5) # shape(5, )
c = a + b # shape(6, 5)
'''
a extends to (6, 5)
b extends to (6, 5)
'''

The procedure of broadcasting:

Compare the shapes of the two arrays dimension by dimension, starting from the last dimension. [e.g. a(6, 1), b(5, ): compare 1 with 5]
Broadcast the number of dimensions: pad the shape of the array with fewer dimensions with 1s on the left until both arrays have the same number of dimensions. [e.g. a(6, 1), b(5, ): b(5, ) becomes b(1, 5)]
Broadcast the size of each dimension: an array whose size is 1 in a dimension is stretched to match the size of the other array in that dimension. [e.g. a(6, 1)->a(6, 5); b(1, 5)->b(6, 5)]
Report an error when a dimension cannot be broadcast, namely when the two arrays have different sizes in that dimension and neither size is 1.
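The steps above can be traced on the same two arrays. This is only a sketch: np.broadcast is used here to report the resulting shape, and the failing case at the end is an extra example added for illustration.

```python
import numpy as np

a = np.arange(6).reshape(-1, 1)   # shape (6, 1)
b = np.arange(5)                  # shape (5,), treated as (1, 5)

# np.broadcast reports the common shape without computing the sum.
print(np.broadcast(a, b).shape)   # (6, 5)

c = a + b
print(c.shape)                    # (6, 5)
print(c[2, 3])                    # 5, i.e. a[2, 0] + b[3]

# Sizes 6 and 5 in the last dimension differ and neither is 1, so this fails.
try:
    np.arange(6) + np.arange(5)
except ValueError as err:
    print("cannot broadcast:", err)
```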

Dot product

c = np.dot(a, b)
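As a short sketch of how np.dot behaves for vectors and for matrices (the arrays u, v, A and B are made-up inputs for this example):

```python
import numpy as np

# 1-D inputs: inner product, returns a scalar.
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))   # 32 = 1*4 + 2*5 + 3*6

# 2-D inputs: matrix multiplication, same as A @ B.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))
# [[19 22]
#  [43 50]]

# Not the same as the elementwise product A * B.
print(A * B)
# [[ 5 12]
#  [21 32]]
```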

Others

# Add all the elements up, return a scalar
b = np.sum(a)  # pass axis=0 to sum each column, axis=1 to sum each row
# Average of a, return a scalar
c = np.mean(a)  # pass axis=0 or axis=1 to average each column or row
# Concatenate vectors to form a matrix. Each item is a column.
d = np.c_[a, a**2]  # if a.shape=(4,), d.shape=(4, 2)
# Index of the maximum value along an axis (axis=0: per column, axis=1: per row).
# Signature: np.argmax(a, axis=None, out=None); without axis, a is flattened first.
f = np.argmax(a, axis=0)
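The effect of the axis argument is easiest to see on a small matrix. The following sketch uses an arbitrary 2x3 array chosen for illustration:

```python
import numpy as np

a = np.array([[1, 9, 3],
              [7, 2, 8]])

print(np.sum(a))             # 30, all elements
print(np.sum(a, axis=0))     # [ 8 11 11], one sum per column
print(np.sum(a, axis=1))     # [13 17], one sum per row
print(np.mean(a, axis=1))    # one mean per row

print(np.argmax(a))          # 1, index into the flattened array
print(np.argmax(a, axis=0))  # [1 0 1], row index of the max in each column
print(np.argmax(a, axis=1))  # [1 2], column index of the max in each row

v = np.arange(4)
print(np.c_[v, v**2].shape)  # (4, 2)
```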

Tile

numpy.tile(A, reps), where A is the input array and reps is the number of times A is repeated along each dimension, extends the dimensions or shape of the original array by repeating it.

A.ndim > len(reps)

For example:

a = np.array([[0, 1, 2], [3, 4, 5]])
b = np.tile(a, (2))

NumPy will extend reps to (1, 2) by prepending 1s, so that it has one entry per dimension of a. Therefore, it is equal to np.tile(a, (1, 2)):

b = 
[[0 1 2 0 1 2]
 [3 4 5 3 4 5]]
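The result can be verified with a few lines. This is a minimal sketch; the second call, with reps (2, 1), is an extra case added here for contrast.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])    # shape (2, 3)

b = np.tile(a, 2)            # reps is promoted to (1, 2)
print(b.shape)               # (2, 6)
print(np.array_equal(b, np.tile(a, (1, 2))))   # True

c = np.tile(a, (2, 1))       # repeat twice along the rows instead
print(c.shape)               # (4, 3)
```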

A.ndim < len(reps)