NumPy
Basic data structure
A Python list can hold elements of different types, so its elements are actually pointers, which wastes a lot of memory and CPU time. The basic objects of NumPy are ndarray and ufunc. An ndarray stores data (bool, int, float, etc.), and a ufunc is a function that operates on ndarrays. ndarray, an indexable, n-dimensional array containing elements of the same type (dtype), is the basic data structure of NumPy. Its number of dimensions is the number of indices needed to reach a scalar in the array. Vectors are 1-D arrays and matrices are 2-D arrays.
Use .shape and .dtype to get the shape (the size along each dimension) and the element type of an array. Instead of float64, we often use float32 to speed up computation.
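As a minimal sketch of the attributes mentioned above (the array names and sizes are chosen only for illustration), the following compares a default float64 array with a float32 one:

```python
import numpy as np

a = np.zeros((3, 4))                     # default dtype is float64
b = np.zeros((3, 4), dtype=np.float32)   # request single precision explicitly

print(a.shape, a.ndim, a.dtype)    # shape tuple, number of dimensions, element type
print(b.dtype, b.nbytes, a.nbytes) # float32 needs half the bytes of float64
```

Note that .shape returns a tuple of sizes, while .ndim gives the number of dimensions.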
Vectors
Vectors are 1-D arrays in NumPy. To create a vector, we can use:
import numpy as np
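The original listing is cut off here, so the following is only a sketch of some common ways to create a vector. The particular constructors and values are assumptions chosen for illustration, not necessarily the ones the author used:

```python
import numpy as np

v1 = np.array([1, 2, 3])        # from a Python list
v2 = np.zeros(4)                # four zeros
v3 = np.ones(4)                 # four ones
v4 = np.arange(0, 10, 2)        # start, stop (exclusive), step
v5 = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, endpoints included

for v in (v1, v2, v3, v4, v5):
    print(v, v.shape)           # every shape has exactly one entry
```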
Matrices
Matrices are 2-D arrays in NumPy. To create a matrix, we use the same functions as for creating vectors. For example:
a = np.zeros((4, 2))
When specifying values explicitly, NumPy takes them row by row:
1 | a = np.array([[1, 2], |
We can also create a matrix from a vector:
a = np.zeros(6).reshape(-1, 2)
which will create a 3x2 matrix. -1 tells NumPy to infer the number of rows from the total number of elements and the number of columns. With .reshape(-1), we turn a matrix into a vector by concatenating its rows one after another, that is:
1 | a = np.array([[2, 3], |
.reshape treats a matrix as a vector with $mn$ elements and only regroups them in row-major order. Therefore, if we want to transpose a matrix, we must use a.T rather than .reshape. The object returned by .reshape shares memory with the original object whenever possible, but .reshape doesn't change the shape of the original object.
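To make the difference between .reshape and .T concrete, here is a small sketch. The 2x3 matrix is an arbitrary example chosen for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

r = a.reshape(3, 2)   # refills a 3x2 grid in row-major order
t = a.T               # swaps the axes: a true transpose

print(r)   # rows: [1 2], [3 4], [5 6]
print(t)   # rows: [1 4], [2 5], [3 6]

r[0, 0] = 100         # writing through the reshaped view ...
print(a[0, 0])        # ... changes the original as well
print(a.shape)        # the original shape is untouched: (2, 3)
```

Both r and t are views here, so the write through r is also visible through t.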
Operations
Indexing & slicing
Arrays in NumPy can be used like a Python list, which means that basic indexing and slicing work the same way as for list, though the returned values have NumPy types (float64, int32, ndarray, etc.) rather than built-in Python types.
To select column j, use a[:, j].
In general, there are 5 different ways to read an ndarray using []:
- Integer
    a = np.arange(12).reshape(3, 4)
    '''
    a = [[0 1 2 3]
         [4 5 6 7]
         [8 9 10 11]]
    '''
    print(a[0])
    # Get [0 1 2 3], shape (4,)
    print(a[0][0])
    # Get 0, shape (), scalar
  The returned row a[0] shares memory with the original object.
- Slicing
    print(a[:, 1])
    # Get [1 5 9], column 1, shape (3,)
    print(a[0:2, 1:3])
    # Get [[1 2] [5 6]], shape (2, 2)
    print(a[::2, ::2])  # that is, set both steps to 2
    # Get [[0 2] [8 10]], shape (2, 2)
  The returned object still shares memory with the original object.
- Integer list
    print(a[[0, 1]])
    # Get [[0 1 2 3] [4 5 6 7]], shape (2, 4)
  The returned object is a new array. Indexing with an integer list copies the data, so it doesn't share memory with the original object.
- Integer matrix
    a = np.arange(12)
    b = np.array([[1, 2, 4], [5, 6, 9]])
    print(a[b])
    # Get [[1 2 4] [5 6 9]], shape (2, 3)
    c = np.array([[1, 2], [3, 4]])
    d = np.array([[1, 0], [0, 1]])
    print(c[d])
    '''
    [[[3 4]
      [1 2]]
     [[1 2]
      [3 4]]]
    '''
    # c[[1, 0]] is (2, 2) so c[d] is (2, 2, 2)
  The returned object is a new object. It doesn't share memory with the original object. The values of b are the indices of the elements to take along the indexed dimension of a.
  To understand the second one, you should focus on d rather than c. That is, the values of d are indices into c's first dimension (rows), and NumPy just replaces each of them with the row it refers to.
- Bool array
    a = np.array([[1, 2], [3, 4]])
    b = np.array([[True, False], [False, True]])
    print(a[b])
    # Get [1 4], shape (2,)
  The returned object is a new object. It doesn't share memory with the original object. When indexing with a bool array, NumPy keeps only the elements where the mask is True and returns a vector.
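Which of the five forms return a view and which return a copy can be checked directly. The sketch below uses np.shares_memory for that. The index arrays are arbitrary values chosen for illustration:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

results = {
    "integer":        a[0],
    "slice":          a[:, 1],
    "integer list":   a[[0, 1]],
    "integer matrix": a[np.array([[0, 1], [2, 0]])],
    "bool array":     a[a % 2 == 0],
}

# Integer and slice indexing give views; the other three give copies.
for name, r in results.items():
    print(name, r.shape, np.shares_memory(a, r))
```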
Elementwise computations
In elementwise computations, NumPy applies the same operation to each element of a matrix. For example:
a = np.array([[1, 2], [3, 4]])
Because of broadcasting, b = 5 * a, b = a + 1 or b = a**2 are also valid. The same rules apply to boolean operations, that is:
1 | a = np.array([[0, 1], |
1 | We get: |
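The listings in this part are truncated, so the following is only a sketch of elementwise arithmetic and boolean operations. The matrices and the comparison thresholds are assumptions chosen for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

print(a + b)    # element-by-element sum
print(a * b)    # element-by-element product, not a matrix product
print(5 * a)    # the scalar is applied to every element
print(a ** 2)

mask = a > 2               # comparisons are elementwise and give a bool array
print(mask)
print(mask & (b < 40))     # elementwise logical AND of two bool arrays
```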
Broadcasting
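If a is a matrix and b is a vector whose length equals the number of columns of a, then a + b, a - b, a * b and a / b apply the operation between b and each row of a. To combine b with each column instead, b must have one entry per row and shape (m, 1). This is broadcasting in NumPy. See more about it on Broadcasting.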
More generally, broadcasting aligns the shapes of two arrays dimension by dimension and stretches each dimension to the larger of the two sizes. It only works when, comparing the shapes from the last dimension backwards, the two sizes in every dimension are equal or one of them is 1. An array with fewer dimensions is treated as if it had extra leading dimensions of size 1. For example:
a = np.arange(6).reshape(-1, 1) # shape (6, 1)
The procedure of broadcasting:
- Compare the shapes of the two arrays dimension by dimension, starting from the last dimension; [e.g. a(6, 1), b(5, ), compare 1 with 5]
- Broadcast the number of dimensions: pad the shape of the array with fewer dimensions with 1s on the left; [e.g. a(6, 1), b(5, ), broadcast b(5, ) to b(1, 5)]
- Broadcast the size of each dimension: an array with size 1 in a dimension is stretched to match the size of the other array in that dimension; [e.g. a(6, 1)->a(6, 5); b(1, 5)->b(6, 5)]
- Report an error when a dimension can't be broadcast, namely when the two arrays have different sizes in that dimension and neither of them is 1.
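The steps above can be sketched with the same shapes used in the bracketed examples, a(6, 1) and b(5, ). The failing pair at the end is an extra case added for illustration:

```python
import numpy as np

a = np.arange(6).reshape(-1, 1)   # shape (6, 1)
b = np.arange(5)                  # shape (5,), treated as (1, 5)

c = a + b                         # both operands are stretched to (6, 5)
print(c.shape)
print(c[2, 3])                    # equals a[2, 0] + b[3]

try:
    np.arange(6) + np.arange(5)   # trailing sizes 6 and 5, and neither is 1
except ValueError as err:
    print("cannot broadcast:", err)
```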
Dot product
c = np.dot(a, b)
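A short sketch of what np.dot returns for different argument shapes. The operands below are made up for illustration, since the original listing does not show how a and b were defined:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
v = np.array([1, 1])

print(np.dot(a, b))   # matrix product of two 2-D arrays, same as a @ b
print(np.dot(a, v))   # matrix times vector gives a vector
print(np.dot(v, v))   # two vectors give a scalar inner product
```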
Others
1 | # Add all the elements up, return a scalar |
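Only the first comment of this listing survives, so the following is a guess at the kind of reductions it covered. The choice of sum, max and mean, and the example matrix, are assumptions:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(a.sum())          # add all the elements up: a scalar
print(a.sum(axis=0))    # collapse the rows: one sum per column
print(a.sum(axis=1))    # collapse the columns: one sum per row
print(a.max(), a.mean())
```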
Tile
numpy.tile(A, reps), where A is the input array and reps is the number of repetitions of A along each dimension, builds a larger array by repeating A, extending its shape or number of dimensions.
A.ndim > len(reps)
For example:
a = np.array([[0, 1, 2],[3,4,5]])
NumPy will extend reps to (1, 2) by prepending 1s, that is, from back to front. Therefore, it is equal to np.tile(a, (1, 2)):
1 | b = |
A.ndim < len(reps)
For example:
a = np.array([[0, 1, 2],[3,4,5]]) # shape (2, 3)
NumPy will extend the shape of a to (1, 2, 3) by prepending new axes, that is, from back to front. Therefore, the columns of a are repeated 3 times, the rows of a are repeated 2 times and the first dimension is repeated 1 time.
1 | b = |
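Both cases can be checked with a small sketch. The reps values 2 and (1, 2, 3) are inferred from the surrounding text, because the original listings are truncated:

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])   # shape (2, 3)

# Case A.ndim > len(reps): reps=2 is promoted to (1, 2)
b = np.tile(a, 2)
print(b.shape)                                   # (2, 6)
print(np.array_equal(b, np.tile(a, (1, 2))))     # True

# Case A.ndim < len(reps): a is promoted to shape (1, 2, 3)
c = np.tile(a, (1, 2, 3))
print(c.shape)                                   # (1, 4, 9)
```

Each size in the result is the promoted size of a multiplied by the matching entry of reps.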
Vectorized operations run in compiled code and can use SIMD instructions, so they are usually much faster than an equivalent Python loop.
Random.choice
numpy.random.choice(a, size=None, replace=True, p=None) is a random sampling operation that returns an array whose elements are the result of the sampling.
- a, an array or integer. If a is an array, the samples are drawn from it; otherwise, the samples are drawn from np.arange(a);
- size, an integer or a tuple. If size is an integer, size samples are drawn in total. If size is a tuple (e.g. (m, n, k)), m x n x k samples are drawn and arranged in the shape (m, n, k);
- replace, True or False, where True means sampling with replacement and False means sampling without replacement;
- p, None or an array. If None, every element is equally likely to be selected; if it is an array, p should have the same length as a, and its elements give the probability of choosing each element of a.
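The parameters above can be tried out with a small sketch. The seed, the candidate values and the probabilities are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)   # fixed seed so that the run is repeatable

s1 = np.random.choice(5, size=3)                     # draws from np.arange(5)
s2 = np.random.choice([10, 20, 30], size=(2, 4))     # 8 draws arranged as 2x4
s3 = np.random.choice(5, size=5, replace=False)      # a permutation of 0..4
s4 = np.random.choice(["a", "b"], size=10, p=[0.9, 0.1])   # "a" is far more likely

print(s1)
print(s2.shape)
print(sorted(s3))
print(s4)
```

With replace=False, size cannot exceed the number of candidates, and p must sum to 1.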