Lab: Unsupervised Learning, Recommenders, Reinforcement Learning

C3_W1_PracticeLab1

To finish this lab and make the code run efficiently, we need a good grasp of NumPy slicing and broadcasting so that we can write vectorized code.
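As a quick refresher (a toy example, not part of the lab code): broadcasting lets us subtract a single point from a whole matrix of centroids without an explicit loop.

import numpy as np

centroids = np.arange(6).reshape(3, 2)   # shape (3, 2): three centroids in 2-D
x = np.array([1.0, 1.0])                 # shape (2,): a single point

diff = centroids - x                     # x is broadcast across the rows -> shape (3, 2)
dist = np.sum(diff**2, axis=1)           # squared distance to each centroid -> shape (3,)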

# Exercise 1: for each example, find the index of the closest centroid
m = X.shape[0]
# You need to return the following variables correctly
idx = np.zeros(X.shape[0], dtype=int)
### START CODE HERE ###
for i in range(m):
    d = np.sum((centroids - X[i])**2, axis=1)  # squared distance from point i to every centroid
    idx[i] = np.argmin(d)                      # pick the closest centroid
### END CODE HERE ###

# Exercise 2: recompute each centroid as the mean of the points assigned to it
centroids = np.zeros((K, n))
### START CODE HERE ###
for i in range(K):
    p_i = idx == i                        # boolean mask of the points assigned to centroid i
    x_i = X[p_i]                          # slice out those points
    centroids[i] = np.mean(x_i, axis=0)   # column-wise mean
### END CODE HERE ###
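If you want to avoid the Python loop in Exercise 1 entirely, a fully broadcast version is also possible (a sketch, not required by the grader; variable names match the lab):

# Sketch: vectorized closest-centroid assignment via broadcasting
# X has shape (m, n), centroids has shape (K, n)
diff = X[:, None, :] - centroids[None, :, :]   # shape (m, K, n)
dist = np.sum(diff**2, axis=2)                 # shape (m, K): squared distance to every centroid
idx = np.argmin(dist, axis=1)                  # shape (m,): closest centroid per point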

axis is the dimension along which the operation is applied; the other dimensions are kept as they are. Therefore, for a matrix A, np.sum(A, axis=1) computes the sum of each row (the first dimension doesn't change).
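For instance (a toy example, not from the lab):

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
np.sum(A, axis=0)           # collapses axis 0 (sum of each column) -> array([5, 7, 9]), shape (3,)
np.sum(A, axis=1)           # collapses axis 1 (sum of each row)    -> array([ 6, 15]), shape (2,)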

See the NumPy documentation to learn more about it.

C3_W1_PracticeLab2

This lab is quite easy. However, there are a few points to notice:

  • NumPy slicing and broadcasting;
  • The divide-by-zero problem (handled by the if guards in Exercise 2 below).
# Exercise 1: estimate the Gaussian parameters for each feature
### START CODE HERE ###
mu = np.mean(X, axis=0)              # per-feature mean
var = np.mean((X - mu)**2, axis=0)   # per-feature variance
### END CODE HERE ###

# Exercise 2: select the threshold epsilon with the best F1 score
best_epsilon = 0
best_F1 = 0
prec = 0.
rec = 0.
F1 = 0

step_size = (max(p_val) - min(p_val)) / 1000

for epsilon in np.arange(min(p_val), max(p_val), step_size):

    ### START CODE HERE ###
    actual_pos_num = np.sum(y_val) + 0.    # number of actual positives (as a float)
    pred_pos = (p_val < epsilon) + 0       # predict anomaly when p(x) < epsilon; booleans become 0/1
    pred_pos_num = np.sum(pred_pos) + 0.   # number of predicted positives
    tp = np.sum(y_val[pred_pos == 1])      # number of true positives
    if pred_pos_num != 0:                  # guard against division by zero
        prec = tp / pred_pos_num
    if actual_pos_num != 0:
        rec = tp / actual_pos_num
    if prec != 0 and rec != 0:
        F1 = 2 * prec * rec / (prec + rec)
    ### END CODE HERE ###
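An equivalent way to count true/false positives is with boolean masks directly (a sketch using the same variable names, not the lab's reference solution):

# Sketch: precision and recall from boolean masks
predictions = p_val < epsilon
tp = np.sum((predictions == 1) & (y_val == 1))   # true positives
fp = np.sum((predictions == 1) & (y_val == 0))   # false positives
fn = np.sum((predictions == 0) & (y_val == 1))   # false negatives
prec = tp / (tp + fp) if (tp + fp) > 0 else 0.
rec = tp / (tp + fn) if (tp + fn) > 0 else 0.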

C3_W2_PracticeLab1

This lab is about collaborative filtering. We only need to implement the cost function; the other parts are almost the same as linear regression.
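For reference, the regularized cost that the snippet below implements (as presented in the course, summing only over pairs where a rating exists, i.e. r(i,j) = 1) is:

$$J = \frac{1}{2}\sum_{(i,j):\,r(i,j)=1}\left(\mathbf{w}^{(j)}\cdot\mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j}\left\lVert\mathbf{w}^{(j)}\right\rVert^2 + \frac{\lambda}{2}\sum_{i}\left\lVert\mathbf{x}^{(i)}\right\rVert^2$$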

# Vectorized
### START CODE HERE ###
reg = lambda_ / 2 * (np.sum(W**2) + np.sum(X**2))   # regularization term
err = (X @ W.T + b - Y)**2                          # squared error for every (movie, user) pair
J = np.sum(err[R==1]) / 2 + reg                     # only count pairs that were actually rated
### END CODE HERE ###
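For comparison, here is a non-vectorized version of the same cost (a sketch following the course notation; it assumes b has shape (1, num_users) as in the lab, and is much slower but easier to read):

# Sketch: loop-based collaborative-filtering cost (same result as the vectorized code above)
nm, nu = Y.shape                      # number of movies, number of users
J = 0
for j in range(nu):
    w = W[j, :]
    b_j = b[0, j]
    for i in range(nm):
        x = X[i, :]
        y = Y[i, j]
        r = R[i, j]
        J += r * np.square(np.dot(w, x) + b_j - y)   # only counts rated pairs (r is 0 or 1)
J = J / 2
J += (lambda_ / 2) * (np.sum(np.square(W)) + np.sum(np.square(X)))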

C3_W4_PracticeLab

Install the required libraries (installing Anaconda first is recommended):

pip install gym==0.25.1
pip install pyvirtualdisplay
conda install swig # or pip install swig
conda install -c conda-forge gym-box2d
pip install imageio[ffmpeg]
pip install imageio[pyav]
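To check that everything was installed correctly, a quick smoke test like the one below should run without errors (a minimal sketch; LunarLander-v2 is the environment used in this lab, with 8 state variables and 4 actions, matching the Q-network defined further down):

import gym

env = gym.make("LunarLander-v2")
state = env.reset()
print(env.observation_space.shape, env.action_space.n)   # expected: (8,) 4
env.close()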

Other configurations:

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'   # work around the duplicate OpenMP runtime error
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'      # silence TensorFlow info/warning logs
import warnings
warnings.filterwarnings("ignore", category=Warning)   # ignore some warnings

If Display(visible=0, size=(840, 480)).start() still raises an error, you can simply comment that line out.
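Alternatively (a small sketch, assuming pyvirtualdisplay provides Display as in the lab's import cell), you can keep the line but make it optional:

# Sketch: start the virtual display only if it works on this machine
try:
    from pyvirtualdisplay import Display
    Display(visible=0, size=(840, 480)).start()
except Exception:
    pass   # the virtual display is only needed for rendering videos; training works without it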

# Exercise 1
# Create the Q-Network
q_network = Sequential([
    ### START CODE HERE ###
    Dense(units=64, activation='relu', input_dim=8),   # 8 state variables as input
    Dense(units=64, activation='relu'),
    Dense(units=4, activation='linear')                # one Q-value per action
    ### END CODE HERE ###
])

# Create the target Q^-Network (same architecture as the Q-Network)
target_q_network = Sequential([
    ### START CODE HERE ###
    Dense(units=64, activation='relu', input_dim=8),
    Dense(units=64, activation='relu'),
    Dense(units=4, activation='linear')
    ### END CODE HERE ###
])

### START CODE HERE ###
optimizer = Adam(learning_rate=ALPHA)
### END CODE HERE ###

# Exercise 2
### START CODE HERE ###
# Bellman target: y = R if the episode terminates, otherwise y = R + gamma * max_a' Q^(s', a')
y_targets = rewards + (1 - done_vals) * gamma * max_qsa
### END CODE HERE ###

### START CODE HERE ###
loss = MSE(q_values, y_targets)   # mean squared error between current Q-values and the targets
### END CODE HERE ###
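The graded lines above rely on max_qsa and q_values, which the lab computes for you inside compute_loss. Roughly, that plumbing looks like the sketch below (my own reconstruction with TensorFlow, not the lab's exact code; states, actions, next_states come from the sampled experience batch):

import tensorflow as tf

# Sketch: pieces surrounding the graded lines in compute_loss
max_qsa = tf.reduce_max(target_q_network(next_states), axis=-1)   # max_a' Q^(s', a') from the target network

q_all = q_network(states)                                         # Q(s, a) for every action, shape (batch, 4)
indices = tf.stack([tf.range(tf.shape(q_all)[0]),
                    tf.cast(actions, tf.int32)], axis=1)          # (row, action) index pairs
q_values = tf.gather_nd(q_all, indices)                           # Q-value of the action actually taken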