(Advanced) Indexing

To write compact code where the operations stay within numpy without writing control structures in Python, we need a way to convey numpy's intent. The tool for this is clever indexing.

Firstly, individual elements can be indexed in exactly the same way as with lists:

a = np.array([1, 2, 3, 4, 5])
a[0]  # 1
a[-1]  # 5

Areas can be indexed with slices:

a = np.array([1, 2, 3, 4, 5])
a[1:3]  # array([2, 3])

The first element to be included and the first element to be excluded are specified: i.e. including the second element with index 1 but without the fourth element with index 3. If a value is omitted, the first or last element is assumed. If both delimiters are omitted, the entire array is copied:

a = np.array([1, 2, 3, 4, 5])
a[:]   # array([1, 2, 3, 4, 5])
a[1:]  # array([2, 3, 4, 5])

In multi dimensions, the boundaries are separated by commas:

a = np.arange(9).reshape(3,3)
a[1:3, 1:3]  # array([[4, 5],
             #        [7, 8]])

The same applies to a single element:

a = np.arange(9).reshape(3,3)
a[1, 1]  # 4

This technique can also be used for assignments:

a = np.arange(9).reshape(3,3)
a[0, 0] = 42   # array([[42,  1,  2],
               #        [ 3,  4,  5],
               #        [ 6,  7,  8]])
a[1:, :] = 0   # array([[42,  1,  2],
               #        [ 0,  0,  0],
               #        [ 0,  0,  0]])

But what if I have a list of elements that I want to extract? Unlike lists, numpy also accepts lists as indices:

a = np.arange(9)**2
a[[1, 3, 5]]  # array([ 1,  9, 25])

Lists (or arrays) of truth values are also possible:

a = np.arange(3)**2
a[[False, True, True]] # array([1, 4])

This is particularly helpful if you want to apply a condition to the array and want to use broadcasting:

a = np.arange(9)
a % 2 == 0     # array([ True, False,  True, False,  True, False,  True, False,  True])
a[a % 2 == 0]  # array([0, 2, 4, 6, 8])

You have to be careful in multi dimensions because not all combinations of the several specified indices are possible:

a = np.arange(9).reshape(3,3)
a[[0, 1], [1, 2]]  # array([1, 5])

If you actually want all combinations from the two lists instead, you need np.ix_:

a = np.arange(9).reshape(3,3)
a[np.ix_([0, 1], [1, 2])]  # array([[1, 2],
                           #        [4, 5]])

A new dimension can be added with np.newaxis:

a = np.arange(3)
a[:, np.newaxis]  # array([[0],
                  #        [1],
                  #        [2]])

This is particularly useful for replacing loops.

Indexing as a loop replacement

Suppose we want to calculate the pairwise distances between particles in 1D. In normal Python, this would be a nested loop:

def pairwise_distances_python(positions: list[float]):
    distances = []
    for i, p1 in enumerate(positions):
        row = []
        for j, p2 in enumerate(positions):
            row.append(abs(p1 - p2))
        distances.append(row)
    return distances

pairwise_distances_python([1, 2, 3]) # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

The principle could also be applied to 'numpy':

def pairwise_distances_numpy_DONTDOTHIS(positions: np.ndarray):
    distances = np.zeros(positions.shape * 2)
    for i in range(positions.shape[0]):
        for j in range(positions.shape[0]):
            distances[i, j] = abs(positions[i] - positions[j])
    return distances
pairwise_distances_numpy_DONTDOTHIS(np.array([1,2,3])) # array([[0., 1., 2.],
                                                       #        [1., 0., 1.],
                                                       #        [2., 1., 0.]])

With indexing and broadcasting, however, this is much clearer:

def pairwise_distances_numpy(positions: np.ndarray):
    return np.abs(positions[:, np.newaxis] - positions[np.newaxis, :])

The trick here is that the first sub-expression positions[:, np.newaxis] creates a 2D array that stores the positions in the rows. The second positions[np.newaxis, :] creates a 2D array containing the positions in the columns. The subtraction is now performed with broadcasting, in which the first array is copied to all rows of a matrix, while the second array is copied to all rows. The result is a matrix containing the differences of all pairs of positions. This corresponds to

array([[0, 0, 0],         array([[0, 1, 2],
       [1, 1, 1],    -           [0, 1, 2],
       [2, 2, 2]])               [0, 1, 2]])