(Advanced) Indexing
To write compact code where the operations stay within numpy
without writing control structures in Python, we need a way to convey numpy
's intent. The tool for this is clever indexing.
Firstly, individual elements can be indexed in exactly the same way as with lists:
a = np.array([1, 2, 3, 4, 5])
a[0] # 1
a[-1] # 5
Areas can be indexed with slices:
a = np.array([1, 2, 3, 4, 5])
a[1:3] # array([2, 3])
The first element to be included and the first element to be excluded are specified: i.e. including the second element with index 1 but without the fourth element with index 3. If a value is omitted, the first or last element is assumed. If both delimiters are omitted, the entire array is copied:
a = np.array([1, 2, 3, 4, 5])
a[:] # array([1, 2, 3, 4, 5])
a[1:] # array([2, 3, 4, 5])
In multi dimensions, the boundaries are separated by commas:
a = np.arange(9).reshape(3,3)
a[1:3, 1:3] # array([[4, 5],
# [7, 8]])
The same applies to a single element:
a = np.arange(9).reshape(3,3)
a[1, 1] # 4
This technique can also be used for assignments:
a = np.arange(9).reshape(3,3)
a[0, 0] = 42 # array([[42, 1, 2],
# [ 3, 4, 5],
# [ 6, 7, 8]])
a[1:, :] = 0 # array([[42, 1, 2],
# [ 0, 0, 0],
# [ 0, 0, 0]])
But what if I have a list of elements that I want to extract? Unlike lists, numpy
also accepts lists as indices:
a = np.arange(9)**2
a[[1, 3, 5]] # array([ 1, 9, 25])
Lists (or arrays) of truth values are also possible:
a = np.arange(3)**2
a[[False, True, True]] # array([1, 4])
This is particularly helpful if you want to apply a condition to the array and want to use broadcasting:
a = np.arange(9)
a % 2 == 0 # array([ True, False, True, False, True, False, True, False, True])
a[a % 2 == 0] # array([0, 2, 4, 6, 8])
You have to be careful in multi dimensions because not all combinations of the several specified indices are possible:
a = np.arange(9).reshape(3,3)
a[[0, 1], [1, 2]] # array([1, 5])
If you actually want all combinations from the two lists instead, you need np.ix_
:
a = np.arange(9).reshape(3,3)
a[np.ix_([0, 1], [1, 2])] # array([[1, 2],
# [4, 5]])
A new dimension can be added with np.newaxis
:
a = np.arange(3)
a[:, np.newaxis] # array([[0],
# [1],
# [2]])
This is particularly useful for replacing loops.
Indexing as a loop replacement
Suppose we want to calculate the pairwise distances between particles in 1D. In normal Python, this would be a nested loop:
def pairwise_distances_python(positions: list[float]):
distances = []
for i, p1 in enumerate(positions):
row = []
for j, p2 in enumerate(positions):
row.append(abs(p1 - p2))
distances.append(row)
return distances
pairwise_distances_python([1, 2, 3]) # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
The principle could also be applied to 'numpy':
def pairwise_distances_numpy_DONTDOTHIS(positions: np.ndarray):
distances = np.zeros(positions.shape * 2)
for i in range(positions.shape[0]):
for j in range(positions.shape[0]):
distances[i, j] = abs(positions[i] - positions[j])
return distances
pairwise_distances_numpy_DONTDOTHIS(np.array([1,2,3])) # array([[0., 1., 2.],
# [1., 0., 1.],
# [2., 1., 0.]])
With indexing and broadcasting, however, this is much clearer:
def pairwise_distances_numpy(positions: np.ndarray):
return np.abs(positions[:, np.newaxis] - positions[np.newaxis, :])
The trick here is that the first sub-expression positions[:, np.newaxis]
creates a 2D array that stores the positions in the rows. The second positions[np.newaxis, :]
creates a 2D array containing the positions in the columns. The subtraction is now performed with broadcasting, in which the first array is copied to all rows of a matrix, while the second array is copied to all rows. The result is a matrix containing the differences of all pairs of positions. This corresponds to
array([[0, 0, 0], array([[0, 1, 2],
[1, 1, 1], - [0, 1, 2],
[2, 2, 2]]) [0, 1, 2]])