NumPy arrays
The central data structure in numpy is the np.ndarray (array for short). Unlike a list, an array can hold elements of only one data type. This restriction allows the data to be stored compactly, which is essential for the efficient implementation of the algorithms in numpy and the underlying libraries.
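As a small illustration of the single-data-type rule, the following sketch builds an array from a list that mixes integers and a floating-point number. The variable names are chosen for this example only.

```python
import numpy as np

# A Python list can mix types freely.
mixed_list = [1, 2.5, 3]

# np.array converts all elements to one common dtype (here: float64).
mixed_array = np.array(mixed_list)

print(mixed_array.tolist())  # [1.0, 2.5, 3.0]
print(mixed_array.dtype)     # float64
```

The integers are silently converted so that every element fits the same type.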
The data type of the elements of an array is described by its np.dtype. The available dtypes closely resemble the data types that we already know from Python itself: integers and floating-point numbers are available here again. While the default floating-point type, np.float64, behaves in the same way as Python's float, it should be noted that (again for reasons of efficiency) the integers in numpy have a fixed value range. Frequently used data types and their ranges are:
Data type | Range |
---|---|
np.int8 | -128 to 127 |
np.int32 | -2147483648 to 2147483647 |
np.int64 | -9223372036854775808 to 9223372036854775807 |
np.uint8 | 0 to 255 |
np.uint16 | 0 to 65535 |
np.uint32 | 0 to 4294967295 |
np.uint64 | 0 to 18446744073709551615 |
np.float64 | 64-bit IEEE 754 (double precision) |
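The fixed value range has a practical consequence that a short sketch can make visible: arithmetic on integer arrays wraps around at the limits instead of growing like a Python integer. The example uses np.iinfo to read the limits; the variable names are illustrative.

```python
import numpy as np

# np.iinfo reports the limits of an integer dtype.
info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127

# Array arithmetic stays in the array's dtype, so 127 + 1 wraps around.
small = np.array([127], dtype=np.int8)
one = np.array([1], dtype=np.int8)
wrapped = small + one
print(wrapped.tolist())  # [-128]

# Python's own integers have no such limit.
print(127 + 1)  # 128
```

Choosing a dtype that is large enough for the expected values avoids this kind of overflow.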
It is easy to create arrays from lists:
import numpy as np  # conventional alias, used in all examples here
values = [1, 2, 3, 4, 5]
array = np.array(values)
A multidimensional array is created for nested lists of the same length:
values = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
array = np.array(values)
The shape of an array can be queried using the .shape attribute:
values = [[1, 2, 3], [4, 5, 6]]
array = np.array(values)
print(array.shape)  # (2, 3)
Here the number of rows comes first, then the number of columns. In numpy, a dimension is referred to as an axis. Conversely, an array can also be converted back into a list:
values = [[1, 2, 3], [4, 5, 6]]
array = np.array(values)
print(array.tolist())  # [[1, 2, 3], [4, 5, 6]]
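To make the axis terminology more concrete, the following sketch reads out the number of axes and the length of each axis for the same 2 x 3 array. It only uses standard ndarray attributes.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

print(array.ndim)      # 2  (number of axes)
print(array.shape[0])  # 2  (length of axis 0: rows)
print(array.shape[1])  # 3  (length of axis 1: columns)
print(array.size)      # 6  (total number of elements)
```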
Many operations work on the original data for as long as possible without copying it. The resulting objects are called views in numpy. For example, the transpose of a matrix is available as a view via .T:
values = [[1, 2, 3], [4, 5, 6]]
array = np.array(values)
print(array.T)  # [[1, 4], [2, 5], [3, 6]]
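That a view shares its data with the original array can be checked directly. The sketch below uses np.shares_memory and then writes through the transposed view; the variable names are illustrative.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])
transposed = array.T

# Both objects refer to the same memory buffer.
print(np.shares_memory(array, transposed))  # True

# Writing through the view changes the original array as well.
transposed[0, 1] = 40
print(array.tolist())  # [[1, 2, 3], [40, 5, 6]]
```

If an independent array is needed, calling .copy() on the view produces one.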
The shape of an array can also be changed using reshape, which returns a view whenever the memory layout allows it, i.e. the data is usually not copied again.
values = [1, 2, 3, 4]
array = np.array(values)
reshaped = array.reshape(2, 2)
print(reshaped)  # [[1, 2], [3, 4]]
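A small sketch of how reshape relates to the original data: it returns a view when the memory layout allows it and otherwise falls back to a copy. The second case is constructed here by reshaping a transposed array, which is an assumption made for illustration.

```python
import numpy as np

array = np.arange(6)

# Contiguous data: reshape can reinterpret the same buffer.
reshaped = array.reshape(2, 3)
print(np.shares_memory(array, reshaped))  # True

# The transpose is not contiguous in memory, so flattening it
# with reshape has to copy the elements into a new buffer.
transposed = reshaped.T
flattened = transposed.reshape(6)
print(np.shares_memory(transposed, flattened))  # False
print(flattened.tolist())  # [0, 3, 1, 4, 2, 5]
```

When it matters whether two arrays share data, np.shares_memory is a simple way to check.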
Every array provides a variety of methods that reduce its values to a single number, e.g.
np.arange(4).sum() # 6
np.arange(4).mean() # 1.5
np.arange(4).std() # 1.118033988749895
np.arange(4).min() # 0
np.arange(4).max() # 3
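These methods also accept an axis argument, which restricts the reduction to one dimension of a multidimensional array. The following sketch shows this for a 2 x 3 array; the values are illustrative.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

print(array.sum())                  # 21
print(array.sum(axis=0).tolist())   # [5, 7, 9]  (one sum per column)
print(array.mean(axis=1).tolist())  # [2.0, 5.0] (one mean per row)
print(array.max(axis=1).tolist())   # [3, 6]     (one maximum per row)
```

Without the axis argument, the reduction runs over all elements of the array.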