Manipulating NumPy arrays#

import numpy as np

Access or modify elements#

Elements of a NumPy array can be accessed using indexing and slicing along any dimension. NumPy offers the same indexing and slicing functionality that is available in Fortran or MATLAB.

Indexes and slices#

For example, we can create an array A and perform any kind of selection operations on it.

A = np.random.randint(0, 100, size=(4, 5))
A

array([[ 2, 75, 99, 61, 74],
       [49, 32, 66, 71, 25],
       [39, 99,  3, 26, 51],
       [29, 25, 73, 63, 48]])

# Get the element in the second row, first column
A[1, 0]

np.int64(49)

# Get the first two rows
A[:2]

array([[ 2, 75, 99, 61, 74],
       [49, 32, 66, 71, 25]])

# Get the last column
A[:, -1]

array([74, 25, 51, 48])

# Get the first two rows and the columns with an even index
A[:2, ::2]

array([[ 2, 99, 74],
       [49, 66, 25]])
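The same `start:stop:step` syntax works along every dimension. As a minimal sketch, here is a three-dimensional array with predictable values (the names `T`, `first_block`, `last_cols`, `strided` and `reversed_rows` are illustrative and not part of the example above):

```python
import numpy as np

# A small 3D array: 2 blocks, 3 rows, 4 columns, filled with 0..23
T = np.arange(24).reshape(2, 3, 4)

first_block = T[0]          # shape (3, 4): the whole first block
last_cols = T[:, :, -1]     # shape (2, 3): last column of every block
strided = T[1, ::2, 1:3]    # block 1, rows 0 and 2, columns 1 and 2
reversed_rows = T[0, ::-1]  # block 0 with its rows in reverse order

print(last_cols)   # [[ 3  7 11] [15 19 23]]
print(strided)     # [[13 14] [21 22]]
```

An omitted trailing index, as in `T[0]`, selects everything along the remaining dimensions.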

Mask to select elements validating a condition#

cond = A > 50
print(cond)
print(A[cond])

[[False  True  True  True  True]
 [False False  True  True False]
 [False  True False False  True]
 [False False  True  True False]]
[75 99 61 74 66 71 99 51 73 63]
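Masks can be combined and can also appear on the left-hand side of an assignment. The sketch below uses a small hand-written array `B` (an assumption made so the result is reproducible) rather than the random `A`:

```python
import numpy as np

B = np.array([[2, 75, 99], [49, 32, 66]])

# Combine conditions with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than comparisons.
between = (B > 30) & (B < 70)
selected = B[between]      # 1D array, in row-major order
print(selected)            # [49 32 66]

# Assigning through a mask modifies only the selected elements
C = B.copy()
C[C > 70] = 70             # cap every value above 70
print(C)                   # [[ 2 70 70] [49 32 66]]
```

Selecting with a mask always yields a one-dimensional array, because the selected elements generally do not form a rectangle.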

The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:

# Selecting only particular columns
print(A)
A[:, [0, 1, 4]]

[[ 2 75 99 61 74]
 [49 32 66 71 25]
 [39 99  3 26 51]
 [29 25 73 63 48]]

array([[ 2, 75, 74],
       [49, 32, 25],
       [39, 99, 51],
       [29, 25, 48]])
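When lists are given for several dimensions at once, NumPy pairs them element by element instead of selecting a rectangular block. The following sketch contrasts the two behaviours; `np.ix_` is the NumPy helper that builds the block selection, and the array `B` is an illustrative stand-in for `A`:

```python
import numpy as np

B = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Two index lists are paired: elements (0, 1) and (2, 3)
pairs = B[[0, 2], [1, 3]]
print(pairs)        # [ 1 11]

# np.ix_ selects the block made of rows 0, 2 and columns 1, 3
block = B[np.ix_([0, 2], [1, 3])]
print(block)        # [[ 1  3] [ 9 11]]

# A list can also reorder columns
reordered = B[:, [3, 0]]
print(reordered)    # [[ 3  0] [ 7  4] [11  8]]
```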

Perform array manipulations#

Apply arithmetic operations to whole arrays (element-wise)#

(A + 5) ** 2

array([[   49,  6400, 10816,  4356,  6241],
       [ 2916,  1369,  5041,  5776,   900],
       [ 1936, 10816,    64,   961,  3136],
       [ 1156,   900,  6084,  4624,  2809]])

Apply functions element-wise#

np.exp(A)  # With NumPy arrays, use the functions from NumPy!

array([[7.38905610e+00, 3.73324200e+32, 9.88903032e+42, 3.10429794e+26,
        1.37338298e+32],
       [1.90734657e+21, 7.89629602e+13, 4.60718663e+28, 6.83767123e+30,
        7.20048993e+10],
       [8.65934004e+16, 9.88903032e+42, 2.00855369e+01, 1.95729609e+11,
        1.40934908e+22],
       [3.93133430e+12, 7.20048993e+10, 5.05239363e+31, 2.29378316e+27,
        7.01673591e+20]])
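To illustrate why the NumPy functions are needed, the sketch below applies both `np.exp` and the standard-library `math.exp` to a small array. The names `v`, `result` and `math_works` are illustrative only:

```python
import math
import numpy as np

v = np.array([0.0, 1.0, 2.0])

# np.exp is applied to every element and returns a new array
result = np.exp(v)
print(result)

# math.exp expects a single number and rejects a multi-element array
try:
    math.exp(v)
    math_works = True
except TypeError as err:
    math_works = False
    print("math.exp failed:", err)
```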

Setting parts of arrays#

A[:, 0] = 0.0
print(A)

[[ 0 75 99 61 74]
 [ 0 32 66 71 25]
 [ 0 99  3 26 51]
 [ 0 25 73 63 48]]
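Note that the printed column contains the integer `0` although `0.0` was assigned: an assignment never changes the dtype of the array, so values are converted to the existing element type. A minimal sketch of this behaviour, using illustrative arrays `ints` and `floats`:

```python
import numpy as np

ints = np.zeros((2, 3), dtype=np.int64)

ints[0, 0] = 2.7        # stored as 2: the fractional part is dropped
ints[:, 1] = [10, 20]   # one value per row
ints[:, 2] = 7          # a scalar is broadcast to the whole column
print(ints)             # [[ 2 10  7] [ 0 20  7]]

# Convert explicitly when fractional values must be kept
floats = ints.astype(np.float64)
floats[0, 0] = 2.7
print(floats.dtype, floats[0, 0])
```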

Attributes and methods of `np.ndarray`#

See the NumPy documentation.

print([s for s in dir(A) if not s.startswith("__")])

['T', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'device', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'mT', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'to_device', 'tobytes', 'tofile', 'tolist', 'trace', 'transpose', 'var', 'view']

# Ex1: Get the mean along different axes
# axis=0 averages over the rows (one value per column),
# axis=1 averages over the columns (one value per row)
print(A)
print("Mean value", A.mean())
print("Mean of each column", A.mean(axis=0))
print("Mean of each row", A.mean(axis=1))

[[ 0 75 99 61 74]
 [ 0 32 66 71 25]
 [ 0 99  3 26 51]
 [ 0 25 73 63 48]]
Mean value 44.55
Mean of each column [ 0.   57.75 60.25 55.25 49.5 ]
Mean of each row [61.8 38.8 35.8 41.8]
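The `axis` argument names the dimension that is collapsed by the reduction, which is easy to get backwards. A small sketch on a 2x3 array with known values (the names `M`, `col_means`, `row_means` and `row_means_2d` are illustrative):

```python
import numpy as np

M = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

col_means = M.mean(axis=0)   # collapse the rows: one value per column
row_means = M.mean(axis=1)   # collapse the columns: one value per row
print(col_means)             # [1.5 2.5 3.5]
print(row_means)             # [1. 4.]

# keepdims=True keeps the collapsed axis with length 1,
# which is convenient for broadcasting against the original array
row_means_2d = M.mean(axis=1, keepdims=True)
print(row_means_2d.shape)    # (2, 1)
```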

# Ex2: Convert a 2D array to 1D, keeping all elements
print(A, A.shape)
A_flat = A.flatten()
print(A_flat, A_flat.shape)

[[ 0 75 99 61 74]
 [ 0 32 66 71 25]
 [ 0 99  3 26 51]
 [ 0 25 73 63 48]] (4, 5)
[ 0 75 99 61 74  0 32 66 71 25  0 99  3 26 51  0 25 73 63 48] (20,)
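`flatten` always returns a copy, whereas `ravel` returns a view on the original data when the memory layout allows it. The sketch below shows the difference on an illustrative array `M`:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)

flat = M.flatten()      # independent copy
flat[0] = 99
print(M[0, 0])          # 0: the original array is unchanged

rav = M.ravel()         # view on the same memory for a contiguous array
shares = np.shares_memory(rav, M)
print(shares)           # True

# reshape changes the shape without changing the number of elements;
# -1 lets NumPy infer the length of that dimension
reshaped = M.reshape(3, -1)
print(reshaped.shape)   # (3, 2)
```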

Dot product#

b = np.linspace(0, 10, 11)
c = b @ b
# before Python 3.5:
# c = b.dot(b)
print(b)
print(c)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
385.0

Comparison with MATLAB

| Operation | MATLAB | NumPy |
|---|---|---|
| element-wise product | `.*` | `*` |
| matrix (dot) product | `*` | `@` |
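The difference between the two operators is easiest to see on a small matrix. This is a minimal sketch with illustrative names `M`, `v`, `elementwise` and `matprod`:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])

elementwise = M * M     # each element multiplied by itself
matprod = M @ M         # rows times columns
print(elementwise)      # [[ 1  4] [ 9 16]]
print(matprod)          # [[ 7 10] [15 22]]

print(M @ v)            # matrix-vector product: [ 5 11]
print(v @ v)            # dot product of two 1D arrays: 5
```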

`dtypes`#

NumPy arrays can also be sorted when each element is a record made of several fields of different types, provided the type of each field is explicitly declared with a dtype.

dtypes = np.dtype(
    [("country", "S20"), ("density", "i4"), ("area", "i4"), ("population", "i4")]
)
x = np.array(
    [
        ("Netherlands", 393, 41526, 16928800),
        ("Belgium", 337, 30510, 11007020),
        ("United Kingdom", 256, 243610, 62262000),
        ("Germany", 233, 357021, 81799600),
    ],
    dtype=dtypes,
)
arr = np.array(x, dtype=dtypes)
arr.sort(order="density")
print(arr)

[(b'Germany', 233, 357021, 81799600)
 (b'United Kingdom', 256, 243610, 62262000)
 (b'Belgium', 337,  30510, 11007020)
 (b'Netherlands', 393,  41526, 16928800)]

In the previous example, we manipulated a one dimensional array containing quadruplets of data. This functionality can be used to load images into arrays and save arrays to images.

It can also be used when loading data of different types from a file with np.genfromtxt.
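As a hedged sketch of that workflow, the snippet below reads comma-separated text through `np.genfromtxt` with a structured dtype and then accesses the fields by name. The in-memory `text` is a hypothetical stand-in for a file on disk, and `"U20"` (unicode string) is used instead of `"S20"` so that the names print without the `b'...'` prefix:

```python
import io
import numpy as np

# Hypothetical file content, reusing three rows from the example above
text = "Netherlands,393,41526\nBelgium,337,30510\nGermany,233,357021\n"

record = np.dtype([("country", "U20"), ("density", "i4"), ("area", "i4")])
data = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=record)

# A field name selects one "column" as a regular 1D array
print(data["country"])

# A mask built from one field selects whole records
dense = data[data["density"] > 300]
print(dense["country"])

# np.sort returns a sorted copy and leaves data untouched
by_area = np.sort(data, order="area")
print(by_area["country"])
```

`np.sort(..., order=...)` is used here instead of the in-place `arr.sort(order=...)` so that the loaded data keep their original order.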

NumPy and SciPy sub-packages:#

`numpy.random`#

We have already used numpy.random to generate NumPy arrays filled with random values. This submodule also provides functions for sampling from distributions (Poisson, Gaussian, etc.) and for permutations.
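A short sketch of these features follows. It uses the `np.random.default_rng` generator interface, which is an assumption of this example (the code above calls the module-level functions such as `np.random.randint` directly); the seed and parameter values are arbitrary:

```python
import numpy as np

# A seeded generator makes the random draws reproducible
rng = np.random.default_rng(seed=0)

counts = rng.poisson(lam=3.0, size=5)                 # Poisson-distributed integers
samples = rng.normal(loc=0.0, scale=1.0, size=(2, 3)) # Gaussian samples
shuffled = rng.permutation(5)                         # 0..4 in random order

print(counts)
print(samples)
print(shuffled)
```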

`numpy.linalg`#

To perform linear algebra with dense matrices, we can use the submodule numpy.linalg. For instance, to compute the determinant of a random matrix, we use the function det:

A = np.random.random([5, 5])
print(A)
np.linalg.det(A)

[[0.69328424 0.77570629 0.44299853 0.1942352  0.08219559]
 [0.7888027  0.84865805 0.9860569  0.18715172 0.16799254]
 [0.08777083 0.11183672 0.13202839 0.29350791 0.94277618]
 [0.2949529  0.95145492 0.40172679 0.30995014 0.61824681]
 [0.27180358 0.48012402 0.31643439 0.17200564 0.945656  ]]

np.float64(0.034746163386962126)

squared_subA = A[1:3, 1:3]
print(squared_subA)
np.linalg.inv(squared_subA)

[[0.84865805 0.9860569 ]
 [0.11183672 0.13202839]]

array([[  74.60968103, -557.22400767],
       [ -63.19930043,  479.57946304]])
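A computed inverse can be checked by multiplying it with the original matrix. The sketch below does so on a small fixed matrix `M` (chosen for illustration, with determinant 10) and also shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly:

```python
import numpy as np

M = np.array([[4.0, 7.0], [2.0, 6.0]])

det = np.linalg.det(M)          # 4*6 - 7*2 = 10, up to rounding
M_inv = np.linalg.inv(M)

# M @ M_inv should be the identity, up to floating-point error
is_identity = np.allclose(M @ M_inv, np.eye(2))
print(det, is_identity)

# Solve M @ x = rhs directly
rhs = np.array([1.0, 2.0])
x = np.linalg.solve(M, rhs)
print(x)                        # approximately [-0.8  0.6]
```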

If the matrices are sparse, it is recommended to use scipy.sparse instead of NumPy.

from scipy.sparse import csr_matrix

print(csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]]))

<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 5 stored elements and shape (3, 3)>
  Coords	Values
  (0, 0)	1
  (0, 1)	2
  (1, 2)	3
  (2, 0)	4
  (2, 2)	5
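A sparse matrix supports the usual matrix operations while storing only its non-zero entries. As a minimal sketch, reusing the matrix above under the illustrative name `S`:

```python
import numpy as np
from scipy.sparse import csr_matrix

S = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
v = np.array([1, 1, 1])

print(S.nnz)            # number of stored entries: 5

product = S @ v         # sparse matrix times dense vector
print(product)          # [3 3 9]

dense = S.toarray()     # convert back to a regular NumPy array
print(dense)
```

Converting with `toarray` is only reasonable for small matrices, since it allocates every zero explicitly.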