# Arrays in NumPy

**Draft version - do not distribute** 

## NumPy Arrays: The Foundation of Scientific Computing

**[NumPy arrays](https://numpy.org/doc/2.1/reference/arrays.ndarray.html) are the foundation of scientific computing in Python.** Arrays are the data structure that makes Python a usable tool for data science, machine learning, and numerical analysis (i.e., scientific computing). Without arrays, Python would be impractical as a scientific programming language.

Ultimately, all data has to be represented as numbers. And all data science methods are mathematical manipulations of these numbers. Arrays are the natural—perhaps the only—way to organize data. 

NumPy arrays are multi-dimensional grids of values that can represent vectors, matrices, or higher-dimensional data structures. Each element in an array has a specific location identified by a tuple of integer indices. Python uses [zero-based indexing](https://en.wikipedia.org/wiki/Zero-based_numbering), so the first element is always at index 0.

These arrays are designed and optimized for efficient numerical computation and data manipulation, which makes them far better suited than standard Python lists to mathematical operations. Other containers, such as dictionaries or data frames, are not designed for the same general multi-dimensional numerical work.
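
To make the contrast with lists concrete, here is a minimal sketch. The variable names (`prices_list`, `prices_array`) are made up for illustration and do not come from the course material.

```python
import numpy as np

# Plain Python lists: "+" concatenates and "*" repeats
prices_list = [1.0, 2.0, 3.0]
list_sum = prices_list + prices_list       # six elements, not an element-wise sum
list_times_two = prices_list * 2           # the list repeated twice

# NumPy arrays: the same operators act element by element
prices_array = np.array(prices_list)
array_sum = prices_array + prices_array    # element-wise sum
array_times_two = prices_array * 2         # every element doubled

print(list_sum)
print(array_sum)
print(list_times_two)
print(array_times_two)
```

The same operators give very different results, which is one reason arithmetic on lists usually ends up in explicit loops.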

**Vectorization** lets NumPy apply an operation to an entire array in a single expression. The looping happens inside optimized compiled code, which eliminates the need for explicit Python loops and dramatically improves performance.
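
As a small illustration of vectorization, the sketch below squares each element twice: once with an explicit loop and once with a single array expression. The array `values` is an arbitrary example chosen here.

```python
import numpy as np

values = np.arange(1, 6)              # array([1, 2, 3, 4, 5])

# Explicit Python loop: square each element one at a time
squares_loop = np.empty_like(values)
for i in range(values.size):
    squares_loop[i] = values[i] ** 2

# Vectorized: one expression acts on the whole array
squares_vectorized = values ** 2

print(squares_vectorized)
print(np.array_equal(squares_loop, squares_vectorized))
```

Both approaches produce the same numbers. The vectorized form is shorter and, for large arrays, typically much faster.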

**Broadcasting** enables mathematical operations between arrays of different shapes through [NumPy's broadcasting rules](https://numpy.org/doc/stable/user/basics.broadcasting.html), providing flexibility in how you combine data.
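
A minimal broadcasting sketch, assuming a 2-by-3 matrix and a 3-element row chosen only for illustration:

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])   # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])     # shape (3,)

# The 3-element row is broadcast across both rows of the matrix
shifted = matrix + row

print(shifted.shape)
print(shifted)
```

The shapes `(2, 3)` and `(3,)` are compatible because their trailing dimensions match, so NumPy adds `row` to every row of `matrix` without making an explicit copy for each row.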

Understanding NumPy arrays is essential for anyone working in scientific computing with Python. They form the foundation for virtually every major scientific Python library, including pandas, scikit-learn, and matplotlib. **You must be able to think in arrays.** (Goodbye dataframes, lists, dictionaries, etc.)

## Mathematical Notation for Arrays and Matrices

**Real Numbers**
A real number is any number that can be found on the number line. This includes: 
- **Natural numbers**: 1, 2, 3, 4, ...
- **Whole numbers**: 0, 1, 2, 3, 4, ...
- **Integers**: ..., -2, -1, 0, 1, 2, ...
- **Rational numbers**: Any number that can be expressed as a fraction of two integers (like 1/2, -3/4, or 0.75)
- **Irrational numbers**: Numbers that cannot be expressed as a fraction of two integers (like π, √2, or e)

Real numbers are called "real" to distinguish them from imaginary numbers or complex numbers. 

Any measurement you make in the physical world—length, weight, temperature, time—will be a real number. In this course, we will deal exclusively with real numbers. 

The mathematical symbol for the set of real numbers is $\mathbb{R}$.

## Notation for Array and Matrix Dimensions

Consider the array $A = [ 1, 3.3, -10, 5.78 ]$. All the elements of $A$ are real numbers. Additionally, $A$ has 4 elements. We say that $A$ is an element of $\mathbb{R}^4$, an array of size 4 with real elements.

Consider the matrix 
\begin{align} 
B = \begin{bmatrix} 10 & 3.3 & -10 \\ 5.8 & 10.3 & -20.1 \\ 6 & \pi & 0.0 \\ -80 & 56 & 0.003 
\end{bmatrix} .
\end{align}

The matrix $B$ has 4 rows and 3 columns, and is composed of real numbers; it is a member of $\mathbb{R}^{4 \times 3}$.

In general, an array that is a member of $\mathbb{R}^{n\times m}$ has $n$ rows and $m$ columns and $n\times m$ elements. 

Likewise, an array that is a member of $\mathbb{R}^{n\times m\times l}$ has $n$ elements along the first dimension, $m$ elements along the second dimension, $l$ elements along the third dimension, and $n\times m\times l$ total elements.
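
The notation maps directly onto NumPy's `shape` and `size` attributes. The sketch below builds $A$ and $B$ from this section, plus a hypothetical three-dimensional array `T` (not part of the original text) that would be a member of $\mathbb{R}^{2\times 3\times 5}$.

```python
import numpy as np

A = np.array([1, 3.3, -10, 5.78])        # member of R^4
B = np.array([[10, 3.3, -10],
              [5.8, 10.3, -20.1],
              [6, np.pi, 0.0],
              [-80, 56, 0.003]])         # member of R^(4 x 3)
T = np.zeros((2, 3, 5))                  # hypothetical member of R^(2 x 3 x 5)

# shape lists the length of each dimension; size is their product
for name, arr in [("A", A), ("B", B), ("T", T)]:
    print(name, arr.shape, arr.size)
```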

## Size, Shape, Indexing, and All That 

Let us consider two arrays, one a vector, the other a matrix. 

\begin{equation} 
a= \begin{bmatrix} 3.0 \\ -2.0 \\ 8.0 
\end{bmatrix} 
\end{equation} 

\begin{equation} 
b =\begin{bmatrix} 4.0 & 3.0 & 1.0 \\ -2.0 & 0 & 1.0  \\ 6.0 & 10.0 & -5.0 
\end{bmatrix}
\end{equation} 

Note that $a$ is a member of $\mathbb{R}^3$ and $b$ is a member of $\mathbb{R}^{3 \times 3}$.

These two arrays have Python representations written as: 

```python
import numpy as np

a = np.array([3.0, -2.0, 8.0])
b = np.array([[4.0, 3.0, 1.0], [-2.0, 0, 1.0], [6.0, 10.0, -5.0]])
```

## The Size and Shape of a NumPy Array 

The **size** of the array is the total number of elements. You can obtain the size by using the attribute `array.size`. Here is a code snippet example: 

```python
print(a.size) 
print(b.size) 
```

For our example arrays, $a$ has size 3, and $b$ has size 9. 

The **shape** of the array is the number of elements along each dimension of the array. You can obtain the shape by using the attribute `array.shape`. Here is a code snippet example: 

```python
print(a.shape) 
print(b.shape) 
```

The attribute `shape` returns a tuple. The first element of the tuple is the size of the first dimension, the second element is the size of the second dimension, and so on.

```python
import numpy as np

a = np.array([3.0, -2.0, 8.0])

b = np.array([[4.0, 3.0, 1.0], [-2.0, 0, 1.0], [6.0, 10.0, -5.0]])

print(a.size)
print(b.size)

print(a.shape)
print(b.shape)
```

Output:

```
3
9
(3,)
(3, 3)
```


## Indexing 
The elements of an array are identified by an integer called the index. Consider our example, `a = np.array([3.0, -2.0, 8.0])`:
* the element at index 0 is 3.0
* the element at index 1 is -2.0
* the element at index 2 is 8.0
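
The index list above can be checked with a short sketch. The `out_of_bounds` flag is a name introduced here only to show what happens when an index runs past the end of the array.

```python
import numpy as np

a = np.array([3.0, -2.0, 8.0])

# Valid indices run from 0 to a.size - 1
for i in range(a.size):
    print(f"a[{i}] = {a[i]}")

# Index 3 is past the end of a 3-element array, so NumPy raises IndexError
try:
    a[3]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(out_of_bounds)
```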

For our matrix example, the index is two-dimensional. The first index marks the row location, while the second index marks the column location. Consider our example,
\begin{equation} 
b =\begin{bmatrix} 4.0 & 3.0 & 1.0 \\ -2.0 & 0 & 1.0  \\ 6.0 & 10.0 & -5.0 
\end{bmatrix} .
\end{equation}
The code below loops through each element and displays its value.

```python
for i in range(3): 
    for j in range(3): 
        val = b[i,j]
        print(f"The value at row {i} and column {j} is {val}.") 
```



Running this loop prints:

```
The value at row 0 and column 0 is 4.0.
The value at row 0 and column 1 is 3.0.
The value at row 0 and column 2 is 1.0.
The value at row 1 and column 0 is -2.0.
The value at row 1 and column 1 is 0.0.
The value at row 1 and column 2 is 1.0.
The value at row 2 and column 0 is 6.0.
The value at row 2 and column 1 is 10.0.
The value at row 2 and column 2 is -5.0.
```


### Slicing
Slicing is the process of extracting a subset of an array. 

### 1D Array Slicing
Let: 
```python
a = np.array([10, 20, 30, 40, 50])
```

| Slice Syntax | Description                                      | Output             |
|--------------|--------------------------------------------------|--------------------|
| `a[1:4]`     | Elements at indices 1 to 3 (stop index 4 excluded) | `[20 30 40]`     |
| `a[:3]`      | First 3 elements                                 | `[10 20 30]`       |
| `a[2:]`      | From index 2 to the end (here, the last 3 elements) | `[30 40 50]`    |
| `a[::2]`     | Every 2nd element, starting at index 0           | `[10 30 50]`       |
| `a[::-1]`    | Reversed array                                   | `[50 40 30 20 10]` |
| `a[1]`       | Single element at index 1                        | `20`               |
| `a[1:2]`     | Slice containing one element                     | `[20]`             |
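
The rows of this table can be reproduced with the short sketch below. The dictionary `slices` is just a convenient way, chosen here, to print each result next to its label.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

slices = {
    "a[1:4]": a[1:4],
    "a[:3]": a[:3],
    "a[2:]": a[2:],
    "a[::2]": a[::2],
    "a[::-1]": a[::-1],
}
for label, result in slices.items():
    print(f"{label:8} -> {result}")

# A single index returns a scalar; a one-element slice returns a 1D array
print(a[1], a[1].shape)
print(a[1:2], a[1:2].shape)
```

The last two lines show why `a[1]` and `a[1:2]` appear separately in the table: the first has an empty shape `()`, while the second keeps the array structure with shape `(1,)`.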

### 2D Array Slicing

Let:

```python
b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
```
| Slice Syntax     | Description                  | Output                          |
|------------------|------------------------------|---------------------------------|
| `b[0, :]`        | First row                    | `[1 2 3]`                        |
| `b[:, 1]`        | Second column                | `[2 5 8]`                        |
| `b[1:3, 0:2]`    | Sub-matrix (2 rows, 2 cols)  | `[[4 5], [7 8]]`                |
| `b[::-1, :]`     | Reverse rows                 | `[[7 8 9], [4 5 6], [1 2 3]]`    |
| `b[:, ::-1]`     | Reverse columns              | `[[3 2 1], [6 5 4], [9 8 7]]`    |
| `b[1, 1]`        | Center value (row 1, col 1)  | `5`                              |
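
A short sketch that reproduces the 2D table. The variable names on the left are introduced here for readability only.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_row = b[0, :]          # row 0, all columns
second_col = b[:, 1]         # all rows, column 1
sub_matrix = b[1:3, 0:2]     # rows 1-2, columns 0-1
rows_reversed = b[::-1, :]   # row order flipped
cols_reversed = b[:, ::-1]   # column order flipped
center = b[1, 1]             # single element

print(first_row)
print(second_col)
print(sub_matrix)
print(rows_reversed)
print(cols_reversed)
print(center)
```

Note that selecting a single column with `b[:, 1]` returns a 1D array of shape `(3,)`, not a 3-by-1 column matrix.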

## Higher Dimensional Arrays
Higher dimensional arrays are possible but hard to visualize. 

Here is an example that uses the `np.random.rand` function to generate random numbers drawn uniformly from the interval $[0, 1)$:
```python
C = np.random.rand(3, 4, 2)
C
```

Output:

```
array([[[0.21468125, 0.17208442],
        [0.89635084, 0.04784276],
        [0.76861475, 0.02236646],
        [0.2372226 , 0.36257052]],

       [[0.20851043, 0.80995621],
        [0.31352837, 0.46072644],
        [0.59403694, 0.23485037],
        [0.11235353, 0.19760986]],

       [[0.01699526, 0.76034814],
        [0.56804892, 0.15987655],
        [0.161875  , 0.56975112],
        [0.67066743, 0.26290218]]])
```

```python
C.shape
```

Output:

```
(3, 4, 2)
```
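
Indexing and slicing extend to three dimensions in the same way: one index or slice per axis. The sketch below is a minimal illustration with names chosen here. The values are random, so only the shapes are predictable.

```python
import numpy as np

C = np.random.rand(3, 4, 2)

first_block = C[0]           # fixing the first index leaves a 4-by-2 matrix
one_row = C[0, 1]            # fixing two indices leaves a 2-element vector
single_value = C[0, 1, 1]    # fixing all three indices gives one number
front_layer = C[:, :, 0]     # slicing the last axis leaves a 3-by-4 matrix

print(first_block.shape)
print(one_row.shape)
print(front_layer.shape)
print(single_value)
```

Each fixed index removes one dimension from the result, which is a useful way to reason about arrays that are hard to picture.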

## Element-Wise Operations

Element-wise operations apply an operation individually to each element in an array or between matching elements of two arrays. 
Here are some examples. 
Let: 
```python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
```



| Operation       | Code        | Output           |
|----------------|-------------|------------------|
| Addition        | `a + b`     | `[5 7 9]`         |
| Subtraction     | `a - b`     | `[-3 -3 -3]`      |
| Multiplication  | `a * b`     | `[4 10 18]`       |
| Division        | `b / a`     | `[4. 2.5 2.]`     |
| Power           | `a ** 2`    | `[1 4 9]`         |
| Scalar Addition | `b + 2`     | `[6 7 8]`         |
| Scalar Multiplication | `b * 2` | `[8 10 12]`     |
| Modulo          | `b % a`     | `[0 1 0]`         |
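
The table can be verified with the sketch below. The dictionary `results` is a helper introduced here to pair each label with its result.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

results = {
    "a + b": a + b,
    "a - b": a - b,
    "a * b": a * b,
    "b / a": b / a,     # true division returns floats even for integer inputs
    "a ** 2": a ** 2,
    "b + 2": b + 2,
    "b * 2": b * 2,
    "b % a": b % a,
}
for label, result in results.items():
    print(f"{label:7} -> {result}")
```

Note that `a * b` multiplies matching elements. It is not the matrix or dot product.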

```python
# Import quiz functions
from quiz_indexing_utils import get_all_quizzes
# Import Markdown functions
from IPython.display import Markdown, display

quizzes = get_all_quizzes()

# Display Problems 1 through 12, each under its own heading
for number, quiz in enumerate(quizzes[:12], start=1):
    display(Markdown(f"### Problem {number}"))
    display(quiz)
```

## NumPy Array Axes 

![From: https://nustat.github.io/DataScience_Intro_python/NumPy.html ](NorthWesternArrayImage.png)

Many NumPy operations can be performed on a single dimension of a multidimensional array. 

Consider generating a 3 by 4 array of random numbers from 0 to 1. 
```python
a = np.random.rand(3,4)
print(a.shape)
```

You can take the mean of the array as follows:
```python
print(np.mean(a))
```

You can take the mean of each column by specifying `axis=0`. The operation runs down the rows (along axis 0), so the row dimension is collapsed.
```python
print(np.mean(a, axis=0))
```
The result is a 4-element array containing the mean of each column.

Likewise, you can take the mean of each row by specifying `axis=1`. The operation runs across the columns (along axis 1), so the column dimension is collapsed.
```python
print(np.mean(a, axis=1))
```
The result is a 3-element array containing the mean of each row.
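
Because random values change on every run, a deterministic array makes the axis behaviour easier to check by hand. The sketch below uses a hypothetical 3-by-4 array `m` built with `np.arange`. It is not part of the original example.

```python
import numpy as np

# A fixed 3-by-4 array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
m = np.arange(12).reshape(3, 4)

overall_mean = np.mean(m)            # one number for the whole array
column_means = np.mean(m, axis=0)    # collapses the 3 rows, leaving 4 values
row_means = np.mean(m, axis=1)       # collapses the 4 columns, leaving 3 values

print(overall_mean)
print(column_means)
print(row_means)
```

A quick rule of thumb: the axis you name is the one that disappears from the shape, so `axis=0` turns shape `(3, 4)` into `(4,)` and `axis=1` turns it into `(3,)`.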

```python
a = np.random.rand(3,4)
a
```

Output:

```
array([[0.32980947, 0.88451979, 0.07035742, 0.54358753],
       [0.33987216, 0.6123377 , 0.05049225, 0.19285711],
       [0.50283055, 0.39983442, 0.96729368, 0.87817099]])
```

```python
np.mean(a)
```

Output:

```
np.float64(0.48099692353628903)
```

```python
np.mean(a,axis=0)
```

Output:

```
array([0.39083739, 0.63223064, 0.36271445, 0.53820521])
```

```python
np.mean(a,axis=1)
```

Output:

```
array([0.45706855, 0.29888981, 0.68703241])
```