
NumPy: The Foundation of Data Science in Python

NumPy for Absolute Beginners: A Complete Guide with Examples

Before you start: Install NumPy by running pip install numpy in your terminal. Then import it at the top of every script with import numpy as np.


📋 Table of Contents

  1. What is NumPy?
  2. Creating Arrays from Python Lists
  3. The arange Method
  4. Eye & Diag Methods
  5. Zeros, Ones & Indices
  6. Slicing & Replicating Arrays
  7. Data Types in NumPy
  8. Shape, Reshape & Flattening
  9. Joining & Splitting Arrays
  10. Sorting & Filtering Arrays
  11. NumPy Random Module
  12. NumPy Distributions
  13. Quick Reference Table

1. What is NumPy?

  • NumPy stands for “Numerical Python.” It is a free, open-source Python library designed to work with numbers, arrays (lists of numbers), and matrices (tables of numbers) very efficiently.
  • It is used in almost every field of science and engineering, from physics simulations to financial modelling to machine learning.
  • Other popular libraries depend on NumPy. Pandas, Matplotlib, scikit-learn, and SciPy all use NumPy under the hood. Learning NumPy first makes all of them easier.
  • Why not just use Python lists? NumPy arrays store their elements in one contiguous, typed block of memory, so they use less memory and run element-wise operations much faster. On large datasets the speed-up is often an order of magnitude or more (figures around 50× are commonly quoted), though the exact number depends on the operation and the hardware.
  • The core object is called ndarray (n-dimensional array). Think of it as a super-powered list that can be 1D, 2D, 3D, or even higher dimensions.
# Step 1: Install NumPy (run this in terminal, not in Python)
# pip install numpy

# Step 2: Import NumPy in every script
import numpy as np

# Step 3: Check the version
print(np.__version__)   # e.g. 1.26.4

💡 The alias np is a near-universal convention. The official documentation and almost every NumPy tutorial use import numpy as np. Always do this: it saves typing and makes your code readable to everyone.
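To make the list-versus-array comparison concrete, here is a minimal sketch that doubles the same numbers both ways. The variable names are illustrative only, and no timing is measured, since the actual speed-up depends on your machine and the size of the data.

```python
import numpy as np

# The same five numbers as a plain Python list and as a NumPy array
py_list = [1, 2, 3, 4, 5]
np_arr = np.array(py_list)

# A list needs an explicit loop (or comprehension) to double every element
doubled_list = [x * 2 for x in py_list]

# An ndarray applies the operation to every element in one expression
doubled_arr = np_arr * 2

print(doubled_list)      # [2, 4, 6, 8, 10]
print(doubled_arr)       # [ 2  4  6  8 10]
print(np_arr.ndim)       # 1 (a one-dimensional ndarray)
```

Both approaches give the same values; the array version simply moves the loop out of Python and into NumPy's compiled code.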


2. Creating Arrays from Python Lists

  • Use np.array() to convert a Python list into a NumPy array. Wrap your list in square brackets and pass it to the function.
  • A 1D array is like a single row of numbers: similar to a Python list, but with superpowers like math operations on every element at once.
  • A 2D array is like a table (rows and columns). Pass a list of lists; each inner list becomes one row of the table.
  • All elements must be the same data type. If you mix integers and strings, NumPy converts everything to strings automatically.
  • NumPy arrays have a fixed size. Unlike Python lists, you cannot append or remove elements in place; functions such as np.append() build and return a new array instead.
import numpy as np

# ── 1D Array (single row of numbers) ──
a = np.array([10, 20, 30, 40, 50])
print("1D Array:", a)
print("Type:", type(a))

# ── 2D Array (table with rows and columns) ──
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print("\n2D Array:\n", b)

# ── Mixed types → NumPy auto-converts ──
c = np.array([1, 2.5, 3])
print("\nMixed int+float:", c)   # all become float

Output:

1D Array: [10 20 30 40 50]
Type: <class 'numpy.ndarray'>

2D Array:
[[1 2 3]
 [4 5 6]]

Mixed int+float: [1.  2.5 3. ]
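The last two bullets above (automatic conversion to strings, and fixed size) are easy to check for yourself. The following is a small sketch under the assumption that you only need to inspect the result; the variable names are made up for illustration.

```python
import numpy as np

# Mixing numbers and text: every element ends up stored as a string
mixed = np.array([1, 2, "three"])
print(mixed.dtype.kind)   # U  (unicode string)
print(mixed[0] == "1")    # True: the integer 1 became the text '1'

# Arrays cannot grow in place; np.append builds a new, longer array
a = np.array([10, 20, 30])
longer = np.append(a, 40)

print(a.size)        # 3  (original is untouched)
print(longer.size)   # 4
```

Because np.append copies the whole array every time, growing an array inside a loop is slow; collecting values in a Python list and converting once at the end is usually the better choice.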

3. The arange Method: Auto-Generate Numbers

  • np.arange() automatically creates a sequence of numbers, just like Python’s built-in range(), but returns a NumPy array.
  • Syntax: np.arange(start, stop, step). The stop value is never included (exclusive). If you only give one number, it starts from 0.
  • You can specify the data type using the dtype parameter. For example, dtype=float gives decimal numbers instead of integers.
  • Useful for creating axes, counters, or any evenly spaced sequence without typing every number manually.
import numpy as np

# arange(stop) → starts from 0
a = np.arange(5)
print("0 to 4:", a)              # [0 1 2 3 4]

# arange(start, stop) → 2 up to (but not including) 8
b = np.arange(2, 8)
print("2 to 7:", b)              # [2 3 4 5 6 7]

# arange with float dtype
c = np.arange(1, 5, dtype=float)
print("As floats:", c)           # [1. 2. 3. 4.]

# arange(start, stop, step) → every 2nd number
d = np.arange(0, 20, 2)
print("Even numbers:", d)        # [ 0  2  4  6  8 10 12 14 16 18]

# Countdown with negative step
e = np.arange(10, 0, -2)
print("Countdown:", e)           # [10  8  6  4  2]

Output:

0 to 4: [0 1 2 3 4]
2 to 7: [2 3 4 5 6 7]
As floats: [1. 2. 3. 4.]
Even numbers: [ 0  2  4  6  8 10 12 14 16 18]
Countdown: [10  8  6  4  2]
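A quick way to predict how many numbers np.arange() will produce with integer arguments is ceil((stop - start) / step). The sketch below checks that rule on one example; the values of start, stop, and step are arbitrary choices for illustration.

```python
import math
import numpy as np

start, stop, step = 0, 20, 3

seq = np.arange(start, stop, step)
print(seq)                 # [ 0  3  6  9 12 15 18]

# The stop value itself never appears in the result
print(stop in seq)         # False

# Number of elements: ceil((stop - start) / step)
expected_len = math.ceil((stop - start) / step)
print(len(seq), expected_len)   # 7 7
```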

4. Eye & Diag Methods: Special Matrices

  • np.eye(n) creates an identity matrix: a square grid of zeros with 1s running diagonally from top-left to bottom-right. Widely used in linear algebra and machine learning.
  • np.eye(n, m) creates a rectangular identity matrix of n rows and m columns. The 1s still start at the top-left diagonal.
  • np.diag([list]) creates a square matrix with the given list on the diagonal and zeros everywhere else.
  • np.diag(existing_2D_array) extracts the diagonal values from a matrix and returns them as a 1D array.
  • The second argument to np.diag() shifts the diagonal: positive shifts it up-right, negative shifts it down-left.
import numpy as np

# Identity matrix 3×3
a = np.eye(3)
print("3×3 Identity:\n", a)

# Rectangular identity 3×5
b = np.eye(3, 5)
print("\n3×5 Identity:\n", b)

# Diagonal matrix from a list
c = np.diag([5, 10, 15])
print("\nDiag matrix:\n", c)

# Shifted diagonal
d = np.diag([1, 2, 3], 2)
print("\nShifted diagonal:\n", d)

# Extract diagonal from existing 2D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
e = np.diag(matrix)
print("\nDiagonal of matrix:", e)   # [1 5 9]

Output:

3×3 Identity:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Diag matrix:
[[ 5  0  0]
 [ 0 10  0]
 [ 0  0 15]]

Diagonal of matrix: [1 5 9]
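Two points from the bullets above are worth seeing in action: why the identity matrix matters, and what a negative offset does. This is a minimal sketch; the matrix m is just sample data.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Multiplying by the identity matrix leaves a matrix unchanged
identity = np.eye(3, dtype=int)
print(np.array_equal(m @ identity, m))   # True

# Offset 1 reads the diagonal just above the main one,
# offset -1 reads the one just below it
print(np.diag(m, 1))    # [2 6]
print(np.diag(m, -1))   # [4 8]

# Building with a negative offset places the values below the main diagonal
below = np.diag([1, 2], -1)
print(below.shape)      # (3, 3)
```

Note that building a matrix with an offset makes it larger than the input list: two values with an offset of one need a 3×3 grid.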

5. Zeros, Ones & Indices: Starter Arrays

  • np.zeros(shape) creates an array filled entirely with 0s. Pass a tuple like (rows, cols) for a 2D array, or just an integer for 1D. Useful when you want to build an array and fill it later.
  • np.ones(shape) creates an array filled entirely with 1s. Works the same way. You can multiply it by any number to create an array filled with that number instantly.
  • np.indices(dimensions) creates coordinate arrays. It returns one array per dimension showing the index value at each position, which is handy for vectorised grid calculations.
  • The dtype parameter works with all three. np.zeros and np.ones produce floats (float64) by default, so add dtype=int to get integer zeros and ones; np.indices already returns integers by default.
import numpy as np

# 1D array of zeros
a = np.zeros(5)
print("1D zeros:", a)              # [0. 0. 0. 0. 0.]

# 2D array of zeros (3 rows, 4 columns)
b = np.zeros((3, 4))
print("\nZeros 3×4:\n", b)

# 2D array of ones
c = np.ones((2, 3))
print("\nOnes 2×3:\n", c)

# Trick: array filled with 7s using ones × 7
sevens = np.ones((2, 3)) * 7
print("\nAll 7s:\n", sevens)

# 3D array of zeros
d = np.zeros((2, 3, 2))
print("\n3D zeros shape:", d.shape)  # (2, 3, 2)

# Indices for a 3×3 grid
idx = np.indices((3, 3))
print("\nRow indices:\n", idx[0])
print("Col indices:\n", idx[1])

Output:

1D zeros: [0. 0. 0. 0. 0.]

Zeros 3×4:
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Ones 2×3:
[[1. 1. 1.]
 [1. 1. 1.]]

All 7s:
[[7. 7. 7.]
 [7. 7. 7.]]
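The dtype parameter and the "grid calculation" use of np.indices are easier to follow with a worked case. The sketch below also uses np.full(), a NumPy function not covered above, as a more direct alternative to the ones × 7 trick; the multiplication table is just an illustrative calculation.

```python
import numpy as np

# Integer zeros via the dtype parameter
z = np.zeros((2, 3), dtype=int)
print(z.dtype.kind)        # i  (integer)

# np.full fills an array with any value directly
sevens = np.full((2, 3), 7)
print(sevens)
# [[7 7 7]
#  [7 7 7]]

# np.indices returns one integer array per dimension
rows, cols = np.indices((3, 3))

# Vectorised grid calculation: a small multiplication table
table = (rows + 1) * (cols + 1)
print(table)
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```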

6. Slicing & Replicating Arrays

  • Slicing lets you extract parts of an array using the array[start:stop] syntax. Just like Python strings and lists, but it works across multiple dimensions too.
  • Negative indices count from the end. arr[-1] is the last element, arr[-2] is the second-to-last, and so on.
  • ⚠️ Warning: slices are views, not copies! If you slice an array and change the slice, the original array changes too. This is a very common beginner mistake.
  • Use .copy() to make an independent copy of a slice if you want to change it without affecting the original.
  • For 2D arrays, use arr[row_start:row_stop, col_start:col_stop]. The comma separates row and column slices.
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Basic slicing
print(arr[1:4])      # [20 30 40]  (index 1, 2, 3)
print(arr[:3])       # [10 20 30]  (first 3 elements)
print(arr[3:])       # [40 50 60]  (from index 3 onward)
print(arr[-2])       # 50          (2nd from the end)
print(arr[-6:-2])    # [10 20 30 40]

# ⚠️ View vs Copy: critical difference!
view  = arr[0:3]          # this is a VIEW (linked to original)
clone = arr[0:3].copy()   # this is a COPY (independent)

view[0] = 999
print("Original changed:", arr[0])    # 999: view changed original!
print("Clone unchanged:", clone[0])   # 10: copy is safe!

# 2D Slicing
table = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10]])

print(table[0:2, 2])      # column index 2 of both rows → [3 8]
print(table[0:2, 1:4])    # rows 0-1, columns 1-3 → [[2,3,4],[7,8,9]]

Output:

[20 30 40]
[10 20 30]
[40 50 60]
50
[10 20 30 40]
Original changed: 999
Clone unchanged: 10
[3 8]
[[2 3 4]
 [7 8 9]]

⚠️ The View Trap: NumPy does NOT copy data when you slice. It creates a “view” of the same memory. Changing a slice changes the original. Always use .copy() when you need independence.
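If you are ever unsure whether you are holding a view or a copy, np.shares_memory() can tell you. The sketch below also shows the optional third slice value (the step), which works the same way as in Python lists; the array contents are arbitrary.

```python
import numpy as np

arr = np.arange(10)            # [0 1 2 ... 9]

# A third slice value sets the step: every second element
evens = arr[::2]
print(evens)                   # [0 2 4 6 8]

# A negative step walks backwards; here, the last three in reverse
print(arr[::-1][:3])           # [9 8 7]

# Checking whether two arrays share the same underlying memory
view = arr[2:5]
clone = arr[2:5].copy()
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, clone))   # False
```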


7. Data Types in NumPy

  • Every NumPy array has one data type (dtype): all elements in an array share the same type. This is what makes NumPy so fast.
  • NumPy uses one-letter codes as shorthand. For example, 'i' means integer, 'f' means float, 'U' means unicode string. The letters in the table below are the kind codes reported by dtype.kind. When you pass a bare letter to dtype=, 'i' gives int32 on most platforms and 'f' gives float32, while booleans are requested with bool or '?' (a bare 'b' there means an 8-bit integer).
  • Set the dtype at creation using the dtype= parameter. You can also convert later using .astype().
  • Choosing the right dtype saves memory. int8 uses only 1 byte; int64 uses 8 bytes. For large arrays, this difference is enormous.

Code   Type               Example
i      Integer            1, 2, 3
f      Float              1.5, 2.7
b      Boolean            True, False
u      Unsigned Integer   0, 1, 255
c      Complex Float      1+2j
m      Timedelta          time differences
M      Datetime           2024-01-01
S      String (bytes)     b'hello'
U      Unicode String     'hello'
O      Object             any Python object
V      Void               fixed-size memory chunk

import numpy as np

# Check dtype of an array
a = np.array([1, 2, 3])
print(a.dtype)            # int64 on most platforms (default integer)

b = np.array([1.5, 2.5])
print(b.dtype)            # float64 (default float)

# Set dtype at creation
c = np.array([1, 2, 3], dtype='f')
print(c)                  # [1. 2. 3.]  → float32
print(c.dtype)            # float32

d = np.array([1, 2, 3], dtype='i')
print(d.dtype)            # int32

# Boolean array
e = np.array([True, False, True], dtype=bool)
print(e.dtype)            # bool

# Convert type with astype()
f = a.astype(float)
print(f)                  # [1. 2. 3.]

# String array
g = np.array(['apple', 'banana', 'cherry'])
print(g.dtype)            # <U6 (unicode string, max 6 chars)

Output:

int64
float64
[1. 2. 3.]
float32
int32
bool
[1. 2. 3.]
<U6
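The memory claim above (1 byte versus 8 bytes per element) can be verified with the nbytes attribute. The following sketch uses an array of one million elements as an arbitrary example size, and also shows that astype() drops the fractional part when converting floats to integers.

```python
import numpy as np

# One million small integers stored two ways
big = np.ones(1_000_000, dtype=np.int64)
small = big.astype(np.int8)

print(big.nbytes)     # 8000000 bytes
print(small.nbytes)   # 1000000 bytes

# One-letter strings passed to dtype= select specific sizes
print(np.dtype('f') == np.float32)   # True
print(np.dtype('?') == np.bool_)     # True

# astype() truncates toward zero when converting floats to integers
prices = np.array([1.9, 2.5, -3.7])
print(prices.astype(int))            # [ 1  2 -3]
```

Smaller types only hold smaller numbers (int8 covers -128 to 127), so check the range of your data before shrinking a dtype.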

8. Shape, Reshape & Flattening

  • array.shape tells you the dimensions of an array as a tuple. For a 2D array (3, 4) means 3 rows and 4 columns.
  • array.reshape(new_shape) reorganises the data into a new shape without changing the actual values. The total number of elements must stay the same: a 3×4 reshape needs exactly 12 elements.
  • Use -1 as a wildcard dimension. NumPy calculates the correct size automatically. reshape(-1, 3) means “however many rows are needed, with 3 columns.”
  • Flattening converts any array to 1D. Use reshape(-1) or .flatten(). The difference: .flatten() always returns a copy; reshape(-1) returns a view when possible.
  • ndmin sets a minimum number of dimensions when creating an array. ndmin=4 ensures the array is at least 4-dimensional.
import numpy as np

# Start with 12 numbers in a 1D array
a = np.arange(1, 13)           # [1, 2, 3, ..., 12]
print("Shape:", a.shape)       # (12,)

# Reshape to 3 rows × 4 columns (2D)
b = a.reshape(3, 4)
print("\n3×4 grid:\n", b)

# Reshape to 2×2×3 (3D)
c = a.reshape(2, 2, 3)
print("\n2×2×3 (3D):\n", c)

# Wildcard -1: NumPy calculates rows automatically
d = a.reshape(-1, 4)           # 12 ÷ 4 = 3 rows
print("\nAuto rows (-1, 4):\n", d)

# ndmin: ensure at least 4 dimensions
e = np.array([1, 2, 3, 4], ndmin=4)
print("\nndmin=4:", e)
print("Shape:", e.shape)       # (1, 1, 1, 4)

# Flatten back to 1D
flat = b.reshape(-1)
print("\nFlattened:", flat)

# flatten() always returns a copy
flat2 = b.flatten()
flat2[0] = 999
print("b unchanged:", b[0, 0])  # 1: original safe!

Output:

Shape: (12,)

3×4 grid:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Auto rows (-1, 4):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Flattened: [ 1  2  3  4  5  6  7  8  9 10 11 12]
b unchanged: 1
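Two of the rules above can be checked directly: reshape gives a view where it can while flatten always copies, and the element count must match. This is a small sketch reusing the same 12-element array; the 5×5 target is a deliberately wrong shape.

```python
import numpy as np

a = np.arange(1, 13)
b = a.reshape(3, 4)

# reshape returns a view here, so b shares memory with a
print(np.shares_memory(a, b))             # True

# flatten always copies
print(np.shares_memory(b, b.flatten()))   # False

# Element counts must match: 12 values cannot fill a 5x5 grid
try:
    a.reshape(5, 5)
except ValueError as err:
    print("Reshape failed:", err)

# -1 can stand in for exactly one dimension
print(a.reshape(2, -1).shape)             # (2, 6)
```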

9. Joining & Splitting Arrays

  • np.concatenate() joins arrays along an existing axis. Use axis=0 to stack vertically (add rows), axis=1 to join horizontally (add columns).
  • np.vstack() stacks vertically (row by row), like piling one table on top of another. Arrays must have the same number of columns.
  • np.hstack() stacks horizontally (column by column), like putting two tables side by side. Arrays must have the same number of rows.
  • np.dstack() stacks along the third axis (depth-wise), which is useful for combining 2D arrays into a 3D structure.
  • np.array_split(arr, n) splits an array into n pieces. If the array can’t be split evenly, the pieces will be different sizes, and it won’t throw an error.
  • The axis parameter of np.array_split() lets you split along rows (axis=0) or columns (axis=1) in 2D arrays.
import numpy as np

# ── 1D Concatenation ──
A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

C = np.concatenate((A, B))
print("Joined 1D:", C)              # [1 2 3 4 5 6]

# ── 2D Stack Methods ──
X = np.array([[1, 2], [3, 4]])
Y = np.array([[5, 6], [7, 8]])

vjoined = np.vstack((X, Y))         # stack top + bottom
print("\nvstack:\n", vjoined)

hjoined = np.hstack((X, Y))         # stack left + right
print("\nhstack:\n", hjoined)

djoined = np.dstack((X, Y))         # stack into 3D
print("\ndstack shape:", djoined.shape)  # (2, 2, 2)

# ── Splitting ──
nums = np.arange(1, 10)             # [1, 2, ..., 9]

# Split into 3 equal parts
parts = np.array_split(nums, 3)
print("\nSplit into 3:", parts)

# Uneven split: no error!
uneven = np.array_split(nums, 4)
print("Uneven split:", uneven)

# Split 2D array along columns
arr2d = np.array([[1, 2, 3], [4, 5, 6],
                  [7, 8, 9], [10, 11, 12]])
cols = np.array_split(arr2d, 3, axis=1)
print("\nColumn split:", cols)

Output:

Joined 1D: [1 2 3 4 5 6]

vstack:
[[1 2]
 [3 4]
 [5 6]
 [7 8]]

hstack:
[[1 2 5 6]
 [3 4 7 8]]

Split into 3: [array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
Uneven split: [array([1, 2, 3]), array([4, 5]), array([6, 7]), array([8, 9])]
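The axis parameter of np.concatenate() and the "no error on uneven splits" behaviour are both easy to confirm. The sketch below compares np.array_split() with the stricter np.split(), which is a related NumPy function not used above.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
Y = np.array([[5, 6], [7, 8]])

# concatenate with an explicit axis matches vstack / hstack for 2D input
rows_added = np.concatenate((X, Y), axis=0)
cols_added = np.concatenate((X, Y), axis=1)
print(np.array_equal(rows_added, np.vstack((X, Y))))   # True
print(np.array_equal(cols_added, np.hstack((X, Y))))   # True

nums = np.arange(1, 10)     # 9 elements

# array_split tolerates an uneven split: the extra element goes to the first piece
sizes = [len(p) for p in np.array_split(nums, 4)]
print(sizes)                # [3, 2, 2, 2]

# np.split insists on equal parts and raises otherwise
try:
    np.split(nums, 4)
except ValueError as err:
    print("np.split failed:", err)
```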

10. Sorting & Filtering Arrays

  • np.sort(arr) returns a sorted copy of the array in ascending order. The original array is not modified. For descending, use np.sort(arr)[::-1].
  • For 2D arrays, the axis parameter controls whether each row is sorted separately (axis=1) or each column (axis=0).
  • Filtering uses a boolean array as a mask. Create a True/False array using a condition, then pass it inside square brackets to select only the True elements.
  • Conditions can be combined using & (and) and | (or). Always wrap each condition in parentheses when combining.
  • This is called “boolean indexing” and is one of NumPy’s most powerful features. It is far faster than a Python loop for selecting elements.
import numpy as np

nums = np.array([42, 7, 19, 3, 55, 28])

# Sort ascending
print("Sorted:", np.sort(nums))           # [ 3  7 19 28 42 55]

# Sort descending
print("Descending:", np.sort(nums)[::-1]) # [55 42 28 19  7  3]

# Sort 2D: each row independently
matrix = np.array([[13, 12, 14],
                   [15, 10, 11]])
print("\nSorted 2D (rows):\n", np.sort(matrix))

# ── Boolean Filtering ──
scores = np.array([45, 82, 61, 90, 33, 75])

# Step 1: create a boolean mask
mask = scores > 60
print("\nMask:", mask)          # [F  T  T  T  F  T]

# Step 2: apply the mask
passing = scores[mask]
print("Passing scores:", passing)   # [82 61 90 75]

# One-liner filtering
evens = scores[scores % 2 == 0]
print("Even scores:", evens)        # [82 90]

# Combined condition: > 50 AND even
combo = scores[(scores > 50) & (scores % 2 == 0)]
print("Over 50 & even:", combo)     # [82 90]

# OR condition
low_or_high = scores[(scores < 40) | (scores > 80)]
print("Low or High:", low_or_high)  # [82 90 33]

Output:

Sorted: [ 3  7 19 28 42 55]
Descending: [55 42 28 19  7  3]

Sorted 2D (rows):
[[12 13 14]
 [10 11 15]]

Mask: [False  True  True  True False  True]
Passing scores: [82 61 90 75]
Even scores: [82 90]
Over 50 & even: [82 90]
Low or High: [82 90 33]
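Three details from the bullets above are sketched here: sorting by column with axis=0, the difference between np.sort() (copy) and the .sort() method (in place), and why Python's and keyword cannot replace &. The sample arrays are arbitrary.

```python
import numpy as np

matrix = np.array([[13, 12, 14],
                   [15, 10, 11]])

# axis=0 sorts each column; the default for a 2D array sorts each row
print(np.sort(matrix, axis=0))
# [[13 10 11]
#  [15 12 14]]

# np.sort returns a copy; the .sort() method sorts in place
nums = np.array([42, 7, 19])
sorted_copy = np.sort(nums)
print(nums)          # [42  7 19]  (unchanged)
nums.sort()
print(nums)          # [ 7 19 42]

# Python's "and" does not work element-wise on arrays
scores = np.array([45, 82, 61])
try:
    scores[(scores > 50) and (scores % 2 == 0)]
except ValueError:
    print("Use & instead of 'and' for array conditions")
```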

11. NumPy Random Module

  • NumPy’s random sub-module generates pseudo-random numbers. “Pseudo” means they are generated by an algorithm: they look random but are reproducible if you set the same seed.
  • random.randint(high, size=n) generates random integers from 0 up to (but not including) high. Use the size parameter to get multiple numbers at once, or pass a tuple like size=(3, 2) for a 2D array.
  • random.rand(n) generates n random floats between 0 and 1 (uniform distribution). For a 2D array, use random.rand(rows, cols).
  • random.choice(arr) picks a random element from an existing array. Use size= to pick multiple, and p= to assign probabilities to each element.
  • random.shuffle(arr) rearranges elements in place (modifies the original array). random.permutation(arr) does the same but returns a new array, leaving the original unchanged.
from numpy import random
import numpy as np

# Set seed for reproducible results
random.seed(42)

# Random integer 0 to 99
print(random.randint(100))               # single integer

# 5 random integers between 0 and 49
print(random.randint(50, size=5))

# 2D array of random integers
print(random.randint(10, size=(3, 2)))

# Random float between 0 and 1
print(random.rand())

# 1D array of 4 random floats
print(random.rand(4))

# 2D array of random floats
print(random.rand(2, 3))

# Pick one from a list
fruits = np.array(['apple', 'banana', 'cherry', 'mango'])
print(random.choice(fruits))

# Pick 3 with equal probability
print(random.choice(fruits, size=3))

# Pick 5 with custom probabilities (must sum to 1.0)
print(random.choice(fruits, size=5, p=[0.5, 0.2, 0.2, 0.1]))

# Shuffle vs Permutation
nums = np.array([1, 2, 3, 4, 5])
random.shuffle(nums)               # modifies nums in place
print("Shuffled:", nums)

original = np.array([1, 2, 3, 4, 5])
perm = random.permutation(original)  # returns new array
print("Permutation:", perm)
print("Original safe:", original)    # unchanged!

💡 Reproducible randomness: Use np.random.seed(42) before generating random numbers. The same seed always produces the same sequence, which is essential for reproducible machine learning experiments.
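The functions above belong to NumPy's older, global-state interface. NumPy's documentation recommends the Generator interface, created with np.random.default_rng(), for new code. The sketch below shows rough counterparts of randint, rand, choice, and permutation; the names rng_a and rng_b are just illustrative.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

ints_a = rng_a.integers(0, 100, size=5)   # like randint: 0 to 99
ints_b = rng_b.integers(0, 100, size=5)
print(np.array_equal(ints_a, ints_b))     # True

# Counterparts of rand, choice and permutation
floats = rng_a.random(4)                  # 4 floats in [0, 1)
fruits = np.array(['apple', 'banana', 'cherry', 'mango'])
picks = rng_a.choice(fruits, size=3, p=[0.5, 0.2, 0.2, 0.1])
perm = rng_a.permutation(np.array([1, 2, 3, 4, 5]))

print(floats.shape, picks.shape)          # (4,) (3,)
print(sorted(perm.tolist()))              # [1, 2, 3, 4, 5]
```

Each generator carries its own seed, so seeding one part of a program does not silently change the random numbers used elsewhere.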


12. NumPy Distributions

  • A distribution describes the pattern of how random numbers are spread. Real-world data rarely follows a purely uniform pattern: heights, test scores, and errors all follow specific distributions.
  • Normal Distribution random.normal(loc, scale, size): the classic “bell curve.” Most values cluster around the mean (loc), spreading based on standard deviation (scale). Human heights approximately follow this pattern.
  • Binomial Distribution random.binomial(n, p, size): counts successes in n trials where each has probability p. Example: how many heads in 10 coin flips.
  • Poisson Distribution random.poisson(lam, size): counts how many times something happens in a fixed interval. Example: number of customers arriving per hour.
  • Uniform Distribution random.uniform(low, high, size): every value in the range is equally likely. A dice roll is a discrete version of this.
  • Exponential Distribution random.exponential(scale, size): models the time between events. Example: time between phone calls at a call centre.
  • Chi-Square Distribution random.chisquare(df, size): used heavily in statistical hypothesis testing. The df parameter is the degrees of freedom.
from numpy import random

# ── Normal Distribution ──
# mean=170 (avg height in cm), std=10, generate 6 samples
heights = random.normal(loc=170, scale=10, size=6)
print("Heights (cm):", heights.round(1))

# ── Binomial Distribution ──
# 10 coin flips (50% heads chance), repeated 8 times
flips = random.binomial(n=10, p=0.5, size=8)
print("\nHeads per 10 flips:", flips)

# ── Poisson Distribution ──
# Average 3 customers per hour, simulate 6 hours
arrivals = random.poisson(lam=3, size=6)
print("\nCustomers per hour:", arrivals)

# ── Uniform Distribution ──
# 4 random numbers between 10 and 20
uniforms = random.uniform(low=10, high=20, size=4)
print("\nUniform (10-20):", uniforms.round(2))

# ── Exponential Distribution ──
# Average wait = 5 minutes, simulate 5 wait times
waits = random.exponential(scale=5, size=5)
print("\nWait times (min):", waits.round(2))

# ── Chi-Square Distribution ──
# Degrees of freedom = 2, 5 samples
chi = random.chisquare(df=2, size=5)
print("\nChi-square (df=2):", chi.round(2))

# ── Random Distribution with Probabilities ──
# 60% chance of 6, 30% chance of 4, 10% chance of 2
result = random.choice([2, 4, 6, 8],
                       p=[0.1, 0.3, 0.6, 0.0],
                       size=10)
print("\nWeighted choice:", result)

Output (values vary each run):

Heights (cm): [163.1 175.4 168.8 181.2 159.7 172.3]
Heads per 10 flips: [5 6 4 7 5 3 6 5]
Customers per hour: [2 4 1 3 5 2]
Uniform (10-20): [14.23 11.87 18.94 13.56]
Wait times (min): [3.12 7.84 2.56 9.23 1.44]
Chi-square (df=2): [0.54 2.13 1.87 4.21 0.93]
Weighted choice: [6 6 4 6 6 4 6 2 6 6]

🎯 When to use which distribution?

  • Normal: measurements like heights, weights, test scores
  • Binomial: yes/no experiments repeated many times
  • Poisson: counting events over a fixed time period
  • Uniform: when all outcomes are equally likely
  • Exponential: time between random events
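One way to build intuition for these parameters is to draw a large sample and compare its average with the value theory predicts: loc for the normal, n × p for the binomial, lam for the Poisson, and scale for the exponential. The sketch below assumes a sample size of 100,000 and a seed of 0, both arbitrary choices.

```python
import numpy as np
from numpy import random

random.seed(0)          # fixed seed so the run is repeatable
n_samples = 100_000

heights = random.normal(loc=170, scale=10, size=n_samples)
flips = random.binomial(n=10, p=0.5, size=n_samples)
arrivals = random.poisson(lam=3, size=n_samples)
waits = random.exponential(scale=5, size=n_samples)

# With many samples, each sample mean lands close to the theoretical mean
print("Normal mean      (theory 170):", round(heights.mean(), 2))
print("Normal std dev   (theory 10): ", round(heights.std(), 2))
print("Binomial mean    (theory 5):  ", round(flips.mean(), 2))
print("Poisson mean     (theory 3):  ", round(arrivals.mean(), 2))
print("Exponential mean (theory 5):  ", round(waits.mean(), 2))
```

With only a handful of samples, as in the earlier example, the averages can sit far from these values; the agreement only shows up as the sample grows.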

13. Quick Reference Table

Method                 What It Does                          Example
np.array()             Create array from list                np.array([1, 2, 3])
np.arange()            Sequence of numbers                   np.arange(0, 10, 2)
np.zeros()             Array of all 0s                       np.zeros((3, 3))
np.ones()              Array of all 1s                       np.ones((2, 4))
np.eye()               Identity matrix                       np.eye(4)
np.diag()              Diagonal matrix / extract diagonal    np.diag([1, 2, 3])
np.indices()           Coordinate index arrays               np.indices((3, 3))
.shape                 Dimensions of array                   arr.shape
.reshape()             Change shape                          arr.reshape(3, 4)
.flatten()             Collapse to 1D (copy)                 arr.flatten()
.astype()              Convert data type                     arr.astype(float)
.copy()                Independent copy                      arr.copy()
np.concatenate()       Join arrays                           np.concatenate((A, B))
np.vstack()            Stack vertically                      np.vstack((A, B))
np.hstack()            Stack horizontally                    np.hstack((A, B))
np.dstack()            Stack depth-wise                      np.dstack((A, B))
np.array_split()       Split into parts                      np.array_split(arr, 3)
np.sort()              Sort array                            np.sort(arr)
arr[condition]         Filter elements                       arr[arr > 5]
random.seed()          Set random seed                       random.seed(42)
random.randint()       Random integers                       random.randint(100)
random.rand()          Random floats 0 to 1                  random.rand(3)
random.choice()        Pick from array                       random.choice(arr)
random.shuffle()       Shuffle in place                      random.shuffle(arr)
random.permutation()   Shuffle, return new                   random.permutation(arr)
random.normal()        Normal distribution                   random.normal(0, 1, 100)
random.binomial()      Binomial distribution                 random.binomial(10, 0.5, 5)
random.poisson()       Poisson distribution                  random.poisson(3, 10)
random.uniform()       Uniform distribution                  random.uniform(0, 10, 5)
random.chisquare()     Chi-square distribution               random.chisquare(2, 5)

Data Type Quick Reference

Code           Full Name            Memory    Range
i4 / int32     32-bit integer       4 bytes   about −2.1 billion to +2.1 billion
i8 / int64     64-bit integer       8 bytes   about ±9.2 × 10^18
f4 / float32   32-bit float         4 bytes   ~7 decimal digits
f8 / float64   64-bit float         8 bytes   ~15 decimal digits
? / bool       Boolean              1 byte    True / False
u1 / uint8     Unsigned 8-bit int   1 byte    0 to 255
S              Byte string          varies    ASCII text
U              Unicode string       varies    all text
M              Datetime64           8 bytes   dates and times

Written by CSC Pallavaram, Data Science using Python

Next topics: Pandas Module · Matplotlib Visualization · Machine Learning Basics
