Cat vs Dog Image Classifier Using Deep Learning
By the end of this tutorial, you’ll have built a real AI model that can look at any image and tell you whether it’s a cat or a dog. Concepts covered:
- What deep learning actually is (in plain English)
- How to use Google Colab for free GPU power
- How to train your first neural network
- How to test your model with real images
Prerequisites: Basic Python knowledge (variables, loops, functions). That’s it!
Time Required: 25-35 minutes
Cost: $0 (completely free using Google Colab)
Before We Start: Understanding the Key Concepts
Before diving into code, let’s understand what we’re actually doing. Think of this section as your foundation – skip it, and you’ll be confused later!
What is Deep Learning?
Simple explanation: Deep learning is teaching a computer to recognize patterns by showing it thousands of examples.
Imagine teaching a child what a cat looks like. You don’t explain “cats have pointy ears and whiskers.” Instead, you show them 100 pictures of cats, and their brain learns the pattern automatically. Deep learning works the same way.
Traditional programming vs Deep Learning:
- Traditional: You write rules (“if it has 4 legs AND pointy ears → cat”)
- Deep learning: You show examples, the computer creates its own rules
What is a Neural Network?
A neural network is a computer system inspired by how your brain works. Your brain has neurons (brain cells) connected together. When you see a cat, neurons fire in sequence to recognize it.
An artificial neural network mimics this:
- Input layer: Receives the image (pixel values)
- Hidden layers: Processes patterns (edges, shapes, textures)
- Output layer: Makes the final decision (cat or dog?)
What is a CNN (Convolutional Neural Network)?
CNNs are specialized neural networks for images. Here’s why regular neural networks struggle with images:
Problem: A 224×224 pixel color image has 150,528 numbers (224 × 224 × 3 colors). A regular neural network would need millions of connections – too complex!
Solution: CNNs use “convolution” – they scan the image in small patches (like reading a book word by word, not all at once). This dramatically reduces complexity.
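To make “scanning in small patches” concrete, here is a minimal NumPy sketch of one convolution pass. This is only the core idea, not how TensorFlow implements it internally: a small 3×3 kernel slides over the image, and each output value summarizes one patch.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; each output value summarizes one patch."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # small window of pixels
            out[i, j] = np.sum(patch * kernel)  # one number per patch
    return out

# A 6×6 grayscale "image": dark on the left half, bright on the right half
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# A classic vertical-edge detector (Sobel-style kernel)
edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])

response = convolve2d(img, edge_kernel)
print(response.shape)  # (4, 4): one value per 3×3 patch position
print(response[0])     # strong response only where the edge is
```

Notice the output is strongly non-zero only at the boundary between the dark and bright halves: that is exactly the “Layer 1 learns edges” behavior described below.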
What CNNs learn in each layer:
- Layer 1: Simple edges and lines
- Layer 2: Shapes and curves
- Layer 3: Object parts (eyes, ears, fur texture)
- Layer 4: Whole objects (entire cat face)
What is Transfer Learning? (The Secret Weapon!)
Here’s the breakthrough that makes this project possible:
The Problem: Training a CNN from scratch needs:
- 100,000+ images
- Powerful computers
- Days of training time
- Expert knowledge
The Solution – Transfer Learning:
Think of it like this: You want to become a chef specializing in Italian food. Do you:
- Option A: Learn cooking from zero (5 years)
- Option B: Study with an expert chef who already knows cooking basics, then specialize in Italian cuisine (6 months)
Transfer learning is Option B for AI.
We’ll use MobileNetV2, a neural network that Google already trained on 1.2 million ImageNet images spanning 1,000 categories (cars, planes, animals, furniture, etc.). It already knows:
- What edges look like
- How to detect shapes
- What fur, eyes, and ears are
We just teach it the final step: “This combination means cat, that means dog.”
Result: Instead of needing 100,000 images and days of training, we need just 1,000 images and 5 minutes!
Understanding the Modules We’ll Use
Let’s break down every library and why we need it:
1. TensorFlow & Keras
- What it is: TensorFlow is Google’s deep learning framework. Keras is its user-friendly interface.
- Why we need it: Builds and trains neural networks
- Analogy: TensorFlow is the engine, Keras is the steering wheel
2. PIL (Python Imaging Library)
- What it is: A library for opening, manipulating images
- Why we need it: To load and display images
- What it does: Converts image files into arrays of numbers that computers understand
3. Matplotlib
- What it is: A plotting library (like Excel charts for Python)
- Why we need it: To visualize images and results
- What we’ll use it for: Displaying cat/dog images
4. ImageDataGenerator
- What it is: A Keras tool that feeds images to the neural network
- Why it’s powerful: Automatically handles batching, shuffling, and splitting data
- Real benefit: You don’t manually write code to load 1000 images – it does it automatically!
5. MobileNetV2
- What it is: A pre-trained CNN architecture by Google
- Why this one: It’s small, fast, and accurate (perfect for beginners)
- Alternative models: ResNet, VGG16, InceptionV3 (we can discuss these later)
6. NumPy
- What it is: Python’s numerical computation library
- Why we need it: Images are stored as arrays of numbers; NumPy handles array math
- Example: An image is a 224×224×3 array (height × width × color channels)
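You can check that 224×224×3 arithmetic yourself with a few lines of NumPy (a standalone snippet, separate from the tutorial code):

```python
import numpy as np

# A blank 224×224 RGB "image": height × width × 3 color channels
img = np.zeros((224, 224, 3), dtype=np.uint8)
img[:, :, 0] = 255  # set the red channel everywhere → a pure red image

print(img.shape)  # (224, 224, 3)
print(img.size)   # 150528 numbers – the same count mentioned earlier
```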
Step-by-Step Tutorial: Building Your Cat vs Dog Classifier
Now let’s build this! Follow each command in order.
COMMAND 1: Download the Dataset
What we’re doing: Getting 1000 cat and dog images for training
Instructions:
- Click one of these links:
  https://drive.google.com/drive/folders/1NfvqNLyvT2uBNBYB-9PS97w71RcUNsPB
  or https://www.kaggle.com/datasets/anthonytherrien/dog-vs-cat
- Download the entire folder to your computer
- You’ll get two folders: cats (500 images) and dogs (500 images)
Why 1000 images? More is better, but 500 per category is the minimum for decent accuracy with transfer learning.
COMMAND 2: Set Up Google Colab
What is Google Colab?
- A free cloud-based Jupyter notebook
- Gives you free access to GPUs (graphics cards that train AI 10x faster)
- No installation needed – runs in your browser!
Steps:
- Go to https://colab.research.google.com/
- Click “New Notebook”
- You’ll see a blank coding environment
Pro tip: Colab automatically disconnects after 90 minutes of inactivity. Don’t worry – your code is saved!
COMMAND 3: Upload Dataset to Colab
What we’re doing: Creating folders and uploading images
Steps:
- Click the folder icon on the left sidebar (Files panel)
- Right-click in the file area → New folder → Name it animals
- Inside animals, create two folders: cats and dogs
- Upload images:
  - Click on the cats folder → Upload button → Select all cat images
  - Click on the dogs folder → Upload button → Select all dog images
Wait for upload to complete! You’ll see a progress indicator. 1000 images take 2-5 minutes depending on your internet speed.
Why this folder structure?
```
animals/
├── cats/
│   ├── cat1.jpg
│   ├── cat2.jpg
│   └── ...
└── dogs/
    ├── dog1.jpg
    ├── dog2.jpg
    └── ...
```
ImageDataGenerator reads this structure automatically and knows:
- Everything in the cats/ folder → label = “cat”
- Everything in the dogs/ folder → label = “dog”
COMMAND 4: Verify Upload
What we’re doing: Checking that files uploaded correctly
Code to run:
```
!ls
```
What this means:
- `!` tells Colab to run a terminal command (not Python)
- `ls` = “list” (shows all files and folders)
Expected output:
```
animals  sample_data
```
Explanation: You should see animals folder. sample_data is a default Colab folder (ignore it).
COMMAND 5: Verify Folder Contents
Code to run:
```
!ls animals
```
Expected output:
```
cats  dogs
```
What this confirms: Both subfolders exist inside animals
COMMAND 6: Display a Sample Image (Sanity Check)
What we’re doing: Making sure images are readable and correctly uploaded
Code to run:
```python
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("/content/animals/cats/00000-4122619873.png")
plt.imshow(img)
plt.axis("off")
```
Line-by-line breakdown:
Line 1-2: Import libraries
- `from PIL import Image` → Imports the Image class from the PIL library
- `import matplotlib.pyplot as plt` → Imports plotting functions
Line 4: img = Image.open("/content/animals/cats/00000-4122619873.png")
- `Image.open()` → Opens the image file
- `/content/` → Default Colab working directory
- `00000-4122619873.png` → Replace with any filename from your cats folder
Line 5: plt.imshow(img)
- `imshow` = “image show” – displays the image
Line 6: plt.axis("off")
- Hides the x and y axis numbers (makes it cleaner)
Expected output: You should see a cat image displayed!
Troubleshooting:
- Error: “No such file” → Check your filename exactly matches
- No image appears → Run `plt.show()` after the code
COMMAND 7: List Sample Files
What we’re doing: Viewing the first 10 files in each folder to verify variety
Code to run:
```
!ls animals/cats | head
!ls animals/dogs | head
```
What this means:
- `!ls animals/cats` → List all files in the cats folder
- `| head` → Show only the first 10 (otherwise it’d show all 500!)
Expected output:
```
00000-4122619873.png
00001-2847563902.png
00002-1928374650.png
...
```
(10 files total)
Why this matters: Confirms you have multiple images, not just one test file.
COMMAND 8: Import ImageDataGenerator and Prepare Data
What we’re doing: Setting up the data pipeline that feeds images to our neural network
Code to run:
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os

# Remove unwanted .ipynb_checkpoints directory if it exists
if os.path.exists("/content/animals/.ipynb_checkpoints"):
    !rm -rf "/content/animals/.ipynb_checkpoints"
    print("Removed .ipynb_checkpoints directory.")

IMG_SIZE = (224, 224)
BATCH_SIZE = 32

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

train_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="training"
)

val_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="validation"
)
```
DETAILED BREAKDOWN – READ CAREFULLY!
Lines 1-2: Import necessary modules
- `ImageDataGenerator` → The tool that handles image loading
- `os` → Operating system module (to check if folders exist)
Lines 4-7: Clean up hidden files
- Jupyter creates hidden `.ipynb_checkpoints` folders that confuse ImageDataGenerator
- `os.path.exists()` → Checks if the folder exists
- `!rm -rf` → Removes the folder forcefully (if found)
- This prevents the error: “Found 3 classes instead of 2”
Line 9: IMG_SIZE = (224, 224)
- All images must be the same size for neural networks
- 224×224 pixels is the standard for MobileNetV2
- Your original images might be 800×600, 1920×1080, etc. – they’ll auto-resize
Line 10: BATCH_SIZE = 32
- What is a batch? Instead of feeding 1 image at a time, we feed 32 together
- Why? GPUs process multiple images in parallel (faster training)
- Analogy: Washing dishes one-by-one vs loading the dishwasher
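Batching itself is just slicing a list into chunks. Here is a plain-Python sketch of what the generator effectively does with 800 training images and a batch size of 32 (numbers stand in for images):

```python
# Pretend each number is one training image
images = list(range(800))
BATCH_SIZE = 32

# Slice the dataset into chunks of 32
batches = [images[i:i + BATCH_SIZE] for i in range(0, len(images), BATCH_SIZE)]

print(len(batches))     # 25 – matching the "25/25" you'll see during training
print(len(batches[0]))  # 32 images in each batch
```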
Lines 12-15: Create the ImageDataGenerator
```python
datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)
```
Parameter explanations:
rescale=1./255 – SUPER IMPORTANT!
- Images are stored as pixels with values 0-255 (black to white)
- Neural networks work best with values 0-1 (normalized)
- `1./255` divides every pixel by 255 → converts 0-255 to 0-1
- Example: Pixel value 127 (gray) becomes 127/255 ≈ 0.498
validation_split=0.2
- Splits data into 80% training, 20% validation
- Training data: Used to teach the model
- Validation data: Used to test if it learned correctly
- Why split? Prevents “memorization” – we want the model to generalize, not just remember training images
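The arithmetic behind both parameters is simple enough to check by hand:

```python
# rescale=1./255 – normalize one gray pixel
pixel = 127
print(round(pixel / 255.0, 3))  # 0.498

# validation_split=0.2 – split 1,000 images 80/20
total = 1000
val_count = int(total * 0.2)
train_count = total - val_count
print(train_count, val_count)   # 800 200
```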
Lines 17-23: Create training data generator
```python
train_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="training"
)
```
What flow_from_directory does, automatically:
- Scans the /content/animals folder
- Finds subfolders (cats, dogs)
- Labels images based on folder name
- Resizes all images to 224×224
- Creates batches of 32 images
- Shuffles images randomly (prevents learning order bias)
Parameter breakdown:
"/content/animals" → Path to parent folder containing class subfolders
target_size=IMG_SIZE → Resize all images to (224, 224)
batch_size=BATCH_SIZE → Load 32 images at a time
class_mode="binary" → We have 2 classes (binary classification)
- Binary = Cat (0) or Dog (1)
- Alternative: `categorical` for 3+ classes (cat, dog, bird)
subset="training" → Use the 80% split for training
Lines 25-31: Create validation data generator
- Same code, but subset="validation" uses the 20% split
- This is the “test” data we use to check accuracy
Expected output:
```
Found 800 images belonging to 2 classes.
Found 200 images belonging to 2 classes.
```
What this means:
- 800 images in training set (80% of 1000)
- 200 images in validation set (20% of 1000)
- 2 classes detected: cats and dogs
Common error fix: If it says “Found 3 classes” – the .ipynb_checkpoints cleanup didn’t work. Manually delete that folder from the file panel.
COMMAND 9: Import Deep Learning Libraries
Code to run:
```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models
```
What we’re importing:
MobileNetV2 → The pre-trained model
- Already trained on 1.2 million ImageNet images
- Knows 1000 categories (dogs, cats, cars, planes, etc.)
- We’ll fine-tune it for our specific task
layers → Building blocks of neural networks
- `Dense` → Fully connected layer
- `GlobalAveragePooling2D` → Compresses image features
models → Framework to combine layers
- `Sequential` → Stack layers in sequence (like Lego blocks)
Expected output: Nothing! If no error appears, the import worked.
COMMAND 10: Load Pre-trained MobileNetV2 Model
What we’re doing: Loading Google’s pre-trained model as our foundation
Code to run:
```python
base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet"
)
base_model.trainable = False
```
LINE-BY-LINE BREAKDOWN:
Line 1: base_model = MobileNetV2(...)
- Creates the MobileNetV2 neural network
- Stores it in the variable `base_model`
Parameter: input_shape=(224, 224, 3)
- Tells the model to expect 224×224 pixel images
- `3` = RGB color channels (Red, Green, Blue)
- Why these numbers? MobileNetV2 was designed for this size
Parameter: include_top=False – CRITICAL!
- “Top” = The final classification layer
- The original model classifies 1000 categories
- We remove it because we only need 2 categories (cat vs dog)
- Analogy: Using a Swiss Army knife but removing the corkscrew (we don’t need it)
Parameter: weights="imagenet"
- Loads the pre-trained weights (learned patterns)
- ImageNet = a dataset of 1.2 million labeled images across 1,000 categories
- Without this: Random starting point (would need 100k images to train)
- With this: Starts with expert knowledge (only needs 1k images to fine-tune)
Line 6: base_model.trainable = False
- Freezes the pre-trained layers
- Means: “Don’t change what you already learned”
- Why? MobileNetV2 already knows edges, shapes, textures – we keep that knowledge
- We’ll only train the new layers we add next
What happens behind the scenes:
- Downloads the MobileNetV2 weights (a ~9 MB file)
- Loads 88 layers of neural network
- Sets all 88 layers to “frozen” mode
COMMAND 11: Add Custom Classification Layers
What we’re doing: Adding our own “brain” on top of MobileNetV2 to make cat vs dog decisions
Code to run:
```python
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid")
])
```
LAYER-BY-LAYER EXPLANATION:
Line 1: model = models.Sequential([...])
- `Sequential` = Stack layers in order (like a pipeline)
- Data flows: Input → Layer 1 → Layer 2 → Layer 3 → Output
Layer 1: base_model
- The frozen MobileNetV2 (88 layers)
- Output: A 7×7×1280 tensor (compressed image features)
- Think of it as: “Here are 1280 features I detected in this image”
Layer 2: layers.GlobalAveragePooling2D()
- What it does: Compresses the 7×7×1280 tensor into a 1×1280 vector
- How: Takes the average of each 7×7 grid
- Why: Neural networks need fixed-size input; this standardizes it
- Analogy: Reading a book chapter and writing 1280 key points
Layer 3: layers.Dense(1, activation="sigmoid")
- `Dense` = Fully connected layer (every input connects to every output)
- `1` = One output neuron
- Why 1? Binary classification needs 1 output:
- Output close to 0 → Cat
- Output close to 1 → Dog
- `activation="sigmoid"` → Squashes the output to the 0-1 range (a probability)
The sigmoid function:
```
Input:   any number (-∞ to +∞)
Output:  number between 0 and 1
Example: sigmoid(-5) = 0.007  (almost 0 → cat)
         sigmoid(+5) = 0.993  (almost 1 → dog)
```
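The sigmoid itself is a one-line function, so you can verify those numbers in plain Python:

```python
import math

def sigmoid(x):
    # Squashes any real number into the 0-1 range
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(-5), 3))  # 0.007 – almost 0 → cat
print(round(sigmoid(5), 3))   # 0.993 – almost 1 → dog
print(sigmoid(0))             # 0.5 – completely undecided
```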
Full data flow:
```
Input image (224×224×3)
        ↓
MobileNetV2 (extracts 1280 features)
        ↓
GlobalAveragePooling (compresses to 1280 numbers)
        ↓
Dense layer (makes decision: 0-1)
        ↓
Output: 0.12 → Cat! (88% confident)
        0.87 → Dog! (87% confident)
```
COMMAND 12: Compile the Model
What we’re doing: Configuring HOW the model will learn
Code to run:
```python
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)
```
PARAMETER EXPLANATIONS:
optimizer="adam"
- What’s an optimizer? The algorithm that adjusts the neural network’s weights
- Why Adam? It’s the most popular – smart, fast, works well for most cases
- Alternatives: SGD (slower but sometimes better), RMSprop
- Analogy: Adam is like cruise control – automatically adjusts speed for optimal performance
Technical detail (optional): Adam stands for “Adaptive Moment Estimation” – it adapts the learning rate for each parameter automatically.
loss="binary_crossentropy"
- What’s a loss function? Measures how “wrong” the model’s predictions are
- Binary crossentropy = Standard loss for 2-class classification
- How it works:
- Prediction: 0.9 (dog), Actual: dog → Low loss (good!)
- Prediction: 0.1 (cat), Actual: dog → High loss (bad!)
- Goal: Training minimizes this loss
Why “crossentropy”? It comes from information theory – measures the difference between two probability distributions.
metrics=["accuracy"]
- Tracks accuracy during training
- Accuracy = Percentage of correct predictions
- Example: 80/100 correct = 80% accuracy
- This is just for monitoring – doesn’t affect training
What compilation does:
- Prepares the computational graph
- Allocates memory on GPU
- Sets up the optimization algorithm
- Ready to train!
COMMAND 13: Train the Model (The Magic Happens!)
What we’re doing: Actually teaching the model to recognize cats vs dogs
Code to run:
```python
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=5
)
```
PARAMETER BREAKDOWN:
train_data
- The training images (800 images, 80% of dataset)
- Model learns from these
validation_data=val_data
- The validation images (200 images, 20% of dataset)
- Tests the model after each epoch
- Crucial: Detects if model is just memorizing vs actually learning
epochs=5
- What’s an epoch? One complete pass through all training images
- 5 epochs means: The model sees all 800 images 5 times
- Why 5? Balance between:
- Too few (2-3) → Underfitting (doesn’t learn enough)
- Too many (20+) → Overfitting (memorizes instead of learning)
What happens during training:
Epoch 1:
- Shows 800 images to the model (in batches of 32)
- Model makes predictions
- Calculates loss (how wrong it was)
- Adjusts weights to reduce loss
- Tests on 200 validation images
- Reports accuracy
Epoch 2-5: Repeats the process
Expected output:
```
Epoch 1/5
25/25 [==============================] - 45s 2s/step - loss: 0.4521 - accuracy: 0.7875 - val_loss: 0.2134 - val_accuracy: 0.9150
Epoch 2/5
25/25 [==============================] - 42s 2s/step - loss: 0.2108 - accuracy: 0.9125 - val_loss: 0.1456 - val_accuracy: 0.9450
Epoch 3/5
25/25 [==============================] - 41s 2s/step - loss: 0.1523 - accuracy: 0.9375 - val_loss: 0.1123 - val_accuracy: 0.9550
Epoch 4/5
25/25 [==============================] - 40s 2s/step - loss: 0.1234 - accuracy: 0.9500 - val_loss: 0.0987 - val_accuracy: 0.9650
Epoch 5/5
25/25 [==============================] - 39s 2s/step - loss: 0.1087 - accuracy: 0.9587 - val_loss: 0.0892 - val_accuracy: 0.9700
```
UNDERSTANDING THE OUTPUT:
- `25/25` → Total batches (800 images ÷ 32 per batch = 25)
- `45s 2s/step` → Total time and time per batch
- `loss: 0.4521` → Training loss (decreases each epoch = good!)
- `accuracy: 0.7875` → Training accuracy (78.75%)
- `val_loss: 0.2134` → Validation loss
- `val_accuracy: 0.9150` → Validation accuracy (91.5%) ← THIS IS KEY!
WHAT TO LOOK FOR:
✅ Good signs:
- Accuracy increasing each epoch
- Loss decreasing each epoch
- Validation accuracy close to training accuracy (within 5-10%)
⚠️ Warning signs:
- Overfitting: Training accuracy 98%, validation 70% (model memorized training data)
- Underfitting: Both accuracies stuck at 60% (model isn’t learning)
Why validation accuracy > training accuracy here?
- We froze MobileNetV2 layers (they don’t change)
- Only training 1 layer (the final Dense layer)
- Sometimes validation gets lucky with easier examples
Training time: 3-4 minutes on Colab’s free GPU (vs 30+ minutes on CPU)
COMMAND 14: View Model Architecture
What we’re doing: Seeing the complete structure of our neural network
Code to run:
```python
model.summary()
```
Expected output:
```
Model: "sequential"
_________________________________________________________________
Layer (type)                        Output Shape         Param #
=================================================================
mobilenetv2_1.00_224 (Functional)   (None, 7, 7, 1280)   2,257,984
global_average_pooling2d
(GlobalAveragePooling2D)            (None, 1280)         0
dense (Dense)                       (None, 1)            1,281
=================================================================
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_________________________________________________________________
```
WHAT THIS MEANS:
Layer 1: MobileNetV2
- Output Shape: (None, 7, 7, 1280)
  - None = batch size (variable)
  - 7×7 = spatial dimensions (compressed from 224×224)
  - 1280 = feature maps (detectors for different patterns)
- Params: 2,257,984 parameters (weights)
- Status: Non-trainable (frozen)
Layer 2: GlobalAveragePooling2D
- Output Shape: (None, 1280) – flattened to a vector
- Params: 0 (it’s just averaging, no learnable weights)
Layer 3: Dense
- Output Shape: (None, 1) – single output (cat vs dog)
- Params: 1,281
- Why 1,281? (1280 inputs × 1 output) + 1 bias = 1,281
Key insight:
- Total params: 2,259,265
- Trainable: Only 1,281 (about 0.06%!)
- Non-trainable: 2,257,984 (frozen MobileNetV2)
This is the power of transfer learning – we train only about 0.06% of the network yet still get 95%+ accuracy!
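The parameter counts in that summary can be reproduced with a few lines of arithmetic:

```python
# Dense layer: each of the 1280 pooled features connects to 1 output, plus 1 bias
dense_params = 1280 * 1 + 1
print(dense_params)  # 1281

# Add the frozen MobileNetV2 weights to get the total
total_params = 2_257_984 + dense_params
print(total_params)  # 2259265

# Fraction of the network we actually train
print(round(100 * dense_params / total_params, 2), "%")  # about 0.06 %
```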
COMMAND 15: Save the Model (.h5 format)
What we’re doing: Saving your trained model so you don’t lose it when Colab disconnects
Code to run:
```python
model.save("/content/dog_cat_model.h5")
```
What happens:
- Creates a file dog_cat_model.h5 (about 9 MB)
- Contains the entire model (architecture + weights)
- .h5 = HDF5 format (older Keras format)
You’ll see this warning:
```
WARNING:absl:You are saving your model as an HDF5 file via `model.save()`.
This file format is considered legacy. We recommend using instead the native
Keras format, e.g. `model.save('my_model.keras')`.
```
What this means: .h5 still works but is outdated. Let’s use the new format!
COMMAND 16: Save the Model (.keras format – RECOMMENDED)
Code to run:
```python
model.save("/content/dog_cat_model.keras")
```
Why .keras is better:
- Modern format (future-proof)
- Faster loading
- Better compression
- Official Keras recommendation
To download the model:
- Click the folder icon (left sidebar)
- Find dog_cat_model.keras
- Click the 3 dots → Download
- Save it on your computer!
Now you can:
- Share your model with friends
- Load it later without retraining
- Use it in a web app or mobile app
COMMAND 17: Create Upload Button for Testing
What we’re doing: Adding a button to upload test images from your computer
Code to run:
```python
from google.colab import files
uploaded = files.upload()
```
What this does:
- Creates a “Choose Files” button
- Click it and select any cat or dog image from your computer
- Uploads the image to Colab
- Stores the filename in the `uploaded` variable
Expected behavior:
- You’ll see a file picker dialog
- Select an image (JPG, PNG, etc.)
- Wait for upload (shows progress bar)
- When done, the cell completes
Pro tip: Test with images NOT from your training set! Use:
- Photos from Google Images
- Your own pet photos
- Random internet images
Why? This tests if the model generalizes (works on new, unseen images).
COMMAND 18: Make Predictions on Your Test Image
What we’re doing: The moment of truth – testing if your AI actually works!
Code to run:
```python
import numpy as np
from tensorflow.keras.preprocessing import image
import matplotlib.pyplot as plt

IMG_SIZE = (224, 224)

img_path = list(uploaded.keys())[0]  # automatically gets the file name

img = image.load_img(img_path, target_size=IMG_SIZE)
img_array = image.img_to_array(img) / 255.0
img_array = np.expand_dims(img_array, axis=0)

prediction = model.predict(img_array)

plt.imshow(img)
plt.axis("off")

if prediction[0][0] > 0.5:
    print("Prediction: Dog 🐶")
else:
    print("Prediction: Cat 🐱")
```
COMPLETE LINE-BY-LINE BREAKDOWN:
Lines 1-3: Import necessary libraries
```python
import numpy as np
from tensorflow.keras.preprocessing import image
import matplotlib.pyplot as plt
```
- `numpy` → For array operations
- `image` → Keras image utilities for loading/processing
- `matplotlib` → For displaying the image
Line 5: IMG_SIZE = (224, 224)
- Same size we used for training
- Critical: Model expects 224×224 images
Line 7: img_path = list(uploaded.keys())[0]
- `uploaded` is a dictionary: {'filename.jpg': file_data}
- `.keys()` gets the filenames
- `[0]` gets the first (and only) uploaded file
- Result: Stores the filename string (e.g., “my_dog.jpg”)
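To see that line in isolation, here is the same pattern on a hand-made stand-in dictionary (the real `files.upload()` returns a mapping of filename → file bytes; the filename here is just an example):

```python
# A stand-in for what files.upload() returns: {filename: file contents}
uploaded = {"my_dog.jpg": b"<image bytes would be here>"}

img_path = list(uploaded.keys())[0]  # grab the first (and only) filename
print(img_path)  # my_dog.jpg
```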
Line 9: img = image.load_img(img_path, target_size=IMG_SIZE)
- Loads the image from the file
- `target_size=IMG_SIZE` → Automatically resizes to 224×224
- Why resize? Your image might be 4000×3000 or 800×600 – the model needs exactly 224×224
Line 10: img_array = image.img_to_array(img) / 255.0
- `image.img_to_array(img)` → Converts the image to a NumPy array
- Result: Shape (224, 224, 3) – height × width × RGB channels
- Values: 0-255 (raw pixel values)
- `/ 255.0` → Normalizes to the 0-1 range
- Why? Remember, we trained with normalized images (rescale=1./255)
- Must match training preprocessing!
Line 11: img_array = np.expand_dims(img_array, axis=0)
- Critical transformation!
- Before: Shape (224, 224, 3) – single image
- After: Shape (1, 224, 224, 3) – batch of 1 image
- Why? Model expects batches, even if batch size = 1
- `axis=0` adds a new dimension at the front
Visual representation:
```
Original: [[[R, G, B], [R, G, B], ...]]   ← 2D grid of pixels
After:    [[[[R, G, B], [R, G, B], ...]]] ← Batch containing 1 image
```
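You can verify this shape change without any model at all:

```python
import numpy as np

single = np.zeros((224, 224, 3))        # one image
batch = np.expand_dims(single, axis=0)  # a batch containing that one image

print(single.shape)  # (224, 224, 3)
print(batch.shape)   # (1, 224, 224, 3)
```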
Line 13: prediction = model.predict(img_array)
- THE PREDICTION HAPPENS HERE!
- Model processes the image through all layers
- Returns: Array of probabilities
- Shape: [[0.87]] (a 2D array with one value)
How prediction works internally:
- Image goes through MobileNetV2 (extracts 1280 features)
- GlobalAveragePooling compresses features
- Dense layer with sigmoid outputs probability
- Output: Number between 0 and 1
Line 15-16: Display the image
```python
plt.imshow(img)
plt.axis("off")
```
- Shows the uploaded image
- Turns off axis numbers for cleaner display
Lines 18-21: Interpret the prediction
```python
if prediction[0][0] > 0.5:
    print("Prediction: Dog 🐶")
else:
    print("Prediction: Cat 🐱")
```
Understanding prediction[0][0]:
- `prediction` = [[0.87]] (2D array)
- `prediction[0]` = [0.87] (first element of the outer array)
- `prediction[0][0]` = 0.87 (the actual probability value)
Decision logic:
- If > 0.5: More likely a dog (closer to 1)
- If < 0.5: More likely a cat (closer to 0)
- Threshold 0.5 is standard for binary classification
Example outputs:
```
prediction[0][0] = 0.92 → "Prediction: Dog 🐶" (92% confident)
prediction[0][0] = 0.13 → "Prediction: Cat 🐱" (87% confident it's NOT a dog)
prediction[0][0] = 0.51 → "Prediction: Dog 🐶" (barely, only 51% confident)
```
Expected full output:
- Image of your uploaded photo
- Text: “Prediction: Dog 🐶” or “Prediction: Cat 🐱”
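If you want the printed message to include a confidence score like the examples above, a small helper works. Note that `describe` is an illustrative name, not part of the tutorial code:

```python
def describe(p):
    """Turn a sigmoid output into a labeled confidence string."""
    if p > 0.5:
        return f"Dog 🐶 ({p:.0%} confident)"
    return f"Cat 🐱 ({1 - p:.0%} confident)"

print(describe(0.92))  # Dog 🐶 (92% confident)
print(describe(0.13))  # Cat 🐱 (87% confident)
```

You would call it as `print(describe(prediction[0][0]))` in place of the if/else block.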
Complete Command Summary with Correct Order
Here’s the complete numbered sequence of all commands:
SETUP PHASE:
- Download dataset from Google Drive link
- Open Google Colab and create new notebook
- Create folder structure and upload images to Colab
VERIFICATION PHASE:
4. !ls – Verify main folders
5. !ls animals – Verify subfolder structure
6. Display sample image with PIL and matplotlib
7. !ls animals/cats | head and !ls animals/dogs | head – List sample files

DATA PREPARATION PHASE:
8. Import ImageDataGenerator and prepare train/validation data splits
9. Import TensorFlow/Keras libraries (MobileNetV2, layers, models)

MODEL BUILDING PHASE:
10. Load pre-trained MobileNetV2 base model
11. Add custom classification layers on top
12. Compile the model with optimizer, loss, and metrics

TRAINING PHASE:
13. Train the model with model.fit()
14. View model architecture with model.summary()

SAVING PHASE:
15. Save model as .h5 file (legacy format)
16. Save model as .keras file (recommended format)

TESTING PHASE:
17. Create upload button for test images
18. Make predictions on uploaded images
What You’ve Accomplished! 🎉
Congratulations! You’ve just built a real AI model from scratch. Let’s recap what you’ve learned:
Technical Skills Gained:
✅ Set up a cloud-based GPU environment (Google Colab)
✅ Prepared image datasets with proper folder structure
✅ Used ImageDataGenerator for automated data preprocessing
✅ Implemented transfer learning with MobileNetV2
✅ Built a custom neural network architecture
✅ Trained a deep learning model with validation
✅ Saved and loaded trained models
✅ Made predictions on new images
Key Concepts Mastered:
✅ What deep learning and neural networks are
✅ How CNNs process images differently than regular neural networks
✅ The power of transfer learning (reusing pre-trained models)
✅ Training vs validation data and why we split them
✅ Image normalization and preprocessing
✅ Binary classification with sigmoid activation
✅ Model compilation (optimizer, loss, metrics)
Real-World Performance:
- Accuracy: 80-97% (depending on your dataset quality)
- Training time: 3-5 minutes on free GPU
- Model size: ~9 MB (portable and shareable)
- Images needed: Only 1,000 (vs 100,000+ from scratch)
Troubleshooting Common Issues
Issue 1: “Found 3 classes instead of 2”
Cause: Hidden .ipynb_checkpoints folder in the animals directory
Fix: Run this in a code cell:
```python
!rm -rf /content/animals/.ipynb_checkpoints
```
Issue 2: Low accuracy (below 70%)
Possible causes:
- Poor quality images (blurry, wrong labels, duplicates)
- Too few epochs (try 7-10 instead of 5)
- Dataset imbalance (e.g., 700 cats, 300 dogs)
Fix: Check your dataset quality manually
Issue 3: “ResourceExhausted” error
Cause: GPU ran out of memory
Fix: Reduce the batch size:
```python
BATCH_SIZE = 16  # instead of 32
```
Issue 4: Model predicts same class for everything
Cause: Model didn’t learn properly (overfitting or underfitting)
Fix:
- Check if images uploaded correctly
- Increase training epochs to 10
- Verify rescale=1./255 is applied
Issue 5: Upload button doesn’t appear
Cause: Code didn’t run completely
Fix: Run the cell again and wait for the “Choose Files” button
Next Steps: Level Up Your AI Skills
Beginner Challenges:
- Improve accuracy: Try training for 10 epochs instead of 5
- Add confidence scores: Print the exact probability (e.g., “Dog: 87% confident”)
- Test multiple images: Modify code to upload and predict 5 images at once
- Visualize training: Plot accuracy curves using matplotlib
Intermediate Challenges:
- Add data augmentation: Flip, rotate, zoom images during training for better generalization
- Try different architectures: Replace MobileNetV2 with ResNet50 or VGG16
- Multi-class classification: Add a third category (e.g., cats, dogs, birds)
- Fine-tune frozen layers: Unfreeze last 10 layers of MobileNetV2 for higher accuracy
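As a taste of the data-augmentation challenge above: the simplest augmentation, a horizontal flip, is just reversing the width axis of the image array. (Keras can do this automatically via ImageDataGenerator’s `horizontal_flip` option; this NumPy sketch only shows the underlying operation.)

```python
import numpy as np

# A tiny 2×2 RGB "image" with distinct values in every pixel
img = np.arange(12).reshape(2, 2, 3)

# Horizontal flip = reverse the width axis (axis 1) → a "new" training example
flipped = img[:, ::-1, :]

# The left and right pixels have swapped places
print((flipped[0, 0] == img[0, 1]).all())  # True
```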
Advanced Projects:
- Build a web app: Use Streamlit or Gradio to create an interface
- Deploy to mobile: Convert model to TensorFlow Lite for Android/iOS
- Real-time video classification: Process webcam feed frame-by-frame
- Create your own dataset: Scrape images from the web and build a custom classifier
Understanding Your Results Better
What does 91% accuracy actually mean?
- Out of 100 test images, model correctly identifies 91
- 9 images are misclassified (false positives/negatives)
- Is 91% good? Yes! Professional models are 95-98%, but they use:
- 100,000+ images
- Advanced architectures
- Days of training
Why validation accuracy fluctuates?
You might see:
```
Epoch 1: 91%
Epoch 2: 94%
Epoch 3: 92%
```
Reasons:
- Validation set is small (200 images) – random variation matters
- Some batches are naturally harder than others
- Model is still learning and adjusting
What to watch: The overall trend (should increase or stay stable)
When is your model actually good?
✅ Validation accuracy within 5% of training accuracy
✅ Both accuracies above 80%
✅ Model correctly predicts YOUR test images (not just training data)
The Science Behind Transfer Learning
Why MobileNetV2 works so well:
What it learned from ImageNet:
- Low-level features (Layer 1-20): Edges, corners, colors, textures
- Mid-level features (Layer 21-50): Shapes, patterns, object parts
- High-level features (Layer 51-88): Complex objects, scenes
Universal knowledge: These patterns are universal! Whether identifying cats, cars, or planes, you need to detect edges and shapes first.
Our specialization: We only train the final layer to combine these universal features specifically for cat vs dog detection.
Analogy:
- From scratch: Teaching someone to read, write, and then become a lawyer (10 years)
- Transfer learning: Hiring a college graduate and training them in law (2 years)
Real-World Applications of This Technique
Your cat vs dog classifier uses the same technology behind:
- Medical imaging: Detecting tumors in X-rays and MRIs
- Self-driving cars: Identifying pedestrians, cars, traffic signs
- Quality control: Spotting defective products in manufacturing
- Wildlife conservation: Counting endangered species from camera traps
- Agriculture: Detecting plant diseases from leaf photos
- Security: Facial recognition systems
- Retail: Visual search (“find similar products”)
The skill you learned is in-demand! Companies pay $80,000-$150,000/year for computer vision engineers.
Final Thoughts
You’ve completed a journey that would have seemed impossible just a few years ago. In 2012, training an image classifier required:
- PhD-level knowledge
- $10,000+ in hardware
- Weeks of training time
- 1,000,000+ images
Today, you did it with:
- Basic Python knowledge
- Free cloud resources
- 5 minutes of training
- 1,000 images
This is the democratization of AI in action.
The concepts you learned here – CNNs, transfer learning, data preprocessing, model training – are the foundation of modern computer vision. Whether you’re building a startup, pursuing a career in AI, or just exploring as a hobby, you now have real, practical skills.
Remember: Every expert was once a beginner. The model you built today might seem simple, but it’s using the same principles as models that:
- Diagnose cancer
- Power autonomous vehicles
- Translate languages in real-time
Keep building, keep learning, and most importantly – have fun with AI! 🚀