Deep Learning – Dogs vs Cats Project

Cat vs Dog Image Classifier Using Deep Learning

Concepts Covered:

In this project, you’ll build a real AI model that can look at an image and tell you whether it’s a cat or a dog. Along the way, you’ll learn:

  • What deep learning actually is (in plain English)
  • How to use Google Colab for free GPU power
  • How to train your first neural network
  • How to test your model with real images

Prerequisites: Basic Python knowledge (variables, loops, functions). That’s it!

Time Required: 25-35 minutes

Cost: $0 (completely free using Google Colab)


Before We Start: Understanding the Key Concepts

Before diving into code, let’s understand what we’re actually doing. Think of this section as your foundation – skip it, and you’ll be confused later!

What is Deep Learning?

Simple explanation: Deep learning is teaching a computer to recognize patterns by showing it thousands of examples.

Imagine teaching a child what a cat looks like. You don’t explain “cats have pointy ears and whiskers.” Instead, you show them 100 pictures of cats, and their brain learns the pattern automatically. Deep learning works the same way.

Traditional programming vs Deep Learning:

  • Traditional: You write rules (“if it has 4 legs AND pointy ears → cat”)
  • Deep learning: You show examples, the computer creates its own rules

What is a Neural Network?

A neural network is a computer system inspired by how your brain works. Your brain has neurons (brain cells) connected together. When you see a cat, neurons fire in sequence to recognize it.

An artificial neural network mimics this:

  • Input layer: Receives the image (pixel values)
  • Hidden layers: Process patterns (edges, shapes, textures)
  • Output layer: Makes the final decision (cat or dog?)

What is a CNN (Convolutional Neural Network)?

CNNs are specialized neural networks for images. Here’s why regular neural networks struggle with images:

Problem: A 224×224 pixel color image has 150,528 numbers (224 × 224 × 3 colors). A regular neural network would need millions of connections – too complex!

Solution: CNNs use “convolution” – they scan the image in small patches (like reading a book word by word, not all at once). This dramatically reduces complexity.
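The patch-scanning idea can be sketched in a few lines of NumPy. This is a toy 2×2 convolution for illustration only – real CNN layers like MobileNetV2’s use many learned filters, not this hand-written one:

```python
import numpy as np

# A tiny 4x4 "image": dark (0) on the left, bright (1) on the right
img = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A 2x2 filter that responds to left-to-right brightness changes
kernel = np.array([
    [1, -1],
    [1, -1],
], dtype=float)

# Slide the filter over every 2x2 patch (stride 1, no padding)
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(img[i:i+2, j:j+2] * kernel)

print(out)  # only the middle column responds - that's where the edge is
```

Notice the output is non-zero only where the patch straddles the dark-to-bright boundary: the filter has “detected” a vertical edge.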

What CNNs learn in each layer:

  • Layer 1: Simple edges and lines
  • Layer 2: Shapes and curves
  • Layer 3: Object parts (eyes, ears, fur texture)
  • Layer 4: Whole objects (entire cat face)

What is Transfer Learning? (The Secret Weapon!)

Here’s the breakthrough that makes this project possible:

The Problem: Training a CNN from scratch needs:

  • 100,000+ images
  • Powerful computers
  • Days of training time
  • Expert knowledge

The Solution – Transfer Learning:

Think of it like this: You want to become a chef specializing in Italian food. Do you:

  • Option A: Learn cooking from zero (5 years)
  • Option B: Study with an expert chef who already knows cooking basics, then specialize in Italian cuisine (6 months)

Transfer learning is Option B for AI.

We’ll use MobileNetV2, a neural network that Google already trained on over a million images from the ImageNet dataset, spanning 1000 categories (cars, planes, animals, furniture, etc.). It already knows:

  • What edges look like
  • How to detect shapes
  • What fur, eyes, and ears are

We just teach it the final step: “This combination means cat, that means dog.”

Result: Instead of needing 100,000 images and days of training, we need just 1,000 images and 5 minutes!


Understanding the Modules We’ll Use

Let’s break down every library and why we need it:

1. TensorFlow & Keras

  • What it is: TensorFlow is Google’s deep learning framework. Keras is its user-friendly interface.
  • Why we need it: Builds and trains neural networks
  • Analogy: TensorFlow is the engine, Keras is the steering wheel

2. PIL (Python Imaging Library)

  • What it is: A library for opening and manipulating images
  • Why we need it: To load and display images
  • What it does: Converts image files into arrays of numbers that computers understand

3. Matplotlib

  • What it is: A plotting library (like Excel charts for Python)
  • Why we need it: To visualize images and results
  • What we’ll use it for: Displaying cat/dog images

4. ImageDataGenerator

  • What it is: A Keras tool that feeds images to the neural network
  • Why it’s powerful: Automatically handles batching, shuffling, and splitting data
  • Real benefit: You don’t manually write code to load 1000 images – it does it automatically!

5. MobileNetV2

  • What it is: A pre-trained CNN architecture by Google
  • Why this one: It’s small, fast, and accurate (perfect for beginners)
  • Alternative models: ResNet, VGG16, InceptionV3 (we can discuss these later)

6. NumPy

  • What it is: Python’s numerical computation library
  • Why we need it: Images are stored as arrays of numbers; NumPy handles array math
  • Example: An image is a 224×224×3 array (height × width × color channels)
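A quick NumPy check of that arithmetic – a standalone sketch, separate from the tutorial code:

```python
import numpy as np

# A blank 224x224 RGB image represented as a NumPy array
img = np.zeros((224, 224, 3), dtype=np.uint8)

print(img.shape)  # (224, 224, 3) - height x width x color channels
print(img.size)   # 150528 numbers in total
```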

Step-by-Step Tutorial: Building Your Cat vs Dog Classifier

Now let’s build this! Follow each command in order.


COMMAND 1: Download the Dataset

What we’re doing: Getting 1000 cat and dog images for training

Instructions:

  1. Click this link: https://drive.google.com/drive/folders/1NfvqNLyvT2uBNBYB-9PS97w71RcUNsPB or https://www.kaggle.com/datasets/anthonytherrien/dog-vs-cat
  2. Download the entire folder to your computer
  3. You’ll get two folders: cats (500 images) and dogs (500 images)

Why 1000 images? More is better, but 500 per category is the minimum for decent accuracy with transfer learning.


COMMAND 2: Set Up Google Colab

What is Google Colab?

  • A free cloud-based Jupyter notebook
  • Gives you free access to GPUs (graphics cards that train AI 10x faster)
  • No installation needed – runs in your browser!

Steps:

  1. Go to https://colab.research.google.com/
  2. Click “New Notebook”
  3. You’ll see a blank coding environment

Pro tip: Colab automatically disconnects after 90 minutes of inactivity. Don’t worry – your code is saved!


COMMAND 3: Upload Dataset to Colab

What we’re doing: Creating folders and uploading images

Steps:

  1. Click the folder icon on the left sidebar (Files panel)
  2. Right-click in the file area → New folder → Name it animals
  3. Inside animals, create two folders: cats and dogs
  4. Upload images:
    • Click on the cats folder → Upload button → Select all cat images
    • Click on the dogs folder → Upload button → Select all dog images

Wait for upload to complete! You’ll see a progress indicator. 1000 images take 2-5 minutes depending on your internet speed.

Why this folder structure?

animals/
├── cats/
│   ├── cat1.jpg
│   ├── cat2.jpg
│   └── ...
└── dogs/
    ├── dog1.jpg
    ├── dog2.jpg
    └── ...

ImageDataGenerator reads this structure automatically and knows:

  • Everything in cats/ folder → label = “cat”
  • Everything in dogs/ folder → label = “dog”
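Here is a minimal sketch of the folder-name-to-label idea using only the standard library. It builds a throwaway copy of the folder structure above; ImageDataGenerator does this same mapping for you (plus resizing, batching, and shuffling):

```python
import tempfile
from pathlib import Path

# Build a throwaway animals/ folder with the tutorial's structure
root = Path(tempfile.mkdtemp()) / "animals"
for cls in ("cats", "dogs"):
    (root / cls).mkdir(parents=True)
(root / "cats" / "cat1.jpg").touch()
(root / "dogs" / "dog1.jpg").touch()

# The label of each file is simply its parent folder's name
labels = {p.name: p.parent.name for p in sorted(root.rglob("*.jpg"))}
print(labels)  # {'cat1.jpg': 'cats', 'dog1.jpg': 'dogs'}
```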

COMMAND 4: Verify Upload

What we’re doing: Checking that files uploaded correctly

Code to run:

!ls

What this means:

  • ! tells Colab to run a terminal command (not Python)
  • ls = “list” (shows all files and folders)

Expected output:

animals  sample_data

Explanation: You should see animals folder. sample_data is a default Colab folder (ignore it).


COMMAND 5: Verify Folder Contents

Code to run:

!ls animals

Expected output:

cats  dogs

What this confirms: Both subfolders exist inside animals


COMMAND 6: Display a Sample Image (Sanity Check)

What we’re doing: Making sure images are readable and correctly uploaded

Code to run:

from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("/content/animals/cats/00000-4122619873.png")
plt.imshow(img)
plt.axis("off")

Line-by-line breakdown:

Line 1-2: Import libraries

  • from PIL import Image → Imports the Image class from PIL library
  • import matplotlib.pyplot as plt → Imports plotting functions

Line 4: img = Image.open("/content/animals/cats/00000-4122619873.png")

  • Image.open() → Opens the image file
  • /content/ → Default Colab working directory
  • 00000-4122619873.png → Replace with any filename from your cats folder

Line 5: plt.imshow(img)

  • imshow = “image show”
  • Displays the image

Line 6: plt.axis("off")

  • Hides the x and y axis numbers (makes it cleaner)

Expected output: You should see a cat image displayed!

Troubleshooting:

  • Error: “No such file” → Check your filename exactly matches
  • No image appears → Run plt.show() after the code

COMMAND 7: List Sample Files

What we’re doing: Viewing the first 10 files in each folder to verify variety

Code to run:

!ls animals/cats | head
!ls animals/dogs | head

What this means:

  • !ls animals/cats → List all files in cats folder
  • | head → Show only the first 10 (otherwise it’d show all 500!)

Expected output:

00000-4122619873.png
00001-2847563902.png
00002-1928374650.png
...
(10 files total)

Why this matters: Confirms you have multiple images, not just one test file.


COMMAND 8: Import ImageDataGenerator and Prepare Data

What we’re doing: Setting up the data pipeline that feeds images to our neural network

Code to run:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os

# Remove unwanted .ipynb_checkpoints directory if it exists
if os.path.exists("/content/animals/.ipynb_checkpoints"):
    !rm -rf "/content/animals/.ipynb_checkpoints"
    print("Removed .ipynb_checkpoints directory.")

IMG_SIZE = (224, 224)
BATCH_SIZE = 32

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

train_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="training"
)

val_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="validation"
)

DETAILED BREAKDOWN – READ CAREFULLY!

Lines 1-2: Import necessary modules

  • ImageDataGenerator → The tool that handles image loading
  • os → Operating system module (to check if folders exist)

Lines 4-7: Clean up hidden files

  • Jupyter creates hidden .ipynb_checkpoints folders that confuse ImageDataGenerator
  • os.path.exists() → Checks if the folder exists
  • !rm -rf → Removes the folder forcefully (if found)
  • This prevents the error: “Found 3 classes instead of 2”

Line 9: IMG_SIZE = (224, 224)

  • All images must be the same size for neural networks
  • 224×224 pixels is the standard for MobileNetV2
  • Your original images might be 800×600, 1920×1080, etc. – they’ll auto-resize

Line 10: BATCH_SIZE = 32

  • What is a batch? Instead of feeding 1 image at a time, we feed 32 together
  • Why? GPUs process multiple images in parallel (faster training)
  • Analogy: Washing dishes one-by-one vs loading the dishwasher

Lines 12-15: Create the ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

Parameter explanations:

rescale=1./255 – SUPER IMPORTANT!

  • Images are stored as pixels with values 0-255 (black to white)
  • Neural networks work best with values 0-1 (normalized)
  • 1./255 divides every pixel by 255 → converts 0-255 to 0-1
  • Example: Pixel value 127 (gray) becomes 127/255 = 0.498
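You can verify the normalization with plain NumPy – this standalone sketch applies the same operation that rescale=1./255 applies to every pixel:

```python
import numpy as np

# Raw pixel values 0-255 (black, mid-gray, white)
pixels = np.array([0, 127, 255], dtype=np.float32)

# Divide every pixel by 255, exactly as rescale=1./255 does
normalized = pixels / 255.0

print(normalized)  # [0.0, ~0.498, 1.0]
```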

validation_split=0.2

  • Splits data into 80% training, 20% validation
  • Training data: Used to teach the model
  • Validation data: Used to test if it learned correctly
  • Why split? Prevents “memorization” – we want the model to generalize, not just remember training images

Lines 17-23: Create training data generator

train_data = datagen.flow_from_directory(
    "/content/animals",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="binary",
    subset="training"
)

What flow_from_directory does: Automatically:

  1. Scans the /content/animals folder
  2. Finds subfolders (cats, dogs)
  3. Labels images based on folder name
  4. Resizes all images to 224×224
  5. Creates batches of 32 images
  6. Shuffles images randomly (prevents learning order bias)

Parameter breakdown:

"/content/animals" → Path to parent folder containing class subfolders

target_size=IMG_SIZE → Resize all images to (224, 224)

batch_size=BATCH_SIZE → Load 32 images at a time

class_mode="binary" → We have 2 classes (binary classification)

  • Binary = Cat (0) or Dog (1)
  • Alternative: categorical for 3+ classes (cat, dog, bird)

subset="training" → Use the 80% split for training

Lines 25-31: Create validation data generator

  • Same code, but subset="validation" uses the 20% split
  • This is the “test” data we use to check accuracy

Expected output:

Found 800 images belonging to 2 classes.
Found 200 images belonging to 2 classes.

What this means:

  • 800 images in training set (80% of 1000)
  • 200 images in validation set (20% of 1000)
  • 2 classes detected: cats and dogs

Common error fix: If it says “Found 3 classes” – the .ipynb_checkpoints cleanup didn’t work. Manually delete that folder from the file panel.


COMMAND 9: Import Deep Learning Libraries

Code to run:

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models

What we’re importing:

MobileNetV2 → The pre-trained model

  • Already trained on over a million ImageNet images
  • Knows 1000 categories (dogs, cats, cars, planes, etc.)
  • We’ll fine-tune it for our specific task

layers → Building blocks of neural networks

  • Dense → Fully connected layer
  • GlobalAveragePooling2D → Compresses image features

models → Framework to combine layers

  • Sequential → Stack layers in sequence (like Lego blocks)

Expected output: Nothing! If no error appears, the import worked.


COMMAND 10: Load Pre-trained MobileNetV2 Model

What we’re doing: Loading Google’s pre-trained model as our foundation

Code to run:

base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet"
)

base_model.trainable = False

LINE-BY-LINE BREAKDOWN:

Line 1: base_model = MobileNetV2(...)

  • Creates the MobileNetV2 neural network
  • Stores it in the variable base_model

Parameter: input_shape=(224, 224, 3)

  • Tells the model to expect 224×224 pixel images
  • 3 = RGB color channels (Red, Green, Blue)
  • Why these numbers? MobileNetV2 was designed for this size

Parameter: include_top=False – CRITICAL!

  • “Top” = The final classification layer
  • The original model classifies 1000 categories
  • We remove it because we only need 2 categories (cat vs dog)
  • Analogy: Using a Swiss Army knife but removing the corkscrew (we don’t need it)

Parameter: weights="imagenet"

  • Loads the pre-trained weights (learned patterns)
  • ImageNet = A huge public image dataset (the 1000-category training subset has over a million images)
  • Without this: Random starting point (would need 100k images to train)
  • With this: Starts with expert knowledge (only needs 1k images to fine-tune)

Line 7: base_model.trainable = False

  • Freezes the pre-trained layers
  • Means: “Don’t change what you already learned”
  • Why? MobileNetV2 already knows edges, shapes, textures – we keep that knowledge
  • We’ll only train the new layers we add next

What happens behind the scenes:

  1. Downloads the pre-trained MobileNetV2 weights file
  2. Loads the full MobileNetV2 network
  3. Sets all of its layers to “frozen” mode

COMMAND 11: Add Custom Classification Layers

What we’re doing: Adding our own “brain” on top of MobileNetV2 to make cat vs dog decisions

Code to run:

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid")
])

LAYER-BY-LAYER EXPLANATION:

Line 1: model = models.Sequential([...])

  • Sequential = Stack layers in order (like a pipeline)
  • Data flows: Input → Layer 1 → Layer 2 → Layer 3 → Output

Layer 1: base_model

  • The frozen MobileNetV2 network
  • Output: A 7×7×1280 tensor (compressed image features)
  • Think of it as: “Here are 1280 features I detected in this image”

Layer 2: layers.GlobalAveragePooling2D()

  • What it does: Compresses the 7×7×1280 tensor into a 1×1280 vector
  • How: Takes the average of each 7×7 grid
  • Why: Neural networks need fixed-size input; this standardizes it
  • Analogy: Reading a book chapter and writing 1280 key points
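What GlobalAveragePooling2D does can be sketched with NumPy alone – averaging each 7×7 feature map down to a single number (the real layer does the same thing on the GPU):

```python
import numpy as np

# A fake MobileNetV2 output: 7x7 spatial grid, 1280 feature maps
features = np.random.rand(7, 7, 1280)

# Global average pooling = mean over the two spatial axes
pooled = features.mean(axis=(0, 1))

print(pooled.shape)  # (1280,) - one averaged number per feature map
```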

Layer 3: layers.Dense(1, activation="sigmoid")

  • Dense = Fully connected layer (every input connects to every output)
  • 1 = One output neuron
  • Why 1? Binary classification needs 1 output:
    • Output close to 0 → Cat
    • Output close to 1 → Dog
  • activation="sigmoid" → Squashes output to 0-1 range (probability)

The sigmoid function:

Input: any number (-∞ to +∞)
Output: number between 0 and 1
Example: sigmoid(-5) = 0.007 (almost 0 → cat)
         sigmoid(+5) = 0.993 (almost 1 → dog)
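A small NumPy sketch of the sigmoid function, reproducing the two values above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

print(round(sigmoid(-5), 3))  # 0.007 -> almost 0 -> cat
print(round(sigmoid(5), 3))   # 0.993 -> almost 1 -> dog
```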

Full data flow:

Input image (224×224×3) 
    ↓
MobileNetV2 (extracts 1280 features)
    ↓
GlobalAveragePooling (compresses to 1280 numbers)
    ↓
Dense layer (makes decision: 0-1)
    ↓
Output: 0.12 → Cat! (88% confident)
        0.87 → Dog! (87% confident)

COMMAND 12: Compile the Model

What we’re doing: Configuring HOW the model will learn

Code to run:

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

PARAMETER EXPLANATIONS:

optimizer="adam"

  • What’s an optimizer? The algorithm that adjusts the neural network’s weights
  • Why Adam? It’s the most popular – smart, fast, works well for most cases
  • Alternatives: SGD (slower but sometimes better), RMSprop
  • Analogy: Adam is like cruise control – automatically adjusts speed for optimal performance

Technical detail (optional): Adam stands for “Adaptive Moment Estimation” – it adapts the learning rate for each parameter automatically.

loss="binary_crossentropy"

  • What’s a loss function? Measures how “wrong” the model’s predictions are
  • Binary crossentropy = Standard loss for 2-class classification
  • How it works:
    • Prediction: 0.9 (dog), Actual: dog → Low loss (good!)
    • Prediction: 0.1 (cat), Actual: dog → High loss (bad!)
  • Goal: Training minimizes this loss

Why “crossentropy”? It comes from information theory – measures the difference between two probability distributions.
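A standalone sketch of the binary crossentropy formula, reproducing the “low loss / high loss” intuition above (Keras computes the same quantity internally, averaged over the batch):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    # y_true: actual label (0 = cat, 1 = dog)
    # y_pred: the model's predicted probability of "dog"
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

good = binary_crossentropy(1, 0.9)  # predicted dog (0.9), actually a dog
bad = binary_crossentropy(1, 0.1)   # predicted cat (0.1), actually a dog

print(round(float(good), 3))  # ~0.105 (low loss - good prediction)
print(round(float(bad), 3))   # ~2.303 (high loss - bad prediction)
```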

metrics=["accuracy"]

  • Tracks accuracy during training
  • Accuracy = Percentage of correct predictions
  • Example: 80/100 correct = 80% accuracy
  • This is just for monitoring – doesn’t affect training

What compilation does:

  1. Prepares the computational graph
  2. Allocates memory on GPU
  3. Sets up the optimization algorithm
  4. Ready to train!

COMMAND 13: Train the Model (The Magic Happens!)

What we’re doing: Actually teaching the model to recognize cats vs dogs

Code to run:

history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=5
)

PARAMETER BREAKDOWN:

train_data

  • The training images (800 images, 80% of dataset)
  • Model learns from these

validation_data=val_data

  • The validation images (200 images, 20% of dataset)
  • Tests the model after each epoch
  • Crucial: Detects if model is just memorizing vs actually learning

epochs=5

  • What’s an epoch? One complete pass through all training images
  • 5 epochs means: The model sees all 800 images 5 times
  • Why 5? Balance between:
    • Too few (2-3) → Underfitting (doesn’t learn enough)
    • Too many (20+) → Overfitting (memorizes instead of learning)

What happens during training:

Epoch 1:

  1. Shows 800 images to the model (in batches of 32)
  2. Model makes predictions
  3. Calculates loss (how wrong it was)
  4. Adjusts weights to reduce loss
  5. Tests on 200 validation images
  6. Reports accuracy

Epoch 2-5: Repeats the process

Expected output:

Epoch 1/5
25/25 [==============================] - 45s 2s/step - loss: 0.4521 - accuracy: 0.7875 - val_loss: 0.2134 - val_accuracy: 0.9150
Epoch 2/5
25/25 [==============================] - 42s 2s/step - loss: 0.2108 - accuracy: 0.9125 - val_loss: 0.1456 - val_accuracy: 0.9450
Epoch 3/5
25/25 [==============================] - 41s 2s/step - loss: 0.1523 - accuracy: 0.9375 - val_loss: 0.1123 - val_accuracy: 0.9550
Epoch 4/5
25/25 [==============================] - 40s 2s/step - loss: 0.1234 - accuracy: 0.9500 - val_loss: 0.0987 - val_accuracy: 0.9650
Epoch 5/5
25/25 [==============================] - 39s 2s/step - loss: 0.1087 - accuracy: 0.9587 - val_loss: 0.0892 - val_accuracy: 0.9700

UNDERSTANDING THE OUTPUT:

25/25 → Total batches (800 images ÷ 32 per batch = 25)

45s 2s/step → Total time and time per batch

loss: 0.4521 → Training loss (decreases each epoch = good!)

accuracy: 0.7875 → Training accuracy (78.75%)

val_loss: 0.2134 → Validation loss

val_accuracy: 0.9150 → Validation accuracy (91.5%) ← THIS IS KEY!

WHAT TO LOOK FOR:

Good signs:

  • Accuracy increasing each epoch
  • Loss decreasing each epoch
  • Validation accuracy close to training accuracy (within 5-10%)

⚠️ Warning signs:

  • Overfitting: Training accuracy 98%, validation 70% (model memorized training data)
  • Underfitting: Both accuracies stuck at 60% (model isn’t learning)

Why validation accuracy > training accuracy here?

  • We froze MobileNetV2 layers (they don’t change)
  • Only training 1 layer (the final Dense layer)
  • Sometimes validation gets lucky with easier examples

Training time: 3-4 minutes on Colab’s free GPU (vs 30+ minutes on CPU)


COMMAND 14: View Model Architecture

What we’re doing: Seeing the complete structure of our neural network

Code to run:

model.summary()

Expected output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
mobilenetv2_1.00_224 (Functional) (None, 7, 7, 1280)   2,257,984 
_________________________________________________________________
global_average_pooling2d (GlobalAveragePooling2D) (None, 1280) 0         
_________________________________________________________________
dense (Dense)                (None, 1)                 1,281     
=================================================================
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_________________________________________________________________

WHAT THIS MEANS:

Layer 1: MobileNetV2

  • Output Shape: (None, 7, 7, 1280)
    • None = batch size (variable)
    • 7×7 = spatial dimensions (compressed from 224×224)
    • 1280 = feature maps (detectors for different patterns)
  • Params: 2,257,984 parameters (weights)
  • Status: Non-trainable (frozen)

Layer 2: GlobalAveragePooling2D

  • Output Shape: (None, 1280) – flattened to a vector
  • Params: 0 (it’s just averaging, no learnable weights)

Layer 3: Dense

  • Output Shape: (None, 1) – single output (cat vs dog)
  • Params: 1,281
    • Why 1,281? (1280 inputs × 1 output) + 1 bias = 1,281
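The parameter count and forward pass of that Dense layer can be sketched in NumPy (zero-initialized weights here purely for illustration; real training learns these values):

```python
import numpy as np

n_inputs, n_outputs = 1280, 1

# One weight per input-output connection, plus one bias per output
W = np.zeros((n_inputs, n_outputs))
b = np.zeros(n_outputs)
n_params = W.size + b.size

print(n_params)  # 1281, matching model.summary()

# Forward pass: weighted sum + bias, then sigmoid
x = np.random.rand(n_inputs)          # the 1280 pooled features
output = 1 / (1 + np.exp(-(x @ W + b)))
print(output)  # with all-zero weights, sigmoid(0) = 0.5
```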

Key insight:

  • Total params: 2,259,265
  • Trainable: Only 1,281 (about 0.06%!)
  • Non-trainable: 2,257,984 (frozen MobileNetV2)

This is the power of transfer learning – we only train about 0.06% of the network but get 95%+ accuracy!


COMMAND 15: Save the Model (.h5 format)

What we’re doing: Saving your trained model so you don’t lose it when Colab disconnects

Code to run:

model.save("/content/dog_cat_model.h5")

What happens:

  • Creates a file dog_cat_model.h5 (about 9 MB)
  • Contains the entire model (architecture + weights)
  • .h5 = HDF5 format (older Keras format)

You’ll see this warning:

WARNING:absl:You are saving your model as an HDF5 file via `model.save()`. 
This file format is considered legacy. We recommend using instead the native 
Keras format, e.g. `model.save('my_model.keras')`.

What this means: .h5 still works but is outdated. Let’s use the new format!


COMMAND 16: Save the Model (.keras format – RECOMMENDED)

Code to run:

model.save("/content/dog_cat_model.keras")

Why .keras is better:

  • Modern format (future-proof)
  • Faster loading
  • Better compression
  • Official Keras recommendation

To download the model:

  1. Click the folder icon (left sidebar)
  2. Find dog_cat_model.keras
  3. Click the 3 dots → Download
  4. Save it on your computer!

Now you can:

  • Share your model with friends
  • Load it later without retraining
  • Use it in a web app or mobile app

COMMAND 17: Create Upload Button for Testing

What we’re doing: Adding a button to upload test images from your computer

Code to run:

from google.colab import files

uploaded = files.upload()

What this does:

  1. Creates a “Choose Files” button
  2. Click it and select any cat or dog image from your computer
  3. Uploads the image to Colab
  4. Stores the filename in the uploaded variable

Expected behavior:

  • You’ll see a file picker dialog
  • Select an image (JPG, PNG, etc.)
  • Wait for upload (shows progress bar)
  • When done, the cell completes

Pro tip: Test with images NOT from your training set! Use:

  • Photos from Google Images
  • Your own pet photos
  • Random internet images

Why? This tests if the model generalizes (works on new, unseen images).


COMMAND 18: Make Predictions on Your Test Image

What we’re doing: The moment of truth – testing if your AI actually works!

Code to run:


import numpy as np
from tensorflow.keras.preprocessing import image
import matplotlib.pyplot as plt

IMG_SIZE = (224, 224)

img_path = list(uploaded.keys())[0]   # automatically gets file name

img = image.load_img(img_path, target_size=IMG_SIZE)
img_array = image.img_to_array(img) / 255.0
img_array = np.expand_dims(img_array, axis=0)

prediction = model.predict(img_array)

plt.imshow(img)
plt.axis("off")

if prediction[0][0] > 0.5:
    print("Prediction: Dog 🐶")
else:
    print("Prediction: Cat 🐱")

COMPLETE LINE-BY-LINE BREAKDOWN:

Lines 1-3: Import necessary libraries

import numpy as np
from tensorflow.keras.preprocessing import image
import matplotlib.pyplot as plt

  • numpy → For array operations
  • image → Keras image utilities for loading/processing
  • matplotlib → For displaying the image

Line 5: IMG_SIZE = (224, 224)

  • Same size we used for training
  • Critical: Model expects 224×224 images

Line 7: img_path = list(uploaded.keys())[0]

  • uploaded is a dictionary: {'filename.jpg': file_data}
  • .keys() gets the filename
  • [0] gets the first (and only) uploaded file
  • Result: Stores the filename string (e.g., “my_dog.jpg”)

Line 9: img = image.load_img(img_path, target_size=IMG_SIZE)

  • Loads the image from the file
  • target_size=IMG_SIZE → Automatically resizes to 224×224
  • Why resize? Your image might be 4000×3000 or 800×600 – model needs exactly 224×224

Line 10: img_array = image.img_to_array(img) / 255.0

  • image.img_to_array(img) → Converts image to NumPy array
    • Result: Shape (224, 224, 3) – height × width × RGB channels
    • Values: 0-255 (raw pixel values)
  • / 255.0 → Normalizes to 0-1 range
    • Why? Remember, we trained with normalized images (rescale=1./255)
    • Must match training preprocessing!

Line 11: img_array = np.expand_dims(img_array, axis=0)

  • Critical transformation!
  • Before: Shape (224, 224, 3) – single image
  • After: Shape (1, 224, 224, 3) – batch of 1 image
  • Why? Model expects batches, even if batch size = 1
  • axis=0 adds a new dimension at the front

Visual representation:

Original: [[[R, G, B], [R, G, B], ...]]  ← 2D grid of pixels
After:    [[[[R, G, B], [R, G, B], ...]]] ← Batch containing 1 image
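The shape change can be verified with NumPy alone:

```python
import numpy as np

# A single fake image: height x width x RGB channels
img_array = np.zeros((224, 224, 3))
print(img_array.shape)  # (224, 224, 3)

# Add a batch dimension at the front, as the model expects
batched = np.expand_dims(img_array, axis=0)
print(batched.shape)    # (1, 224, 224, 3) - a batch of one image
```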

Line 13: prediction = model.predict(img_array)

  • THE PREDICTION HAPPENS HERE!
  • Model processes the image through all layers
  • Returns: Array of probabilities
  • Shape: [[0.87]] (2D array with one value)

How prediction works internally:

  1. Image goes through MobileNetV2 (extracts 1280 features)
  2. GlobalAveragePooling compresses features
  3. Dense layer with sigmoid outputs probability
  4. Output: Number between 0 and 1

Line 15-16: Display the image

plt.imshow(img)
plt.axis("off")

  • Shows the uploaded image
  • Turns off axis numbers for cleaner display

Lines 18-21: Interpret the prediction


if prediction[0][0] > 0.5:
    print("Prediction: Dog 🐶")
else:
    print("Prediction: Cat 🐱")

Understanding prediction[0][0]:

  • prediction = [[0.87]] (2D array)
  • prediction[0] = [0.87] (first element of outer array)
  • prediction[0][0] = 0.87 (actual probability value)

Decision logic:

  • If > 0.5: More likely a dog (closer to 1)
  • If < 0.5: More likely a cat (closer to 0)
  • Threshold 0.5 is standard for binary classification

Example outputs:

prediction[0][0] = 0.92 → "Prediction: Dog 🐶" (92% confident)
prediction[0][0] = 0.13 → "Prediction: Cat 🐱" (87% confident it's NOT a dog)
prediction[0][0] = 0.51 → "Prediction: Dog 🐶" (barely, only 51% confident)
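The decision logic can be packaged as a small helper that also reports confidence. The function name `interpret` is hypothetical – the tutorial code uses an inline if instead:

```python
def interpret(probability, threshold=0.5):
    # probability is the sigmoid output: near 0 = cat, near 1 = dog
    if probability > threshold:
        return f"Dog ({probability:.0%} confident)"
    return f"Cat ({1 - probability:.0%} confident)"

print(interpret(0.92))  # Dog (92% confident)
print(interpret(0.13))  # Cat (87% confident)
print(interpret(0.51))  # Dog (51% confident)
```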

Expected full output:

  • Image of your uploaded photo
  • Text: “Prediction: Dog 🐶” or “Prediction: Cat 🐱”

Complete Command Summary with Correct Order

Here’s the complete numbered sequence of all commands:

SETUP PHASE:

  1. Download dataset from Google Drive link
  2. Open Google Colab and create new notebook
  3. Create folder structure and upload images to Colab

VERIFICATION PHASE:

  4. !ls – Verify main folders
  5. !ls animals – Verify subfolder structure
  6. Display sample image with PIL and matplotlib
  7. !ls animals/cats | head and !ls animals/dogs | head – List sample files

DATA PREPARATION PHASE:

  8. Import ImageDataGenerator and prepare train/validation data splits
  9. Import TensorFlow/Keras libraries (MobileNetV2, layers, models)

MODEL BUILDING PHASE:

  10. Load pre-trained MobileNetV2 base model
  11. Add custom classification layers on top
  12. Compile the model with optimizer, loss, and metrics

TRAINING PHASE:

  13. Train the model with model.fit()
  14. View model architecture with model.summary()

SAVING PHASE:

  15. Save model as .h5 file (legacy format)
  16. Save model as .keras file (recommended format)

TESTING PHASE:

  17. Create upload button for test images
  18. Make predictions on uploaded images


What You’ve Accomplished! 🎉

Congratulations! You’ve just built a real AI model from scratch. Let’s recap what you’ve learned:

Technical Skills Gained:

✅ Set up a cloud-based GPU environment (Google Colab)
✅ Prepared image datasets with proper folder structure
✅ Used ImageDataGenerator for automated data preprocessing
✅ Implemented transfer learning with MobileNetV2
✅ Built a custom neural network architecture
✅ Trained a deep learning model with validation
✅ Saved and loaded trained models
✅ Made predictions on new images

Key Concepts Mastered:

✅ What deep learning and neural networks are
✅ How CNNs process images differently than regular neural networks
✅ The power of transfer learning (reusing pre-trained models)
✅ Training vs validation data and why we split them
✅ Image normalization and preprocessing
✅ Binary classification with sigmoid activation
✅ Model compilation (optimizer, loss, metrics)

Real-World Performance:

  • Accuracy: 80-97% (depending on your dataset quality)
  • Training time: 3-5 minutes on free GPU
  • Model size: ~9 MB (portable and shareable)
  • Images needed: Only 1,000 (vs 100,000+ from scratch)

Troubleshooting Common Issues

Issue 1: “Found 3 classes instead of 2”

Cause: Hidden .ipynb_checkpoints folder in the animals directory

Fix: Run this in a code cell:


!rm -rf /content/animals/.ipynb_checkpoints

Issue 2: Low accuracy (below 70%)

Possible causes:

  • Poor quality images (blurry, wrong labels, duplicates)
  • Too few epochs (try 7-10 instead of 5)
  • Dataset imbalance (e.g., 700 cats, 300 dogs)

Fix: Check your dataset quality manually

Issue 3: “ResourceExhausted” error

Cause: GPU ran out of memory

Fix: Reduce batch size:


BATCH_SIZE = 16  # instead of 32

Issue 4: Model predicts same class for everything

Cause: Model didn’t learn properly (overfitting or underfitting)

Fix:

  • Check if images uploaded correctly
  • Increase training epochs to 10
  • Verify rescale=1./255 is applied

Issue 5: Upload button doesn’t appear

Cause: Code didn’t run completely

Fix: Run the cell again and wait for the “Choose Files” button


Next Steps: Level Up Your AI Skills

Beginner Challenges:

  1. Improve accuracy: Try training for 10 epochs instead of 5
  2. Add confidence scores: Print the exact probability (e.g., “Dog: 87% confident”)
  3. Test multiple images: Modify code to upload and predict 5 images at once
  4. Visualize training: Plot accuracy curves using matplotlib
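Challenge 2 only needs a little arithmetic on the sigmoid output. A minimal sketch, using a hard-coded prediction value in place of a real model.predict() call:

```python
def describe_prediction(p):
    """Turn a sigmoid output (0.0-1.0) into a labeled confidence string.

    With the class ordering used in this tutorial, values near 1.0
    mean dog and values near 0.0 mean cat.
    """
    if p > 0.5:
        label, confidence = "Dog", p
    else:
        label, confidence = "Cat", 1 - p
    return f"{label}: {confidence * 100:.0f}% confident"

# In the notebook, p would come from model.predict(img_array)[0][0];
# hard-coded here for illustration.
print(describe_prediction(0.87))  # → Dog: 87% confident
print(describe_prediction(0.10))  # → Cat: 90% confident
```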

Intermediate Challenges:

  1. Add data augmentation: Flip, rotate, zoom images during training for better generalization
  2. Try different architectures: Replace MobileNetV2 with ResNet50 or VGG16
  3. Multi-class classification: Add a third category (e.g., cats, dogs, birds)
  4. Fine-tune frozen layers: Unfreeze last 10 layers of MobileNetV2 for higher accuracy
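Challenge 1 above is a small change to the training generator. A hedged sketch (the exact augmentation values are illustrative, not tuned):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation applies only to the training generator;
# the validation generator should keep just the rescale.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,    # a mirrored cat is still a cat
    rotation_range=20,       # rotate up to 20 degrees
    zoom_range=0.2,          # zoom in/out up to 20%
    width_shift_range=0.1,   # shift horizontally up to 10%
    height_shift_range=0.1,  # shift vertically up to 10%
)
```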

Advanced Projects:

  1. Build a web app: Use Streamlit or Gradio to create an interface
  2. Deploy to mobile: Convert model to TensorFlow Lite for Android/iOS
  3. Real-time video classification: Process webcam feed frame-by-frame
  4. Create your own dataset: Scrape images from the web and build a custom classifier
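Project 1 can be surprisingly short with Gradio. A sketch, assuming the model was saved as cat_dog_model.h5 (the filename and the 224×224 preprocessing must match whatever your notebook actually used):

```python
import numpy as np
import gradio as gr
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model("cat_dog_model.h5")  # assumed filename from the saving step

def classify(img):
    # Gradio hands us a numpy array; preprocess exactly as during training
    img = Image.fromarray(img).resize((224, 224))
    x = np.array(img)[None, ...] / 255.0   # add batch dimension, normalize
    p = float(model.predict(x)[0][0])      # sigmoid output: near 1.0 = dog
    return {"dog": p, "cat": 1 - p}

gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label()).launch()
```

Running this cell in Colab prints a public link you can open on your phone.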

Understanding Your Results Better

What does 91% accuracy actually mean?

  • Out of 100 test images, the model correctly identifies 91
  • 9 images are misclassified (false positives/negatives)
  • Is 91% good? Yes! Professional models are 95-98%, but they use:
    • 100,000+ images
    • Advanced architectures
    • Days of training
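Concretely, accuracy is just correct predictions divided by total, and the false positives/negatives are the two ways a prediction can go wrong. A tiny illustration with made-up labels (1 = dog, 0 = cat):

```python
# Made-up ground truth and predictions for 10 images (1 = dog, 0 = cat)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Treating "dog" as the positive class:
false_pos = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # cat called dog
false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # dog called cat

print(f"Accuracy: {accuracy:.0%}")  # → Accuracy: 80%
print(f"False positives: {false_pos}, false negatives: {false_neg}")  # → 1 and 1
```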

Why does validation accuracy fluctuate?

You might see:

Epoch 1: 91%
Epoch 2: 94%
Epoch 3: 92%

Reasons:

  • Validation set is small (200 images) – random variation matters
  • Some batches are naturally harder than others
  • Model is still learning and adjusting

What to watch: The overall trend (should increase or stay stable)

When is your model actually good?

✅ Validation accuracy within 5% of training accuracy
✅ Both accuracies above 80%
✅ Model correctly predicts YOUR test images (not just training data)
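The first two checks can be scripted against the history that model.fit() returns. A minimal sketch using a hard-coded history dict in place of the real one:

```python
def looks_healthy(history, gap=0.05, floor=0.80):
    """Apply the two numeric checks: small train/val gap, both above 80%."""
    train_acc = history["accuracy"][-1]    # final-epoch training accuracy
    val_acc = history["val_accuracy"][-1]  # final-epoch validation accuracy
    return abs(train_acc - val_acc) <= gap and min(train_acc, val_acc) >= floor

# In the notebook this would be model.fit(...).history; hard-coded for illustration.
good = {"accuracy": [0.85, 0.93, 0.95], "val_accuracy": [0.84, 0.90, 0.92]}
overfit = {"accuracy": [0.90, 0.97, 0.99], "val_accuracy": [0.80, 0.79, 0.78]}

print(looks_healthy(good))     # → True
print(looks_healthy(overfit))  # → False  (training far ahead of validation)
```

The third check, predicting correctly on your own uploaded images, still has to be done by hand.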


The Science Behind Transfer Learning

Why MobileNetV2 works so well:

What it learned from ImageNet:

  • Low-level features (Layer 1-20): Edges, corners, colors, textures
  • Mid-level features (Layer 21-50): Shapes, patterns, object parts
  • High-level features (Layer 51-88): Complex objects, scenes

Universal knowledge: These patterns are universal! Whether identifying cats, cars, or planes, you need to detect edges and shapes first.

Our specialization: We only train the final layer to combine these universal features specifically for cat vs dog detection.

Analogy:

  • From scratch: Teaching someone to read, write, and then become a lawyer (10 years)
  • Transfer learning: Hiring a college graduate and training them in law (2 years)

Real-World Applications of This Technique

Your cat vs dog classifier uses the same technology behind:

  1. Medical imaging: Detecting tumors in X-rays and MRIs
  2. Self-driving cars: Identifying pedestrians, cars, traffic signs
  3. Quality control: Spotting defective products in manufacturing
  4. Wildlife conservation: Counting endangered species from camera traps
  5. Agriculture: Detecting plant diseases from leaf photos
  6. Security: Facial recognition systems
  7. Retail: Visual search (“find similar products”)

The skill you learned is in-demand! Companies pay $80,000-$150,000/year for computer vision engineers.


Final Thoughts

You’ve completed a journey that would have seemed impossible just a few years ago. In 2012, training an image classifier required:

  • PhD-level knowledge
  • $10,000+ in hardware
  • Weeks of training time
  • 1,000,000+ images

Today, you did it with:

  • Basic Python knowledge
  • Free cloud resources
  • 5 minutes of training
  • 1,000 images

This is the democratization of AI in action.

The concepts you learned here – CNNs, transfer learning, data preprocessing, model training – are the foundation of modern computer vision. Whether you’re building a startup, pursuing a career in AI, or just exploring as a hobby, you now have real, practical skills.

Remember: Every expert was once a beginner. The model you built today might seem simple, but it’s using the same principles as models that:

  • Diagnose cancer
  • Power autonomous vehicles
  • Translate languages in real-time

Keep building, keep learning, and most importantly – have fun with AI! 🚀
